The Deadly 5-Second DNS Latency in Kubernetes Clusters

Original post: https://typonotes.com/posts/2023/08/05/k8s-dns-5s-resolv/

Cause of the Problem

Related Articles

+ kubernetes集群中夺命的5秒DNS延迟 ("The Deadly 5-Second DNS Latency in Kubernetes Clusters")
+ 破案:Kubernetes/Docker 上无法解释的连接超时 ("Case Solved: Unexplainable Connection Timeouts on Kubernetes/Docker")

Martynas Pumputis, an engineer at Weaveworks, published a very detailed analysis of this problem: https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts

+ `conntrack`: http://people.netfilter.org/pablo/docs/login.pdf

Root Cause

The DNS client (glibc or musl libc) issues the A and AAAA queries in parallel. To talk to the DNS server it first calls connect() to create the fd, and the query packets are then sent through that fd. Because UDP is a connectionless protocol, connect() sends no packet, so no conntrack entry is created at that point. The parallel A and AAAA queries are sent through the same fd by default, so the two packets carry the same source port (they use the same socket). When they are sent concurrently, neither packet has a conntrack entry yet, so netfilter creates a separate entry for each of them.

Requests to kube-dns / CoreDNS from inside the cluster go to the CLUSTER-IP, and the packets end up DNAT-ed to the Pod IP of one endpoint. If the two packets happen to be DNAT-ed to the same Pod IP, their five-tuples become identical, and when the second conntrack entry is eventually inserted it is rejected and its packet is dropped. With a single DNS Pod replica this is very easy to hit (everything is DNAT-ed to the same Pod IP). The symptom is a DNS request timeout; the client's default policy is to wait 5 seconds and retry, and when the retry succeeds, what we observe is a DNS lookup with a 5-second delay.

Key points: 1. parallel A and AAAA queries; 2. the conntrack race between concurrent packets.
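For context, this is roughly what kubelet writes into /etc/resolv.conf for a Pod (in the default namespace) under the default ClusterFirst DNS policy; the nameserver address matches the kube-dns Service ClusterIP from the manifest further below, and the 5-second figure is simply glibc's default per-query timeout:

```
# /etc/resolv.conf inside a Pod (illustrative; generated by kubelet)
nameserver 10.66.0.2                                              # kube-dns / CoreDNS ClusterIP
search default.svc.cluster.local svc.cluster.local cluster.local  # assumes the "default" namespace
options ndots:5
# no "options timeout:N" is set, so glibc waits its default 5 s before retrying -- hence the 5 s stall
```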

Solutions

Avoid the parallel-lookup race with Pod `dnsConfig` options

+ https://github.com/Azure/AKS/issues/667
+ https://studygolang.com/articles/25303

dnsConfig:  
  options:  
    - name: single-request-reopen

https://github.com/Azure/AKS/issues/667#issuecomment-425821085
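For orientation, a minimal sketch of where that option sits in a full Pod spec; the Pod name and image are placeholders, not from the original post. Per the glibc resolv.conf documentation, `single-request-reopen` makes the resolver close the shared socket and resend the second query from a new one when the paired A/AAAA exchange goes wrong, which sidesteps the source-port collision described above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-single-request          # hypothetical Pod, for illustration only
spec:
  containers:
    - name: app
      image: nginx:1.25              # any glibc-based image; nginx is just an example
  dnsConfig:
    options:
      - name: single-request-reopen  # glibc retries the second query from a fresh socket
```

Note that this is a glibc resolver option; musl libc (Alpine-based images) does not honor it, so for those workloads one of the approaches below is needed instead.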

Disable AAAA (IPv6) resolution in CoreDNS

1. [k8s与dns--coredns的一些实战经验 - kubernetes solutions - SegmentFault 思否](https://segmentfault.com/a/1190000020403096)
2. [coredns plugin template](https://coredns.io/plugins/template/)
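The approach in the referenced articles relies on the CoreDNS `template` plugin. A minimal sketch of the stanza, mirroring the commented-out block in the Corefile further down; note that it answers every AAAA query with NXDOMAIN, so it is only appropriate when no workload actually depends on IPv6 records:

```
.:53 {
    # Answer all AAAA (IPv6) queries immediately with NXDOMAIN instead of resolving them
    template ANY AAAA {
        rcode NXDOMAIN
    }
    # ... the rest of the existing server block (kubernetes, forward, cache, ...) stays unchanged
}
```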

Use NodeLocal DNSCache for node-local caching

1. [Setting up NodeLocal DNSCache | Google Kubernetes Engine documentation | Google Cloud](https://cloud.google.com/kubernetes-engine/docs/how-to/nodelocal-dns-cache?hl=zh-cn)
2. https://cloud.google.com/kubernetes-engine/docs/concepts/service-discovery?hl=zh-cn
3. [在 Kubernetes 集群中使用 NodeLocal DNSCache (阳明的博客)](https://www.qikqiak.com/post/use-nodelocal-dns-cache/)
	+ But that is not quite the end of it: if kube-proxy runs in ipvs mode, we also have to change the kubelet `--cluster-dns` flag to point at 169.254.20.10. The DaemonSet creates a network interface on every node bound to that IP; Pods send their DNS queries to this node-local address, and only cache misses are proxied on to the upstream cluster DNS. In iptables mode, Pods keep querying the original cluster DNS address; the node listens on that IP as well, so the traffic is intercepted locally and then forwarded to the upstream cluster DNS, and `--cluster-dns` does not need to change (see the kubelet sketch below).
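For the ipvs case, a minimal sketch of what the kubelet change might look like, assuming the default NodeLocal DNSCache link-local address 169.254.20.10 and nodes that configure kubelet through a KubeletConfiguration file (the file path and the remaining fields depend on how your nodes are provisioned):

```yaml
# e.g. /var/lib/kubelet/config.yaml -- the actual path varies by installer / distribution
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
  - 169.254.20.10        # point Pods at the node-local cache instead of the kube-dns ClusterIP
clusterDomain: cluster.local
```

This is the declarative equivalent of passing `--cluster-dns=169.254.20.10` on the kubelet command line; kubelet must be restarted, and only Pods created after the change pick up the new nameserver in their /etc/resolv.conf.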

coredns.yaml

# __MACHINE_GENERATED_WARNING__

apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
  labels:
      kubernetes.io/cluster-service: "true"
      addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
    addonmanager.kubernetes.io/mode: Reconcile
  name: system:coredns
rules:
- apiGroups:
  - ""
  resources:
  - endpoints
  - services
  - pods
  - namespaces
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
    addonmanager.kubernetes.io/mode: EnsureExists
  name: system:coredns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:coredns
subjects:
- kind: ServiceAccount
  name: coredns
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
  labels:
      addonmanager.kubernetes.io/mode: EnsureExists
data:
  Corefile: |
    pek3.qingcloud.com:53 {
      forward . 100.64.9.5 100.64.9.9
      # log
      # errors
      # cache 300 ## https://coredns.io/plugins/cache/
      # cancel 1s  ## https://coredns.io/plugins/cancel/

      #### working
      # loop
      # reload 5s
    }

    qingstor.com:53 {
      forward . 100.64.9.5 100.64.9.9
      # log
      # errors
      # loop
      # reload 5s
    }

    .:53 {

        # https://coredns.io/plugins/template/
        ## Return "not found" immediately for IPv6 (AAAA) lookups
        ## NXDOMAIN means the query was handled, but no such record exists
        ## Since alpine 3.13, IPv6 is preferred by default, which can cause resolution errors
        # template ANY AAAA {
        #     rcode NXDOMAIN
        # }

        #### debug
        # whoami
        # errors
        # log . {"local":"{local}","client":"{remote}:{port}","id":"{>id}","type":"{type}","class":"{class}","name":"{name}","proto":"{proto}","size":{size},"do":"{>do}","bufsize":{>bufsize},"rflags":"{>rflags}","rsize":{rsize},"duration":"{duration}","rcode":"{rcode}"}
        # log

        #### working
        # loop
        reload 5s

        #### health check and performance
        health
        ready
        prometheus :9153

        kubernetes cluster.local. in-addr.arpa ip6.arpa {
            # https://coredns.io/plugins/kubernetes/
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
            ttl 1
        }

        # hosts in line
        # specify global hosts for containers
        # hosts {
        #     1.1.1.1 a.example.com
        #     2.2.2.2 b.example.com

        #     fallthrough # must be kept at the bottom
        # }

        #### resolver
        # On Ubuntu 18.04, forwarding to /etc/resolv.conf can fail to resolve
        ## because the nameserver there is the systemd-resolved stub 127.0.0.53
        forward . 114.114.114.114 
        # forward . /etc/resolv.conf
      
        # loadbalance ## https://coredns.io/plugins/loadbalance/

        #### cache
        cache 60   ## https://coredns.io/plugins/cache/
        cancel 1s  ## https://coredns.io/plugins/cancel/
    
    }    

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "CoreDNS"
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
      annotations:
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: coredns
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
      nodeSelector:
        beta.kubernetes.io/os: linux
      containers:
      - name: coredns
        image: coredns/coredns:1.6.2
        imagePullPolicy: IfNotPresent
        resources:
          limits:
            memory: 300Mi
          requests:
            cpu: 300m
            memory: 300Mi
        args: [ "-conf", "/etc/coredns/Corefile" ]
        volumeMounts:
        - name: config-volume
          mountPath: /etc/coredns
          readOnly: true
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - all
          readOnlyRootFilesystem: true
      dnsPolicy: Default
      volumes:
        - name: config-volume
          configMap:
            name: coredns
            items:
            - key: Corefile
              path: Corefile
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "CoreDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.66.0.2
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
  - name: metrics
    port: 9153
    protocol: TCP

Other Articles

1. A transcript of the talk given by Liu Mengxin (刘梦馨), expert engineer at Alauda (灵雀云), at the 蓝鲸 X DeepFlow Observability Meetup: 记一次持续三个月的 K8s DNS 排障过程 ("A Three-Month K8s DNS Troubleshooting Journey").