K3s DNS Troubleshooting: Cloudflared Restart Loop
After turning off Tailscale, K3s's cloudflared suddenly fell into CrashLoopBackOff? This post records how, starting from a DNS resolution error, the root cause was traced to systemd-resolved's 127.0.0.53 stub address sending CoreDNS into an infinite loop, and how it was fixed.
1. Symptom
After shutting down tailscale and restarting k3s, the cloudflared Pod kept entering CrashLoopBackOff and threw the following error:

```
ERR Failed to fetch features, error="lookup cfd-features.argotunnel.com on 10.43.0.10:53: server misbehaving"
```

2. Investigation
Checking the CoreDNS logs confirmed that DNS resolution was indeed failing. But why?
The DNS resolution flow:

- k3s internal DNS
- custom hosts entries
- /etc/resolv.conf
So where does this final /etc/resolv.conf come from?
- In the CoreDNS Pod's YAML there is this line:

```yaml
dnsPolicy: Default
```

Checking the k8s documentation shows that this policy simply inherits the host's /etc/resolv.conf.
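To see where that field lives, here is a minimal, hypothetical Pod spec contrasting `dnsPolicy: Default` with the usual `ClusterFirst` (the Pod name and image are illustrative only):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resolv-demo          # hypothetical name, for illustration
spec:
  dnsPolicy: Default         # inherit the node's /etc/resolv.conf verbatim
  # dnsPolicy: ClusterFirst  # the usual default: route through cluster DNS
  containers:
  - name: show-resolv
    image: busybox
    command: ["cat", "/etc/resolv.conf"]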
- The host's /etc/resolv.conf:

```
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
# Do not edit.
#
# This file might be symlinked as /etc/resolv.conf. If you're looking at
# /etc/resolv.conf and seeing this text, you have followed the symlink.
#
# This is a dynamic resolv.conf file for connecting local clients to the
# internal DNS stub resolver of systemd-resolved. This file lists all
# configured search domains.
#
# Run "resolvectl status" to see details about the uplink DNS servers
# currently in use.
#
# Third party programs should typically not access this file directly, but only
# through the symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a
# different way, replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.
nameserver 127.0.0.53
```

Why does it point to 127.0.0.53?
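A quick way to spot this situation is to check whether the nameserver in resolv.conf is a loopback address. A minimal sketch, run here against an inline copy of the stub config (on a real host you would read /etc/resolv.conf itself):

```shell
# Inline copy of the stub config; substitute /etc/resolv.conf on a real host.
resolv_conf='nameserver 127.0.0.53'

# 127.0.0.0/8 is the loopback block, so any 127.x nameserver never
# leaves the host -- exactly the condition that traps CoreDNS.
if printf '%s\n' "$resolv_conf" | grep -Eq '^nameserver[[:space:]]+127\.'; then
  echo "loopback stub detected"
fi
```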
- Ubuntu runs another service, systemd-resolved, which sits between local clients and the real DNS servers and answers queries on their behalf (stub mode).
So what happens when CoreDNS inherits this config?

```
[FATAL] plugin/loop: Loop (127.0.0.1:55281 -> :53) detected for zone "."
```
Why does a loop occur?

- As mentioned in the resolution flow above, the final hop points back at CoreDNS itself: CoreDNS forwards queries to 127.0.0.53, but inside the Pod that address is the Pod's own loopback, and CoreDNS is the process listening on :53 there, so every forwarded query comes straight back to it.
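The cycle can be sketched as a toy shell recursion. This only illustrates the forwarding loop, it is not how CoreDNS's loop plugin is implemented (the real plugin sends a probe query with a random name to its upstream and aborts when that probe arrives back at CoreDNS itself):

```shell
# Toy model: a "resolver" whose upstream is itself, because 127.0.0.53
# inside the Pod's netns is the Pod's own loopback and CoreDNS owns :53.
resolve() {
  hops=$1
  if [ "$hops" -ge 3 ]; then
    # In reality the query would bounce forever; we cap it to show the cycle.
    echo "loop detected after $hops hops"
    return 0
  fi
  # Forwarding to 127.0.0.53:53 just re-enters the same resolver.
  resolve $((hops + 1))
}
resolve 0
```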
What is 127.0.0.53? Quoting RFC 5735:

> 127.0.0.0/8 - This block is assigned for use as the Internet host
> loopback address. A datagram sent by a higher-level protocol to an
> address anywhere within this block loops back inside the host. This
> is ordinarily implemented using only 127.0.0.1/32 for loopback. As
> described in [RFC1122], Section 3.2.1.3, addresses within the entire
> 127.0.0.0/8 block do not legitimately appear on any network anywhere.
3. Why did turning Tailscale off break everything?
- When Tailscale starts, it takes over /etc/resolv.conf; the nameserver inside becomes 100.100.100.100
- When Tailscale is turned off, /etc/resolv.conf reverts to the stub file shown above
4. Solutions
- Point the host's /etc/resolv.conf at systemd-resolved's real upstream config file:

```shell
sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
```
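The effect of the symlink swap can be demonstrated safely in a scratch directory (the paths below are stand-ins for /run/systemd/resolve/resolv.conf and /etc/resolv.conf, and 1.1.1.1 is just a placeholder upstream):

```shell
tmp=$(mktemp -d)

# Stand-in for /run/systemd/resolve/resolv.conf: the file listing the
# real upstream servers rather than the 127.0.0.53 stub.
echo 'nameserver 1.1.1.1' > "$tmp/resolv.conf.real"

# Stand-in for the `sudo ln -sf ...` step on the real host.
ln -sf "$tmp/resolv.conf.real" "$tmp/resolv.conf"

# Reading through the symlink now yields a non-loopback nameserver.
upstream=$(grep '^nameserver' "$tmp/resolv.conf")
echo "$upstream"

rm -rf "$tmp"
```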
- Modify the CoreDNS ConfigMap (persist the fix!)

Not tried yet.
- Modify the K3s startup parameters

Add the --resolv-conf flag to the K3s startup command, pointing it at a file that does not contain 127.0.0.53.
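The same flag can also be set through K3s's config file instead of the command line; a sketch, assuming the default config path /etc/rancher/k3s/config.yaml and systemd-resolved's real upstream file as the target:

```yaml
# /etc/rancher/k3s/config.yaml
# Keys here mirror the CLI flags without the leading dashes.
# Point the kubelet at the real upstream list, not the 127.0.0.53 stub.
resolv-conf: /run/systemd/resolve/resolv.conf
```

After editing, restart k3s (e.g. `sudo systemctl restart k3s`) for the setting to take effect.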
REF:
- pod 的 /etc/resolv.conf 生成机制
- 从源码角度来看 pod 的 /etc/resolv.conf 生成机制
- DNS for Services and Pods (Kubernetes documentation)

