Kubernetes: NodeLocal DNS and the “lookup istiod.istio-system.svc on 169.254.20.10:53: no such host” error

04/19/2021
 

In our Deployments, we use NodeLocal DNS as a local DNS cache to reduce the number of requests to the AWS VPC DNS; see Kubernetes: load-testing and high-load tuning – problems and solutions for details.

Currently, the DNS part of a manifest looks like this:

...
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
          - 169.254.20.10
...

The problem is that the Istio sidecar container, istio-proxy, cannot resolve istiod.istio-system.svc. As a result, it gets neither its configuration nor its workload certificate from istiod, Istio’s central control plane (Pilot):

kk -n test-ns logs test-deploy-dns-f8f5659b5-jpx8g istio-proxy
...
2021-03-29T11:46:44.939174Z     warn    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 0 successful, 0 rejected; lds updates: 0 successful, 0 rejected
2021-03-29T11:46:45.409507Z     warn    sds     failed to warm certificate: failed to generate workload certificate: create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 169.254.20.10:53: no such host"
...
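To reproduce the failure without Istio in the picture, you can start a throwaway Pod with the same dnsConfig. The sketch below makes a few assumptions:

- The Pod name dns-debug is hypothetical.
- It runs in the test-ns namespace from the logs above.
- It uses the busybox:1.28 image, though any image that ships nslookup should do.

```yaml
# Hypothetical debug Pod with the same DNS settings as the Deployment,
# so its lookups go through NodeLocal DNS the same way istio-proxy's do.
apiVersion: v1
kind: Pod
metadata:
  name: dns-debug                       # hypothetical name
  namespace: test-ns
  annotations:
    sidecar.istio.io/inject: "false"    # keep Istio out of the test
spec:
  restartPolicy: Never
  containers:
    - name: dns-debug
      image: busybox:1.28               # assumption: any image with nslookup works
      command: ["sleep", "3600"]        # keep the container running for exec
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 169.254.20.10                   # NodeLocal DNS address from the manifest
```

Run `kubectl -n test-ns exec dns-debug -- nslookup istiod.istio-system.svc`. It should fail the same way, while `istiod.istio-system.svc.cluster.local` should resolve. That result points at the missing search domains rather than at NodeLocal DNS itself.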

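The cause is that, with dnsPolicy: "None", the kubelet builds the Pod’s /etc/resolv.conf only from dnsConfig. The Pod therefore gets no search domains, and the partial name istiod.istio-system.svc is queried as-is instead of being expanded to istiod.istio-system.svc.cluster.local, so the lookup ends with “no such host”. To fix it, add a searches list with the cluster’s DNS domain, cluster.local, to dnsConfig; see Pod’s DNS Config and Namespaces of Services: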

...
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
          - 169.254.20.10
        searches:
          - cluster.local
...
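A single cluster.local entry is enough for istiod.istio-system.svc, which already contains the namespace and the svc label. It is not enough for shorter names such as my-svc or my-svc.other-ns (both hypothetical). Under the default ClusterFirst policy, those are expanded through the Pod’s namespace and svc.cluster.local.

The sketch below mirrors the search list and ndots option that the kubelet normally writes for ClusterFirst Pods. It assumes the test-ns namespace and the default cluster.local domain.

```yaml
      dnsPolicy: "None"
      dnsConfig:
        nameservers:
          - 169.254.20.10                # NodeLocal DNS
        searches:
          - test-ns.svc.cluster.local    # short names in the Pod's own namespace, e.g. my-svc
          - svc.cluster.local            # <service>.<namespace>, e.g. my-svc.other-ns
          - cluster.local                # <service>.<namespace>.svc, e.g. istiod.istio-system.svc
        options:
          - name: ndots
            value: "5"                   # the value kubelet sets for ClusterFirst Pods
```

Two trade-offs apply here:

- The namespace has to be written literally, because dnsConfig is static.
- With ndots: 5, external names with fewer than five dots go through the search list first. That causes extra queries, which the node-local cache should help absorb.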

Redeploy, and istio-proxy is now up and running:

kk -n test-ns logs -f test-deploy-dns-5448c4996d-8m5rk istio-proxy
...
2021-03-29T12:44:28.853178Z     info    Proxy role      ips=[10.22.46.129] type=sidecar id=test-deploy-dns-5448c4996d-8m5rk.test-ns domain=test-ns.svc.cluster.local
2021-03-29T12:44:28.853187Z     info    JWT policy is third-party-jwt
2021-03-29T12:44:28.853198Z     info    Pilot SAN: [istiod.istio-system.svc]
2021-03-29T12:44:28.853206Z     info    CA Endpoint istiod.istio-system.svc:15012, provider Citadel
2021-03-29T12:44:28.853244Z     info    Using CA istiod.istio-system.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2021-03-29T12:44:28.853341Z     info    citadelclient   Citadel client using custom root cert: istiod.istio-system.svc:15012
2021-03-29T12:44:28.886213Z     info    ads     All caches have been synced up in 35.295685ms, marking server ready
2021-03-29T12:44:28.886528Z     info    sds     SDS server for workload certificates started, listening on "./etc/istio/proxy/SDS"
2021-03-29T12:44:28.886554Z     info    xdsproxy        Initializing with upstream address "istiod.istio-system.svc:15012" and cluster "Kubernetes"
2021-03-29T12:44:28.886943Z     info    Starting proxy agent
2021-03-29T12:44:28.887121Z     info    sds     Start SDS grpc server
2021-03-29T12:44:28.887211Z     info    Opening status port 15020
2021-03-29T12:44:28.887359Z     info    Received new config, creating new Envoy epoch 0
2021-03-29T12:44:28.887407Z     info    Epoch 0 starting
2021-03-29T12:44:28.892991Z     info    Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster test-app-dns.test-ns --service-node sidecar~10.22.46.129~test-deploy-dns-5448c4996d-8m5rk.test-ns~test-ns.svc.cluster.local --local-address-ip-version v4 --bootstrap-version 3 --log-format %Y-%m-%dT%T.%fZ    %l      envoy %n        %v -l warning --component-log-level misc:error --concurrency 2]

Done.