Arch Linux: a DNS Mystery – VPN, systemd-resolved, and Unbound

By | 05/21/2026

I’d been wrestling with the problem of accessing AWS EKS from the office for a long time – finally lost my patience and figured it out 🙂

Here’s the problem: there’s an AWS EKS cluster with both Public and Private endpoints for the API.

Working from my office laptop, sometimes requests to it go through fine – and sometimes they die with an “i/o timeout” error:

$ kk get pod
[...] Get \"https://F07***D78.gr7.us-east-1.eks.amazonaws.com/api?timeout=32s\": dial tcp 10.0.64.9:443: i/o timeout"
...

Let’s go digging – because there are nuances here both with DNS and with network routes.

AWS VPC DNS and my VPNs

In my case EKS has both Public and Private endpoints enabled – so DNS resolution uses split-horizon DNS:

  • for a request from the “public internet” AWS VPC DNS returns a Public IP
  • for a request from inside the VPC – it’ll be a Private IP

Next: I have two active VPN connections + the office WiFi, and the problem starts when I add AWS VPC DNS to resolv.conf, because:

  • there’s the project’s Pritunl/OpenVPN and there are AWS domains that need to resolve through 10.0.0.2 – AWS VPC DNS
  • there’s my own WireGuard and domains that need to resolve from my home MikroTik through 10.100.0.1 (see MikroTik: setting up WireGuard and connecting Linux peers)
  • and there are just public DNS zones that need to resolve through 1.1.1.1

In resolv.conf it looks like this:

nameserver 1.1.1.1      # CloudFlare DNS, returns EKS Endpoint Public IP
nameserver 10.100.0.1   # my MikroTik with WireGuard, returns EKS Endpoint Public IP
nameserver 10.0.0.2     # AWS VPC DNS via OpenVPN, returns EKS Endpoint Private IP 10.0.64.9

The file is managed by openresolv, which WireGuard launches when the tunnel starts – WireGuard sets its own DNS:

$ sudo cat /etc/wireguard/wg0.conf
...
DNS = 10.100.0.1, 10.0.0.2, 192.168.0.1
...

In the timeout error we can see that the request to F07***D78.gr7.us-east-1.eks.amazonaws.com goes to IP 10.0.64.9 – meaning DNS resolution went through OpenVPN and AWS VPC DNS 10.0.0.2.

Linux DNS and systemd-resolved

Let’s check who’s actually responsible for DNS in the system – grep the /etc/nsswitch.conf file:

$ grep hosts /etc/nsswitch.conf
hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns

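Here the resolve entry means the nss-resolve module, which hands lookups to systemd-resolved over its Varlink socket (older systemd releases used D-Bus for this).

It is listed before files (nss-files and /etc/hosts) and dns (the nss-dns module and the “classic” glibc DNS resolver), and [!UNAVAIL=return] stops the lookup right there whenever systemd-resolved is running – so requests go to systemd-resolved first.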

See Domain name resolution on the Arch Wiki.
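To see the lookup order at a glance, here’s a small sketch that strips the action items like [!UNAVAIL=return] from the hosts line and prints the sources one per line. The function name nss_sources is my own, it’s not part of glibc or systemd:

```shell
# Print the NSS sources from a "hosts:" line in lookup order,
# dropping action items such as [!UNAVAIL=return].
# nss_sources is a hypothetical helper name used for illustration.
nss_sources() {
    printf '%s\n' "$1" \
        | sed -e 's/^hosts:[[:space:]]*//' -e 's/\[[^]]*]//g' \
        | tr -s ' \t' '\n\n' \
        | sed '/^$/d'
}

# The hosts line from the grep output above
nss_sources "hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns"
```

On a live system you’d pass it "$(grep '^hosts:' /etc/nsswitch.conf)" instead of the literal string.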

systemd-resolved, DNS resolution and network interfaces

Now the interesting part – exactly how systemd-resolved performs DNS resolution.

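systemd-resolved isn’t the one writing resolv.conf here: the file is managed by openresolv, and systemd-resolved only reads its global DNS servers from it (that’s the resolv.conf mode: foreign line) – let’s look at its parameters: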

$ resolvectl status
Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: foreign
  Current DNS Server: 10.0.0.2
         DNS Servers: 10.0.0.2 10.100.0.1 192.168.0.1
...

Next, let’s check what’s happening in the system – enable the debug log for systemd-resolved:

$ sudo resolvectl log-level debug

Then in one window we open the logs:

$ sudo journalctl -u systemd-resolved -f | grep "F07***D78.gr7.us-east-1.eks.amazonaws.com"

Run kubectl get pod – and in the logs we see:

...
May 20 12:06:39 setevoy-office systemd-resolved[698]: varlink-28-28: Received message: {"method":"io.systemd.Resolve.ResolveHostname","parameters":{"name":"F07***D78.gr7.us-east-1.eks.amazonaws.com","flags":0,"ifindex":0}}
...
May 20 12:06:39 setevoy-office systemd-resolved[698]: varlink-28-28: Sending message: {"parameters":{"addresses":[{"ifindex":6,"family":2,"address":[10,0,64,9]},{"ifindex":6,"family":2,"address":[10,0,65,205]}],"name":"F07***D78.gr7.us-east-1.eks.amazonaws.com","flags":1048577}}

Here:

  • Received message: the request came in with the parameter "ifindex":0 – “don’t care where to look”
  • Sending message: the response came back through "ifindex":6 – that’s tun0, OpenVPN and AWS VPC DNS
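Reading addresses as JSON arrays gets old fast, so here’s a minimal sketch that pulls the "ifindex" and IPv4 address pairs out of a Sending message line and prints them in dotted form. The helper name varlink_addrs is mine, and it only handles "family":2 (IPv4) entries with all four octets present, so redacted lines won’t match:

```shell
# Extract "ifindex address" pairs from a systemd-resolved varlink reply line.
# varlink_addrs is a hypothetical helper; it reads log lines on stdin.
varlink_addrs() {
    grep -o '"ifindex":[0-9]*,"family":2,"address":\[[0-9,]*]' \
        | sed 's/"ifindex":\([0-9]*\),"family":2,"address":\[\([0-9,]*\)]/\1 \2/' \
        | tr ',' '.'
}

# The reply from the log above, trimmed to the addresses part
sample='{"parameters":{"addresses":[{"ifindex":6,"family":2,"address":[10,0,64,9]},{"ifindex":6,"family":2,"address":[10,0,65,205]}],"flags":1048577}}'
printf '%s\n' "$sample" | varlink_addrs
```

With the debug log enabled, the same function can sit at the end of the journalctl pipe instead of the printf.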

Let’s check the interfaces:

$ ip -o link | awk -F': ' '{print $1, $2}'
1 lo
2 enp0s31f6
4 wlan0
5 enp0s13f0u3u4u4
6 tun0
...

"ifindex":6 is the tun0 interface, the work OpenVPN, and the result returned from AWS VPC DNS – "address":[10,0,64,9], because AWS VPC DNS returns a private address.

We repeat the request – and now the result is different:

...
varlink-28-28: Sending message: {"parameters":{"addresses":[{"ifindex":4,"family":2,"address":[44,216,7,46]},{"ifindex":4,"family":2,"address":[3,***,***,161]}],"name":"F07***D78.gr7.us-east-1.eks.amazonaws.com","flags":8388609}}
...

This time the response came through "ifindex":4, which is wlan0, and we get a public IP.

Why – because in the same log we see:

...
Firing regular transaction 49587 ... IN A> scope dns on */* 
Firing regular transaction 59798 ... IN A> scope dns on wlan0/*
...

Here the first entry is a request through the global pool, to all the servers in it:

$ resolvectl status
Global
           Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
    resolv.conf mode: foreign
  Current DNS Server: 10.0.0.2
         DNS Servers: 10.0.0.2 10.100.0.1 192.168.0.1
...

And the result comes back from whoever answers first, see systemd-resolved.service:

If lookups are routed to multiple interfaces, the first successful response is returned

And in this case it was wlan0:

...
Added positive ... cache entry ... on wlan0/INET/10.0.0.1
...

Since the request went out through wlan0 to the office resolver 10.0.0.1 – the EKS endpoint was resolved through public DNS, and the answer was a public IP.

While on the first attempt it was tun0:

...
Added positive ... cache entry ... on tun0/INET/10.0.0.2
...

And in response we got the private IP 10.0.64.9.

So:

  • systemd-resolved sends the query to the global DNS servers and to the per-interface ones at the same time
  • it returns the result from whoever responds first
  • if the first answer comes through wlan0, the office network – we get a public IP, and the connection goes through
  • if the first answer comes through tun0, OpenVPN and AWS VPC DNS – we get a private IP, and the connection fails with a timeout
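To tell quickly which of the two cases we hit, it’s enough to check whether the returned address is in a private range. A minimal sketch, assuming we only care about IPv4 and the RFC 1918 ranges; the function name is_private_ipv4 is made up for this example:

```shell
# Succeeds if the IPv4 address is in an RFC 1918 private range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
# is_private_ipv4 is a hypothetical helper name.
is_private_ipv4() {
    case "$1" in
        10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}

# The two answers seen in the logs above
for ip in 10.0.64.9 44.216.7.46; do
    if is_private_ipv4 "$ip"; then
        echo "$ip private"
    else
        echo "$ip public"
    fi
done
```

Feeding it the addresses a resolver returns for the EKS endpoint shows right away whether the next kubectl call is going to time out.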

Don’t forget to set the log level back to info:

$ sudo resolvectl log-level info

Now let’s move on to routing – why exactly does the connection fail with a timeout error?

VPN and the Linux IP routing mess

Let’s look at the routes on the work laptop:

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    600    0        0 wlan0
10.0.0.0        0.0.0.0         255.255.255.0   U     600    0        0 wlan0
10.0.0.2        172.16.0.1      255.255.255.255 UGH   0      0        0 tun0
10.0.6.162      172.16.0.1      255.255.255.255 UGH   0      0        0 tun0
10.0.32.0       172.16.0.1      255.255.240.0   UG    0      0        0 tun0
10.0.48.0       172.16.0.1      255.255.240.0   UG    0      0        0 tun0
10.0.66.0       172.16.0.1      255.255.255.0   UG    0      0        0 tun0
10.0.67.0       172.16.0.1      255.255.255.0   UG    0      0        0 tun0
10.100.0.0      0.0.0.0         255.255.255.0   U     0      0        0 wg0
...

Here:

  • 10.0.0.0 through wlan0: because this is the office network and we have MacMinis here that we need access to, plus internet access
  • 10.0.0.2, 10.0.32.0, 10.0.48.0 etc – through tun0: these are AWS VPC Private Subnets – requests are routed here through the work OpenVPN for access to AWS RDS and other private resources
  • 10.100.0.0 through wg0: this is my WireGuard network through MikroTik for access to my home networks

And now – here’s where the problem shows up: when the EKS endpoint resolves through AWS VPC DNS 10.0.0.2 – we get the private address 10.0.64.9.

But none of the tun0 routes covers that address – so it falls through to the default route via 10.0.0.1, the office router and the public internet:

$ kk get pod
[...] Get \"https://F07***D78.gr7.us-east-1.eks.amazonaws.com/api?timeout=32s\": dial tcp 10.0.64.9:443: i/o timeout"
...

We check the route itself – and we see it goes via 10.0.0.1, the office router, instead of tun0 and OpenVPN:

$ ip route get 10.0.64.9
10.0.64.9 via 10.0.0.1 dev wlan0 src 10.0.0.133 uid 1000 
    cache 

And of course traceroute:

$ traceroute 10.0.64.9
traceroute to 10.0.64.9 (10.0.64.9), 30 hops max, 60 byte packets
 1  office.example.dev (10.0.0.1)  14.261 ms  14.614 ms  14.592 ms
 2  * * *
 3  * * *
...

And obviously, a request to an address in a private subnet sent through the public internet just dies.
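The kernel picks the most specific route that contains the destination, and the same check can be sketched in plain shell arithmetic. This is only an illustration of longest-prefix matching against the routes from the table above, written as CIDR (255.255.240.0 is /20, 255.255.255.0 is /24); ip2int, in_cidr and route_for are my own helper names, and the real answer always comes from ip route get:

```shell
# Convert a dotted IPv4 address to a 32-bit integer.
ip2int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds if address $1 is inside the CIDR block $2 (e.g. 10.0.32.0/20).
in_cidr() {
    addr=$(ip2int "$1")
    net=$(ip2int "${2%/*}")
    bits=${2#*/}
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Print the longest matching prefix for $1 among the remaining arguments,
# or "default" when nothing matches.
route_for() {
    target=$1
    shift
    best="default"
    best_bits=-1
    for route in "$@"; do
        if in_cidr "$target" "$route" && [ "${route#*/}" -gt "$best_bits" ]; then
            best=$route
            best_bits=${route#*/}
        fi
    done
    echo "$best"
}

routes="10.0.0.0/24 10.0.0.2/32 10.0.6.162/32 10.0.32.0/20 10.0.48.0/20 10.0.66.0/24 10.0.67.0/24 10.100.0.0/24"
route_for 10.0.64.9 $routes     # EKS private endpoint
route_for 10.0.66.14 $routes    # an address from the RDS subnet
```

For 10.0.64.9 nothing in the list matches, so the only thing left is the default route through wlan0, while 10.0.66.14 lands in 10.0.66.0/24 on tun0.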

Possible solutions

There are a few options:

  • just add 10.0.64.9 to OpenVPN – then it’ll create a route through tun0 on top of the existing ones
  • you can set up split-DNS through systemd-resolved
  • you can set up split-DNS on a local Unbound or dnsmasq – and switch all DNS queries over to it

The option with 10.0.64.9 on OpenVPN is a hack.

Note: only after I’d written the whole post did I remember that the EKS Control Plane lives in its own VPC Subnets, and I could’ve just added those to OpenVPN the same way it’s done for RDS, but whatever – it turned out interesting anyway 🙂

The split-DNS solution through systemd-resolved looks kind of painful.
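For reference, this is roughly what it would involve: per-link DNS servers plus routing-only domains (the ones prefixed with "~") set with resolvectl dns and resolvectl domain. The sketch below only prints the commands instead of running them; split_dns_cmds is a made-up helper name, and the link, server and zones are the ones from this post. Two caveats I’m assuming here: settings made this way are runtime-only and go away when the link is reconfigured, and in my setup 10.0.0.2 would also have to be removed from the global server list, otherwise the EKS endpoint would still get answered by AWS VPC DNS from time to time:

```shell
# Print (not run) the resolvectl commands that pin some zones to one link.
# Usage: split_dns_cmds LINK SERVER ZONE...
# split_dns_cmds is a hypothetical helper name used for illustration.
split_dns_cmds() {
    link=$1
    server=$2
    shift 2
    echo "resolvectl dns $link $server"
    domains=""
    for zone in "$@"; do
        domains="$domains '~$zone'"   # "~" marks a routing-only domain
    done
    echo "resolvectl domain $link$domains"
}

split_dns_cmds tun0 10.0.0.2 compute.internal ops.example.com
```

The printed lines would be run with sudo after OpenVPN brings tun0 up, which is already one more moving part than I wanted.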

And I’d already run Unbound on FreeBSD for my home NAS (see FreeBSD: Home NAS, part 4 – a local DNS with Unbound), the config is simple and clear, and on top of that it kicks systemd-resolved with all its complexities out of the picture – a solid option.

dnsmasq might’ve been a better fit for a laptop, since its config is even simpler, but I really liked Unbound – so I went with it.

Arch Linux and Unbound

Install the package itself:

$ sudo pacman -S unbound

What we need to do:

  • route all queries for compute.internal (AWS EC2 etc) through OpenVPN and AWS VPC DNS
  • same for all queries to ops.example.com, because that’s where we have records for AWS RDS like db.prod.ops.example.com
  • route all queries for the setevoy zone (hosts like grafana.net.setevoy) through MikroTik, because that’s my local zone for home hosts
  • everything else – send to 1.1.1.1 and 8.8.8.8

We write the /etc/unbound/unbound.conf file, describing three forward-zone blocks with our own DNS and one with public DNS:

server:
    interface: 127.0.0.1
    access-control: 127.0.0.0/8 allow
    do-ip6: no
    hide-identity: yes
    hide-version: yes
    prefetch: yes

# local homelab via MikroTik
forward-zone:
    name: "setevoy."
    forward-addr: 10.100.0.1
    forward-addr: 192.168.0.1

# AWS private zones via OpenVPN and AWS VPC DNS
forward-zone:
    name: "compute.internal."
    forward-addr: 10.0.0.2

forward-zone:
    name: "ops.example.com."
    forward-addr: 10.0.0.2

# everything else
forward-zone:
    name: "."
    forward-addr: 1.1.1.1
    forward-addr: 8.8.8.8
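The idea behind this config is that the most specific forward-zone containing the query name wins, and "." catches the rest. Here’s a small sketch of that selection in shell, just to reason about where a given name will go; pick_zone is my own helper, not an Unbound tool, and it only mimics the suffix matching:

```shell
# Print the most specific zone (from the remaining arguments) that contains
# the query name $1; falls back to the root zone ".".
# pick_zone is a hypothetical helper name used for illustration.
pick_zone() {
    qname=${1%.}.                  # normalise to a trailing dot
    shift
    best="."
    for zone in "$@"; do
        case "$qname" in
            "$zone"|*".$zone")
                if [ "${#zone}" -gt "${#best}" ]; then best=$zone; fi ;;
        esac
    done
    echo "$best"
}

zones="setevoy. compute.internal. ops.example.com. ."
pick_zone grafana.net.setevoy $zones
pick_zone db.prod.ops.example.com $zones
pick_zone 'F07***D78.gr7.us-east-1.eks.amazonaws.com' $zones
```

The EKS endpoint doesn’t fall under any of the private zones, so it goes to "." and the public resolvers, which is exactly what we want here.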

Check the syntax:

$ sudo unbound-checkconf
unbound-checkconf: no errors in /etc/unbound/unbound.conf

Disabling systemd-resolved

In the post Arch Linux: WireGuard Peer for connecting to MikroTik I described a solution to a different problem, and there I added dns=systemd-resolved for NetworkManager.

If it’s there – remove it in /etc/NetworkManager/NetworkManager.conf, just set dns=none:

...

[main]
dns=none

Disable systemd-resolved (name resolution will stop working at this point – because nothing is answering DNS queries yet):

$ sudo systemctl disable --now systemd-resolved systemd-resolved-monitor.socket systemd-resolved-varlink.socket

Restart NetworkManager:

$ sudo systemctl restart NetworkManager

Check port 53 – if systemd-resolved (ss shows it as systemd-resolve) is still listening, that means something is triggering its startup:

$ sudo ss -tulpn | grep ':53'
...
tcp   LISTEN 0      4096   127.0.0.53%lo:53         0.0.0.0:*    users:(("systemd-resolve",pid=723720,fd=25))
tcp   LISTEN 0      4096      127.0.0.54:53         0.0.0.0:*    users:(("systemd-resolve",pid=723720,fd=27))

You can hard-block its startup with systemctl mask:

$ sudo systemctl mask systemd-resolved
$ sudo systemctl stop systemd-resolved

Check the ports once more, and if there’s no longer anyone on port 53 – start unbound.service:

$ sudo systemctl stop systemd-resolved
$ sudo systemctl enable --now unbound
Created symlink '/etc/systemd/system/multi-user.target.wants/unbound.service' → '/usr/lib/systemd/system/unbound.service'.
$ sudo ss -tulpn | grep ':53\b'
udp   UNCONN 0      0          127.0.0.1:53         0.0.0.0:*    users:(("unbound",pid=727532,fd=3))
tcp   LISTEN 0      256        127.0.0.1:53         0.0.0.0:*    users:(("unbound",pid=727532,fd=4))

Edit /etc/resolv.conf – point all DNS at Unbound:

nameserver 127.0.0.1

And try something public:

$ dig google.com +short
216.58.207.14

Then the EKS endpoint – it should return public IPs:

$ dig F07***D78.gr7.us-east-1.eks.amazonaws.com +short
3.***.***.161
44.***.***.46

Try RDS – it should return private IPs from the VPC pool:

$ dig prod.db.kraken.ops.example.com +short
kraken-ops-rds-prod.***.us-east-1.rds.amazonaws.com.
10.0.66.14

Edit the WireGuard /etc/wireguard/wg0.conf – change the DNS parameter:

[Interface]
...
DNS = 127.0.0.1

...

Run sudo resolvconf -u so that openresolv regenerates /etc/resolv.conf: we edited the file manually, and otherwise WireGuard will complain.

Restart WireGuard:

$ sudo wg-quick down wg0 && sudo wg-quick up wg0

Check the file:

$ cat /etc/resolv.conf
# Generated by resolvconf
nameserver 127.0.0.1

And now everything works as it should.
