DNS Resolution Fails on Subsequent Requests #12128

Open
Smana opened this issue May 14, 2024 · 4 comments

Smana commented May 14, 2024

What is the issue?

Hello,

I'm using Tailscale to establish a private connection with my AWS infrastructure. The architecture is described here.

For the past few weeks, I've been experiencing intermittent DNS resolution failures. Specifically:

  1. The first DNS resolution attempt succeeds:

    sudo systemctl restart tailscaled.service 
    
    dig @100.100.100.100 vault.priv.cloud.ogenki.io
    
    ; <<>> DiG 9.18.26 <<>> @100.100.100.100 vault.priv.cloud.ogenki.io
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46127
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4095
    ;; QUESTION SECTION:
    ;vault.priv.cloud.ogenki.io.    IN      A
    
    ;; ANSWER SECTION:
    vault.priv.cloud.ogenki.io. 60  IN      A       10.0.45.217
    
    ;; Query time: 23 msec
    ;; SERVER: 100.100.100.100#53(100.100.100.100) (UDP)
    ;; WHEN: Tue May 14 14:51:40 CEST 2024
    ;; MSG SIZE  rcvd: 71
  2. Subsequent DNS resolution attempts fail:

    dig @100.100.100.100 vault.priv.cloud.ogenki.io
    
    ; <<>> DiG 9.18.26 <<>> @100.100.100.100 vault.priv.cloud.ogenki.io
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 40377
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 1232
    ; EDE: 23 (Network Error): (205.251.194.58:53 rcode=REFUSED for vault.priv.cloud.ogenki.io A)
    ;; QUESTION SECTION:
    ;vault.priv.cloud.ogenki.io.    IN      A
    
    ;; AUTHORITY SECTION:
    cloud.ogenki.io.        899     IN      SOA     ns-1290.awsdns-33.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
    
    ;; Query time: 6 msec
    ;; SERVER: 100.100.100.100#53(100.100.100.100) (UDP)
    ;; WHEN: Tue May 14 14:52:35 CEST 2024
    ;; MSG SIZE  rcvd: 210
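To compare the two responses across repeated queries, the status, any EDE line, and the answer records can be pulled out of the `dig` text. The following is a minimal sketch, not part of the original report: `summarize_dig` and `run_dig` are hypothetical helper names, and the parsing assumes the default `dig` output layout shown above.

```python
import re
import subprocess
from typing import List, NamedTuple, Optional


class DigSummary(NamedTuple):
    status: Optional[str]  # e.g. "NOERROR" or "NXDOMAIN"
    ede: Optional[str]     # text after "; EDE:", if the line is present
    answers: List[str]     # raw record lines from the ANSWER SECTION


def summarize_dig(output: str) -> DigSummary:
    """Extract status, EDE text and answer records from default dig output."""
    status = None
    ede = None
    answers = []
    in_answer = False
    for raw in output.splitlines():
        line = raw.strip()
        match = re.search(r"status: (\w+)", line)
        if line.startswith(";;") and match and status is None:
            status = match.group(1)
        elif line.startswith("; EDE:"):
            ede = line[len("; EDE:"):].strip()
        elif line.startswith(";; ANSWER SECTION"):
            in_answer = True
        elif in_answer:
            # The section ends at the first blank or comment line.
            if not line or line.startswith(";"):
                in_answer = False
            else:
                answers.append(line)
    return DigSummary(status, ede, answers)


def run_dig(server: str, name: str) -> DigSummary:
    """Run dig against one server (requires dig on PATH and network access)."""
    result = subprocess.run(
        ["dig", f"@{server}", name], capture_output=True, text=True
    )
    return summarize_dig(result.stdout)
```

Calling `run_dig("100.100.100.100", "vault.priv.cloud.ogenki.io")` in a loop, with a timestamp per call, would show when the status flips from NOERROR to NXDOMAIN and which upstream the EDE line names.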

I have reviewed the logs, but it is not clear if this issue is related to rate limiting or another problem. Here are the relevant log entries:

May 14 14:51:44 ogenki tailscaled[142916]: LinkChange: major, rebinding. New state: interfaces.State{defaultRoute=enp0s20f0u1u1 ifs={br-66f05aa713fe:[172.18.0.1/16] docker0:[172.17.0.1/16 llu6] enp0s20f0u1u1:[192.168.1.94/24 2a02:842a:8136:9f01:2de8:b9e:a87e:f387/64 llu6] tailscale0:[100.121.75.20/32 fd7a:115c:a1e0::7f01:4b14/128 llu6] wlp0s20f3:[192.168.1.14/24 2a02:842a:8136>
May 14 14:51:44 ogenki tailscaled[142916]: dns: Set: {DefaultResolvers:[1.1.1.1 1.0.0.1 2606:4700:4700::1111 2606:4700:4700::1001 10.0.0.2] Routes:{tail9c382.ts.net.:[] ts.net.:[199.247.155.53 2620:111:8007::53]}+65arpa SearchDomains:[tail9c382.ts.net. eu-west-3.compute.internal. priv.cloud.ogenki.io.] Hosts:5}
May 14 14:51:44 ogenki tailscaled[142916]: dns: Resolvercfg: {Routes:{.:[1.1.1.1 1.0.0.1 2606:4700:4700::1111 2606:4700:4700::1001 10.0.0.2] ts.net.:[199.247.155.53 2620:111:8007::53]} Hosts:5 LocalDomains:[tail9c382.ts.net.]+65arpa}
May 14 14:51:44 ogenki tailscaled[142916]: dns: OScfg: {Nameservers:[100.100.100.100] SearchDomains:[tail9c382.ts.net. eu-west-3.compute.internal. priv.cloud.ogenki.io.] }
May 14 14:51:45 ogenki tailscaled[142916]: wgengine: set DNS config again after major link change
May 14 14:51:45 ogenki tailscaled[142916]: onPortUpdate(port=41641, network=udp6)
May 14 14:51:45 ogenki tailscaled[142916]: onPortUpdate(port=41641, network=udp4)
May 14 14:51:45 ogenki tailscaled[142916]: [RATELIMIT] format("onPortUpdate(port=%v, network=%s)")
May 14 14:51:45 ogenki tailscaled[142916]: Rebind; defIf="enp0s20f0u1u1", ips=[192.168.1.94/24 2a02:842a:8136:9f01:2de8:b9e:a87e:f387/64 fe80::ab73:167d:99a6:1c0/64]
May 14 14:51:45 ogenki tailscaled[142916]: magicsock: 1 active derp conns: derp-18=cr6s,wr286ms
May 14 14:51:45 ogenki tailscaled[142916]: post-rebind ping of DERP region 18 okay
May 14 14:51:46 ogenki tailscaled[142916]: open-conn-track: timeout opening (TCP 100.121.75.20:37778 => 95.x.x.x:80); no associated peer node
May 14 14:51:46 ogenki tailscaled[142916]: [RATELIMIT] format("open-conn-track: timeout opening %v; no associated peer node")
May 14 14:51:55 ogenki tailscaled[142916]: [RATELIMIT] format("open-conn-track: timeout opening %v; no associated peer node") (6 dropped)
May 14 14:51:55 ogenki tailscaled[142916]: open-conn-track: timeout opening (TCP 100.121.75.20:37756 => 95.x.x.x:80); no associated peer node
May 14 14:51:55 ogenki tailscaled[142916]: open-conn-track: timeout opening (TCP 100.121.75.20:37754 => 95.x.x.x:80); no associated peer node
May 14 14:51:55 ogenki tailscaled[142916]: [RATELIMIT] format("open-conn-track: timeout opening %v; no associated peer node")
May 14 14:52:03 ogenki tailscaled[142916]: [RATELIMIT] format("open-conn-track: timeout opening %v; no associated peer node") (9 dropped)
May 14 14:52:03 ogenki tailscaled[142916]: open-conn-track: timeout opening (TCP 100.121.75.20:54148 => 95.x.x.x:80); no associated peer node
May 14 14:52:03 ogenki tailscaled[142916]: open-conn-track: timeout opening (TCP 100.121.75.20:37770 => 95.x.x.x:80); no associated peer node
...
  • There aren't any error logs on the subnet router side.
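The `dns: Resolvercfg` line above lists the upstream resolvers configured for each route suffix, which can be compared with the upstream address named in the EDE line of the failing response. The following is a rough sketch for extracting that mapping: `parse_resolver_routes` is a hypothetical helper, and the line format is inferred only from the sample log above, not from a documented or stable interface.

```python
import re
from typing import Dict, List


def parse_resolver_routes(log_line: str) -> Dict[str, List[str]]:
    """Map each route suffix in a 'Routes:{...}' log fragment to its resolvers.

    Assumes the layout seen in the sample log: suffix:[addr addr ...],
    with entries separated by spaces and the whole map wrapped in braces.
    """
    block = re.search(r"Routes:\{(.*?)\}", log_line)
    if block is None:
        return {}
    routes: Dict[str, List[str]] = {}
    for suffix, addrs in re.findall(r"(\S+?):\[([^\]]*)\]", block.group(1)):
        routes[suffix] = addrs.split()
    return routes
```

Applied to the sample line, the `.` route contains the four public resolvers and `10.0.0.2` together, so all five are candidates for names that do not match a more specific route.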

I've seen many DNS-related issues, but none that closely matches mine. Could you please help me troubleshoot?

Regards,
Smana

Steps to reproduce

It can be reproduced using this Terraform configuration.

Are there any recent changes that introduced the issue?

No response

OS

Linux

OS version

Arch Linux on laptop (client), Ubuntu on subnet router

Tailscale version

1.66.3

Other software

No response

Bug report

No response

Smana commented May 16, 2024

While waiting for your support, I have temporarily changed my resolv.conf to point to the AWS resolver, which is not ideal.

nameserver 10.0.0.2
nameserver 100.100.100.100
search tail...
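This workaround depends on the order of the `nameserver` lines: the glibc stub resolver normally tries them in the order listed (assuming `options rotate` is not set and nothing else rewrites the file), so 10.0.0.2 is consulted before 100.100.100.100. The following small sketch checks that order. `nameservers_in_order` is a hypothetical helper, and the sample text only mirrors the snippet above.

```python
from typing import List


def nameservers_in_order(resolv_conf_text: str) -> List[str]:
    """Return nameserver addresses in the order they appear in resolv.conf text."""
    servers: List[str] = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Lines starting with '#' or ';' are comments in resolv.conf.
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


sample = "nameserver 10.0.0.2\nnameserver 100.100.100.100\n"
print(nameservers_in_order(sample))
```

Passing the contents of `/etc/resolv.conf` instead of `sample` would confirm whether the file still has the AWS resolver first after tailscaled restarts.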

Smana commented May 21, 2024

The latest version, 1.66.4, seems to solve my issue.
I'm running a few more tests before closing.

Smana commented May 21, 2024

Confirmed

Smana commented May 23, 2024

Reopening because I am still facing the exact same issue.

Smana reopened this May 23, 2024