Can't handle "Additional" section #90

Open
ggPeti opened this issue Jan 2, 2024 · 10 comments

Comments

@ggPeti

ggPeti commented Jan 2, 2024

I have a router with built-in DNS server capabilities. It's behaving in the following way:

  1. When it's queried for a name that has multiple IP addresses, it returns all of them in the ANSWER section.
     [screenshot]
  2. When it's queried again within the next few seconds, it returns just a single answer, in the ADDITIONAL section.
     [screenshot]
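The difference between the two replies should be visible in the fixed 12-byte DNS header (RFC 1035, section 4.1.1): presumably the first reply has ANCOUNT equal to the number of addresses, while the second has ANCOUNT = 0 and a nonzero ARCOUNT. A minimal Rust sketch that reads those counts from a raw message; the function name and the sample header bytes are hypothetical, not taken from the screenshots:

```rust
/// Section counts from the fixed 12-byte DNS message header (RFC 1035, 4.1.1).
#[derive(Debug, PartialEq)]
struct SectionCounts {
    qdcount: u16, // questions
    ancount: u16, // ANSWER records
    nscount: u16, // AUTHORITY records
    arcount: u16, // ADDITIONAL records
}

/// Reads the four big-endian section counts from a raw DNS message.
/// Returns None if the buffer is shorter than the 12-byte header.
fn section_counts(msg: &[u8]) -> Option<SectionCounts> {
    if msg.len() < 12 {
        return None;
    }
    // Header layout: ID (0-1), flags (2-3), then the four counts.
    let be16 = |i: usize| u16::from_be_bytes([msg[i], msg[i + 1]]);
    Some(SectionCounts {
        qdcount: be16(4),
        ancount: be16(6),
        nscount: be16(8),
        arcount: be16(10),
    })
}

fn main() {
    // Hypothetical header shaped like the router's second reply:
    // ID 0x1234, flags 0x8180 (response, RD and RA set, no error),
    // 1 question, 0 answers, 0 authority records, 1 additional record.
    let header: [u8; 12] = [0x12, 0x34, 0x81, 0x80, 0, 1, 0, 0, 0, 0, 0, 1];
    println!("{:?}", section_counts(&header));
}
```

A nonzero ARCOUNT alone doesn't prove the address landed in ADDITIONAL: EDNS(0) responses usually carry an OPT pseudo-record there too, so a real check has to parse the records themselves.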

Now, I'm not sure why, but when a response contains multiple answers, nsncd seems to repeat the query. If the DNS server again responds with multiple answers, everything is fine, and nsncd answers on its front-end socket with a single IP address. But with the DNS server described above, nsncd returns a strange response of its own after receiving the strange upstream response:

[two screenshots]

For comparison, here's a good response for google.com (google.com always seems to resolve to just one IP address, both through my strange DNS server and through 8.8.8.8):
[screenshot]


I don't know much about Rust, nor about nsncd in particular. I'm a NixOS user with access to this peculiar DNS server for only a few days, so if anyone debugging this wants additional information, please let me know.

@ggPeti
Author

ggPeti commented Jan 2, 2024

@flokli, you might want to know about this, as the maintainer of nsncd as NixOS's new default name service daemon.

@flokli
Contributor

flokli commented Jan 2, 2024

Just to confirm, did you check how glibc-nscd behaves?

@ggPeti
Author

ggPeti commented Jan 2, 2024

@flokli, good suggestion. I just checked, and glibc-nscd returns almost exactly the same response upon receiving the "Additional" response, differing from nsncd's output by only a single bit:

[screenshot]

glibc-nscd's response actually ends up being interpreted as "Unknown host" downstream (I'm checking with arp, which presumably calls getaddrinfo). With that bit set, as in nsncd's response, I get "Unknown server error" instead.
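Those two messages are exactly glibc's hstrerror() strings for the h_errno values HOST_NOT_FOUND (1) and NO_RECOVERY (3), which differ in a single bit. That suggests arp reports errors through the gethostbyname-style h_errno interface rather than getaddrinfo (whose errors come from gai_strerror), and would fit the differing bit being an h_errno value, though that is an inference from the strings, not something the screenshots confirm. A small Rust table mirroring glibc's messages; the function name is hypothetical:

```rust
// h_errno values from <netdb.h> (the same numbers in glibc and musl).
const NETDB_SUCCESS: i32 = 0;
const HOST_NOT_FOUND: i32 = 1;
const TRY_AGAIN: i32 = 2;
const NO_RECOVERY: i32 = 3;
const NO_DATA: i32 = 4;

/// Mirrors the messages glibc's hstrerror()/herror() use for each h_errno value.
fn glibc_hstrerror(code: i32) -> &'static str {
    match code {
        NETDB_SUCCESS => "Resolver Error 0 (no error)",
        HOST_NOT_FOUND => "Unknown host",
        TRY_AGAIN => "Host name lookup failure",
        NO_RECOVERY => "Unknown server error",
        NO_DATA => "No address associated with name",
        _ => "Unknown resolver error",
    }
}

fn main() {
    for &code in &[HOST_NOT_FOUND, NO_RECOVERY] {
        println!("h_errno = {} ({:02b}): {}", code, code, glibc_hstrerror(code));
    }
    // The two codes differ in exactly one bit (bit 1).
    println!("differing bits: {}", (HOST_NOT_FOUND ^ NO_RECOVERY).count_ones());
}
```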

The other difference is that glibc-nscd doesn't repeat the DNS query when the response contains multiple answers; it just forwards those addresses to its output. So it doesn't trigger my DNS server's unusual behavior as readily.

I still wonder whether that "Additional" response is spec-compliant.

@ggPeti
Author

ggPeti commented Jan 2, 2024

I also wonder whether it would make sense for nsncd to look into the ADDITIONAL section; to me it looks like it wouldn't hurt. In my case, the verbatim answer to my query is there, and only there.

@leifwalsh
Collaborator

I don't know DNS well enough to have an opinion here, but if someone wants to put up a PR that adds tests and passes them, I can look at the Rust code and maybe try to put more cursed (DNS) knowledge in my brain.

@ggPeti
Author

ggPeti commented Apr 10, 2024

The same issue is popping up for other people too:

https://discourse.nixos.org/t/occasional-dns-problems/35824

https://discourse.nixos.org/t/something-is-footgunning-around-dns-lookups/41368/4

Both look like exactly the same issue: the answer is in the ADDITIONAL section, and nsncd doesn't parse it.

I have no clue what CLASS1232 OPT is supposed to mean, though.

@flokli
Contributor

flokli commented Apr 11, 2024

Some breadcrumbs on the internet suggest it might be EDNS0. However, I don't yet fully understand what happens over the wire.
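For what it's worth, the breadcrumbs point the right way: OPT is the EDNS(0) pseudo-record (TYPE 41, RFC 6891, section 6.1). It lives in the ADDITIONAL section and reuses the CLASS field for the sender's maximum UDP payload size, so a tool that prints it like an ordinary RR shows CLASS1232 for a 1232-byte buffer (the size recommended by DNS Flag Day 2020). The TTL field is likewise reused for the extended RCODE, the EDNS version and the DO flag. A hypothetical Rust decoder for the record's fixed part, assuming the bytes start at the record's root owner name:

```rust
/// Fixed fields of an EDNS(0) OPT pseudo-RR (RFC 6891, section 6.1).
#[derive(Debug, PartialEq)]
struct OptRecord {
    udp_payload_size: u16, // carried in the CLASS field
    extended_rcode: u8,    // upper 8 bits of the 12-bit RCODE (TTL bits 31-24)
    version: u8,           // EDNS version, 0 for EDNS(0) (TTL bits 23-16)
    dnssec_ok: bool,       // DO flag (TTL bit 15)
    rdlen: u16,            // length of the options that follow
}

const TYPE_OPT: u16 = 41;

/// Parses an OPT record whose first byte is the root owner name (0x00).
/// Returns None if the buffer is too short or the record is not OPT.
fn parse_opt(rr: &[u8]) -> Option<OptRecord> {
    // 1 (root name) + 2 (TYPE) + 2 (CLASS) + 4 (TTL) + 2 (RDLEN) = 11 bytes.
    if rr.len() < 11 || rr[0] != 0 {
        return None;
    }
    let be16 = |i: usize| u16::from_be_bytes([rr[i], rr[i + 1]]);
    if be16(1) != TYPE_OPT {
        return None;
    }
    let ttl = u32::from_be_bytes([rr[5], rr[6], rr[7], rr[8]]);
    Some(OptRecord {
        udp_payload_size: be16(3),
        extended_rcode: (ttl >> 24) as u8,
        version: (ttl >> 16) as u8,
        dnssec_ok: (ttl & 0x8000) != 0,
        rdlen: be16(9),
    })
}

fn main() {
    // Hypothetical OPT record: TYPE 41 (0x0029), CLASS 1232 (0x04D0),
    // TTL 0 (no extended RCODE, version 0, DO clear), no options.
    let rr = [0x00, 0x00, 0x29, 0x04, 0xD0, 0, 0, 0, 0, 0x00, 0x00];
    println!("{:?}", parse_opt(&rr));
}
```

Because every EDNS(0) response carries this record in ADDITIONAL, its presence there is normal and says nothing by itself about where the address records ended up.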

@ggPeti
Author

ggPeti commented Apr 11, 2024

I have the distinct suspicion that this DNS server behavior might be somewhat contrary to the DNS spec (RFC 1035), because it says:

> The additional records section contains RRs which relate to the query, but are not strictly answers for the question.

In my case, and in the other two cases above, the ADDITIONAL section contained RRs that were strictly answers to the question.

I don't think it could hurt to merge the additional section into the answer section before parsing it for the resolved address.
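If a resolver parsed the DNS response itself, the idea could look roughly like the following Rust sketch. It uses a slightly more conservative variant of merging: ADDITIONAL is consulted only when ANSWER has nothing for the queried name, and only A records whose owner name matches the query are taken, since RFC 1035 lets ADDITIONAL carry related records that are not answers. The record type and function are hypothetical, not nsncd or glibc APIs; as comes up further down this thread, nsncd delegates DNS to glibc, so this is not a drop-in change for nsncd.

```rust
use std::net::Ipv4Addr;

/// Which section of a DNS response a record came from (only the two used here).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Section {
    Answer,
    Additional,
}

/// A hypothetical, already-parsed A record; not a type from nsncd or glibc.
#[derive(Debug)]
struct ARecord {
    section: Section,
    name: String,
    addr: Ipv4Addr,
}

/// Returns the IPv4 addresses for `qname`. ANSWER records win; A records in
/// ADDITIONAL are used only when ANSWER has nothing for the queried name.
fn addresses_for(qname: &str, records: &[ARecord]) -> Vec<Ipv4Addr> {
    // DNS names compare case-insensitively (RFC 4343); trailing-dot
    // normalization is skipped here for brevity.
    let matching = |section: Section| -> Vec<Ipv4Addr> {
        records
            .iter()
            .filter(|r| r.section == section && r.name.eq_ignore_ascii_case(qname))
            .map(|r| r.addr)
            .collect()
    };
    let answers = matching(Section::Answer);
    if answers.is_empty() {
        matching(Section::Additional)
    } else {
        answers
    }
}

fn main() {
    // Hypothetical parsed form of the router's second reply: nothing in
    // ANSWER, one A record for the queried name in ADDITIONAL.
    let records = vec![ARecord {
        section: Section::Additional,
        name: "host.example.".to_string(),
        addr: Ipv4Addr::new(192, 0, 2, 10),
    }];
    println!("{:?}", addresses_for("HOST.example.", &records));
}
```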

@blinsay
Collaborator

blinsay commented Apr 19, 2024

> I have the distinct suspicion that this DNS server behavior might be somewhat contrary to DNS spec

Your guess sounds right to me.

> I don't think it could hurt to merge the additional section into the answer section before parsing it for the resolved address.

As written, we don't have that option: we're delegating all of DNS to the glibc that nsncd is built with. That's what is ignoring the ADDITIONAL section and (I would guess) what's making multiple queries.

@flokli, I think we have to figure out whether the right NSS modules are getting loaded, whether we're building nsncd in an odd way, or something fun like that. An alternative is doing DNS resolution directly in nsncd with something like trust-dns, but at that point I'd advocate for ripping out DNS support altogether in favor of running a local stub resolver that handles all of this, instead of using a more complex NSS config.

@flokli
Contributor

flokli commented Apr 23, 2024

> @flokli I think we have to figure out if the right nss modules are getting loaded or we're building nsncd in an odd way or something fun like that.

We're just running glibc codepaths, so it's using the same loading order, no?

> An alternative is doing DNS resolution directly in nsncd with something like trust-dns but I think at that point I'd advocate for ripping out DNS support all together in favor of running a local stub resolver that handles all of this instead of using a more complex NSS config.

DNS is only one small part of host resolution. Your LDAP/WINS/avahi/… NSS modules can also provide host lookups. Being able to have such NSS modules in the chain without loading them directly inside the application is precisely why we use nsncd in NixOS, as described in more detail in https://flokli.de/posts/2022-11-18-nsncd/.

I'd need to reload some context, but my preferred way forward would be to look again at how we serialize responses back to the client, and make sure we behave the same way here as described in #90 (comment).
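One way to look at the serialization side: when a program such as arp goes through gethostbyname, glibc's nscd client reads a fixed hst_response_header before any hostent data, and (as far as I understand glibc's nscd/nscd-client.h) its last field, error, is handed back to the caller as h_errno. If the single differing bit reported above sits in that field, it would match one daemon sending NO_RECOVERY (3, "Unknown server error") where the other sends HOST_NOT_FOUND (1, "Unknown host"). That is a hypothesis, not something established in this thread. A Rust sketch of the header; the helper names are hypothetical, and NSCD_VERSION = 2 and the zeroed fields are assumptions to check against glibc's sources:

```rust
// Assumed values: NSCD_VERSION as defined in glibc's nscd/nscd-client.h,
// and h_errno codes from <netdb.h>.
const NSCD_VERSION: i32 = 2;
const HOST_NOT_FOUND: i32 = 1;
const NO_RECOVERY: i32 = 3;

/// Mirrors the field order of glibc's hst_response_header: eight 32-bit
/// integers sent in host byte order before any hostent data.
#[derive(Debug, Default)]
struct HstResponseHeader {
    version: i32,
    found: i32,
    h_name_len: i32,
    h_aliases_cnt: i32,
    h_addrtype: i32,
    h_length: i32,
    h_addr_list_cnt: i32,
    error: i32, // handed back to the client as h_errno
}

impl HstResponseHeader {
    /// Hypothetical "not found" reply; fields other than version, found and
    /// error are simply zeroed here, not copied from glibc-nscd's values.
    fn not_found(error: i32) -> Self {
        HstResponseHeader {
            version: NSCD_VERSION,
            found: 0,
            error,
            ..Default::default()
        }
    }

    /// Serializes the header in native byte order, as written to the Unix socket.
    fn to_bytes(&self) -> Vec<u8> {
        [
            self.version,
            self.found,
            self.h_name_len,
            self.h_aliases_cnt,
            self.h_addrtype,
            self.h_length,
            self.h_addr_list_cnt,
            self.error,
        ]
        .iter()
        .flat_map(|v| v.to_ne_bytes())
        .collect()
    }
}

fn main() {
    let host_not_found = HstResponseHeader::not_found(HOST_NOT_FOUND).to_bytes();
    let no_recovery = HstResponseHeader::not_found(NO_RECOVERY).to_bytes();
    // Count the bits that differ between the two 32-byte headers.
    let diff: u32 = host_not_found
        .iter()
        .zip(&no_recovery)
        .map(|(a, b)| (a ^ b).count_ones())
        .sum();
    println!("{} bytes each, {} differing bit(s)", host_not_found.len(), diff);
}
```

Comparing such byte dumps against captures of what glibc-nscd and nsncd actually send would show whether the error field is really where the two diverge.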

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants