Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional logging to track mesh clients interacting with Namerd #2277

Closed
wants to merge 7 commits into from

Conversation

dadjeibaah
Copy link
Contributor

@dadjeibaah dadjeibaah commented May 23, 2019

There have been reports of Linkerd mesh clients not receiving updates from a Namerd server. There currently are watch state endpoints in Linkerd that tell you the last known value received from Namerd. This information is helpful in figuring out when Linkerd gets into this state. However, this information does not reveal the sequence of events that led up to that state.

This PR adds logging in the mesh client that logs every state change the Linkerd process is notified of i.e. Address set changes observed by Namerd. Given that this PR increases the number of log lines that will be available for diagnosis, each mesh client will now contain a unique identifier that will be injected when a watch request is initiated between Linkerd and Namerd for better analysis. This will help identify which client gets into a bad state when crawling through logs and will help in situations were remote IPs (NATed IPs) do not easily identify a Linkerd client in the Namerd logs.

fixes #2245

@dadjeibaah dadjeibaah requested review from olix0r and adleong May 23, 2019 21:56
Dennis Adjei-Baah added 6 commits May 23, 2019 15:02
@dadjeibaah dadjeibaah self-assigned this May 23, 2019
Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the PR name and description to give some context and high level description of the approach here? What do you think about using the remote address (IP and ephemeral client port) in the server logs instead of generating a UUID client id? This would allow us to more easily associate log messages on the server with the client that initiated that request and would avoid the need for plumbing client ids all over the place.

val frames = h2.Stream()
def loop(): Future[Unit] =
msgs.recv().transform {
case Return(Stream.Releasable(s, release)) =>
log.trace(s"Streaming response metadata: ${reqMeta.mkString(":")}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says response metadata but it's printing the request headers, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intent here? Each time there's a response message, the request headers get printed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct since the request headers may contain information useful for the particular request.


class HeaderInjector(injectHeaders: Map[String, String]) extends SimpleFilter[Request, Response] {
override def apply(request: Request, service: Service[Request, Response]): Future[Response] = {
val req = request.dup()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this necessary? aren't headers mutable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember from my last run-in with H2 request mangling that I had to dup() this for some reason, alas, I do not remember. I think we don't need this.

@dadjeibaah dadjeibaah changed the title Dad/mesh trace logging Add additional logging to track mesh clients interacting with Namerd May 28, 2019
@adleong
Copy link
Member

adleong commented Jun 19, 2019

Thanks for putting this together, @dadjeibaah. I think a simpler solution here would be for Namerd to simply include the remote (client) address in its log messages rather than generating a UUID on the client. The UUID approach has some problematic drawbacks:

  • It's not immediately clear from the Namerd logs how to map a UUID to a Linkerd instance. To do this, you must scan all Linkerd instance logs for a matching UUID
  • Generating and propagating the UUID adds extra complexity to the code
  • Adding extra metadata through request headers breaks from idiomatic gRPC and requires us to bake in special handling for this in our gRPC code.

I can see why a UUID solution would be desirable in specific situations where the client address is not available (such as when the request goes through a NAT) but I think that requirement is specialized enough that it doesn't merit putting this logic in master.

@adleong adleong closed this Jul 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support for trace logs for all outbound and inbound communication in linkerd and namerd
2 participants