
[Research] Tracing for OpenFaaS Core services #1354

Open
alexellis opened this issue Oct 13, 2019 · 2 comments
Comments

@alexellis
Member

There has been interest in tracing expressed by the community through Slack and issues over the last year and a half. So far there has been no hard requirement or urgency, so the feature hasn't been prioritised. This issue looks into the cost and scope of implementing tracing across the core services in OpenFaaS.

There are also some pros/cons to emitting spans; one of the cons is tight coupling to an SDK from OpenTracing, OpenCensus or, when it's ready, OpenTelemetry, in every piece of code we write in the project.

Expected Behaviour

  1. Traces to be propagated, even if not emitted.
  2. Tracing to be available as an opt-in feature, with configuration for the "all-in-one" Jaeger image etc.

Current Behaviour

It is unclear whether traces are currently propagated.

Possible Solution

I'd like someone who has worked with OpenTracing or similar to provide a summary of the changes and the pros/cons of implementing the above. For those requesting this feature, please feel free to +1 or leave your use-case in a comment.

Prometheus metrics are currently emitted for core services and functions.

@LucasRoesler
Member

We have been using OpenTracing at Contiamo for most of the last two years. Here are a few things we learned as we implemented it:

  1. Every function that takes a ctx should immediately start a Span.
  2. Any function that does I/O should be passed a context as a parameter so that it can start a Span; this also allows for cancellation propagation.
    1. Even if a function isn't doing I/O, if it is public and represents a major logic component of the application, we will often pass a context anyway.
  3. Feeling like you need to create more than one span in a single function usually indicates that you need a new function.
    1. If you follow that rule, implementation is easy: every function with I/O or a ctx starts a span and then immediately defers span.Finish().
    2. If you can't defer the span.Finish(), it suggests that you really have two different functions merged into one, and the logic to get span.Finish() right can easily be buggy or broken by a future change. I have found that the end result is actually cleaner code, because it forces a cleaner separation of concerns and a much more straightforward architecture.
    3. We don't usually defer span.Finish() directly; instead we have a helper, FinishSpan(span, err), which closes the span and also tags it with the error content if err is not nil. This helper has been very important for consistency and for reducing boilerplate.
  4. Good helpers make a big difference: use middleware to parse the incoming headers and start spans from them. We have a wrapper for db drivers to wrap our SQL statements. Similarly, add the context propagation to the http Client constructor so that callers do not need to set the headers manually. Basically, most developers should not need to think about tracing other than making sure they start and finish the span.
    1. We actually still start new spans within the actual handler implementations, because the middleware span will capture other middlewares as well as the handler implementation, and we want to distinguish between the method we wrote and the code that wraps it.

When adding tracing to an existing project we started by:

  1. adding middleware for the http handlers (db driver etc.), to get as many automatic/framework-level traces as we can;
  2. finding the places with I/O and passing a context, using context.TODO() as a placeholder until a real one can be passed down;
  3. visualizing what it looks like, then asking what you wish you could see in the trace;
  4. adding context to those areas;
  5. repeating steps 3 and 4, refactoring and adding context, until you are happy that the span visualization makes sense and can communicate bugs.

Most of the above was summarized from a conversation that Alex and I had on Slack.

@mingfang

Any update on this?
