Namerd: Improve /dtab/delegator.json for discovery health checking #2013
Comments
@mrezaei00 This is an interesting idea. I'm not sure it makes sense to have an HTTP status code that describes the content of the delegation tree (i.e. whether the tree is neg, bound, or fail). Here is the logic that we use to color the boxes in the delegator UI: https://github.com/linkerd/linkerd/blob/master/admin/src/main/resources/io/buoyant/admin/js/src/delegator.js It seems reasonable to move that logic onto the server side and have the status of each node returned as part of the JSON structure. Another possibility would be to encode the status (bound/neg/fail) of the tree in a response header. What do you think?
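As a rough illustration of returning a per-node status in the JSON, here is a minimal Python sketch. It is not the logic in delegator.js and it does not use the real delegator.json schema: the node shape (a "type" field plus an optional "children" list), the type names, and the `annotate` helper are all assumptions made for the example. It also assumes that the first non-neg child decides a parent's status.

```python
from typing import Any, Dict

# Hypothetical node shape (NOT the actual Namerd schema):
#   {"type": "leaf" | "neg" | "fail" | <anything else>, "children": [...]}


def annotate(node: Dict[str, Any]) -> str:
    """Attach a "status" key to every node in place and return this node's status."""
    kind = node.get("type")
    if kind == "leaf":
        status = "bound"
    elif kind == "fail":
        status = "fail"
    else:
        # Assumption: the first child that is not "neg" decides the parent;
        # a node with no children, or only neg children, is "neg".
        status = "neg"
        for child in node.get("children") or []:
            child_status = annotate(child)  # annotate every child, even after a match
            if status == "neg" and child_status != "neg":
                status = child_status
    node["status"] = status
    return status
```

With something like this on the server, the root node's status could also be copied into a response header, as suggested above, so that clients never need to walk the tree themselves.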
@adleong I think moving the logic to the server is a great idea, but I still think a health check could benefit from the return code. My main reason is to avoid implementing a "health check wrapper" just to get the service health checked. I feel like that pattern may quickly diverge as the service is used more and more. If the health of the service is decided on the server, then everyone uses the same core health check. Another reason for including a corresponding response code is the limitations that come with the monitoring systems out there. We use K8S in this case to do health checking. I also agree that 5xx is not the right choice, and 404 (4xx) or new codes may be better choices.
Issue Type:
What happened:
We've been building tools, checks, and monitoring for alerting on discovery problems, given a test prefix and service to resolve.
The /dtab/delegator.json endpoint is great, but at the return code level it doesn't concretely identify issues with discovery the way the admin UI does (red vs. green), and the response body needs to be parsed to tell red from green, which is particularly challenging when using Namerd in a horizontally scalable and containerized environment, i.e. Kubernetes.
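To make the parsing burden concrete, this is roughly the kind of wrapper a probe needs today. It is only a sketch: the field names ("type", "children"), the "leaf" type value, and the `exit_code_for` helper are hypothetical rather than the actual delegator.json schema, and "any bound leaf anywhere in the tree means green" is a deliberately crude rule.

```python
import json


def exit_code_for(body: str) -> int:
    """Return 0 if the response body contains a bound leaf, otherwise 1."""
    try:
        root = json.loads(body)
    except json.JSONDecodeError:
        return 1  # an unparseable body counts as unhealthy

    stack = [root]
    while stack:
        node = stack.pop()
        if not isinstance(node, dict):
            continue  # ignore anything that is not a tree node
        if node.get("type") == "leaf":  # assumed marker for a bound name
            return 0
        stack.extend(node.get("children") or [])
    return 1
```

A Kubernetes exec probe treats exit code 0 as success, so a script would fetch the response body and finish with `sys.exit(exit_code_for(body))`. Every team that monitors Namerd would have to write and maintain its own variant of this.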
What you expected to happen:
In the process of designing the "right" monitoring for the service, I've been treating /dtab/delegator.json as a health endpoint. If that's the right way of health checking the discovery aspect of Namerd, does it make sense to return appropriate return codes for successful and unsuccessful discoveries? Right now the service returns 200 OK for both hits and misses, but I expect the misses to return a non-200 response.
How to reproduce it (as minimally and precisely as possible):
/<prefix>/<bogus_service_name> --> red
/<prefix>/<discovered_service_name> --> green
Anything else we need to know?:
I'm not particularly married to any specific return codes, but I was thinking this could return a 5xx code to identify a server-side issue with discovery.
Another option could be to standardize service mesh discovery (error) codes, given the fast adoption of service mesh by SOA users, but I don't have any info on how a new return code space can be introduced and standardized.
Environment: