
"Source send cancelled." #20305

Open
fvarg00 opened this issue Apr 15, 2024 · 5 comments
Labels: meta: awaiting author, type: bug

Comments


fvarg00 commented Apr 15, 2024

A note for the community

No response

Problem

Hi, we see the error below when there is high CPU load on the Vector pods. Is this a known problem? Any help is appreciated. Thanks!

ERROR source{component_kind="source" component_id=datadog_agents component_type=datadog_agent}:http-request{method=POST path=/api/v2/series}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count= reason="Source send cancelled." internal_log_rate_limit=true

Configuration

No response

Version

0.28.0

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

fvarg00 added the type: bug label on Apr 15, 2024
@jszwedko
Member

Hi @fvarg00,

This appears to be an incomplete bug report. Do you mind filling out all of the fields (in particular, how you ran into this situation)? Otherwise it will be difficult to reproduce, or even to tell whether this is a bug.

jszwedko added the meta: awaiting author label on Apr 16, 2024
@JustinJKelly

JustinJKelly commented Apr 16, 2024

Hello @jszwedko,

Problem
Hi, we see the error below when there is high CPU load on the Vector pods. Is this a known problem? Any help is appreciated. Thanks!

ERROR source{component_kind="source" component_id=datadog_agents component_type=datadog_agent}:http-request{method=POST path=/api/v2/series}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count= reason="Source send cancelled." internal_log_rate_limit=true

Configuration
No response

Version
image: docker.io/timberio/vector:0.37.0-distroless-libc

Debug Output
N/A

Example Data
N/A

Additional Context

We are using DataDog Agent to send logs, metrics, traces to vector.

We use transforms to modify tags for every event that goes through vector, as well as route them to different sinks.

We use ClusterIP for the Kubernetes service and there is no explicit LoadBalancer to distribute traffic among vector pods.
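
For illustration, here is a minimal sketch of a config with that shape (the component names, the tag being rewritten, and the routing condition are all hypothetical, not taken from our actual config):

```toml
# Hypothetical sketch only -- not our real configuration.
[sources.datadog_agents]
type = "datadog_agent"
address = "0.0.0.0:8080"

# Modify tags on every event passing through.
[transforms.modify_tags]
type = "remap"
inputs = ["datadog_agents"]
source = '''
.tags.team = "example-team"   # hypothetical tag rewrite
'''

# Route events to different sinks.
[transforms.route_events]
type = "route"
inputs = ["modify_tags"]
route.example = '.tags.env == "example"'   # hypothetical condition

[sinks.example_sink]
type = "datadog_metrics"
inputs = ["route_events.example"]
default_api_key = "${DD_API_KEY}"
```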

For reference, here is an image regarding CPU/memory load from pods where the errors are coming from.
[Screenshot: CPU/memory utilization of the affected Vector pods, 2024-04-16]

References
N/A

@jszwedko
Member

Thanks @fvarg00. I'm guessing what you are seeing is request timeouts from the client, which will cancel the send downstream. Can you share your configuration? I'm particularly interested in whether you are using the acknowledgements feature or not.
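
(For context, end-to-end acknowledgements are enabled per sink; a rough sketch of what that looks like, with a hypothetical sink name and inputs:)

```toml
# Sketch only: with this enabled, the datadog_agent source holds the
# Agent's request open until the sink confirms delivery downstream.
[sinks.example_datadog_logs]
type = "datadog_logs"
inputs = ["modify_tags"]          # hypothetical upstream component
default_api_key = "${DD_API_KEY}"
acknowledgements.enabled = true
```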

@JustinJKelly

Hello @jszwedko, we are not using the acknowledgements feature. We see that the acknowledgements field is deprecated. Is that field something you think could cause this issue, or a possible fix?

Which part of the configuration would you need to see?

@jszwedko
Member

> Hello @jszwedko, we are not using the acknowledgements feature. We see that the acknowledgements field is deprecated. Is that field something you think could cause this issue, or a possible fix?
>
> Which part of the configuration would you need to see?

Gotcha, if you aren't using the acknowledgements feature, then it seems likely that the topology is just applying back-pressure to the Datadog Agent source: that is, the downstream components aren't sending fast enough, so data is buffering in the source. The fix would be to identify and resolve the bottleneck (in your case it seems like it might be CPU-bound). To identify the bottleneck you can use the utilization metric published by internal_metrics: the first component in the pipeline whose utilization is 1 (or close to it) usually indicates the bottleneck.
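
For reference, a minimal sketch for exposing those internal metrics (the sink choice and address are just an example; 9598 is the Prometheus exporter's default port):

```toml
# Sketch: expose Vector's internal telemetry so the per-component
# `utilization` metric can be inspected, e.g. by Prometheus.
[sources.vector_metrics]
type = "internal_metrics"

[sinks.vector_prom]
type = "prometheus_exporter"
inputs = ["vector_metrics"]
address = "0.0.0.0:9598"
```

Each utilization series is tagged with the component_id, so you can walk the pipeline from the source downstream and look for the first component pinned at (or near) 1.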
