
Possible Memory Leak with Kafka source #20403

Closed
biuboombiuboom opened this issue Apr 30, 2024 · 9 comments
Labels
domain: performance (Anything related to Vector's performance), type: bug (A code related bug)

Comments

@biuboombiuboom

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

We have a configuration with a Kafka source and a ClickHouse sink.
Memory usage frequently grows until the process runs out of memory.

Configuration

[sources.in]
session_timeout_ms = 180000
type = "kafka"
bootstrap_servers = "bootstrap.brokers.kafka:9092"
group_id = "kafka-consume-group"
topics = ["my-topic"]
librdkafka_options = { "auto.offset.reset" = "latest", "fetch.min.bytes" = "10240" }

[transforms.balance]
type = "remap"
inputs = ["in"]
drop_on_error = true
source = '.route_id=random_int(0,4)'

[transforms.route]
type = "route"
inputs = ["balance"]
reroute_unmatched = false
route.first = '.route_id==0'
route.second = '.route_id==1'
route.third = '.route_id==2'
route.fouth = '.route_id==3'


[sinks.out-clickhouse-1]
type = "clickhouse"
inputs = ["route.first"]
compression = "none"
endpoint = "http://localhost:8125"
database = "mytable"

batch.max_events = 800000
batch.max_bytes = 2147483648
batch.timeout_secs = 120

request.timeout_secs = 120
request.adaptive_concurrency.initial_concurrency=4
# request.adaptive_concurrency.max_concurrency_limit=4

buffer.type = "memory"
buffer.max_events = 400000
buffer.when_full = "block"

[sinks.out-clickhouse-2]
type = "clickhouse"
inputs = ["route.second"]
compression = "none"
endpoint = "http://localhost:8125"
database = "apmmytable

batch.max_events = 800000
batch.max_bytes = 2147483648
batch.timeout_secs = 120

request.timeout_secs = 120
request.adaptive_concurrency.initial_concurrency=4
# request.adaptive_concurrency.max_concurrency_limit=4

buffer.type = "memory"
buffer.max_events = 400000
buffer.when_full = "block"

[sinks.out-clickhouse-3]
type = "clickhouse"
inputs = ["route.third"]
compression = "none"
endpoint = "http://localhost:8125"
database = "mytable"

batch.max_events = 800000
batch.max_bytes = 2147483648
batch.timeout_secs = 120

request.timeout_secs = 120
request.adaptive_concurrency.initial_concurrency=4
# request.adaptive_concurrency.max_concurrency_limit=4

buffer.type = "memory"
buffer.max_events = 400000
buffer.when_full = "block"

[sinks.out-clickhouse-4]
type = "clickhouse"
inputs = ["route.fouth"]
compression = "none"
endpoint = "http://localhost:8125"
database = "mytable"

batch.max_events = 800000
batch.max_bytes = 2147483648
batch.timeout_secs = 120

request.timeout_secs = 120
request.adaptive_concurrency.initial_concurrency=4
# request.adaptive_concurrency.max_concurrency_limit=4

buffer.type = "memory"
buffer.max_events = 400000
buffer.when_full = "block"



[sources.vector_logs]
type = "internal_logs"

[sinks.log2file]
type = "file"
inputs = ["vector_logs"]
path = "/data/logs/vector-%Y-%m-%d.log"
encoding.codec = "logfmt"

[sources.vector_metrics]
type = "internal_metrics"

[sinks.prometheus_exporter]
inputs = ["vector_metrics"]
type = "prometheus_exporter"

Version

0.37.1

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

biuboombiuboom added the type: bug label on Apr 30, 2024
@jszwedko
Member

Hi @biuboombiuboom,

There's not a lot of detail here. What leads you to believe there is a memory leak? If you are feeling motivated, you could try running Vector under valgrind and provide the resulting profile here.

@nsweeting

We're actually seeing a similar issue running the latest version, with a Kafka source and a ClickHouse sink. A sustained flow of messages seems to inevitably result in an OOM. You can see the behaviour in the chart below. We have very bursty workloads, and each burst results in sustained CPU use (which makes sense) as well as continued memory growth. If the work stops, the memory goes back down; but if the work continues, it will OOM.

[Screenshot, 2024-05-02: CPU and memory charts showing memory growing during sustained workload bursts]

jszwedko added the domain: performance label on May 3, 2024
@biuboombiuboom
Author

biuboombiuboom commented May 7, 2024

[Memory profile screenshots]

The ack_stream has too many unconsumed FinalizerEntry items.

Try storing offsets synchronously and disabling the BatchNotifier.

Or move ack_stream.next() before messages.next() and mark the select! as biased (see the sketch below).
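
For illustration only, here is a minimal, hypothetical sketch of that polling-order idea. It is not Vector's actual source: the acks and messages streams and all names are stand-ins, and it assumes the tokio (rt + macros features) and futures crates. With biased;, tokio::select! polls branches top to bottom, so pending acknowledgements are drained before any new message is pulled and finalizer entries cannot pile up behind a busy message stream.

use futures::stream::{self, StreamExt};

#[tokio::main]
async fn main() {
    // Hypothetical stand-ins for the kafka source's ack_stream and message stream.
    let mut acks = stream::iter(["ack-1", "ack-2"]).fuse();
    let mut messages = stream::iter(["msg-1", "msg-2", "msg-3"]).fuse();

    loop {
        tokio::select! {
            // `biased` makes select! poll branches top to bottom instead of
            // randomly, so acknowledgements are always handled first.
            biased;

            Some(ack) = acks.next() => {
                // Store offsets / release the finalizer entry here.
                println!("acked {ack}");
            }
            Some(msg) = messages.next() => {
                // Decode and forward the Kafka record here.
                println!("received {msg}");
            }
            else => break, // both streams exhausted
        }
    }
}

The sketch only illustrates the ordering; the change that actually landed in Vector is the one in PR #20467 referenced below.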

@bruceg
Member

bruceg commented May 7, 2024

I'm not clear what you mean by the following:

Try storing offsets synchronously and disabling the BatchNotifier.

The second suggestion is spot on and definitely worth a try. The ack_stream should always be checked before consuming new events, because eventually it will run dry to allow progress for new data.

@biuboombiuboom
Author

I'm not clear what you mean by the following:

Try storing offsets synchronously and disabling the BatchNotifier.

The second suggestion is spot on and definitely worth a try. The ack_stream should always be checked before consuming new events, because eventually it will run dry to allow progress for new data.

Sorry, I didn't express that clearly; please disregard the first suggestion.

@bruceg
Member

bruceg commented May 9, 2024

I have pushed up a possible fix for this issue. If you are able to build Vector from source, check out PR #20467 and try the resulting binary.

@bruceg
Member

bruceg commented May 9, 2024

The proposed fix is now merged into the master branch. Please confirm if this resolves the memory leak for you.

@biuboombiuboom
Author

The proposed fix is now merged into the master branch. Please confirm if this resolves the memory leak for you.

Yes, it is effective.

@jszwedko
Member

Thanks for confirming! And for suggesting the original fix. I'll close this issue out.
