Significant number of Clickhouse exceptions #4931
Comments
Thanks for opening this issue. A team member should give feedback soon. In the meantime, feel free to check out the contributing guidelines.
This shouldn't be happening unless some data is corrupted. Can you share the number of records in the …
I ran some queries to hopefully help you out. I do not know if my application is returning negative latencies; if I understand the result, the query below suggests it is not. That said, I am not intentionally returning negative latencies. I mostly use SigNoz to view spans visually and review exceptions, so I don't spend any time looking at latency. Let me know if you want me to run more queries on the database; the ones below were guesses based on your question and ChatGPT.
Command:

```
docker exec -it signoz-clickhouse clickhouse-client --query="SELECT COUNT(*) AS total_records, SUM(count) AS total_count, MIN(min) AS minimum_latency, MAX(max) AS maximum_latency, MIN(max) AS minimum_of_max, MAX(min) AS maximum_of_min FROM signoz_metrics.exp_hist WHERE metric_name = 'signoz_latency'"
```

Results:
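If negative latencies are the specific concern, counting the offending rows directly may be more telling than the min/max aggregates. A sketch, assuming the same `min` and `count` columns used in the query above; run it through `clickhouse-client` the same way:

```sql
-- Count rows (and summed bucket counts) where the recorded minimum is negative.
SELECT
    count() AS rows_with_negative_min,
    sum(count) AS affected_bucket_count
FROM signoz_metrics.exp_hist
WHERE metric_name = 'signoz_latency'
  AND min < 0
```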
Thanks. I am not sure what could be the reason. This exception is thrown when the serialized data is incorrect and can't be deserialized. From the logs containing the part, does it show any day other than …
Did you start seeing them after any specific update? What is your ClickHouse version?
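The server version can be read from the same container with the built-in `version()` function:

```sql
-- Run via: docker exec -it signoz-clickhouse clickhouse-client --query="SELECT version()"
SELECT version()
```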
There is no change in the binary format. I suspect it is more likely a data corruption issue or a hardware issue.
With respect to dates, it appears that most dates are impacted. My ClickHouse version:
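A per-day breakdown can be sketched roughly like this. Note the `unix_milli` column is an assumption based on other SigNoz metrics tables and may need adjusting to the actual `exp_hist` schema:

```sql
-- Rough per-day row counts for the affected metric;
-- unix_milli is an assumed column name, adjust to the real schema.
SELECT
    toDate(unix_milli / 1000) AS day,
    count() AS rows
FROM signoz_metrics.exp_hist
WHERE metric_name = 'signoz_latency'
GROUP BY day
ORDER BY day
```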
Regarding when it may have started, my guess is that it was sometime after upgrading to 0.38.2? That's just a guess based on the updates I went through. SigNoz was definitely OK for some duration, and then these logs started filling up after those updates. I had to provision a bigger drive at some point, and now that bigger one is filling up, so I need to do something different. For a sample of the full logs:
Assuming some sort of corruption, is there a simple way to clear out the bad data or resolve other ClickHouse issues? I am not concerned about losing data; I am more concerned about the disk filling up again and losing the server. If possible, it would be nice to keep the SigNoz settings, but I could rebuild those if needed. Thanks for the quick replies and feedback. I really appreciate the support.
It would help if we could find a way to reproduce this. If you are not using the …
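On clearing out the bad data: if losing the contents of this one table is acceptable, one blunt but simple option is to empty it (SigNoz settings live elsewhere, but verify against your own setup before running anything destructive). Dropping individual partitions is a more surgical alternative. A sketch:

```sql
-- Option 1: drop everything in the affected table (irreversible).
TRUNCATE TABLE signoz_metrics.exp_hist;

-- Option 2: drop only specific partitions. List partition ids first:
--   SELECT DISTINCT partition FROM system.parts
--   WHERE database = 'signoz_metrics' AND table = 'exp_hist' AND active;
-- then, for a given partition id (placeholder shown):
-- ALTER TABLE signoz_metrics.exp_hist DROP PARTITION '<partition_id>';
```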
Bug description
The signoz-clickhouse container is filled with tons of logs related to failed background tasks. Most errors appear to include the same language (example below). The main problem for me is that these logs or exceptions appear to take significant disk space; if their disk impact could be reduced, I wouldn't care about them.

The important part of the exception is probably: Code: 117. DB::Exception: Invalid flag for negative store:
[...] in table signoz_metrics.exp_hist (2b326032-b962-42d9-ab8c-bf5431fcecfa) located on disk default of type local, from mark 0 with max_rows_to_read = 286)
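To see how much disk the affected table's data (as opposed to the server logs) actually occupies, ClickHouse's standard `system.parts` table can be queried; this is a generic sketch, not specific to SigNoz:

```sql
-- Disk usage of active parts for the affected table.
SELECT
    name,
    partition,
    rows,
    formatReadableSize(bytes_on_disk) AS size
FROM system.parts
WHERE database = 'signoz_metrics'
  AND table = 'exp_hist'
  AND active
ORDER BY bytes_on_disk DESC
```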
Expected behavior
Exceptions should probably not happen as frequently. If they do, they should not fill the disk?
How to reproduce
Version information
x86_64
Additional context
I added line breaks so you don't have to scroll this: