[Bug]: [date 5.9] tke regression: tpcc 1000warehouse 1000threads connection timeout #15960
Comments
Update on 5.11: also seen in tpcc with 100 warehouses and 1000 terminals. Job link:
@volgariver6 Another issue here is that when there's a problem, the proxy reports an error indicating a failure to write to the client. 10.143.198.17: client ip
@daviszhen will help to fix it.
It was fixed.
Not fixed. Update on 5.13, job link:
The execute ran for more than 60 seconds, and then the client connection was dropped. This is different from the hang issue.
repro: https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9064407826/job/24919544811
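From the metrics, at 17:09:00 the p99 apply-latency is 23.9s while apply is 4.66ms, which indicates that the number of logtails is very large.
One observation: logtail collect duration started to fluctuate at 17:01:30. It spiked at 17:01:30, then dropped into a trough at 17:03:30, and the trough lasted until 17:10:00. This period matches the spike in the cn's logtail count described above.
At 17:03:30, the cumulative logtail consume time on the cn side reached 8.55s. The apply queue at that moment was 1.83k, and one logtail was consumed in about 4ms on average, so the consumption speed is normal. The problem is still the burst in the number of logtails.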
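In this reproduction, most logtail applies took a few ms, with occasional applies of 200ms to 400ms. Around 7:30, the number of logtails received by the cn surged, but the apply queue did not pile up and the consumption speed was normal.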
Long-running goroutine file:
Is there an existing issue for the same bug?
Branch Name
main
Commit ID
1aed8d9
Other Environment Information
Actual Behavior
job:
https://github.com/matrixorigin/mo-nightly-regression/actions/runs/9019459807/job/24803217074
The tke environment did not restart
mo log:
https://grafana.ci.matrixorigin.cn/explore?panes=%7B%22fzn%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bnamespace%3D%5C%22mo-nightly-regression-20240509%5C%22%7D%20%7C%3D%20%60%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%221715283409000%22,%22to%22:%221715283420000%22%7D%7D%7D&schemaVersion=1&orgId=1
profile:
linktimeout_profile.tar.gz
Expected Behavior
No response
Steps to Reproduce
Additional information
No response