Large-scale cluster with 3000 nodes: Pods cannot be assigned IPs when scaling from 40k to 60k Pods #4078
zhangzujian added the performance label (Anything that can make Kube-OVN faster) and removed the bug label (Something isn't working) on May 24, 2024
@cmdy Impressive — are you really already running physical-machine clusters at this scale?
Yes, pretty much. Our largest single production cluster already has more than 1,700 physical machines. We have recently been validating Kube-OVN and plan to adopt this solution.
ovn-northd.log
2024-05-24T03:33:30.635Z|283076|northd|ERR|Dropped 28983 log messages in last 72 seconds (most recently, 24 seconds ago) due to excessive rate
2024-05-24T03:33:30.636Z|283077|northd|ERR|lport fake-pod-7b99c6d54d-r87tb.fake-pod in port group node.kwok.node.545 not found.
2024-05-24T03:33:30.762Z|283078|inc_proc_eng|INFO|node: northd, recompute (forced) took 10572ms
2024-05-24T03:33:31.913Z|283079|inc_proc_eng|INFO|node: lflow, recompute (forced) took 860ms
2024-05-24T03:33:32.132Z|283080|jsonrpc|WARN|tcp:[10.56.64.18]:6642: send error: Broken pipe
2024-05-24T03:33:32.193Z|283081|ovn_northd|INFO|OVNSB commit failed, force recompute next time.
2024-05-24T03:33:32.213Z|283082|timeval|WARN|Unreasonably long 12024ms poll interval (12016ms user, 7ms system)
2024-05-24T03:33:32.213Z|283083|timeval|WARN|faults: 1275 minor, 0 major
2024-05-24T03:33:32.213Z|283084|timeval|WARN|context switches: 0 voluntary, 17 involuntary
2024-05-24T03:33:32.214Z|283085|poll_loop|INFO|wakeup due to [POLLIN][POLLHUP] on fd 3 (/var/run/ovn/ovn-northd.595.ctl<->) at ../lib/stream-fd.c:157 (100% CPU usage)
2024-05-24T03:33:32.215Z|283086|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (10.56.64.18:24322<->10.56.64.18:6641) at ../lib/stream-fd.c:157 (100% CPU usage)
2024-05-24T03:33:32.216Z|283087|reconnect|WARN|tcp:[10.56.64.18]:6642: connection dropped (Broken pipe)
2024-05-24T03:33:32.216Z|283088|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:33:32.216Z|283089|poll_loop|INFO|wakeup due to 0-ms timeout at tcp:[10.56.64.18]:6641 (100% CPU usage)
2024-05-24T03:33:32.216Z|283090|reconnect|INFO|tcp:[10.56.64.18]:6641: connection closed by peer
2024-05-24T03:33:33.217Z|283091|poll_loop|INFO|wakeup due to 1001-ms timeout at ../lib/reconnect.c:677 (100% CPU usage)
2024-05-24T03:33:33.217Z|283092|reconnect|INFO|tcp:[10.56.64.16]:6641: connecting...
2024-05-24T03:33:33.217Z|283093|reconnect|INFO|tcp:[10.56.64.16]:6642: connecting...
2024-05-24T03:33:33.217Z|283094|reconnect|INFO|tcp:[10.56.64.16]:6641: connected
2024-05-24T03:33:33.218Z|283095|reconnect|INFO|tcp:[10.56.64.16]:6642: connected
2024-05-24T03:33:33.231Z|283096|ovsdb_cs|INFO|tcp:[10.56.64.16]:6642: clustered database server is not cluster leader; trying another server
2024-05-24T03:33:33.231Z|283097|reconnect|INFO|tcp:[10.56.64.16]:6642: connection attempt timed out
2024-05-24T03:33:33.231Z|283098|reconnect|INFO|tcp:[10.56.64.16]:6642: waiting 2 seconds before reconnect
2024-05-24T03:33:33.238Z|283099|ovsdb_cs|INFO|tcp:[10.56.64.16]:6641: clustered database server is not cluster leader; trying another server
2024-05-24T03:33:33.238Z|283100|reconnect|INFO|tcp:[10.56.64.16]:6641: connection attempt timed out
2024-05-24T03:33:33.238Z|283101|reconnect|INFO|tcp:[10.56.64.16]:6641: waiting 2 seconds before reconnect
2024-05-24T03:33:35.233Z|283102|reconnect|INFO|tcp:[10.56.64.17]:6642: connecting...
2024-05-24T03:33:35.233Z|283103|reconnect|INFO|tcp:[10.56.64.17]:6642: connected
2024-05-24T03:33:35.236Z|283104|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2024-05-24T03:33:35.236Z|283105|ovsdb_cs|INFO|tcp:[10.56.64.17]:6642: clustered database server is not cluster leader; trying another server
2024-05-24T03:33:35.236Z|283106|reconnect|INFO|tcp:[10.56.64.17]:6642: connection attempt timed out
2024-05-24T03:33:35.236Z|283107|reconnect|INFO|tcp:[10.56.64.17]:6642: waiting 4 seconds before reconnect
2024-05-24T03:33:35.236Z|283108|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:33:35.237Z|283109|reconnect|INFO|tcp:[10.56.64.17]:6641: connecting...
2024-05-24T03:33:35.237Z|283110|reconnect|INFO|tcp:[10.56.64.17]:6641: connected
2024-05-24T03:33:35.238Z|283111|ovsdb_cs|INFO|tcp:[10.56.64.17]:6641: clustered database server is not cluster leader; trying another server
2024-05-24T03:33:35.238Z|283112|reconnect|INFO|tcp:[10.56.64.17]:6641: connection attempt timed out
2024-05-24T03:33:35.238Z|283113|reconnect|INFO|tcp:[10.56.64.17]:6641: waiting 4 seconds before reconnect
2024-05-24T03:33:39.237Z|283114|reconnect|INFO|tcp:[10.56.64.18]:6642: connecting...
2024-05-24T03:33:39.237Z|283115|reconnect|INFO|tcp:[10.56.64.18]:6642: connected
2024-05-24T03:33:39.237Z|283116|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2024-05-24T03:33:39.239Z|283117|reconnect|INFO|tcp:[10.56.64.18]:6641: connecting...
2024-05-24T03:33:39.239Z|283118|reconnect|INFO|tcp:[10.56.64.18]:6641: connected
2024-05-24T03:33:50.111Z|283119|inc_proc_eng|INFO|node: northd, recompute (forced) took 10843ms
2024-05-24T03:33:51.256Z|283120|inc_proc_eng|INFO|node: lflow, recompute (forced) took 860ms
2024-05-24T03:33:51.473Z|283121|jsonrpc|WARN|tcp:[10.56.64.18]:6642: send error: Broken pipe
2024-05-24T03:33:51.538Z|283122|ovn_northd|INFO|OVNSB commit failed, force recompute next time.
2024-05-24T03:33:51.556Z|283123|timeval|WARN|Unreasonably long 12288ms poll interval (12250ms user, 38ms system)
2024-05-24T03:33:51.556Z|283124|timeval|WARN|faults: 1273 minor, 0 major
2024-05-24T03:33:51.556Z|283125|timeval|WARN|disk: 0 reads, 8 writes
2024-05-24T03:33:51.556Z|283126|timeval|WARN|context switches: 0 voluntary, 24 involuntary
2024-05-24T03:33:51.556Z|283127|coverage|INFO|Dropped 1 log messages in last 20 seconds (most recently, 20 seconds ago) due to excessive rate
2024-05-24T03:33:51.556Z|283128|coverage|INFO|Skipping details of duplicate event coverage for hash=ab6315f5
2024-05-24T03:33:51.556Z|283129|poll_loop|INFO|Dropped 3 log messages in last 19 seconds (most recently, 19 seconds ago) due to excessive rate
2024-05-24T03:33:51.556Z|283130|poll_loop|INFO|wakeup due to [POLLIN][POLLHUP] on fd 3 (/var/run/ovn/ovn-northd.595.ctl<->) at ../lib/stream-fd.c:157 (97% CPU usage)
2024-05-24T03:33:51.556Z|283131|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (10.56.64.18:24328<->10.56.64.18:6641) at ../lib/stream-fd.c:157 (97% CPU usage)
2024-05-24T03:33:51.558Z|283132|reconnect|WARN|tcp:[10.56.64.18]:6642: connection dropped (Broken pipe)
2024-05-24T03:33:51.558Z|283133|reconnect|INFO|tcp:[10.56.64.18]:6642: continuing to reconnect in the background but suppressing further logging
2024-05-24T03:33:51.558Z|283134|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:33:51.559Z|283135|poll_loop|INFO|wakeup due to 0-ms timeout at tcp:[10.56.64.18]:6641 (97% CPU usage)
2024-05-24T03:33:51.559Z|283136|reconnect|INFO|tcp:[10.56.64.18]:6641: connection closed by peer
2024-05-24T03:33:52.560Z|283137|reconnect|INFO|tcp:[10.56.64.16]:6641: connecting...
2024-05-24T03:33:52.560Z|283138|reconnect|INFO|tcp:[10.56.64.16]:6641: connected
2024-05-24T03:33:52.571Z|283139|ovsdb_cs|INFO|tcp:[10.56.64.16]:6641: clustered database server is not cluster leader; trying another server
2024-05-24T03:33:52.571Z|283140|reconnect|INFO|tcp:[10.56.64.16]:6641: connection attempt timed out
2024-05-24T03:33:52.571Z|283141|reconnect|INFO|tcp:[10.56.64.16]:6641: waiting 2 seconds before reconnect
2024-05-24T03:33:54.572Z|283142|reconnect|INFO|tcp:[10.56.64.17]:6641: connecting...
2024-05-24T03:33:54.572Z|283143|reconnect|INFO|tcp:[10.56.64.17]:6641: connected
2024-05-24T03:33:54.583Z|283144|ovsdb_cs|INFO|tcp:[10.56.64.17]:6641: clustered database server is not cluster leader; trying another server
2024-05-24T03:33:54.583Z|283145|reconnect|INFO|tcp:[10.56.64.17]:6641: connection attempt timed out
2024-05-24T03:33:54.583Z|283146|reconnect|INFO|tcp:[10.56.64.17]:6641: waiting 4 seconds before reconnect
2024-05-24T03:33:58.584Z|283147|reconnect|INFO|tcp:[10.56.64.18]:6641: connecting...
2024-05-24T03:33:58.584Z|283148|reconnect|INFO|tcp:[10.56.64.18]:6641: connected
2024-05-24T03:33:59.559Z|283149|reconnect|INFO|tcp:[10.56.64.16]:6642: connected
2024-05-24T03:33:59.560Z|283150|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2024-05-24T03:33:59.560Z|283151|ovsdb_cs|INFO|tcp:[10.56.64.16]:6642: clustered database server is not cluster leader; trying another server
2024-05-24T03:33:59.560Z|283152|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:34:07.564Z|283153|reconnect|INFO|tcp:[10.56.64.17]:6642: connected
2024-05-24T03:34:07.565Z|283154|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2024-05-24T03:34:07.566Z|283155|ovsdb_cs|INFO|tcp:[10.56.64.17]:6642: clustered database server is not cluster leader; trying another server
2024-05-24T03:34:07.566Z|283156|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:36:29.427Z|283343|inc_proc_eng|INFO|node: northd, recompute (forced) took 10600ms
2024-05-24T03:36:30.527Z|283344|inc_proc_eng|INFO|node: lflow, recompute (forced) took 808ms
2024-05-24T03:36:30.755Z|283345|jsonrpc|WARN|tcp:[10.56.64.18]:6642: send error: Broken pipe
2024-05-24T03:36:30.823Z|283346|ovn_northd|INFO|OVNSB commit failed, force recompute next time.
2024-05-24T03:36:30.841Z|283347|timeval|WARN|Unreasonably long 12014ms poll interval (12002ms user, 11ms system)
2024-05-24T03:36:30.841Z|283348|timeval|WARN|faults: 1650 minor, 0 major
2024-05-24T03:36:30.841Z|283349|timeval|WARN|context switches: 0 voluntary, 14 involuntary
2024-05-24T03:36:30.841Z|283350|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (10.56.64.18:24380<->10.56.64.18:6641) at ../lib/stream-fd.c:157 (100% CPU usage)
2024-05-24T03:36:30.842Z|283351|reconnect|WARN|tcp:[10.56.64.18]:6642: connection dropped (Broken pipe)
2024-05-24T03:36:30.842Z|283352|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:36:30.843Z|283353|poll_loop|INFO|wakeup due to 0-ms timeout at tcp:[10.56.64.18]:6641 (100% CPU usage)
2024-05-24T03:36:30.843Z|283354|reconnect|INFO|tcp:[10.56.64.18]:6641: connection closed by peer
2024-05-24T03:36:31.843Z|283355|poll_loop|INFO|wakeup due to 999-ms timeout at ../lib/reconnect.c:677 (100% CPU usage)
2024-05-24T03:36:31.843Z|283356|reconnect|INFO|tcp:[10.56.64.16]:6641: connecting...
2024-05-24T03:36:31.843Z|283357|reconnect|INFO|tcp:[10.56.64.16]:6642: connecting...
2024-05-24T03:36:31.843Z|283358|poll_loop|INFO|wakeup due to [POLLOUT] on fd 3 (10.56.64.18:50246<->10.56.64.16:6641) at ../lib/stream-fd.c:153 (100% CPU usage)
2024-05-24T03:36:31.843Z|283359|reconnect|INFO|tcp:[10.56.64.16]:6641: connected
2024-05-24T03:36:31.844Z|283360|reconnect|INFO|tcp:[10.56.64.16]:6642: connected
2024-05-24T03:36:31.858Z|283361|ovsdb_cs|INFO|tcp:[10.56.64.16]:6642: clustered database server is not cluster leader; trying another server
2024-05-24T03:36:31.858Z|283362|reconnect|INFO|tcp:[10.56.64.16]:6642: connection attempt timed out
2024-05-24T03:36:31.858Z|283363|reconnect|INFO|tcp:[10.56.64.16]:6642: waiting 2 seconds before reconnect
2024-05-24T03:36:31.864Z|283364|ovsdb_cs|INFO|tcp:[10.56.64.16]:6641: clustered database server is not cluster leader; trying another server
2024-05-24T03:36:31.864Z|283365|reconnect|INFO|tcp:[10.56.64.16]:6641: connection attempt timed out
2024-05-24T03:36:31.864Z|283366|reconnect|INFO|tcp:[10.56.64.16]:6641: waiting 2 seconds before reconnect
2024-05-24T03:36:33.859Z|283367|reconnect|INFO|tcp:[10.56.64.17]:6642: connecting...
2024-05-24T03:36:33.859Z|283368|reconnect|INFO|tcp:[10.56.64.17]:6642: connected
2024-05-24T03:36:33.860Z|283369|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2024-05-24T03:36:33.860Z|283370|ovsdb_cs|INFO|tcp:[10.56.64.17]:6642: clustered database server is not cluster leader; trying another server
2024-05-24T03:36:33.860Z|283371|reconnect|INFO|tcp:[10.56.64.17]:6642: connection attempt timed out
2024-05-24T03:36:33.860Z|283372|reconnect|INFO|tcp:[10.56.64.17]:6642: waiting 4 seconds before reconnect
2024-05-24T03:36:33.860Z|283373|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:36:33.863Z|283374|reconnect|INFO|tcp:[10.56.64.17]:6641: connecting...
2024-05-24T03:36:33.863Z|283375|reconnect|INFO|tcp:[10.56.64.17]:6641: connected
2024-05-24T03:36:33.864Z|283376|ovsdb_cs|INFO|tcp:[10.56.64.17]:6641: clustered database server is not cluster leader; trying another server
2024-05-24T03:36:33.864Z|283377|reconnect|INFO|tcp:[10.56.64.17]:6641: connection attempt timed out
2024-05-24T03:36:33.864Z|283378|reconnect|INFO|tcp:[10.56.64.17]:6641: waiting 4 seconds before reconnect
2024-05-24T03:36:37.861Z|283379|reconnect|INFO|tcp:[10.56.64.18]:6642: connecting...
2024-05-24T03:36:37.861Z|283380|reconnect|INFO|tcp:[10.56.64.18]:6642: connected
2024-05-24T03:36:37.863Z|283381|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2024-05-24T03:36:37.864Z|283382|reconnect|INFO|tcp:[10.56.64.18]:6641: connecting...
2024-05-24T03:36:37.864Z|283383|reconnect|INFO|tcp:[10.56.64.18]:6641: connected
2024-05-24T03:36:49.551Z|283384|inc_proc_eng|INFO|node: northd, recompute (forced) took 10650ms
2024-05-24T03:36:50.660Z|283385|inc_proc_eng|INFO|node: lflow, recompute (forced) took 820ms
2024-05-24T03:36:50.894Z|283386|jsonrpc|WARN|tcp:[10.56.64.18]:6642: send error: Broken pipe
2024-05-24T03:36:50.950Z|283387|ovn_northd|INFO|OVNSB commit failed, force recompute next time.
2024-05-24T03:36:50.972Z|283388|timeval|WARN|Unreasonably long 12072ms poll interval (12065ms user, 6ms system)
2024-05-24T03:36:50.972Z|283389|timeval|WARN|faults: 378 minor, 0 major
2024-05-24T03:36:50.972Z|283390|timeval|WARN|disk: 0 reads, 8 writes
2024-05-24T03:36:50.972Z|283391|timeval|WARN|context switches: 0 voluntary, 13 involuntary
2024-05-24T03:36:50.972Z|283392|coverage|INFO|Dropped 1 log messages in last 20 seconds (most recently, 20 seconds ago) due to excessive rate
2024-05-24T03:36:50.972Z|283393|coverage|INFO|Skipping details of duplicate event coverage for hash=ab6315f5
2024-05-24T03:36:50.972Z|283394|poll_loop|INFO|Dropped 2 log messages in last 19 seconds (most recently, 19 seconds ago) due to excessive rate
2024-05-24T03:36:50.975Z|283395|poll_loop|INFO|wakeup due to [POLLIN][POLLHUP] on fd 3 (/var/run/ovn/ovn-northd.595.ctl<->) at ../lib/stream-fd.c:157 (86% CPU usage)
2024-05-24T03:36:50.977Z|283396|poll_loop|INFO|wakeup due to [POLLIN] on fd 17 (10.56.64.18:24388<->10.56.64.18:6641) at ../lib/stream-fd.c:157 (86% CPU usage)
2024-05-24T03:36:50.980Z|283397|reconnect|WARN|tcp:[10.56.64.18]:6642: connection dropped (Broken pipe)
2024-05-24T03:36:50.980Z|283398|reconnect|INFO|tcp:[10.56.64.18]:6642: continuing to reconnect in the background but suppressing further logging
2024-05-24T03:36:50.980Z|283399|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:36:50.981Z|283400|poll_loop|INFO|wakeup due to 0-ms timeout at tcp:[10.56.64.18]:6641 (86% CPU usage)
2024-05-24T03:36:50.981Z|283401|reconnect|INFO|tcp:[10.56.64.18]:6641: connection closed by peer
2024-05-24T03:36:51.981Z|283402|poll_loop|INFO|wakeup due to 1000-ms timeout at ../lib/reconnect.c:677 (86% CPU usage)
2024-05-24T03:36:51.981Z|283403|reconnect|INFO|tcp:[10.56.64.16]:6641: connecting...
2024-05-24T03:36:51.981Z|283404|reconnect|INFO|tcp:[10.56.64.16]:6641: connected
2024-05-24T03:36:51.994Z|283405|ovsdb_cs|INFO|tcp:[10.56.64.16]:6641: clustered database server is not cluster leader; trying another server
2024-05-24T03:36:51.994Z|283406|reconnect|INFO|tcp:[10.56.64.16]:6641: connection attempt timed out
2024-05-24T03:36:51.994Z|283407|reconnect|INFO|tcp:[10.56.64.16]:6641: waiting 2 seconds before reconnect
2024-05-24T03:36:53.996Z|283408|reconnect|INFO|tcp:[10.56.64.17]:6641: connecting...
2024-05-24T03:36:53.996Z|283409|reconnect|INFO|tcp:[10.56.64.17]:6641: connected
2024-05-24T03:36:54.007Z|283410|ovsdb_cs|INFO|tcp:[10.56.64.17]:6641: clustered database server is not cluster leader; trying another server
2024-05-24T03:36:54.007Z|283411|reconnect|INFO|tcp:[10.56.64.17]:6641: connection attempt timed out
2024-05-24T03:36:54.007Z|283412|reconnect|INFO|tcp:[10.56.64.17]:6641: waiting 4 seconds before reconnect
2024-05-24T03:36:58.008Z|283413|reconnect|INFO|tcp:[10.56.64.18]:6641: connecting...
2024-05-24T03:36:58.008Z|283414|reconnect|INFO|tcp:[10.56.64.18]:6641: connected
2024-05-24T03:36:58.981Z|283415|reconnect|INFO|tcp:[10.56.64.16]:6642: connected
2024-05-24T03:36:58.983Z|283416|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2024-05-24T03:36:58.984Z|283417|ovsdb_cs|INFO|tcp:[10.56.64.16]:6642: clustered database server is not cluster leader; trying another server
2024-05-24T03:36:58.984Z|283418|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:37:06.987Z|283419|reconnect|INFO|tcp:[10.56.64.17]:6642: connected
2024-05-24T03:37:06.988Z|283420|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2024-05-24T03:37:06.988Z|283421|ovsdb_cs|INFO|tcp:[10.56.64.17]:6642: clustered database server is not cluster leader; trying another server
2024-05-24T03:37:06.988Z|283422|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:37:14.989Z|283423|reconnect|INFO|tcp:[10.56.64.18]:6642: connected
2024-05-24T03:37:23.377Z|283424|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2024-05-24T03:37:23.867Z|283425|ovn_util|WARN|Dropped 21737 log messages in last 87 seconds (most recently, 35 seconds ago) due to excessive rate
2024-05-24T03:37:23.867Z|283426|ovn_util|WARN|all port tunnel ids exhausted
2024-05-24T03:37:33.870Z|283427|northd|ERR|Dropped 43475 log messages in last 88 seconds (most recently, 44 seconds ago) due to excessive rate
2024-05-24T03:37:33.871Z|283428|northd|ERR|lport fake-pod-7b99c6d54d-r87tb.fake-pod in port group node.kwok.node.545 not found.
2024-05-24T03:37:33.987Z|283429|inc_proc_eng|INFO|node: northd, recompute (forced) took 10610ms
2024-05-24T03:37:35.123Z|283430|inc_proc_eng|INFO|node: lflow, recompute (forced) took 846ms
2024-05-24T03:37:35.362Z|283431|jsonrpc|WARN|tcp:[10.56.64.18]:6642: send error: Broken pipe
2024-05-24T03:37:35.422Z|283432|ovn_northd|INFO|OVNSB commit failed, force recompute next time.
2024-05-24T03:37:35.442Z|283433|timeval|WARN|Unreasonably long 12064ms poll interval (12050ms user, 14ms system)
2024-05-24T03:37:35.442Z|283434|timeval|WARN|faults: 1405 minor, 0 major
2024-05-24T03:37:35.442Z|283435|timeval|WARN|disk: 0 reads, 8 writes
2024-05-24T03:37:35.442Z|283436|timeval|WARN|context switches: 0 voluntary, 12 involuntary
2024-05-24T03:37:35.442Z|283437|poll_loop|INFO|Dropped 3 log messages in last 44 seconds (most recently, 44 seconds ago) due to excessive rate
2024-05-24T03:37:35.444Z|283438|poll_loop|INFO|wakeup due to [POLLIN][POLLHUP] on fd 17 (/var/run/ovn/ovn-northd.595.ctl<->) at ../lib/stream-fd.c:157 (100% CPU usage)
2024-05-24T03:37:35.444Z|283439|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (10.56.64.18:24396<->10.56.64.18:6641) at ../lib/stream-fd.c:157 (100% CPU usage)
2024-05-24T03:37:35.445Z|283440|reconnect|WARN|tcp:[10.56.64.18]:6642: connection dropped (Broken pipe)
2024-05-24T03:37:35.445Z|283441|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:37:35.446Z|283442|poll_loop|INFO|wakeup due to 0-ms timeout at tcp:[10.56.64.18]:6641 (100% CPU usage)
2024-05-24T03:37:35.446Z|283443|reconnect|INFO|tcp:[10.56.64.18]:6641: connection closed by peer
2024-05-24T03:37:36.445Z|283444|poll_loop|INFO|wakeup due to 999-ms timeout at ../lib/reconnect.c:677 (100% CPU usage)
2024-05-24T03:37:36.445Z|283445|reconnect|INFO|tcp:[10.56.64.16]:6642: connecting...
2024-05-24T03:37:36.446Z|283446|poll_loop|INFO|wakeup due to [POLLOUT] on fd 3 (10.56.64.18:27608<->10.56.64.16:6642) at ../lib/stream-fd.c:153 (100% CPU usage)
2024-05-24T03:37:36.446Z|283447|reconnect|INFO|tcp:[10.56.64.16]:6642: connected
2024-05-24T03:37:36.448Z|283448|poll_loop|INFO|wakeup due to 0-ms timeout at ../lib/reconnect.c:677 (100% CPU usage)
2024-05-24T03:37:36.448Z|283449|reconnect|INFO|tcp:[10.56.64.16]:6641: connecting...
2024-05-24T03:37:36.457Z|283450|ovsdb_cs|INFO|tcp:[10.56.64.16]:6642: clustered database server is not cluster leader; trying another server
2024-05-24T03:37:36.457Z|283451|reconnect|INFO|tcp:[10.56.64.16]:6642: connection attempt timed out
2024-05-24T03:37:36.457Z|283452|reconnect|INFO|tcp:[10.56.64.16]:6642: waiting 2 seconds before reconnect
2024-05-24T03:37:36.457Z|283453|poll_loop|INFO|wakeup due to [POLLOUT] on fd 17 (10.56.64.18:50250<->10.56.64.16:6641) at ../lib/stream-fd.c:153 (100% CPU usage)
2024-05-24T03:37:36.457Z|283454|reconnect|INFO|tcp:[10.56.64.16]:6641: connected
2024-05-24T03:37:36.467Z|283455|ovsdb_cs|INFO|tcp:[10.56.64.16]:6641: clustered database server is not cluster leader; trying another server
2024-05-24T03:37:36.467Z|283456|reconnect|INFO|tcp:[10.56.64.16]:6641: connection attempt timed out
2024-05-24T03:37:36.467Z|283457|reconnect|INFO|tcp:[10.56.64.16]:6641: waiting 2 seconds before reconnect
2024-05-24T03:37:38.458Z|283458|reconnect|INFO|tcp:[10.56.64.17]:6642: connecting...
2024-05-24T03:37:38.459Z|283459|reconnect|INFO|tcp:[10.56.64.17]:6642: connected
2024-05-24T03:37:38.459Z|283460|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
2024-05-24T03:37:38.459Z|283461|ovsdb_cs|INFO|tcp:[10.56.64.17]:6642: clustered database server is not cluster leader; trying another server
2024-05-24T03:37:38.459Z|283462|reconnect|INFO|tcp:[10.56.64.17]:6642: connection attempt timed out
2024-05-24T03:37:38.459Z|283463|reconnect|INFO|tcp:[10.56.64.17]:6642: waiting 4 seconds before reconnect
2024-05-24T03:37:38.459Z|283464|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2024-05-24T03:37:38.468Z|283465|reconnect|INFO|tcp:[10.56.64.17]:6641: connecting...
2024-05-24T03:37:38.468Z|283466|reconnect|INFO|tcp:[10.56.64.17]:6641: connected
2024-05-24T03:37:38.468Z|283467|ovsdb_cs|INFO|tcp:[10.56.64.17]:6641: clustered database server is not cluster leader; trying another server
2024-05-24T03:37:38.468Z|283468|reconnect|INFO|tcp:[10.56.64.17]:6641: connection attempt timed out
2024-05-24T03:37:38.468Z|283469|reconnect|INFO|tcp:[10.56.64.17]:6641: waiting 4 seconds before reconnect
2024-05-24T03:37:42.459Z|283470|reconnect|INFO|tcp:[10.56.64.18]:6642: connecting...
2024-05-24T03:37:42.460Z|283471|reconnect|INFO|tcp:[10.56.64.18]:6642: connected
2024-05-24T03:37:42.469Z|283472|reconnect|INFO|tcp:[10.56.64.18]:6641: connecting...
2024-05-24T03:37:42.469Z|283473|reconnect|INFO|tcp:[10.56.64.18]:6641: connected
2024-05-24T03:37:46.215Z|283474|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.
ovsdb-server-nb.log
2024-05-24T03:39:19.604Z|132818|timeval|WARN|Unreasonably long 1064ms poll interval (1055ms user, 7ms system)
2024-05-24T03:39:19.604Z|132819|timeval|WARN|faults: 1275 minor, 0 major
2024-05-24T03:39:19.604Z|132820|timeval|WARN|context switches: 0 voluntary, 3 involuntary
2024-05-24T03:39:19.604Z|132821|reconnect|ERR|tcp:10.56.64.17:29954: no response to inactivity probe after 5.18 seconds, disconnecting
2024-05-24T03:39:23.551Z|132822|timeval|WARN|Unreasonably long 1001ms poll interval (995ms user, 5ms system)
2024-05-24T03:39:23.551Z|132823|timeval|WARN|faults: 1278 minor, 0 major
2024-05-24T03:39:23.551Z|132824|timeval|WARN|context switches: 0 voluntary, 2 involuntary
2024-05-24T03:39:25.467Z|132825|reconnect|ERR|tcp:10.56.64.16:19996: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:39:27.612Z|132826|timeval|WARN|Unreasonably long 1055ms poll interval (1047ms user, 6ms system)
2024-05-24T03:39:27.612Z|132827|timeval|WARN|faults: 846 minor, 0 major
2024-05-24T03:39:27.612Z|132828|timeval|WARN|context switches: 0 voluntary, 1 involuntary
2024-05-24T03:39:31.655Z|132829|timeval|WARN|Unreasonably long 1006ms poll interval (996ms user, 7ms system)
2024-05-24T03:39:31.655Z|132830|timeval|WARN|faults: 843 minor, 0 major
2024-05-24T03:39:31.655Z|132831|timeval|WARN|context switches: 0 voluntary, 7 involuntary
2024-05-24T03:39:35.719Z|132832|timeval|WARN|Unreasonably long 1064ms poll interval (1060ms user, 4ms system)
2024-05-24T03:39:35.719Z|132833|timeval|WARN|faults: 846 minor, 0 major
2024-05-24T03:39:35.719Z|132834|timeval|WARN|context switches: 0 voluntary, 8 involuntary
2024-05-24T03:39:43.722Z|132835|timeval|WARN|Unreasonably long 1054ms poll interval (1049ms user, 4ms system)
2024-05-24T03:39:43.722Z|132836|timeval|WARN|faults: 812 minor, 0 major
2024-05-24T03:39:43.722Z|132837|timeval|WARN|context switches: 0 voluntary, 5 involuntary
2024-05-24T03:39:43.722Z|132838|reconnect|ERR|tcp:10.56.64.17:29956: no response to inactivity probe after 5.37 seconds, disconnecting
2024-05-24T03:39:47.684Z|132839|timeval|WARN|Unreasonably long 1010ms poll interval (1004ms user, 6ms system)
2024-05-24T03:39:47.685Z|132840|timeval|WARN|faults: 833 minor, 0 major
2024-05-24T03:39:47.685Z|132841|timeval|WARN|context switches: 0 voluntary, 1 involuntary
2024-05-24T03:39:51.734Z|132842|timeval|WARN|Unreasonably long 1053ms poll interval (1045ms user, 6ms system)
2024-05-24T03:39:51.734Z|132843|timeval|WARN|faults: 833 minor, 0 major
2024-05-24T03:39:51.734Z|132844|timeval|WARN|context switches: 0 voluntary, 1 involuntary
2024-05-24T03:39:53.735Z|132845|reconnect|ERR|tcp:10.56.64.16:20008: no response to inactivity probe after 5.01 seconds, disconnecting
2024-05-24T03:39:55.704Z|132846|timeval|WARN|Unreasonably long 1017ms poll interval (1012ms user, 5ms system)
2024-05-24T03:39:55.704Z|132847|timeval|WARN|faults: 838 minor, 0 major
2024-05-24T03:39:55.704Z|132848|timeval|WARN|context switches: 0 voluntary, 7 involuntary
2024-05-24T03:39:59.788Z|132849|timeval|WARN|Unreasonably long 1096ms poll interval (1091ms user, 3ms system)
2024-05-24T03:39:59.788Z|132850|timeval|WARN|faults: 834 minor, 0 major
2024-05-24T03:39:59.788Z|132851|timeval|WARN|context switches: 0 voluntary, 2 involuntary
2024-05-24T03:40:01.737Z|132852|reconnect|ERR|tcp:10.56.64.18:24442: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:40:03.731Z|132853|timeval|WARN|Unreasonably long 1033ms poll interval (1027ms user, 7ms system)
2024-05-24T03:40:03.731Z|132854|timeval|WARN|faults: 834 minor, 0 major
2024-05-24T03:40:03.731Z|132855|timeval|WARN|context switches: 0 voluntary, 1 involuntary
2024-05-24T03:40:07.776Z|132856|timeval|WARN|Unreasonably long 1071ms poll interval (1061ms user, 10ms system)
2024-05-24T03:40:07.776Z|132857|timeval|WARN|faults: 835 minor, 0 major
2024-05-24T03:40:07.776Z|132858|timeval|WARN|context switches: 0 voluntary, 7 involuntary
2024-05-24T03:40:07.776Z|132859|reconnect|ERR|tcp:10.56.64.17:29964: no response to inactivity probe after 5.62 seconds, disconnecting
2024-05-24T03:40:12.777Z|132860|reconnect|ERR|tcp:10.56.64.16:20022: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:40:15.777Z|132861|timeval|WARN|Unreasonably long 1060ms poll interval (1051ms user, 9ms system)
2024-05-24T03:40:15.777Z|132862|timeval|WARN|faults: 794 minor, 0 major
2024-05-24T03:40:15.777Z|132863|timeval|WARN|context switches: 0 voluntary, 2 involuntary
2024-05-24T03:40:15.778Z|132864|coverage|INFO|Dropped 12 log messages in last 56 seconds (most recently, 8 seconds ago) due to excessive rate
2024-05-24T03:40:15.778Z|132865|coverage|INFO|Skipping details of duplicate event coverage for hash=77dac2d1
2024-05-24T03:40:19.726Z|132866|timeval|WARN|Unreasonably long 1002ms poll interval (996ms user, 7ms system)
2024-05-24T03:40:19.726Z|132867|timeval|WARN|faults: 690 minor, 0 major
2024-05-24T03:40:19.726Z|132868|timeval|WARN|context switches: 0 voluntary, 1 involuntary
2024-05-24T03:40:22.250Z|132869|reconnect|ERR|tcp:10.56.64.18:24458: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:40:23.810Z|132870|timeval|WARN|Unreasonably long 1066ms poll interval (1061ms user, 4ms system)
2024-05-24T03:40:23.810Z|132871|timeval|WARN|faults: 685 minor, 0 major
2024-05-24T03:40:23.810Z|132872|timeval|WARN|context switches: 0 voluntary, 7 involuntary
2024-05-24T03:40:27.770Z|132873|timeval|WARN|Unreasonably long 1020ms poll interval (1015ms user, 5ms system)
2024-05-24T03:40:27.770Z|132874|timeval|WARN|faults: 473 minor, 0 major
2024-05-24T03:40:27.770Z|132875|timeval|WARN|context switches: 0 voluntary, 1 involuntary
2024-05-24T03:40:31.954Z|132876|timeval|WARN|Unreasonably long 1108ms poll interval (1100ms user, 8ms system)
2024-05-24T03:40:31.954Z|132877|timeval|WARN|faults: 801 minor, 0 major
2024-05-24T03:40:31.954Z|132878|timeval|WARN|context switches: 0 voluntary, 1 involuntary
2024-05-24T03:40:31.954Z|132879|reconnect|ERR|tcp:10.56.64.17:29966: no response to inactivity probe after 5.77 seconds, disconnecting
ovsdb-server-sb.log
2024-05-24T03:41:03.258Z|58044|jsonrpc|WARN|tcp:10.16.0.2:35724: receive error: Connection reset by peer
2024-05-24T03:41:03.258Z|58045|reconnect|WARN|tcp:10.16.0.2:35724: connection dropped (Connection reset by peer)
2024-05-24T03:41:05.466Z|58046|reconnect|ERR|tcp:10.56.64.16:24968: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:41:11.644Z|58047|reconnect|ERR|tcp:10.56.64.17:9754: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:41:22.055Z|58048|jsonrpc|WARN|tcp:10.16.0.2:35730: receive error: Connection reset by peer
2024-05-24T03:41:22.055Z|58049|reconnect|WARN|tcp:10.16.0.2:35730: connection dropped (Connection reset by peer)
2024-05-24T03:41:27.614Z|58050|reconnect|ERR|tcp:10.56.64.16:24970: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:41:34.818Z|58051|reconnect|ERR|tcp:10.56.64.17:9756: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:41:40.054Z|58052|jsonrpc|WARN|tcp:10.16.0.2:35734: receive error: Connection reset by peer
2024-05-24T03:41:40.054Z|58053|reconnect|WARN|tcp:10.16.0.2:35734: connection dropped (Connection reset by peer)
2024-05-24T03:41:41.840Z|58054|reconnect|ERR|tcp:10.56.64.18:8978: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:41:51.101Z|58055|reconnect|ERR|tcp:10.56.64.16:24972: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:41:56.430Z|58056|reconnect|ERR|tcp:10.56.64.17:9758: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:41:58.658Z|58057|jsonrpc|WARN|tcp:10.16.0.2:35738: receive error: Connection reset by peer
2024-05-24T03:41:58.658Z|58058|reconnect|WARN|tcp:10.16.0.2:35738: connection dropped (Connection reset by peer)
2024-05-24T03:42:10.905Z|58059|reconnect|ERR|tcp:10.56.64.16:24974: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:42:16.955Z|58060|jsonrpc|WARN|tcp:10.16.0.2:35742: receive error: Connection reset by peer
2024-05-24T03:42:16.955Z|58061|reconnect|WARN|tcp:10.16.0.2:35742: connection dropped (Connection reset by peer)
2024-05-24T03:42:19.908Z|58062|reconnect|ERR|tcp:10.56.64.17:9760: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:42:25.899Z|58063|reconnect|ERR|tcp:10.56.64.18:8998: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:42:34.622Z|58064|reconnect|ERR|tcp:10.56.64.16:24976: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:42:35.160Z|58065|jsonrpc|WARN|tcp:10.16.0.2:35746: receive error: Connection reset by peer
2024-05-24T03:42:35.160Z|58066|reconnect|WARN|tcp:10.16.0.2:35746: connection dropped (Connection reset by peer)
2024-05-24T03:42:40.127Z|58067|reconnect|ERR|tcp:10.56.64.17:9762: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:42:49.098Z|58068|reconnect|ERR|tcp:10.56.64.18:9008: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:42:53.460Z|58069|jsonrpc|WARN|tcp:10.16.0.2:35750: receive error: Connection reset by peer
2024-05-24T03:42:53.460Z|58070|reconnect|WARN|tcp:10.16.0.2:35750: connection dropped (Connection reset by peer)
2024-05-24T03:42:54.784Z|58071|reconnect|ERR|tcp:10.56.64.16:24978: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:43:04.102Z|58072|reconnect|ERR|tcp:10.56.64.17:9764: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:43:12.859Z|58073|jsonrpc|WARN|tcp:10.16.0.2:35754: receive error: Connection reset by peer
2024-05-24T03:43:12.859Z|58074|reconnect|WARN|tcp:10.16.0.2:35754: connection dropped (Connection reset by peer)
2024-05-24T03:43:18.411Z|58075|reconnect|ERR|tcp:10.56.64.16:24980: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:43:23.613Z|58076|reconnect|ERR|tcp:10.56.64.17:9766: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:43:31.133Z|58077|reconnect|ERR|tcp:10.56.64.18:9028: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:43:31.655Z|58078|jsonrpc|WARN|tcp:10.16.0.2:35758: receive error: Connection reset by peer
2024-05-24T03:43:31.655Z|58079|reconnect|WARN|tcp:10.16.0.2:35758: connection dropped (Connection reset by peer)
2024-05-24T03:43:37.376Z|58080|reconnect|ERR|tcp:10.56.64.16:24982: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:43:47.256Z|58081|reconnect|ERR|tcp:10.56.64.17:9768: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:43:50.057Z|58082|jsonrpc|WARN|tcp:10.16.0.2:35762: receive error: Connection reset by peer
2024-05-24T03:43:50.057Z|58083|reconnect|WARN|tcp:10.16.0.2:35762: connection dropped (Connection reset by peer)
2024-05-24T03:44:00.688Z|58084|reconnect|ERR|tcp:10.56.64.16:24984: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:44:06.345Z|58085|reconnect|ERR|tcp:10.56.64.17:9770: no response to inactivity probe after 5 seconds, disconnecting
2024-05-24T03:44:08.262Z|58086|jsonrpc|WARN|tcp:10.16.0.2:35766: receive error: Connection reset by peer
2024-05-24T03:44:08.263Z|58087|reconnect|WARN|tcp:10.16.0.2:35766: connection dropped (Connection reset by peer)
2024-05-24T03:44:15.098Z|58088|reconnect|ERR|tcp:10.56.64.18:9046: no response to inactivity probe after 5 seconds, disconnecting
cmdy changed the title from "[BUG] Large-scale cluster with 3000 nodes: Pods cannot be assigned IPs when scaling from 40k to 60k Pods" to "Large-scale cluster with 3000 nodes: Pods cannot be assigned IPs when scaling from 40k to 60k Pods" on May 24, 2024
@cmdy https://kubeovn.github.io/docs/v1.12.x/reference/tunnel-protocol/#vxlan With VXLAN the limit is even lower: 4096 ports per datapath.
According to the OVN architecture documentation, Geneve supports at most 2**15 ports per datapath.
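These per-datapath limits line up with the "all port tunnel ids exhausted" error in the ovn-northd log above. A minimal sketch of the capacity check, using only the figures quoted in this thread (Geneve: 2**15 port ids per datapath; VXLAN: 4096) and assuming tunnel key 0 is reserved:

```python
# Per-datapath logical port ("tunnel id") capacity, per the figures
# quoted in this thread and the docs linked above.
GENEVE_PORT_BITS = 15  # 2**15 = 32768, per the OVN architecture docs
VXLAN_PORT_BITS = 12   # 4096 ports, per the Kube-OVN tunnel-protocol docs

def fits_in_datapath(port_count: int, port_bits: int) -> bool:
    """Return True if port_count logical ports can all receive distinct
    tunnel ids on one datapath (assumes tunnel key 0 is reserved)."""
    return port_count <= 2 ** port_bits - 1

# 60k logical ports on a single datapath exceed the Geneve limit,
# consistent with the tunnel-id exhaustion errors in ovn-northd.log.
print(fits_in_datapath(60_000, GENEVE_PORT_BITS))  # False
print(fits_in_datapath(30_000, GENEVE_PORT_BITS))  # True
```

This suggests the exhaustion is expected once a single logical switch holds more ports than the encapsulation can address, regardless of controller resources.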
Kube-OVN Version
v1.12.11
Kubernetes Version
Client Version: v1.28.8
Server Version: v1.28.8
Operation-system/Kernel Version
"CentOS Linux 7 (Core)"
5.16.20-3.el7.bzl.x86_64
Description
Simulated 3k nodes with kwok; when scaling pods from 40k to 60k, Pod IPs were not assigned correctly.
kube-ovn-controller logs
Steps To Reproduce
Use kwok to simulate 3k nodes and scale pods up to 60k.
kube-ovn-controller resources: 8 CPU / 8 GiB
ovn-central: 3 replicas
Current Behavior
Pod IPs are not assigned.
Expected Behavior
Pod IPs should be assigned normally.