vmagent k8s target discovery is too slow #6270

Open
aluode99 opened this issue May 14, 2024 · 4 comments
Labels
enhancement New feature or request k8s Kubernetes related issue

Comments

@aluode99
Contributor

aluode99 commented May 14, 2024

Is your feature request related to a problem? Please describe

When many jobs are configured (about 100), vmagent discovers targets serially and very slowly, so it collects no data for more than half an hour.
image
Since each instance in the vmagent cluster needs to discover all the collection targets before sharding, horizontal scaling cannot solve the problem of slow service discovery.
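To illustrate why adding instances does not help here, below is a minimal sketch of sharding applied after discovery. It is not vmagent's actual code: the hash scheme and the names `shard_of` and `targets_for_member` are assumptions made for illustration. The point is that `all_targets` must already be complete before any member can select its share.

```python
import hashlib


def shard_of(target: str, members_count: int) -> int:
    """Map a target to a shard index with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(target.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % members_count


def targets_for_member(all_targets, member_num, members_count):
    """Keep only the targets owned by this member.

    all_targets must already be fully discovered, which is why adding
    members does not shorten discovery in this model.
    """
    return [t for t in all_targets if shard_of(t, members_count) == member_num]


if __name__ == "__main__":
    # Made-up addresses, for demonstration only.
    discovered = [f"10.0.0.{i}:9100" for i in range(1, 11)]
    for member in range(3):
        print(member, targets_for_member(discovered, member, 3))
```

Every member pays the full discovery cost and then drops most of the result, so the slow step is repeated on each instance rather than divided among them.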

Describe the solution you'd like

  • Can service discovery sharding be added to resolve the performance bottleneck in service discovery?
  • Can concurrent service discovery be added?
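As a rough sketch of what the second request could look like, the snippet below runs per-job discovery with bounded concurrency. It is only an illustration of the idea, not a proposal for vmagent's implementation: `discover_all`, `discover_one`, and `max_workers` are hypothetical names, and the discovery function is a stand-in for whatever talks to the Kubernetes API.

```python
from concurrent.futures import ThreadPoolExecutor


def discover_all(jobs, discover_one, max_workers=8):
    """Run discover_one(job) for every job with bounded concurrency.

    Returns a dict mapping each job to its discovered targets.
    """
    jobs = list(jobs)  # allow any iterable; it is traversed twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each job with its result.
        return dict(zip(jobs, pool.map(discover_one, jobs)))


if __name__ == "__main__":
    import time

    def fake_discover(job):
        time.sleep(0.1)  # stand-in for a slow Kubernetes API call
        return [f"{job}-target-{i}" for i in range(2)]

    jobs = [f"job-{i}" for i in range(100)]
    start = time.monotonic()
    result = discover_all(jobs, fake_discover, max_workers=20)
    print(len(result), "jobs discovered in", round(time.monotonic() - start, 2), "s")
```

The worker limit matters in practice, since unbounded concurrency would simply move the pressure to the Kubernetes API server.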

Describe alternatives you've considered

No response

Additional information

No response

@aluode99 aluode99 added the enhancement New feature or request label May 14, 2024
@AndrewChubatiuk
Contributor

Hey @aluode99,
this log record appears while scrape configs are being reloaded.
Could you please share information about CPU and memory usage?
Which vmagent version are you using?
Could you please describe the setup you're running it in?

@aluode99
Contributor Author

Hi @AndrewChubatiuk, thank you for your reply. The detailed configuration for vmagent is as follows:
version: v1.96.0
cpu: 18c
memory: 16G
cluster membersCount: 19
cluster replicationFactor: 1
CPU utilization rate:
image

In this setup, a sidecar loads kubernetes_sd_configs and then triggers a vmagent reload to apply the configuration. When the pod starts, kubernetes_sd_configs is empty, so service discovery takes 0 seconds. After the sidecar loads the configuration, vmagent reloads and service discovery takes 2061 seconds.
image

Because kubernetes_sd_configs is empty at startup, the startup process stays blocked at code checkpoint 1 and never reaches the reload process at checkpoint 2. As a result, vmagent does not load the configuration incrementally and activate collection tasks gradually. Instead, it spends 2061 seconds discovering all targets before it begins collecting, which leaves a 2061-second gap with no data collection.

9b10cb899a474bc56762d2ef430e0d03
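The incremental behavior described above could look roughly like the sketch below, where each job starts being scraped as soon as its own discovery finishes instead of waiting for all jobs. This is an assumption about the desired behavior, not vmagent's code: `activate_incrementally`, `discover_one`, and `start_scraping` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def activate_incrementally(jobs, discover_one, start_scraping, max_workers=8):
    """Start scraping each job as soon as its own discovery finishes.

    Returns the jobs in the order they were activated.
    """
    activated = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each pending future back to the job it belongs to.
        futures = {pool.submit(discover_one, job): job for job in jobs}
        for future in as_completed(futures):
            job = futures[future]
            # Runs in the calling thread, one job at a time.
            start_scraping(job, future.result())
            activated.append(job)
    return activated
```

With this shape, the gap without data is bounded by the slowest single job rather than by the sum over all jobs.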

@AndrewChubatiuk
Contributor

How long does the next configuration update take after the initial one?
Could you please share information about etcd and kube API request durations?

@aluode99
Contributor Author

aluode99 commented May 15, 2024

image

I have compiled the durations of some reloads; their total time is about 7 minutes. The shortest reload took 0.002 seconds and the longest took 1.139 seconds. The detailed durations are as follows:

| count | time (s) |
|-------|----------|
| 30 | 0.002 |
| 32 | 0.003 |
|104 | 0.004 |
|206 | 0.005 |
|149 | 0.006 |
|131 | 0.007 |
|177 | 0.008 |
| 83 | 0.009 |
|152 | 0.010 |
| 83 | 0.011 |
| 60 | 0.012 |
| 75 | 0.013 |
|114 | 0.014 |
|124 | 0.015 |
| 88 | 0.016 |
| 72 | 0.017 |
| 96 | 0.018 |
|110 | 0.019 |
| 55 | 0.020 |
| 48 | 0.021 |
| 80 | 0.022 |
| 68 | 0.023 |
|106 | 0.024 |
|115 | 0.025 |
| 86 | 0.026 |
|113 | 0.027 |
| 64 | 0.028 |
| 88 | 0.029 |
|102 | 0.030 |
|100 | 0.031 |
|105 | 0.032 |
| 87 | 0.033 |
| 99 | 0.034 |
|116 | 0.035 |
| 81 | 0.036 |
| 61 | 0.037 |
| 74 | 0.038 |
| 59 | 0.039 |
| 58 | 0.040 |
| 67 | 0.041 |
| 69 | 0.042 |
| 66 | 0.043 |
| 74 | 0.044 |
| 72 | 0.045 |
| 62 | 0.046 |
| 66 | 0.047 |
| 74 | 0.048 |
| 35 | 0.049 |
| 44 | 0.050 |
| 36 | 0.051 |
| 44 | 0.052 |
| 44 | 0.053 |
| 33 | 0.054 |
| 44 | 0.055 |
| 39 | 0.056 |
| 44 | 0.057 |
| 41 | 0.058 |
| 48 | 0.059 |
| 40 | 0.060 |
| 36 | 0.061 |
| 29 | 0.062 |
| 32 | 0.063 |
| 28 | 0.064 |
| 23 | 0.065 |
| 29 | 0.066 |
| 41 | 0.067 |
| 31 | 0.068 |
| 22 | 0.069 |
| 40 | 0.070 |
| 25 | 0.071 |
| 30 | 0.072 |
| 33 | 0.073 |
| 27 | 0.074 |
| 41 | 0.075 |
| 33 | 0.076 |
| 30 | 0.077 |
| 15 | 0.078 |
| 35 | 0.079 |
| 22 | 0.080 |
| 23 | 0.081 |
| 16 | 0.082 |
| 16 | 0.083 |
| 15 | 0.084 |
| 24 | 0.085 |
| 24 | 0.086 |
| 22 | 0.087 |
| 20 | 0.088 |
| 27 | 0.089 |
| 28 | 0.090 |
| 23 | 0.091 |
| 22 | 0.092 |
| 20 | 0.093 |
| 12 | 0.094 |
| 12 | 0.095 |
| 11 | 0.096 |
| 11 | 0.097 |
|  8 | 0.098 |
| 11 | 0.099 |
|  6 | 0.100 |
|  6 | 0.101 |
| 10 | 0.102 |
| 17 | 0.103 |
| 15 | 0.104 |
| 14 | 0.105 |
| 12 | 0.106 |
|  7 | 0.107 |
| 14 | 0.108 |
| 11 | 0.109 |
|  8 | 0.110 |
|  7 | 0.111 |
|  2 | 0.112 |
|  5 | 0.113 |
|  3 | 0.114 |
|  5 | 0.115 |
|  5 | 0.116 |
|  3 | 0.117 |
|  7 | 0.118 |
|  4 | 0.119 |
|  6 | 0.120 |
|  3 | 0.121 |
|  6 | 0.122 |
|  5 | 0.123 |
|  8 | 0.124 |
|  7 | 0.125 |
|  4 | 0.126 |
|  9 | 0.127 |
|  4 | 0.128 |
|  9 | 0.129 |
|  6 | 0.130 |
|  8 | 0.131 |
|  5 | 0.132 |
| 14 | 0.133 |
|  9 | 0.134 |
|  7 | 0.135 |
|  6 | 0.136 |
|  8 | 0.137 |
|  5 | 0.138 |
|  8 | 0.139 |
|  6 | 0.140 |
|  5 | 0.141 |
|  8 | 0.142 |
|  7 | 0.143 |
|  6 | 0.144 |
|  5 | 0.145 |
|  4 | 0.146 |
|  5 | 0.147 |
|  5 | 0.148 |
|  4 | 0.149 |
|  7 | 0.150 |
|  8 | 0.151 |
|  3 | 0.152 |
|  2 | 0.153 |
|  1 | 0.154 |
|  2 | 0.155 |
|  3 | 0.156 |
|  6 | 0.157 |
|  3 | 0.158 |
|  6 | 0.159 |
|  4 | 0.160 |
|  3 | 0.161 |
|  5 | 0.162 |
|  4 | 0.164 |
|  3 | 0.165 |
|  4 | 0.166 |
|  3 | 0.168 |
|  2 | 0.169 |
|  1 | 0.170 |
|  2 | 0.171 |
|  1 | 0.174 |
|  1 | 0.175 |
|  1 | 0.176 |
|  1 | 0.178 |
|  1 | 0.181 |
|  1 | 0.183 |
|  2 | 0.184 |
|  1 | 0.185 |
|  1 | 0.193 |
|  2 | 0.199 |
|  1 | 0.206 |
|  1 | 0.207 |
|  2 | 0.211 |
|  1 | 0.213 |
|  1 | 0.214 |
|  1 | 0.227 |
|  1 | 0.256 |
|  1 | 0.257 |
|  1 | 0.269 |
|  1 | 0.586 |
|  1 | 0.605 |
|  1 | 0.617 |
|  1 | 0.785 |
|  1 | 0.803 |
|  1 | 1.085 |
|  1 | 1.108 |
|  1 | 1.139 |

@denisgolius denisgolius added the k8s Kubernetes related issue label May 15, 2024
3 participants