Running clj-kondo with --parallel option sometimes fails with "Clj-kondo cache is locked by other thread or process" #2218

Open
mrkam2 opened this issue Nov 21, 2023 · 7 comments

Comments

@mrkam2
Contributor

mrkam2 commented Nov 21, 2023

clj-kondo v2023.10.20

I was running the following command:

clj-kondo --lint <563 multiple source paths> --parallel

Platform: Linux

I got 16 errors like the following one:

<path>/<file>.clj:0:0: error: Can't parse <path>/<file>.clj:0:0, Clj-kondo cache is locked by other thread or process.

The <path>/<file> values belong to different source paths and change randomly on each run.

Running without --parallel succeeds.

@borkdude
Member

Do you have a test project where I can reproduce this? Without a repro there's not much I can do.

@mrkam2
Contributor Author

mrkam2 commented Nov 22, 2023 via email

@borkdude
Member

I guess you can run the modified version locally and see if you can find out more.

@mrkam2
Contributor Author

mrkam2 commented Jan 30, 2024

I added some logging to clj-kondo.impl.cache/with-cache and saw that the pool-1-thread-N threads (N is 1..9 on my computer) compete for the cache lock. They make 1323 attempts to lock the file, and 117 of these fail. Of those failures, 102 happen at the 0th retry (i.e. the first try), 12 at the 1st retry, 2 at the 2nd retry, and 1 at the 3rd retry:

retry# - count
0 - 102
1 - 12
2 - 2
3 - 1

I suspect that a larger number of parallel threads (as on a powerful CI host) increases contention, so that some threads reach the maximum number of retry attempts. My logging shows that the max number of retries is set to 6, so on my computer with 9 parallel threads it seems unlikely (although possible) to run into this issue. The number of processors is predefined in https://github.com/clj-kondo/clj-kondo/blob/master/src/clj_kondo/impl/core.clj#L364 so there isn't much I can currently do to change this. The number of retries is predefined in code in https://github.com/clj-kondo/clj-kondo/blob/master/src/clj_kondo/impl/analyzer/namespace.clj#L290 and in https://github.com/clj-kondo/clj-kondo/blob/master/src/clj_kondo/impl/cache.clj#L179.

I tried running locally with an increased number of threads. With 26 threads, it failed to obtain the lock 291 times, and the retry counts were also higher:

retry# - count
0 - 194
1 - 64
2 - 17
3 - 8
4 - 4

It seems to me that to resolve the issue we should either raise the number of retries (and/or make it configurable) or lower the number of parallel threads (and/or make it configurable).
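To make the retry option concrete, here is a minimal sketch of lock acquisition with a configurable retry limit. It is not clj-kondo's implementation: `RetryingLock`, `maxRetries` and `backoffMillis` are hypothetical names, and an in-process `AtomicBoolean` stands in for the cache file lock.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical in-process stand-in for the cache lock; not clj-kondo's code.
class RetryingLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private final int maxRetries;     // retries allowed after the first attempt
    private final long backoffMillis; // base wait between attempts

    RetryingLock(int maxRetries, long backoffMillis) {
        this.maxRetries = maxRetries;
        this.backoffMillis = backoffMillis;
    }

    // Single non-blocking attempt; true if the lock was free.
    boolean tryAcquire() { return locked.compareAndSet(false, true); }

    void release() { locked.set(false); }

    // Runs body while holding the lock, retrying up to maxRetries times.
    <T> T withLock(Supplier<T> body) {
        for (int retry = 0; retry <= maxRetries; retry++) {
            if (tryAcquire()) {
                try { return body.get(); } finally { release(); }
            }
            if (retry < maxRetries) {
                try {
                    // Wait a little longer after each failed attempt.
                    Thread.sleep(backoffMillis * (retry + 1));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for lock", e);
                }
            }
        }
        throw new IllegalStateException("lock still held after " + maxRetries + " retries");
    }
}
```

With the limit as a constructor argument, a host with many threads could raise it without a code change. A higher limit makes the failure less likely but does not rule it out.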

Thoughts?

@borkdude
Member

I don't know the exact answer of what tuning will be best, but maybe we could make threads configurable at least.
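As a rough illustration of what a configurable thread count could look like, the sketch below resolves a pool size from an optional setting and falls back to the number of available processors. `LintPool`, `resolveThreads` and the `requested` parameter are hypothetical; clj-kondo has no such option as far as this thread shows.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper; the names here are illustrative only.
class LintPool {
    // Uses the requested size if given (at least 1), else the CPU count.
    static int resolveThreads(Integer requested) {
        if (requested == null) {
            return Runtime.getRuntime().availableProcessors();
        }
        return Math.max(1, requested);
    }

    // Builds a fixed-size pool from the resolved thread count.
    static ExecutorService create(Integer requested) {
        return Executors.newFixedThreadPool(resolveThreads(requested));
    }
}
```

On a large CI host, passing a smaller value than the processor count would be one way to reduce contention on the cache lock.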

@mrkam2
Contributor Author

mrkam2 commented Jan 31, 2024

> I don't know the exact answer of what tuning will be best, but maybe we could make threads configurable at least.

I'd say we never want this to fail, so we probably should increase the number of retries. We could also potentially create a queue of cache updates so that they're guaranteed to execute in the order they happen and no thread is delayed too much.
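The queue idea could be sketched roughly as follows: worker threads hand their cache updates to a single writer thread, so updates are applied one at a time in submission order and no lock retries are needed between threads of the same process. This is an assumption-laden illustration, not a proposed patch. `CacheUpdateQueue` is a hypothetical name, and an in-memory list stands in for the real cache write.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one writer thread serializes all cache updates.
class CacheUpdateQueue {
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    // Stand-in for the on-disk cache; records updates in the order applied.
    private final List<String> applied = Collections.synchronizedList(new ArrayList<>());

    // Called by any worker thread; returns immediately.
    void submit(String update) {
        writer.execute(() -> applied.add(update));
    }

    // Stops accepting updates, waits for pending ones, returns what was applied.
    List<String> drain() {
        writer.shutdown();
        try {
            if (!writer.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for cache updates");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return applied;
    }
}
```

A queue like this only serializes threads within one process. A separate clj-kondo process touching the same cache would still need the file lock.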

@borkdude
Member

I like the queue idea
