Batch applying events to kqueue #449
base: unstable
Conversation
Codecov Report: all modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@             Coverage Diff              @@
##            unstable     #449      +/-  ##
============================================
- Coverage      70.17%   70.16%    -0.02%
============================================
  Files            109      109
  Lines          59904    59883       -21
============================================
- Hits           42039    42015       -24
- Misses         17865    17868        +3
Since we don't automatically test on FreeBSD: https://github.com/valkey-io/valkey/actions/runs/8994018097
Thanks, I see it succeeded. Just out of curiosity, why didn't we set up a macOS runner that runs by default?
Runner execution time was the historical reason: we didn't want to burn resources on running all the tests, since we have a daily set of tests that is slightly more comprehensive.
kqueue has the capability of batch applying events. This PR implements this functionality for kqueue, which lets us eliminate a large number of `kevent(2)` system calls.

Signed-off-by: Andy Pan <[email protected]>
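For readers unfamiliar with the API: `kevent(2)` accepts a changelist and an eventlist in the same call, so interest changes can be buffered and submitted together with the next wait instead of costing one syscall each. A minimal, portable sketch of that buffering pattern (the names `event_loop`, `queue_change`, and `poll_events` are illustrative, not valkey's actual identifiers):

```c
#define MAX_CHANGES 64

/* One pending interest change (a stand-in for struct kevent). */
typedef struct {
    int fd;
    int filter; /* read/write interest */
    int flags;  /* add/delete */
} event_change;

typedef struct {
    event_change changes[MAX_CHANGES];
    int nchanges;      /* buffered changes not yet submitted */
    int syscall_count; /* simulated kevent(2) invocations */
} event_loop;

/* Buffer a change; only a full buffer forces an early flush.
 * The unbatched approach would bump syscall_count on every call. */
static void queue_change(event_loop *el, int fd, int filter, int flags) {
    if (el->nchanges == MAX_CHANGES) {
        /* kevent(kq, el->changes, el->nchanges, NULL, 0, NULL) */
        el->syscall_count++;
        el->nchanges = 0;
    }
    el->changes[el->nchanges].fd = fd;
    el->changes[el->nchanges].filter = filter;
    el->changes[el->nchanges].flags = flags;
    el->nchanges++;
}

/* Wait for events: kevent(2) takes the changelist and the eventlist
 * in one call, so the buffered changes ride along with the wait. */
static void poll_events(event_loop *el) {
    /* kevent(kq, el->changes, el->nchanges, events, max, timeout) */
    el->syscall_count++;
    el->nchanges = 0;
}
```

With this pattern, three interest changes followed by a wait cost one simulated syscall rather than four.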
Hi @madolson, any new thoughts on this?
@panjf2000 It makes sense to me, I just haven't had a chance to walk through it and convince myself the code is correct. Hopefully tomorrow?
Sure, take your time. I was asking only to make sure you'd like to continue reviewing this.
Do we expect this to make the code more performant?
This PR is expected to save a good number of extra system calls.
Would you mind using
Environment
Benchmark command:

# With RDB and AOF disabled on the server.
valkey-benchmark -r 10000000 -d 100 -n 10000000 -q

Benchmark results

valkey-io:unstable:

PING_INLINE: 141031.80 requests per second, p50=0.175 msec
PING_MBULK: 164306.14 requests per second, p50=0.159 msec
SET: 160627.09 requests per second, p50=0.191 msec
GET: 156184.11 requests per second, p50=0.175 msec
INCR: 161022.81 requests per second, p50=0.167 msec
LPUSH: 164573.83 requests per second, p50=0.167 msec
RPUSH: 164098.53 requests per second, p50=0.167 msec
LPOP: 163655.41 requests per second, p50=0.167 msec
RPOP: 162211.27 requests per second, p50=0.167 msec
SADD: 160130.66 requests per second, p50=0.167 msec
HSET: 151906.42 requests per second, p50=0.191 msec
SPOP: 158737.72 requests per second, p50=0.175 msec
ZADD: 127665.01 requests per second, p50=0.319 msec
ZPOPMIN: 158866.33 requests per second, p50=0.175 msec
LPUSH (needed to benchmark LRANGE): 164546.77 requests per second, p50=0.167 msec
LRANGE_100 (first 100 elements): 86594.33 requests per second, p50=0.311 msec
LRANGE_300 (first 300 elements): 36351.34 requests per second, p50=0.663 msec
LRANGE_500 (first 500 elements): 22212.60 requests per second, p50=0.751 msec
LRANGE_600 (first 600 elements): 19722.66 requests per second, p50=0.839 msec
MSET (10 keys): 54174.70 requests per second, p50=0.839 msec
XADD: 157696.38 requests per second, p50=0.175 msec

panjf2000:kqueue-batch:

PING_INLINE: 159212.86 requests per second, p50=0.159 msec
PING_MBULK: 184396.38 requests per second, p50=0.143 msec
SET: 173722.70 requests per second, p50=0.207 msec
GET: 179742.97 requests per second, p50=0.159 msec
INCR: 182832.06 requests per second, p50=0.159 msec
LPUSH: 174146.25 requests per second, p50=0.151 msec
RPUSH: 174794.62 requests per second, p50=0.151 msec
LPOP: 173250.17 requests per second, p50=0.151 msec
RPOP: 175260.27 requests per second, p50=0.151 msec
SADD: 181957.12 requests per second, p50=0.159 msec
HSET: 174380.08 requests per second, p50=0.175 msec
SPOP: 186985.80 requests per second, p50=0.159 msec
ZADD: 139279.66 requests per second, p50=0.303 msec
ZPOPMIN: 182648.41 requests per second, p50=0.151 msec
LPUSH (needed to benchmark LRANGE): 180554.31 requests per second, p50=0.151 msec
LRANGE_100 (first 100 elements): 93919.64 requests per second, p50=0.287 msec
LRANGE_300 (first 300 elements): 37977.62 requests per second, p50=0.639 msec
LRANGE_500 (first 500 elements): 23096.98 requests per second, p50=0.719 msec
LRANGE_600 (first 600 elements): 20357.48 requests per second, p50=0.815 msec
MSET (10 keys): 54659.74 requests per second, p50=0.823 msec
XADD: 170558.23 requests per second, p50=0.159 msec

Ok, those performance numbers are compelling, although I'm a little surprised at the result. During the typical command flow we shouldn't call any of these functions, so I'm surprised to see so much improvement.
The benchmark for data-retrieval commands with ./valkey-benchmark -r 10000000 -d 1024 -n 10000000 -P 100 -t get,lpop,rpop,spop,zpopmin -q

valkey-io:unstable:

GET: 3898635.50 requests per second, p50=1.151 msec
LPOP: 3837298.50 requests per second, p50=1.183 msec
RPOP: 3752345.25 requests per second, p50=1.215 msec
SPOP: 3979307.50 requests per second, p50=1.135 msec
ZPOPMIN: 3790750.50 requests per second, p50=1.199 msec

panjf2000:kqueue-batch:

GET: 4093327.75 requests per second, p50=1.095 msec
LPOP: 3971406.00 requests per second, p50=1.143 msec
RPOP: 4127115.00 requests per second, p50=1.095 msec
SPOP: 4152824.00 requests per second, p50=1.095 msec
ZPOPMIN: 3977724.75 requests per second, p50=1.143 msec
Ok, so I'm still worried about the panic and how it might impact production systems for edge cases related to kqueue failures. I would like to get the performance improvement, but don't want that to come at the cost of potential failures. Would you be happy with this being some type of build flag so that folks have to opt-in to it at build time?
I'm OK with that. But I'm not sure how you plan on doing that. The first way I can think of is to add a new configuration in
Another approach could be Makefile flags, something like
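As a sketch of the Makefile-flag idea (the flag name `USE_KQUEUE_BATCH` is hypothetical; the thread doesn't settle on one), the batching path could be compiled in only when the build opts in, e.g. `make CFLAGS="-DUSE_KQUEUE_BATCH"`, so default builds keep the per-change behavior:

```c
/* Hypothetical opt-in gate: batching is compiled in only when the
 * build passes -DUSE_KQUEUE_BATCH; default builds keep the current
 * one-kevent-per-change path. */
#ifdef USE_KQUEUE_BATCH
#define KQUEUE_BATCH_ENABLED 1
#else
#define KQUEUE_BATCH_ENABLED 0
#endif

/* Runtime query so the event loop can branch on the compile-time choice. */
static int kqueue_batching_enabled(void) {
    return KQUEUE_BATCH_ENABLED;
}
```

A config-file option would instead make this switchable without rebuilding, at the cost of keeping both code paths live in every binary.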
@madolson Please check out the latest commit.
@panjf2000, do you mind counting the number of
Not only do we register and unregister events for connections and disconnections, we also do that with
As for the statistics of the system calls to
Benchmark command:

./valkey-benchmark -r 10000000 -d 1024 -n 10000000 -P 100 -t get,lpop,rpop,spop,zpopmin -q

Branch: valkey-io:unstable

CALL COUNT
…
accept 64
fcntl 174
setsockopt 263
kevent 1227
read 22255
write 22257

GET: 3496503.50 requests per second, p50=1.223 msec
LPOP: 3617945.00 requests per second, p50=1.191 msec
RPOP: 3617945.00 requests per second, p50=1.191 msec
SPOP: 3553660.50 requests per second, p50=1.207 msec
ZPOPMIN: 3673769.50 requests per second, p50=1.175 msec

Branch: panjf2000:kqueue-batch

CALL COUNT
…
accept 67
fcntl 174
setsockopt 263
kevent 903
read 21085
write 21114

GET: 3834355.75 requests per second, p50=1.135 msec
LPOP: 3815337.50 requests per second, p50=1.151 msec
RPOP: 3675119.50 requests per second, p50=1.207 msec
SPOP: 3707823.50 requests per second, p50=1.191 msec
ZPOPMIN: 3732736.25 requests per second, p50=1.159 msec
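For reference, the kevent counts above (1227 on unstable versus 903 on the batched branch) amount to roughly a quarter fewer `kevent(2)` calls; a quick check of that arithmetic:

```c
/* Percentage reduction between two syscall tallies, e.g. the
 * kevent counts reported above (1227 before, 903 after). */
static double percent_reduction(double before, double after) {
    return (before - after) / before * 100.0;
}
```

With the counts above, percent_reduction(1227, 903) is about 26.4.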
To make it clearer, by "update the kqueue batch handler", you meant removing the
Or, did you mean that we need to fix #528 by changing the return type in the function signatures of
Yeah, this one. I think we can start with the
Great! Then I guess we can continue the code review here? @madolson
Ping @madolson
@panjf2000 Thanks for your patience, I've just been chasing other stuff. Will finish the review now!
@panjf2000 Ok. I just can't convince myself this is a good change to take right now. We know we don't handle edge cases well, so I'm not able to really convince myself that this configuration will be stable enough for someone to enable in production. I think we should at least address #528, and then maybe think through more about the API contract for the event loop. We can leave this open and merge at a later date. I just don't really feel comfortable merging it now.
Sometimes I've wondered why Salvatore wrote his own event lib instead of using an existing well-tested one (like libev) and the answer is in this ancient thread: https://groups.google.com/g/redis-db/c/tSgU6e8VuNA Perhaps it's time to re-evaluate this? The situation may have changed after 15 years.
I thought the main reason why we introduced the
You are right. I think the difference is whether we merge this now, or do some work to improve the status quo and then enable it once we have higher confidence. Do you plan on running this patch in production? Maybe that would build my confidence, if you go and test it and validate it works as expected.
I think the event loop is one of the pieces of code we should bias towards owning, since it's so deeply connected with the core of the engine and performance. But if someone else has solved these problems, it might be worth it.
Well, I've been running it with this optimization enabled on a LAN, serving a few local services; so far so good, working as usual. I just reckon that for an experimental optimization/feature, a small early trial could help us collect data and gather some feedback, so we can improve it afterward.