Possible improvements to SegQueue #794

Open
SUPERCILEX opened this issue Mar 3, 2022 · 1 comment

@SUPERCILEX

I've been trying to understand the SegQueue implementation to reply to #675. These are some notes I took along the way:

  • unsafe { MaybeUninit::zeroed().assume_init() }
    Might be UB? We need to know whether leaving the padding bytes within these structs uninitialized (even if zeroed) counts as UB. It also seems like you could achieve better performance by using actually uninitialized memory for the slot value instead of zeroing it out.
  • SegQueue::new should pre-allocate a block to match the Injector implementation and remove some initialization code from push and pop.
  • Internally document the memory layout and the magic of the head and tail indices (see the sketch after this list). Basically, each index skips over one slot per block, which is used as a fence to perform block allocation/deallocation. So if LAP=2, BLOCK_CAP=1, and tail=0, two competing threads will result in one of them writing its value into slot 0 and bumping tail to 1, then allocating a new block and bumping tail to 2. The other thread will see that its CAS failed, read a tail of 1, and spin/yield until tail reaches 2. The same happens, in the opposite direction, for head.
  • Extract out offset + 1 == BLOCK_CAP into a variable with the explanation from the previous bullet point.
  • next_block = Some(Box::new(Block::<T>::new()));
    Under contention this can allocate a second block that gets immediately discarded. Add block cache to SegQueue-alikes #746 offers a very nice opportunity to throw the extra block back into the pool.
  • let new_tail = tail + (1 << SHIFT);
    let mut new_head = head + (1 << SHIFT);
    Pretty sure these should use wrapping_add.
  • atomic::fence(Ordering::SeqCst);
    I don't understand why this is necessary; I'll have to think on it more.
  • There are a bunch of LAP - 1s that would read more clearly as BLOCK_CAP.
  • All shifting of tail should be removable, as no metadata is stored in the tail index. The one issue would be making sure head and tail wrap around at the same rate, so maybe tail could be shifted before addition and then unshifted again.
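
To make the index discussion above concrete, here is a rough, standalone sketch of how I understand the index-to-slot mapping and the reserved fence slot. The constant names and values are assumptions based on my reading of seg_queue.rs, not a quote of the real code:

```rust
// Illustrative constants; the real ones live in seg_queue.rs and may differ.
const SHIFT: usize = 1;           // low bit(s) of the index reserved for metadata
const LAP: usize = 8;             // slots per lap, incl. the fence slot (kept small here)
const BLOCK_CAP: usize = LAP - 1; // usable slots per block

/// Position within the current block for a given head/tail index.
fn offset(index: usize) -> usize {
    (index >> SHIFT) % LAP
}

fn main() {
    // Walk a couple of laps' worth of indices and show which slot each maps to.
    let mut index: usize = 0;
    for _ in 0..(2 * LAP) {
        let off = offset(index);
        if off == BLOCK_CAP {
            // Reserved fence slot: a pusher that observes this offset spins,
            // because the thread that claimed the previous slot is busy
            // installing the next block and bumping the index past the fence.
            println!("index {index}: fence, wait for the next block");
        } else {
            println!("index {index}: slot {off}");
        }
        // Advance by one slot. Using wrapping_add avoids the debug-mode
        // overflow panic on a very long-lived queue, which is the point of
        // the wrapping_add bullet above.
        index = index.wrapping_add(1 << SHIFT);
    }
}
```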

I'm planning on looking into this stuff so that Tokio can use SegQueue: tokio-rs/tokio#2528. Given #746's lack of progress, my main concern is that any opened PR would just get ignored, so Tokio might end up copying SegQueue with its own fixes (which would be a bummer).

@taiki-e
Member

taiki-e commented Jul 22, 2022

Thanks for the suggestions and sorry for the late reply.

Might be UB? Need to know whether or not padding bytes within the structs being uninitialized (even if zeroed) counts as UB.

IIUC, there is no problem with zero-initializing a type whose fields are all zeroable but which contains padding. However, since the padding is undef, it is UB to inspect it even after zero-initialization.

Also seems like you could achieve better performance by using actually uninit memory in the slot value instead of zeroing it out.

AFAIK, in Block::new both cases are lowered to a memset on many platforms, so performance will not change. (See the codegen comparison with concurrent-queue, which has used that approach for a long time.)
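
For reference, here is a minimal, self-contained sketch of the two initialization strategies being compared. The Slot/Block shapes are simplified stand-ins, not the real definitions:

```rust
use std::mem::MaybeUninit;
use std::sync::atomic::AtomicUsize;

const BLOCK_CAP: usize = 31; // illustrative value

// Simplified stand-ins for the real Slot/Block types.
#[allow(dead_code)]
struct Slot<T> {
    value: MaybeUninit<T>, // written by push before pop ever reads it
    state: AtomicUsize,    // must start out as 0
}

#[allow(dead_code)]
struct Block<T> {
    slots: [Slot<T>; BLOCK_CAP],
}

impl<T> Block<T> {
    /// Roughly what the current code does: zero the whole block at once.
    /// Zero bytes are valid for both `MaybeUninit<T>` and `AtomicUsize`,
    /// but any padding stays undef, which is the soundness question above.
    fn new_zeroed() -> Box<Self> {
        unsafe { Box::new(MaybeUninit::<Self>::zeroed().assume_init()) }
    }

    /// The alternative: initialize each field explicitly and leave the
    /// payload genuinely uninitialized. As noted, both versions tend to be
    /// lowered to a memset, so the difference is mainly about sidestepping
    /// the assume_init question rather than performance.
    fn new_explicit() -> Box<Self> {
        Box::new(Block {
            slots: std::array::from_fn(|_| Slot {
                value: MaybeUninit::uninit(),
                state: AtomicUsize::new(0),
            }),
        })
    }
}

fn main() {
    let _zeroed: Box<Block<String>> = Block::new_zeroed();
    let _explicit: Box<Block<String>> = Block::new_explicit();
}
```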

  • SegQueue::new should pre-allocate a block to match the Injector implementation and remove some initialization code from push and pop.

I'm torn on this because in some cases users expect no allocation at initialization (like Vec::new).

  • Internally document the memory layout and the magic of the head and tail indexes.
  • Extract out offset + 1 == BLOCK_CAP into a variable with the explanation from the previous bullet point.

Pretty sure this should use wrapping_add.

There's a bunch of LAP - 1 that would be clearer as BLOCK_CAP

+1

Given #746's lack of progress

Help with reviews is welcome!
(The maintainers currently lack the bandwidth to review PRs with large diffs. However, we can skip some of our own review if there is a proper review by the community.)

Tokio might end up copying SegQueue with its own fixes

It is hard to add new dependencies to tokio (I remember it took a long time for socket2 to get approved), so I would not expect them to add crossbeam-queue as a dependency even if SegQueue is improved.

Also, loom support is required for use with tokio.
Also, MPSC seems to be a better fit for tokio's use case. (When properly implemented, it theoretically requires fewer atomic operations than MPMC.)
