Automatic offset tracking for stream queues #661
base: main
Conversation
What happens when `consumer_offset_capacity` is reached? `MFile` doesn't auto-expand.

Fixed here
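For context, a minimal sketch of what such a capacity guard could look like, assuming an `MFile`-like API with `size`, `capacity`, `path`, and a capacity-taking constructor. All names here are illustrative, not the PR's actual fix:

```crystal
# Hypothetical sketch: MFile doesn't auto-expand, so replace the mapping
# with a larger one before writing past its capacity.
private def ensure_offset_capacity(bytes_needed : Int32)
  return if @consumer_offsets.size + bytes_needed <= @consumer_offsets.capacity
  larger = MFile.new("#{@consumer_offsets.path}.tmp", @consumer_offsets.capacity * 2)
  larger.write @consumer_offsets.to_slice(0, @consumer_offsets.size, false)
  @consumer_offsets.close
  @consumer_offsets = larger
end
```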
```crystal
return {@last_offset, seg, pos} if @size.zero?
mfile = @segments[seg]
msg = BytesMessage.from_bytes(mfile.to_slice + pos)
offset = offset_from_headers(msg.properties.headers)
{offset, seg, pos}
rescue ex : IndexError # first segment can be empty if message size >= segment size
  return offset_at(seg + 1, pos, true) unless retried
```
What if many segments have been deleted? This only tries one more segment. And should `pos` be reused? Shouldn't it point at the beginning of the next available segment?
Suggested change:

```diff
- return offset_at(seg + 1, pos, true) unless retried
+ return offset_at(@segments.first_key, 4, true) unless retried
```
`offset_at()` is always called with `@segments.first_key` (or `@segments.last_key`), so `seg + 1` will always be the second segment (unless we're looking in the last segment, but since it's the last segment, we will never run into the issue of a message spilling over to the next segment in that case).

And I don't think we will ever have two empty segments in a row? Anytime a message spills over, it will end up in the next segment. The way we expire segments for streams shouldn't allow for any gaps. But maybe getting the next key in the hash is safer?

`pos` should be set to 4 though 👍
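A minimal sketch of the "next key in the hash" idea mentioned above. This is hypothetical: it assumes `@segments` is a hash keyed by segment id in ascending order, and that 4 is the first message position in a segment:

```crystal
# Hypothetical sketch: instead of assuming seg + 1 exists, fall back to the
# next segment id actually present in the hash, starting at position 4.
if !retried && (next_seg = @segments.keys.find { |k| k > seg })
  return offset_at(next_seg, 4, true)
end
```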
Need to think about replication of the consumer offset file.
```crystal
def update_consumer_offset(consumer_tag : String, new_offset : Int64)
  if pos = @consumer_offset_positions[consumer_tag]?
    IO::ByteFormat::SystemEndian.encode(new_offset, @consumer_offsets.to_slice(pos, 8, false))
```
This information is not replicated to followers? I guess `Replicator` doesn't have support for modifying files, so that needs to be built out.

Or rethink: should we do append-only and GC the file instead? Either by assigning each ctag a number and persisting that information too, or by using lz4 compression?
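For illustration, a sketch of what an append-only record could look like, assuming each update is written as a length-prefixed ctag followed by the offset. The format and function are hypothetical, not the PR's code:

```crystal
# Hypothetical append-only record: [ctag length: UInt8][ctag bytes][offset: Int64].
# Appending instead of overwriting in place means the same byte stream could be
# shipped verbatim to followers by an append-only Replicator.
def append_consumer_offset(io : IO, consumer_tag : String, new_offset : Int64)
  io.write_byte consumer_tag.bytesize.to_u8
  io.write consumer_tag.to_slice
  io.write_bytes new_offset, IO::ByteFormat::SystemEndian
end
```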
> This information is not replicated to followers? I guess `Replicator` doesn't have support for modifying files, so that needs to be built out.
Yeah, haven't really thought about replication, but we should obviously do that!
> Or rethink: should we do append-only and GC the file instead? Either by assigning each ctag a number and persisting that information too, or by using lz4 compression?
Hmm, I feel like append-only could cause some issues where the data we need is scattered among potentially large amounts of stale data. But maybe we can handle that by GC'ing (when the file is full? or at regular intervals?) and keeping only the latest offset for each ctag, replacing the file with a new, compacted version (and expanding the file if it's full and there's nothing to GC). We already keep a hash (`@consumer_offset_positions`) of all tracked ctags and their respective positions in the file, so knowing what to keep when compacting should be pretty straightforward.
I don't think I understand what you mean by *either by assigning each ctag a number, and persist that information too, or by using lz4 compression* though. I guess lz4 would be about using less space on disk in trade-off for a little extra CPU usage?

And I'm not sure if that's better than building out `Replicator` to support modifying files? Maybe there are other use cases for supporting file modification in the future (like log compaction in streams).
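A sketch of the compaction idea above: rewrite the file keeping only the latest offset per ctag, then swap it in. Helper names like `last_offset_for` are assumptions for illustration, and `append_consumer_offset` is the hypothetical record writer sketched earlier:

```crystal
# Hypothetical compaction: write only the newest offset per consumer tag to a
# temp file, then atomically replace the old, stale-record-filled file with it.
def compact_consumer_offsets(path : String)
  File.open("#{path}.tmp", "w") do |f|
    @consumer_offset_positions.each_key do |ctag|
      # last_offset_for is an assumed lookup of the most recent record for ctag
      if offset = last_offset_for(ctag)
        append_consumer_offset(f, ctag, offset)
      end
    end
  end
  File.rename("#{path}.tmp", path)
end
```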
WHAT is this pull request doing?
Adds broker tracking of consumer offsets in streams if no `x-stream-offset` is provided by the consumer. Does not track if the consumer tag is generated by the broker.

✅ When to run `cleanup_consumer_offsets`?
✅ `IndexError` when trying to clean up if `msg_size >= segment_size`
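To make the consumer-side contract concrete, a hedged sketch of a client that would get automatic offset tracking under this PR: it supplies its own consumer tag and no `x-stream-offset` argument. This assumes amqp-client.cr's API and is illustrative only:

```crystal
require "amqp-client"

AMQP::Client.start("amqp://guest:guest@localhost") do |c|
  ch = c.channel
  ch.prefetch 10 # stream consumers need a prefetch limit
  q = ch.queue("my-stream", durable: true,
               args: AMQP::Client::Arguments.new({"x-queue-type" => "stream"}))
  # Client-provided tag and no x-stream-offset: the broker tracks the offset
  # for "my-consumer" and resumes from it on reconnect.
  q.subscribe(tag: "my-consumer", no_ack: false, block: true) do |msg|
    msg.ack
  end
end
```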
HOW can this pull request be tested?
Run specs