
feat: have flightGroups take some time to allow reuse #286

Open · wants to merge 3 commits into v8 from moshe/lower.memory.usage
Conversation

@moshegood (Contributor) commented Feb 5, 2024:

Max memory usage on two clusters, with and without this change; the lower line is the cluster with the change. Note also that the high-memory servers were actively shedding load and rejecting connections to stay alive.

[Screenshot: max memory usage comparison across the two clusters, 2024-02-05]

Other Approaches

The main alternative would be to cache the data from the MakeServerSidePutEvent(...) call in getReplayEvent(). That would require cache invalidation, which in turn would require the data stores to expose some version id so that ld-relay can tell when the underlying data has changed.
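For illustration only, here is a minimal sketch of what that alternative could look like, assuming the stores exposed such a version id. versionedCache, dataVersion, and buildPutEvent are hypothetical names, not existing ld-relay identifiers:

```go
package replaysketch

import "sync"

// versionedCache holds the last built put event together with the store
// version it was built from.
type versionedCache struct {
	mu      sync.Mutex
	version string
	event   interface{}
}

// get returns the cached event while the store's version id is unchanged;
// any write to the store would bump the version and invalidate the cache.
func (c *versionedCache) get(dataVersion string, buildPutEvent func() (interface{}, error)) (interface{}, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.event != nil && c.version == dataVersion {
		return c.event, nil
	}
	ev, err := buildPutEvent()
	if err != nil {
		return nil, err
	}
	c.version, c.event = dataVersion, ev
	return ev, nil
}
```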

Requirements

  • I have added test coverage for new or changed functionality
  • I have followed the repository's pull request submission guidelines
  • I have validated my changes against all supported platform versions


@moshegood requested a review from a team as a code owner on February 5, 2024 at 17:11
@@ -109,6 +110,7 @@ func (r *serverSideEnvStreamRepository) Replay(channel, id string) chan eventsou
// getReplayEvent will return a ServerSidePutEvent with all the data needed for a Replay.
@moshegood (Contributor, Author) commented on the diff:

This function is only called when we have a new or reconnecting client.
So some small delay here should be acceptable.
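To make the idea concrete, here is a minimal sketch of holding a flight group open briefly so that concurrent callers share one result. It assumes golang.org/x/sync/singleflight; replayGroup, delay, and buildPutEvent are illustrative names, not the PR's actual identifiers:

```go
package replaysketch

import (
	"time"

	"golang.org/x/sync/singleflight"
)

var replayGroup singleflight.Group

// getReplayEvent is only called for new or reconnecting clients, so a small
// delay on connection setup is acceptable.
func getReplayEvent(envKey string, delay time.Duration, buildPutEvent func() (interface{}, error)) (interface{}, error) {
	v, err, _ := replayGroup.Do(envKey, func() (interface{}, error) {
		// Sleeping before building gives clients that connect during this
		// window a chance to join the flight group and share one result,
		// instead of each triggering its own full data fetch.
		time.Sleep(delay)
		return buildPutEvent()
	})
	return v, err
}
```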

@cwaldren-ld (Contributor) commented:
Hi @moshegood, I like the idea here - trading increased latency for decreased memory usage.

In your use case, what would you actually use as the configured LD_STREAMING_DELAY_SECONDS?

One downside I'm seeing is that the latency would be introduced even if no one else is being served. A dynamic approach might calculate the incoming RPS and use that to increase the latency, but that might be too complicated.

@moshegood (Contributor, Author) replied:
> In your use case, what would you actually use as the configured LD_STREAMING_DELAY_SECONDS?

I've been testing with 2 seconds.
Our big spikes in connectivity come during deployments and after network reconnects between the relays and LaunchDarkly.

@cwaldren-ld changed the title from "have flightGroups take some time to allow reuse" to "feat: have flightGroups take some time to allow reuse" on Feb 8, 2024
@moshegood force-pushed the moshe/lower.memory.usage branch 3 times, most recently from 2976a5e to 52ff6a5 on February 11, 2024 at 16:20
@moshegood (Contributor, Author) commented:

Would it be better if we only delayed when another request completed very recently? For example, pause the flight group only if another request finished less than a second ago?

@cwaldren-ld (Contributor) replied:

I think so, and I'm wondering if we can use something like a token bucket to solve this (a rough sketch follows the list below).

For example:

  1. Set up a token bucket with rate 1 req/sec (can be configurable).
  2. Request comes in, check the token bucket.
  3. If token available, then serve the request immediately.
  4. Otherwise, block and initiate flightgroup.
  5. [meanwhile, other requests may come in and join the flightgroup.]
  6. Serve the requests all at once. This fills the token bucket back up, go to (1).
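A rough sketch of those steps, assuming golang.org/x/time/rate for the token bucket and golang.org/x/sync/singleflight for coalescing; fetchAllData is a hypothetical stand-in for the bulk data fetch:

```go
package replaysketch

import (
	"context"

	"golang.org/x/sync/singleflight"
	"golang.org/x/time/rate"
)

var (
	// Step 1: 1 token/sec with burst 1 (both could be made configurable).
	limiter  = rate.NewLimiter(rate.Limit(1), 1)
	coalesce singleflight.Group
)

func getData(ctx context.Context, key string, fetchAllData func() (interface{}, error)) (interface{}, error) {
	// Steps 2-3: if a token is available, serve the request immediately.
	if limiter.Allow() {
		return fetchAllData()
	}
	// Steps 4-5: otherwise block on the flight group; requests arriving in
	// the meantime join it and share the single fetch.
	v, err, _ := coalesce.Do(key, func() (interface{}, error) {
		// Step 6: wait for the next token, then serve everyone at once.
		if err := limiter.Wait(ctx); err != nil {
			return nil, err
		}
		return fetchAllData()
	})
	return v, err
}
```

With burst 1 and rate 1/sec, a lone reconnect sees no added latency, while a reconnect storm collapses into roughly one fetch per second.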

@moshegood (Contributor, Author) commented:

Changed things around so that a single new client connection is responded to immediately. We only delay the bulk fetch of all data between batches of new clients if we fetched that data very recently, and only if the option to do so is set.
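A minimal sketch of that behavior, under the same singleflight assumption; lastFetch, reuseWindow, delayEnabled, and fetchAllData are illustrative names, not the PR's actual identifiers:

```go
package replaysketch

import (
	"sync/atomic"
	"time"

	"golang.org/x/sync/singleflight"
)

var (
	fetchGroup singleflight.Group
	lastFetch  atomic.Int64 // unix nanos of the last completed fetch; zero means never
)

func getDataWithReuseDelay(key string, delayEnabled bool, reuseWindow time.Duration, fetchAllData func() (interface{}, error)) (interface{}, error) {
	v, err, _ := fetchGroup.Do(key, func() (interface{}, error) {
		since := time.Duration(time.Now().UnixNano() - lastFetch.Load())
		if delayEnabled && since < reuseWindow {
			// We fetched very recently, so more clients are likely arriving:
			// pause so they can join this flight group and share the result.
			time.Sleep(reuseWindow - since)
		}
		v, err := fetchAllData()
		lastFetch.Store(time.Now().UnixNano())
		return v, err
	})
	return v, err
}
```

This keeps the common case, a single client connecting in a quiet period, at zero added latency, and only pays the delay during connection storms.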

@moshegood (Contributor, Author) commented:

Any updates required on this PR?
Do you need to run additional tests?

@cwaldren-ld (Contributor) commented:

Hi @moshegood, pardon the delay on my end.

We are still discussing internally how to approach this. Mainly, it seems we may have outgrown the flightgroup approach and need a more sophisticated system.
