Add an optional timeout argument to flush fixed-time-window-fn buffer #812

Open
psivesely opened this issue May 9, 2017 · 2 comments

There have been repeated requests that fixed-time-window-fn both (a) automatically flush its buffer at the end of each time window (#563 #259 #728) and (b) stay the way it is (#728 (comment) #728 (comment)). It has also been noted that the code as-is drops events from an older time window when they arrive after events from a newer one (#728 (comment)).

An idea I had is to add a (maybe optional) argument to fixed-time-window-fn (let's call it pippo for now because I need an excuse to use this var name) which specifies how long to wait after a time window ends before flushing that window's buffer. So when pippo > 0, a given fixed-time-window-fn-based stream will maintain multiple buffers, one per time window. Each buffer will only be flushed pippo seconds after its time window has ended, even if events from later time windows have already been received.

This is advantageous in two ways:

  1. In configurations where upstream streams may delay events fed to fixed-time-window-fn, pippo can be set to slightly more than the maximum delay time for an event to make it to the fixed-time-window-fn stream. This allows users like @Anvil to take advantage of automatic buffer flushing without worrying about missing events.
  2. Even if a fixed-time-window-fn-based stream is not being fed by parent streams that induce delays, it fixes the problem of dropping events that arrive slightly out of order. I'd imagine that setting pippo to a small value like 10 seconds should be enough to make sure that most events that arrive slightly out of order are still included in the vector of events for the appropriate time window, instead of being dropped.

The only concern I have with such an implementation is memory usage when pippo is set very high. It seems like one shouldn't need to set it to more than a few times the length of the time window, so it shouldn't use more than a few times the memory that it does at present. I think if the functionality is documented well this shouldn't be a problem.
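
To make the proposal concrete, here is a rough sketch of the buffering logic in Python rather than Clojure. It is not Riemann code. The class name `DelayedFixedWindow`, the `tick` method, and the `emit` callback are all hypothetical names made up for illustration, and it assumes events are maps with a numeric `time` in seconds and that something like a scheduler calls `tick` with the current time. It only emits non-empty windows.

```python
from collections import defaultdict


class DelayedFixedWindow:
    """Hypothetical sketch of the proposed behavior; not Riemann's API."""

    def __init__(self, window, pippo, emit):
        self.window = window                # window length, in seconds
        self.pippo = pippo                  # extra wait after a window ends, in seconds
        self.emit = emit                    # called as emit(window_start, events)
        self.buffers = defaultdict(list)    # window index -> buffered events
        self.now = float("-inf")            # latest time seen by tick()

    def _deadline(self, index):
        # A window is flushed pippo seconds after it ends.
        return (index + 1) * self.window + self.pippo

    def add(self, event):
        """Buffer an event; return False if its window was already flushed."""
        index = int(event["time"] // self.window)
        if self._deadline(index) <= self.now:
            return False                    # too late, the window is closed
        self.buffers[index].append(event)
        return True

    def tick(self, now):
        """Advance the clock and flush every window whose deadline has passed."""
        self.now = max(self.now, now)
        due = sorted(i for i in self.buffers if self._deadline(i) <= self.now)
        for index in due:
            self.emit(index * self.window, self.buffers.pop(index))
```

With a 10 second window and pippo set to 5, an event with time 8 that arrives after an event with time 12 still lands in the first window, because that window is not flushed until time 15. Under these assumptions the number of open buffers at any moment is roughly pippo divided by the window length, plus one, which is where the memory concern above comes from.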

@mcorbin
Contributor

mcorbin commented May 9, 2017

We should probably create a new stream with this behavior and not touch the fixed-time-window stream.

@jamtur01
Member

jamtur01 commented Jun 5, 2017

What @mcorbin said.
