Configuration option to send Buffer in newest-to-oldest order #15208

Open

kj4tmp opened this issue Apr 22, 2024 · 4 comments

Labels
feature request (Requests for new plugin and for new features to existing plugins), help wanted (Request for community participation, code, contribution), size/m (2-4 day effort)

Comments

kj4tmp commented Apr 22, 2024

Use Case

The Telegraf buffer is sometimes used to temporarily accommodate high write rates to an InfluxDB instance.

Sometimes it is preferable to see the newest data in InfluxDB first and backfill the older data once the production rate decreases.

Expected behavior

I expected to see a configuration option for the buffer to allow it to flush the newest data first instead of the oldest data.

Actual behavior

The oldest data is always flushed first.

Additional info

No response

kj4tmp added the feature request label Apr 22, 2024
kj4tmp changed the title from "Configuration to send Buffer in newests to oldest order" to "Configuration option to send Buffer in newest-to-oldest order" Apr 22, 2024
powersj (Contributor) commented Apr 22, 2024

Hi,

The telegraf buffer is sometimes used to temporarily accommodate high write rates to an influxdb instance.

How long is your output down such that you actually see the impact of FIFO occur? Can you provide some logs giving an example? I am initially hesitant to treat the buffer like a stack versus a queue without understanding if your stack would ever flush in your situation.

FWIW, the running outputs request a batch from the buffer, and the buffer returns a slice of at most batch-size metrics to write.
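
For reference, the knobs involved here are the standard agent settings; the values below are just the documented defaults, not a recommendation:

[agent]
  ## Metrics are sent to outputs in batches of at most this many metrics
  ## (this is the "batch" the running output requests from the buffer).
  metric_batch_size = 1000

  ## Maximum number of unwritten metrics cached per output; when the buffer
  ## fills, the oldest metrics are dropped first. This is the FIFO buffer
  ## discussed in this issue.
  metric_buffer_limit = 10000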

powersj added the waiting for response label Apr 22, 2024
kj4tmp (Author) commented Apr 23, 2024

Our application requires accommodating bursts of writes exceeding 1 million field writes/second (roughly the typical limit of a single-node InfluxDB OSS instance) for up to 20 minutes, while also reporting live down-sampled data. This can be accomplished with a Telegraf buffer of about 30 GB (on our machine).

The FIFO behavior of the Telegraf buffer means that if we do nothing, the data in InfluxDB will be about 10 minutes behind by the end of the 20-minute burst, violating the live down-sampled data reporting requirement.

One option to satisfy this live + buffering requirement is to use multiple InfluxDB output plugins and pipe the live down-sampled data to a dedicated output plugin instance using tag-based routing: an aggregators.final instance tags the down-sampled stream, and a processor strips the "_final" suffix back off the aggregator output (the output-side routing is sketched after the config below).

###############################################################################
#                       AGGREGATOR PLUGINS                                    #
###############################################################################

# 10 Hz Aggregator
[[aggregators.final]]
  # alias = "10-Hz-Aggregator"
  ## The period on which to flush & clear the aggregator.
  period = "0.1s"

  ## If true, the original metric will be dropped by the
  ## aggregator and will not get sent to the output plugins.
  # drop_original = false

  ## The time that a series is not updated until considering it final. Ignored
  ## when output_strategy is "periodic".
  # series_timeout = "5m"

  ## Output strategy, supported values:
  ##   timeout  -- output a metric if no new input arrived for `series_timeout`
  ##   periodic -- output the last received metric every `period`
  output_strategy = "periodic"
  [aggregators.final.tags]
    aggregator = "10-Hz-Aggregator"

###############################################################################
#                       PROCESSOR PLUGINS                                     #
###############################################################################

# Trim the _final stuff from the aggregator plugin
[[processors.regex]]
  ## Other configurations for the processor...

  ## Rename metric fields to strip "_final" suffix
  [[processors.regex.field_rename]]
    ## Regular expression to match on the field name
    pattern = "(.*)_final$"
    ## Replacement expression defining the name of the new field
    replacement = "${1}"

  [processors.regex.tagpass]
    aggregator = ["10-Hz-Aggregator"]

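A sketch of the output side of that routing; the URLs, organization, bucket names, and token below are placeholders and would need to match your setup:

[[outputs.influxdb_v2]]
  ## Dedicated output for the live, down-sampled stream
  alias = "live-downsampled"
  urls = ["http://127.0.0.1:8086"]
  token = "$INFLUX_TOKEN"
  organization = "example-org"
  bucket = "live"
  ## Only pass metrics tagged by the 10 Hz aggregator
  [outputs.influxdb_v2.tagpass]
    aggregator = ["10-Hz-Aggregator"]

[[outputs.influxdb_v2]]
  ## Output for the full-rate stream that is allowed to lag during bursts
  alias = "full-rate"
  urls = ["http://127.0.0.1:8086"]
  token = "$INFLUX_TOKEN"
  organization = "example-org"
  bucket = "raw"
  ## Drop the aggregated stream here so it is not written twice
  [outputs.influxdb_v2.tagdrop]
    aggregator = ["10-Hz-Aggregator"]
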
The solution could be simpler if the FIFO buffer could instead be configured as LIFO, though that may have some unforeseen consequences for users who rely on the "last write wins" aspect of how InfluxDB handles duplicate data.
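
Purely to illustrate the request (this option does not exist today; the name and placement are made up), the knob could look something like:

[agent]
  ## HYPOTHETICAL, not implemented: order in which buffered metrics are
  ## handed to outputs. "oldest_first" would keep the current FIFO behavior,
  ## "newest_first" would flush the most recent data first.
  # buffer_flush_order = "newest_first"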

telegraf-tiger bot removed the waiting for response label Apr 23, 2024
kj4tmp (Author) commented Apr 24, 2024

Here is a relevant issue: #5633

powersj (Contributor) commented Apr 25, 2024

Thanks for the background, it does help to understand the situation and the desire for this better. My initial concerns are twofold:

First, some outputs simply do not work with sending newer data first. For example, stackdriver absolutely requires data to be sent in order, oldest to newest. We have had issues come up in the past where this was not occurring. Adding an option like this, while opt-in, would prevent the usage of certain plugins, and we would need to make that very clear somehow.

Second, I do have a concern with using the buffer like a stack, in that it is possible some metrics would get lost or sit in the stack indefinitely, whereas the queue ensures nothing stays in there forever.

That said, because this is opt-in, I am not opposed to adding some sort of option to allow this behavior. We are currently working on a rework of the buffer implementation to allow us to write to files and not just to memory. I would want to make this sort of change after that work has landed.

Next steps: continue work on the buffer implementation, get the initial restructure in place, and then look to add an option for how the buffer is read.

powersj added the help wanted and size/m labels Apr 25, 2024