Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC] Adding missing metric-generator keys #3643

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

WesselAtWork
Copy link

@WesselAtWork WesselAtWork commented May 2, 2024

What this PR does:

Adds missing keys to the metrics-generator

Which issue(s) this PR fixes:
Fixes #3642

Sources/References

override_ring_key


cfg.QueryTimeout = 30 * time.Second

override_ring_key


cfg.OverrideRingKey = generatorRingKey

// generatorRingKey is the default key under which we store the metric-generator's ring in the KVStore.
generatorRingKey = "metrics-generator"

processor.service_graphs.peer_attributes

peerAttr := make([]string, 0, len(defaultPeerAttributes))
for _, attr := range defaultPeerAttributes {
peerAttr = append(peerAttr, string(attr))
}

var defaultPeerAttributes = []attribute.Key{
semconv.PeerServiceKey, semconv.DBNameKey, semconv.DBSystemKey,
}

processor.span_metrics.subprocessors

This one did not work

cfg.Subprocessors = make(map[Subprocessor]bool)
cfg.Subprocessors[Latency] = true
cfg.Subprocessors[Count] = true
cfg.Subprocessors[Size] = true

storage.trace.pool

MaxWorkers: 30,
QueueDepth: 10000,

@CLAassistant
Copy link

CLAassistant commented May 2, 2024

CLA assistant check
All committers have signed the CLA.

@WesselAtWork WesselAtWork changed the title Initial [DOC] Adding missing metric-generator keys May 2, 2024
@WesselAtWork
Copy link
Author

WesselAtWork commented May 2, 2024

Unsure if I should put the WAL config in it's own block.
The loki docs usually separate out common config into typed blocks
e.g. https://grafana.com/docs/loki/latest/configure/#s3_storage_config

They postfix these blocks with _config

Comment on lines 1073 to 1077
wal: <WAL Config>
[path: <string> | default = "/var/tempo/wal"]
[v2_encoding: <string> | default = snappy]
[search_encoding: <string> | default = none]
[ingestion_time_range_slack: <duration> | default = 2m]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed a way to show that this particular component is overriding values from the base component block. (WAL Config)

Wanted to keep it light for the future, so no comments where none would add additional value.
If there were comments we would need to keep both synchronized.

This way if you are making changes to the query-frontend you won't need to change the base block because it isn't dependant on the query-frontend
Same in reverse, if you made changes to the base WAL Config you wouldn't need to propagate them to every component that use it. (Unless you made changes to them too)

Feel free to dispute the omission of the newline breaks.

I felt a need to keep this visually distinct, but I am not married to the idea.
Might be better to keep the format consistent

Comment on lines 1251 to 1254


## Overrides

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure on the placement of this (Configuration blocks) section

Feel free to dispute

Comment on lines 389 to 396
# Configuration block for the Write Ahead Log (WAL)
traces_storage: <WAL Config>

# Path to store the wal files.
# Must be set.
# Example: "/var/tempo/generator/traces"
[path: <string> | default = ""]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find a default value for the path is set in the code, around the metrics-generator
But If I look at the wal code, it should except if you do not set it, which feels like an oversight?
It feels like it should set a default value?

tempo/tempodb/wal/wal.go

Lines 52 to 54 in 251bf5a

if c.Filepath == "" {
return nil, fmt.Errorf("please provide a path for the WAL")
}

@WesselAtWork
Copy link
Author

Just discovered something:
missing the localblocks key

type ProcessorConfig struct {
ServiceGraphs servicegraphs.Config `yaml:"service_graphs"`
SpanMetrics spanmetrics.Config `yaml:"span_metrics"`
LocalBlocks localblocks.Config `yaml:"local_blocks"`
}

and most of the ring keys:

type RingConfig struct {
KVStore kv.Config `yaml:"kvstore"`
HeartbeatPeriod time.Duration `yaml:"heartbeat_period"`
HeartbeatTimeout time.Duration `yaml:"heartbeat_timeout"`
InstanceID string `yaml:"instance_id"`
InstanceInterfaceNames []string `yaml:"instance_interface_names"`
InstanceAddr string `yaml:"instance_addr"`
InstancePort int `yaml:"instance_port"`
EnableInet6 bool `yaml:"enable_inet6"`
// Injected internally
ListenPort int `yaml:"-"`
}

@@ -330,7 +339,7 @@ metrics_generator:
span_metrics:

# Buckets for the latency histogram in seconds.
[histogram_buckets: <list of float> | default = 0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.02, 2.05, 4.10]
[histogram_buckets: <list of float> | default = 0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128, 0.256, 0.512, 1.02, 2.05, 4.10, 8.20, 16,40]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct?

cfg.HistogramBuckets = prometheus.ExponentialBuckets(0.002, 2, 14)

Should be 14 entries correct?

@knylander-grafana knylander-grafana added the type/docs Improvements or additions to documentation label May 2, 2024
@joe-elliott
Copy link
Member

Howdy, @WesselAtWork. Appreciate all the details cleaned up here!

@kvrhdn is going to take a look at this in the next few days. Just didn't want you to think we were ignoring you.

@knylander-grafana
Copy link
Contributor

I love all the clean up you've done on this doc. It really helps readability. (I'll defer to developer review for this PR.)

Copy link
Member

@kvrhdn kvrhdn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this work! 🙏🏻

Sorry it took a while to get through it, overall changes look good I've just left some smaller comments. Once those are addressed this is good to go 🙂

docs/sources/tempo/configuration/_index.md Outdated Show resolved Hide resolved
docs/sources/tempo/configuration/_index.md Outdated Show resolved Hide resolved
docs/sources/tempo/configuration/_index.md Show resolved Hide resolved
docs/sources/tempo/configuration/_index.md Outdated Show resolved Hide resolved
docs/sources/tempo/configuration/_index.md Outdated Show resolved Hide resolved
docs/sources/tempo/configuration/_index.md Show resolved Hide resolved
Comment on lines 431 to 447
# Maximum (???) (lifespan for any given block?)
[max_block_duration: <duration> | default = 1m]

# Maximum block size before it is closed (?)
[max_block_bytes: <uint64> | default = 500000000]

# Maximum (???)
[complete_block_timeout: <duration> | default = 1h]

# (Unused?)
# Max number of "live" traces
[max_live_traces: <uint64>]

# Whether server spans should be filtered in or not.
# true means keep the server kinds (???)
# false means drop the server kinds (???)
[filter_server_spans: <bool> | default = true]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Help with these descriptions please

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to try to help. What do you mean by server kinds?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// filterBatches to only root spans or kind==server. Does not modify the input
// but returns a new struct referencing the same input pointers. Returns nil
// if there were no matching spans.
func filterBatches(batches []*v1.ResourceSpans) []*v1.ResourceSpans {
keep := make([]*v1.ResourceSpans, 0, len(batches))
for _, batch := range batches {
var keepSS []*v1.ScopeSpans
for _, ss := range batch.ScopeSpans {
var keepSpans []*v1.Span
for _, s := range ss.Spans {
if s.Kind == v1.Span_SPAN_KIND_SERVER || len(s.ParentSpanId) == 0 {
keepSpans = append(keepSpans, s)
}
}
if len(keepSpans) > 0 {
keepSS = append(keepSS, &v1.ScopeSpans{
Scope: ss.Scope,
Spans: keepSpans,
})
}
}
if len(keepSS) > 0 {
keep = append(keep, &v1.ResourceSpans{
Resource: batch.Resource,
ScopeSpans: keepSS,
})
}
}
return keep
}

It's the span kind(?) client/server kinds I think.

Copy link
Member

@kvrhdn kvrhdn May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestions would be:

            # How often to run flush checks (??)
            [flush_check_period: <duration> | default = 10s]

How often to run the flush loop to cut idle traces and blocks

            # After this idle period, a trace is closed. (??)
            [trace_idle_period: <duration> | default = 10s]

Duration after which to consider a trace complete if no spans have been received

(this is copied from the flag definition)

            # Maximum (???) (lifespan for any given block?)
            [max_block_duration: <duration> | default = 1m]

Maximum duration which the head block can be appended to before cutting it

            # Maximum block size before it is closed (?)
            [max_block_bytes: <uint64> | default = 500000000]

Maximum size of the head block before cutting it

            # Maximum (???)
            [complete_block_timeout: <duration> | default = 1h]

Duration to keep blocks in the ingester after they have been flushed

            # (Unused?)
            # Max number of "live" traces
            [max_live_traces: <uint64>]

Maximum amount of live traces, if this is exceeded traces will be dropped with reason live_traces_exceeded.

            # Whether server spans should be filtered in or not.
            # true means keep the server kinds (???)
            # false means drop the server kinds (???)
            [filter_server_spans: <bool> | default = true]

This code iterates through all spans and only keeps spans that either:

  • have span.kind == SPAN_KIND_SERVER
  • don't have a parent span

If enabled, only parent spans or spans with SpanKind server will be retained

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why only retain parents and Server spans by default?
What about other spans or Client types?

Comment on lines +1430 to +1433
# Where to store the intermediate blocks while they are being appended to.
# Will always join the `path` with "blocks" to generate the effective path
# Example: "/var/tempo/wal/blocks" (ignored)
[blocksfilepath: <ignored> | = join(.path, "/blocks")]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found a bug?

I described what the code does here, but I don't think this is the intended behaviour...

Please verify :

tempo/tempodb/wal/wal.go

Lines 70 to 75 in 3a4f309

p := filepath.Join(c.Filepath, blocksDir)
err = os.MkdirAll(p, os.ModePerm)
if err != nil {
return nil, err
}
c.BlocksFilepath = p

I think it's supposed to look like this (but with c.BlocksFilepath == ""):

tempo/tempodb/wal/wal.go

Lines 56 to 67 in 3a4f309

// The /completed/ folder is now obsolete and no new data is written,
// but it needs to be cleared out one last time for any files left
// from a previous version.
if c.CompletedFilepath == "" {
completedFilepath := filepath.Join(c.Filepath, completedDir)
err = os.RemoveAll(completedFilepath)
if err != nil {
return nil, err
}
c.CompletedFilepath = completedFilepath
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/docs Improvements or additions to documentation
Projects
None yet
5 participants