Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Metrics family config #2121

Merged
merged 4 commits into from
Mar 20, 2024
Merged

Conversation

amnonh
Copy link
Contributor

@amnonh amnonh commented Feb 26, 2024

This series adds metrics family config.

Family config is a configuration that relates to all metrics with the same name but with different labels.

set_metrics_family_configs allows changing that configuration during run time. Specifically, change the label aggregation based on a metric name.

Using multiple independent labels on the same metric is a common practice. A known drawback is that this cartesian product makes it easy to discover in real-time that the number of metrics explodes.
While metrics relabel config allows dropping some of those metrics, it could be useful to report an aggregated metric.

The following is an example for setting the aggregate labels for the metric test_gauge_1 and all metrics matching the regex test_gauge1.*

std::vector<sm::metrics_family_config> fc(2);
fc[0].name = "test_gauge_1";
fc[0].aggregate_labels = { "lb" };
fc[1].regex_name = "test_gauge1.*";
fc[1].aggregate_labels = { "ll", "aa" };

sm::set_metrics_family_configs(fc);

An alternative to #1966

Metric family metadata consists of information relevant to all metrics
with the same name (but with different labels).  For example, name,
description, and aggregate labels.

The metric family metadata is found in two data structures: the
_value_map that holds all registered metrics and _metadata that contains
only reported metrics.

This patch makes the _metadata hold a reference to the data in the
_value_map instead of copying it, which serves two purposes:
1. Minor performance gain (memory and computation).
2. It will now be possible for a change in _value_map to be reflected in
   _metadata.

Signed-off-by: Amnon Heiman <[email protected]>
@amnonh amnonh force-pushed the metrics_family_config branch 2 times, most recently from 8881bdc to c8dff12 Compare March 5, 2024 10:09
@amnonh
Copy link
Contributor Author

amnonh commented Mar 11, 2024

@nyh can you take a look?

@nyh
Copy link
Contributor

nyh commented Mar 12, 2024

std::vector<sm::metrics_family_config> fc(2);
fc[0].name = "test_gauge_1";
fc[0].aggregate_labels = { "lb" };
fc[1].regex_name = "test_gauge1.*";
fc[1].aggregate_labels = { "ll", "aa" };

sm::set_metrics_family_configs(fc);

Can you please explain what does the value of "aggregate_label" do? What does it mean for it to be "lb" or "ll","aa"?

Also, please explain if these aggregations sum all the values for the same metric - including values for different shards (which are summed to get a per-node value), or does it happen on a per-shard basis and we are still left with shard labels even after the aggregation?

@@ -333,7 +333,7 @@ using metric_metadata_fifo = std::deque<metric_info>;
* each of the metric.
*/
struct metric_family_metadata {
metric_family_info mf;
metric_family_info& mf;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment saying that this holds a pointer into _value_map.
It's important to know what is the lifetime of this object.

* \brief metrics_family_config allow changing metrics family configuration
*
* Allow changing metrics family configuration
* Supports changing the aggregation labels.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what this sentence is saying - "changing the aggregation labels"? Are you changing labels? Or the aggregation of values with different labels?

*
* name - optional exact metric name
* regex_name - if set, all the metrics name that match the regular expression
* aggregate_labels - The labels to aggregate the metrics by.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand (see my comment on the commit message too) what aggregate_labels="x","y" means:
Does it mean you aggregate (sum?) the value with label x and label y and... where do you put the sum?
Or maybe it means that for each of x and y, you aggregate many different values that might have different other labels (does this include different shards?) and the result is one value for x and one value for y?
Or does it means something else?

Please explain this more clearly, because I don't have any intuitive grasp of what the concepts of "metric families" or "label aggregation" means.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading your test in the last patch, I think I understand better: It turns out labels have a name and a value - it's not that we have a metric with a label x - we can have one metric with the label "x=dog" and another one with "x=cat". So apparently aggregation will sum these two metrics and create one metric without (?) the label x at all. I still don't understand, though, what happens with other labels: What if we have the following four metrics:

some_metric[x=dog, z=hello] = 1
some_metric[x=cat, z=hello] = 2
some_metric[x=dog, z=hi] = 3
some_metric[x=cat, z=yo] = 4

How do we aggregate it over "x"? Do we sum all of those metrics and drop all labels? Or will we have three metrics for the three different "z" label values (i.e., just the first two metrics are summed)? This makes sense, but is it documented somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, just for context, we already support aggregate by label, but currently, you have to configure it when you register the metrics, and it cannot be changed.

Aggregation works by dropping one or more labels and summing over all metrics that now have the same name and label.

Using your example, let's drop x

some_metric[ z=hello] = 1
some_metric[ z=hello] = 2
some_metric[ z=hi] = 3
some_metric[ z=yo] = 4

Now we sum and get:

some_metric[ z=hello] = 3
some_metric[ z=hi] = 3
some_metric[ z=yo] = 4

This is not new, just it's run-time configuration

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have been sleeping under a rock when you added it.

That being said, all of this should be documented somewhere. I also found the names of the fields and their explanations to be confusing. It says "aggregate_labels - The labels to aggregate the metrics by.". In the above example, we are not aggregating "by x" - in fact in some sense it's the opposite - we're dropping x, and aggregating the metrics by the other labels.


/*
* \brief return the current metrics_family_config
* This function returns a vector of the current metrics relabel configs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy-pasto mentions "relabel".

@@ -130,7 +130,20 @@ future<metric_relabeling_result> set_relabel_configs(const std::vector<relabel_c
const std::vector<relabel_config>& get_relabel_configs() {
return impl::get_local_impl()->get_relabel_configs();
}
void impl::impl::update_aggregate(metric_family_info& mf) const noexcept {
for (const auto& fc : _metrics_family_configs) {
if (fc.name == mf.name || fc.regex_name.match(mf.name)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you need to check regex_name.empty() here? Or match() is correct (and efficient enough) in the empty case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed that it was efficient enough and the update_aggregate is a rare operation

@amnonh
Copy link
Contributor Author

amnonh commented Mar 12, 2024

std::vector<sm::metrics_family_config> fc(2);
fc[0].name = "test_gauge_1";
fc[0].aggregate_labels = { "lb" };
fc[1].regex_name = "test_gauge1.*";
fc[1].aggregate_labels = { "ll", "aa" };

sm::set_metrics_family_configs(fc);

Can you please explain what does the value of "aggregate_label" do? What does it mean for it to be "lb" or "ll","aa"?

Also, please explain if these aggregations sum all the values for the same metric - including values for different shards (which are summed to get a per-node value), or does it happen on a per-shard basis and we are still left with shard labels even after the aggregation?

I saw you already figured things out, but I'll add a context.
The metrics layer as of today, already supports aggregate labels; what this means is that you take all metrics with the same name (but with different label values) drop one or more labels, and return an aggregated value of that metric (using sum as an aggregated function)

The main usage is dropping the shard label and reporting a metric per node, but it can also be used to aggregate over any other label (more info on the Prometheus enhancements #1120 ).

While we can modify configuration that is per metric instance (e.g. name and labels) using the metrics_relabel, we don't have the option to change in run-time the aggregate labels, because this is a feature that belongs to the metric family.

This is what the series is about. Changing in run-time the metric family configuration.

@nyh
Copy link
Contributor

nyh commented Mar 12, 2024

std::vector<sm::metrics_family_config> fc(2);
fc[0].name = "test_gauge_1";
fc[0].aggregate_labels = { "lb" };
fc[1].regex_name = "test_gauge1.*";
fc[1].aggregate_labels = { "ll", "aa" };

sm::set_metrics_family_configs(fc);

Can you please explain what does the value of "aggregate_label" do? What does it mean for it to be "lb" or "ll","aa"?
Also, please explain if these aggregations sum all the values for the same metric - including values for different shards (which are summed to get a per-node value), or does it happen on a per-shard basis and we are still left with shard labels even after the aggregation?

I saw you already figured things out, but I'll add a context. The metrics layer as of today, already supports aggregate labels; what this means is that you take all metrics with the same name (but with different label values) drop one or more labels, and return an aggregated value of that metric (using sum as an aggregated function)

The main usage is dropping the shard label and reporting a metric per node, but it can also be used to aggregate over any other label (more info on the Prometheus enhancements #1120 ).

While we can modify configuration that is per metric instance (e.g. name and labels) using the metrics_relabel, we don't have the option to change in run-time the aggregate labels, because this is a feature that belongs to the metric family.

This is what the series is about. Changing in run-time the metric family configuration.

All of this should be documented somewhere (not in a discussion in this PR). I see I figured out the right meaning of this stuff after I finished reviewing the entire series including the test, but I don't think users should read tests to guess what an API does.

By the way, I see you have classes metric_family_info, metric_family, metric_family_metadata, but in this patch you used the plural form, metrics_family_config, set_metrics_family_config, etc. This inconsistency is probably unintended?

@amnonh
Copy link
Contributor Author

amnonh commented Mar 12, 2024

You are right about the documentation, for the user-facing part (the relabel and metrics family config) there's already an open issue around that

@amnonh
Copy link
Contributor Author

amnonh commented Mar 12, 2024

About the naming, the plural is intended because it could be multiple metrics, multiple families

@nyh
Copy link
Contributor

nyh commented Mar 12, 2024

About the naming, the plural is intended because it could be multiple metrics, multiple families

The plural of "metric family" is "metric families", not "metrics famly".

@nyh
Copy link
Contributor

nyh commented Mar 12, 2024

You are right about the documentation, for the user-facing part (the relabel and metrics family config) there's already an open issue around that

Ok, then we can have this PR not solve this problem - just please say that and add a reference to that issue.
However, you did add a few new comments describing new functions, which I found confusing - and I think those new comments can be improved. I found even the basic phrase of aggregating by label x to be confusing, or even misleading, because I don't think that's what you actually do - it's more like you aggregate by the other labels after dropping label x.

@amnonh
Copy link
Contributor Author

amnonh commented Mar 12, 2024

You are right about the documentation, for the user-facing part (the relabel and metrics family config) there's already an open issue around that

Ok, then we can have this PR not solve this problem - just please say that and add a reference to that issue. However, you did add a few new comments describing new functions, which I found confusing - and I think those new comments can be improved. I found even the basic phrase of aggregating by label x to be confusing, or even misleading, because I don't think that's what you actually do - it's more like you aggregate by the other labels after dropping label x.

You are right that it requires documentation, but it does not add functionality, it adds an option to modify the current functionality in run-time.
I'll open a seastar issue around it (we have other issues in other repositories)

I've opened #2140

This patch adds support for metric family config.
A metric family config is a configuration that relates to all the
metrics with the same name (but with different labels).

The configuration would allow changing the labels aggregation for
metrics.

Besides the configuration class, which is similar to
metrics_relabel_config, two enhancements were added to the
relabel_config_regex:

1. empty() which check if it's empty or not.
2. match() that checks if the regex is nonempty and matches a string
without returning the matched group.

Signed-off-by: Amnon Heiman <[email protected]>
Family config is a configuration that relates to all metrics with the
same name but with different labels values.

set_metric_family_configs allows changing that configuration during run
time. Specifically, change the label aggregation based on a metric name.

The following is an example for setting the aggregate labels for the
metric test_gauge_1 and all metrics matching the regex test_gauge1.*:

std::vector<sm::metric_family_config> fc(2);
fc[0].name = "test_gauge_1";
fc[0].aggregate_labels = { "lb" };
fc[1].regex_name = "test_gauge1.*";
fc[1].aggregate_labels = { "ll", "aa" };

sm::set_metric_family_configs(fc);

Signed-off-by: Amnon Heiman <[email protected]>

include/seastar/core/metrics_api.hh
This patch adds a test for the family config API. it register metrics
and change their aggregate labels.
It check that both metrics that create before and after the call to
set_metric_family_configs works as expected.

Signed-off-by: Amnon Heiman <[email protected]>
@nyh nyh closed this in 4aa93b6 Mar 20, 2024
@nyh nyh merged commit 4aa93b6 into scylladb:master Mar 20, 2024
14 checks passed
graphcareful pushed a commit to graphcareful/seastar that referenced this pull request Mar 20, 2024
This series adds metrics family config.

Family config is a configuration that relates to all metrics with the same name but with different labels.

set_metrics_family_configs allows changing that configuration during run time. Specifically, change the label aggregation based on a metric name.

Using multiple independent labels on the same metric is a common practice. A known drawback is that this cartesian product makes it easy to discover in real-time that the number of metrics explodes.
While metrics relabel config allows dropping some of those metrics, it could be useful to report an aggregated metric.

The following is an example for setting the aggregate labels for the metric test_gauge_1 and all metrics matching the regex test_gauge1.*

```
std::vector<sm::metrics_family_config> fc(2);
fc[0].name = "test_gauge_1";
fc[0].aggregate_labels = { "lb" };
fc[1].regex_name = "test_gauge1.*";
fc[1].aggregate_labels = { "ll", "aa" };

sm::set_metrics_family_configs(fc);
```

An alternative to scylladb#1966

Closes scylladb#2121

* github.com:scylladb/seastar:
  tests/unit/metrics_test.cc: Test metrics family config
  metrics: Adds metric_family_config support
  core/relabel_config.hh: support metric family config
  metrics_api.hh: Remove duplication of metric family metadata
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants