Semconv 1.25 #4690

Draft: wants to merge 27 commits into base: main

Member

@dyladan dyladan commented May 9, 2024

This is a big PR but most of it is autogenerated. Below is a list of changes:

  • Update semconv to 1.25
  • Update semconv generator to 0.24
  • Output to experimental.ts and stable.ts so we can export separately in the future if required
    • experimental attributes and metrics now have @experimental jsdoc tag
  • Change SEMRESATTRS_ and SEMATTRS_ to just ATTR_ for attributes
  • Generate constants for metric names with METRIC_ prefix
  • Deprecate all old names. These files will never change again and will be removed in 2.0 if we ever release one
  • All names are constants now. Removes requirement for all the weird type stuff (sorry @MSNev I know you spent a lot of time on that)
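
As a rough illustration of the shapes listed above, the generated output might look like the sketch below. The names come from this PR, but which file each constant lands in and the exact doc comments are assumptions, and `export` is omitted so the snippet runs on its own.

```typescript
// Illustrative stand-ins only. In the real package every constant is exported
// and the generated files are much larger.

/** Attribute name with the new ATTR_ prefix (assumed to live in stable.ts). */
const ATTR_HTTP_ROUTE = 'http.route';

/**
 * Metric name with the new METRIC_ prefix (assumed to live in experimental.ts).
 *
 * @experimental
 */
const METRIC_CONTAINER_CPU_TIME = 'container.cpu.time';

/**
 * Old-style name kept for backwards compatibility.
 *
 * @deprecated use ATTR_HTTP_ROUTE
 */
const SEMATTRS_HTTP_ROUTE = 'http.route';

// Every name is a plain string constant, so an old name and its replacement
// resolve to the same attribute key at runtime.
const sameKey: boolean = SEMATTRS_HTTP_ROUTE === ATTR_HTTP_ROUTE;
```

Because the constants are plain strings, no namespace or enum types are needed to use them as attribute keys.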

Notes:

  • Template attributes such as http.request.header.<key> are still not supported for now. It's not clear how we can/should support them, and until we make a decision I'd leave them out (they were excluded/didn't exist before)

Questions:

  • For the main export (import {} from '@opentelemetry/semantic-conventions'), should ALL semconv be exported, experimental and stable, or should only the stable ones be exported, with experimental imported from @opentelemetry/semantic-conventions/experimental? The main export is stable only, with backwards compatibility for previous releases.

Example: this is what it would look like to update the utils.ts file in the http instrumentation.

import {
  ATTR_HTTP_ROUTE,
  // These are not in the updated semconv and need to be imported with old names for now
  SEMATTRS_HTTP_CLIENT_IP,
  SEMATTRS_HTTP_HOST,
  SEMATTRS_HTTP_REQUEST_CONTENT_LENGTH_UNCOMPRESSED,
  SEMATTRS_HTTP_RESPONSE_CONTENT_LENGTH_UNCOMPRESSED,
  SEMATTRS_HTTP_SERVER_NAME,
  SEMATTRS_NET_HOST_IP,
  SEMATTRS_NET_PEER_IP,
} from '@opentelemetry/semantic-conventions';
import {
  NET_TRANSPORT_VALUES_IP_TCP,
  NET_TRANSPORT_VALUES_IP_UDP,
  ATTR_HTTP_FLAVOR,
  ATTR_HTTP_METHOD,
  ATTR_HTTP_REQUEST_CONTENT_LENGTH,
  ATTR_HTTP_RESPONSE_CONTENT_LENGTH,
  ATTR_HTTP_SCHEME,
  ATTR_HTTP_STATUS_CODE,
  ATTR_HTTP_TARGET,
  ATTR_HTTP_URL,
  ATTR_HTTP_USER_AGENT,
  ATTR_NET_HOST_NAME,
  ATTR_NET_HOST_PORT,
  ATTR_NET_PEER_NAME,
  ATTR_NET_PEER_PORT,
  ATTR_NET_TRANSPORT,
} from '@opentelemetry/semantic-conventions/experimental';
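
To show how constants from the two entry points might be used side by side, here is a minimal sketch. The constants are redefined locally as stand-ins so the snippet runs without the package installed, and `buildServerAttributes` is a hypothetical helper, not a function from the http instrumentation.

```typescript
// Stand-in for a name imported from '@opentelemetry/semantic-conventions'.
const ATTR_HTTP_ROUTE = 'http.route';

// Stand-ins for names imported from
// '@opentelemetry/semantic-conventions/experimental'.
const ATTR_HTTP_METHOD = 'http.method';
const ATTR_HTTP_STATUS_CODE = 'http.status_code';

type AttributeValue = string | number | boolean;

// Hypothetical helper: collects span attributes for a handled request.
function buildServerAttributes(
  method: string,
  statusCode: number,
  route?: string
): Record<string, AttributeValue> {
  const attributes: Record<string, AttributeValue> = {
    [ATTR_HTTP_METHOD]: method,
    [ATTR_HTTP_STATUS_CODE]: statusCode,
  };
  // The route is only known when a framework matched one.
  if (route !== undefined) {
    attributes[ATTR_HTTP_ROUTE] = route;
  }
  return attributes;
}

const attrs = buildServerAttributes('GET', 200, '/users/:id');
```

The calling code is the same whichever entry point a constant comes from. Only the import path signals whether a name is stable.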

Member Author

@dyladan dyladan left a comment

Added some quick comments to explain some of my thought process

Member Author

Template is a lot smaller now because I removed all the namespaced stuff that will never be updated and is considered deprecated.

*
* Note: Total CPU time consumed by the specific container on all available CPU cores.
*
* @experimental this metric is experimental and is subject to change in minor releases of `@opentelemetry/semantic-conventions`.
Member Author

Everything in this file is experimental

@@ -1 +1,2 @@
opentelemetry-specification/
semantic-conventions/
Member Author

Added a new ignore. Kept the old one since a lot of people have already checked it out and it isn't hurting anything

*
* @experimental this metric is experimental and is subject to change in minor releases of `@opentelemetry/semantic-conventions`.
*/
export const METRIC_CONTAINER_CPU_TIME = 'container.cpu.time';
Member Author

Added metric names with METRIC_ prefix

Member Author

dyladan commented May 9, 2024

/cc @trentm @JamieDanielson since you seemed interested in this

/cc @MSNev since you have done the most work on this recently

@dyladan dyladan marked this pull request as ready for review May 9, 2024 16:31
@dyladan dyladan requested a review from a team as a code owner May 9, 2024 16:31
Contributor

trentm commented May 9, 2024

I'll take a while to review this. I'm still trying to grok the generation, the semantic-conventions/model vs schemas/... subdirs, etc. Some early Qs/thoughts:

  • I gather merging the SEMRESATTRS_ and SEMATTRS_ groups is related to the "Problem" described at "Reconsidering what semantic conventions code generation should produce" (semantic-conventions#551). A hearty +1 to not using those prefixes. Did you consider also dropping the "ATTRS_" prefix? IIUC the Go semconv does not have any prefix on the exports from its semconv package. Java has namespacing of a different sort via the HttpAttributes part of import io.opentelemetry.semconv.HttpAttributes.
  • Similar to above, did you consider not having the METRIC_ prefix on metrics-related constants? (I don't see metrics-related values in open-telemetry/semantic-conventions-java.git and I'm not sure why. Does OTel Java not publish a package with metrics semconv constants?)

Correctness Qs:

  • Are you sure that the "deprecated" dirs in "semantic-conventions/model/..." handle all the deprecated values? For example http.resend_count was renamed to http.request.resend_count, but with your PR there is no deprecated HTTP_RESEND_COUNT entry.
  • http.client_ip is deprecated. There is a SEMATTRS_HTTP_CLIENT_IP but no ATTR_HTTP_CLIENT_IP, even though:
* @deprecated use ATTR_HTTP_CLIENT_IP
*/
export const SEMATTRS_HTTP_CLIENT_IP = TMP_HTTP_CLIENT_IP;

Same for SEMATTRS_DB_CASSANDRA_KEYSPACE, and I assume for others.
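
One way to catch this kind of gap mechanically would be to check that every `@deprecated use X` target exists among the generated names. The sketch below is hypothetical: the `GeneratedConstant` shape and `findDanglingDeprecations` are made up for illustration and are not part of the generator.

```typescript
// Hypothetical description of one generated constant.
interface GeneratedConstant {
  name: string; // e.g. 'SEMATTRS_HTTP_CLIENT_IP'
  deprecatedUse?: string; // name taken from an "@deprecated use X" tag, if any
}

// Returns the constants whose suggested replacement was never generated.
function findDanglingDeprecations(
  constants: GeneratedConstant[]
): GeneratedConstant[] {
  const knownNames = constants.map(c => c.name);
  return constants.filter(
    c =>
      c.deprecatedUse !== undefined &&
      knownNames.indexOf(c.deprecatedUse) === -1
  );
}

// Sample input mirroring the situation described above: the replacement for
// SEMATTRS_HTTP_CLIENT_IP is referenced but not present.
const sample: GeneratedConstant[] = [
  { name: 'ATTR_HTTP_ROUTE' },
  { name: 'SEMATTRS_HTTP_ROUTE', deprecatedUse: 'ATTR_HTTP_ROUTE' },
  { name: 'SEMATTRS_HTTP_CLIENT_IP', deprecatedUse: 'ATTR_HTTP_CLIENT_IP' },
];

const dangling = findDanglingDeprecations(sample).map(c => c.name);
```

With this sample input, `dangling` holds only SEMATTRS_HTTP_CLIENT_IP.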

Contributor

trentm commented May 9, 2024

The _VALUES_ fields are using the description of the field for which they are values as their comment, e.g.:

/**
 * The language of the telemetry SDK.
 */
export const TELEMETRY_SDK_LANGUAGE_VALUES_CPP = 'cpp';

/**
 * The language of the telemetry SDK.
 */
export const TELEMETRY_SDK_LANGUAGE_VALUES_DOTNET = 'dotnet';

/**
 * The language of the telemetry SDK.
 */
export const TELEMETRY_SDK_LANGUAGE_VALUES_ERLANG = 'erlang';

/**
 * The language of the telemetry SDK.
 */
export const TELEMETRY_SDK_LANGUAGE_VALUES_GO = 'go';

Can the description (is that the "brief" yaml field?) of the value be used, instead?

Member Author

dyladan commented May 10, 2024

Did you consider also dropping the "ATTRS_" prefix? ... Similar to above, did you consider not having the METRIC_ prefix

Yes I did consider that and I still would consider it if we want to go that route. It is my understanding that the semconv has decided to use a registry of unique attributes that can be applied to any signal or resource so there is no reason to differentiate them. I only kept the ATTR and METRIC prefix just to make it easier to find the value you want when autocompleting and not get confused.

Are you sure that the "deprecated" dirs in "semantic-conventions/model/..." handle all the deprecated values? For example http.resend_count was renamed to http.request.resend_count, but with your PR there is no deprecated HTTP_RESEND_COUNT entry.

I'm actually sure they're NOT all there. The deprecated.yaml didn't exist when many of these were removed and they weren't all added back. I left all the old versions in the file they were already in, so it isn't a breaking change, but I am going to add the missing attributes to the registry anyway (see open-telemetry/semantic-conventions#1025)

Can the description (is that the "brief" yaml field?) of the value be used, instead?

Good catch. I'll update the PR

Member Author

dyladan commented May 10, 2024

Can the description (is that the "brief" yaml field?) of the value be used, instead?

Good catch. I'll update the PR

Unfortunately it looks like there aren't actually descriptions on the values themselves. I think the intellisense autocomplete looks ok anyway though:

[image: screenshot of IntelliSense autocomplete]

Member Author

dyladan commented May 10, 2024

@trentm what about this?

image

Member Author

dyladan commented May 10, 2024

I ended up with something like this:

/**
 * Enum value 'created' for attribute {@link ATTR_ANDROID_STATE}.
 *
 * @experimental this attribute is experimental and is subject to change in minor releases of `@opentelemetry/semantic-conventions`.
 */
export const ANDROID_STATE_VALUES_CREATED = 'created';

Which looks like this and actually links back to its parent attribute

[image: screenshot of IntelliSense for a _VALUES_ field]
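
For readers who want to see how a comment like the one above could be produced, here is a minimal sketch of rendering it from an attribute id and a value. `toConstCase` and `renderEnumValue` are hypothetical helpers written for illustration, not the generator's template logic. The @experimental line is left out for brevity.

```typescript
// Hypothetical: turns a dotted id such as 'android.state' into 'ANDROID_STATE'.
function toConstCase(id: string): string {
  return id.replace(/[^A-Za-z0-9]+/g, '_').toUpperCase();
}

// Hypothetical: renders the doc comment and constant for one enum value,
// linking back to the parent attribute constant.
function renderEnumValue(
  attrId: string,
  valueId: string,
  value: string
): string {
  const attrConst = `ATTR_${toConstCase(attrId)}`;
  const valueConst = `${toConstCase(attrId)}_VALUES_${toConstCase(valueId)}`;
  return [
    '/**',
    ` * Enum value '${value}' for attribute {@link ${attrConst}}.`,
    ' */',
    `export const ${valueConst} = '${value}';`,
  ].join('\n');
}

const rendered = renderEnumValue('android.state', 'created', 'created');
```

The `{@link ...}` tag is what lets the editor tooltip point back at the parent attribute.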


codecov bot commented May 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.04%. Comparing base (ecc88a3) to head (9bd9802).
Report is 25 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4690   +/-   ##
=======================================
  Coverage   91.04%   91.04%           
=======================================
  Files          89       89           
  Lines        1954     1954           
  Branches      416      416           
=======================================
  Hits         1779     1779           
  Misses        175      175           

Member

@JamieDanielson JamieDanielson left a comment

So far this is looking really great, thanks @dyladan and thanks @trentm for the review so far.

I like ATTR better than SEMATTRS and SEMRESATTRS and wish I had realized this sooner and commented on the original PR that introduced them. I'm not clear what the full benefit is of having those other prefixes, although it may have been more relevant before they were in the global registry.

I only kept the ATTR and METRIC prefix just to make it easier to find the value you want when autocompleting and not get confused.

I'm not sure I understand the value of having ATTR prefix for attributes but no prefix for values. In that case I'd think they could be prefixed as well, or prefix neither.

@JamieDanielson
Member

For the main export import {} from '@opentelemetry/semantic-conventions' should ALL semconv be exported experimental and stable, or should only the stable be exported and experimental would be imported from @opentelemetry/semantic-conventions/experimental?

🤔 The benefit of keeping experimental attributes in /experimental subdirectory is that we are making it very explicit that it is an experimental attribute. I guess the downside is when the experimental attribute becomes stable, the consumer of that would have to update their code when they upgrade packages?

Member Author

dyladan commented May 13, 2024

So far this is looking really great, thanks @dyladan and thanks @trentm for the review so far.

I like ATTR better than SEMATTRS and SEMRESATTRS and wish I had realized this sooner and commented on the original PR that introduced them. I'm not clear what the full benefit is of having those other prefixes, although it may have been more relevant before they were in the global registry.

Exactly. Previously there was some chance (although probably it wouldn't have happened) that the same attribute could have been defined for different signals. I think the most reasonable way this could have happened would be for an attribute to have bounded specific values for metrics to control cardinality, but be unbounded for other signals or resources.

I only kept the ATTR and METRIC prefix just to make it easier to find the value you want when autocompleting and not get confused.

I'm not sure I understand the value of having ATTR prefix for attributes but no prefix for values. In that case I'd think they could be prefixed as well, or prefix neither.

The way we have it in this PR values have a postfix (actually an infix between the enum name and the value name). It provides separation between the enum name and the value name so it is distinguishable easily. For example, HOST_TYPE_LINUX is less obvious to me than HOST_TYPE_VALUE_LINUX where it is clear that LINUX is the value for the HOST_TYPE enum (these are fake attributes I just made up to prove a point).

@JamieDanielson
Member

For the main export import {} from '@opentelemetry/semantic-conventions' should ALL semconv be exported experimental and stable, or should only the stable be exported and experimental would be imported from @opentelemetry/semantic-conventions/experimental?

🤔 The benefit of keeping experimental attributes in /experimental subdirectory is that we are making it very explicit that it is an experimental attribute. I guess the downside is when the experimental attribute becomes stable, the consumer of that would have to update their code when they upgrade packages?

I guess Java has a separate package for experimental attributes - there's a semconv in instrumentation-api, and a semconv in instrumentation-api-incubator. Python also has semconv in incubating separate from semconv stable. Go seems to have it all in one.

Member Author

dyladan commented May 13, 2024

🤔 The benefit of keeping experimental attributes in /experimental subdirectory is that we are making it very explicit that it is an experimental attribute. I guess the downside is when the experimental attribute becomes stable, the consumer of that would have to update their code when they upgrade packages?

This is the definition of experimental... It also would force users to at least consider if they need to make a change. If the semconv attributes you're using change it might be good to force our users to acknowledge that by changing to the stable export. If they can get all from a single export they may never notice if something is renamed/deprecated.

I guess Java has a separate package for experimental attributes - there's a semconv in instrumentation-api, and a semconv in instrumentation-api-incubator. Python also has semconv in incubating separate from semconv stable. Go seems to have it all in one.

I think a single package with multiple entry points is roughly equivalent to having separate packages and less overhead. Go has all in one but they export each version separately so you have to do something to get the new semconv version.

Contributor

trentm commented May 13, 2024

I guess Java has a separate package for experimental attributes

My understanding of the OTel Java team's recommendations/requirements is that they do not allow a stable instrumentation package to have a dependency on the instrumentation-api-incubating package. They instead suggest the instrumentation have a copy of the experimental attributes in its own package code. This means that a user of the (non-experimental) semconv package is never broken by a semver-minor update of the package.

I guess we could get the equivalent by either (a) never using the "../experimental" entry point in stable instrumentation packages, or (b) pinning the @opentelemetry/semantic-conventions dep to a particular minor in packages that do.

I think a single package with multiple entry points is roughly equivalent to having separate packages and less overhead.

Agreed.

Go has all in one but they export each version separately so you have to do something to get the new semconv version.

This PR beat me to an attempt to update the semconv package. FWIW, I had been considering having separate entry points for each semconv version. See #4572 (comment)
I'm not advocating that option over this PR, however.

Contributor

trentm commented May 13, 2024

I ended up with something like this: [screenshot of intellisense for a _VALUES_ field]

Nice. That looks good.

I only kept the ATTR and METRIC prefix just to make it easier to find the value you want when autocompleting and not get confused.

My soft vote is for no prefixes. The way I was thinking/expecting developers would use semconv values was to (a) have a semantic-conventions document open (e.g. https://opentelemetry.io/docs/specs/semconv/http/http-metrics/) and see a string (e.g. http.server.request.duration) and (b) then want to be able to import HTTP_SERVER_REQUEST_DURATION.

IIUC, autocomplete will show ATTR_HTTP_* and METRIC_HTTP_* values when typing HTTP so I think it is fine for autocomplete either way. Having the METRIC_ does help the developer that knows they are scoped to metrics stuff. ATTR_ feels out of place for non-metrics, non-logs stuff.

Another small reason is that I like the shorter names in code.

This is a soft vote though. I don't have a very strong reaction to ATTR_.

Contributor

trentm commented May 13, 2024

The way we have it in this PR values have a postfix (actually an infix between the enum name and the value name)

I like the _VALUES_ infix.
I'm not sure if it reads better as _VALUE_ (singular).

Member Author

dyladan commented May 22, 2024

@MSNev what would you think of combining the _VALUES_ infix option in this PR with an ENUM_ prefix? It would look something like this:

export const ATTR_LOG_IOSTREAM = 'log.iostream';
export const ENUM_LOG_IOSTREAM_VALUES_STDOUT = 'stdout';
export const ENUM_LOG_IOSTREAM_VALUES_STDERR = 'stderr';

edit: I like someone's suggestion above to use singular _VALUE_ instead of plural _VALUES_ so I'd probably make that change

Member Author

dyladan commented May 23, 2024

Reverting to draft while we wait on a resolution from open-telemetry/semantic-conventions#1031

Contributor

trentm commented May 23, 2024

Interestingly, perhaps, I just noticed that the recently updated Python semconv generation appends _TEMPLATE to the const name if the field is type: template[string[]]. So, for example, http.request.header:

      - id: request.header
        stability: stable
        type: template[string[]]
        brief: >
          HTTP request headers, `<key>` being the normalized HTTP Header name (lowercase), the value being the header values.

Is HTTP_REQUEST_HEADER_TEMPLATE and not HTTP_REQUEST_HEADER
https://github.com/open-telemetry/opentelemetry-python/blob/8b80a28e825b102417eceb429f64d5ce52f3c2e7/scripts/semconv/templates/semantic_attributes.j2#L24

Member Author

dyladan commented May 24, 2024

Interestingly, perhaps, I just noticed that the recently updated Python semconv generation appends _TEMPLATE to the const name if the field is type: template[string[]]. So, for example, http.request.header:

      - id: request.header
        stability: stable
        type: template[string[]]
        brief: >
          HTTP request headers, `<key>` being the normalized HTTP Header name (lowercase), the value being the header values.

Is HTTP_REQUEST_HEADER_TEMPLATE and not HTTP_REQUEST_HEADER https://github.com/open-telemetry/opentelemetry-python/blob/8b80a28e825b102417eceb429f64d5ce52f3c2e7/scripts/semconv/templates/semantic_attributes.j2#L24

Yeah we're actually ignoring those for now. I was going to add them in a follow-up because they are handled differently
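
For context, one possible shape for template attributes is sketched below. It is not what this PR does, since templates are skipped here. The `_TEMPLATE` suffix is borrowed from the Python naming mentioned above, while `ATTR_HTTP_REQUEST_HEADER_TEMPLATE` and `httpRequestHeaderAttr` are hypothetical names. Lowercasing the header name follows the brief quoted above, and any further normalization is left out.

```typescript
// Hypothetical constant holding the template prefix (everything before <key>).
const ATTR_HTTP_REQUEST_HEADER_TEMPLATE = 'http.request.header';

// Hypothetical helper that fills in the <key> part of the template.
// Assumption: normalization is limited to lowercasing, per the brief.
function httpRequestHeaderAttr(headerName: string): string {
  return `${ATTR_HTTP_REQUEST_HEADER_TEMPLATE}.${headerName.toLowerCase()}`;
}

const contentTypeKey = httpRequestHeaderAttr('Content-Type');
// contentTypeKey is 'http.request.header.content-type'
```

Keeping the prefix as a constant and building the full key in a helper is only one option. The open question in this thread is whether such helpers belong in the generated package at all.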

Member Author

dyladan commented May 29, 2024

@MSNev does the limitation on attribute values not being the same as attribute namespaces mentioned by @lmolkova in open-telemetry/semantic-conventions#1064 ease your concerns with enum names? There should be no collision.

Contributor

MSNev commented May 29, 2024

@MSNev does the limitation on attribute values not being the same as attribute namespaces mentioned by @lmolkova in open-telemetry/semantic-conventions#1064 ease your concerns with enum names? There should be no collision.

If we want to go with changing the names of the values to full screaming snake case rather than the existing <attribute name screaming>_<value name as screaming snake case> (e.g. other languages use camel case for the names), then, as we are already prefixing all attributes with ATTR_, I think the better option would be your option 3: use a more generic prefix (as they may not necessarily be considered to be enums) like VAL_ or VALUE_

so

export const ATTR_LOG_IOSTREAM = 'log.iostream';
export const LOGIOSTREAM_STDOUT = 'stdout';
export const LOGIOSTREAM_STDERR = 'stderr';

becomes

export const ATTR_LOG_IOSTREAM = 'log.iostream';
export const VAL_LOG_IOSTREAM_STDOUT = 'stdout';
export const VAL_LOG_IOSTREAM_STDERR = 'stderr';

This way there would always be zero chance of any conflict, vs the infix option. This would also work for whatever the outcome of the client.id / client_id resolution will be (which looks like the recommendation will be '_' -> '__', with the final option up to each language as not all languages use snake case).

Personally, I prefer the existing approach (but I guess I'm a little biased as that was my original choice) of converting the CamelCased values classes to the combination to avoid clashes 😀

Contributor

MSNev commented Jun 5, 2024

General comment on this from the description

All names are constants now. Removes requirement for all the weird type stuff (sorry @MSNev I know you spent a lot of time on that)

This was actually always the goal, the namespace "fun" was just part of the stepping stones to move forward and to try and keep the generated package as small as possible without just duplicating the string.

Member Author

dyladan commented Jun 5, 2024

General comment on this from the description

All names are constants now. Removes requirement for all the weird type stuff (sorry @MSNev I know you spent a lot of time on that)

This was actually always the goal, the namespace "fun" was just part of the stepping stones to move forward and to try and keep the generated package as small as possible without just duplicating the string.

I decided to just duplicate it. I think not long from now we'll go 2.0 and remove the namespace fun entirely.

Contributor

trentm commented Jun 5, 2024

then as we are already prefixing all attributes with ATTR_ I think the better option would be your option 3 [...] like VAL_ or VALUE_ [...]

export const ATTR_LOG_IOSTREAM = 'log.iostream';
export const VAL_LOG_IOSTREAM_STDOUT = 'stdout';
export const VAL_LOG_IOSTREAM_STDERR = 'stderr';

Comparing this to other options:

export const ATTR_LOG_IOSTREAM = 'log.iostream';
export const LOG_IOSTREAM_VALUES_STDOUT = 'stdout';
export const LOG_IOSTREAM_VALUES_STDERR = 'stderr';

or perhaps singular VALUE:

export const ATTR_LOG_IOSTREAM = 'log.iostream';
export const LOG_IOSTREAM_VALUE_STDOUT = 'stdout';
export const LOG_IOSTREAM_VALUE_STDERR = 'stderr';

I can understand the desire for a prefix (e.g. VAL_), given ATTR_ is a prefix (was there another reason for that preference?). However, I like that the "VALUE(S)_" string separates the attribute name and value in the latter options.

The infix _VALUES_ separation is more helpful with an enum value that includes a ".". However that is very rare -- only the deprecated HTTP_FLAVOR enum values include a "." in their IDs.

export const VAL_HTTP_FLAVOR_HTTP1_0 = '1.0' as const;
export const VAL_HTTP_FLAVOR_HTTP1_1 = '1.1' as const;
export const VAL_HTTP_FLAVOR_HTTP2_0 = '2.0' as const;

I have a slight preference for infix _VALUE_, but not a strong aversion to the other options.

Member Author

dyladan commented Jun 6, 2024

I have a slight preference for infix _VALUE_, but not a strong aversion to the other options.

I tend to agree with you

Member Author

dyladan commented Jun 6, 2024

@trentm wdyt about 2dce2eb

Contributor

trentm commented Jun 6, 2024

wdyt about 2dce2eb

@dyladan Yah, I don't hate ENUM_{attr}_VALUE_{valueid}. It reads well. It does make the constants longer, which may lead to ellipses in auto-complete UIs and more line wrapping in code that uses a max-line-length.

Using "VALUE" in an auto-complete UI to filter on the enum values suffices, so having ENUM_ isn't strictly needed.

Part of me still wants to drop the ATTR_ prefix, but I see its utility in filtering out the _VALUE_-enums in an auto-complete UI.

My vote is for:

ATTR_{attr}   // e.g. ATTR_LOG_IOSTREAM
{attr}_VALUE_{valueid}  // e.g. LOG_IOSTREAM_VALUE_STDOUT

but at this point I'm fine with almost any of the variants on this.
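
To make the variants discussed in this thread easier to compare, here is a small sketch that derives each candidate name from an attribute id and a value id. All function and type names are hypothetical. This is not the generator's code, only the naming rules written out.

```typescript
// The three value-naming styles discussed in this thread.
type ValueNameStyle = 'infix' | 'prefix' | 'enum';

// Hypothetical: turns a dotted id such as 'log.iostream' into 'LOG_IOSTREAM'.
function toConstCase(id: string): string {
  return id.replace(/[^A-Za-z0-9]+/g, '_').toUpperCase();
}

// ATTR_{attr}
function attrConstName(attrId: string): string {
  return `ATTR_${toConstCase(attrId)}`;
}

// 'infix'  -> {attr}_VALUE_{valueid}
// 'prefix' -> VAL_{attr}_{valueid}
// 'enum'   -> ENUM_{attr}_VALUE_{valueid}
function valueConstName(
  attrId: string,
  valueId: string,
  style: ValueNameStyle = 'infix'
): string {
  const attr = toConstCase(attrId);
  const value = toConstCase(valueId);
  switch (style) {
    case 'prefix':
      return `VAL_${attr}_${value}`;
    case 'enum':
      return `ENUM_${attr}_VALUE_${value}`;
    default:
      return `${attr}_VALUE_${value}`;
  }
}

const attrName = attrConstName('log.iostream'); // 'ATTR_LOG_IOSTREAM'
const infixName = valueConstName('log.iostream', 'stdout'); // 'LOG_IOSTREAM_VALUE_STDOUT'
const prefixName = valueConstName('log.iostream', 'stdout', 'prefix'); // 'VAL_LOG_IOSTREAM_STDOUT'
const enumName = valueConstName('log.iostream', 'stdout', 'enum'); // 'ENUM_LOG_IOSTREAM_VALUE_STDOUT'
```

Only the infix and enum styles keep a marker between the attribute part and the value part, which is the separation argued for above.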


4 participants