Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Implement spans exporting for ClickHouse storage in Jaeger V2 #4941

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

haanhvu
Copy link
Contributor

@haanhvu haanhvu commented Nov 12, 2023

Which problem is this PR solving?

Part of #5058

@haanhvu haanhvu requested a review from a team as a code owner November 12, 2023 15:18
@haanhvu
Copy link
Contributor Author

haanhvu commented Nov 12, 2023

@yurishkuro This still needs some supplements and a lot of cleanups and tests. I'm still working on them. But could you take an initial look to see if I'm going the right way? Thanks!

@haanhvu haanhvu changed the title [WIP] Implement spans exporting for ClickHouse storage for Jaeger V2 [WIP] Implement spans exporting to ClickHouse storage for Jaeger V2 Nov 12, 2023
@haanhvu haanhvu changed the title [WIP] Implement spans exporting to ClickHouse storage for Jaeger V2 [WIP] Implement spans exporting for ClickHouse storage in Jaeger V2 Nov 12, 2023
}
}

func (f *Factory) ExportSpans(ctx context.Context, td ptrace.Traces) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is ExportSpans defined on the factory instead of span store?

Copy link
Contributor Author

@haanhvu haanhvu Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is defined in plugin/storage/clickhouse/spanstore/exporter.go (link). But Jaeger V2's exporter is depending on the real type of Factory to decide whether to call SpanWriter.WriteSpan() or ExportSpans() (link). So I think making ExportSpans() a func of clickhouse Factory may be a good idea.

Of course, a better idea is to define a new API for ExportSpans(), but maybe we can do that later? Or you have a better idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a better idea, but it is not well formed yet, I need to write it down and maybe put together the new interfaces. Will try to do this later this weekend. Are you blocked on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I don't think this blocks the progress. Can continue.

@k0zl
Copy link
Contributor

k0zl commented Nov 29, 2023

👍 I've been waiting clickhouse support for a long time. Is there anything I can help you with?

@haanhvu
Copy link
Contributor Author

haanhvu commented Nov 30, 2023

👍 I've been waiting clickhouse support for a long time. Is there anything I can help you with?

@k0zl Sure! For this PR (spans exporting), the remaining parts are:

  • Finish schema implementation in plugin/storage/clickhouse/spanstore/schema.go. We described our schema choices in our benchmark report. As you can see in the current schema.go, I just implemented the spans table, not the materialized views yet.
  • Add tests: I'm pushing an unit factory_test.go this week. There are other unit tests, e2e test, or even local test with Jaeger v2 binary to do. (Jaeger v2 issue/decription is here)

After finishing with spans exporting, we'll move to spans reading,...

Also, since clickhouse storage is supported in Jaeger v2, to make this realeased you could also help with Jaeger v2 roadmap, in the issue I linked above.

Please choose whatever interests you. Thanks!

@yurishkuro
Copy link
Member

@haanhvu it would be useful to write a design doc / roadmap documenting what we're doing and what tasks need to be done, are you up for that? You can model it after the Jaeger V2 document.

@haanhvu
Copy link
Contributor Author

haanhvu commented Nov 30, 2023

@haanhvu it would be useful to write a design doc / roadmap documenting what we're doing and what tasks need to be done, are you up for that? You can model it after the Jaeger V2 document.

Yeah I'll create an issue for supporting clickhouse in jaeger v2 tomorrow

Copy link

codecov bot commented Dec 7, 2023

Codecov Report

Attention: Patch coverage is 0% with 133 lines in your changes missing coverage. Please review.

Project coverage is 53.45%. Comparing base (dd4ccea) to head (d9f4678).

Files Patch % Lines
plugin/storage/clickhouse/spanstore/exporter.go 0.00% 59 Missing ⚠️
plugin/storage/clickhouse/config.go 0.00% 29 Missing ⚠️
plugin/storage/clickhouse/factory.go 0.00% 23 Missing ⚠️
...eger/internal/extension/jaegerstorage/extension.go 0.00% 19 Missing ⚠️
plugin/storage/clickhouse/spanstore/schema.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4941       +/-   ##
===========================================
- Coverage   96.20%   53.45%   -42.75%     
===========================================
  Files         327      165      -162     
  Lines       16014     8491     -7523     
===========================================
- Hits        15406     4539    -10867     
- Misses        432     3506     +3074     
- Partials      176      446      +270     
Flag Coverage Δ
badger_v1 8.05% <ø> (ø)
badger_v2 1.90% <0.00%> (-0.04%) ⬇️
cassandra-3.x-v1 16.45% <ø> (ø)
cassandra-3.x-v2 1.82% <0.00%> (-0.03%) ⬇️
cassandra-4.x-v1 16.45% <ø> (ø)
cassandra-4.x-v2 1.82% <0.00%> (-0.03%) ⬇️
elasticsearch-7.x-v1 18.89% <ø> (ø)
elasticsearch-8.x-v1 19.07% <ø> (-0.02%) ⬇️
elasticsearch-8.x-v2 19.09% <ø> (+0.01%) ⬆️
grpc_v1 9.47% <ø> (-0.02%) ⬇️
grpc_v2 7.41% <0.00%> (-0.13%) ⬇️
kafka 9.77% <ø> (ø)
opensearch-1.x-v1 18.94% <ø> (+0.01%) ⬆️
opensearch-2.x-v1 18.94% <ø> (ø)
opensearch-2.x-v2 18.93% <ø> (-0.02%) ⬇️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@haanhvu
Copy link
Contributor Author

haanhvu commented Dec 29, 2023

@yurishkuro Could you take a look at the failed all-in-one build? make all-in-one-integration-test passed locally for me. Not sure why it fails in CI.

@yurishkuro yurishkuro added the changelog:new-feature Change that should be called out as new feature in CHANGELOG label Dec 29, 2023
@yurishkuro yurishkuro marked this pull request as draft December 29, 2023 17:16
@yurishkuro
Copy link
Member

I am seeing error

Error: failed to build pipelines: failed to create "jaeger_storage_exporter" exporter for data type "traces": nil PushTraces
2023/12/29 16:57:59 failed to build pipelines: failed to create "jaeger_storage_exporter" exporter for data type "traces": nil PushTraces
+ exit 1

@haanhvu
Copy link
Contributor Author

haanhvu commented Dec 29, 2023

I am seeing error

Error: failed to build pipelines: failed to create "jaeger_storage_exporter" exporter for data type "traces": nil PushTraces
2023/12/29 16:57:59 failed to build pipelines: failed to create "jaeger_storage_exporter" exporter for data type "traces": nil PushTraces
+ exit 1

@yurishkuro This error didn't come up in my make all-in-one-integration-test. Probably because my make all-in-one-integration-test checked v1, not v2? How to make sure this command checks v2?

@yurishkuro
Copy link
Member

This error didn't come up in my make all-in-one-integration-test

This test doesn't just run on its own, it expects the binary running, so how are you starting the binary?

For example, this works:

# terminal 1
$ go run -tags=ui ./cmd/jaeger

# terminal 2
$ SKIP_SAMPLING=true make all-in-one-integration-test

But this is running with default aio config for OTEL, whereas you need to provide your own config in order to test the same against CH.

@haanhvu
Copy link
Contributor Author

haanhvu commented Jan 1, 2024

go run -tags=ui ./cmd/jaeger

Ah, I ran the binary with make run-all-in-one... That's why v2 wasn't checked... Let me try again.

@haanhvu
Copy link
Contributor Author

haanhvu commented Jan 1, 2024

But this is running with default aio config for OTEL, whereas you need to provide your own config in order to test the same against CH.

The nil Pushtraces error with default aio config has been resolved.

So now I'll have to run clickhouse server, make my own jaeger v2 config (that exports to clickhouse) and go run -tags=ui ./cmd/jaeger --config=config.yaml right? I'll try it tomorrow.

Copy link

github-actions bot commented Jan 2, 2024

Test Results

2 054 tests  +2   2 044 ✅ +2   1m 9s ⏱️ ±0s
  221 suites +2      10 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit f7c95ed. ± Comparison against base commit d489e3a.

♻️ This comment has been updated with latest results.

@haanhvu
Copy link
Contributor Author

haanhvu commented Jan 2, 2024

@yurishkuro I ran the binary with my config.yaml, spans were exported successfully to clickhouse:
Screenshot from 2024-01-03 03-07-22

Should we add an e2e test now, or wait until spans reading are implemented?

Also, can you check the unit tests? I just have one factory_test.go, no test files for config.go or spanstore because factory already calls config and spanstore.

Pls check if there's anything else you think we should test in this PR.

@yurishkuro
Copy link
Member

Should we add an e2e test now, or wait until spans reading are implemented?

You wouldn't be able to do e2e without reading, that's how the tests verify that traces are saved.

@haanhvu haanhvu force-pushed the ch-pushtraces branch 3 times, most recently from 34f6464 to e498cdb Compare June 13, 2024 07:06
Signed-off-by: haanhvu <[email protected]>
@haanhvu
Copy link
Contributor Author

haanhvu commented Jun 13, 2024

@yurishkuro I rebased and made a little change following what we're having with other V2 storage. I checked the change locally and it worked:
Screenshot from 2024-06-13 15-15-16

There's one goleak failure in unit tests. However, the goleak failure is in database/sql package, not our code. Should we ignore or do what with this?

Also, the code coverage is a bit skewed. Seems like codecov checks per file or per package and crossed-files and cross-packages function calls are not counted. Do you want to rearrange the files/packages so that code coverage is more accurate?

@yurishkuro
Copy link
Member

There's one goleak failure in unit tests. However, the goleak failure is in database/sql package, not our code. Should we ignore or do what with this?

We shouldn't ignore leaks. Most likely it's from not closing resources in tests properly, you can see which goroutine is remaining running and understand where it starts. There are rare cases where libraries introduce unsolvable leaks (like glog), but I am skeptical database/sql would be such lib.

The codecov for storage is usually combined from unit tests and e2e tests.

Note that V2 Storage API already exists, and we have an adapter to v1 that will be used by the jaegerexporter. The Clickhouse implementation can be the first one to be natively supporting V2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog:new-feature Change that should be called out as new feature in CHANGELOG
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants