chore(ci): Switch to Confluent docker images since wurstmeister ones disappeared #20465
Conversation
…disappeared Signed-off-by: Jesse Szwedko <[email protected]>
Tests don't provide client certs Signed-off-by: Jesse Szwedko <[email protected]>
Signed-off-by: Jesse Szwedko <[email protected]>
- KAFKA_LISTENERS=PLAINTEXT://:9091,SSL://:9092,SASL_PLAINTEXT://:9093
- KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9091,SSL://kafka:9092,SASL_PLAINTEXT://kafka:9093
- KAFKA_SSL_KEYSTORE_TYPE=PKCS12
- KAFKA_SSL_KEYSTORE_LOCATION=/certs/kafka.p12
- KAFKA_SSL_KEYSTORE_PASSWORD=NOPASS
- KAFKA_SSL_TRUSTSTORE_TYPE=PKCS12
The tests aren't actually providing a valid client certificate, so I removed the truststore bits and set KAFKA_SSL_CLIENT_AUTH=none.
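For context, a hedged sketch of what the SSL listener environment might look like after this change; the variable names follow common Kafka docker image conventions, and the exact final values are assumptions rather than a copy of the merged diff:

```yaml
# Hypothetical post-change broker environment: the truststore entries are
# gone and client certificate verification is disabled, since the tests
# never present client certs.
environment:
  - KAFKA_LISTENERS=PLAINTEXT://:9091,SSL://:9092,SASL_PLAINTEXT://:9093
  - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9091,SSL://kafka:9092,SASL_PLAINTEXT://kafka:9093
  - KAFKA_SSL_KEYSTORE_TYPE=PKCS12
  - KAFKA_SSL_KEYSTORE_LOCATION=/certs/kafka.p12
  - KAFKA_SSL_CLIENT_AUTH=none  # no truststore needed when clients aren't verified
```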
- KAFKA_OPTS=-Djava.security.auth.login.config=/etc/kafka/kafka_server_jaas.conf
- KAFKA_INTER_BROKER_LISTENER_NAME=SASL_PLAINTEXT
I removed these seemingly unneeded options.
ports:
- 9091:9091
- 9092:9092
- 9093:9093
volumes:
- ../../../tests/data/ca/intermediate_server/private/kafka.p12:/certs/kafka.p12:ro
- ../../../tests/data/ca/intermediate_server/private/kafka.pass:/etc/kafka/secrets/kafka.pass:ro
Confluent images require a file to supply the password.
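A hedged sketch of the usual Confluent cp-kafka pattern for this: keystore passwords are read from credentials files under /etc/kafka/secrets rather than from a plain *_PASSWORD variable. The filenames below mirror this PR's mounts, but the exact variable set used in the final diff is an assumption:

```yaml
# Sketch (Confluent cp-kafka convention): *_FILENAME and *_CREDENTIALS
# name files relative to /etc/kafka/secrets inside the container.
environment:
  - KAFKA_SSL_KEYSTORE_FILENAME=kafka.p12
  - KAFKA_SSL_KEYSTORE_CREDENTIALS=kafka.pass
  - KAFKA_SSL_KEY_CREDENTIALS=kafka.pass
volumes:
  - ../../../tests/data/ca/intermediate_server/private/kafka.p12:/etc/kafka/secrets/kafka.p12:ro
  - ../../../tests/data/ca/intermediate_server/private/kafka.pass:/etc/kafka/secrets/kafka.pass:ro
```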
Signed-off-by: Jesse Szwedko <[email protected]>
@@ -1,5 +1,6 @@
KafkaServer {
org.apache.kafka.common.security.plain.PlainLoginModule required
serviceName="kafka"
The broker was now complaining that this was missing.
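For reference, a minimal sketch of what the full kafka_server_jaas.conf might look like with this addition; the username/password entries below are placeholders for illustration, not values from this PR:

```conf
KafkaServer {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  serviceName="kafka"
  username="admin"
  password="admin-secret"
  user_admin="admin-secret";
};
```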
Signed-off-by: Jesse Szwedko <[email protected]>
Signed-off-by: Jesse Szwedko <[email protected]>
Signed-off-by: Jesse Szwedko <[email protected]>
/ci-run-integration-kafka
src/sources/kafka.rs
"First batch of events should be non-zero (increase KAFKA_SHUTDOWN_DELAY?)" | ||
); | ||
assert_ne!(events2.len(), 0, "Second batch of events should be non-zero (decrease KAFKA_SHUTDOWN_DELAY or increase KAFKA_SEND_COUNT?) "); | ||
// TODO |
The image update seems to have resulted in the first consumer no longer receiving events before the shutdown, so I commented out the per-consumer assertions for now.
cc @jches: I'm sorry to bother you, but since you added this test, I expect you might more easily be able to figure out what's happening here.
Interesting... I wouldn't expect a different image to matter much; I think I used the bitnami Kafka images when I was doing this testing locally. IIRC, this test sets up two consumers consecutively — the first one runs and shuts down, and then the second runs — to test the "drain acknowledgements before shutdown" part.
If the first consumer isn't getting any messages, you could allow it to run for a little bit longer - the default is 2s and can be adjusted in the test code a few lines above this, or by environment variables.
Another thing worth looking into is consumer group wait times - I don't recall the exact setting but there is a configurable waiting period where the brokers will wait for more consumers to join before the consumers can do their initial partition assignment. If that's set longer than the shutdown delay here it might explain this behavior.
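The two suggestions above boil down to a timing comparison: the first consumer only gets a partition assignment after the broker's initial consumer-group rebalance delay elapses, so if it shuts down first, it consumes nothing. A hypothetical sketch (the function name is illustrative; 2s is the test's default shutdown delay mentioned above, and 3s is the broker default for group.initial.rebalance.delay.ms per KIP-134):

```python
# Sketch of the race described above: the first consumer receives events
# only if it outlives the broker's initial consumer-group rebalance delay.
def first_consumer_gets_events(shutdown_delay_s: float, rebalance_delay_s: float) -> bool:
    """Partitions are assigned only after group.initial.rebalance.delay.ms
    elapses; a consumer that shuts down before then consumes nothing."""
    return shutdown_delay_s > rebalance_delay_s

# 2s shutdown delay vs. 3s broker default: the first consumer sees no events.
print(first_consumer_gets_events(2.0, 3.0))
# Raising the shutdown delay (or lowering the broker delay) fixes it.
print(first_consumer_gets_events(5.0, 3.0))
```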
Ah, it's group.initial.rebalance.delay.ms, and it looks like the default is 3s: https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance. Not sure if the old images had a different setting that made this work.
I'd try setting that lower (1s would probably be fine in this case) in the broker config, or set the shutdown delay to 4 or 5s.
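A sketch of the broker-side fix suggested here, assuming the Confluent convention of mapping the group.initial.rebalance.delay.ms property to an upper-cased, underscore-separated environment variable (the 1000ms value is the suggestion above, not a value taken from the merged diff):

```yaml
# Sketch: lower the initial consumer-group rebalance delay so the first
# test consumer is assigned partitions before its shutdown delay expires.
environment:
  - KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=1000
```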
That did it! Thank you!!
Datadog Report. Branch report: ✅ 0 Failed, 8 Passed, 0 Skipped, 25.41s Total Time
Signed-off-by: Jesse Szwedko <[email protected]>
Regression Detector Results. Run ID: 1e026c5d-a29a-47be-9b64-2ea97e71eb81. Performance changes are noted in the perf column of each table.
No significant changes in experiment optimization goals. Confidence level: 90.00%. There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
perf | experiment | goal | Δ mean % | Δ mean % CI |
---|---|---|---|---|
➖ | syslog_humio_logs | ingress throughput | +2.54 | [+2.35, +2.73] |
➖ | fluent_elasticsearch | ingress throughput | +1.93 | [+1.45, +2.42] |
➖ | http_elasticsearch | ingress throughput | +1.09 | [+1.01, +1.16] |
➖ | datadog_agent_remap_datadog_logs | ingress throughput | +0.54 | [+0.44, +0.65] |
➖ | otlp_grpc_to_blackhole | ingress throughput | +0.53 | [+0.44, +0.62] |
➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | +0.51 | [+0.36, +0.65] |
➖ | syslog_log2metric_humio_metrics | ingress throughput | +0.50 | [+0.34, +0.66] |
➖ | syslog_splunk_hec_logs | ingress throughput | +0.42 | [+0.35, +0.48] |
➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | +0.41 | [+0.27, +0.55] |
➖ | http_to_http_acks | ingress throughput | +0.32 | [-1.03, +1.68] |
➖ | splunk_hec_route_s3 | ingress throughput | +0.24 | [-0.22, +0.69] |
➖ | syslog_loki | ingress throughput | +0.18 | [+0.10, +0.27] |
➖ | http_to_http_noack | ingress throughput | +0.15 | [+0.05, +0.25] |
➖ | otlp_http_to_blackhole | ingress throughput | +0.08 | [-0.03, +0.19] |
➖ | http_to_http_json | ingress throughput | +0.02 | [-0.06, +0.09] |
➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | +0.00 | [-0.14, +0.14] |
➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | -0.01 | [-0.15, +0.14] |
➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.05 | [-0.16, +0.07] |
➖ | enterprise_http_to_http | ingress throughput | -0.09 | [-0.15, -0.03] |
➖ | http_to_s3 | ingress throughput | -0.16 | [-0.44, +0.12] |
➖ | http_text_to_http_json | ingress throughput | -0.25 | [-0.37, -0.13] |
➖ | socket_to_socket_blackhole | ingress throughput | -0.26 | [-0.34, -0.19] |
➖ | datadog_agent_remap_blackhole_acks | ingress throughput | -0.63 | [-0.72, -0.55] |
➖ | datadog_agent_remap_blackhole | ingress throughput | -1.09 | [-1.26, -0.92] |
➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | -1.51 | [-1.61, -1.41] |
➖ | file_to_blackhole | egress throughput | -1.79 | [-4.30, +0.71] |
➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -1.86 | [-1.99, -1.73] |
Explanation
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
Regression Detector Results. Run ID: ceb74c24-fdc5-4547-a695-5c2db39b0bf6. Performance changes are noted in the perf column of each table.
No significant changes in experiment optimization goals. Confidence level: 90.00%. There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
perf | experiment | goal | Δ mean % | Δ mean % CI |
---|---|---|---|---|
➖ | splunk_hec_route_s3 | ingress throughput | +1.77 | [+1.30, +2.23] |
➖ | syslog_loki | ingress throughput | +1.75 | [+1.67, +1.82] |
➖ | datadog_agent_remap_blackhole_acks | ingress throughput | +1.73 | [+1.62, +1.84] |
➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | +1.68 | [+1.52, +1.84] |
➖ | datadog_agent_remap_blackhole | ingress throughput | +1.11 | [+0.98, +1.23] |
➖ | syslog_log2metric_humio_metrics | ingress throughput | +1.10 | [+0.96, +1.25] |
➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | +1.05 | [+0.96, +1.15] |
➖ | syslog_humio_logs | ingress throughput | +0.95 | [+0.85, +1.04] |
➖ | datadog_agent_remap_datadog_logs | ingress throughput | +0.93 | [+0.80, +1.05] |
➖ | http_to_http_acks | ingress throughput | +0.77 | [-0.60, +2.14] |
➖ | http_elasticsearch | ingress throughput | +0.57 | [+0.49, +0.65] |
➖ | fluent_elasticsearch | ingress throughput | +0.57 | [+0.07, +1.06] |
➖ | otlp_grpc_to_blackhole | ingress throughput | +0.56 | [+0.47, +0.65] |
➖ | socket_to_socket_blackhole | ingress throughput | +0.52 | [+0.43, +0.60] |
➖ | http_to_s3 | ingress throughput | +0.15 | [-0.13, +0.43] |
➖ | http_to_http_noack | ingress throughput | +0.14 | [+0.04, +0.25] |
➖ | http_to_http_json | ingress throughput | +0.03 | [-0.05, +0.10] |
➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | +0.00 | [-0.14, +0.15] |
➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | -0.00 | [-0.14, +0.14] |
➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.04 | [-0.16, +0.07] |
➖ | enterprise_http_to_http | ingress throughput | -0.08 | [-0.17, +0.00] |
➖ | file_to_blackhole | egress throughput | -0.22 | [-2.56, +2.12] |
➖ | http_text_to_http_json | ingress throughput | -0.59 | [-0.71, -0.48] |
➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -1.06 | [-1.19, -0.94] |
➖ | otlp_http_to_blackhole | ingress throughput | -1.55 | [-1.69, -1.40] |
➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | -1.98 | [-2.14, -1.82] |
➖ | syslog_splunk_hec_logs | ingress throughput | -4.11 | [-4.21, -4.00] |
Signed-off-by: Jesse Szwedko <[email protected]>
Regression Detector Results. Run ID: 906aff07-c8a9-41aa-a189-1a0d5b116954. Performance changes are noted in the perf column of each table.
No significant changes in experiment optimization goals. Confidence level: 90.00%. There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.
perf | experiment | goal | Δ mean % | Δ mean % CI |
---|---|---|---|---|
➖ | syslog_splunk_hec_logs | ingress throughput | +1.94 | [+1.84, +2.04] |
➖ | file_to_blackhole | egress throughput | +1.27 | [-1.28, +3.82] |
➖ | fluent_elasticsearch | ingress throughput | +1.20 | [+0.71, +1.68] |
➖ | http_elasticsearch | ingress throughput | +1.19 | [+1.08, +1.29] |
➖ | syslog_humio_logs | ingress throughput | +1.04 | [+0.87, +1.20] |
➖ | syslog_loki | ingress throughput | +0.80 | [+0.71, +0.89] |
➖ | syslog_log2metric_humio_metrics | ingress throughput | +0.79 | [+0.65, +0.93] |
➖ | datadog_agent_remap_datadog_logs | ingress throughput | +0.58 | [+0.47, +0.70] |
➖ | datadog_agent_remap_blackhole_acks | ingress throughput | +0.24 | [+0.11, +0.37] |
➖ | otlp_http_to_blackhole | ingress throughput | +0.06 | [-0.07, +0.20] |
➖ | http_to_http_noack | ingress throughput | +0.06 | [-0.03, +0.14] |
➖ | http_to_http_json | ingress throughput | +0.05 | [-0.03, +0.13] |
➖ | syslog_regex_logs2metric_ddmetrics | ingress throughput | +0.01 | [-0.19, +0.22] |
➖ | splunk_hec_to_splunk_hec_logs_acks | ingress throughput | +0.00 | [-0.14, +0.14] |
➖ | splunk_hec_indexer_ack_blackhole | ingress throughput | +0.00 | [-0.15, +0.15] |
➖ | enterprise_http_to_http | ingress throughput | -0.05 | [-0.14, +0.04] |
➖ | splunk_hec_to_splunk_hec_logs_noack | ingress throughput | -0.06 | [-0.17, +0.06] |
➖ | syslog_log2metric_tag_cardinality_limit_blackhole | ingress throughput | -0.25 | [-0.41, -0.10] |
➖ | http_to_s3 | ingress throughput | -0.30 | [-0.58, -0.02] |
➖ | otlp_grpc_to_blackhole | ingress throughput | -0.40 | [-0.49, -0.31] |
➖ | syslog_log2metric_splunk_hec_metrics | ingress throughput | -0.40 | [-0.56, -0.25] |
➖ | datadog_agent_remap_datadog_logs_acks | ingress throughput | -0.53 | [-0.62, -0.45] |
➖ | http_text_to_http_json | ingress throughput | -0.61 | [-0.75, -0.46] |
➖ | socket_to_socket_blackhole | ingress throughput | -0.75 | [-0.84, -0.67] |
➖ | datadog_agent_remap_blackhole | ingress throughput | -0.82 | [-0.92, -0.71] |
➖ | http_to_http_acks | ingress throughput | -1.41 | [-2.77, -0.06] |
➖ | splunk_hec_route_s3 | ingress throughput | -1.56 | [-2.04, -1.07] |