Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retries to DisableCrlValidation test to improve reliability #5537

Merged
merged 9 commits into from
May 16, 2024

Conversation

ahsonkhan
Copy link
Member

@ahsonkhan ahsonkhan commented Apr 16, 2024

Fix #5533

It's easiest to review the actual changes in the PR without whitespaces:
https://github.com/Azure/azure-sdk-for-cpp/pull/5537/files?diff=split&w=1

@ahsonkhan ahsonkhan added Azure.Core test-reliability Issue that causes tests to be unreliable labels Apr 16, 2024
@ahsonkhan ahsonkhan self-assigned this Apr 16, 2024
@ahsonkhan ahsonkhan marked this pull request as ready for review April 16, 2024 05:51
@RickWinter
Copy link
Member

Why is this only relevant in this test? I would expect we'd need to handle this problem in the product as well.

How do we know the transport exception is due to the network and not how the request is formulated?

@ahsonkhan
Copy link
Member Author

ahsonkhan commented Apr 16, 2024

Why is this only relevant in this test?

The issue linked has more info. This is the only test that's failing intermittently due to the CRL validation check timing out for our test site. This test enables that TransportOptions EnableCertificateRevocationListCheck setting. For others, this setting is off by default (or use "real" sites).

I would expect we'd need to handle this problem in the product as well.

Ability to deal with transient network failures is handled in the product, via the RetryPolicy. This test is directly calling the transport layer. See the pipeline being used in the tests:

std::vector<std::unique_ptr<Azure::Core::Http::Policies::HttpPolicy>> pipelinePolicies;
pipelinePolicies.push_back(
std::make_unique<Azure::Core::Http::Policies::_internal::TransportPolicy>(
transportOptions));

How do we know the transport exception is due to the network and not how the request is formulated?

Because it is intermittent, at 95%+ success rate, and we know the error. The CRL validation check is hitting a transient timeout. If there was an issue with the request, it would be more determinsitic and we'd see different error behavior.

@ahsonkhan
Copy link
Member Author

/azp run cpp - core - ci

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@ahsonkhan
Copy link
Member Author

@danieljurek CI pipeline runs keep getting canceled after 45 minutes. Can you please help with this? Otherwise green runs aren't showing up as passing.

@danieljurek
Copy link
Member

danieljurek commented Apr 16, 2024

The win2022 image has been deleted at that version. we'll have to roll forward to the latest image which has the problems detailed here: #5483 ... I'm working on removing unneeded packages based on the mitigation described here: actions/runner-images#9701

Copy link
Member

@antkmsft antkmsft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be merged now?

@ahsonkhan ahsonkhan enabled auto-merge (squash) May 16, 2024 17:17
@ahsonkhan ahsonkhan disabled auto-merge May 16, 2024 17:47
@ahsonkhan ahsonkhan enabled auto-merge (squash) May 16, 2024 17:51
@ahsonkhan ahsonkhan merged commit b7ae7b8 into main May 16, 2024
41 checks passed
@ahsonkhan ahsonkhan deleted the ahsonkhan-patch-1 branch May 16, 2024 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Azure.Core test-reliability Issue that causes tests to be unreliable
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intermittent test failure for TransportAdapterOptions.DisableCrlValidation - unable to get certificate CRL
5 participants