
[CI] test_redirect_io.rb - fixup for intermittent failures #3157

Merged: 1 commit into puma:master, May 17, 2023

Conversation

MSP-Greg (Member)

Description

Puma's test suite runs with 'retry', which is helpful when CI systems are not 'resource robust'. Recent additions to CI log all test 'retry' failures. test_redirect_io.rb appears intermittently in these logs.

Hence this fixup, in the hope of reducing or eliminating its 'retry' failures.

  1. This test file is about log files. Currently it opens the log file, checks whether it can be read, closes it, and only later performs a File.read. The read should happen inside the File.open block (see the sketch after this list).
  2. The current File.open block may run forever, as it has no limit on the retry count.
  3. Reduce redundant code.
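
A minimal sketch (not the PR's actual code) of the pattern items 1 and 2 describe: the read happens inside the File.open block, and the wait loop is bounded so it cannot run forever. The method name, retry count, and sleep interval below are illustrative assumptions.

    MAX_LOG_RETRIES = 10  # hypothetical cap so the wait loop cannot spin forever

    def wait_for_log(path)
      retries = 0
      begin
        File.open(path) do |f|
          raise Errno::EAGAIN if f.size.zero?  # nothing written yet, retry below
          f.read                               # read inside the open block
        end
      rescue Errno::ENOENT, Errno::EAGAIN
        raise "log #{path} never became readable" if (retries += 1) >= MAX_LOG_RETRIES
        sleep 0.1
        retry
      end
    end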

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

@MSP-Greg (Member, Author)

Why I'm very frustrated with the test suite:

https://github.com/MSP-Greg/puma/actions?query=branch%3A00-test-redirect-io.rb

The tests passed in my fork before I submitted this PR. See test run 1036 in the above link. test_redirect_io.rb does not appear in the 'retry' logs.

After the failure here, I ran another job (using 'workflow_dispatch'), run 1037, which also passed.

The current test suite has too many blocking operations and creates too many threads.

When using an 'in process' server, blocking operations often stop the server. When using a 'spawned/integration' server, many of the tests run in parallel (in threads), so blocking operations interfere with the parallel execution.

As for threads, they easily block other threads, and they take resources (and time) to create.
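
An illustration only (not code from this PR or Puma's test helpers) of why bounded waits matter here: an unbounded sock.read can hang the thread that is also driving an in-process server, or stall one of the parallel test threads, while a select with a timeout lets the test fail fast instead. The helper name and timeout value are assumptions.

    require 'socket'

    # Hypothetical helper: wait at most `timeout` seconds for data instead of
    # blocking indefinitely; a hung server then fails one test quickly rather
    # than stalling the whole parallel group.
    def read_with_deadline(sock, timeout = 5)
      ready = IO.select([sock], nil, nil, timeout)
      raise "no response within #{timeout} seconds" unless ready
      sock.readpartial(65_536)
    end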

@dentarg (Member) commented May 17, 2023

Should we run fewer tests in parallel? Just accept that it takes longer to run the tests?

@MSP-Greg (Member, Author) commented May 17, 2023

Thanks for reviewing.

Should we run fewer tests in parallel? Just accept that it takes longer to run the tests?

I've run the 'tests.yaml' workflow about 350 times since I started working on this project. Some of those runs have been for unrelated PRs. Parallel tests can be made much more consistent, but that requires a lot of changes. I also have code to parse the CI test logs, and it shows which parallel tests haven't completed. That code has been very helpful, as I have removed the timeout on many files, which drops the number of threads created by the test suite.

See https://github.com/MSP-Greg/puma/actions/workflows/tests.yaml?query=branch%3A00-test-update-jruby.

Some of the intermittent failures are due to the odd Puma http parser assert issue or the libev/epoll failures, which are probably impossible to repro; they come from nio4r, but are not necessarily bugs in nio4r.

At present, the CI systems are either 2 core (Ubuntu & Windows) or 3 core (macOS). If people are running CI locally (like me), parallel testing is very helpful.

As I mentioned, blocking can be a problem regardless of whether the tests are running in parallel.

So, bottom line, I think we should keep some tests running in parallel, as I think the stability of the test suite has a lot of room for improvement.

EDIT: JFYI, JRuby fails quite often. The changes I've got seem to drastically improve that.

@dentarg (Member) commented May 17, 2023

Some of the intermittent failures are due to the odd Puma http parser assert issue or the libev/epoll failures, which are probably impossible to repro; they come from nio4r, but are not necessarily bugs in nio4r.

Are we seeing #2905 in CI?

@MSP-Greg (Member, Author)

Not about that error. The following are three failures I've copied from CI logs, but I didn't include any notes:

Ubuntu (?):
    ../libev/ev_epoll.c:134: epoll_modify: Assertion `("libev: I/O watcher with invalid fd found in epoll_ctl", errno != EBADF && errno != ELOOP && errno != EINVAL)' failed.

macOS (?):
    Assertion failed: (("libev: kqueue found invalid fd", 0)), function kqueue_poll, file ev_kqueue.c, line 133.

macOS error:
    (libev) kqueue kevent: Bad file descriptor
    rake aborted!
    SignalException: SIGABRT

@nateberkopec (Member) left a comment

Makes sense + a nice cleanup as well.

@MSP-Greg MSP-Greg merged commit c467e57 into puma:master May 17, 2023
60 of 64 checks passed
@MSP-Greg MSP-Greg deleted the 00-test-redirect-io.rb branch May 18, 2023 00:05