
[CI] test_redirect_io.rb - fixup for intermittent failures #3157

Merged: 1 commit into puma:master, May 17, 2023

Conversation

MSP-Greg (Member)

Description

Puma's test suite runs with 'retry', which is helpful when CI systems are not 'resource robust'. Recent additions to CI log all test 'retry' failures. test_redirect_io.rb appears intermittently in these logs.

Hence this fixup, in the hope of reducing or eliminating its 'retry' failures.

  1. This test file is about log files. Currently it opens the log file, checks whether it can be read, closes it, and only later performs a File.read. The read should happen inside the File.open block (see the sketch after this list).
  2. The current File.open block may run forever, as it has no limit on the retry count.
  3. Reduce redundant code.
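
A minimal sketch (not the PR's actual code) of the pattern items 1 and 2 describe: the read happens inside the File.open block, and the wait loop is bounded so it cannot run forever. The method name, retry count, and sleep interval below are illustrative assumptions.

    MAX_LOG_RETRIES = 10  # hypothetical cap so the wait loop cannot spin forever

    def wait_for_log(path)
      retries = 0
      begin
        File.open(path) do |f|
          raise Errno::EAGAIN if f.size.zero?  # nothing written yet, retry below
          f.read                               # read inside the open block
        end
      rescue Errno::ENOENT, Errno::EAGAIN
        raise "log #{path} never became readable" if (retries += 1) >= MAX_LOG_RETRIES
        sleep 0.1
        retry
      end
    end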

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

@MSP-Greg (Member, Author)

Why I'm very frustrated with the test suite:

https://github.com/MSP-Greg/puma/actions?query=branch%3A00-test-redirect-io.rb

The tests passed in my fork before I submitted this PR. See test run 1036 in the above link. test_redirect_io.rb does not appear in the 'retry' logs.

After the failure here, I ran another job (using 'workflow_dispatch'), run 1037, which also passed.

The current test suite has too many blocking operations and creates too many threads.

When using an 'in process' server, blocking operations often stop the server. When using a 'spawned/integration' server, many of the tests run in parallel (in threads), so blocking operations interfere with the parallel execution.

As for threads, they easily block other threads, and they take resources (and time) to create.
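
An illustration only (not code from this PR or Puma's test helpers) of why bounded waits matter here: an unbounded sock.read can hang the thread that is also driving an in-process server, or stall one of the parallel test threads, while a select with a timeout lets the test fail fast instead. The helper name and timeout value are assumptions.

    require 'socket'

    # Hypothetical helper: wait at most `timeout` seconds for data instead of
    # blocking indefinitely; a hung server then fails one test quickly rather
    # than stalling the whole parallel group.
    def read_with_deadline(sock, timeout = 5)
      ready = IO.select([sock], nil, nil, timeout)
      raise "no response within #{timeout} seconds" unless ready
      sock.readpartial(65_536)
    end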

@dentarg (Member) commented May 17, 2023

Should we run fewer tests in parallel? Just accept that it takes longer to run the tests?

@MSP-Greg (Member, Author) commented May 17, 2023

Thanks for reviewing.

Should we run fewer tests in parallel? Just accept that it takes longer to run the tests?

I've run the 'tests.yaml' workflow about 350 times since I started working on this project. Some of those runs have been for unrelated PRs. Parallel tests can be made much more consistent, but that requires a lot of changes. I also have code to parse the CI test logs, and it shows which parallel tests haven't completed. That code has been very helpful, as I have removed the timeout on many files, which drops the number of threads created by the test suite.

See https://github.com/MSP-Greg/puma/actions/workflows/tests.yaml?query=branch%3A00-test-update-jruby.

Some of the intermittent failures are due to the odd Puma http parser assert issue or the libev/epoll failures, which are probably impossible to repro; they come from nio4r, but are not necessarily bugs in nio4r.

At present, the CI systems are either 2 core (Ubuntu & Windows) or 3 core (macOS). If people are running CI locally (like me), parallel testing is very helpful.

As I mentioned, blocking can be a problem regardless of whether the tests are running in parallel.

So, bottom line, I think we should keep some tests running in parallel, as I think the stability of the test suite has a lot of room for improvement.

EDIT: JFYI, JRuby fails quite often. The changes I've got seem to drastically improve that.

@dentarg (Member) commented May 17, 2023

Some of the intermittent failures are due to the odd Puma http parser assert issue or the libev/epoll failures, which are probably impossible to repro; they come from nio4r, but are not necessarily bugs in nio4r.

Are we seeing #2905 in CI?

@MSP-Greg (Member, Author)

Not about that error. The following are three failures I've copied from CI logs, but I didn't include any notes:

Ubuntu (?):
    ../libev/ev_epoll.c:134: epoll_modify: Assertion `("libev: I/O watcher with invalid fd found in epoll_ctl", errno != EBADF && errno != ELOOP && errno != EINVAL)' failed.

macOS (?):
    Assertion failed: (("libev: kqueue found invalid fd", 0)), function kqueue_poll, file ev_kqueue.c, line 133.

macOS error:
    (libev) kqueue kevent: Bad file descriptor
    rake aborted!
    SignalException: SIGABRT

@nateberkopec (Member) left a comment

Makes sense + a nice cleanup as well.

@MSP-Greg MSP-Greg merged commit c467e57 into puma:master May 17, 2023
60 of 64 checks passed
@MSP-Greg MSP-Greg deleted the 00-test-redirect-io.rb branch May 18, 2023 00:05