Explore merging development and CI environments #3946

kocolosk · 2022-02-27T22:42:08Z

Overview

Just a work in progress right now, but the ideas I'm exploring include:

Using the same container image for PRs in Jenkins and as the default container environment in e.g. VS Code
Converting over to the official Erlang container images so we can automatically stay up-to-date with patch releases
Using a separate FDB container for the FDB server and linking it to the development environment

Testing recommendations

Opening a PR to see how Jenkins handles the linked container approach. I'm intending to follow the pattern from https://www.jenkins.io/doc/book/pipeline/docker/#running-sidecar-containers

Checklist

Code is written and works correctly
Changes are covered by tests
Any new configurable parameters are documented in rel/overlay/etc/default.ini
A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

A new `elixir-suite` Makefile target is added. It runs a predefined set of Elixir integration tests. The feature is controlled by two files:

- test/elixir/test/config/suite.elixir - contains a list of all available tests
- test/elixir/test/config/skip.elixir - contains a list of tests to skip

To update `test/elixir/test/config/suite.elixir` when new tests are added, run the following command:

```
MIX_ENV=integration mix suite > test/elixir/test/config/suite.elixir
```

Add ability to control which Elixir integration tests to run

All endpoints except _session support gzip encoding, and there's no practical reason for that exception. This commit enables gzip decoding on compressed requests to _session.
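A minimal sketch of what a compressed login request to _session could look like from the client side. The node URL (`http://127.0.0.1:15984`) and the function names are hypothetical; only the `Content-Encoding: gzip` header and the JSON `name`/`password` body reflect the standard _session login request.

```shell
#!/bin/sh
# Sketch: POST a gzip-compressed login body to _session.
# Hypothetical assumptions (not from the commit): a dev node at
# http://127.0.0.1:15984 and credentials supplied by the caller.
COUCH_URL="${COUCH_URL:-http://127.0.0.1:15984}"

# Print a gzip-compressed JSON login body for user $1 / password $2.
# No JSON escaping is done, so keep the values free of quotes.
make_gzipped_session_body() {
  printf '{"name":"%s","password":"%s"}' "$1" "$2" | gzip -c
}

# Send the compressed body; Content-Encoding tells the server to
# decompress it before parsing the JSON.
post_gzipped_session() {
  make_gzipped_session_body "$1" "$2" |
    curl -sS -X POST "$COUCH_URL/_session" \
      -H 'Content-Type: application/json' \
      -H 'Content-Encoding: gzip' \
      --data-binary @-
}
```

Usage against a running node would be `post_gzipped_session adm pass`; before this change, the same request to _session would not have been decompressed.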

1) The caching effort was a bust and has been removed. 2) chunkify can be done externally with a custom persist_fun.

* Simplify and speed up dev node startup

This patch introduces an escript that generates an Erlang .boot script to start CouchDB using the in-place .beam files produced by the compile phase of the build. This allows us to radically simplify the boot process, as Erlang computes the optimal order for loading the necessary modules. In addition to the simplification, this approach offers a significant speedup when working inside a container environment. In my test with the stock .devcontainer it reduces startup time from about 75 seconds to under 5 seconds.

* Rename boot_node to monitor_parent
* Add formatting suggestions from python-black

Co-authored-by: Paul J. Davis

* Add a development container config for VS Code

This creates a development environment with a FoundationDB server and a CouchDB layer in two containers, sharing a network through Docker Compose. It uses the FDB image published to Docker Hub for the FDB container, and downloads the FDB client packages from foundationdb.org to provide the development headers and libraries.

www.foundationdb.org is not trusted by default in Debian Buster, so we have to download GeoTrust_Global_CA.pem. The following link has more details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=962596

Once the Docker Compose setup is running, VS Code executes the create_cluster_file.bash script to write a cluster file containing the IP address in the compose network where the FDB service can be found. This cluster file is used both for a user-driven invocation of `./dev/run` and for unit tests that require a running CouchDB.

Additionally, I've got a small fix to the way we run explicitly specified eunit tests:

* Run eunit tests for each app separately

The `eunit` target executes a for loop that appears intended to use a separate invocation of rebar for each Erlang application's unit tests. When running `make eunit` without any arguments this works correctly, as the for loop processes the output of `ls src`. But if you specify a comma-delimited list of applications, the for loop treats that as a single argument and passes it down to rebar. This asymmetry is surprising, and it also seems to cause some issues with environment variables not being inherited by the environment used to execute the tests for the 2..N applications in the list. I didn't dig into the rebar source code to figure out what was happening there. This patch just parses the incoming comma-delimited list with `sed` to create a whitespace-delimited list for the loop, so we get the same behavior regardless of whether we specify applications explicitly or not.
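The eunit fix described above can be sketched as a standalone shell function. This is an illustration only: the function names are hypothetical, the real change lives in the Makefile, and the `echo` stands in for the per-application rebar call.

```shell
#!/bin/sh
# Sketch of the idea in "Run eunit tests for each app separately".
# Function names are hypothetical; the real change lives in the Makefile.

# Turn "couch,chttpd,fabric" into "couch chttpd fabric".
split_apps() {
  echo "$1" | sed -e 's/,/ /g'
}

# With no argument, loop over every directory in src/ (as `make eunit` does);
# with a comma-delimited list, loop over the split list instead, so each
# application gets its own iteration either way.
run_eunit_per_app() {
  if [ -n "$1" ]; then
    apps=$(split_apps "$1")
  else
    apps=$(ls src)
  fi
  for app in $apps; do
    # Placeholder for the per-application rebar invocation.
    echo "eunit: $app"
  done
}
```

Without the `sed` step, `couch,chttpd` would reach the loop as a single word and produce one iteration; with it, the loop runs once per application.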

…g: chunked (#3360)

When PUTing a multipart/related document with attachments, Transfer-Encoding: chunked causes the server to wait indefinitely, then issue a 500 error when the client finally hangs up. This commit fixes that issue by adding proper handling for chunked multipart/related requests.
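A rough sketch of the kind of request this fix targets: a document plus one attachment sent as multipart/related with chunked transfer encoding. The URL, credentials, boundary, database and document names are all hypothetical. The body layout (a JSON part whose attachment stub sets `"follows": true` and a `length`, followed by the raw attachment bytes) follows CouchDB's documented multipart/related format for document PUTs.

```shell
#!/bin/sh
# Sketch: PUT a document plus one attachment as multipart/related using
# chunked transfer encoding. URL, credentials, boundary, database and
# document names are hypothetical.
COUCH_URL="${COUCH_URL:-http://adm:pass@127.0.0.1:15984}"
BOUNDARY="abc123"

# Print the multipart/related body: a JSON part whose attachment stub says
# "follows": true, then the raw attachment bytes. $1 = name, $2 = ASCII text.
make_multipart_body() {
  att_name=$1
  att_data=$2
  printf '%s\r\n' "--$BOUNDARY"
  printf 'Content-Type: application/json\r\n\r\n'
  printf '{"_attachments":{"%s":{"follows":true,"content_type":"text/plain","length":%d}}}\r\n' \
    "$att_name" "${#att_data}"
  printf '%s\r\n\r\n' "--$BOUNDARY"
  printf '%s\r\n' "$att_data"
  printf '%s\r\n' "--$BOUNDARY--"
}

# Setting the Transfer-Encoding header asks curl to send the body in chunks
# instead of with a Content-Length. $1 = database, $2 = document id.
put_doc_chunked() {
  make_multipart_body hello.txt hello |
    curl -sS -X PUT "$COUCH_URL/$1/$2" \
      -H "Content-Type: multipart/related; boundary=\"$BOUNDARY\"" \
      -H 'Transfer-Encoding: chunked' \
      --data-binary @-
}
```

Before the fix, a request like `put_doc_chunked mydb mydoc` is the case that would hang and end in a 500 once the client gave up.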

This allows users to verify that compaction processes are suspended outside of any configured strict_window.

Show process status in active_tasks

These two test cases expose the subtle bug in ebtree:lookup_multi/3 where a key that doesn't exist in the tree can prevent a subsequent lookup key from matching in the same KV node.

If one of the provided lookup keys doesn't exist in the ebtree, it can inadvertently prevent a second lookup key from being found if the first key greater than the missing lookup key is equal to the second lookup key.

use collate in lookup

A tidier version of #3384 that saves an unnecessary call to collate.

Optimize lookup/3

Previously, when an erlfdb error occurred and a recursive call to `update/3` was made, the result of that call was always matched against `{Mrst, State}`. However, when the call had finalized and returned the `couch_eval:release_map_context/1` response, the result would be `ok`, which would fail with a badmatch error against `{Mrst, State}`.

* Win32-SM91 support and fixes
* spidermonkey_68 identified as spidermonkey_60 and erroneously(?) blocked by configure on aarch64 #3149
* remove unnecessary shell when setting ERL_AFLAGS
* fix foundationdb urls in github workflow
* quote AFLAGS like win echo and fix references to pwd

Co-authored-by: Will

Instead of building one image with all supported Erlang versions through kerl, this configuration looks for a specific container image for each Erlang version. Decoupling it like this enables us to more easily adopt newer distros for newer Erlang versions, and to build new images with patch releases of Erlang without needing a simultaneous PR to the CouchDB repo to pick them up in CI (although some change to Jenkins might be needed to avoid images being cached for too long when a stable tag changes).

This avoids the situation where a build fails with a timeout because all the docker-based agents were busy running other jobs. Jenkins' semantics for options.timeout is that the stage-specific timeout starts the countdown even while waiting for an agent matching the selected label to become available. We see occasional spurious job failures as a result.

This makes it easier to observe the pipeline progress in the UI. We get timings for each step in the build, and if one of the steps fails the logs for that step will be the only ones expanded by default. We can also label each of the steps to provide a bit more context to the developer about what the CI pipeline is actually doing.

It seems different versions of mix cannot agree on how these lines ought to be formatted, so let's try to help out.

This is one of those situations where you go in to make a small change, see an opportunity for some refactoring, and get sucked into a rabbit hole that leaves you wondering if you have any idea how computers actually work. My initial goal was simply to update the Erlang version used in our binary packages to a modern supported release. Along the way I decided I wanted to figure out how to eliminate all the copypasta we generate when making any change to this file, and after a few days of hacking here we are.

This rewrite has the following features:

* Updates to use Debian 11 (current stable) as the base image for building releases and packaging repos.
* Defaults to Erlang 24.2 as the embedded Erlang version in packages.
* Dynamically generates the parallel build stages used to test and package CouchDB on various OSes. This is accomplished through a bit of scripted pipeline code that relies on two new methods defined at the beginning of the Jenkinsfile, one for "native" builds on macOS and FreeBSD and one for container-based builds. See comments in the Jenkinsfile for additional details.
* Expands commands like `make check` into a series of steps to improve visibility. The Jenkins UI will now show the time spent in each step of the build process, and if a step (e.g. `make eunit`) fails it will only expand the logs for that step by default instead of showing the logs for the entire build stage. The downside is that if we do make changes to the series of targets underneath `check`, we need to remember to update the Jenkinsfile as well.
* Starts the per-stage timer _after_ an agent is acquired. Previously, builds could fail with a 15m timeout when all they did was sit in the build queue.

@nickva

Credit to @nickva for the original improvements. The main branch is already Erlang 21+ so the minimum version check is less essential, but the performance improvements are greatly appreciated!

Fix typos

* Remove emilio-related Python script

The Emilio style checker was removed in #3674.

* Remove unused scripts from autotools days
* Update credo to support Elixir v1.12
* Ensure the bin directory sticks around

- fixing links

Added the commit message conventions proposed in discussion #3918, updated all links to use https, and moved all external links to the end of the file.

Long overdue, lots of build improvements and a couple of bug fixes in that patch release.

Still a work in progress, but the idea is that developers should be working with the same base image that we use to validate Pull Requests in CI. I've also started to add a GitHub Action that could publish these devcontainer images on a regularly scheduled basis to pick up fixes and new patch releases from upstream.

iilyak and others added 30 commits December 2, 2020 10:16

Use elixir-suite

5b8bf5a

Merge pull request #3286 from cloudant/specify-elixir-tests

be87c40

treat 408 as a retryable error condition (#3303)

df2fb67

Goodbye 2020. Hello 2021. YES. (#3318)

af436c1

Upgrade Credo to 1.5.4

7f2feb0

Add to credo ignores and gitignore new file_system dependency

0eff137

Switch from assert length === 0 to Enum.empty? as Credo suggests

a89242d

Allow gzipped requests to _session (#3323)

bc9773a

Update README.md

b2a34dc

remove {restart_tx, true} from mango _all_docs

94c5fe0

add http error for fdb 1031

053595c

fixing links after master->main branch rename

1780573

fix additional links after branch renaming (master -> main)

127c441

Handle all erlfdb error codes (#3355)

0b488de

Show process status in active_tasks

a9f2a5e

Merge pull request #3365 from apache/active-tasks-process-status-main

5f43148

use collate in lookup

6f6db1e

Fix typo

79b64ea

Add failing cases for ebtree:lookup_multi/3 bug

73875b5

Fix ebtree:lookup_multi/3

ec4b213

Merge pull request #3384 from apache/ebtree-lookup-collate-eq

3d4a827

Optimize lookup/3

45d4039

Merge pull request #3386 from apache/ebtree-lookup-opt

a9e0ebe

Set default nodes in dev/run to 1

6822fe4

Make session elixir test more robust

04086e6

Will and others added 22 commits December 17, 2021 10:57

Add rebar3 and erlfmt install commands to configure.ps1 #3873

8cc41b3

Remove ERL_OPTS

7f63d93

Update Jenkins Erlang versions, add 24 (#3892)

e847d5f

Remove CI support for Ubuntu 16.04

99b6871

Bump Credo to 1.5.6 for Elixir 1.12 support

004f799

Tweak Elixir formatting

6130c6b

Forward port erlfmt improvements from #3837

fca5a2e

Apply new formatting from erlfmt

1a42659

Fix typos (#3916)

bef7838

Random bits of cleanup (#3914)

76fb5b9

Fix publication of nightly packages (#3926)

a2f3626

- rename master to main

e584e1e

Adding commit message conventions and update links

5777fd2

Use https as default protocol for links

6c28960

Bump erlfdb to v1.3.5

237f5a9

Experiment with linking an FDB sidecar container

9d741e1

kocolosk force-pushed the merge-devcontainer-jenkins branch from 130bad7 to 9d741e1 on February 27, 2022 22:49

kocolosk added 6 commits February 27, 2022 19:02

Use a matrix for building devcontainer images

a5b1f00

Try new images

d80ba3d

Minor fixup

6ead351

Use getent instead of dig to support container links

95467f0

A bit more debugging

20a3c2b

Very puzzling, this

2099c70

nickva force-pushed the main branch from e41407e to a1fc807 on June 7, 2022 20:15
