Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: use identity file as node is authority and remove init command and config-override flags #7766

Draft
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

VladLazar
Copy link
Contributor

@VladLazar VladLazar commented May 15, 2024

Overall plan: https://www.notion.so/neondatabase/Rollout-Plan-simplified-pageserver-initialization-f935ae02b225444e8a41130b7d34e4ea?pvs=4


Ansible will soon write the node id to identity.toml in the work dir for new pageservers. On the pageserver side, we read the node id from the identity file if it is present and use that as the source of truth. If the file is not present, we rely on the node id stored in pageserver.toml.

This PR also removes the --init mode and the --config-override flag from the pageserver binary.
The neon_local is already not using these flags anymore.
Ansible still uses them until the linked change is merged, so, this PR has to land simultaneously or after the Ansible change due to that.

Related ansible change: https://github.com/neondatabase/aws/pull/1322
Cplane change to remove config-override usages: https://github.com/neondatabase/cloud/pull/13417
Closes: #7736

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

Ansible will soon write the node id to `identity.toml` in the work
dir for new pageservers. On the pageserver side, we read the node
id from the identity file if it is present and use that as the source
of truth. If the file is not present, we rely on the node id stored
in `pageserver.toml`.

This allows us to remove the `pageserver init` step altogether since
Ansible was the only user of it.

Note that this change and the associated Ansible change should merge
roughly at the same time. While only one of them is merged we will
not be able to provision new pageservers.
@VladLazar VladLazar requested a review from a team as a code owner May 15, 2024 11:16
@VladLazar VladLazar requested review from skyzh and problame and removed request for skyzh May 15, 2024 11:16
Copy link

github-actions bot commented May 15, 2024

3060 tests run: 2933 passed, 0 failed, 127 skipped (full report)


Flaky tests (1)

Postgres 16

  • test_vm_bit_clear_on_heap_lock: debug

Code coverage* (full report)

  • functions: 31.4% (6350 of 20192 functions)
  • lines: 47.4% (47936 of 101065 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
338bbed at 2024-05-17T09:27:24.534Z :recycle:

Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Title should mention that we remove the --init mode.

The test test_pageserver_init_node_id will fail because of that.

The removal of --init is a breaking change, so, we might as well get rid of --config-override in this PR as well. (The title should mention that as well)

pageserver/src/bin/pageserver.rs Outdated Show resolved Hide resolved
@problame problame marked this pull request as draft May 15, 2024 16:11
@VladLazar VladLazar changed the title pageserver: get node id from identity file at start-up pageserver: use identity file as node is authority and remove init command and config-override flags May 16, 2024
@VladLazar VladLazar requested review from problame and jcsp May 16, 2024 13:32
@VladLazar VladLazar marked this pull request as ready for review May 16, 2024 13:38
Copy link
Contributor

@problame problame left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor requests for changes, but approving to unblock. This does what we need for the rollout plan. Let's try that it works tomorrow by merging the aws.git PR first, then this one, and watching it land in staging.

pageserver/src/config.rs Show resolved Hide resolved
pageserver/src/config.rs Outdated Show resolved Hide resolved
pageserver/src/bin/pageserver.rs Outdated Show resolved Hide resolved
@VladLazar
Copy link
Contributor Author

The test-images action wasn't fetching submodules and it was causing the whole action to time out. Updated it. Let's see..

@VladLazar VladLazar requested a review from bayandin May 16, 2024 17:20
Dockerfile Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simplify pageserver initialization & configuration
3 participants