Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build Docker image and push to GHCR #230

Open
wants to merge 11 commits into
base: unstable/v1
Choose a base branch
from

Conversation

br3ndonland
Copy link
Contributor

@br3ndonland br3ndonland commented Apr 19, 2024

Description

Closes #58

Up to this point, the project has been set up as a Docker action referencing the Dockerfile.

runs:
using: docker
image: Dockerfile

The downside to using the Dockerfile for the action is that the Docker image must be built every time the action is used (#58).

This PR will set up the project to build the Docker image and push it to GitHub Container Registry (GHCR). This change will speed up user workflows every time the action is used because the workflows will simply pull the Docker image from GHCR instead of building again.

Changes

Build container image with GitHub Actions

This PR will build Docker images with the Docker CLI (docker build). Builds will include inline cache metadata so layers can be reused by future builds.

This PR only proposes to build container images for x86_64 (linux/amd64) because GitHub Actions Linux runners currently only support x86_64 CPU architectures (actions/runner-images#5631), and this project only supports GitHub Actions Linux runners. The README explains:

Since this GitHub Action is docker-based, it can only be used from within GNU/Linux based jobs in GitHub Actions CI/CD workflows. This is by design and is unlikely to change due to a number of considerations we rely on.

Push container image to GHCR

The workflow will log in to GHCR using the built-in GitHub token and push the Docker image. Workflow runs triggered by pull requests will build the Docker image and run the smoke tests but will not push the Docker image.

Update action to pull container image from GHCR

Docker actions support pulling in pre-built Docker images by supplying a registry address to the image: key. The downside to this syntax is that there's no way to specify the correct Docker tag because the GitHub Actions image: and uses: keys don't accept any context. For example, if a user's workflow has uses: pypa/gh-action-pypi-publish@release/v1.8, then the action should pull in a Docker image built from the release/v1.8 ref, something like ghcr.io/pypa/gh-action-pypi-publish:release-v1.8 (Docker tags can't have /).

# this works but the image tag can't be customized
runs:
  using: docker
  image: docker://ghcr.io/pypa/gh-action-pypi-publish:release-v1.8
# this doesn't work because `image:` doesn't support context
runs:
  using: docker
  image: docker://ghcr.io/pypa/gh-action-pypi-publish:${{ github.action_ref }}

The workaround is to switch the top-level action.yml to a composite action that then calls the Docker action, substituting the correct image name and tag.

Related

@webknjaz
Copy link
Member

This looks.. intriguing! I don't remember if I ever considered combining composite+docker actions (I did play with having two composites in the same repo in the past, though).

I'll need to take some time to think about it and look through the patch more closely. Please, don't expect an immediate review, however it does look very promising at glance!

Originally I thought that I'd have a workflow where I trigger a release, that release adds a commit that hardcodes an update to action.yml with the "future" version tag, tags that commit and pushes it (post docker publish). It wouldn't be on the main branch, the tags would be on the orphaned leaves.

This looks like a better idea so far. Thanks again!

br3ndonland added a commit to br3ndonland/gh-action-pypi-publish that referenced this pull request Apr 26, 2024
@br3ndonland
Copy link
Contributor Author

That sounds great. Take your time. Thanks for your consideration.

If you do decide to accept this change, I'm happy to help maintain the workflows in the future. Feel free to mention me @br3ndonland and I will help address any issues that come up.

webknjaz pushed a commit to br3ndonland/gh-action-pypi-publish that referenced this pull request May 16, 2024
@webknjaz
Copy link
Member

Thanks! I've hit "rebase" on the UI to get this on top of the recent changes/linting/lockfile bumps but haven't yet looked into it deeper.

Comment on lines 36 to 44
args:
- ${{ inputs.user }}
- ${{ inputs.password }}
- ${{ inputs.repository-url }}
- ${{ inputs.packages-dir }}
- ${{ inputs.verify-metadata }}
- ${{ inputs.skip-existing }}
- ${{ inputs.verbose }}
- ${{ inputs.print-hash }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't actually used. Somebody contributed them, but I don't see them being needed. So perhaps we shouldn't keep them around anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't the inputs used by the entrypoint script?

INPUT_REPOSITORY_URL="$(get-normalized-input 'repository-url')"
INPUT_PACKAGES_DIR="$(get-normalized-input 'packages-dir')"
INPUT_VERIFY_METADATA="$(get-normalized-input 'verify-metadata')"
INPUT_SKIP_EXISTING="$(get-normalized-input 'skip-existing')"
INPUT_PRINT_HASH="$(get-normalized-input 'print-hash')"

pull_request:
workflow_run:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two separate workflow runs is often hard to track. Instead, I adopted a practice of modularizing the workflow pieces as reusable workflows having the reusable- prefix in their names. This allows embedding everything in all the right places. Let's try this, WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will return to this suggestion at a later time.

.pre-commit-config.yaml Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
steps:
- name: Reset path if needed
run: |
# Reset path if needed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be needed outside the container?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you've set up a test that modifies the $PATH (#112, 1350b8b).

- name: ✅ Smoke-test the locally checked out action
uses: ./test
env:
DEBUG: >-
true
PATH: utter-nonsense

action.yml Outdated Show resolved Hide resolved
action.yml Outdated
Comment on lines 115 to 130
- name: Set Docker image name and tag
run: |
# Set Docker image name and tag
# if action run was triggered by a pull request to this repo,
# build image from Dockerfile because it has not been pushed to GHCR,
# else pull image from GHCR
if [[ $GITHUB_EVENT_NAME == "pull_request" ]] &&
[[ $GITHUB_REPOSITORY == "pypa/gh-action-pypi-publish" ]]; then
IMAGE="../../../Dockerfile"
else
REF=${{ steps.set-repo-and-ref.outputs.ref }}
REPO=${{ steps.set-repo-and-ref.outputs.repo }}
IMAGE="docker://ghcr.io/$REPO:${REF/'/'/'-'}"
fi
FILE=".github/actions/run-docker-container/action.yml"
sed -i -e "s|{{image}}|$IMAGE|g" "$FILE"
shell: bash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I was thinking… Why do we need to check in action.yml to Git even if this modifies it anyway? Why not generate it all, then?

Also, I'd use Python instead of Bash here. Then, it'd be possible to have a dict with data and write it as YAML using json.dump() (because JSON is valid YAML, almost always).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could work. I've added a Python script that generates the Docker action (9ae1850).

Regarding your related comment:

I think that generating the file is a good idea. It should be possible to write the file without bringing in the PyYAML dependency. But it's not that easy for reading it. Can we make use of yq somehow, and convert YAML to JSON this way, maybe?

I've started by using PyYAML but will think about a way to do this with yq.

user: ${{ inputs.user }}
password: ${{ inputs.password }}
repository-url: ${{ inputs.repository-url || inputs.repository_url }}
packages-dir: ${{ inputs.packages-dir || inputs.packages_dir }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would break the deprecation messages, it seems. Have you checked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. The new inputs don't have defaults, so for example, ${{ inputs.repository-url || inputs.repository_url }} will default to inputs.repository_url and deprecationMessage will be logged.

@webknjaz
Copy link
Member

I'm done with the initial review. More is needed, but I'd rather accept what I can through separate PRs to make this one smaller. And the suggested refactoring could be done in parallel. I think that generating the file is a good idea. It should be possible to write the file without bringing in the PyYAML dependency. But it's not that easy for reading it. Can we make use of yq somehow, and convert YAML to JSON this way, maybe?

@br3ndonland
Copy link
Contributor Author

@webknjaz thank you for your detailed review. I've addressed most of your comments so far.

@webknjaz
Copy link
Member

Commits like 213c885ac41d769527ac150e2e633bb1ccd886d4 aren't really necessary since Git would automatically absorb changes applied separately. FYI. In fact, this one may have a harmful effect — when the label change is merged into the default branch, and this one rebased on top, Git will keep the removal commit and attempt deleting the label 🤷‍♂️

Anyway, we'll address this later on.



def set_image(event: str, ref: str, repo: str) -> str:
if event == 'pull_request' and 'gh-action-pypi-publish' in repo:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you check for 'gh-action-pypi-publish' in repo? How about forks that renamed it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the workflow was triggered by a PR to this repo (pypa/gh-action-pypi-publish), then the action should build the Docker image. Else, the action should be using the GHCR Docker image.

This is important because workflows in other repos might run the marketplace action on PRs. For example, they might want to smoke test package uploads using DevPI like this:

name: Smoke test workflow in a different repo

on:
  pull_request:

env:
  devpi-password: abcd1234
  devpi-username: root
  devpi-port: 3141

jobs:
  smoke-test:
    runs-on: ubuntu-latest
    services:
      devpi:
        image: muccg/devpi
        env:
          DEVPI_PASSWORD: ${{ env.devpi-password }}
        ports:
          - 3141
    timeout-minutes: 2
    steps:
      - name: Check out the action locally
        uses: actions/checkout@v4
      - name: Install the packaging-related tools
        run: python3 -m pip install build twine
      - name: Create the stub package importable directory
        run: mkdir -pv src/test_package
      - name: Populate the stub package `__init__.py`
        run: echo '__version__ = "0.1"' > src/test_package/__init__.py
      - name: Populate the stub package `README.md`
        run: echo "# Test Package" > README.md
      - name: Populate the stub package `pyproject.toml`
        run: echo "$CONTENTS" > pyproject.toml
        env:
          CONTENTS: |
            [build-system]
            requires = [
              "setuptools == 65.6.3",
            ]
            build-backend = "setuptools.build_meta"

            [project]
            name = "test-package"
            version = "0.1"
            readme = "README.md"
      - name: Build the stub package sdist and wheel distributions
        run: python3 -m build
      - name: Register the stub package in devpi
        run: twine register dist/*.tar.gz
        env:
          TWINE_USERNAME: ${{ env.devpi-username }}
          TWINE_PASSWORD: ${{ env.devpi-password }}
          TWINE_REPOSITORY_URL: >-
            http://localhost:${{
              job.services.devpi.ports[env.devpi-port]
            }}/${{
              env.devpi-username
            }}/public/
      - name: Smoke-test the marketplace action
        uses: pypa/gh-action-pypi-publish@unstable/v1
        with:
          user: ${{ env.devpi-username }}
          password: ${{ env.devpi-password }}
          repository-url: http://devpi:${{ env.devpi-port }}/${{ env.devpi-username }}/public/

In that case, github.event_name would be pull_request, but the repo name would be different.

As far as handling PRs to/from renamed forks of this repo (pypa/gh-action-pypi-publish), if you have some other way of specifying "this repo" that doesn't include gh-action-pypi-publish, I'm happy to consider it.

action.yml Outdated
- name: Create Docker container action
run: |
# Create Docker container action
python -m pip install -r requirements/github-actions.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the right way of using constraint files. Here's what should be done instead:

Suggested change
python -m pip install -r requirements/github-actions.txt
PIP_CONSTRAINT=requirements/github-actions.txt python -Im pip install --only-binary=:all: -r requirements/github-actions.in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with the latest approach of dumping JSON to a .yml file, we won't need to deal with the PyYAML requirement.

# This file is autogenerated by pip-compile with Python 3.12
# by the following command:
#
# pip-compile --allow-unsafe --config=../.pip-tools.toml --output-file=github-actions.txt --strip-extras github-actions.in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiling PyYAML on various VMs might be problematic, so I'd pass --only-binary=:all: via pip-opts or even put it into the config. This way, installing would be more predictable on different versions of Ubuntu VMs that GitHub has.

Another concern is that originally, I opted for a docker-based action so that the external runtime is not mutated (that's one of a few reasons). By installing things with pip we violate that promise. I was kinda hoping that it'd be possible to avoid using a third-party dependency at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with the latest approach of dumping JSON to a .yml file, we won't need to deal with the PyYAML requirement.

action.yml Outdated
- name: Create Docker container action
run: |
# Create Docker container action
python -m pip install -r requirements/github-actions.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing to consider, provided that it's actually needed (see https://github.com/pypa/gh-action-pypi-publish/pull/230/files#r1619055808), is making a virtualenv managed by this action (in its checkout/tmp dir so that the end-user's projects aren't polluted).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with the latest approach of dumping JSON to a .yml file, we won't need to deal with the PyYAML requirement.

Comment on lines 78 to 79
with action_path.open(mode='w', encoding='utf-8') as file:
yaml.dump(action, file, allow_unicode=True, sort_keys=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check if using just

Suggested change
with action_path.open(mode='w', encoding='utf-8') as file:
yaml.dump(action, file, allow_unicode=True, sort_keys=False)
action_path.write_text(json.dumps(action), encoding='utf-8')

would work?

Most of the time, any valid JSON is also valid YAML so GHA should be able to read it...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could have a template for the YAML file that would be post-processed by Template strings, str.format_map() or even just f-strings.

Any of these would let us avoid having an external dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to get it to work by dumping JSON to a .yml file.

@webknjaz
Copy link
Member

@br3ndonland I'm sure my review is incomplete, but hopefully it gives you enough ideas to try out until the next time.

I think it'd be nice to get this in before #236 if at all possible.

@webknjaz
Copy link
Member

@br3ndonland could you also rebase this branch locally? The GH button doesn't work, meaning there's going to be some conflicts to resolve.

Up to this point, the project has been set up as a Docker action
referencing the Dockerfile. The downside to using the Dockerfile for the
action is that the Docker image must be built every time the action is
used.

This commit will set up the project to build the Docker image and push
it to GitHub Container Registry (GHCR). This change will speed up user
workflows every time the action is used because the workflows will
simply pull the Docker image from GHCR instead of building again.

Changes:

- Add required metadata to Dockerfile
- Build container image with GitHub Actions
- Push container image to GHCR

Docker actions support pulling in pre-built Docker images. The downside
is that there's no way to specify the correct Docker tag because the
GitHub Actions `image` and `uses:` keys don't accept any context.
For example, if a user's workflow has
`uses: pypa/gh-action-pypi-publish@release/v1.8`, then the action should
pull in a Docker image built from the `release/v1.8` branch, something
like `ghcr.io/pypa/gh-action-pypi-publish:release-v1.8` (Docker tags
can't have `/`). The workaround is to switch the top-level `action.yml`
to a composite action that then calls the Docker action, substituting
the correct image name and tag.
@br3ndonland
Copy link
Contributor Author

@br3ndonland could you also rebase this branch locally? The GH button doesn't work, meaning there's going to be some conflicts to resolve.

No problem. I'm not sure what "the GH button" is (some GitHub Mobile thing?) but I've rebased the branch.

I will address the other comments soon. Looks like there's a small git fetch issue in the smoke test workflow so I will look into that as well.

@br3ndonland
Copy link
Contributor Author

I was able to get it to work by dumping JSON to a .yml file. The file must have a .yml or .yaml extension (GitHub Actions will raise "Error: Can't find 'action.yml', 'action.yaml' or 'Dockerfile'" on .json files), but the file contents can be JSON.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build&publish the base container to GHCR + point to it from action
2 participants