Permit mounting via volumes-from by passing orchestrator ID #924

Open
wants to merge 2 commits into base: master

Conversation

cheald

@cheald cheald commented Aug 26, 2019

tl;dr This helps CodeClimate engines not need intimate Docker host knowledge, which permits the use of CodeClimate outside of Docker-in-Docker setups. In particular, this makes it easy to run CodeClimate checks in Gitlab while retaining Docker layer caching, vastly improving the runtime of each build.

In setups like self-hosted Gitlab, an invoking runner such as Gitlab CI runs the Docker executor, which exposes the Docker socket to the running job so that the job may invoke its own Docker jobs on the host. Gitlab's top-level job will set up some filesystem context (/builds, mounted as a Docker volume, in the Gitlab case).

Right now, Gitlab can only support CodeClimate in a Docker-in-Docker runner, because CodeClimate performs volume mounting for the individual engines via Docker's --volume flag, which mounts not the path from the invoking container, but rather a path on the docker host. This requires that the path passed to CodeClimate as the CODECLIMATE_CODE variable match the real host path, and in the Gitlab CI case, we don't want that, so we have to "hide" the host with a DinD approach. However, this means that we also don't get any layer caching between jobs, which makes running CodeClimate prohibitively expensive, as all the engines etc have to be downloaded for each job.

By supporting Docker's volumes-from mounting option, we can instead tell the engines to inherit any mounts from the invoking orchestrator. This lets the top-level context set up a Docker volume, bind it to the orchestrator, and have the orchestrator pass it on to the children it invokes. This sidesteps the issue of the engines needing to know the actual host path; as long as the orchestrator's /code directory is mounted, the children can just presume to use it as-is.

To accomplish this, we just a) name the top-level container, and b) pass that name via the CODECLIMATE_ORCHESTRATOR env var:

    docker run \
      --interactive --tty --rm \
      --name codeclimate_orchestrator \
      --env CODECLIMATE_ORCHESTRATOR="codeclimate_orchestrator" \
      --env CODECLIMATE_CODE="/code" \
      --volume "$PWD":/code \
      --volume /var/run/docker.sock:/var/run/docker.sock \
      --volume /tmp/cc:/tmp/cc \
      codeclimate/codeclimate-wrapped analyze

In the bare-metal case, this doesn't change anything - we're mounting the real host path, which then gets passed to the individual children mounted on the /code mount.
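
As a rough illustration of the difference between the two modes (a sketch only, not the code in this PR): the hypothetical `engine_mount_args` helper below prints the mount arguments an engine container might be started with. It assumes `--volumes-from` is used when `CODECLIMATE_ORCHESTRATOR` is set and a `--volume` bind mount of `CODECLIMATE_CODE` otherwise. The helper name and the sample paths are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not the PR's actual implementation):
# print the mount arguments an engine container would be started with.
engine_mount_args() {
  if [ -n "${CODECLIMATE_ORCHESTRATOR:-}" ]; then
    # Inherit every mount (including /code) from the named orchestrator.
    printf '%s %s\n' "--volumes-from" "$CODECLIMATE_ORCHESTRATOR"
  else
    # Bind-mount a path that must exist on the Docker host itself.
    printf '%s %s\n' "--volume" "${CODECLIMATE_CODE}:/code"
  fi
}

# Without an orchestrator name: bind mount of a (sample) host path.
( unset CODECLIMATE_ORCHESTRATOR; CODECLIMATE_CODE="/home/me/project"; engine_mount_args )

# With an orchestrator name: inherit the orchestrator's volumes.
( CODECLIMATE_ORCHESTRATOR="codeclimate_orchestrator"; engine_mount_args )
```

In the second case the engine never sees a host path at all, which is why the orchestrator's /code can be backed by a named Docker volume.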

While not immediately pertinent to the CodeClimate PR, in Gitlab we can invoke the Gitlab codequality image like so:

    script:
        - CONTAINER_ID=$(docker ps -q -f "label=com.gitlab.gitlab-runner.job.id=${CI_JOB_ID}")
        # Double quotes so the shell expands $CI_BUILDS_DIR inside the Go template.
        - BUILDS_VOLUME_ID=$(docker inspect "$CONTAINER_ID" --format "{{ range .Mounts }}{{ if eq .Destination \"${CI_BUILDS_DIR}\" }}{{ .Name }}{{ end }}{{ end }}")
        - SOURCE_CODE="/code/${CI_PROJECT_DIR#$CI_BUILDS_DIR}"
        - docker run
            --rm
            --name "codeclimate_orchestrator_${CI_JOB_ID}"
            --env SOURCE_CODE="$SOURCE_CODE"
            --env REPORT_FILENAME="gl-code-quality-report.json"
            --env CODECLIMATE_IMAGE="codeclimate:latest"
            --env ORCHESTRATOR_ID="codeclimate_orchestrator_${CI_JOB_ID}"
            --volume /var/run/docker.sock:/var/run/docker.sock
            --volume "${BUILDS_VOLUME_ID}":/code
            gitlab/codequality:latest "$SOURCE_CODE"

Because this job must be executed in a context that is visible to Docker, we can query Docker to get the current job's container ID, and from there get the volume ID mounted as $CI_BUILDS_DIR. We then volume mount that volume as /code, and specify /code as the "host" location of our code to be evaluated. The orchestrator will use the passed volume as /code, which is then passed onto the engine jobs, allowing the entire process to run against an ephemeral Docker volume rather than requiring a known path on the host.
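
To show how `SOURCE_CODE` is derived, here is a small sketch of the prefix-stripping expansion used in the script. The values given to `CI_BUILDS_DIR` and `CI_PROJECT_DIR` are sample assumptions, and `SOURCE_CODE_CLEAN` is a made-up name for an optional variant that avoids the doubled slash.

```shell
#!/bin/sh
# Sample values (assumptions for illustration only).
CI_BUILDS_DIR="/builds"
CI_PROJECT_DIR="/builds/group/project"

# As in the job script: strip the builds prefix, then re-root under /code.
SOURCE_CODE="/code/${CI_PROJECT_DIR#$CI_BUILDS_DIR}"
echo "$SOURCE_CODE"

# Optional variant: drop the extra slash and quote the pattern so it is
# matched literally rather than as a glob.
SOURCE_CODE_CLEAN="/code${CI_PROJECT_DIR#"$CI_BUILDS_DIR"}"
echo "$SOURCE_CODE_CLEAN"
```

The doubled slash in the first form is harmless for path resolution; the variant only tidies the result.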

@CLAassistant

CLAassistant commented Aug 26, 2019

CLA assistant check
All committers have signed the CLA.

@HenningCash

We are facing the same issue: Our application has no access to the Docker host, only to the Daemon itself via remote API. With this approach we could create a codeclimate container, copy all files to /code using the Docker API and run the analysis without touching the host's filesystem.

👍 Would love to see this PR merged in the near future.

@bufferoverflow

@efueger I think this is worth looking at, WDYT?

@frakman1

frakman1 commented Aug 18, 2020

@cheald I can't find codeclimate/codeclimate-wrapped anywhere. Where are you getting this from?

@HenningCash

> @cheald I can't find codeclimate/codeclimate-wrapped anywhere. Where are you getting this from?

I guess the image was built locally from this PR's branch and tagged codeclimate/codeclimate-wrapped.

@floh96

floh96 commented Nov 2, 2022

@cheald Could you rebase?
@fede-moya @efueger Could you please review this PR? It is only a small change, but it would make it possible to use CodeClimate in Gitlab without Docker-in-Docker.

@fede-moya
Contributor

Hi @floh96 👋🏼! Sorry, and yes, I can make some room to review this one once it's rebased.

@fede-moya fede-moya self-requested a review November 2, 2022 14:05
tl;dr This helps CodeClimate engines not need intimate Docker host
knowledge.

In setups like self-hosted Gitlab, an invoking runner such as Gitlab CI
runs the Docker executor, which exposes the Docker socket to the running
job so that the job may invoke its own Docker jobs on the host. Gitlab's
top-level job will set up some filesystem context (/builds, mounted as a
Docker volume, in the Gitlab case).

Right now, Gitlab can only support CodeClimate in a Docker-in-Docker
runner, because CodeClimate performs volume mounting for the individual
engines via Docker's --volume flag, which mounts not the path from the
invoking container, but rather a path on the docker host. This requires
that the path passed to CodeClimate as the CODECLIMATE_CODE variable
match the real host path, and in the Gitlab CI case, we don't want that,
so we have to "hide" the host with a DinD approach. However, this means
that we also don't get any layer caching between jobs, which makes
running CodeClimate prohibitively expensive, as all the engines etc have
to be downloaded for each job.

By supporting Docker's `volumes-from` mounting option, we can instead
tell the engines to inherit any mounts from the invoking orchestrator.
This permits CodeClimate to let the top-level context set up a Docker
volume, bind it to the orchestrator, and then allow the orchestrator to
pass that to invoked children. This sidesteps the issue of the Engines
needing to know the actual host path; as long as the orchestrator's
/code directory is mounted, the children can just presume to use it
as-is.

To accomplish this, we just a) name the top-level container, and b) pass
that name via the CODECLIMATE_ORCHESTRATOR env var:

        docker run \
          --interactive --tty --rm \
          --name codeclimate_orchestrator \
          --env CODECLIMATE_ORCHESTRATOR="codeclimate_orchestrator" \
          --env CODECLIMATE_CODE="/code" \
          --volume "$PWD":/code \
          --volume /var/run/docker.sock:/var/run/docker.sock \
          --volume /tmp/cc:/tmp/cc \
          codeclimate/codeclimate-wrapped analyze

In the bare-metal case, this doesn't change anything - we're mounting
the real host path, which then gets passed to the individual children
mounted on the /code mount.

While not immediately pertinent to the CodeClimate PR, in Gitlab we can
invoke the Gitlab codequality image like so:

    script:
        - CONTAINER_ID=$(docker ps -q -f "label=com.gitlab.gitlab-runner.job.id=${CI_JOB_ID}")
        - BUILDS_VOLUME_ID=$(docker inspect $CONTAINER_ID --format '{{ range .Mounts }}{{ if eq .Destination "/builds" }}{{ .Name }}{{ end }}{{ end }}')
        - docker run
            --rm
            --name "codeclimate_orchestrator_${CI_JOB_ID}"
            --env SOURCE_CODE="/code"
            --env CODECLIMATE_VERSION="volumes-from"
            --env ORCHESTRATOR_ID="codeclimate_orchestrator_${CI_JOB_ID}"
            --volume /var/run/docker.sock:/var/run/docker.sock
            --volume "${BUILDS_VOLUME_ID}":/code
            codequality:orch /code

("volumes-from" is my local Docker image for the altered CodeClimate
build, and "codequality:orch" is my altered Gitlab codequality image)

Because this job _must_ be executed in a context that is visible to
Docker, we can query Docker to get the current job's container ID, and
from there get the volume ID mounted as `/builds`. We then volume mount
that volume as /code, and specify /code as the "host" location of our
code to be evaluated. The orchestrator will use the passed volume as
/code, which is then passed onto the engine jobs, allowing the entire
process to run against an ephemeral Docker volume rather than requiring
a known path on the host.
@cheald
Author

cheald commented Nov 3, 2022

Heh, wow, not often I see a 3 year old PR get necroed. Sure, I went ahead and rebased my branch onto master. It did rebase cleanly, but I didn't run any tests locally - I guess we'll see if CI still likes it!

@floh96
Copy link

floh96 commented Nov 9, 2022

fyi @fede-moya it's rebased


8 participants