fix(docker): Checkov installation silently fails on docker build in arm64. Workaround till issue will be fixed in checkov itself #635

Merged

Conversation

antm-pp
Contributor

@antm-pp antm-pp commented Feb 23, 2024

Put an x into the box that applies:

  • This PR introduces breaking change.
  • This PR fixes a bug.
  • This PR adds new functionality.
  • This PR enhances existing functionality.

Description of your changes

Pending a new version of checkov (which has been requested), which bumps rustworkx >0.14.0.

This temporarily adds rust, cargo during checkov to allow rustworkx to compile, similar to how gcc etc are already added to compile cffi for similar reason (lack of musl aarch64).

Testing removal of all items after the compile surfaced a checkov exception due to a missing gcc library, so this PR keeps gcc installed. It's possible that only the specific library is needed, which could be much smaller. In hindsight this was in the builder image, not the final image, so I'm not sure of the reasoning behind trying to tidy up.

The impact of the change:

Installing packages takes about 55 seconds with or without rust/cargo, so they're not adding significant time.
Compiling rustworkx takes about 330 seconds (this only affects aarch64, where it is not pre-compiled and where the install fails otherwise).

The final image is inflated from ~980MB to ~1.68GB presumably by Checkov now being present. Not sure how this compares to the image size on x86_64.

Fixes #634
Fixes #633
bridgecrewio/checkov#5608
Qiskit/rustworkx#992 (comment)
Qiskit/rustworkx#1008

How can we test changes

I have built this docker image locally on MacOS M2 Max (aarch64) and am successfully able to get Checkov installed. Nothing in these minor changes would be expected to impact the behaviour of a non-arm architecture build.

…on which bumps rustworkx >0.14.0 this adds rust, cargo and keeps gcc to allow source compile for aarch64.
@antm-pp antm-pp changed the title fix-workaround: Checkov install fails aarch64. fix: Checkov install fails aarch64. Feb 23, 2024
@yermulnik
Collaborator

This temporarily adds rust, cargo during checkov to allow rustworkx to compile, similar to how gcc etc are already added to compile cffi for similar reason (lack of musl aarch64).

@antm-pp Would it make sense 1) to leave corresponding comments in the Dockerfile for tracking and 2) to try and employ TARGETARCH (and TARGETOS?) to only add those to the platforms that indeed need them? 🤔
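For illustration, a minimal sketch of what gating on TARGETARCH could look like in the shell part of a RUN step. The helper name `extra_build_deps` is hypothetical, the package pins are copied from this PR's diff, and it assumes that only arm64 lacks a pre-compiled rustworkx. In a real Dockerfile the stage would also need `ARG TARGETARCH` for the variable to be visible.

```shell
#!/bin/sh
# Hypothetical helper: print the extra build packages for a given architecture.
# Assumption: only arm64 has to compile rustworkx from source.
extra_build_deps() {
    case "$1" in
        arm64) echo "cargo=~1 rust=~1" ;;
        *)     echo "" ;;
    esac
}

# In a RUN step this might be used roughly as:
#   apk add --no-cache gcc=~12 libffi-dev=~3 musl-dev=~1 $(extra_build_deps "$TARGETARCH")
echo "arm64 extras: $(extra_build_deps arm64)"
echo "amd64 extras: $(extra_build_deps amd64)"
```

The trade-off is one more code path that behaves differently per platform, which is harder to verify without arm64 test coverage.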

@antm-pp
Contributor Author

antm-pp commented Feb 23, 2024

@yermulnik I am open to direction. I'm not an expert in contributing, just found a solution and wanted to post it.

It looked like the cffi library was doing the same in the Dockerfile, so I adopted the same approach, without TARGETARCH. I can only comment for arm64 on Mac, don't have the facilities to test other variations, although I can see the pipeline builds multi-arch so could verify both paths if I update the PR I guess.

gcc is installed in the final image anyway, it's only the builder that installs then removes it. Likewise the significant time impact during build only applies if your arch doesn't have a pre-compiled rustworkx, so I don't think we're causing any issues in amd64 with the current approach. As I said though, happy to take direction.

@yermulnik
Collaborator

Oh, I did misread the 2nd part of the «The final image is inflated from ~980MB to ~1.68GB presumably by Checkov now being present» sentence and thought the size change was related to rust/cargo being added 🤦🏻 Still is odd that Checkov almost doubles the size of the final image 🤔
So, I'm fine with the changes (still need @MaxymVlasov to have his eyes on it though).

Collaborator

@yermulnik yermulnik left a comment

@antm-pp Please take a look into Dockerfile linter notice and see whether you can have container image built with Checkov if you follow what linter suggests. Thanks for the contribution.

Collaborator

@yermulnik yermulnik left a comment

I feel like I'm looking like a nerd now, though by the look of it the list of packages was alphabetically sorted before this change. @antm-pp Sorry I didn't pay attention to this before.

The PR looks good to me though. Hence approved, though still awaiting Max's review.

@antm-pp
Contributor Author

antm-pp commented Feb 23, 2024

@yermulnik

Apologies, obviously the repo has pre-commit for itself, I didn't employ it locally first. Now passing! The instructions for contributing were quite hook heavy and not too much about PR process.

I noticed that in the GHA multi-arch build the rust compile actually failed due to a cargo issue. This didn't happen to me locally. I'm just running an explicit linux/arm64 build now to see if that fails for me like it did in the pipeline.

Dockerfile Outdated
@@ -66,10 +66,10 @@ RUN if [ "$INSTALL_ALL" != "false" ]; then \
 RUN . /.env && \
 if [ "$CHECKOV_VERSION" != "false" ]; then \
 ( \
-apk add --no-cache gcc=~12 libffi-dev=~3 musl-dev=~1; \
+apk add --no-cache gcc=~12 libffi-dev=~3 musl-dev=~1 libgcc=~12 rust=~1 cargo=~1; \
Collaborator

A few questions: Did you try to remove each of these dependencies, build an image, and confirm that is a minimal setup?

Please add check for GCC in https://github.com/antonbabenko/pre-commit-terraform/blob/master/.github/.container-structure-test-config.yaml

Also, I think that we need to find a way to run these tests on arm64 too.

Contributor Author

Yes, these are the minimal dependencies. gcc was already present in the failing build, as it is used in the same way for the cffi compile. I tried adding only rust, and it errored because cargo was still missing.

The rustworkx guidance indicates the need for a compiler toolchain including rust and cargo (or to use rustup, a cross-platform installer).

I noticed that when the pre-existing purge of gcc occurred it caused an exception when running checkov in the build container, so I added libgcc separately to minimise that dependency, and it executes OK (for the version check). Looking at the final image, I noted that gcc in full is already a dependency for the pre-commit hooks. I couldn't see where it's documented which hook that dependency is for. I could add a further comment to highlight that it's at least needed for checkov.

Happy to add a gcc check as requested, although I've not added the gcc dependency in the final image.

Just to note, the linux/arm64 builds have been failing for some time, an example from 2 months ago: https://github.com/antonbabenko/pre-commit-terraform/actions/runs/7183944518/job/19563861227#step:9:741

I'm just running through some tests, it looks like the linux/arm64 build is failing because it can't pull crates.io. I tried one recommended test saying to set env-var CARGO_NET_GIT_FETCH_WITH_CLI=true which then created a dependency on git (which again is already in the final image but not the builder). Just trying a run with that, so that darwin/arm64 and linux/arm64 can both compile.

Collaborator

Happy to add a gcc check as requested, although I've not added the gcc dependency in the final image.

Ah, really? I see removal of rust and cargo, but not libgcc. How it works then 🤔

If there is no package at the end, then there is nothing to test

Collaborator

Happy to add a gcc check as requested, although I've not added the gcc dependency in the final image.

Ah, really? I see removal of rust and cargo, but not libgcc. How it works then 🤔

Apologies if I'm missing some bits of context: gcc != libgcc. The former is the compiler collection and the latter is runtime libs only: https://pkgs.alpinelinux.org/package/edge/main/x86/gcc vs https://pkgs.alpinelinux.org/package/edge/main/x86/libgcc

Collaborator

Btw, we can't just add tests for macOS, as Structure Test does not currently support macOS
#636

Contributor Author

So I think the issue is the structure of the '||' command for the install. When the rust compiler call fails it returns false, allowing the other part of the command to run (intended for managing checkov==latest vs checkov==). The 2nd part has its own failure mode that doesn't actually generate an exit 1. So the error gets buried and the build passes.
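A small sketch of that failure mode, using stand-in functions rather than the real pip3 calls (both `install_latest` and `install_pinned` are hypothetical names, and their return codes only simulate what was described above). With `A && B || C`, C runs whenever B fails, not only when A is false, and the status of the whole line is C's.

```shell
#!/bin/sh
# Stand-ins for the two pip3 calls; hypothetical, for illustration only.
install_latest() { echo "latest: compile failed"; return 1; }      # simulates the rust compiler error
install_pinned() { echo "pinned: failed but returned 0"; return 0; } # simulates a failure that is not reported

VERSION="latest"

# "Ternary" form: the right-hand side runs because install_latest failed,
# even though VERSION really is "latest".
ternary_status=0
{ [ "$VERSION" = "latest" ] && install_latest || install_pinned; } || ternary_status=$?

# Plain if/else: only one branch runs and its failure is propagated.
ifelse_status=0
if [ "$VERSION" = "latest" ]; then
    install_latest
else
    install_pinned
fi || ifelse_status=$?

echo "ternary exit status: $ternary_status"
echo "if/else exit status: $ifelse_status"
```

In the ternary form both stand-ins print their message and the line still ends with status 0, which is how an error can get buried in a build log.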

Collaborator

So all these combinations should be rewritten to vanilla if-then-else statements? @yermulnik

[ "$CHECKOV_VERSION" = "latest" ] && pip3 install --no-cache-dir checkov \
|| pip3 install --no-cache-dir checkov==${CHECKOV_VERSION}; \

Collaborator

Oh, yep, that sort of "ternary" in Bash is not vanilla if/else and has this kind of "discomfortable" hiccups 😿
And, yes, these need to be re-written 😢 Either using vanilla if/else, or like below:

[ "$CHECKOV_VERSION" = "latest" ] && pip3 install --no-cache-dir checkov; \
[ "$CHECKOV_VERSION" != "latest" ] && pip3 install --no-cache-dir checkov==${CHECKOV_VERSION}; \

ps: I probably can try and do that, though I will need help building it and testing resulting images (@MaxymVlasov, that would be super great if you already had that automation so that I can push changes and you test build/run 🤪).

Collaborator

or like below 🤔

[ "$CHECKOV_VERSION" = "latest" ] && CHECKOV_VERSION="" || CHECKOV_VERSION="==${CHECKOV_VERSION}"; \
pip3 install --no-cache-dir checkov${CHECKOV_VERSION}; \
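As a sketch of why this variant sidesteps the problem: here the `&& ... ||` only chooses between two variable assignments, which do not fail, so the single pip3 call that follows is the only command whose exit status matters. Below, `echo` stands in for actually running pip3, the helper name `checkov_requirement` is hypothetical, and `3.2.0` is just an arbitrary example version.

```shell
#!/bin/sh
# Hypothetical helper: turn a version setting into a pip requirement string.
checkov_requirement() {
    version="$1"
    # Both branches are plain assignments, so neither can fail and trigger the other.
    [ "$version" = "latest" ] && version="" || version="==${version}"
    echo "checkov${version}"
}

# echo stands in for running the command.
echo "pip3 install --no-cache-dir $(checkov_requirement latest)"
echo "pip3 install --no-cache-dir $(checkov_requirement 3.2.0)"
```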

Collaborator

@yermulnik done in this PR. Also, I can confirm that without @antm-pp's rust/cargo changes it fails on arm64, whereas with them the checks pass.

And now we have prevention of silent fails for checkov - #635 (review)

@yermulnik
Collaborator

The instructions for contributing were quite hook heavy and not too much about PR process.

@antm-pp When you have a chance, it would be great if you could contribute to those docs from the contributor's point of view 😺

@MaxymVlasov MaxymVlasov added the bug Something isn't working label Feb 23, 2024
@antm-pp
Contributor Author

antm-pp commented Feb 23, 2024

@MaxymVlasov re PR process : Still not sure what the right way is, only which bits I was told were wrong 🤣

  • I presume forking and then PR back to here was right approach, as you don't allow local branch contribution.
  • Obviously I should have used the repo's own pre-commit locally first.

@MaxymVlasov
Collaborator

I presume forking and then PR back to here was right approach, as you don't allow local branch contribution.

@antm-pp That's the default GitHub workflow. We can't provide write access to everyone who has a GH account. That's technically impossible (at least, it was). Branch creation requires write access, AFAIK. If I'm missing some new GH feature, please point me to the docs; I wasn't able to quickly find such info. Also, there is a chance that such a feature is available only for Enterprise customers or in private beta.

@antm-pp
Contributor Author

antm-pp commented Feb 23, 2024

@MaxymVlasov no problem. My lack of knowledge then, literally my first public contribution. Only worked in private repos before, wasn't aware of the limitation. Assumed wrongly that it was a setting rather than a system limitation. Thanks for the correction.

@yermulnik
Collaborator

* I presume forking and then PR back to here was right approach, as you don't allow local branch contribution.

Yep, forking and PR'ing back is the common approach, since it means arbitrary people don't need write permission in the target repo (which allows way too much). As Max already mentioned, this is the common approach with public repos. This is the contribution howto by GH: https://docs.github.com/en/get-started/exploring-projects-on-github/contributing-to-a-project

* Obviously I should have used the repo's own pre-commit locally first.

You've hit a drawback of the pre-commit framework as a whole: one can skip it =( So this is more a matter of trusting contributors not to bypass this soft requirement =)

ps: hope you're not suffering from all this stuff given it's your first public contribution (and your contribution is much better than dozens of others I've seen across different repos lately). Thanks for your effort and time. We do appreciate this 👍🏻

@antm-pp
Contributor Author

antm-pp commented Feb 23, 2024

So I am seeing some quite strange behaviour.

When I docker buildx (I'm running colima which runs ubuntu in qemu):

default (no target args, no --platform) or --platform linux/aarch64; then my original build (just adding rust, cargo) runs fine and completes locally.

However, with --platform linux/arm64 the build fails saying that during the cargo process for compiling it can't pull the crates.io index (it reports as a network/proxy issue)

In all 3 cases the container reports uname==aarch64, $TARGETOS==linux, $TARGETARCH==arm64

My limited understanding being that aarch64 is an alias of arm64 as a docker platform type. The only guess I can make is that in some way colima or the docker process on my machine creates the container in a more 'native' environment as aarch64 but seems to think it needs a custom virtual host with some odd networking for building arm64. Certainly the inside of the container seems to be consistent for both.

Whatever that issue is locally, it was also present in the GHA workflow build of linux/arm64. Adding the git dependency and the env var to use CLI git fetch for cargo seems to give consistent success for me locally, so I will commit that to this PR to validate it in the GHA workflow.

I've added a simple gcc check. There isn't much detail in Alpine's gcc --version output, but I've regexed on 'gcc (Alpine 12.', which verifies the major version we've pinned and is locked to the version output string (rather than only matching gcc, which would also match a 'gcc not found' type stdout). I did look to test locally, but I don't have a personal image repo that I can push an image of that size to and pull back with the container test action. I verified the regex with a grep -E inside a build container though.
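A rough sketch of that regex check, runnable outside a container. The sample strings are assumptions standing in for real output (they were not captured from an actual image): the first imitates the start of Alpine's `gcc --version` banner, the second imitates a shell "not found" message. The helper name `matches` is hypothetical.

```shell
#!/bin/sh
# Pattern pinned to the major version; parentheses and the dot are escaped for grep -E.
PATTERN='^gcc \(Alpine 12\.'

# Assumed sample outputs, for illustration only.
GOOD_OUTPUT='gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924'
BAD_OUTPUT='sh: gcc: not found'

# Succeeds only if the given string matches the pinned-version pattern.
matches() {
    printf '%s\n' "$1" | grep -Eq "$PATTERN"
}

if matches "$GOOD_OUTPUT"; then echo "version banner: match"; fi
if ! matches "$BAD_OUTPUT"; then echo "not-found message: no match"; fi
```

Matching on the literal prefix rather than on `gcc` alone is what keeps an error message that mentions gcc from passing the check.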

Also added some notes around the package dependencies (sort of assumed what the cffi ones are from my read of that compilation). This allows them to be alphabetical, but still clear which dependency is for which compilation.

@antm-pp
Contributor Author

antm-pp commented Feb 24, 2024

After hours of re-building containers taking 15 min at a time, I think the whole aarch64 vs arm64 thing is nonsense (as it should be, since they're the same thing). I think crates.io had an outage, and pulling from their GitHub via the env var just changed the source.

Although they've reported no issues today, they had an extended outage with symptoms like I saw here (index unavailable) on Feb 14th. https://status.crates.io/

I'm now able to build this current PR successfully with no additions. Could one of you rerun the existing docker build job action (Run 2)? I think it'll compile fine and demonstrate that the same issue has cleared inside the GitHub runner with no further code change.

@MaxymVlasov MaxymVlasov changed the title fix: Checkov install fails aarch64. fix(docker): Checkov silently fails on arm64. Workaround till issue will be fixed in checkov itself. Mar 8, 2024
Dockerfile
Comment on lines +77 to +78
then pip3 install --no-cache-dir checkov || exit 1; \
else pip3 install --no-cache-dir checkov==${CHECKOV_VERSION} || exit 1; \
Collaborator

exit 1 prevents silent fail of error: can't find Rust compiler

When the rust compiler call fails it generates a false
#635 (comment)

Collaborator

btw, we can't just add comments inside the code about it. Probably, the best place is on L73

Collaborator

Out of interest, why does pip3 in the # Install pre-commit code block (lines 20-23) have no || exit 1 bits? Is it intentional?

Collaborator

Because pre-commit doesn't use rust?

I'd prefer to somehow make the structure tests work for arm64, rather than add exit 1 in every possible place

Collaborator

Because pre-commit doesn't use rust?

That's kind of weird: if pip3 install ... fails (as in returns a non-zero code) then we exit the shell with a non-successful code (exit 1). But this is implemented for the checkov install only, which implies it's okay not to exit in the same situation with e.g. the pre-commit install by pip. Does pip behave differently if it fails to install pre-commit? 🤔 Doesn't Docker's RUN imply something like set -e? Should we try adding set -e to each RUN to explicitly make Docker's RUN exit when any downstream command fails within an if/else statement? 🤔 I'm a bit lost to be honest 😲 We definitely should use one consistent solution for all similar expressions.
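On the set -e question, a short sketch. As far as I know, Docker's shell-form RUN passes the command to `/bin/sh -c` without `-e`, so in a `;`-separated list only the exit status of the last command decides whether the step fails. The snippet imitates that with plain `sh -c`, so no Docker is needed to run it.

```shell
#!/bin/sh
# Without set -e: the failing command is ignored and the list as a whole succeeds.
without_e=0
sh -c 'false; echo "still running"' >/dev/null || without_e=$?

# With set -e: the shell stops at the first failing command and reports its status.
with_e=0
sh -c 'set -e; false; echo "still running"' >/dev/null || with_e=$?

echo "without set -e: exit $without_e"
echo "with set -e:    exit $with_e"
```

Note that set -e has its own exceptions (for example, it is ignored for commands on the left-hand side of `&&` or `||`), so it is not a drop-in replacement for explicit checks.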

Collaborator

@yermulnik yermulnik Mar 11, 2024

I'll rewrite it to bash scripts.

What do you think of a single script that takes args? Like install_deps.sh pre-commit, install_deps.sh checkov, install_deps.sh pre-commit checkov ..., or install_deps.sh ALL? Just to keep code in the same place and have re-usable snippets (like shell functions) 🤔
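A rough sketch of what such a single script could look like, purely illustrative: the function names are hypothetical, only two tools are listed, and the bodies just echo instead of installing anything.

```shell
#!/bin/sh
# Hypothetical install_deps.sh: one function per tool, dispatched by argument.
install_pre_commit() { echo "installing pre-commit"; }
install_checkov()    { echo "installing checkov"; }

install_deps() {
    if [ "$#" -eq 0 ]; then
        echo "usage: install_deps.sh <tool>... | ALL" >&2
        return 2
    fi
    # ALL expands to every known tool.
    if [ "$1" = "ALL" ]; then
        set -- pre-commit checkov
    fi
    for tool in "$@"; do
        case "$tool" in
            pre-commit) install_pre_commit || return 1 ;;
            checkov)    install_checkov    || return 1 ;;
            *) echo "unknown tool: $tool" >&2; return 2 ;;
        esac
    done
}

# Example invocation; a real script would pass "$@" through instead.
install_deps pre-commit checkov
```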

Collaborator

The problem there is that they are different

Collaborator

Currently, I'm just moving it out and implementing the logic as

COPY tools/install/ /install/

WORKDIR /bin_dir

RUN /install/pre-commit.sh
RUN /install/foo.sh

We can discuss it in next PR :)

Collaborator

The problem there is that they are different

Some are slightly different, whilst others use the same solutions (like extracting download URLs from the GH releases API response). And I also find it odd to have a bunch of almost-the-same ten-line shell scripts instead of one that can handle installation of all the deps one-by-one or all-at-once.
On the other hand, such an approach with a single script would negatively impact build caching, and each RUN layer would get rebuilt if the file is updated with a change to the installation steps of a specific dep =(
From this point of view I'd rather stay with the current approach =)

We can discuss it in next PR :)

Makes sense 🤝 (apologies that I already outlined my thoughts — this helps me imprint them in memory 😺)

@MaxymVlasov do not install distributions with pip separately because it will not take all things installed in the previous session into account when you run a new install command. Enumerate everything and let the dependency resolver know all your requirements together. Ideally, use pip-compile to produce and commit constraint files (lockfiles) and invoke it via pip install -r direct-deps.txt -c constraint.txt.

If you do run separate pip installs, inject a pip check invocation at the end to verify integrity.

@MaxymVlasov MaxymVlasov changed the title fix(docker): Checkov silently fails on arm64. Workaround till issue will be fixed in checkov itself. fix(docker): Checkov silently fails on arm64. Workaround till issue will be fixed in checkov itself. Mar 8, 2024
@MaxymVlasov MaxymVlasov changed the title fix(docker): Checkov silently fails on arm64. Workaround till issue will be fixed in checkov itself. fix(docker): Checkov silently fails on arm64. Workaround till issue will be fixed in checkov itself Mar 8, 2024
@MaxymVlasov MaxymVlasov changed the title fix(docker): Checkov silently fails on arm64. Workaround till issue will be fixed in checkov itself fix(docker): Checkov instalation silently fails on docker build in arm64. Workaround till issue will be fixed in checkov itself Mar 8, 2024
@MaxymVlasov MaxymVlasov changed the title fix(docker): Checkov instalation silently fails on docker build in arm64. Workaround till issue will be fixed in checkov itself fix(docker): Checkov installation silently fails on docker build in arm64. Workaround till issue will be fixed in checkov itself Mar 8, 2024
Collaborator

@yermulnik yermulnik left a comment

Probs https://github.com/antonbabenko/pre-commit-terraform/pull/635/files#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R20-R23 needs || exit 1 bits too? (see comment below: #635 (comment))

ps: approved just in case || exit 1 isn't needed there.

@MaxymVlasov MaxymVlasov merged commit f255b05 into antonbabenko:master Mar 11, 2024
9 of 10 checks passed
antonbabenko pushed a commit that referenced this pull request Mar 11, 2024
## [1.88.1](v1.88.0...v1.88.1) (2024-03-11)

### Bug Fixes

* **docker:** Checkov installation silently fails on `docker build` in arm64. Workaround till issue will be fixed in `checkov` itself ([#635](#635)) ([f255b05](f255b05))
@antonbabenko
Owner

This PR is included in version 1.88.1 🎉
