Version 3 builder images cannot be used with ansible-navigator #541

Open
cidrblock opened this issue May 19, 2023 · 9 comments

Comments

@cidrblock

Using a builder-built image:

---
version: 3

images:
  base_image:
    name: registry.fedoraproject.org/fedora:38  # vanilla image!

dependencies:
 
  ansible_core: 
    package_pip: ansible-core

  ansible_runner:  
    package_pip: ansible-runner

  galaxy:
    collections:
    - ansible.utils

  python:
  - ansible-pylibssh

When used with navigator:


(venv) x1 ➜  builder_test ansible-navigator run site.yml --eei test-ee:latest --mode stdout --pp never --ll debug --la false
[WARNING]: No inventory was parsed, only implicit localhost is available
[WARNING]: error in 'jsonfile' cache plugin while trying to create cache dir
/runner/artifacts/0e1b05c3-b27c-410e-b72e-9ead558e4f40/fact_cache : b"[Errno
13] Permission denied: '/runner/artifacts'"
[WARNING]: provided hosts list is empty, only localhost is available. Note that
the implicit localhost does not match 'all'
ERROR! Invalid callback for stdout specified: awx_display
Please review the log for errors.
(venv) x1 ➜  builder_test 

I suspect the issue here is the "Permission denied" error: runner cannot copy its awx_display callback plugin into the artifacts directory.

The dir being mounted is 700:

drwx------ 4 bthornto bthornto   80 May 19 15:38 ansible-navigator_qg5o8i3t

and from within the EE it is inaccessible:

(venv) x1 ➜  builder_test ansible-navigator exec --eei test-ee:latest --pp never
bash-5.2$ ls -l /runner
ls: cannot open directory '/runner': Permission denied
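
The mode bits alone explain the listing above. A minimal sketch, assuming the host directory is created the way Python's `tempfile.mkdtemp` creates directories (owner-only, 0700); `accessible_to_non_owner` is a hypothetical helper for illustration and is not part of navigator or runner:

```python
import os
import stat
import tempfile


def accessible_to_non_owner(path: str) -> bool:
    """Return True if group or other has both read and search (r-x) on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    group_bits = stat.S_IRGRP | stat.S_IXGRP
    other_bits = stat.S_IROTH | stat.S_IXOTH
    group_ok = (mode & group_bits) == group_bits
    other_ok = (mode & other_bits) == other_bits
    return group_ok or other_ok


# mkdtemp creates a directory that only the creating user can read,
# write, or search, i.e. drwx------ as in the listing above.
private_dir = tempfile.mkdtemp(prefix="ansible-navigator_")
private_mode = stat.S_IMODE(os.stat(private_dir).st_mode)
private_ok = accessible_to_non_owner(private_dir)
print(oct(private_mode), private_ok)
os.rmdir(private_dir)
```

Any process that does not run as the owning UID, which can include a non-root user inside the container, gets no access to such a directory once it is mounted over /runner.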

The full invocation of the ee is as follows:

podman run --rm --tty --interactive -v 
    /home/bthornto/github/builder_test/:/home/bthornto/github/builder_test/ 
    --workdir /home/bthornto/github/builder_test 
    -v /run/user/1000/keyring/:/run/user/1000/keyring/ 
    -e SSH_AUTH_SOCK=/run/user/1000/keyring/ssh 
    -v /home/bthornto/.ssh/:/home/runner/.ssh/ 
    -v /home/bthornto/.ssh/:/root/.ssh/ 
    --group-add=root 
    --ipc=host 
    -v /tmp/ansible-navigator_qg5o8i3t/artifacts/:/runner/artifacts/:Z 
    -v /tmp/ansible-navigator_qg5o8i3t/:/runner/:Z 
    --env-file /tmp/ansible-navigator_qg5o8i3t/artifacts/067e94c2-4f4f-4162-9a98-16ba258e3189/env.list 
    --quiet --name ansible_runner_067e94c2-4f4f-4162-9a98-16ba258e3189 
    test-ee:latest ansible-playbook /home/bthornto/github/builder_test/site.yml
@nitzmahone
Member

This one is the problem:
-v /tmp/ansible-navigator_qg5o8i3t/:/runner/:Z

/runner is the fallback workdir and homedir for ephemeral users. The container build forces the one in the EE image to be writable by the container GID0, and the entrypoint script bends over backwards to ensure the user has a valid, writable homedir that's properly reflected in /etc/passwd. If you mount something over the top of it that isn't writable, a lot of stuff is going to be broken. Why are we trashing the container user's homedir?

@cidrblock
Author

I need to dig deeper, but this is as far as I have got: we call run_command_async from ansible-runner, passing /tmp/ansible-navigatorxxxxx as private_data_dir.

{'container_image': 'ghcr.io/ansible/crea...ee:v0.17.0', 'process_isolation_executable': 'podman', 'process_isolation': True, 'container_volume_mounts': None, 'container_options': None, 'container_workdir': None, 'private_data_dir': '/tmp/ansible-navigat...r_9isqrv2v', 'json_mode': True, 'quiet': True, 'cancel_callback': <bound method Base.r...38fe2690>>, 'finished_callback': <bound method Base.r...38fe2690>>, 'timeout': None, 'envvars': {'ANSIBLE_NAVIGATOR_UP...T_FIXTURES': 'true'}, 'host_cwd': '/home/bthornto/githu...navigator/', ...}

I'll debug runner after a bit.

@nitzmahone
Member

nitzmahone commented May 20, 2023

Rootless podman assumes host UID == container UID0/GID0, but since we "can't" default the container to USER root, permissions on host-shared dirs are problematic. I see a few options:

  1. Just add -u root to the container invocation. That's the only way that file ownership of things created in the container will always be "correct" on the host with rootless podman (e.g., owned by the real host UID/GID on the host filesystem). Without that, or a manual uidmap that basically does the same thing, files created in the container will be owned by some random UID on the host.

  2. Ensure that host-mapped dirs that need to be writable inside the container have "current host user" group ownership and are group-writable, and set the container umask to 0002. This will still have the unfortunate side effect of container-created files and dirs being owned by a random UID on the host, but they should still be accessible since the group is the host user's GID and the umask creates them writable by that user by default.

  3. Launch the container with --userns=keep-id. This interposes the host UID at the namespace inside the container so it's actually the same inside and out, but it still keeps the primary group as container GID0 (so you'll see ephemeral group ownership on the host for files created in the container instead of your own GID). There might be a way around that one, but it's not immediately clear to me.

I think option #3 is the most compatible choice for Navigator's typical host-agility needs as a dev tool.
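
The host-side half of option 2 can be sketched in a few lines. This is only an illustration under stated assumptions: `make_group_writable` is a hypothetical helper, it covers directory ownership and mode only, and the container umask of 0002 would still have to be set separately on the container side.

```python
import os
import stat
import tempfile


def make_group_writable(path: str) -> int:
    """Give the current host user's group rwx on path and return the new mode.

    Hypothetical helper: host-side preparation only. The container
    umask (0002) from option 2 is not handled here.
    """
    # Leave the owner untouched (-1) and set the group to the current GID.
    os.chown(path, -1, os.getgid())
    mode = stat.S_IMODE(os.stat(path).st_mode) | stat.S_IRWXG
    os.chmod(path, mode)
    return mode


shared_dir = tempfile.mkdtemp()  # starts out owner-only (0700)
new_mode = make_group_writable(shared_dir)
shared_gid = os.stat(shared_dir).st_gid
print(oct(new_mode))
os.rmdir(shared_dir)
```

Files created from inside the container would still show an unfamiliar owner UID on the host, as described in option 2; the group bits are what keep them reachable.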

@cidrblock
Author

Any of that would need to be done by runner. Navigator doesn't craft the command line.

Will look at the runner code later tonight.
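
For context, the kwargs dump earlier in this thread shows a `container_options` key set to None. A minimal sketch of how a caller could place an extra engine option there before handing the kwargs to runner, assuming runner accepts `container_options` as a list of strings; `with_container_options` is a hypothetical helper and the dictionary values are illustrative:

```python
from typing import Any, Dict, List


def with_container_options(
    runner_kwargs: Dict[str, Any], extra: List[str]
) -> Dict[str, Any]:
    """Return a copy of runner_kwargs with extra appended to container_options."""
    updated = dict(runner_kwargs)
    # The dump shows container_options as None, so treat None as empty.
    current = updated.get("container_options") or []
    updated["container_options"] = list(current) + list(extra)
    return updated


# Keys taken from the kwargs dump in this thread; values are illustrative.
kwargs = {
    "process_isolation": True,
    "process_isolation_executable": "podman",
    "container_image": "test-ee:latest",
    "container_options": None,
}
patched = with_container_options(kwargs, ["--userns=keep-id"])
print(patched["container_options"])
```

The original dictionary is left unchanged, so the same base kwargs can be reused with different options per container engine.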

@cidrblock
Author

ansible-navigator run site.yml --eei test-ee --pp never --co="-uroot" --la false 

This works fine for podman but not for docker.

ansible-navigator run site.yml --eei test-ee --pp never --co="--userns=keep-id" --la false

This works fine for podman until docker is installed; after that it fails with "OCI permission denied", which appears to be caused by the docker group membership.

So the last question is: should this be fixed in runner or in navigator? It appears -u root is the better option given the issues related to having docker and podman installed at the same time.
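
Since -uroot is reported above as working for podman but not for docker, one possible workaround is to add the option only when the engine is podman. A rough sketch that builds the command line from the invocation above; `navigator_argv` is a hypothetical helper, and this is not a description of how navigator or runner resolved the issue:

```python
from typing import List


def navigator_argv(
    engine: str, playbook: str = "site.yml", image: str = "test-ee"
) -> List[str]:
    """Build the ansible-navigator command line shown in the comment above.

    Assumption: only podman gets the extra option, because -uroot was
    reported as not working with docker.
    """
    argv = ["ansible-navigator", "run", playbook, "--eei", image, "--pp", "never"]
    if engine == "podman":
        argv.append("--co=-uroot")
    argv += ["--la", "false"]
    return argv


podman_argv = navigator_argv("podman")
docker_argv = navigator_argv("docker")
print(" ".join(podman_argv))
```

The resulting list can be passed to `subprocess.run` as-is; building it as a list avoids the shell quoting needed for `--co="-uroot"` on the command line.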

@cidrblock
Author

cidrblock commented May 20, 2023

It appears userns was once in runner: ansible/ansible-runner#759

@felixfontein
Contributor

Apparently Docker also has a rootless mode (https://docs.docker.com/engine/security/rootless/), which will make this even more ... interesting :)

@cidrblock
Author

I went ahead and merged the navigator PR and released version 3.3.1. The tests were passing and I had good success with it locally.

I still find myself thinking that container-engine-specific CLI requirements related to builder-built execution environments might be better handled in runner, but I also understand that touching runner can have a much bigger impact than touching navigator.

@laurent-indermuehle

Thanks @cidrblock, I can confirm that ansible-navigator 3.3.1 fixed the issue.

@sivel removed the needs_triage label on Jun 1, 2023

5 participants