Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shared container/host download cache #447

Open
lynaghk opened this issue Sep 4, 2022 · 3 comments
Open

Shared container/host download cache #447

lynaghk opened this issue Sep 4, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@lynaghk
Copy link

lynaghk commented Sep 4, 2022

My toast tasks spent a lot of time repeatedly downloading language dependencies into, e.g., ~/.cargo/registry (Rust) and ~/.m2/ (Java / Clojure).
I've run into two problems trying to share these host dependency caches with containers.

Consider a task definition like:

image: ubuntu:22.04
tasks:
  compile-rust:
    dependencies:
      - install-rust
    mount_paths:
      - ~/.cargo/registry:/root/.cargo/registry
    command: |
      cargo build

Problem 1: Containers with mount_paths are not cachable.

Running toast compile-rust will fail with:

[ERROR] Unable to parse file /Users/dev/software/toast/scratch/toast.yml. Reason: Task compile-rust has mount_paths but does not disable caching. To fix this, set cache: false for this task.

I understand the intent here --- the contents of the mount_paths are untracked by toast and so in general caching build output could lead to reproducibility problems.
However, for the specific need of sharing the host's already-downloaded dependencies, it feels a bit different.
We're trusting language tooling (cargo, mvn, whatever) to handle downloading/validating libraries in a sensible way.
Since toast is already fine caching containers that download arbitrary stuff from the Internet (i.e., isn't aiming for bitwise-reproducible dependency tracking), it feels reasonable to ignore some of the "inputs" for the purposes of caching.

Perhaps we should break out this need out as a separate concept, say cache_paths.
Then we'd have:

  • input_paths: tracked by toast, for anything where content changes should re-run command (source code, dependency lockfiles, etc.)
  • mount_paths: untracked by toast, for transient data used by commands that never "finish" (e.g., database server)
  • cache_paths: untracked by toast, for data that a command will read/write and handle its own dependencies for.

Alternatively, we could allow specifying a "don't cache this" option for specific input paths or a "please allow caching" for containers that use mount_paths, but I think it'll be clearer to have an explicitly named concept.

Problem 2: Intended host paths vary.

I pulled a fast one with:

mount_paths:
  - ~/.cargo/registry:/root/.cargo/registry

It actually needs to be a path relative to the toast file (../../.cargo/registry) or an absolute one (/Users/kevin/.cargo/registry).
Unfortunately, both make it difficult to make toast-driven projects portable, since it'll force you to checkout the project to a specific folder or run as a specific user.

The obvious fix in this case is to write a bit of code to expand ~ into the executing user's home directory, though there be more general ways to solve this need (e.g., allowing for interpretation of environment variables in toast config values beyond just the command string).

If you agree that the need is in-scope for toast, I'm happy to submit PRs in the next few days.

@lynaghk lynaghk added the enhancement New feature or request label Sep 4, 2022
@stepchowfun
Copy link
Owner

Hi @lynaghk, thanks for these two feature requests! I can understand the desire for both. Here are my rough thoughts on them:

For Problem 1, one simple possibility is to change the error into a warning. I'm on the fence about it, though. Warnings tend to be ignored, although in this case that would be the point. If we changed this error into a warning, there are probably other similar errors we'd also want to change into warnings.

For Problem 2, I think supporting environment variable interpolation is the way to go, rather than treating home directories as a special case. You'd have to write $HOME instead of ~, but I think that's fine. However, I'm concerned about the complexity of adding this feature.

Sorry for the non-committal response. I'm still not sure about either.

@lynaghk
Copy link
Author

lynaghk commented Sep 8, 2022

No worries! There's no huge rush on my end, so definitely feel free to have a think.

Re: Problem 1, warnings don't feel like a long-term "solution", since they don't allow one to distinguish intent --- if a colleague had written the toast file, I wouldn't be sure if they'd made a mistake or if they deliberately wanted to introduce not-tracked-by-toast state into the container.

Re: Problem 2, the other tricky part of environment vars from a UX perspective is that the ones in command will variable will be interpreted within the container shell, whereas all of the other ones (presumably) will be interpreted w.r.t. the context that is running the toast command. But hopefully that'll make sense to everyone.

I already have a private fork of toast running Bollard (as discussed in #440 (comment)) so I may clean that up and have a crack at some of these other features.

I'll report on my experiences and if you want we can discuss how to merge some of the back upstream. Cheers!

@stepchowfun
Copy link
Owner

Those are good points!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants