Proposal: CI friendly tracking #182

Open
tduffield opened this issue Apr 15, 2020 · 5 comments

Comments

@tduffield
Contributor

tduffield commented Apr 15, 2020

This is a proposal for functionality that I need to write. The only question is whether or not you would like to accept it upstream.

The crux of my issue is that I want to use dobi in my CI environment to build Docker images. I need to optimize for parallelization, so I'm using the depends Image config to build a DAG which I can use to run builds in parallel (one task per job). However, there are two issues that I'm running into with running dobi in CI, both of which revolve around the fact that I don't have a persistent host.

  1. My git clone does not preserve the original modification times, so I always rebuild no matter what.
  2. I can't rely on the .dobi directory to keep track of my records, so I end up rebuilding dependencies when I shouldn't.

My solution to this problem is to expand the existing functionality, which I refer to as file-mtime, to include more CI-friendly behavior. Specifically, instead of looking at the mtime of the context directory, I would look at the timestamp of my HEAD commit to determine if any of my files have been changed since I last rebuilt. And instead of relying on the mtime of the record file to determine when I last built my image, I would write the LastBuild timestamp into the record itself. I would use S3 to store the record files, syncing them to/from my hosts as necessary, but in theory this functionality could also be expanded to support other key-value stores such as DynamoDB, Consul, etc.

I figure that this behavior could be controlled by meta parameters like so:

meta:
    project: hosted-runtime
    hosted: true # can be controlled via ENV['DOBI_HOSTED']

When hosted is true, we would use git to determine when a file was last modified, and we would keep the LastModified value in the image record itself.

Thoughts? Questions? Concerns?

@tduffield
Contributor Author

Just wanted to follow up that I was able to get a POC of this working on my fork. It is super slick.

@cescoferraro
Contributor

Most CI systems set CI=true.
Maybe you don't need the meta parameter.

@tduffield
Contributor Author

@cescoferraro I thought about that, but some folks might have a CI system that works with things as they are, and wouldn't want the new functionality on by default.
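A rough Go sketch of that opt-in behavior, assuming the `hosted` value from the config is the default and `DOBI_HOSTED` overrides it when set to a parseable boolean. The function name `hostedEnabled` and the exact precedence are assumptions for illustration, not dobi's actual behavior. `CI=true` is deliberately ignored.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// hostedEnabled resolves the hosted flag. The config value is the default;
// DOBI_HOSTED overrides it when present and parseable as a boolean.
// The lookup function is injected so the logic can be exercised without
// touching the real environment.
func hostedEnabled(configValue bool, lookup func(string) (string, bool)) bool {
	raw, ok := lookup("DOBI_HOSTED")
	if !ok {
		return configValue
	}
	parsed, err := strconv.ParseBool(raw)
	if err != nil {
		// Unparseable value: fall back to the config rather than guessing.
		return configValue
	}
	return parsed
}

func main() {
	fmt.Println("hosted:", hostedEnabled(false, os.LookupEnv))
}
```

Keeping the switch explicit means a CI system that already works with mtime-based tracking sees no change unless someone turns the new mode on.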

@dnephin
Owner

dnephin commented Apr 18, 2020

Thank you for the proposal! When I first created dobi I had ambitions to try and make it work as a config for CI, so I think the general idea makes a lot of sense.

There is one thing I don't understand from the proposal. If dobi sees newer timestamps for the result of a task then it will skip some tasks (e.g. building an image). But how will those images and artifacts be distributed so that they are available to the CI job running dobi?
If there is a mechanism for transferring files, could that mechanism be used to copy the .dobi files and avoid the need to read those timestamps from other places?

@tduffield
Contributor Author

tduffield commented Apr 18, 2020

I am using aws s3 sync, which does not preserve timestamps. I chose to write the LastModified value into the file itself so that the correct value is preserved regardless of the syncing implementation.

3 participants