Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the build reproducible #182

Open
ia0 opened this issue Oct 9, 2020 · 7 comments
Open

Make the build reproducible #182

ia0 opened this issue Oct 9, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@ia0
Copy link
Member

ia0 commented Oct 9, 2020

git clone && ./setup.sh && ./deploy should be reproducible at any time for a given commit. This is currently not the case, at least because we don't commit a Cargo.lock.

This is blocking #151. The workflow is currently disabled by #181.

@ia0 ia0 assigned ia0 and gendx and unassigned ia0 Oct 9, 2020
@jmichelp
Copy link
Collaborator

This means we should really go for 2 branches:

  • on dev branch the workflow should reset + compile twice in a row and ensure that the hashes are the same. Otherwise this would mean we include randomly generated data into the firmware.
  • on master we should only merge from dev and have a workflow on merge that generates a Cargo.lock file and computes the hashes.

WDYT?

@jmichelp jmichelp added the enhancement New feature or request label Oct 12, 2020
@ia0
Copy link
Member Author

ia0 commented Oct 12, 2020

This issue is not about a problem in the continuous integration setup nor in the test itself. Both are working fine, although they could be improved as you describe. This issue is about the build not being reproducible as indicated in the title and the description.

Maybe the definition of build is reproducible needs to be clarified to explain the problem and how to check it.

A build is reproducible with time window T (e.g. 1 year) if the following property holds: For any point t in time, for all commits c in the repository, if the time of the commit tc is between t - T and t then build(t, c) = build(tc, c) where build(t, c) builds commit c at point t in time.

Because we can only run build(now, c) with now the current time, we can't check the property as stated. So we have to keep artifacts like build(ta, c) for each commit c where ta is the time when the artifact was built (usually a bit before the time of the commit tc). Then we can test that the build is reproducible with the following steps: Let now be the current time, for all commits c in the repository, if the time of the artifact ta is between now - T and now then build(now, c) should be equal to the artifact build(ta, c).

This test is currently failing as demonstrated in #180 which does a no-op change (which means that the commit is equal to its previous commit and thus both can be used interchangeably). With the notations above we have PR = master and build(t_PR, PR) != build(t_master, master), i.e. build(t_PR, master) != build(t_master, master) with t_PR - t_master < T for any reasonable T.

@ia0 ia0 added bug Something isn't working and removed enhancement New feature or request labels Oct 12, 2020
@gendx
Copy link
Collaborator

gendx commented Oct 12, 2020

I don't think this level of formalism is necessary to understand the issue. It's rather clear that over time changes in any of the dependencies affects reproducibility, and therefore we can't achieve reproducibility over time without pinning to a Cargo.lock. Likewise changes in the compiler affect reproducibility, and that's why we pin a compiler version.

So I think pinning a Cargo.lock and a compiler version are reasonable measures, with an extra workflow running as a cron job to notify when dependencies are outdated, or when the compiler is outdated. For the compiler, because a new nightly exists pretty much every day, it would probably be more reasonable to be notified when it's more than N days old (as opposed to "not the latest").

WDYT?

@gendx gendx removed their assignment Oct 12, 2020
@ia0
Copy link
Member Author

ia0 commented Oct 12, 2020

So I think pinning a Cargo.lock and a compiler version

Yes, we should do that.

an extra workflow running as a cron job to notify when dependencies are outdated, or when the compiler is outdated

This is an orthogonal improvement to the reproducibility problem. However a cron job that tests the reproducibility of old commits (maybe just some tags to avoid computational costs) would be useful to track reproducibility issues.

@gendx
Copy link
Collaborator

gendx commented Oct 14, 2020

However a cron job that tests the reproducibility of old commits (maybe just some tags to avoid computational costs) would be useful to track reproducibility issues.

This could be useful. However, if an old commit X turns out not to be reproducible, it will be forever non-reproducible and we cannot fix it - the only option is to create a different commit Y.

@ia0
Copy link
Member Author

ia0 commented Oct 14, 2020

Yes, and that would work well with a release branch process. The cron job would check the last commit of all release branches. If a release is not reproducible (for example the retention of the nightly compiler that was used expired), then we push a commit to update the compiler version for that release branch.

@kaczmarczyck
Copy link
Collaborator

#667 suggests to have binary releases. We might want to revisit reproducibility if we go for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants