Document file system object content addressing
Ericson2314 committed Apr 11, 2024
1 parent da1e977 commit 4706b5e
Showing 19 changed files with 96 additions and 31 deletions.
3 changes: 2 additions & 1 deletion doc/manual/redirects.js
@@ -247,7 +247,7 @@ const redirects = {
"gloss-closure": "glossary.html#gloss-closure",
"gloss-derivation": "glossary.html#gloss-derivation",
"gloss-deriver": "glossary.html#gloss-deriver",
"gloss-nar": "glossary.html#gloss-nar",
"gloss-nar": "store/file-system-object/content-address.html#serial-nix-archive",
"gloss-output-path": "glossary.html#gloss-output-path",
"gloss-profile": "glossary.html#gloss-profile",
"gloss-reachable": "glossary.html#gloss-reachable",
@@ -362,6 +362,7 @@ const redirects = {
"glossary.html": {
"gloss-local-store": "store/types/local-store.html",
"gloss-chroot-store": "store/types/local-store.html",
"gloss-nar": "store/file-system-object/content-address.html#serial-nix-archive",
},
};

1 change: 1 addition & 0 deletions doc/manual/src/SUMMARY.md.in
@@ -18,6 +18,7 @@
- [Uninstalling Nix](installation/uninstall.md)
- [Nix Store](store/index.md)
- [File System Object](store/file-system-object.md)
- [Content-Addressing File System Objects](store/file-system-object/content-address.md)
- [Store Object](store/store-object.md)
- [Store Path](store/store-path.md)
- [Store Types](store/types/index.md)
4 changes: 3 additions & 1 deletion doc/manual/src/command-ref/nix-hash.md
@@ -20,11 +20,13 @@ an example.
The hash is computed over a *serialisation* of each path: a dump of
the file system tree rooted at the path. This allows directories and
symlinks to be hashed as well as regular files. The dump is in the
*NAR format* produced by [`nix-store
*[Nix Archive (NAR)][Nix Archive] format* produced by [`nix-store
--dump`](@docroot@/command-ref/nix-store/dump.md). Thus, `nix-hash path`
yields the same cryptographic hash as `nix-store --dump path |
md5sum`.

[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive

# Options

- `--flat`\
6 changes: 4 additions & 2 deletions doc/manual/src/command-ref/nix-store/dump.md
@@ -1,14 +1,14 @@
# Name

`nix-store --dump` - write a single path to a Nix Archive
`nix-store --dump` - write a single path to a [Nix Archive]

## Synopsis

`nix-store` `--dump` *path*

## Description

The operation `--dump` produces a NAR (Nix ARchive) file containing the
The operation `--dump` produces a [NAR (Nix ARchive)][Nix Archive] file containing the
contents of the file system tree rooted at *path*. The archive is
written to standard output.

@@ -33,6 +33,8 @@ but not other types of files (such as device nodes).
A Nix archive can be unpacked using `nix-store
--restore`.

[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive

{{#include ./opt-common.md}}

{{#include ../opt-common.md}}
6 changes: 4 additions & 2 deletions doc/manual/src/command-ref/nix-store/export.md
@@ -1,6 +1,6 @@
# Name

`nix-store --export` - export store paths to a Nix Archive
`nix-store --export` - export store paths to a [Nix Archive]

## Synopsis

@@ -11,14 +11,16 @@
The operation `--export` writes a serialisation of the specified store
paths to standard output in a format that can be imported into another
Nix store with `nix-store --import`. This is like `nix-store
--dump`, except that the NAR archive produced by that command doesn’t
--dump`, except that the [Nix Archive (NAR)][Nix Archive] produced by that command doesn’t
contain the necessary meta-information to allow it to be imported into
another Nix store (namely, the set of references of the path).

This command does not produce a *closure* of the specified paths, so if
a store path references other store paths that are missing in the target
Nix store, the import will fail.

[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive

{{#include ./opt-common.md}}

{{#include ../opt-common.md}}
4 changes: 3 additions & 1 deletion doc/manual/src/command-ref/nix-store/import.md
@@ -1,6 +1,8 @@
# Name

`nix-store --import` - import Nix Archive into the store
`nix-store --import` - import [Nix Archive] into the store

[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive

# Synopsis

3 changes: 2 additions & 1 deletion doc/manual/src/command-ref/nix-store/optimise.md
@@ -12,7 +12,7 @@ The operation `--optimise` reduces Nix store disk space usage by finding
identical files in the store and hard-linking them to each other. It
typically reduces the size of the store by something like 25-35%. Only
regular files and symlinks are hard-linked in this manner. Files are
considered identical when they have the same NAR archive serialisation:
considered identical when they have the same [Nix Archive (NAR)][Nix Archive] serialisation:
that is, regular files must have the same contents and permission
(executable or non-executable), and symlinks must have the same
contents.
@@ -38,3 +38,4 @@ hashing files in `/nix/store/qhqx7l2f1kmwihc9bnxs7rc159hsxnf3-gcc-4.1.1'
there are 114486 files with equal contents out of 215894 files in total
```

[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
4 changes: 3 additions & 1 deletion doc/manual/src/command-ref/nix-store/restore.md
@@ -8,9 +8,11 @@

## Description

The operation `--restore` unpacks a NAR archive to *path*, which must
The operation `--restore` unpacks a [Nix Archive (NAR)][Nix Archive] to *path*, which must
not already exist. The archive is read from standard input.

[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive

{{#include ./opt-common.md}}

{{#include ../opt-common.md}}
7 changes: 0 additions & 7 deletions doc/manual/src/glossary.md
@@ -252,13 +252,6 @@

See [installables](./command-ref/new-cli/nix.md#installables) for [`nix` commands](./command-ref/new-cli/nix.md) (experimental) for details.

- [NAR]{#gloss-nar}

A *N*ix *AR*chive. This is a serialisation of a path in the Nix
store. It can contain regular files, directories and symbolic
links. NARs are generated and unpacked using `nix-store --dump`
and `nix-store --restore`.

- [`∅`]{#gloss-emtpy-set}

The empty set symbol. In the context of profile history, this denotes a package is not present in a particular version of the profile.
10 changes: 7 additions & 3 deletions doc/manual/src/language/advanced-attributes.md
@@ -199,20 +199,24 @@ Derivations can declare some infrequently used optional attributes.
The `outputHashMode` attribute determines how the hash is computed.
It must be one of the following two values:

- `"flat"`\
- [`"flat"`](@docroot@/store/file-system-object/content-address.md#serial-flat)

The output must be a non-executable regular file. If it isn’t,
the build fails. The hash is simply computed over the contents
of that file (so it’s equal to what Unix commands like
`sha256sum` or `sha1sum` produce).

This is the default.

- `"recursive"` or `"nar"`\
The hash is computed over the [NAR archive](@docroot@/glossary.md#gloss-nar) dump of the output
- [`"recursive"` or `"nar"`][Nix Archive]

The hash is computed over the [Nix Archive (NAR)][Nix Archive] dump of the output
(i.e., the result of [`nix-store --dump`](@docroot@/command-ref/nix-store/dump.md)). In
this case, the output can be anything, including a directory
tree.

[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive

`"recursive"` is the traditional way of indicating this,
and is supported since 2005 (virtually the entire history of Nix).
`"nar"` is more clear, and consistent with other parts of Nix (such as the CLI),
2 changes: 1 addition & 1 deletion doc/manual/src/protocols/json/store-object-info.md
@@ -30,7 +30,7 @@ Info about a [store object].

[store path]: @docroot@/glossary.md#gloss-store-path
[file system object]: @docroot@/store/file-system-object.md
[Nix Archive]: @docroot@/glossary.md#gloss-nar
[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive

## Impure fields

3 changes: 2 additions & 1 deletion doc/manual/src/protocols/nix-archive.md
@@ -1,9 +1,10 @@
# Nix Archive (NAR) format

This is the complete specification of the Nix Archive format.
This is the complete specification of the [Nix Archive] format.
The Nix Archive format closely follows the abstract specification of a [file system object] tree,
because it is designed to serialize exactly that data structure.

[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
[file system object]: @docroot@/store/file-system-object.md

The format of this specification is close to [Extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form), with the exception of the `str(..)` function / parameterized rule, which length-prefixes and pads strings.
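
As a rough illustration of the `str(..)` rule, the sketch below length-prefixes a byte string with a 64-bit little-endian integer and pads it with zero bytes to a multiple of 8, then uses that to serialise a single regular file. It is written in Python for illustration only; the helper names `nar_str` and `nar_regular_file` are hypothetical, and the specification itself remains authoritative.

```python
import struct


def nar_str(s: bytes) -> bytes:
    """Encode one string: 64-bit little-endian length, the bytes, zero padding to 8."""
    pad = (8 - len(s) % 8) % 8
    return struct.pack("<Q", len(s)) + s + b"\0" * pad


def nar_regular_file(contents: bytes, executable: bool = False) -> bytes:
    """Serialise a single regular file (assumed token order; see the specification)."""
    parts = [
        nar_str(b"nix-archive-1"),  # magic string opening every archive
        nar_str(b"("),
        nar_str(b"type"),
        nar_str(b"regular"),
    ]
    if executable:
        # The executable flag is a marker token followed by an empty string.
        parts += [nar_str(b"executable"), nar_str(b"")]
    parts += [nar_str(b"contents"), nar_str(contents), nar_str(b")")]
    return b"".join(parts)


if __name__ == "__main__":
    print(len(nar_regular_file(b"hello\n")))
```

Because every token has exactly one encoding and there is no optional metadata, the same file always yields the same bytes under this sketch; an empty non-executable file comes out at 112 bytes.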
6 changes: 4 additions & 2 deletions doc/manual/src/protocols/store-path.md
@@ -1,12 +1,14 @@
# Complete Store Path Calculation

This is the complete specification for how store paths are calculated.
This is the complete specification for how [store path]s are calculated.

The format of this specification is close to [Extended Backus–Naur form](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form), but must deviate for a few things such as hash functions which we treat as bidirectional for specification purposes.

Regular users do *not* need to know this information --- store paths can be treated as black boxes computed from the properties of the store objects they refer to.
But for those interested in exactly how Nix works, e.g. if they are reimplementing it, this information can be useful.

[store path]: @docroot@/store/store-path.md

## Store path proper

```ebnf
@@ -113,7 +115,7 @@ where
Note that `id` = `"out"`, regardless of the name part of the store path.
Also note that NAR + SHA-256 must not use this case, and instead must use the `type` = `"source:" ...` case.

[Nix Archive (NAR)]: @docroot@/glossary.md#gloss-NAR
[Nix Archive (NAR)]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
[sha-256]: https://en.m.wikipedia.org/wiki/SHA-256

### Historical Note
4 changes: 3 additions & 1 deletion doc/manual/src/protocols/tarball-fetcher.md
@@ -22,7 +22,7 @@ Link: <flakeref>; rel="immutable"

*flakeref* must be a tarball flakeref. It can contain the tarball flake attributes
`narHash`, `rev`, `revCount` and `lastModified`. If `narHash` is included, its
value must be the NAR hash of the unpacked tarball (as computed via
value must be the [NAR hash][Nix Archive] of the unpacked tarball (as computed via
`nix hash path`). Nix checks the contents of the returned tarball
against the `narHash` attribute. The `rev` and `revCount` attributes
are useful when the tarball flake is a mirror of a fetcher type that
@@ -40,3 +40,5 @@ Link: <https://example.org/hello/442793d9ec0584f6a6e82fa253850c8085bb150a.tar.gz

For tarball flakes, the value of the `lastModified` flake attribute is
defined as the timestamp of the newest file inside the tarball.

[Nix Archive]: @docroot@/store/file-system-object/content-address.md#serial-nix-archive
39 changes: 39 additions & 0 deletions doc/manual/src/store/file-system-object/content-address.md
@@ -0,0 +1,39 @@
# Content-Addressing File System Objects

What Nix ultimately cares about is content-addressing store objects.
But since every store object has a root file system object, file system objects must be content-addressed too.

## Flat Hashing { #serial-flat }

A single file can be hashed directly by its contents.
The hash alone does not encode the fact that the file system object (FSO) is a file,
but if we *already* know by other means that the FSO is a single file, it is sufficient.
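
As a minimal sketch of flat hashing (Python is used purely for illustration, and `flat_hash` is a hypothetical helper, not part of Nix), the digest is computed over the file's bytes and nothing else:

```python
import hashlib


def flat_hash(data: bytes, algo: str = "sha256") -> str:
    """Hash the raw contents of a single file; no name, type, or metadata is included."""
    return hashlib.new(algo, data).hexdigest()


if __name__ == "__main__":
    # The digest depends only on the bytes passed in.
    print(flat_hash(b"hello\n"))
```

Because only the contents are hashed, the result should match what `sha256sum` prints for a file with the same bytes; the file's name, permissions, and type play no part.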

## Nix Archive (NAR) { #serial-nix-archive }

To perform operations on FSOs such as computing cryptographic hashes, scanning for references, and so on, it is useful to be able to *serialise* FSOs into byte sequences, which can then be deserialised back into FSOs that are stored in the file system.
Examples of such serialisations are the ZIP and TAR file formats.
However, for our purposes these formats have several problems:

- They do not have a canonical serialisation, meaning that given an FSO, there can
be many different serialisations.
For instance, TAR files can have variable amounts of padding between archive members;
and some archive formats leave the order of directory entries undefined.
This is bad because we use serialisation to compute cryptographic hashes over FSOs, and therefore require the serialisation to be unique.
Otherwise, the hash value can depend on implementation details or environment
settings of the serialiser.

- They store more information than we have in our notion of FSOs, such as time stamps.
This can cause FSOs that Nix should consider equal to hash to different values on different machines, just because the dates differ.

- As a practical consideration, the TAR format is the only truly universal format in the Unix environment.
It has many problems, such as an inability to deal with long file names and files larger than 2^33 bytes (8 GiB).
Current implementations such as GNU Tar work around these limitations in various ways.
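
The timestamp problem can be illustrated with a short sketch using Python's standard `tarfile` module; the helper name `tar_bytes` and the sample file are made up for this example. Two archives holding identical file contents, differing only in modification time, produce different bytes and therefore different hashes:

```python
import hashlib
import io
import tarfile


def tar_bytes(name: str, data: bytes, mtime: int) -> bytes:
    """Build an in-memory TAR archive containing one regular file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        info.mtime = mtime  # metadata that is not part of the FSO model
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Same file name and contents, different timestamps.
a = tar_bytes("hello.txt", b"hello\n", mtime=0)
b = tar_bytes("hello.txt", b"hello\n", mtime=1_700_000_000)

if __name__ == "__main__":
    print(hashlib.sha256(a).hexdigest())
    print(hashlib.sha256(b).hexdigest())
```

The two digests differ even though the file itself is unchanged, which is exactly the behaviour a content address must not have.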

For these reasons, Nix has its own archive format: the Nix Archive (NAR) format.
It is a custom serialisation designed to avoid the problems described above.

The exact specification of the Nix Archive format is in [`protocols/nix-archive.md`](@docroot@/protocols/nix-archive.md).
15 changes: 10 additions & 5 deletions src/libcmd/misc-store-flags.cc
@@ -81,9 +81,11 @@ Args::Flag fileIngestionMethod(FileIngestionMethod * method)
How to compute the hash of the input.
One of:
- `nar` (the default): Serialises the input as an archive (following the [_Nix Archive Format_](https://edolstra.github.io/pubs/phd-thesis.pdf#page=101)) and passes that to the hash function.
- [`nar`](@docroot@/store/file-system-object/content-address.md#serial-nix-archive) (the default):
Serialises the input as a Nix Archive and passes that to the hash function.
- `flat`: Assumes that the input is a single file and directly passes it to the hash function;
- [`flat`](@docroot@/store/file-system-object/content-address.md#serial-flat):
Assumes that the input is a single file and directly passes it to the hash function;
)",
.labels = {"file-ingestion-method"},
.handler = {[method](std::string s) {
@@ -97,16 +99,19 @@ Args::Flag contentAddressMethod(ContentAddressMethod * method)
return Args::Flag {
.longName = "mode",
// FIXME indentation carefully made for context, this is messed up.
// FIXME link to store object content-addressing not file system object content addressing once we have a page.
.description = R"(
How to compute the content-address of the store object.
One of:
- `nar` (the default): Serialises the input as an archive (following the [_Nix Archive Format_](https://edolstra.github.io/pubs/phd-thesis.pdf#page=101)) and passes that to the hash function.
- [`nar`](@docroot@/store/file-system-object/content-address.md#serial-nix-archive) (the default):
Serialises the input as a Nix Archive and passes that to the hash function.
- `flat`: Assumes that the input is a single file and directly passes it to the hash function;
- [`flat`](@docroot@/store/file-system-object/content-address.md#serial-flat):
Assumes that the input is a single file and directly passes it to the hash function;
- `text`: Like `flat`, but used for
[derivations](@docroot@/glossary.md#store-derivation) serialized in store object and
[derivations](@docroot@/glossary.md#store-derivation) serialized in store object and
[`builtins.toFile`](@docroot@/language/builtins.html#builtins-toFile).
For advanced use-cases only;
for regular usage prefer `nar` and `flat`.
2 changes: 1 addition & 1 deletion src/libexpr/primops/fetchTree.cc
@@ -201,7 +201,7 @@ static RegisterPrimOp primop_fetchTree({
Fetch a file system tree or a plain file using one of the supported backends and return an attribute set with:
- the resulting fixed-output [store path](@docroot@/glossary.md#gloss-store-path)
- the corresponding [NAR](@docroot@/glossary.md#gloss-nar) hash
- the corresponding [NAR](@docroot@/store/file-system-object/content-address.md#serial-nix-archive) hash
- backend-specific metadata (currently not documented). <!-- TODO: document output attributes -->
*input* must be an attribute set with the following attributes:
6 changes: 6 additions & 0 deletions src/libutil/file-content-address.hh
@@ -10,6 +10,9 @@ namespace nix {
/**
* An enumeration of the ways we can serialize file system
* objects.
*
* See `file-system-object/content-address.md` in the manual for a
* user-facing description of this concept.
*/
enum struct FileSerialisationMethod : uint8_t {
/**
@@ -79,6 +82,9 @@ HashResult hashPath(
/**
* An enumeration of the ways we can ingest file system
* objects, producing a hash or digest.
*
* See `file-system-object/content-address.md` in the manual for a
* user-facing description of this concept.
*/
enum struct FileIngestionMethod : uint8_t {
/**
2 changes: 1 addition & 1 deletion src/nix/unix/daemon.cc
@@ -58,7 +58,7 @@ struct AuthorizationSettings : Config {
this, {"root"}, "trusted-users",
R"(
A list of user names, separated by whitespace.
These users will have additional rights when connecting to the Nix daemon, such as the ability to specify additional [substituters](#conf-substituters), or to import unsigned [NARs](@docroot@/glossary.md#gloss-nar).
These users will have additional rights when connecting to the Nix daemon, such as the ability to specify additional [substituters](#conf-substituters), or to import unsigned [NARs](@docroot@/store/file-system-object/content-address.md#serial-nix-archive).
You can also specify groups by prefixing names with `@`.
For instance, `@wheel` means all users in the `wheel` group.