Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add build.files and negative globs to rattler-build #819

Merged
merged 27 commits into from
Jun 4, 2024

Conversation

wolfv
Copy link
Member

@wolfv wolfv commented Apr 22, 2024

This changes all places where lists of globs are accepted to accept either a list of globs or a map with include / exclude. It also adds a new include_files key to the build map that allows to filter files that should be part of the package.

It looks like this:

build:
  files:
    include:
      - lib/*
    exclude:
      - lib/*.exe

The include operation is done before the exclude operation, meaning that the list of files is first filtered by the include globs, and then the remaining files are filtered by the exclude globs.

There are a few open questions:

  • should exclude globs work when there are no include globs? Meaning, we assume that all files are included by default (ie. implicit include: ["**/*"])?
  • Do we like the include / exclude map? Or should we have a list, where one item can be exclude: [...]? Or should we mark the negative globs, for example with ~ lib/*.exe or ^ lib/*.exe? The ! operator doesn't work as it denotes something in YAML.

TODO

  • implement exclude serialization in serde

@wolfv wolfv changed the title start working on inlcude files [skip ci] Add include_files and negative globs to rattler-build Jun 1, 2024
@wolfv
Copy link
Member Author

wolfv commented Jun 1, 2024

We should think about what an empty include and some "exclude" mean. Currently an empty include means that no paths are matched.

We could change the behavior to mean that all paths are matched, and the exclude subtracts from that. Or at least we could offer a function that works this way.

@wolfv
Copy link
Member Author

wolfv commented Jun 1, 2024

While implementing I came up with a few questions.

Should only negative (exclude) globs be allowed? To filter from all paths? The implicit include would be a match-all-glob then. Or maybe that should only be the behavior if there is a exclude (but no include).

Maybe @carterbox has input here :)

@wolfv
Copy link
Member Author

wolfv commented Jun 2, 2024

We currently don't handle the case with foo/ as we do in other places. I think we should handle that case as well.

We might also want to refactor the CopyFiles function to use the GlobVec so that we only have a single way of doing these things across all of rattler-build.

@carterbox
Copy link

carterbox commented Jun 2, 2024

should exclude globs work when there are no include globs? Meaning, we assume that all files are included by default (ie. implicit include: ["**/*"])?

The idea behind what I intended to implemented over in conda-build is that what you install with files include/exclude, is prefix (P) intersect installed-by-this-package (A) intersect include (I) subtract exclude (E). (P ∩ A ∩ I) - E. If any of P,A,I are an empty set, then the expression evaluates to an empty set. So no, there is not an implicit include: ["**/*"].

build:
  include_files:
    include: []

OR

build:
  include_files: []

Implies the empty set.

A related question is whether it matters when you subtract the set E? It doesn't because all of the other operations are intersect.

Do we like the include / exclude map? Or should we have a list, where one item can be exclude: [...]? Or should we mark the negative globs, for example with ~ lib/.exe or ^ lib/.exe? The ! operator doesn't work as it denotes something in YAML.

Adding a special marker for "not" seems contrary to YAML, and would be more difficult to implement. So I prefer the use separate lists for include and exclude. Also, I believe that glob already gives special meaning to "!".

@carterbox
Copy link

Lemme know if you want me to try and review your actual implementation. I'm not too familiar with rust, but I could give it a try. For python we use the standard library set operations. Does rust have a set template class?

@wolfv
Copy link
Member Author

wolfv commented Jun 3, 2024

@carterbox would love it if you can review the implementation - most of the magic is in the globvec.rs is_match function :)

I might add some more tests with YAML so that we can easily put together some more test cases.

The nice thing about this implementaiton is that it would work the same across all of the recipe (e.g. any place that accepts a list of globs now also accepts a include / exclude pair.

Do you think we should error when we find a dictionary that contains only exclude then?

@wolfv
Copy link
Member Author

wolfv commented Jun 4, 2024

@carterbox - i see that you commented in conda-build that the files thing also matches files from other host dependencies. That seems like a pretty glaring issue in the conda-build implementation that makes it obviously impractical to default to include-all (when an exclude is defined).

Since we don't have that issue, we are planning to also allow for a exclude without an include that would be purely subtractive.

@wolfv wolfv changed the title Add include_files and negative globs to rattler-build Add build.files and negative globs to rattler-build Jun 4, 2024
@wolfv wolfv merged commit 6f2c48e into prefix-dev:main Jun 4, 2024
17 checks passed
@wolfv wolfv deleted the include-files branch June 4, 2024 21:21
@carterbox
Copy link

Do you think we should error when we find a dictionary that contains only exclude then?

No. It's not an error to subtract from the empty set, just a silly thing to do. Errors are for when the user asks to do something is not possible / correct.

For example,

for i in range([]):
  print(i)

Looping over nothing does not raise an error.

Since we don't have that issue, we are planning to also allow for a exclude without an include that would be purely subtractive.

Yes, I think the default behavior of conda-build for file lists is not good. Unfortunately, I don't have the time right now to think about what the default values for rattler should be for all the cases of users leaving keys undefined or empty.

@wolfv
Copy link
Member Author

wolfv commented Jun 4, 2024

@carterbox the logic is now as follows:

  • nothing given means nothing matched (except under certain circumstances, like files, where the default is to match all the new files). So it's handled on a case-by-case basis, depending on which globs we're looking at.
  • if only include is given, then it's just include globs
  • if only exclude is given, we use an implicit "**" for the include (ie. subtract from everything). I think this is obvious and non-surprising behavior. Users are free to use include or include + exclude if that's what they want.
  • if exclude and include are given, we first match against include and then subtract everything that matches exclude.

Let me know if that looks bad for you!

But essentially it will allow you to write something like:

files:
  exclude: ["include/"]

to exclude everything except the header files, which is equivalent to:

files:
  include: ["**"]
  exclude: ["include/"]

Note: we do only consider new files for files. If people want to match existing files, they need to use the always_include_files key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants