Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make EvalTask track resolved output paths #403

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

bioball
Copy link
Contributor

@bioball bioball commented Apr 5, 2024

This makes Gradle track getOutputPaths() and getMultipleFileOutputPaths. These represent the actual outputs of the EvalTask.

This reverts commit 3c6df1f.

The reason that this task is untracked is because the output file
and multiple file output dir both accept placeholder values, which get
swapped out with actual values during eval.
import org.pkl.cli.CliEvaluator;
import org.pkl.cli.CliEvaluatorOptions;

@UntrackedTask(because = "Output file names are known only after execution")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe explain it a bit better here as well (that the output files contain placeholders)? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

@StefMa StefMa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👌

@translatenix
Copy link
Contributor

I see no indication that EvalTask supports placeholders. It only accepts RegularFileProperty getOutputFile() and DirectoryProperty getMultipleFileOutputDir(), which clearly shouldn't contain placeholders.

@StefMa
Copy link
Contributor

StefMa commented Apr 5, 2024

You can read it up here:
https://pkl-lang.org/main/current/pkl-gradle/index.html#configuration-options

Screenshot_20240405-195950.png

So yes it's a bit confusing that you can put a file into it that can contain a placeholder that doesn't work with Gradle because the file location is unknown at that point.

@translatenix
Copy link
Contributor

translatenix commented Apr 5, 2024

This is a design mistake in EvalTask. Should try to fix that instead of making the task untracked.

@StefMa
Copy link
Contributor

StefMa commented Apr 5, 2024

This is a design mistake in EvalTask. Should try to fix that instead of making the task untracked.

If we would use a normal String as input it would work right? 🤔

@translatenix
Copy link
Contributor

I think EvalTask could have separate Property<String> properties and invalidate outputs if those are set.
But I see that CliEvaluator has outputFiles and outputDirectories properties, which might be all that's needed for accurate tracking.

@bioball
Copy link
Contributor Author

bioball commented Apr 5, 2024

But I see that CliEvaluator has outputFiles and outputDirectories properties, which might be all that's needed for accurate tracking.

Great idea! Updated the PR to add new properties that are tracked.

@bioball bioball changed the title Revert "Turn EvalTask into a tracked Gradle task" Make EvalTask track resolved output paths Apr 5, 2024
Copy link
Contributor

@translatenix translatenix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left two comments. Otherwise LGTM.

public abstract DirectoryProperty getMultipleFileOutputDir();

@Input
@Optional
public abstract Property<String> getExpression();

@Internal
public Provider<CliEvaluator> getCliEvaluator() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Provider helps for this and the following methods, as they are only called if and when the information is needed.

Update: Maybe this is correct and enables task dependency tracking for getOutputPaths() and getMultipleFileOutputPaths(). I can't find any Gradle docs on this.

Update 2: Found this in the docs:

A provider may represent a task output. Such a provider carries information about the task producing its value. When this provider is attached to an input of another task, Gradle will automatically determine the task dependencies based on this connection.

Now the only question is if this works for a provider obtained with getProject().provider(). Another option is to inject a ProviderFactory into the task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the only question is if this works for a provider obtained with getProject().provider(). Another option is to inject a ProviderFactory into the task.

Hm... probably not, in that case. I'm guessing we need to reference the inputs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. I think that @Injecting a ProviderFactory might be the right solution. Could check task dependency tracking with an automated or manual test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the link. Switching to the property API (e.g. SetProperty) works!

Copy link
Contributor

@translatenix translatenix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Two more naming suggestions for your consideration.


@OutputFiles
@Optional
public SetProperty<File> getOutputPaths() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would getEffectiveOutputFiles() be a better name? Makes clear that it’s related to getOutputFile() and not meant to be configured directly.


@OutputDirectories
@Optional
public SetProperty<File> getMultipleFileOutputPaths() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is getEffectiveMultipleFileOutputDirs() a better name?

public abstract ObjectFactory getObjectFactory();

private CliEvaluator getCliEvaluator() {
return new CliEvaluator(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this results in multiple CliEvaluator instances, each of which recomputes CliEvaluator.outputFiles and CliEvaluator.outputDirectories.

Copy link

@netvl netvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a reason why we originally made EvalTask untracked.

Unfortunately, Gradle's model of input and output properties and their caching expects that all properties have statically known values after configuration time. In other words, it is expected that Gradle knows exact values of all input and output properties of a task, in order to do its caching and invalidation.

This, in turn, implies that it is not possible (under the model constraints) to change or otherwise dynamically compute values of properties at the task execution time. As an example, this is not feasible:

class ExampleTask : DefaultTask() {
    @get:Output
    val someProperty: Property<String>

    @TaskAction
    fun doWork() {
        someProperty.set("xyz")
    }
}

To be absolutely precise, this is possible to write, and it will kind of work as expected, however, it will not play nicely with caching and invalidation, because Gradle will read input and output property values before it executes the task and build its task state and cache based on those values, and it will not track any changes happening to properties during the task execution.

There are tasks in Gradle which may produce files whose names are unknown before execution, like compilation or code generation. However, in these cases these tasks' output properties are not individual files but directories, whose names are statically known.

In the EvalTask specifically, due to the nature of patterns, I originally assumed that the specific output file and directory names will not be known until evaluation. Additionally, because often you need to produce a file into a specific directory (for example, evaluate a config.pkl file into a config.yml file which must be present in the same directory), you cannot really declare the parent directory of the output pattern to be an "output directory", because it may interfere with other tasks and in general is not "controlled" by the eval task.

Because of this, I originally made this task untracked, because it ensures that the cache consistency is not violated, and that the task is invalidated at appropriate times.

Given the above, it is only realistically possible to track the outputs of this task in the following cases:

  1. Computing output paths from a pattern does not, in fact, require Pkl evaluation. Looking at how CliEvaluator is used here, it seems like it might be the case. If this is true, then idiomatic description of this behavior would be different - I will describe it below.
  2. We are okay with evaluating Pkl files at configuration time. I personally don't think this is a correct approach, and would vote for the task to remain untracked, because the whole point of splitting task configuration and evaluation is to avoid costly computations at configuration time and to have consistent inputs which do not depend on build script evaluation order. Still, if this is the case, I would still recommend to reimplement the logic as described below.

So, if we assume scenario 1, the core idea would be to build CliEvaluator as Provider<CliEvaluator> derived from other properties in the class, and then derive Provider<File> and FileCollection from that:

private Provider<CliEvaluator> getCliEvaluator() {
    return getOutputFile().flatMap(outputFile ->
      getOutputFormat().flatMap(outputFormat ->
        ...new CliEvaluator(...)));
}

@OutputFile
public Property<File> getOutputFilePath() {
    // This logic should be changed to return only one file properly,
    // because there can be only one file
    return getCliEvaluator().map(evaluator ->
        evaluator.getOutputFiles().iterator().next()
    );
}

@OutputDirectories
public FileCollection getOutputDirectories() {
    return outputDirectoriesCollection;
}

private final ConfigurableFileCollection outputDirectoriesCollection = getProject().files();

{
    outputDirectoriesCollection.from(
        getCliEvaluator().map(evaluator ->
            evaluator.getOutputDirectories()
        )
    );
}

We can also cache the CliEvaluator instance, but this will require some extra code. Following an example in here, we can make a utility function:

public static <T> Provider<T> cache(ObjectFactory objects, Class<T> elementType, Provider<T> provider) {
    var property = objects.property(elementType);
    property.value(provider);
    property.disallowChanges();
    property.finalizeValueOnRead();
    return property;
}

Then, define the CliEvaluator property like this:

private final Property<CliEvaluator> cliEvaluator = cache(
    getObjects(),
    CliEvaluator.class,
    getOutputFile().flatMap(outputFile ->
      getOutputFormat().flatMap(outputFormat ->
        ...new CliEvaluator(...)));
);

and use that as the source of output properties:

@OutputFile
public Property<File> getOutputFilePath() {
    // This logic should be changed to return only one file properly,
    // because there can be only one file
    return cliEvaluator.map(evaluator ->
        evaluator.getOutputFiles().iterator().next()
    );
}

@OutputDirectories
public FileCollection getOutputDirectories() {
    return outputDirectoriesCollection;
}

private final ConfigurableFileCollection outputDirectoriesCollection = getProject().files();

{
    outputDirectoriesCollection.from(
        cliEvaluator.map(evaluator ->
            evaluator.getOutputDirectories()
        )
    );
}

I haven’t tested yet that this works fully correctly (I’m going to do it right now), and I do see that all of this is kind of clunky and more complicated than the original approach, but unfortunately that’s what is required here if we are to follow recommended Gradle practices.


Actually I have just remembered another reason why it makes sense to keep the task untracked — Pkl evaluation can depend on external, non-file inputs, like environment variables. Thus it is possible for an evaluation task to become cached when it should not be, for example: the user changed an env var and re-run the task => Pkl script depends on the env var but there is no way for Gradle to track it => Pkl is not re-evaluated, old values are still used => the user is confused.


As a side note — I would recommend not making every property @Optional by default. For example, properties like outputFormat always contain some default value from the spec, thus they should not really be optional.

Comment on lines +71 to +75
public SetProperty<File> getOutputPaths() {
var ret = getObjectFactory().setProperty(File.class);
ret.value(getCliEvaluator().getOutputFiles()).disallowChanges();
return ret;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this approach has some issues.

Gradle lazy property objects can be accessed at any time, including at the configuration phase. Therefore, their creation must not be predicated on any complex logic, and especially not on values of other lazy properties.

In this case, however, just creating the SetProperty instance requires creating the CliEvaluator object, which in turn evaluates other properties in the task. This is, unfortunately, does not follow Gradle's model of lazy properties properly.

Here is a limited and artificial example which illustrates the point:

// buildSrc/src/main/java/tasks/EvalTask.java

public abstract class EvalTask extends DefaultTask {
  @Internal
  public abstract RegularFileProperty getOutputFile();

  @Inject
  public abstract ObjectFactory getObjectFactory();

  private Set<File> getCliEvaluatorFiles() {
    return Set.of(getOutputFile().get().getAsFile());
  }

  @OutputFiles
  @Optional
  public SetProperty<File> getOutputPaths() {
    var ret = getObjectFactory().setProperty(File.class);
    ret.value(getCliEvaluatorFiles()).disallowChanges();
    return ret;
  }
}
// build.gradle.kts

import tasks.EvalTask

val evalSomething by tasks.creating(EvalTask::class) {
    // outputFile is not set
}

val copySomewhere by tasks.creating(Copy::class) {
    from(evalSomething.outputPaths)
    into("whatever")
}

evalSomething.run {
    outputFile = layout.projectDirectory.file("settings.gradle.kts")
}

This build will fail at configuration time with an exception like this:

* What went wrong:
Cannot query the value of task ':evalSomething' property 'outputFile' because it has no value available.

This is actually the reason why in that link that was submitted in the earlier discussion (on Gradle forums) it was done differently, along these lines:

@OutputFile
abstract RegularFileProperty getOutputFile()

Producer() {
    outputFile.value(getOutputFileName().flatMap(name -> project.layout.buildDirectory.file(name))).disallowChanges()
}

Not only is the actual instance of the property instantiated by Gradle here (via abstract method injection), it is configured with a provider (derived from a property here), not with a directly computed value, like in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, Gradle's model of input and output properties and their caching expects that all properties have statically known values after configuration time

Surely a task can take a FileTree as input whose files don't exist at configuration time?

Pkl evaluation can depend on external, non-file inputs, like environment variables.

I don't think that's an issue here because all such inputs are configured through the task (see superclasses of EvalTask).

Evaluating Pkl files at configuration time is not OK. But looking at CliEvaluator, I think that "computing output paths from a pattern does not, in fact, require Pkl evaluation" is correct.

Gradle lazy property objects can be accessed at any time, including at the configuration phase. Therefore, their creation must not be predicated on any complex logic, and especially not on values of other lazy properties.

Surely it must be possible for a computed input/output property to depend on another property of the same task? I don't see anything expensive here, with the possible exception of CliEvaluator.outputFiles/outputDirectories.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely a task can take a FileTree as input whose files don't exist at configuration time?

Yes, it most definitely can, but it does not contradict what I'm saying. I might not have expressed it fully clearly, but essentially what I wanted to say is that in Gradle model, file collections must be fully configured before the task execution time. It does not mean that their contents can't be dynamic - they most certainly can, like class files produced by compilation tasks, and they can be absent before some task is executed - but that their configuration can't be dynamic, i.e. you cannot add new sources to a file collection or a file tree at task execution time.

When I was implementing the first version of the evaluation task, I assumed that it is not possible to determine exact file names and directory names from patterns without evaluation, i.e. without executing the task logic first. Because patterns are flexible, it would've been impossible to statically configure a FileCollection representing output files/directories, and therefore I had to make the tasks untracked.

Since apparently it is possible to resolve patterns and determine outputs without running evaluation first, therefore, it should be possible to determine the full set of output files/directories at configuration time, and therefore we can make the task tracked.

I don't think that's an issue here because all such inputs are configured through the task (see superclasses of EvalTask).

Fair point, forgot that these items were there. Yes, in this case evaluation should be self-contained and thus cached.

@netvl
Copy link

netvl commented Apr 9, 2024

Btw, let me correct myself - it seems I forgot the code base a bit - of course there can be multiple output files with single output pattern; then the logic would look like this:

  @OutputFiles
  @Optional
  public Provider<Set<File>> getEffectiveOutputFilePaths() {
      return cliEvaluator.map(CliEvaluator::getOutputFiles);
  }
  
  @OutputDirectories
  @Optional
  public Provider<Set<File>> getOutputDirectories() {
      return cliEvaluator.map(CliEvaluator::getOutputDirectories);
  }

I also switched to Provider<Set<File>> instead of FileCollection, because unfortunately file collections are more cumbersome to create than providers, especially if empty providers is a possibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants