Make EvalTask track resolved output paths #403

bioball · 2024-04-05T17:25:33Z

This makes Gradle track getOutputPaths() and getMultipleFileOutputPaths. These represent the actual outputs of the EvalTask.

This reverts commit 3c6df1f. The reason that this task is untracked is because the output file and multiple file output dir both accept placeholder values, which get swapped out with actual values during eval.

StefMa · 2024-04-05T17:28:16Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

 import org.pkl.cli.CliEvaluator;
 import org.pkl.cli.CliEvaluatorOptions;

+@UntrackedTask(because = "Output file names are known only after execution")


Maybe explain it a bit better here as well (that the output files contain placeholders)? 🤔

StefMa

LGTM 👌

translatenix · 2024-04-05T17:55:57Z

I see no indication that EvalTask supports placeholders. It only accepts RegularFileProperty getOutputFile() and DirectoryProperty getMultipleFileOutputDir(), which clearly shouldn't contain placeholders.

StefMa · 2024-04-05T18:01:22Z

You can read it up here:
https://pkl-lang.org/main/current/pkl-gradle/index.html#configuration-options

So yes it's a bit confusing that you can put a file into it that can contain a placeholder that doesn't work with Gradle because the file location is unknown at that point.

translatenix · 2024-04-05T18:05:08Z

This is a design mistake in EvalTask. Should try to fix that instead of making the task untracked.

StefMa · 2024-04-05T18:13:14Z

This is a design mistake in EvalTask. Should try to fix that instead of making the task untracked.

If we would use a normal String as input it would work right? 🤔

translatenix · 2024-04-05T18:28:34Z

I think EvalTask could have separate Property<String> properties and invalidate outputs if those are set.
But I see that CliEvaluator has outputFiles and outputDirectories properties, which might be all that's needed for accurate tracking.

bioball · 2024-04-05T20:36:42Z

But I see that CliEvaluator has outputFiles and outputDirectories properties, which might be all that's needed for accurate tracking.

Great idea! Updated the PR to add new properties that are tracked.

translatenix

Left two comments. Otherwise LGTM.

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

translatenix · 2024-04-05T21:47:59Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

 public abstract DirectoryProperty getMultipleFileOutputDir();

 @Input
 @Optional
 public abstract Property<String> getExpression();

+ @Internal
+ public Provider<CliEvaluator> getCliEvaluator() {


I don't think Provider helps for this and the following methods, as they are only called if and when the information is needed.

Update: Maybe this is correct and enables task dependency tracking for getOutputPaths() and getMultipleFileOutputPaths(). I can't find any Gradle docs on this.

Update 2: Found this in the docs:

A provider may represent a task output. Such a provider carries information about the task producing its value. When this provider is attached to an input of another task, Gradle will automatically determine the task dependencies based on this connection.

Now the only question is if this works for a provider obtained with getProject().provider(). Another option is to inject a ProviderFactory into the task.

Now the only question is if this works for a provider obtained with getProject().provider(). Another option is to inject a ProviderFactory into the task.

Hm... probably not, in that case. I'm guessing we need to reference the inputs.

IIRC getProject would also fail if using configuration cache.
See https://docs.gradle.org/current/userguide/configuration_cache.html#config_cache:requirements:use_project_during_execution

Good to know. I think that @Injecting a ProviderFactory might be the right solution. Could check task dependency tracking with an automated or manual test.

Related: https://discuss.gradle.org/t/auto-create-task-dependecies-in-case-of-computed-outputfile-provider-regularfile/45381

Thanks for the link. Switching to the property API (e.g. SetProperty) works!

translatenix

LGTM. Two more naming suggestions for your consideration.

translatenix · 2024-04-08T22:47:34Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

+
+ @OutputFiles
+ @Optional
+ public SetProperty<File> getOutputPaths() {


Would getEffectiveOutputFiles() be a better name? Makes clear that it’s related to getOutputFile() and not meant to be configured directly.

translatenix · 2024-04-08T22:48:55Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

+
+ @OutputDirectories
+ @Optional
+ public SetProperty<File> getMultipleFileOutputPaths() {


Is getEffectiveMultipleFileOutputDirs() a better name?

translatenix · 2024-04-08T23:18:21Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

+ public abstract ObjectFactory getObjectFactory();
+
+ private CliEvaluator getCliEvaluator() {
+ return new CliEvaluator(


Note that this results in multiple CliEvaluator instances, each of which recomputes CliEvaluator.outputFiles and CliEvaluator.outputDirectories.

netvl

There was a reason why we originally made EvalTask untracked.

Unfortunately, Gradle's model of input and output properties and their caching expects that all properties have statically known values after configuration time. In other words, it is expected that Gradle knows exact values of all input and output properties of a task, in order to do its caching and invalidation.

This, in turn, implies that it is not possible (under the model constraints) to change or otherwise dynamically compute values of properties at the task execution time. As an example, this is not feasible:

class ExampleTask : DefaultTask() {
    @get:Output
    val someProperty: Property<String>

    @TaskAction
    fun doWork() {
        someProperty.set("xyz")
    }
}

To be absolutely precise, this is possible to write, and it will kind of work as expected, however, it will not play nicely with caching and invalidation, because Gradle will read input and output property values before it executes the task and build its task state and cache based on those values, and it will not track any changes happening to properties during the task execution.

There are tasks in Gradle which may produce files whose names are unknown before execution, like compilation or code generation. However, in these cases these tasks' output properties are not individual files but directories, whose names are statically known.

In the EvalTask specifically, due to the nature of patterns, I originally assumed that the specific output file and directory names will not be known until evaluation. Additionally, because often you need to produce a file into a specific directory (for example, evaluate a config.pkl file into a config.yml file which must be present in the same directory), you cannot really declare the parent directory of the output pattern to be an "output directory", because it may interfere with other tasks and in general is not "controlled" by the eval task.

Because of this, I originally made this task untracked, because it ensures that the cache consistency is not violated, and that the task is invalidated at appropriate times.

Given the above, it is only realistically possible to track the outputs of this task in the following cases:

Computing output paths from a pattern does not, in fact, require Pkl evaluation. Looking at how CliEvaluator is used here, it seems like it might be the case. If this is true, then idiomatic description of this behavior would be different - I will describe it below.
We are okay with evaluating Pkl files at configuration time. I personally don't think this is a correct approach, and would vote for the task to remain untracked, because the whole point of splitting task configuration and evaluation is to avoid costly computations at configuration time and to have consistent inputs which do not depend on build script evaluation order. Still, if this is the case, I would still recommend to reimplement the logic as described below.

So, if we assume scenario 1, the core idea would be to build CliEvaluator as Provider<CliEvaluator> derived from other properties in the class, and then derive Provider<File> and FileCollection from that:

private Provider<CliEvaluator> getCliEvaluator() {
    return getOutputFile().flatMap(outputFile ->
      getOutputFormat().flatMap(outputFormat ->
        ...new CliEvaluator(...)));
}

@OutputFile
public Property<File> getOutputFilePath() {
    // This logic should be changed to return only one file properly,
    // because there can be only one file
    return getCliEvaluator().map(evaluator ->
        evaluator.getOutputFiles().iterator().next()
    );
}

@OutputDirectories
public FileCollection getOutputDirectories() {
    return outputDirectoriesCollection;
}

private final ConfigurableFileCollection outputDirectoriesCollection = getProject().files();

{
    outputDirectoriesCollection.from(
        getCliEvaluator().map(evaluator ->
            evaluator.getOutputDirectories()
        )
    );
}

We can also cache the CliEvaluator instance, but this will require some extra code. Following an example in here, we can make a utility function:

public static <T> Provider<T> cache(ObjectFactory objects, Class<T> elementType, Provider<T> provider) {
    var property = objects.property(elementType);
    property.value(provider);
    property.disallowChanges();
    property.finalizeValueOnRead();
    return property;
}

Then, define the CliEvaluator property like this:

private final Property<CliEvaluator> cliEvaluator = cache(
    getObjects(),
    CliEvaluator.class,
    getOutputFile().flatMap(outputFile ->
      getOutputFormat().flatMap(outputFormat ->
        ...new CliEvaluator(...)));
);

and use that as the source of output properties:

@OutputFile
public Property<File> getOutputFilePath() {
    // This logic should be changed to return only one file properly,
    // because there can be only one file
    return cliEvaluator.map(evaluator ->
        evaluator.getOutputFiles().iterator().next()
    );
}

@OutputDirectories
public FileCollection getOutputDirectories() {
    return outputDirectoriesCollection;
}

private final ConfigurableFileCollection outputDirectoriesCollection = getProject().files();

{
    outputDirectoriesCollection.from(
        cliEvaluator.map(evaluator ->
            evaluator.getOutputDirectories()
        )
    );
}

I haven’t tested yet that this works fully correctly (I’m going to do it right now), and I do see that all of this is kind of clunky and more complicated than the original approach, but unfortunately that’s what is required here if we are to follow recommended Gradle practices.

Actually I have just remembered another reason why it makes sense to keep the task untracked — Pkl evaluation can depend on external, non-file inputs, like environment variables. Thus it is possible for an evaluation task to become cached when it should not be, for example: the user changed an env var and re-run the task => Pkl script depends on the env var but there is no way for Gradle to track it => Pkl is not re-evaluated, old values are still used => the user is confused.

As a side note — I would recommend not making every property @Optional by default. For example, properties like outputFormat always contain some default value from the spec, thus they should not really be optional.

netvl · 2024-04-09T00:19:19Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

+ public SetProperty<File> getOutputPaths() {
+ var ret = getObjectFactory().setProperty(File.class);
+ ret.value(getCliEvaluator().getOutputFiles()).disallowChanges();
+ return ret;
+ }


Unfortunately this approach has some issues.

Gradle lazy property objects can be accessed at any time, including at the configuration phase. Therefore, their creation must not be predicated on any complex logic, and especially not on values of other lazy properties.

In this case, however, just creating the SetProperty instance requires creating the CliEvaluator object, which in turn evaluates other properties in the task. This is, unfortunately, does not follow Gradle's model of lazy properties properly.

Here is a limited and artificial example which illustrates the point:

// buildSrc/src/main/java/tasks/EvalTask.java public abstract class EvalTask extends DefaultTask { @Internal public abstract RegularFileProperty getOutputFile(); @Inject public abstract ObjectFactory getObjectFactory(); private Set<File> getCliEvaluatorFiles() { return Set.of(getOutputFile().get().getAsFile()); } @OutputFiles @Optional public SetProperty<File> getOutputPaths() { var ret = getObjectFactory().setProperty(File.class); ret.value(getCliEvaluatorFiles()).disallowChanges(); return ret; } }

// build.gradle.kts import tasks.EvalTask val evalSomething by tasks.creating(EvalTask::class) { // outputFile is not set } val copySomewhere by tasks.creating(Copy::class) { from(evalSomething.outputPaths) into("whatever") } evalSomething.run { outputFile = layout.projectDirectory.file("settings.gradle.kts") }

This build will fail at configuration time with an exception like this:

* What went wrong: Cannot query the value of task ':evalSomething' property 'outputFile' because it has no value available.

This is actually the reason why in that link that was submitted in the earlier discussion (on Gradle forums) it was done differently, along these lines:

@OutputFile abstract RegularFileProperty getOutputFile() Producer() { outputFile.value(getOutputFileName().flatMap(name -> project.layout.buildDirectory.file(name))).disallowChanges() }

Not only is the actual instance of the property instantiated by Gradle here (via abstract method injection), it is configured with a provider (derived from a property here), not with a directly computed value, like in this PR.

Unfortunately, Gradle's model of input and output properties and their caching expects that all properties have statically known values after configuration time

Surely a task can take a FileTree as input whose files don't exist at configuration time?

Pkl evaluation can depend on external, non-file inputs, like environment variables.

I don't think that's an issue here because all such inputs are configured through the task (see superclasses of EvalTask).

Evaluating Pkl files at configuration time is not OK. But looking at CliEvaluator, I think that "computing output paths from a pattern does not, in fact, require Pkl evaluation" is correct.

Gradle lazy property objects can be accessed at any time, including at the configuration phase. Therefore, their creation must not be predicated on any complex logic, and especially not on values of other lazy properties.

Surely it must be possible for a computed input/output property to depend on another property of the same task? I don't see anything expensive here, with the possible exception of CliEvaluator.outputFiles/outputDirectories.

Surely a task can take a FileTree as input whose files don't exist at configuration time?

Yes, it most definitely can, but it does not contradict what I'm saying. I might not have expressed it fully clearly, but essentially what I wanted to say is that in Gradle model, file collections must be fully configured before the task execution time. It does not mean that their contents can't be dynamic - they most certainly can, like class files produced by compilation tasks, and they can be absent before some task is executed - but that their configuration can't be dynamic, i.e. you cannot add new sources to a file collection or a file tree at task execution time.

When I was implementing the first version of the evaluation task, I assumed that it is not possible to determine exact file names and directory names from patterns without evaluation, i.e. without executing the task logic first. Because patterns are flexible, it would've been impossible to statically configure a FileCollection representing output files/directories, and therefore I had to make the tasks untracked.

Since apparently it is possible to resolve patterns and determine outputs without running evaluation first, therefore, it should be possible to determine the full set of output files/directories at configuration time, and therefore we can make the task tracked.

I don't think that's an issue here because all such inputs are configured through the task (see superclasses of EvalTask).

Fair point, forgot that these items were there. Yes, in this case evaluation should be self-contained and thus cached.

netvl · 2024-04-09T03:50:52Z

Btw, let me correct myself - it seems I forgot the code base a bit - of course there can be multiple output files with single output pattern; then the logic would look like this:

  @OutputFiles
  @Optional
  public Provider<Set<File>> getEffectiveOutputFilePaths() {
      return cliEvaluator.map(CliEvaluator::getOutputFiles);
  }
  
  @OutputDirectories
  @Optional
  public Provider<Set<File>> getOutputDirectories() {
      return cliEvaluator.map(CliEvaluator::getOutputDirectories);
  }

I also switched to Provider<Set<File>> instead of FileCollection, because unfortunately file collections are more cumbersome to create than providers, especially if empty providers is a possibility.

Revert "Turn EvalTask into a tracked Gradle task"

2c069b3

This reverts commit 3c6df1f. The reason that this task is untracked is because the output file and multiple file output dir both accept placeholder values, which get swapped out with actual values during eval.

StefMa reviewed Apr 5, 2024

View reviewed changes

Update comment with detailed explanation

d5ba796

StefMa approved these changes Apr 5, 2024

View reviewed changes

holzensp approved these changes Apr 5, 2024

View reviewed changes

Make EvalTask tracked again, and add derived output paths

2faa6aa

bioball changed the title ~~Revert "Turn EvalTask into a tracked Gradle task"~~ Make EvalTask track resolved output paths Apr 5, 2024

translatenix reviewed Apr 5, 2024

View reviewed changes

bioball added 4 commits April 5, 2024 15:05

Make getOutputFormat() a tracked input

57096dc

Create a provider via provider factory

632b443

Use SetProperty to enable proper dependency tracking

983d4b5

Style fix

f68dbd4

translatenix reviewed Apr 8, 2024

View reviewed changes

netvl reviewed Apr 9, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make EvalTask track resolved output paths #403

Make EvalTask track resolved output paths #403

bioball commented Apr 5, 2024 •

edited

StefMa Apr 5, 2024

bioball Apr 5, 2024

StefMa left a comment

translatenix commented Apr 5, 2024

StefMa commented Apr 5, 2024

translatenix commented Apr 5, 2024 •

edited

StefMa commented Apr 5, 2024

translatenix commented Apr 5, 2024

bioball commented Apr 5, 2024

translatenix left a comment

translatenix Apr 5, 2024 •

edited

bioball Apr 6, 2024

StefMa Apr 6, 2024

translatenix Apr 6, 2024

translatenix Apr 6, 2024

bioball Apr 8, 2024

translatenix left a comment

translatenix Apr 8, 2024 •

edited

translatenix Apr 8, 2024

translatenix Apr 8, 2024

netvl left a comment

netvl Apr 9, 2024

translatenix Apr 9, 2024 •

edited

netvl Apr 9, 2024

netvl commented Apr 9, 2024

Make EvalTask track resolved output paths #403

Are you sure you want to change the base?

Make EvalTask track resolved output paths #403

Conversation

bioball commented Apr 5, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

StefMa left a comment

Choose a reason for hiding this comment

translatenix commented Apr 5, 2024

StefMa commented Apr 5, 2024

translatenix commented Apr 5, 2024 • edited

StefMa commented Apr 5, 2024

translatenix commented Apr 5, 2024

bioball commented Apr 5, 2024

translatenix left a comment

Choose a reason for hiding this comment

translatenix Apr 5, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

translatenix left a comment

Choose a reason for hiding this comment

translatenix Apr 8, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

netvl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

translatenix Apr 9, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

netvl commented Apr 9, 2024

bioball commented Apr 5, 2024 •

edited

translatenix commented Apr 5, 2024 •

edited

translatenix Apr 5, 2024 •

edited

translatenix Apr 8, 2024 •

edited

translatenix Apr 9, 2024 •

edited