Improve configuration of native image compilation #455

translatenix · 2024-04-25T21:25:55Z

Solve msgpack issue with --initialize-at-run-time.
Use quick build mode for 40% faster compilation and 20% smaller executable (tested on Linux amd64). If necessary, this can be disabled for release builds. Note that with the move from GraalVM CE 22 to Oracle GraalVM 23, native image compilation time has doubled and binary size has increased to 120 MiB on Linux amd64.
Use Intel Skylake (2015) CPU features for amd64 executables.
Remove options that are commented out.

bioball · 2024-04-26T22:31:04Z

pkl-cli/pkl-cli.gradle.kts

+ // disable automatic support for JVM CLI options (puts our main class in full control of argument parsing)
+ ,"-H:-ParseRuntimeOptions"
+ // quick build mode: 40% faster compilation, 20% smaller (but presumably also slower) executable
+ ,"-Ob"


Good addition! Let's use buildInfo.isReleaseBuild to toggle this off for our actual releases.

Done. Note that binary size keeps increasing (now at 120 MiB for amd64). I think it's worth trying to switch to --initialize-at-run-time for most things other than the Truffle interpreter. I also tried switching to the G1 garbage collector, but this added another 20 MiB.

In addition to Truffle, we also want to initialize the standard library at build time, to cut down the overhead of Pkl execution.

There's certainly some tuning that we can do here, though.
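For illustration, a minimal sketch of the release-build toggle discussed above. BuildInfo is a hypothetical stand-in for the build's own buildInfo object, and nativeImageArgs is an illustrative helper name; neither is code from this PR. Only the flags themselves are taken from the diff and the PR description.

```kotlin
// Hypothetical stand-in for the build's buildInfo object (assumption, not project code).
data class BuildInfo(val isReleaseBuild: Boolean)

// Assemble native-image arguments; quick build mode (-Ob) is added only for non-release builds.
fun nativeImageArgs(buildInfo: BuildInfo): List<String> = buildList {
    // Disable automatic support for JVM CLI options.
    add("-H:-ParseRuntimeOptions")
    // Initialize the msgpack buffer access class at run time instead of build time.
    add("--initialize-at-run-time=org.msgpack.core.buffer.DirectBufferAccess")
    // Quick build mode trades executable performance for faster compilation.
    if (!buildInfo.isReleaseBuild) add("-Ob")
}

fun main() {
    println(nativeImageArgs(BuildInfo(isReleaseBuild = false)))
    println(nativeImageArgs(BuildInfo(isReleaseBuild = true)))
}
```

Building the list conditionally keeps release binaries on the default optimization level while local and snapshot builds get the faster compile.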

bioball · 2024-04-26T23:22:35Z

pkl-cli/pkl-cli.gradle.kts

+ // Tested on ZEN 2.
+ // Feature list: CX8, CMOV, FXSR, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT,
+ // LZCNT, AVX, AVX2, AES, CLMUL, BMI1, BMI2, FMA, AMD_3DNOW_PREFETCH, ADX, FLUSHOPT
+ , if (buildInfo.arch == "amd64") "-march=skylake" else ""


What does this do? Does this mean native executables are not compatible with Intel CPUs older than 2015, and AMD CPUs older than 2017? And if so, what do we get from this?

Yes, this raises the baseline for supported CPUs from "x86-64-v3" (default) to "skylake". In return, the native image compiler can use many additional CPU features/instructions, resulting in faster and more efficient assembly code.
I arrived at this by checking the output of -march=list. "skylake" seemed to be a good middle ground between supporting a wide range of CPUs and leveraging many CPU features.

https://www.graalvm.org/latest/reference-manual/native-image/overview/BuildOutput/#recommendation-cpu

For a native Java app that runs on known (cloud) hardware, it would make sense to be more aggressive.
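As a side note on the diff above, here is a hedged sketch of one way to add the option only on amd64 without passing an empty-string argument on other architectures. marchOption is a hypothetical helper, and the arch values are assumptions based on the buildInfo.arch check in the diff.

```kotlin
// Returns the -march option for the given architecture, or null when none applies.
// "skylake" is the target proposed in this PR; the parameter makes it easy to swap.
fun marchOption(arch: String, target: String = "skylake"): String? =
    if (arch == "amd64") "-march=$target" else null

fun main() {
    val baseArgs = listOf("-H:-ParseRuntimeOptions")
    for (arch in listOf("amd64", "aarch64")) {
        // listOfNotNull drops the option entirely when it does not apply.
        val args = baseArgs + listOfNotNull(marchOption(arch))
        println("$arch -> $args")
    }
}
```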

How much perf do we gain from targeting skylake? Can you enable the native CLI builds in CI so we can compare it to the default architecture? Or, if you're on an x86 machine, feel free to just run some tests locally and report the results. I'm on an M1 machine, so I can't really test this myself, alas.

To run tests, you can run ./gradlew pkl-core:testLinuxExecutableAmd64, assuming you're driving this through WSL.

These CLIs are libraries, and have a broad set of use-cases. For that reason, I'm thinking that we should probably be conservative here. We probably shouldn't just bump the version, but if there's a significant perf gain, we can consider adding this as an additional architecture.

CC @holzensp @stackoverflow for comments

Agree with Dan that we should be conservative here, even though skylake is almost 9 years old, especially without any performance data to compare. If the performance bump is noticeable, we can create a new architecture release or just drop old CPUs in the new version.

I agree with @bioball and @stackoverflow. We're also considering spending a version-cycle on GraalVM/Truffle/native-image updates and perf (including binary size). For now, let's be quite conservative.

translatenix · 2024-04-29T22:52:58Z

How much perf do we gain from targeting skylake? Can you enable the native CLI builds in CI so we can compare it to the default architecture? Or, if you're on an x86 machine, feel free to just run some tests locally and report the results.

Does Pkl have a comprehensive benchmark suite? If you want proof, that’s probably the only way.

These CLIs are libraries, and have a broad set of use-cases. For that reason, I'm thinking that we should probably be conservative here. We probably shouldn't just bump the version, but if there's a significant perf gain, we can consider adding this as an additional architecture.

Your call. I feel that “skylake” is fairly conservative yet enables most CPU features. “x86-64-v3” is an arbitrary GraalVM default; we should figure out what’s right for Pkl. Also, this is easy to undo depending on feedback.

bioball · 2024-04-29T23:03:02Z

Does Pkl have a comprehensive benchmark suite? If you want proof, that’s probably the only way.

Not yet. We have some very lightweight benchmark tests, but we don't have an environment (yet) that can run workloads with bare metal isolation. Until then, the best we can do is run benchmarks locally then compare.

translatenix · 2024-04-29T23:47:26Z

Until then, the best we can do is run benchmarks locally then compare

But what benchmarks do you run? ./gradlew pkl-core:testLinuxExecutableAmd64 doesn’t strike me as a meaningful benchmark.

bioball · 2024-04-30T00:26:41Z

There is an in-language stdlib module that we use to write one-off benchmarks when we want to test specific things when iterating locally.

There's also the bench:jmh task, although we haven't been using that nearly as much.

The suggestion of running testLinuxExecutableAmd64 is just a coarse-grained way to get some signal, but definitely not a real benchmark.

bioball · 2024-04-30T00:37:51Z

Here's a quick benchmark that might be a good starting point for comparison. Save this to a file somewhere, then run pkl eval <file>.pkl to execute the benchmark.

amends "pkl:Benchmark"

import "package://pkg.pkl-lang.org/pkl-pantry/[email protected]#/internal/ModulesGenerator.pkl"
import "package://pkg.pkl-lang.org/pkl-pantry/[email protected]#/Parser.pkl"
import "package://pkg.pkl-lang.org/pkl-pantry/[email protected]#/URI.pkl"

local githubActionJson = read("https://json.schemastore.org/github-action.json")

microbenchmarks {
  ["json schema generator"] {
    expression = 
      let (parsed = Parser.parse(githubActionJson))
        new ModulesGenerator {
          rootSchema = parsed
          baseUri = URI.parse("file:///foo")!!
        }
          .modules
          .map((it) -> it.moduleNode.output.text)
  }
}

stackoverflow · 2024-04-30T13:59:56Z

.circleci/config.yml

@@ -753,6 +753,9 @@ workflows:
 - gradle-check-jdk21:
 requires:
 - hold
+ - pkl-cli-linux-alpine-amd64-snapshot:
+ requires:
+ - hold


Did you change config.pkl and forget to commit, or did someone else forget to commit their changes to config.pkl?

Good catch. This commit belongs to a different PR, and I don't know how it ended up here. Removed.

stackoverflow · 2024-04-30T14:09:46Z

pkl-cli/pkl-cli.gradle.kts

+ // Tested on ZEN 2.
+ // Feature list: CX8, CMOV, FXSR, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT,
+ // LZCNT, AVX, AVX2, AES, CLMUL, BMI1, BMI2, FMA, AMD_3DNOW_PREFETCH, ADX, FLUSHOPT
+ , if (buildInfo.arch == "amd64") "-march=skylake" else ""


Agree with Dan that we should be conservative here, even though skylake is almost 9 years old, especially without any performance data to compare. If the performance bump is noticeable, we can create a new architecture release or just drop old CPUs in the new version.

translatenix · 2024-05-01T08:23:15Z

Here's a quick benchmark that might be a good starting point for comparison.

On my laptop, the variance of this benchmark is too high to draw any conclusions about -march settings.
However, I can tell that quick build mode significantly degrades performance, which isn't surprising.

-march=x86-64-v3 requires Intel Haswell (2013) or AMD Excavator (2015), which isn't far off from -march=skylake.
To move this PR forward, I'll remove -march=skylake.

bioball

Sounds good, thanks for running that benchmark for us!

The one remaining request is to undo the change to --add-opens here.

translatenix · 2024-05-01T15:37:54Z

The one remaining request is to undo the change to --add-opens here.

I’m still working on that. I’m not convinced that --add-opens is required. After all, it isn’t required when running on the JVM.

translatenix · 2024-05-02T18:32:12Z

Ready to review from my side. See commit messages for details.

The "Improve BinaryEvaluatorSnippetTests(Engine)" refactoring isn't essential because I ultimately decided to add NativeServerTest instead of NativeBinaryEvaluatorSnippetTests. But I think it's a worthwhile improvement regardless.

- Solve msgpack issue with `--initialize-at-run-time`.
- Use quick build mode for non-release builds: 40% faster compilation, 20% smaller executable.
- Remove options that were commented out.

- Extract AbstractSnippetTestsEngine from AbstractLanguageSnippetTestsEngine and reuse it for BinaryEvaluatorSnippetTestEngine.
- Rename BinaryEvaluatorSnippetTestEngine to BinaryEvaluatorSnippetTestsEngine.
- Rename expected output files to end in ".yml.pkl" to satisfy expectation of AbstractSnippetTestsEngine.
- Suppress bogus IntelliJ warnings in expected output files.
- Add trailing newline to expected output files.

Motivation:
- improve test coverage of server mode
- verify that "--initialize-at-run-time=org.msgpack.core.buffer.DirectBufferAccess" works fine and opening JDK internals with "--add-opens" isn't required

Changes:
- split ServerTest into JvmServerTest and NativeServerTest
- extract native executable paths to PklExecutablePaths

bioball · 2024-05-03T02:47:59Z

Looks like you're right; --add-opens is not needed :D

bioball

Great work!

I think let's keep the language snippet stuff the way it is right now (see comment). But the other changes here look good to me.

bioball · 2024-05-03T03:35:18Z

pkl-commons-test/src/main/kotlin/org/pkl/commons/test/AbstractSnippetTestsEngine.kt

+ // replace line number with equivalent number of 'x' characters to keep formatting intact
+ (result.groups[1]!!.value) + "x".repeat(result.groups[3]!!.value.length) + " |"
+ }
+}


I don't know if this change improves our code; this moves some fields that are only relevant to the language snippet test engine (e.g. the package server, and also selection, which is a quality-of-life thing when working on snippet tests for pkl-core).

I think let's keep it the way it was. Also, we might move the binary evaluator into pkl-core, which means the binary evaluator snippet tests will change then, making this abstraction for naught.

It fixes dozens of IntelliJ warnings and eliminates code duplication. PackageServer is defined in commons-test and only started if needed.

I agree that the binary evaluator belongs in pkl-core. Ideally, every interaction with the evaluator, even within the same JVM, would use the binary protocol.

Let's just suppress those warnings for now, then; once we move it into pkl-core, this will all change again, and we'd probably end up removing this abstraction layer if it exists.

The refactoring makes binary evaluator snippet tests more similar to snippet tests in pkl-core and eliminates code duplication between the two. It also generates output files with suppress warnings comments. I don't understand why you want to discard it.

The abstraction might create more work for us if we later eliminate BinaryEvaluatorSnippetTests when moving the binary evaluator into pkl-core.

Also, this would give us: LanguageSnippetTestsEngine -> AbstractLanguageSnippetTestsEngine -> AbstractSnippetTestsEngine -> InputOutputTestEngine -> HierarchicalTestEngine. That's just a lot of complexity that adds cognitive burden when trying to understand our test code.

It also moves properties that either shouldn't be moved, or don't have any meaning for the binary evaluator snippet test (val selection: String = "" is a property that exists for the purpose of working on the snippet tests in pkl-core).

To reduce the friction, I'd be willing to accept this PR if we moved these members into AbstractLanguageSnippetTestsEngine:

- packageServer
- selection
- includedTests
- excludedTests
- Path.getProjectDir()

bioball · 2024-05-03T03:40:44Z

pkl-commons-test/src/main/kotlin/org/pkl/commons/test/PklExecutablePaths.kt

+ val firstExisting: Path
+ get() = existing.first()


Suggested change

- val firstExisting: Path
-   get() = existing.first()
+ val firstExisting: Path
+   get() {
+     require(existing.isNotEmpty()) {
+       "Native executable not found on system. Create one with `./gradlew buildNative`."
+     }
+     return existing.first()
+   }

Added a slightly modified version that's similar to existing code.
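To show how the guarded getter behaves, here is a self-contained sketch. ExecutablePaths and its constructor are hypothetical simplifications, not the actual PklExecutablePaths from this PR; only the require check and its message follow the suggestion above.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical holder for candidate executable locations (assumption, not project code).
class ExecutablePaths(candidates: List<Path>) {
    // Keep only the candidates that are present on disk.
    val existing: List<Path> = candidates.filter { Files.exists(it) }

    // Fail with an actionable message instead of a bare NoSuchElementException.
    val firstExisting: Path
        get() {
            require(existing.isNotEmpty()) {
                "Native executable not found on system. Create one with `./gradlew buildNative`."
            }
            return existing.first()
        }
}

fun main() {
    val present = Files.createTempFile("pkl-native-demo", "")
    val missing = present.resolveSibling("does-not-exist-" + System.nanoTime())

    // The first candidate is missing, so the existing one is returned.
    println(ExecutablePaths(listOf(missing, present)).firstExisting == present)

    // With no existing candidates, require throws IllegalArgumentException.
    val failure = runCatching { ExecutablePaths(listOf(missing)).firstExisting }
    println(failure.exceptionOrNull()?.message)

    Files.delete(present)
}
```

The point of the guard is that a test run without a built executable fails with a hint about the missing build step rather than an opaque collection error.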

bioball · 2024-05-03T03:43:03Z

pkl-server/src/test/kotlin/org/pkl/server/NativeServerTest.kt

+ client.close()
+ server.destroy()
+ }
+}


Very nice! Great addition!

I just hope that nativeTest.dependsOn(":pkl-cli:assembleNative") is OK from a build cache perspective. (assembleNative is a lifecycle task that doesn't declare outputs.)

bioball requested changes Apr 26, 2024

translatenix force-pushed the graalvm-23 branch from f7b28a0 to cb539d6 Compare April 27, 2024 00:39

stackoverflow reviewed Apr 30, 2024

translatenix force-pushed the graalvm-23 branch 2 times, most recently from cb539d6 to 561589c Compare May 1, 2024 07:41

translatenix force-pushed the graalvm-23 branch from 561589c to 2ee13f0 Compare May 1, 2024 08:52

bioball requested changes May 1, 2024

translatenix added 4 commits May 2, 2024 13:17

Improve configuration of native image compilation

980d24e

Eliminate duplicate task configuration code

f65af91

translatenix force-pushed the graalvm-23 branch from 98c136f to 48c4433 Compare May 2, 2024 20:17

bioball requested changes May 3, 2024

Incorporate review feedback

55de23e
