Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve configuration of native image compilation #455

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

translatenix
Copy link
Contributor

  • Solve msgpack issue with --initialize-at-run-time.
  • Use quick build mode for 40% faster compilation and 20% smaller executable (tested on Linux amd64). If necessary, this can be disabled for release builds. Note that with the move from GraalVM CE 22 to Oracle GraalVM 23, native image compilation time has doubled and binary size has increased to 120 MiB on Linux amd64.
  • Use Intel Skylake (2015) CPU features for amd64 executables.
  • Remove options that are commented out.

// disable automatic support for JVM CLI options (puts our main class in full control of argument parsing)
,"-H:-ParseRuntimeOptions"
// quick build mode: 40% faster compilation, 20% smaller (but presumably also slower) executable
,"-Ob"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good addition! Let's use buildInfo.isReleaseBuild to toggle this off for our actual releases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Note that binary size keeps increasing (now at 120 MiB for amd64). I think it's worth trying to switch to --initialize-at-runtime for most things other than the Truffle interpreter. I also tried switching to the G1 garbage collector, but this added another 20 MiB.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to Truffle, we also want to initialize the standard library at build time, to cut down the overhead of Pkl execution.

There's certainly some tuning that we can do here, though.

pkl-cli/pkl-cli.gradle.kts Show resolved Hide resolved
// Tested on ZEN 2.
// Feature list: CX8, CMOV, FXSR, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT,
// LZCNT, AVX, AVX2, AES, CLMUL, BMI1, BMI2, FMA, AMD_3DNOW_PREFETCH, ADX, FLUSHOPT
, if (buildInfo.arch == "amd64") "-march=skylake" else ""
Copy link
Contributor

@bioball bioball Apr 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? Does this mean native executables are not compatible with Intel CPUs older than 2015, and AMD CPUs older than 2017? And if so, what do we get from this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this raises the baseline for supported CPUs from "x64-v3" (default) to "skylake". In return, the native image compiler can use many additional CPU features/instructions, resulting in faster and more efficient assembly code.
I arrived at this by checking the output of -march=list. "skylake" seemed to be a good middle ground between supporting a wide range of CPUs and leveraging many CPU features.

https://www.graalvm.org/latest/reference-manual/native-image/overview/BuildOutput/#recommendation-cpu

For a native Java app that runs on known (cloud) hardware, it would make sense to be more aggressive.

Copy link
Contributor

@bioball bioball Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much perf do we gain from targeting skylake? Can you enable the native CLI builds to CI so we can compare it to the default architecture? Or, if you're on an x86 machine, feel free to just run some tests locally and report the results. I'm on an M1 machine, so I can't really test this myself alas.

To run tests, you can ./gradlew pkl-core:testLinuxExecutableAmd64 assuming you're driving this through WSL.

These CLIs are libraries, and have a broad set of use-cases. For that reason, I'm thinking that we should probably be conservative here. We probably shouldn't just bump the version, but if there's a significant perf gain, we can consider adding this as an additional architecture.

CC @holzensp @stackoverflow for comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Dan that we should be conservative here, even though skylake is almost 9 years old. Specially without any performance data to compare. If the performance bump is noticeable we can create a new architecture release or just drop old CPUs in the new version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @bioball and @stackoverflow. We're also considering spending a version-cycle on GraalVM/Truffle/native-image updates and perf (including binary size). For now, let's be quite conservative.

@translatenix
Copy link
Contributor Author

How much perf do we gain from targeting skylake? Can you enable the native CLI builds to CI so we can compare it to the default architecture? Or, if you're on an x86 machine, feel free to just run some tests locally and report the results.

Does Pkl have a comprehensive benchmark suite? If you want proof, that’s probably the only way.

These CLIs are libraries, and have a broad set of use-cases. For that reason, I'm thinking that we should probably be conservative here. We probably shouldn't just bump the version, but if there's a significant perf gain, we can consider adding this as an additional architecture.

Your call. I feel that “skylake” is fairly conservative yet enables most CPU features. “x64-v3” is an arbitrary GraalVM default—we should figure out what’s right for Pkl. Also, this is easy to undo depending on feedback.

@bioball
Copy link
Contributor

bioball commented Apr 29, 2024

Does Pkl have a comprehensive benchmark suite? If you want proof, that’s probably the only way.

Not yet. We have some very lightweight benchmark tests, but we don't have an environment (yet) that can run workloads with bare metal isolation. Until then, the best we can do is run benchmarks locally then compare.

@translatenix
Copy link
Contributor Author

Until then, the best we can do is run benchmarks locally then compare

But what benchmarks do you run? ./gradlew pkl-core:testLinuxExecutableAmd64 doesn’t strike me as a meaningful benchmark.

@bioball
Copy link
Contributor

bioball commented Apr 30, 2024

There is an in-language stdlib module that we use to write one-off benchmarks when we want to test specific things when iterating locally.

There's also the bench:jmh task, although we haven't been using that nearly as much.

The suggestion of running testLinuxExecutableAmd64 is just a coarse-grained way to get some signal, but definitely not a real benchmark.

@bioball
Copy link
Contributor

bioball commented Apr 30, 2024

Here's a quick benchmark that might be a good starting point for comparison. Save this to a file somewhere, then pkl eval <file>.pkl to execute the benchmark.

amends "pkl:Benchmark"

import "package://pkg.pkl-lang.org/pkl-pantry/[email protected]#/internal/ModulesGenerator.pkl"
import "package://pkg.pkl-lang.org/pkl-pantry/[email protected]#/Parser.pkl"
import "package://pkg.pkl-lang.org/pkl-pantry/[email protected]#/URI.pkl"

local githubActionJson = read("https://json.schemastore.org/github-action.json")

microbenchmarks {
  ["json schema generator"] {
    expression = 
      let (parsed = Parser.parse(githubActionJson))
        new ModulesGenerator {
          rootSchema = parsed
          baseUri = URI.parse("file:///foo")!!
        }
          .modules
          .map((it) -> it.moduleNode.output.text)
  }
}

@@ -753,6 +753,9 @@ workflows:
- gradle-check-jdk21:
requires:
- hold
- pkl-cli-linux-alpine-amd64-snapshot:
requires:
- hold
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you change config.pkl and forgot to commit, or someone else forgot to commit their changes to config.pkl?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This commit belongs to a different PR, and I don't know how it ended up here. Removed.

// Tested on ZEN 2.
// Feature list: CX8, CMOV, FXSR, MMX, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT,
// LZCNT, AVX, AVX2, AES, CLMUL, BMI1, BMI2, FMA, AMD_3DNOW_PREFETCH, ADX, FLUSHOPT
, if (buildInfo.arch == "amd64") "-march=skylake" else ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Dan that we should be conservative here, even though skylake is almost 9 years old. Specially without any performance data to compare. If the performance bump is noticeable we can create a new architecture release or just drop old CPUs in the new version.

@translatenix translatenix force-pushed the graalvm-23 branch 2 times, most recently from cb539d6 to 561589c Compare May 1, 2024 07:41
@translatenix
Copy link
Contributor Author

translatenix commented May 1, 2024

Here's a quick benchmark that might be a good starting point for comparison.

On my laptop, the variance of this benchmark is too high to draw any conclusions about-march settings.
However, I can tell that quick build mode significantly degrades performance, which isn't surprising.

-march=x86-64-v3 requires Intel Haswell (2013) or AMD Excavator (2015), which isn't far off from -march=skylake.
To move this PR forward, I'll remove -march=skylake.

Copy link
Contributor

@bioball bioball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thanks for running that benchmark for us!

The one remaining request is to undo the change to --add-opens here.

@translatenix
Copy link
Contributor Author

translatenix commented May 1, 2024

The one remaining request is to undo the change to --add-opens here.

I’m still working on that. I’m not convinced that --add-opens is required. After all, it isn’t required when running on the JVM.

@translatenix
Copy link
Contributor Author

translatenix commented May 2, 2024

Ready to review from my side. See commit messages for details.

The "Improve BinaryEvaluatorSnippetTests(Engine)" refactoring isn't essential because I ultimately decided to add NativeServerTest instead of NativeBinaryEvaluatorSnippetTests. But I think it's a worthwhile improvement regardless.

- Solve msgpack issue with `--initialize-at-run-time`.
- Use quick build mode for non-release builds:
  40% faster compilation, 20% smaller executable.
- Remove options that were commented out.
- Extract AbstractSnippetTestsEngine from AbstractLanguageSnippetTestsEngine
  and reuse it for BinaryEvaluatorSnippetTestEngine.
- Rename BinaryEvaluatorSnippetTestEngine to BinaryEvaluatorSnippetTestsEngine.
- Rename expected output files to end in ".yml.pkl"
  to satisfy expectation of AbstractSnippetTestsEngine.
- Suppress bogus IntelliJ warnings in expected output files.
- Add trailing newline to expected output files.
Motivation:
- improve test coverage of server mode
- verify that "--initialize-at-run-time=org.msgpack.core.buffer.DirectBufferAccess"
  works fine and opening JDK internals with "--add-opens" isn't required

Changes:
- split ServerTest into JvmServerTest and NativeServerTest
- extract native executable paths to PklExecutablePaths
@bioball
Copy link
Contributor

bioball commented May 3, 2024

Look like you're right; --add-opens is not needed :D

Copy link
Contributor

@bioball bioball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

I think let's keep the language snippet stuff the way it is right now (see comment). But the other changes here look good to me.

// replace line number with equivalent number of 'x' characters to keep formatting intact
(result.groups[1]!!.value) + "x".repeat(result.groups[3]!!.value.length) + " |"
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this change improves our code; this moves some fields that are only relevant to the language snippet test engine (e.g. the package server, and also selection, which is a quality of life thing when working on snippet tests for pkl-core.

I think let's keep it the way it was. Also, we might move the binary evaluator into pkl-core, which means the binary evaluator snippet tests will change then, making this abstraction for naught.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fixes dozens of IntelliJ warnings and eliminates code duplication. PackageServer is defined in commons-test and only started if needed.

I agree that binary evaluator belongs into pkl-core. Ideally, every interaction with the evaluator, even within the same JVM, would use the binary protocol.

Copy link
Contributor

@bioball bioball May 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just suppress those warnings for now, then; once we move it into pkl-core, this will all change again, and we'd probably end up removing this abstraction layer if it exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactoring makes binary evaluator snippet tests more similar to snippet tests in pkl-core and eliminates code duplication between the two. It also generates output files with suppress warnings comments. I don't understand why you want to discard it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abstraction might create more work for us later on if we eliminate BinaryEvaluatorSnippetTests later on if we move the binary evaluator into pkl-core.

Also, this would give us: LanguageSnippetTestsEngine -> AbstractLanguageSnippetTestsEngine -> AbstractSnippetTestsEngine -> InputOutputTestEngine -> HierarchicalTestEngine. That's just a lot of complexity that adds cognitive burden when trying to understand our test code.

It also moves properties that either shouldn't be moved, or don't have any meaning for the binary evaluator snippet test (val selection: String = "" is a property that exists for the purpose of working on the snippet tests in pkl-core).

To reduce the friction, I'd be willing to accept this PR if we moved these members into AbstractLanguageSnippetTestsEngine:

  • packageServer
  • selection
  • includedTests
  • excludedTests
  • Path.getProjectDir()

Comment on lines 35 to 36
val firstExisting: Path
get() = existing.first()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
val firstExisting: Path
get() = existing.first()
val firstExisting: Path
get() {
require(existing.isNotEmpty()) {
"Native executable not found on system. Create one with `./gradlew buildNative`."
}
return existing.first()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a slightly modified version that's similar to existing code

client.close()
server.destroy()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Great addition!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just hope that nativeTest.dependsOn(":pkl-cli:assembleNative") is OK from a build cache perspective. (assembleNative is a lifecycle task that doesn't declare outputs.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants