create a tool that checks zig's cpu feature detection vs llvm's #19793
andy@bark ~> ls ~/local/llvm18-assert/bin/
amdgpu-arch git-clang-format llvm-cxxmap llvm-mc llvm-size
analyze-build hmaptool llvm-debuginfo-analyzer llvm-mca llvm-split
bugpoint intercept-build llvm-debuginfod llvm-ml llvm-stress
c-index-test ld64.lld llvm-debuginfod-find llvm-modextract llvm-strings
clang ld.lld llvm-diff llvm-mt llvm-strip
clang++ llc llvm-dis llvm-nm llvm-symbolizer
clang-18 lld llvm-dlltool llvm-objcopy llvm-tblgen
clang-check lld-link llvm-dwarfdump llvm-objdump llvm-tli-checker
clang-cl lli llvm-dwarfutil llvm-opt-report llvm-undname
clang-cpp llvm-addr2line llvm-dwp llvm-otool llvm-windres
clang-extdef-mapping llvm-ar llvm-exegesis llvm-pdbutil llvm-xray
clang-format llvm-as llvm-extract llvm-profdata nvptx-arch
clang-linker-wrapper llvm-bcanalyzer llvm-gsymutil llvm-profgen opt
clang-offload-bundler llvm-bitcode-strip llvm-ifs llvm-ranlib sancov
clang-offload-packager llvm-cat llvm-install-name-tool llvm-rc sanstats
clang-refactor llvm-cfi-verify llvm-jitlink llvm-readelf scan-build
clang-rename llvm-config llvm-lib llvm-readobj scan-build-py
clang-repl llvm-cov llvm-libtool-darwin llvm-readtapi scan-view
clang-scan-deps llvm-c-test llvm-link llvm-reduce verify-uselistorder
clang-tblgen llvm-cvtres llvm-lipo llvm-remarkutil wasm-ld
diagtool llvm-cxxdump llvm-lto llvm-rtdyld
dsymutil llvm-cxxfilt llvm-lto2 llvm-sim
You would think that one of these would be able to print the set of detected native CPU features, but I haven't figured out whether such functionality already exists.
OK, I figured out one way to do it:
andy@bark ~/tmp> cat empty.c
int main(int argc, char **argv) {
    return 0;
}
andy@bark ~/tmp> ~/local/llvm18-assert/bin/clang -c -emit-llvm empty.c -march=native -S
andy@bark ~/tmp> grep attributes empty.ll
attributes #0 = { noinline nounwind optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="znver4" "target-features"="+64bit,+adx,+aes,+avx,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+bmi,+bmi2,+clflushopt,+clwb,+clzero,+cmov,+crc32,+cx16,+cx8,+evex512,+f16c,+fma,+fsgsbase,+fxsr,+gfni,+invpcid,+lzcnt,+mmx,+movbe,+mwaitx,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+shstk,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves,-amx-bf16,-amx-complex,-amx-fp16,-amx-int8,-amx-tile,-avx10.1-256,-avx10.1-512,-avx512er,-avx512fp16,-avx512pf,-avx512vp2intersect,-avxifma,-avxneconvert,-avxvnni,-avxvnniint16,-avxvnniint8,-cldemote,-cmpccxadd,-enqcmd,-fma4,-hreset,-kl,-lwp,-movdir64b,-movdiri,-pconfig,-prefetchi,-prefetchwt1,-ptwrite,-raoint,-rtm,-serialize,-sgx,-sha512,-sm3,-sm4,-tbm,-tsxldtrk,-uintr,-usermsr,-waitpkg,-widekl,-xop" }
Then compare with Zig:
andy@bark ~/tmp> ~/src/zig/build-release/stage4/bin/zig cc -c -emit-llvm empty.c -march=native -S
andy@bark ~/tmp> grep attributes empty.ll
attributes #0 = { noinline nounwind optnone sspstrong uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="4" "target-cpu"="znver3" "target-features"="+64bit,+adx,+aes,+allow-light-256-bit,+avx,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+bmi,+bmi2,+branchfusion,+clflushopt,+clwb,+clzero,+cmov,+crc32,+cx16,+cx8,+f16c,+fast-15bytenop,+fast-bextr,+fast-lzcnt,+fast-movbe,+fast-scalar-fsqrt,+fast-scalar-shift-masks,+fast-variable-perlane-shuffle,+fast-vector-fsqrt,+fma,+fsgsbase,+fsrm,+fxsr,+gfni,+invpcid,+lzcnt,+macrofusion,+mmx,+movbe,+mwaitx,+nopl,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sbb-dep-breaking,+sha,+shstk,+slow-shld,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+vzeroupper,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves,-16bit-mode,-32bit-mode,-3dnow,-3dnowa,-amx-bf16,-amx-complex,-amx-fp16,-amx-int8,-amx-tile,-avx512er,-avx512fp16,-avx512pf,-avx512vp2intersect,-avxifma,-avxneconvert,-avxvnni,-avxvnniint16,-avxvnniint8,-cldemote,-cmpccxadd,-enqcmd,-ermsb,-false-deps-getmant,-false-deps-lzcnt-tzcnt,-false-deps-mulc,-false-deps-mullq,-false-deps-perm,-false-deps-popcnt,-false-deps-range,-fast-11bytenop,-fast-7bytenop,-fast-gather,-fast-hops,-fast-shld-rotate,-fast-variable-crosslane-shuffle,-fast-vector-shift-masks,-faster-shift-than-shuffle,-fma4,-harden-sls-ijmp,-harden-sls-ret,-hreset,-idivl-to-divb,-idivq-to-divl,-kl,-lea-sp,-lea-uses-ag,-lvi-cfi,-lvi-load-hardening,-lwp,-movdir64b,-movdiri,-no-bypass-delay,-no-bypass-delay-blend,-no-bypass-delay-mov,-no-bypass-delay-shuffle,-pad-short-functions,-pconfig,-prefer-128-bit,-prefer-256-bit,-prefer-mask-registers,-prefer-movmsk-over-vtest,-prefetchi,-prefetchwt1,-ptwrite,-raoint,-retpoline,-retpoline-external-thunk,-retpoline-indirect-branches,-retpoline-indirect-calls,-rtm,-serialize,-seses,-sgx,-sha512,-slow-3ops-lea,-slow-incdec,-slow-lea,-slow-pmaddwd,-slow-pmulld,-slow-two-mem-ops,-slow-unaligned-mem-16,-slow-unaligned-mem-32,-sm3,-sm4,-soft-float,-sse-unaligned-mem,-tagged-globals,-tbm,-tsxldtrk,-tuning-fast-imm-vector-shift,-uintr,-use-glm-div-sqrt-costs,-use-slm-arith-costs,-waitpkg,-widekl,-xop" "tune-cpu"="generic" }
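The manual step above (compile, then grep for the `attributes` line) can be narrowed down to just the two values being compared. The following is a minimal sketch, assuming a shell with GNU or BSD `grep` (for `grep -o`) plus `cut`, `tr`, and `sort`; `ll_cpu_features` is a hypothetical helper name, not part of Zig or LLVM. It prints the `target-cpu` value followed by the `target-features` list, one feature per line.

```shell
# Hypothetical helper: print the target CPU, then the feature list, recorded in
# an LLVM IR file produced with -S -emit-llvm -march=native.
# Assumes the first "target-cpu" and "target-features" attributes in the file
# are the ones of interest (true for empty.c, which defines a single function).
ll_cpu_features() {
    # First output line: the CPU name, e.g. "znver4".
    grep -o '"target-cpu"="[^"]*"' "$1" | head -n 1 | cut -d '"' -f 4
    # Then one feature per line ("+avx2", "-xop", ...), sorted by name while
    # ignoring the leading sign, so a feature that is "+" in one file and "-"
    # in the other shows up as adjacent lines in `diff -u`.
    grep -o '"target-features"="[^"]*"' "$1" | head -n 1 | cut -d '"' -f 4 \
        | tr ',' '\n' | LC_ALL=C sort -k 1.2
}
```

Because both commands above write `empty.ll`, one way to keep both results is to add `-o clang.ll` and `-o zig.ll` to the two compile commands, then run `ll_cpu_features clang.ll > 1`, `ll_cpu_features zig.ll > 2`, and `diff -u 1 2`. Sorting by name rather than keeping the compiler's order interleaves `+` and `-` entries, but it lines up a feature whose sign flipped between the two files.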
Using this method, we can see that Zig's llvm18 branch gets the CPU wrong (Clang from LLVM 18 reports "target-cpu"="znver4", while Zig reports "target-cpu"="znver3") and also comes up with a different set of features (file 1 is Clang's list, file 2 is Zig's):
--- 1 2024-04-28 13:42:58.412083487 -0700
+++ 2 2024-04-28 13:43:38.671987175 -0700
@@ -1,6 +1,7 @@
+64bit
+adx
+aes
++allow-light-256-bit
+avx
+avx2
+avx512bf16
@@ -17,6 +18,7 @@
+avx512vpopcntdq
+bmi
+bmi2
++branchfusion
+clflushopt
+clwb
+clzero
@@ -24,17 +26,27 @@
+crc32
+cx16
+cx8
-+evex512
+f16c
++fast-15bytenop
++fast-bextr
++fast-lzcnt
++fast-movbe
++fast-scalar-fsqrt
++fast-scalar-shift-masks
++fast-variable-perlane-shuffle
++fast-vector-fsqrt
+fma
+fsgsbase
++fsrm
+fxsr
+gfni
+invpcid
+lzcnt
++macrofusion
+mmx
+movbe
+mwaitx
++nopl
+pclmul
+pku
+popcnt
@@ -44,8 +56,10 @@
+rdrnd
+rdseed
+sahf
++sbb-dep-breaking
+sha
+shstk
++slow-shld
+sse
+sse2
+sse3
@@ -55,19 +69,22 @@
+ssse3
+vaes
+vpclmulqdq
++vzeroupper
+wbnoinvd
+x87
+xsave
+xsavec
+xsaveopt
+xsaves
+-16bit-mode
+-32bit-mode
+-3dnow
+-3dnowa
-amx-bf16
-amx-complex
-amx-fp16
-amx-int8
-amx-tile
--avx10.1-256
--avx10.1-512
-avx512er
-avx512fp16
-avx512pf
@@ -80,27 +97,78 @@
-cldemote
-cmpccxadd
-enqcmd
+-ermsb
+-false-deps-getmant
+-false-deps-lzcnt-tzcnt
+-false-deps-mulc
+-false-deps-mullq
+-false-deps-perm
+-false-deps-popcnt
+-false-deps-range
+-fast-11bytenop
+-fast-7bytenop
+-fast-gather
+-fast-hops
+-fast-shld-rotate
+-fast-variable-crosslane-shuffle
+-fast-vector-shift-masks
+-faster-shift-than-shuffle
-fma4
+-harden-sls-ijmp
+-harden-sls-ret
-hreset
+-idivl-to-divb
+-idivq-to-divl
-kl
+-lea-sp
+-lea-uses-ag
+-lvi-cfi
+-lvi-load-hardening
-lwp
-movdir64b
-movdiri
+-no-bypass-delay
+-no-bypass-delay-blend
+-no-bypass-delay-mov
+-no-bypass-delay-shuffle
+-pad-short-functions
-pconfig
+-prefer-128-bit
+-prefer-256-bit
+-prefer-mask-registers
+-prefer-movmsk-over-vtest
-prefetchi
-prefetchwt1
-ptwrite
-raoint
+-retpoline
+-retpoline-external-thunk
+-retpoline-indirect-branches
+-retpoline-indirect-calls
-rtm
-serialize
+-seses
-sgx
-sha512
+-slow-3ops-lea
+-slow-incdec
+-slow-lea
+-slow-pmaddwd
+-slow-pmulld
+-slow-two-mem-ops
+-slow-unaligned-mem-16
+-slow-unaligned-mem-32
-sm3
-sm4
+-soft-float
+-sse-unaligned-mem
+-tagged-globals
-tbm
-tsxldtrk
+-tuning-fast-imm-vector-shift
-uintr
--usermsr
+-use-glm-div-sqrt-costs
+-use-slm-arith-costs
-waitpkg
-widekl
-xop
In conclusion, we don't need a tool; rather, we need to put this process into the upgrade instructions.
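Much of the diff above is noise from Zig's list spelling out tuning flags (`fast-*`, `slow-*`, `false-deps-*`, and so on) that Clang's list does not mention at all, and the few lines that point at a real disagreement, such as `-+evex512`, are easy to lose among them. As a minimal sketch, assuming two files that each hold one `+name`/`-name` entry per line (for example, the output of `tr ',' '\n'` on each "target-features" string), a hypothetical `compare_feature_lists` helper could sort the differences into three buckets: features both lists mention with opposite signs, features only the first list mentions, and features only the second list mentions.

```shell
# Hypothetical helper: classify the differences between two feature lists.
# Each input file holds one feature per line with a leading "+" or "-", e.g.
#   grep -o '"target-features"="[^"]*"' empty.ll | cut -d '"' -f 4 | tr ',' '\n'
# Output (sorted):
#   flipped <name> <sign in A> <sign in B>   listed by both, with different signs
#   only-a  <entry>                          listed only in file A
#   only-b  <entry>                          listed only in file B
compare_feature_lists() {
    LC_ALL=C awk '
        /^$/ { next }                       # ignore blank lines
        FILENAME == ARGV[1] {               # first file: remember each sign
            a[substr($0, 2)] = substr($0, 1, 1)
            next
        }
        {                                   # second file: compare against A
            name = substr($0, 2); sign = substr($0, 1, 1)
            seen[name] = 1
            if (!(name in a))         print "only-b  " $0
            else if (a[name] != sign) print "flipped " name " " a[name] " " sign
        }
        END {
            for (name in a)
                if (!(name in seen)) print "only-a  " a[name] name
        }
    ' "$1" "$2" | LC_ALL=C sort
}
```

Applied to the two lists diffed above (Clang's as A, Zig's as B), the flipped bucket would be empty, the only-a bucket would hold exactly `+evex512`, `-avx10.1-256`, `-avx10.1-512`, and `-usermsr`, and every other difference would land in only-b.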
Is LLVM misidentifying a znver3 as znver4?
I'm not sure what the correct answer is yet, but this host is an AMD Ryzen 9 7950X, and Zig currently identifies it as znver3 while LLVM 18 identifies it as znver4. LLVM is likely correct here, since I have not yet touched the CPU detection logic in Zig's llvm18 upgrade branch. I'm working on a tool to help identify when Zig and LLVM disagree about the host CPU and its feature set, so that we can be sure the detection logic is working correctly. Edit: it looks like the correct answer is Zen 4, so LLVM is indeed correct here.
This brings back `detectNativeCpuWithLLVM` so that we can troubleshoot during LLVM upgrades. closes #19793
Alright, I've implemented this in the
--- /home/andy/tmp/1 2024-04-28 19:56:07.407829265 -0700
+++ /home/andy/tmp/2 2024-04-28 19:56:25.154802676 -0700
@@ -1,4 +1,4 @@
-znver3
+znver4
-16bit-mode
-32bit-mode
-3dnow
@@ -54,7 +54,7 @@
-egpr
-enqcmd
-ermsb
--evex512
++evex512
+f16c
-false-deps-getmant
-false-deps-lzcnt-tzcnt
This makes it clear what is happening: Zig's CPU feature detection failed to pick up that the host is a znver4, and thereby missed turning on the CPU features enabled for that model, which includes
With each LLVM upgrade, both LLVM's and Zig's target CPU feature sets change. However, Zig does its own host CPU feature detection rather than relying on LLVM's implementation, because Zig also needs it for its own backends.
Even so, we need a way to find out when the CPU feature detection code needs an update. So let's create some tooling to help with the upgrade process.
The tool runs Zig's CPU feature detection on the host, then runs LLVM's CPU feature detection on the host, and then provides a diff. This diff helps the Zig maintainer doing the LLVM upgrade (usually me) know whether the CPU feature detection code needs to be updated.
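The tool described above can be sketched as a shell function under a few assumptions: it stands in for the two detection routines by compiling an empty C file with `-march=native` under each compiler (the same manual process used in this thread), and `compare_native_cpu` is a hypothetical name, not the `detectNativeCpuWithLLVM` code that the pull request brings back. It prints a unified diff of the CPU name and feature list and returns 0 when the two agree, 1 when they differ (a hint that the detection code may need an update), and 2 when a compile fails.

```shell
# Hypothetical driver: compile an empty C file with two compilers using
# -march=native, then diff the "target-cpu" and "target-features" they record.
# $1 and $2 are compiler commands, e.g. clang and "zig cc".
# The body runs in a subshell so its variables do not leak into the caller.
compare_native_cpu() (
    dir=$(mktemp -d) || exit 2
    printf 'int main(void) { return 0; }\n' > "$dir/empty.c"
    status=0 n=0
    for cc in "$1" "$2"; do
        n=$((n + 1))
        # $cc is deliberately unquoted so "zig cc" splits into two words.
        if ! $cc -S -emit-llvm -march=native "$dir/empty.c" -o "$dir/$n.ll"; then
            status=2
            break
        fi
        # CPU name first, then one feature per line in the compiler's order.
        {
            grep -o '"target-cpu"="[^"]*"' "$dir/$n.ll" | head -n 1 | cut -d '"' -f 4
            grep -o '"target-features"="[^"]*"' "$dir/$n.ll" | head -n 1 \
                | cut -d '"' -f 4 | tr ',' '\n'
        } > "$dir/$n.txt"
    done
    # Only diff when both compiles succeeded; diff exits 1 on any difference.
    if [ "$status" -eq 0 ] && ! diff -u "$dir/1.txt" "$dir/2.txt"; then
        status=1
    fi
    rm -rf "$dir"
    exit "$status"
)
```

For example, `compare_native_cpu "zig cc" clang` with both compilers on `PATH`. The non-zero exit status on a mismatch is what would let this step sit in an upgrade checklist.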