create a tool that checks zig's cpu feature detection vs llvm's #19793

andrewrk · 2024-04-28T20:15:09Z

With each LLVM upgrade, the target CPU feature sets of both LLVM and Zig change. However, Zig does its own host CPU feature detection rather than relying on LLVM's implementation, because Zig also needs it for its own backends.

Even so, we need a way to find out when the CPU feature detection code needs an update. So let's create some tooling to help with the upgrade process.

The tool runs Zig's CPU feature detection on the host, then runs LLVM's, and prints a diff of the two results. The diff tells the Zig maintainer doing the LLVM upgrade (usually me) whether the CPU feature detection code needs to be updated.

andrewrk · 2024-04-28T20:35:38Z

andy@bark ~> ls ~/local/llvm18-assert/bin/
amdgpu-arch             git-clang-format    llvm-cxxmap              llvm-mc          llvm-size
analyze-build           hmaptool            llvm-debuginfo-analyzer  llvm-mca         llvm-split
bugpoint                intercept-build     llvm-debuginfod          llvm-ml          llvm-stress
c-index-test            ld64.lld            llvm-debuginfod-find     llvm-modextract  llvm-strings
clang                   ld.lld              llvm-diff                llvm-mt          llvm-strip
clang++                 llc                 llvm-dis                 llvm-nm          llvm-symbolizer
clang-18                lld                 llvm-dlltool             llvm-objcopy     llvm-tblgen
clang-check             lld-link            llvm-dwarfdump           llvm-objdump     llvm-tli-checker
clang-cl                lli                 llvm-dwarfutil           llvm-opt-report  llvm-undname
clang-cpp               llvm-addr2line      llvm-dwp                 llvm-otool       llvm-windres
clang-extdef-mapping    llvm-ar             llvm-exegesis            llvm-pdbutil     llvm-xray
clang-format            llvm-as             llvm-extract             llvm-profdata    nvptx-arch
clang-linker-wrapper    llvm-bcanalyzer     llvm-gsymutil            llvm-profgen     opt
clang-offload-bundler   llvm-bitcode-strip  llvm-ifs                 llvm-ranlib      sancov
clang-offload-packager  llvm-cat            llvm-install-name-tool   llvm-rc          sanstats
clang-refactor          llvm-cfi-verify     llvm-jitlink             llvm-readelf     scan-build
clang-rename            llvm-config         llvm-lib                 llvm-readobj     scan-build-py
clang-repl              llvm-cov            llvm-libtool-darwin      llvm-readtapi    scan-view
clang-scan-deps         llvm-c-test         llvm-link                llvm-reduce      verify-uselistorder
clang-tblgen            llvm-cvtres         llvm-lipo                llvm-remarkutil  wasm-ld
diagtool                llvm-cxxdump        llvm-lto                 llvm-rtdyld
dsymutil                llvm-cxxfilt        llvm-lto2                llvm-sim

You would think that one of these would have the ability to print the set of detected native CPU features. I haven't figured out if such functionality already exists or not.

andrewrk · 2024-04-28T20:38:26Z

ok, I figured out one way to do it:

andy@bark ~/tmp> cat empty.c 
int main(int argc, char **argv) {
    return 0;
}
andy@bark ~/tmp> ~/local/llvm18-assert/bin/clang -c -emit-llvm empty.c -march=native -S
andy@bark ~/tmp> grep attributes empty.ll
attributes #0 = { noinline nounwind optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="znver4" "target-features"="+64bit,+adx,+aes,+avx,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+bmi,+bmi2,+clflushopt,+clwb,+clzero,+cmov,+crc32,+cx16,+cx8,+evex512,+f16c,+fma,+fsgsbase,+fxsr,+gfni,+invpcid,+lzcnt,+mmx,+movbe,+mwaitx,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+shstk,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves,-amx-bf16,-amx-complex,-amx-fp16,-amx-int8,-amx-tile,-avx10.1-256,-avx10.1-512,-avx512er,-avx512fp16,-avx512pf,-avx512vp2intersect,-avxifma,-avxneconvert,-avxvnni,-avxvnniint16,-avxvnniint8,-cldemote,-cmpccxadd,-enqcmd,-fma4,-hreset,-kl,-lwp,-movdir64b,-movdiri,-pconfig,-prefetchi,-prefetchwt1,-ptwrite,-raoint,-rtm,-serialize,-sgx,-sha512,-sm3,-sm4,-tbm,-tsxldtrk,-uintr,-usermsr,-waitpkg,-widekl,-xop" }

Then compare with zig:

andy@bark ~/tmp> ~/src/zig/build-release/stage4/bin/zig cc -c -emit-llvm empty.c -march=native -S
andy@bark ~/tmp> grep attributes empty.ll
attributes #0 = { noinline nounwind optnone sspstrong uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="4" "target-cpu"="znver3" "target-features"="+64bit,+adx,+aes,+allow-light-256-bit,+avx,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+bmi,+bmi2,+branchfusion,+clflushopt,+clwb,+clzero,+cmov,+crc32,+cx16,+cx8,+f16c,+fast-15bytenop,+fast-bextr,+fast-lzcnt,+fast-movbe,+fast-scalar-fsqrt,+fast-scalar-shift-masks,+fast-variable-perlane-shuffle,+fast-vector-fsqrt,+fma,+fsgsbase,+fsrm,+fxsr,+gfni,+invpcid,+lzcnt,+macrofusion,+mmx,+movbe,+mwaitx,+nopl,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sbb-dep-breaking,+sha,+shstk,+slow-shld,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+vzeroupper,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves,-16bit-mode,-32bit-mode,-3dnow,-3dnowa,-amx-bf16,-amx-complex,-amx-fp16,-amx-int8,-amx-tile,-avx512er,-avx512fp16,-avx512pf,-avx512vp2intersect,-avxifma,-avxneconvert,-avxvnni,-avxvnniint16,-avxvnniint8,-cldemote,-cmpccxadd,-enqcmd,-ermsb,-false-deps-getmant,-false-deps-lzcnt-tzcnt,-false-deps-mulc,-false-deps-mullq,-false-deps-perm,-false-deps-popcnt,-false-deps-range,-fast-11bytenop,-fast-7bytenop,-fast-gather,-fast-hops,-fast-shld-rotate,-fast-variable-crosslane-shuffle,-fast-vector-shift-masks,-faster-shift-than-shuffle,-fma4,-harden-sls-ijmp,-harden-sls-ret,-hreset,-idivl-to-divb,-idivq-to-divl,-kl,-lea-sp,-lea-uses-ag,-lvi-cfi,-lvi-load-hardening,-lwp,-movdir64b,-movdiri,-no-bypass-delay,-no-bypass-delay-blend,-no-bypass-delay-mov,-no-bypass-delay-shuffle,-pad-short-functions,-pconfig,-prefer-128-bit,-prefer-256-bit,-prefer-mask-registers,-prefer-movmsk-over-vtest,-prefetchi,-prefetchwt1,-ptwrite,-raoint,-retpoline,-retpoline-external-thunk,-retpoline-indirect-branches,-retpoline-indirect-calls,-rtm,-serialize,-seses,-sgx,-sha512,-slow-3ops-lea,-slow-incdec,-slow-lea,-slow-pmaddwd,-slow-pmulld,-slow-two-mem-ops,-slow-unaligned-mem-16,-slow-unaligned-mem-32,-sm3,-sm4,-soft-float,-sse-unaligned-mem,-tagged-globals,-tbm,-tsxldtrk,-tuning-fast-imm-vector-shift,-uintr,-use-glm-div-sqrt-costs,-use-slm-arith-costs,-waitpkg,-widekl,-xop" "tune-cpu"="generic" }
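
The `attributes` lines above are hard to compare by eye. A minimal sketch of turning one into a CPU name plus a sorted, one-per-line feature list that can be fed to `diff -u`. The function name `parse_attributes` and the shortened sample line are illustrative assumptions, not part of Zig or LLVM:

```python
import re

def parse_attributes(line):
    """Return (target_cpu, sorted feature list) from one LLVM IR `attributes` line.

    Hypothetical helper for illustration; it only looks at the quoted
    "target-cpu" and "target-features" keys and ignores everything else.
    """
    cpu = re.search(r'"target-cpu"="([^"]*)"', line)
    feats = re.search(r'"target-features"="([^"]*)"', line)
    features = []
    if feats:
        # Features are comma-separated; strip stray whitespace or line breaks.
        features = sorted(f.strip() for f in feats.group(1).split(",") if f.strip())
    return (cpu.group(1) if cpu else None), features

# Shortened sample in the same shape as the grep output above.
sample = 'attributes #0 = { nounwind "target-cpu"="znver4" "target-features"="+sse2,+evex512,-xop" }'
cpu, features = parse_attributes(sample)
print(cpu)
print("\n".join(features))
```

Writing the printed list for each compiler to its own file gives two inputs that `diff -u` can compare line by line.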

andrewrk · 2024-04-28T20:45:25Z

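Using this method we see that the llvm18 branch of Zig gets the CPU wrong (clang reports "target-cpu"="znver4", Zig reports "target-cpu"="znver3") and also comes up with a different set of features. In the diff below, file 1 is the clang output and file 2 is the Zig output: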

--- 1	2024-04-28 13:42:58.412083487 -0700
+++ 2	2024-04-28 13:43:38.671987175 -0700
@@ -1,6 +1,7 @@
 +64bit
 +adx
 +aes
++allow-light-256-bit
 +avx
 +avx2
 +avx512bf16
@@ -17,6 +18,7 @@
 +avx512vpopcntdq
 +bmi
 +bmi2
++branchfusion
 +clflushopt
 +clwb
 +clzero
@@ -24,17 +26,27 @@
 +crc32
 +cx16
 +cx8
-+evex512
 +f16c
++fast-15bytenop
++fast-bextr
++fast-lzcnt
++fast-movbe
++fast-scalar-fsqrt
++fast-scalar-shift-masks
++fast-variable-perlane-shuffle
++fast-vector-fsqrt
 +fma
 +fsgsbase
++fsrm
 +fxsr
 +gfni
 +invpcid
 +lzcnt
++macrofusion
 +mmx
 +movbe
 +mwaitx
++nopl
 +pclmul
 +pku
 +popcnt
@@ -44,8 +56,10 @@
 +rdrnd
 +rdseed
 +sahf
++sbb-dep-breaking
 +sha
 +shstk
++slow-shld
 +sse
 +sse2
 +sse3
@@ -55,19 +69,22 @@
 +ssse3
 +vaes
 +vpclmulqdq
++vzeroupper
 +wbnoinvd
 +x87
 +xsave
 +xsavec
 +xsaveopt
 +xsaves
+-16bit-mode
+-32bit-mode
+-3dnow
+-3dnowa
 -amx-bf16
 -amx-complex
 -amx-fp16
 -amx-int8
 -amx-tile
--avx10.1-256
--avx10.1-512
 -avx512er
 -avx512fp16
 -avx512pf
@@ -80,27 +97,78 @@
 -cldemote
 -cmpccxadd
 -enqcmd
+-ermsb
+-false-deps-getmant
+-false-deps-lzcnt-tzcnt
+-false-deps-mulc
+-false-deps-mullq
+-false-deps-perm
+-false-deps-popcnt
+-false-deps-range
+-fast-11bytenop
+-fast-7bytenop
+-fast-gather
+-fast-hops
+-fast-shld-rotate
+-fast-variable-crosslane-shuffle
+-fast-vector-shift-masks
+-faster-shift-than-shuffle
 -fma4
+-harden-sls-ijmp
+-harden-sls-ret
 -hreset
+-idivl-to-divb
+-idivq-to-divl
 -kl
+-lea-sp
+-lea-uses-ag
+-lvi-cfi
+-lvi-load-hardening
 -lwp
 -movdir64b
 -movdiri
+-no-bypass-delay
+-no-bypass-delay-blend
+-no-bypass-delay-mov
+-no-bypass-delay-shuffle
+-pad-short-functions
 -pconfig
+-prefer-128-bit
+-prefer-256-bit
+-prefer-mask-registers
+-prefer-movmsk-over-vtest
 -prefetchi
 -prefetchwt1
 -ptwrite
 -raoint
+-retpoline
+-retpoline-external-thunk
+-retpoline-indirect-branches
+-retpoline-indirect-calls
 -rtm
 -serialize
+-seses
 -sgx
 -sha512
+-slow-3ops-lea
+-slow-incdec
+-slow-lea
+-slow-pmaddwd
+-slow-pmulld
+-slow-two-mem-ops
+-slow-unaligned-mem-16
+-slow-unaligned-mem-32
 -sm3
 -sm4
+-soft-float
+-sse-unaligned-mem
+-tagged-globals
 -tbm
 -tsxldtrk
+-tuning-fast-imm-vector-shift
 -uintr
--usermsr
+-use-glm-div-sqrt-costs
+-use-slm-arith-costs
 -waitpkg
 -widekl
 -xop

Notably, evex512 is missing from Zig's feature set, which was causing llvm/llvm-project#90356.

In conclusion, we don't need a tool, but rather need to put this process into the upgrade instructions.
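
Most lines in the diff above are features that only one side lists at all, so the one that matters (evex512) is easy to overlook. A rough sketch of filtering for features one side enables and the other does not; the helper names and the abbreviated lists are assumptions made for illustration:

```python
def enabled(features):
    """Names of the features that carry a '+' prefix."""
    return {f[1:] for f in features if f.startswith("+")}

def enabled_only_in_first(first, second):
    """Features enabled in `first` that are not enabled in `second`."""
    return sorted(enabled(first) - enabled(second))

# Abbreviated from the two "target-features" strings in this thread.
clang_features = "+avx512f,+evex512,+f16c,-avx10.1-256,-xop".split(",")
zig_features = "+avx512f,+f16c,+fast-bextr,-3dnow,-xop".split(",")

print(enabled_only_in_first(clang_features, zig_features))  # ['evex512']
print(enabled_only_in_first(zig_features, clang_features))  # ['fast-bextr']
```

In this thread, the first direction (enabled by clang but not by Zig) is the one that turned up the evex512 problem. The reverse direction on the full lists is mostly entries such as fast-bextr that clang does not print in its feature string.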

topperc · 2024-04-29T01:55:12Z

Is LLVM misidentifying a znver3 as znver4?

andrewrk · 2024-04-29T02:04:41Z

I'm not sure what the correct answer is yet, but this host is an AMD Ryzen 9 7950X, and currently Zig is identifying it as znver3 while LLVM 18 is identifying it as znver4. Likely LLVM is correct here, since I have not yet touched the CPU detection logic in the llvm18 upgrade branch of Zig.

I'm working on a tool to help identify when Zig and LLVM disagree on the host CPU and its feature set so that we can be sure the detection logic is working correctly.

Edit: looks like the correct answer is Zen 4, so LLVM is indeed correct here.

andrewrk · 2024-04-29T03:03:39Z

Alright, I've implemented this in the llvm18 branch in dc6cb4c. It uses LLVMGetHostCPUName and LLVMGetHostCPUFeatures, and usage looks like this:

zig detect-cpu > ~/tmp/1
zig detect-cpu --llvm > ~/tmp/2
diff -u ~/tmp/{1,2}

--- /home/andy/tmp/1    2024-04-28 19:56:07.407829265 -0700
+++ /home/andy/tmp/2    2024-04-28 19:56:25.154802676 -0700
@@ -1,4 +1,4 @@
-znver3
+znver4
 -16bit-mode
 -32bit-mode
 -3dnow
@@ -54,7 +54,7 @@
 -egpr
 -enqcmd
 -ermsb
--evex512
++evex512
 +f16c
 -false-deps-getmant
 -false-deps-lzcnt-tzcnt

This makes it clear what is happening: Zig's CPU detection failed to recognize the host as znver4, and so missed turning on the CPU features enabled for that model, which include evex512.
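
For upgrade notes, the same comparison can be reduced to just the disagreements. A sketch under one assumption: the output format (CPU name on the first line, then one `+feature` or `-feature` per line) is inferred from the diff above, not from documentation. The function names are hypothetical:

```python
def parse_detect_cpu(text):
    """Split assumed detect-cpu output into (cpu_name, {feature: enabled})."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    cpu = lines[0]
    # Each remaining line is a sign character followed by the feature name.
    features = {entry[1:]: entry[0] == "+" for entry in lines[1:]}
    return cpu, features

def sign(flag):
    return "+" if flag else "-"

def disagreements(zig_text, llvm_text):
    """List the CPU name mismatch and every feature whose sign differs."""
    zig_cpu, zig = parse_detect_cpu(zig_text)
    llvm_cpu, llvm = parse_detect_cpu(llvm_text)
    report = []
    if zig_cpu != llvm_cpu:
        report.append(f"cpu: zig={zig_cpu} llvm={llvm_cpu}")
    for name in sorted(zig.keys() & llvm.keys()):
        if zig[name] != llvm[name]:
            report.append(f"{name}: zig={sign(zig[name])} llvm={sign(llvm[name])}")
    return report

# Abbreviated from the diff above.
zig_out = "znver3\n-ermsb\n-evex512\n+f16c\n"
llvm_out = "znver4\n-ermsb\n+evex512\n+f16c\n"
for line in disagreements(zig_out, llvm_out):
    print(line)
```

This only reports features that both sides list. A feature present in just one output would need separate handling.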

andrewrk added the enhancement and backend-llvm labels Apr 28, 2024

andrewrk added this to the 0.13.0 milestone Apr 28, 2024

andrewrk added a commit that referenced this issue Apr 29, 2024

add detect-cpu subcommand for debugging CPU features

dc6cb4c

This brings back `detectNativeCpuWithLLVM` so that we can troubleshoot during LLVM upgrades. closes #19793

andrewrk added a commit that referenced this issue Apr 30, 2024

add detect-cpu subcommand for debugging CPU features

6ca39bf

This brings back `detectNativeCpuWithLLVM` so that we can troubleshoot during LLVM upgrades. closes #19793

andrewrk added a commit that referenced this issue May 1, 2024

add detect-cpu subcommand for debugging CPU features

e47e4b8

This brings back `detectNativeCpuWithLLVM` so that we can troubleshoot during LLVM upgrades. closes #19793

andrewrk added a commit that referenced this issue May 3, 2024

add detect-cpu subcommand for debugging CPU features

d07d69c

This brings back `detectNativeCpuWithLLVM` so that we can troubleshoot during LLVM upgrades. closes #19793

andrewrk added a commit that referenced this issue May 8, 2024

add detect-cpu subcommand for debugging CPU features

276de2e

This brings back `detectNativeCpuWithLLVM` so that we can troubleshoot during LLVM upgrades. closes #19793

andrewrk closed this as completed in 78002db May 9, 2024
