create a tool that checks zig's cpu feature detection vs llvm's #19793

Closed
andrewrk opened this issue Apr 28, 2024 · 6 comments
Labels
backend-llvm The LLVM backend outputs an LLVM IR Module. enhancement Solving this issue will likely involve adding new logic or components to the codebase.
Milestone

Comments

@andrewrk
Member

With each LLVM upgrade, the target CPU feature sets of both LLVM and Zig change. However, Zig does its own host CPU feature detection rather than relying on LLVM's implementation, because Zig also needs it for its own backends.

Even so, we need a way to find out when the CPU feature detection code needs an update. So let's create some tooling to help with the upgrade process.

The tool runs Zig's CPU feature detection on the host, then runs LLVM's CPU feature detection on the host, and prints a diff of the two results. This diff helps the Zig maintainer doing the LLVM upgrade (usually me) know whether the CPU feature detection code needs to be updated.

@andrewrk
Member Author

andy@bark ~> ls ~/local/llvm18-assert/bin/
amdgpu-arch             git-clang-format    llvm-cxxmap              llvm-mc          llvm-size
analyze-build           hmaptool            llvm-debuginfo-analyzer  llvm-mca         llvm-split
bugpoint                intercept-build     llvm-debuginfod          llvm-ml          llvm-stress
c-index-test            ld64.lld            llvm-debuginfod-find     llvm-modextract  llvm-strings
clang                   ld.lld              llvm-diff                llvm-mt          llvm-strip
clang++                 llc                 llvm-dis                 llvm-nm          llvm-symbolizer
clang-18                lld                 llvm-dlltool             llvm-objcopy     llvm-tblgen
clang-check             lld-link            llvm-dwarfdump           llvm-objdump     llvm-tli-checker
clang-cl                lli                 llvm-dwarfutil           llvm-opt-report  llvm-undname
clang-cpp               llvm-addr2line      llvm-dwp                 llvm-otool       llvm-windres
clang-extdef-mapping    llvm-ar             llvm-exegesis            llvm-pdbutil     llvm-xray
clang-format            llvm-as             llvm-extract             llvm-profdata    nvptx-arch
clang-linker-wrapper    llvm-bcanalyzer     llvm-gsymutil            llvm-profgen     opt
clang-offload-bundler   llvm-bitcode-strip  llvm-ifs                 llvm-ranlib      sancov
clang-offload-packager  llvm-cat            llvm-install-name-tool   llvm-rc          sanstats
clang-refactor          llvm-cfi-verify     llvm-jitlink             llvm-readelf     scan-build
clang-rename            llvm-config         llvm-lib                 llvm-readobj     scan-build-py
clang-repl              llvm-cov            llvm-libtool-darwin      llvm-readtapi    scan-view
clang-scan-deps         llvm-c-test         llvm-link                llvm-reduce      verify-uselistorder
clang-tblgen            llvm-cvtres         llvm-lipo                llvm-remarkutil  wasm-ld
diagtool                llvm-cxxdump        llvm-lto                 llvm-rtdyld
dsymutil                llvm-cxxfilt        llvm-lto2                llvm-sim

You would think that one of these would have the ability to print the set of detected native CPU features. I haven't figured out whether such functionality already exists.

@andrewrk
Member Author

andrewrk commented Apr 28, 2024

ok, I figured out one way to do it:

andy@bark ~/tmp> cat empty.c 
int main(int argc, char **argv) {
    return 0;
}
andy@bark ~/tmp> ~/local/llvm18-assert/bin/clang -c -emit-llvm empty.c -march=native -S
andy@bark ~/tmp> grep attributes empty.ll
attributes #0 = { noinline nounwind optnone uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="znver4" "target-features"="+64bit,+adx,+aes,+avx,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+bmi,+bmi2,+clflushopt,+clwb,+clzero,+cmov,+crc32,+cx16,+cx8,+evex512,+f16c,+fma,+fsgsbase,+fxsr,+gfni,+invpcid,+lzcnt,+mmx,+movbe,+mwaitx,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+shstk,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves,-amx-bf16,-amx-complex,-amx-fp16,-amx-int8,-amx-tile,-avx10.1-256,-avx10.1-512,-avx512er,-avx512fp16,-avx512pf,-avx512vp2intersect,-avxifma,-avxneconvert,-avxvnni,-avxvnniint16,-avxvnniint8,-cldemote,-cmpccxadd,-enqcmd,-fma4,-hreset,-kl,-lwp,-movdir64b,-movdiri,-pconfig,-prefetchi,-prefetchwt1,-ptwrite,-raoint,-rtm,-serialize,-sgx,-sha512,-sm3,-sm4,-tbm,-tsxldtrk,-uintr,-usermsr,-waitpkg,-widekl,-xop" }

Then compare with zig:

andy@bark ~/tmp> ~/src/zig/build-release/stage4/bin/zig cc -c -emit-llvm empty.c -march=native -S
andy@bark ~/tmp> grep attributes empty.ll
attributes #0 = { noinline nounwind optnone sspstrong uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="4" "target-cpu"="znver3" "target-features"="+64bit,+adx,+aes,+allow-light-256-bit,+avx,+avx2,+avx512bf16,+avx512bitalg,+avx512bw,+avx512cd,+avx512dq,+avx512f,+avx512ifma,+avx512vbmi,+avx512vbmi2,+avx512vl,+avx512vnni,+avx512vpopcntdq,+bmi,+bmi2,+branchfusion,+clflushopt,+clwb,+clzero,+cmov,+crc32,+cx16,+cx8,+f16c,+fast-15bytenop,+fast-bextr,+fast-lzcnt,+fast-movbe,+fast-scalar-fsqrt,+fast-scalar-shift-masks,+fast-variable-perlane-shuffle,+fast-vector-fsqrt,+fma,+fsgsbase,+fsrm,+fxsr,+gfni,+invpcid,+lzcnt,+macrofusion,+mmx,+movbe,+mwaitx,+nopl,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sbb-dep-breaking,+sha,+shstk,+slow-shld,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+vzeroupper,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves,-16bit-mode,-32bit-mode,-3dnow,-3dnowa,-amx-bf16,-amx-complex,-amx-fp16,-amx-int8,-amx-tile,-avx512er,-avx512fp16,-avx512pf,-avx512vp2intersect,-avxifma,-avxneconvert,-avxvnni,-avxvnniint16,-avxvnniint8,-cldemote,-cmpccxadd,-enqcmd,-ermsb,-false-deps-getmant,-false-deps-lzcnt-tzcnt,-false-deps-mulc,-false-deps-mullq,-false-deps-perm,-false-deps-popcnt,-false-deps-range,-fast-11bytenop,-fast-7bytenop,-fast-gather,-fast-hops,-fast-shld-rotate,-fast-variable-crosslane-shuffle,-fast-vector-shift-masks,-faster-shift-than-shuffle,-fma4,-harden-sls-ijmp,-harden-sls-ret,-hreset,-idivl-to-divb,-idivq-to-divl,-kl,-lea-sp,-lea-uses-ag,-lvi-cfi,-lvi-load-hardening,-lwp,-movdir64b,-movdiri,-no-bypass-delay,-no-bypass-delay-blend,-no-bypass-delay-mov,-no-bypass-delay-shuffle,-pad-short-functions,-pconfig,-prefer-128-bit,-prefer-256-bit,-prefer-mask-registers,-prefer-movmsk-over-vtest,-prefetchi,-prefetchwt1,-ptwrite,-raoint,-retpoline,-retpoline-external-thunk,-retpoline-indirect-branches,-retpoline-indirect-calls,-rtm,-serialize,-seses,-sgx,-sha512,-slow-3ops-lea,-slow-incdec,-slow-lea,-slow-pmaddwd,-slow-pmulld,-slow-two-mem-ops,-slow-unaligned-mem-16,-slow-unaligned-mem-32,-sm3,-sm4,-soft-float,-sse-unaligned-mem,-tagged-globals,-tbm,-tsxldtrk,-tuning-fast-imm-vector-shift,-uintr,-use-glm-div-sqrt-costs,-use-slm-arith-costs,-waitpkg,-widekl,-xop" "tune-cpu"="generic" }
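The two `attributes` lines above are hard to compare by eye. One possible way to turn such a line into a CPU name plus a sorted, one-feature-per-line list that `diff -u` can consume is sketched below. This is an illustration, not the script used in this thread; the helper name `parse_attributes` and the shortened sample line are hypothetical.

```python
import re

def parse_attributes(line):
    """Return (target_cpu, sorted signed features) from an LLVM IR `attributes` line."""
    cpu = re.search(r'"target-cpu"="([^"]*)"', line)
    feats = re.search(r'"target-features"="([^"]*)"', line)
    if cpu is None or feats is None:
        raise ValueError("line has no target-cpu/target-features attribute")
    # Features are comma-separated and keep their leading '+' or '-'.
    # strip() tolerates stray whitespace or line breaks from terminal wrapping.
    features = sorted(f.strip() for f in feats.group(1).split(",") if f.strip())
    return cpu.group(1), features

# Shortened, hypothetical sample in the same shape as the grep output above.
sample = ('attributes #0 = { nounwind "target-cpu"="znver4" '
          '"target-features"="+sse2,+avx,-xop" }')
cpu, features = parse_attributes(sample)
print(cpu)
print("\n".join(features))
```

Writing the output for clang and for `zig cc` to two separate files gives inputs that can be passed straight to `diff -u`.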

@andrewrk
Member Author

andrewrk commented Apr 28, 2024

Using this method we see that the llvm18 branch of Zig gets the CPU wrong ("target-cpu"="znver4" vs "target-cpu"="znver3") as well as coming up with a different set of features:

--- 1	2024-04-28 13:42:58.412083487 -0700
+++ 2	2024-04-28 13:43:38.671987175 -0700
@@ -1,6 +1,7 @@
 +64bit
 +adx
 +aes
++allow-light-256-bit
 +avx
 +avx2
 +avx512bf16
@@ -17,6 +18,7 @@
 +avx512vpopcntdq
 +bmi
 +bmi2
++branchfusion
 +clflushopt
 +clwb
 +clzero
@@ -24,17 +26,27 @@
 +crc32
 +cx16
 +cx8
-+evex512
 +f16c
++fast-15bytenop
++fast-bextr
++fast-lzcnt
++fast-movbe
++fast-scalar-fsqrt
++fast-scalar-shift-masks
++fast-variable-perlane-shuffle
++fast-vector-fsqrt
 +fma
 +fsgsbase
++fsrm
 +fxsr
 +gfni
 +invpcid
 +lzcnt
++macrofusion
 +mmx
 +movbe
 +mwaitx
++nopl
 +pclmul
 +pku
 +popcnt
@@ -44,8 +56,10 @@
 +rdrnd
 +rdseed
 +sahf
++sbb-dep-breaking
 +sha
 +shstk
++slow-shld
 +sse
 +sse2
 +sse3
@@ -55,19 +69,22 @@
 +ssse3
 +vaes
 +vpclmulqdq
++vzeroupper
 +wbnoinvd
 +x87
 +xsave
 +xsavec
 +xsaveopt
 +xsaves
+-16bit-mode
+-32bit-mode
+-3dnow
+-3dnowa
 -amx-bf16
 -amx-complex
 -amx-fp16
 -amx-int8
 -amx-tile
--avx10.1-256
--avx10.1-512
 -avx512er
 -avx512fp16
 -avx512pf
@@ -80,27 +97,78 @@
 -cldemote
 -cmpccxadd
 -enqcmd
+-ermsb
+-false-deps-getmant
+-false-deps-lzcnt-tzcnt
+-false-deps-mulc
+-false-deps-mullq
+-false-deps-perm
+-false-deps-popcnt
+-false-deps-range
+-fast-11bytenop
+-fast-7bytenop
+-fast-gather
+-fast-hops
+-fast-shld-rotate
+-fast-variable-crosslane-shuffle
+-fast-vector-shift-masks
+-faster-shift-than-shuffle
 -fma4
+-harden-sls-ijmp
+-harden-sls-ret
 -hreset
+-idivl-to-divb
+-idivq-to-divl
 -kl
+-lea-sp
+-lea-uses-ag
+-lvi-cfi
+-lvi-load-hardening
 -lwp
 -movdir64b
 -movdiri
+-no-bypass-delay
+-no-bypass-delay-blend
+-no-bypass-delay-mov
+-no-bypass-delay-shuffle
+-pad-short-functions
 -pconfig
+-prefer-128-bit
+-prefer-256-bit
+-prefer-mask-registers
+-prefer-movmsk-over-vtest
 -prefetchi
 -prefetchwt1
 -ptwrite
 -raoint
+-retpoline
+-retpoline-external-thunk
+-retpoline-indirect-branches
+-retpoline-indirect-calls
 -rtm
 -serialize
+-seses
 -sgx
 -sha512
+-slow-3ops-lea
+-slow-incdec
+-slow-lea
+-slow-pmaddwd
+-slow-pmulld
+-slow-two-mem-ops
+-slow-unaligned-mem-16
+-slow-unaligned-mem-32
 -sm3
 -sm4
+-soft-float
+-sse-unaligned-mem
+-tagged-globals
 -tbm
 -tsxldtrk
+-tuning-fast-imm-vector-shift
 -uintr
--usermsr
+-use-glm-div-sqrt-costs
+-use-slm-arith-costs
 -waitpkg
 -widekl
 -xop

Notably, evex512 is missing, which was causing llvm/llvm-project#90356.
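Most of the diff above is noise: tuning features that one side lists and the other omits. A rough sketch of separating "listed on only one side" from "listed on both sides with opposite sign" follows. The function names and the short sample lists are hypothetical; the sample entries are taken from the diff above.

```python
def split_feature(f):
    """'+avx' -> ('avx', True); '-xop' -> ('xop', False)."""
    return f[1:], f[0] == "+"

def compare(a, b):
    """Classify the differences between two lists of signed feature strings."""
    da = dict(split_feature(f) for f in a)
    db = dict(split_feature(f) for f in b)
    common = da.keys() & db.keys()
    return {
        "only_a": sorted(da.keys() - db.keys()),   # listed only in a
        "only_b": sorted(db.keys() - da.keys()),   # listed only in b
        "flipped": sorted(n for n in common if da[n] != db[n]),
    }

# Small excerpts, not the full feature lists.
clang_features = ["+avx", "+evex512", "-avx10.1-256", "-xop"]
zig_features = ["+avx", "+vzeroupper", "-3dnow", "-xop"]
for kind, names in compare(clang_features, zig_features).items():
    print(kind, names)
```

With this split, a feature such as evex512 that only one side reports stands out from the long tail of tuning flags.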

In conclusion, we don't need a tool, but rather need to put this process into the upgrade instructions.

@topperc

topperc commented Apr 29, 2024

Is LLVM misidentifying a znver3 as znver4?

@andrewrk
Member Author

andrewrk commented Apr 29, 2024

I'm not sure what the correct answer is yet, but this host is an AMD Ryzen 9 7950X, and currently Zig is identifying it as znver3 while LLVM 18 is identifying it as znver4. Likely LLVM is correct here since I have not yet touched the CPU detection logic in the llvm18 upgrade branch of Zig.

I'm working on a tool to help identify when Zig and LLVM disagree on the host CPU and its feature set so that we can be sure the detection logic is working correctly.

Edit: looks like the correct answer is Zen 4, so LLVM is indeed correct here.

andrewrk added a commit that referenced this issue Apr 29, 2024
This brings back `detectNativeCpuWithLLVM` so that we can troubleshoot
during LLVM upgrades.

closes #19793
@andrewrk
Member Author

Alright I've implemented this in the llvm18 branch in dc6cb4c. It uses LLVMGetHostCPUName and LLVMGetHostCPUFeatures and usage looks like this:

zig detect-cpu > ~/tmp/1
zig detect-cpu --llvm > ~/tmp/2
diff -u ~/tmp/{1,2} 
--- /home/andy/tmp/1    2024-04-28 19:56:07.407829265 -0700
+++ /home/andy/tmp/2    2024-04-28 19:56:25.154802676 -0700
@@ -1,4 +1,4 @@
-znver3
+znver4
 -16bit-mode
 -32bit-mode
 -3dnow
@@ -54,7 +54,7 @@
 -egpr
 -enqcmd
 -ermsb
--evex512
++evex512
 +f16c
 -false-deps-getmant
 -false-deps-lzcnt-tzcnt

This makes it clear what is happening: Zig's CPU feature detection failed to pick up that the host was a znver4, and thereby missed turning on the CPU features enabled for that model, which include evex512.
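For use in an upgrade checklist, the same comparison can be reduced to a short report. The sketch below assumes the output format implied by the diff above (first line is the CPU name, every following line is one feature prefixed with `+` or `-`); that format is inferred from this thread, not from documentation, and the function names are hypothetical.

```python
def parse_detect_cpu(text):
    """Split detect-cpu style output into (cpu_name, {feature: enabled})."""
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    return lines[0], {l[1:]: l[0] == "+" for l in lines[1:]}

def fmt(state):
    """Render a feature state: '+' enabled, '-' disabled, 'absent' if not listed."""
    return "absent" if state is None else ("+" if state else "-")

def report(zig_text, llvm_text):
    """List every disagreement between the two detection results."""
    zig_cpu, zig = parse_detect_cpu(zig_text)
    llvm_cpu, llvm = parse_detect_cpu(llvm_text)
    out = []
    if zig_cpu != llvm_cpu:
        out.append(f"cpu: zig={zig_cpu} llvm={llvm_cpu}")
    for name in sorted(zig.keys() | llvm.keys()):
        if zig.get(name) != llvm.get(name):
            out.append(f"{name}: zig={fmt(zig.get(name))} llvm={fmt(llvm.get(name))}")
    return out

# Excerpts matching the diff shown above.
zig_out = "znver3\n-ermsb\n-evex512\n+f16c\n"
llvm_out = "znver4\n-ermsb\n+evex512\n+f16c\n"
print("\n".join(report(zig_out, llvm_out)))
```

An empty report would mean the two detection paths agree on this host.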
