ggml : rewrite silu and softmax for cpu
This change upstreams llamafile's vectorized expf() functions. This lets
us compute softmax and silu more accurately than the short[65536] lookup
table that GGML previously used to make this operation go faster. We can
support aarch64 and sse2+ with a worst-case rounding error of 2 ulp. I
wrote avx2 and avx512 implementations as well, but they didn't offer
enough advantage over sse2+fma to be worth the code complexity.
jart committed May 9, 2024
1 parent f98eb31 commit e8c8fd3
Showing 1 changed file with 157 additions and 193 deletions.
