Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sph methods slow? #191

Open
AshtonSBradley opened this issue Jan 29, 2023 · 4 comments
Open

sph methods slow? #191

AshtonSBradley opened this issue Jan 29, 2023 · 4 comments

Comments

@AshtonSBradley
Copy link

Maybe this is totally expected, but:

  • A lot slower than FFT. It has been a while, but I seem to recall (going back to 2018) that the first version was much more comparable to FFT
  • Allocations that don't go away with in-place (hard to make an efficient time evolution algorithm in this case)
  • Which method is the bottleneck depends on size
using FastTransforms
using FFTW
using LinearAlgebra

n,m = 1024,2047

x = randn(ComplexF64,n,m)
# FFTW
@time fft(x);
  0.033259 seconds (259 allocations: 32.001 MiB)
P! = plan_fft!(x);
@time P!*x; 
  0.035154 seconds (232 allocations: 14.969 KiB)

# Spherical 
P = plan_sph2fourier(x)
PS = plan_sph_synthesis(x)

@time PS*(P*x);
  0.235185 seconds (549 allocations: 127.972 MiB, 1.57% gc time)
@time P*x; # slow
  0.145820 seconds (8 allocations: 63.969 MiB, 1.85% gc time)
@time PS*x;
  0.051036 seconds (545 allocations: 64.003 MiB)

@time lmul!(P,x);  # slow
  0.139856 seconds (4 allocations: 31.984 MiB, 0.91% gc time)
@time lmul!(PS,x);
  0.047418 seconds (539 allocations: 32.018 MiB)

## small array
n,m = 60,121
x = randn(ComplexF64,n,m);

# FFTW
@time fft(x);
  0.000560 seconds (242 allocations: 129.836 KiB)
P! = plan_fft!(x);
@time P!*x; 
  0.000411 seconds (218 allocations: 14.492 KiB)

# Spherical 
P = plan_sph2fourier(x);
PS = plan_sph_synthesis(x);

@time PS*(P*x);
  0.101827 seconds (19.31 k allocations: 1.626 MiB)

@time P*x; 
  0.000266 seconds (8 allocations: 227.406 KiB)
@time PS*x;  # slow
  0.031964 seconds (20.48 k allocations: 1.440 MiB)

@time lmul!(P,x);  
  0.000364 seconds (4 allocations: 113.594 KiB)
@time lmul!(PS,x);  # slow
  0.037324 seconds (20.38 k allocations: 1.326 MiB)
@AshtonSBradley
Copy link
Author

julia> versioninfo()
Julia Version 1.9.0-beta3
Commit 24204a73447 (2023-01-18 07:20 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.4.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_PKG_DEVDIR = /Users/abradley/Dropbox/Julia/Dev
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code
  JULIA_PKG_SERVER = us-west.pkg.julialang.org

@MikaelSlevinsky
Copy link
Member

I'd have to check later, but my first guesses are that the allocations come from using a complex array. (You can also use complex spherical harmonics as spinsph with spin 0 which are designed for complex arrays from the beginning.)

The architecture of M1 macs is different from the Intel macs so they use a different binary with different levels of optimization. You can double check it's using all cores, and can also try compiling from source with different optimization flags, but that's about it for now I think.

@AshtonSBradley
Copy link
Author

AshtonSBradley commented Jan 29, 2023

Thanks for suggesting spinsph which fixes allocations nicely.

This is what I get on an intel mac. For large arrays its similar, roughly 4-5 times longer than FFTW

However, the small array handling is a lot better, closer to factor of 2

using FastTransforms
using FFTW
using LinearAlgebra

n,m = 1024,2047
x = randn(ComplexF64,n,m)
## FFTW
@time fft(x);
P! = plan_fft!(x);
@time P!*x;         # 0.031843 seconds (231 allocations: 14.219 KiB)

## Spherical 
P = plan_sph2fourier(x) 
PS = plan_sph_synthesis(x)

@time PS*(P*x);     # 0.165899 seconds (557 allocations: 127.970 MiB, 6.21% gc time)
@time P*x;          # slow 0.114058 seconds (8 allocations: 63.969 MiB, 5.08% gc time)
@time PS*x;         #  0.056220 seconds (545 allocations: 64.001 MiB, 11.33% gc time)

@time lmul!(P,x);   # slow 0.094217 seconds (4 allocations: 31.984 MiB)
@time lmul!(PS,x);  # 0.045611 seconds (535 allocations: 32.016 MiB)

## SpinSph
P = plan_spinsph2fourier(x,0)
PS = plan_spinsph_synthesis(x,0)
@time PS*(P*x);     # 0.175541 seconds (545 allocations: 64.001 MiB, 3.77% gc time)
@time P*x;          # slow 0.129498 seconds (2 allocations: 31.984 MiB)
@time PS*x;         #  0.039490 seconds (529 allocations: 32.016 MiB)

@time lmul!(P,x);   # slow 0.127229 seconds
@time lmul!(PS,x);  # 0.033086 seconds (538 allocations: 32.344 KiB)

n,m = 60,121
x = randn(ComplexF64,n,m);

## FFTW
@time fft(x);
P! = plan_fft!(x);
@time P!*x;         # 0.000676 seconds (219 allocations: 13.875 KiB)

## Spherical 
P = plan_sph2fourier(x);
PS = plan_sph_synthesis(x);

@time PS*(P*x);     # 0.013577 seconds (20.43 k allocations: 1.563 MiB)

@time P*x;          # 0.000360 seconds (8 allocations: 227.406 KiB)
@time PS*x;         # slow 0.013644 seconds (20.39 k allocations: 1.340 MiB)

@time lmul!(P,x);   # 0.000267 seconds (4 allocations: 113.594 KiB)
@time lmul!(PS,x);  # 0.011682 seconds (20.47 k allocations: 1.232 MiB)

## SpinSph
P = plan_spinsph2fourier(x,0)
PS = plan_spinsph_synthesis(x,0)
@time PS*(P*x);     # 0.001753 seconds (439 allocations: 252.734 KiB)
@time P*x;          # 0.000314 seconds (2 allocations: 113.484 KiB)
@time PS*x;         # 0.001559 seconds (423 allocations: 138.812 KiB)

@time lmul!(P,x);   # 0.000212 seconds
@time lmul!(PS,x);  # 0.001416 seconds (428 allocations: 25.547 KiB)

@AshtonSBradley
Copy link
Author

julia> versioninfo()
Julia Version 1.9.0-beta3
Commit 24204a73447 (2023-01-18 07:20 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin21.4.0)
CPU: 16 × Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
Threads: 8 on 8 virtual cores
Environment:
JULIA_NUM_THREADS = 8
JULIA_PKG_DEVDIR = /Users/abradley/Dropbox/Julia/Dev
JULIA_EDITOR = code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants