Support SIMD #3057
Comments
Hi. Not yet, but it's very likely we'll have these in the future |
Changed title because Atomics are already implemented |
Is anyone currently working on this? I would be happy to help if I can. Could you tell me what probably needs to be done and how complex it would be? |
LLVM already applies SIMD natively at the optimization level (usually in loops). Explicit support, meaning "I want to guarantee this will execute the exact SIMD calls I specify", will have to be added to the codegen and the language. LLVM represents this using vector types, and support for these will have to be added to Crystal for basic mathematical operations on multiple values at once. For more specialized architecture-specific SIMD instructions, you can create a shard wrapping inline assembler calls. |
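As a rough illustration of the two routes described in the comment above (optimizer-driven vectorization versus explicit vector types), here is a minimal C sketch. It relies on the GCC/Clang `vector_size` extension, which Clang lowers to an LLVM vector type such as `<4 x i32>`. The names `v4i32`, `add_loop`, and `add_vec` are made up for this example and are not part of Crystal or LLVM.

```c
#include <assert.h>

/* GCC/Clang vector extension: four 32-bit ints packed in one 16-byte value.
   Under Clang this corresponds to the LLVM type <4 x i32>. */
typedef int v4i32 __attribute__((vector_size(16)));

/* Route 1: a plain scalar loop. Whether this becomes SIMD code is up to
   the optimizer; nothing in the source guarantees it. */
void add_loop(const int *a, const int *b, int *out, int n) {
    assert(n >= 0); /* precondition: element count must not be negative */
    for (int i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Route 2: an explicit vector add. The "+" applies to all four lanes,
   so the intent to operate on a vector is stated in the type itself. */
v4i32 add_vec(v4i32 a, v4i32 b) {
    return a + b;
}
```

Both functions compute the same element-wise sums; the difference is only in who decides that vector instructions are used.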
Thanks for the quick explanation! I'm pretty sure adding SIMD types to the language shouldn't be that hard because their implementation is probably very similar to the ones of the existing primitives (like I have no clue about assembly but I don't think it's a good idea to write native code for a language based on LLVM (which apparently has proper SIMD support). |
After experimenting a bit I realized that LLVM is actually very smart when it comes to SIMD. For example, I can create two vectors of 32 32-bit integers (i32) and add them together like this:
%vp1 = alloca <32 x i32>
%v1 = load <32 x i32>, <32 x i32>* %vp1
%vp2 = alloca <32 x i32>
%v2 = load <32 x i32>, <32 x i32>* %vp2
%sum = add <32 x i32> %v1, %v2
Of course, vectors this big don't fit in any single register. LLVM uses multiple registers to store them (at least I think so):
vmovdqa 256(%rsp), %ymm0
vmovdqa 288(%rsp), %ymm1
vmovdqa 320(%rsp), %ymm2
vmovdqa 352(%rsp), %ymm3
vpaddd 160(%rsp), %ymm1, %ymm1
vpaddd 192(%rsp), %ymm2, %ymm2
vpaddd 224(%rsp), %ymm3, %ymm3
vpaddd 128(%rsp), %ymm0, %ymm0
Please don't ask me how any of this works, I just see several different AVX registers being used one after another. |
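The register splitting seen in the listing above can be reproduced from C as a sketch, again assuming the GCC/Clang `vector_size` extension. A vector of 32 lanes of 32-bit integers occupies 128 bytes, so with 32-byte (256-bit) AVX registers it spans four of them, which matches the four `%ymm` registers in the listing. The names `v32i32`, `registers_needed`, and `add_wide` are hypothetical helpers for this example.

```c
#include <assert.h>
#include <stddef.h>

/* 32 lanes of 32-bit ints = 128 bytes; <32 x i32> under Clang. */
typedef int v32i32 __attribute__((vector_size(128)));

/* Number of lanes in the wide vector (32). */
enum { LANES = sizeof(v32i32) / sizeof(int) };

/* How many hardware registers of the given byte width are needed to hold
   one v32i32, rounding up. */
size_t registers_needed(size_t register_bytes) {
    assert(register_bytes > 0); /* avoid division by zero */
    return (sizeof(v32i32) + register_bytes - 1) / register_bytes;
}

/* Element-wise add on the wide type. The compiler is free to split this
   into several narrower hardware operations. Operands are passed by
   pointer to sidestep ABI questions about very large by-value vectors. */
void add_wide(const v32i32 *a, const v32i32 *b, v32i32 *out) {
    *out = *a + *b;
}
```

For example, `registers_needed(32)` models 256-bit AVX registers and `registers_needed(16)` models 128-bit SSE registers.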
If you want, you can try experimenting with this. The implementation seems similar to the one for StaticArray, but instead of an LLVM array you must use an LLVM vector. Searching for StaticArray in the compiler's source code could give you a path for doing it. It also seems like something fun to experiment with. |
I actually think vectors could be implemented mostly the same way as integers/floats/booleans. I mean, most IR instructions which work for scalars also work for vectors. For example, you can |
Quick update: I wrote a very early implementation of a |
I'm really not sure how the "front end" of the
macro check_int
  {% raise "Method #{@def.name}#{@def.args} is only supported for vectors of integers" unless [Int8, Int16, Int32, Int64, Int128, UInt8, UInt16, UInt32, UInt64, UInt128].includes? T %}
end
and then call them in methods that should only be available to some vectors. But this doesn't work with |
Maybe make the primitive methods private and add the check for public methods? |
@asterite Thanks, weird that I didn't come up with this myself. |
What about specialized structs like
Int32.vector(1, 2, 3) => Int32::Vector(3)
It would avoid the issue of manually raising (let the compiler do its job) and allow a nicer API for int or float specific calls, for example |
@ysbaddaden I really like that approach. It lets us omit all the "hacky" parts of the aforementioned implementation while also emphasizing the element type. I'll change the implementation to this. |
I need some more help with the implementation. I'm trying to add the vector structs to the built-in types in
types["Bool::Vector"] = vec_bool = @vec_bool = BoolVectorType.new self, bool, "Vector", value, ["N"], bool
vec_bool.struct = true
vec_bool.can_be_stored = false
This also goes for the integer and float types. The arguments for the
# <top level>
struct SomeVectorElementType
  struct Vector(N)
    ...
  end
end
But this is not the case. Unless I'm explicitly requiring |
You need to do: types["Bool"].types["Vector"] = vec_bool = ... Types contain other types. Like in code. |
Or just |
Though namespacing things like this makes little sense. Probably In any case, without a concrete use case I don't know why we'd like to eventually merge this. |
Thanks, works fine now. I'd use vectors for CG, but SIMD is mostly useful when performing the same arithmetic on lots of data over and over again. Of course you can cross your fingers and hope the optimizer does its job, but I like to be sure my code is properly optimized. Either way, I think vectors are nice to have just for the sake of convenience (no need to loop over arrays). |
SIMD is useful whenever there are operations to repeat over a set of values. Instead of manually operating on each value, you can use a single instruction to operate on many at once, supposedly speeding things up (at least since SSE). It can be useful in geographical operations (k-means) over 2d or 3d points, to project polygons on a map, matrix transformations, applying transformations to rgba pixels... It can also be used in PRNG and digest calculations, see for example https://github.com/lemire/simdpcg |
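To make one of the use cases listed above concrete (applying a transformation to RGBA pixels), here is a small C sketch using the GCC/Clang `vector_size` extension. It treats one pixel as four float lanes and scales the color channels while leaving alpha untouched. The type `v4f`, the function `scale_rgb`, and the choice of float channels are assumptions made for illustration only.

```c
#include <assert.h>

/* One RGBA pixel as four 32-bit floats in a single 16-byte vector. */
typedef float v4f __attribute__((vector_size(16)));

/* Multiply the R, G and B lanes by `gain` and keep the alpha lane as is.
   The per-lane factors are packed into a vector so that a single
   element-wise multiply handles the whole pixel. */
v4f scale_rgb(v4f pixel, float gain) {
    assert(gain >= 0.0f); /* a negative gain has no meaning for brightness */
    v4f factors = {gain, gain, gain, 1.0f};
    return pixel * factors;
}
```

The same pattern extends to the other cases mentioned in the comment, such as 2D or 3D point transforms, by changing the lane count and the per-lane factors.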
FYI, vector arithmetic and conversions are now working fine. I was able to reuse all the codegen logic from the scalar operations. However, boolean vectors don't work at all, every element gets printed as |
I'm not sure how the
class Matrix(T, N, M) # T = element type; N = no. of rows; M = no. of columns
  @rows : T::Vector(M)[N] # undefined constant T::Vector
  @rows : Vector(T, M)[N] # works
end
I'd like to hear some thoughts on #3298 (comment) |
@malte-v Given that |
@asterite I don't think having 12 different matrix classes makes sense because no one would ever AND two matrices or use some other int-specific operation. Using macro loops here will produce duplicate code. edit: Wording |
I am curious: what is the status of this? |
I wrote a Vector struct to use in raylib once, and I was inspired by how Crystal's stdlib for "Complex" numbers changed the Number class to allow writing complex numbers like |
I have been working on this myself to the point where I could port A brief overview:
I would like to hear some initial feedback first; if we agree this is the general direction we want to take, I'll submit an RFC later and then we could sort out all the details. |
I love it, awesome work @HertzDevil 🫶 Starting an RFC sounds like a good idea. |
Hello. Does the Crystal language support atomics and SIMD?