feat: Add support Intel #895

Closed · wants to merge 9 commits

Conversation

@cromefire (Contributor) commented Nov 25, 2023

Implemented via Intel MKL (oneAPI)

Fixes #631 (Intel)

Depends on #902

@cromefire (Contributor, Author) commented Nov 25, 2023

Current state is that the Intel image builds, but it just returns 501 on the API call without any logged error at all.
[screenshot of the 501 response]

The AMD image doesn't quite build yet; it fails with some sort of error in the linking stage:

25.06   = note: /usr/bin/ld: /root/workspace/target/release/build/llama-cpp-bindings-f80cf7b588122741/out/build/libllama.a(ggml-cuda.cu.o): undefined reference to symbol 'rocblas_initialize'
25.06           /usr/bin/ld: /opt/rocm-5.7.0/lib/librocblas.so.3: error adding symbols: DSO missing from command line
25.06           collect2: error: ld returned 1 exit status

I added the linker arg in the build script, but it doesn't seem to be applied:

println!("cargo:rustc-link-arg=-Wl,--copy-dt-needed-entries");
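
For reference, a minimal build.rs sketch of the more direct fix: linking rocBLAS (and hipBLAS) explicitly so rocblas_initialize resolves at link time. The ROCM_PATH default below is an assumption and may differ per installation.

// build.rs (sketch): link the ROCm BLAS libraries directly instead of
// relying on --copy-dt-needed-entries. Paths are assumptions; adjust
// ROCM_PATH for your install (e.g. /opt/rocm-5.7.0).
fn main() {
    let rocm_path = std::env::var("ROCM_PATH").unwrap_or_else(|_| "/opt/rocm".to_string());
    // Where the ROCm shared libraries live.
    println!("cargo:rustc-link-search=native={}/lib", rocm_path);
    // The HIP build of ggml pulls in hipBLAS, which in turn needs rocBLAS.
    println!("cargo:rustc-link-lib=dylib=rocblas");
    println!("cargo:rustc-link-lib=dylib=hipblas");
}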

@cromefire changed the title from "Add support Intel and AMD hardware" to "feat: Add support Intel and AMD hardware" on Nov 25, 2023
@cromefire (Contributor, Author) commented:

Okay, got the 501 under control by adding options for oneAPI and ROCm; now I just gotta test it (it runs at least, but I'm not sure whether it's using the GPU), and I somehow need to get the ROCm build under control.
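
To sketch the idea only (the names here are illustrative, not Tabby's actual CLI surface), the new device options could look roughly like this:

// Illustrative sketch only: a --device style selector covering the new backends.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Device {
    Cpu,
    Cuda,   // NVIDIA (existing path)
    Rocm,   // AMD via ROCm/HIP
    OneApi, // Intel via oneAPI/SYCL
}

// Anything other than plain CPU should request GPU offload from llama.cpp.
fn wants_gpu(device: Device) -> bool {
    !matches!(device, Device::Cpu)
}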

@cromefire (Contributor, Author) commented:

So in theory the Intel container should work now... but for some reason it just doesn't want to use the GPU...

@wsxiaoys (Member) commented:

> So in theory the Intel container should work now... but for some reason it just doesn't want to use the GPU...

Is there something like the NVIDIA container toolkit that needs to be installed for oneAPI?

@cromefire (Contributor, Author) commented Nov 26, 2023

Nope, just pass through /dev/dri (--device /dev/dri). With that, sycl-ls correctly reports the GPU, but llama.cpp doesn't seem to actually use it.

@cromefire (Contributor, Author) commented Nov 26, 2023

So the ROCm image is now definitely working. TabbyML/DeepseekCoder-6.7B was just locking up the GPU entirely for some reason, but TabbyML/DeepseekCoder-1.3B runs inference in like 5ms on my RX 7900 XTX.

Edit: The big model works as well, though not as snappy (although the difference isn't too bad? That all definitely needs more investigation). When it works, it works, but especially when switching models the GPU kinda hangs and I need to reboot. It's also always reported at 100% usage.

@wsxiaoys (Member) commented:

> So the ROCm image is now definitely working. TabbyML/DeepseekCoder-6.7B was just locking up the GPU entirely for some reason, but TabbyML/DeepseekCoder-1.3B runs inference in like 5ms on my RX 7900 XTX.

Great! You might consider extracting ROCm into an individual PR for review to get it checked in.

@cromefire (Contributor, Author) commented Nov 26, 2023

Yeah I'll see tomorrow whether I can get a handle on oneAPI or whether I'll postpone that and extract ROCm, but it's 4:30 AM for me, so I really need to do that tomorrow (technically later today).

Also, I really hate C and its library-linking nonsense... that cost me so much time with this...

@cromefire (Contributor, Author) commented:

Also, note to my future self: I need to figure out what's happening with the cuda_devices list and the frontend, and match that for ROCm and oneAPI if possible.
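
Roughly, a vendor-neutral version of that report might look like the following; the field names are purely illustrative, not Tabby's actual API.

// Illustrative sketch: a device listing the frontend could consume for
// CUDA, ROCm and oneAPI alike, mirroring what cuda_devices exposes today.
pub struct AcceleratorInfo {
    /// Which backend detected the devices: "cuda", "rocm" or "oneapi".
    pub backend: &'static str,
    /// Human-readable device names, e.g. "Radeon RX 7900 XTX".
    pub devices: Vec<String>,
}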

@cromefire (Contributor, Author) commented:

@wsxiaoys it would also be great if the IntelliJ extension were available for Rust Rover, then I could write Tabby code using Tabby. Probably just a setting or so.

@cromefire changed the title from "feat: Add support Intel and AMD hardware" to "feat: Add support Intel" on Nov 26, 2023
@cromefire (Contributor, Author) commented Nov 26, 2023

AMD stuff is "moved" to #902, because it already works pretty okay.

Regarding the Intel stuff, I'm slowly going insane: I already had it "working", but it just doesn't want to actually offload anything to the GPU, and most of the time it doesn't reference SYCL at all. @wsxiaoys could we get llama.cpp as a shared library or so? That sounds way easier.
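
For illustration only: loading a prebuilt libllama.so at runtime could look roughly like this, using the libloading crate. example_init is a placeholder symbol, not a real llama.cpp function, and the real C API differs by version.

// Hedged sketch: dlopen-style loading of a shared llama.cpp build.
use libloading::{Library, Symbol};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the shared library built separately (the path is an assumption).
    let lib = unsafe { Library::new("libllama.so")? };
    // Look up a C symbol by name; `example_init` is purely a placeholder.
    let init: Symbol<unsafe extern "C" fn()> = unsafe { lib.get(b"example_init")? };
    unsafe { init() };
    Ok(())
}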

@wsxiaoys (Member) commented:

> Could we get llama.cpp as a shared library or so? That sounds way easier.

To confirm, you've been able to make llama.cpp itself work on Intel Arc, but not for Tabby, correct?

@icycodes (Member) commented:

> @wsxiaoys it would also be great if the IntelliJ extension were available for Rust Rover, then I could write Tabby code using Tabby. Probably just a setting or so.

Hi, @cromefire
Thank you for this suggestion. I noticed that the latest Rust Rover preview version is v233, but the Tabby plugin's metadata currently states that it supports versions v222-232. We should update this range in the next release.
If you want to try it out before the next release, you can build the plugin locally and install it from a file. Related: #903

@itlackey commented Dec 3, 2023

> AMD stuff is "moved" to #902, because it already works pretty okay.
>
> Regarding the Intel stuff, I'm slowly going insane: I already had it "working", but it just doesn't want to actually offload anything to the GPU, and most of the time it doesn't reference SYCL at all. @wsxiaoys could we get llama.cpp as a shared library or so? That sounds way easier.

llama.cpp currently does not use SYCL, and the OpenCL implementation uses the CPU for most of the processing. I took a run at this a while back and got it working, but it was insanely slow. I have since found out this is a known issue with llama.cpp. There is currently a PR to get SYCL working correctly:

ggerganov/llama.cpp#2690

There is also a Vulkan support PR:
ggerganov/llama.cpp#2059

Unfortunately, without one of these being merged into llama.cpp, Intel dGPUs are going to be very slow.

@cromefire (Contributor, Author) commented Dec 3, 2023

> llama.cpp currently does not use SYCL

Well, that explains it... I'll update and wait then...

Vulkan of course also sounds awesome, if it's reasonably close to CUDA/HIP/SYCL in performance, because then it seems like it should be the standard backend for something like TabbyML, since it'd run everywhere.

@itlackey commented Dec 3, 2023

It would be great! I am hoping one or the other gets merged soon.

BTW, here are some things I put together to test llama.cpp on Arc. The logs show the current speeds I am getting.
https://github.com/itlackey/llama.cpp-opencl

@cromefire (Contributor, Author) commented Dec 3, 2023

> I am hoping one or the other gets merged soon.

I do think both would be good: Vulkan makes a nice and easy default backend, but SYCL might be faster.

Vulkan is BTW also an easy solution for AMD on Windows.

@cromefire (Contributor, Author) commented Dec 4, 2023

> The logs show the current speeds I am getting.

Have you tried SYCL vs. Vulkan vs. OpenCL by any chance? (If they actually already run...) Because it sounds like OpenCL is pretty useless right now. Also, how did you test? Are there any benchmarks available? It would be cool for users, even of something higher-level like Tabby, to know what works best.

@itlackey commented Dec 4, 2023

I have not, but I hope to try the Vulkan fork this week. It seems like that branch is more complete than the SYCL one.

@cromefire (Contributor, Author) commented Dec 4, 2023

> I have not, but I hope to try the Vulkan fork this week. It seems like that branch is more complete than the SYCL one.

Be sure to report back (including how you did the tests); I'd really like to test Vulkan vs. ROCm on AMD as well (since ROCm doesn't work on Windows yet).

# Conflicts:
#	crates/llama-cpp-bindings/Cargo.toml
#	crates/llama-cpp-bindings/build.rs
#	crates/tabby/Cargo.toml
#	crates/tabby/src/main.rs
@wsxiaoys (Member) commented:

Closing, as Vulkan support will be released in 0.10.

@wsxiaoys closed this on Apr 21, 2024
Linked issue: Intel Arc / XPU support