stupid question, any way to avoid running out of memory? #22
Comments
As far as I remember, Apple Silicon Macs share memory between the CPU and GPU, which is why you can do a lot even on 8 GB Macs. I don't know about memory swap on MLX, but given the Apple Silicon architecture, I'd guess it isn't supported. What Mac are you using? RAM is very important.

I've got an M1 with 32 GB of RAM, but I guess the context length is so long that even that isn't enough, at least when using MLX.
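For anyone debugging this: MLX exposes a few Metal memory helpers you can use to see what it is actually holding (names as of recent MLX releases; older versions may not have them under `mx.metal`):

```python
import mlx.core as mx

# Bytes MLX currently holds on the GPU, and the peak so far.
print(mx.metal.get_active_memory())
print(mx.metal.get_peak_memory())

# Shrink the buffer cache so freed arrays are returned to the OS
# instead of being kept around for reuse.
mx.metal.set_cache_limit(0)
```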
If I put an input of 17,000 tokens into `model.generate(x, temperature)`, I get:

```
libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 19081554496 bytes which is greater than the maximum allowed buffer size of 17179869184 bytes.
```

I guess it is trying to use the Mac GPU? Or, if it's regular memory, it can't swap? I can run this Llama 3 8B Instruct with regular Transformers; it is just really slow.
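For what it's worth, the numbers are consistent with the full attention score matrix being materialized in a single Metal buffer (my reading, not confirmed by a traceback): the 17,179,869,184-byte cap is exactly 16 GiB, and an fp16 score matrix over ~17k tokens across Llama 3 8B's 32 heads lands right around the failing 19 GB allocation:

```python
# Back-of-envelope check (assumes fp16 scores, 32 heads, unchunked attention).
n_tokens = 17_000
n_heads = 32
bytes_fp16 = 2
print(n_heads * n_tokens**2 * bytes_fp16)  # 18,496,000,000 ~ the 19,081,554,496 in the error
print(17_179_869_184 / 2**30)              # 16.0 GiB: the max single-buffer size reported
```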
There's no flag like `use_swap=True` or anything like that, right?
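Not a maintainer, but swap wouldn't help here anyway: the failure is a single buffer exceeding Metal's per-buffer cap, not total memory pressure. The usual workaround in MLX code is to process the prompt in chunks with a KV cache, so attention scores are never built over the full 17k×17k span at once. A rough sketch of the idea; `model(chunk, cache=cache)` is a hypothetical signature, not this repo's actual API:

```python
import mlx.core as mx

CHUNK = 2048  # prompt tokens per forward pass

def prefill(model, tokens, cache):
    """Feed the prompt through the model one chunk at a time.

    The cache carries keys/values between chunks, so each score matrix is
    only (CHUNK x tokens_seen_so_far) rather than (17k x 17k).
    """
    logits = None
    for i in range(0, len(tokens), CHUNK):
        chunk = mx.array(tokens[i : i + CHUNK])[None]  # add batch dimension
        logits = model(chunk, cache=cache)             # hypothetical call
        mx.eval(logits)  # force evaluation so chunk temporaries can be freed
    return logits  # logits for the last chunk; sample the next token from these
```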