
Llama2-7B mobile app crashes on Samsung S23 8GB RAM #3599

Open
salykova opened this issue May 14, 2024 · 8 comments
Labels
Android Android building and execution related. module: extension Related to extension built on top of runtime, e.g. pybindings, data loader, etc. triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module


salykova (Contributor) commented May 14, 2024

Hi all,

I've successfully compiled the .pte (default quantization parameters) and tokenizer.bin for Llama2-7B according to the tutorial. However, during inference the Android app crashes with no error message (I assume it's due to insufficient RAM). Is it currently possible to run 7B models on phones with 8GB RAM?

P.S. ~4GB of the 8GB RAM is available
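For a rough sense of why 8GB is borderline, here is a back-of-envelope estimate (my own arithmetic, not from the ExecuTorch docs) of the weight footprint of a 7B model under 4-bit, group-size-128 quantization. It already lands near the ~4GB that is free on the device, before counting the KV cache, activations, and the rest of the app:

```python
def quantized_weight_bytes(n_params, bits=4, group_size=128, scale_bytes=2):
    """Approximate in-memory size of group-wise quantized weights:
    packed low-bit weights plus one (e.g. fp16) scale per group."""
    packed = n_params * bits // 8                     # 4-bit packed weights
    scales = (n_params // group_size) * scale_bytes   # per-group scales
    return packed + scales

total = quantized_weight_bytes(7_000_000_000)
print(f"~{total / 2**30:.2f} GiB just for weights")   # ~3.36 GiB
```

If per-group zero points are stored as well the number grows slightly; either way there is very little headroom left on a device with ~4GB available.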

iseeyuan (Contributor) commented:

@kirklandsign Could you help look into it?

iseeyuan (Contributor) commented:

@salykova could you provide more details on how to reproduce it, including the phone model, the Android version, etc.?

iseeyuan added the module: extension, triaged, and Android labels on May 14, 2024
salykova (Contributor, Author) commented May 14, 2024

@iseeyuan

Device: Samsung S23, 8GB RAM, Android 14
ExecuTorch 0.2 stable branch (pytorch/executorch v0.2.0)
Java 17 JDK
Android SDK API Level 34
Android NDK 25.0.8775105

Steps to reproduce:

  1. Follow the guide https://github.com/pytorch/executorch/tree/v0.2.0/examples/models/llama2 to create .pte and tokenizer.bin models for llama2-7b-chat (default params, default quant 128 groupwise)
  2. Build android app following https://pytorch.org/executorch/0.2/llm/llama-demo-android.html
  3. The app crashes with no error message immediately after clicking "generate"
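For reference, the export in step 1 goes through the export_llama entry point; the invocation looks roughly like the sketch below (flags paraphrased from memory of the linked v0.2 README — treat the tutorial, not this comment, as authoritative, and fill in the placeholder paths):

```
python -m examples.models.llama2.export_llama \
    --checkpoint <path/to/consolidated.00.pth> \
    -p <path/to/params.json> \
    -kv -X -qmode 8da4w --group_size 128
```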

salykova (Contributor, Author) commented May 15, 2024

@iseeyuan @kirklandsign

I've also tested both Llama2-7B and Llama3-8B via the adb binary-based approach, and inference works. It seems the problem is with the app, or the app requires much more RAM than the adb binary.
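For anyone reproducing the comparison: the "adb binary-based approach" means pushing the model, tokenizer, and the natively built llama_main runner to the device and running it from a shell, roughly as sketched below (paths and flag names as I recall them from the executorch examples — double-check against the repo before relying on them):

```
adb push llama2.pte /data/local/tmp/llama/
adb push tokenizer.bin /data/local/tmp/llama/
adb push cmake-out-android/examples/models/llama2/llama_main /data/local/tmp/llama/
adb shell "cd /data/local/tmp/llama && ./llama_main \
    --model_path llama2.pte --tokenizer_path tokenizer.bin --prompt 'Hello'"
```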

mergennachin (Contributor) commented:

cc @digantdesai - regarding s23 :)

@salykova salykova changed the title Llama2-7B mobile app crashes on Android with 8GB RAM Llama2-7B mobile app crashes on Samsung S23 8GB RAM May 15, 2024
kirklandsign (Contributor) commented:

Hi @salykova, the app requires slightly more RAM for the Dalvik VM and graphics. However, I think the main issue is priority: the binary runs at high priority, so when the OOM killer starts killing processes, the binary is usually killed last. I have even seen SystemUI killed before the binary.

The app, on the other hand, is a normal user app, and it is usually killed before system processes.

Unfortunately I don't have a good solution at the moment. I do see 8GB RAM devices work sometimes, and 16GB devices work almost all the time.

salykova (Contributor, Author) commented May 15, 2024

Hi @kirklandsign, thanks for your response! Is it not possible to give the app a higher priority? Sorry, I'm not an Android developer and have no experience with this.

Also, I've found this option in the documentation: https://developer.android.com/guide/topics/manifest/application-element.html#largeHeap. Could it, in theory, improve the app's stability?
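For context, android:largeHeap is a one-line manifest attribute. A minimal sketch of where it goes (the label here is a placeholder; note it only raises the managed Dalvik heap limit, not native allocations):

```xml
<!-- AndroidManifest.xml: requests a larger Dalvik heap for the app. -->
<application
    android:label="LlamaDemo"
    android:largeHeap="true">
    <!-- activities, etc. -->
</application>
```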

kirklandsign (Contributor) commented:

Hi @salykova, unfortunately I don't have a good way to adjust the priority, especially on a non-rooted device. android:largeHeap only affects the Dalvik heap, but the RAM consumption here happens in the native layer, so it doesn't help. I also tried android:persistent, but that doesn't help either. So I'm not sure how to fix this kind of issue at the moment.
