
LLaMA 2 support for pre-training #77

Open
philschmid opened this issue Jul 19, 2023 · 6 comments

@philschmid

Hello,

Are you planning to add support for LLaMA 2 to further pretrain the models?

@juliensalinas
Contributor

That would be awesome 🥇

@philschmid
Author

I know 7B and 13B should have the same architecture; it would be good if you could confirm that it works. Also, are there plans for the 70B (GQA)?
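
For context on the 70B question: in grouped-query attention (GQA), several query heads share a single key/value head, which shrinks the KV cache relative to standard multi-head attention. Below is a minimal JAX sketch of the idea; the function name, shapes, and masking-free formulation are illustrative assumptions, not EasyLM's actual code.

```python
import jax
import jax.numpy as jnp

def grouped_query_attention(q, k, v):
    """Attention where groups of query heads share one key/value head."""
    # q: [batch, seq, num_q_heads, head_dim]; k, v: [batch, seq, num_kv_heads, head_dim]
    _, _, num_q_heads, head_dim = q.shape
    num_kv_heads = k.shape[2]
    group_size = num_q_heads // num_kv_heads  # query heads per KV head

    # Repeat each KV head so it lines up with its group of query heads.
    k = jnp.repeat(k, group_size, axis=2)
    v = jnp.repeat(v, group_size, axis=2)

    # Standard scaled dot-product attention from here on (causal mask omitted).
    scores = jnp.einsum("bqhd,bkhd->bhqk", q, k) / jnp.sqrt(head_dim)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("bhqk,bkhd->bqhd", weights, v)

# LLaMA 2 70B uses 64 query heads and 8 KV heads; tiny dims here for illustration.
key = jax.random.PRNGKey(0)
q = jax.random.normal(key, (1, 16, 64, 8))
k = jax.random.normal(key, (1, 16, 8, 8))
v = jax.random.normal(key, (1, 16, 8, 8))
print(grouped_query_attention(q, k, v).shape)  # (1, 16, 64, 8)
```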

@windmaple

+1

@young-geng
Owner

Indeed this would be useful. Let me look into that.

@erfanzar

I have implemented a version of that, but I haven't verified it yet. I used the same architecture as EasyLM in some parts:
https://github.com/erfanzar/EasyDeL/blob/main/EasyDel/modules/llama/modelling_llama_flax.py

@iliemihai

Has anyone tried implementing further pre-training in Flax/JAX to run it on TPU?
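
For illustration, here is a minimal sketch of what a continued-pretraining step looks like in JAX with Optax: a next-token loss, a jitted update, and parameters that would normally be loaded from converted LLaMA 2 checkpoints rather than random init. The toy model, names, and batch layout are placeholder assumptions, not EasyLM's training loop.

```python
import jax
import jax.numpy as jnp
import optax

# Placeholder "model": embedding + output projection, standing in for a real
# LLaMA 2 module (an assumption for this sketch).
vocab_size, hidden = 32, 16

def init_params(key):
    k1, k2 = jax.random.split(key)
    return {
        "embed": jax.random.normal(k1, (vocab_size, hidden)) * 0.02,
        "out": jax.random.normal(k2, (hidden, vocab_size)) * 0.02,
    }

def model_apply(params, input_ids):
    h = params["embed"][input_ids]   # [batch, seq, hidden]
    return h @ params["out"]         # [batch, seq, vocab]

def loss_fn(params, input_ids):
    # Next-token prediction: predict token t+1 from tokens up to t.
    logits = model_apply(params, input_ids[:, :-1])
    targets = input_ids[:, 1:]
    return optax.softmax_cross_entropy_with_integer_labels(logits, targets).mean()

optimizer = optax.adamw(learning_rate=1e-4)

@jax.jit
def train_step(params, opt_state, input_ids):
    loss, grads = jax.value_and_grad(loss_fn)(params, input_ids)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss

# Further pretraining would start from converted LLaMA 2 weights instead of
# random init; random params are used here only to keep the sketch runnable.
params = init_params(jax.random.PRNGKey(0))
opt_state = optimizer.init(params)
batch = jax.random.randint(jax.random.PRNGKey(1), (2, 9), 0, vocab_size)
params, opt_state, loss = train_step(params, opt_state, batch)
print(loss)
```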
