33b Model on 4xA100 (40GB) OOM #666

Answered by psinger
AZ777xx asked this question in Q&A
I have never seen any real performance degradation from doing LoRA in int4. The final weights are merged back into the base model, so you can put the model into production in any precision.
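To illustrate what "merged back" means, here is a minimal sketch of the standard LoRA merge, W_merged = W + (alpha / r) * B A, on plain Python lists. The function names (`matmul`, `merge_lora`) and the toy matrices are hypothetical and only for illustration. This is not the code the training tool runs, and it assumes the base weight has already been dequantized to floats.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, alpha, r):
    """Fold a LoRA update into a dense weight: W + (alpha / r) * B @ A.

    w:      base weight, shape (out, in), assumed already dequantized
    lora_b: LoRA B matrix, shape (out, r)
    lora_a: LoRA A matrix, shape (r, in)
    """
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example with rank r = 1 (values are made up).
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]
lora_a = [[0.5, 0.5]]
merged = merge_lora(w, lora_b, lora_a, alpha=2, r=1)
```

After the merge there is only a single dense matrix left, which is why the result can be stored or served in whatever precision you choose.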

DeepSpeed has issues with generation at inference time, so I would recommend switching the metric to Perplexity, which evaluates the raw logits directly instead of generating text. This should speed up validation significantly.
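As a rough sketch of why a logit-based metric avoids generation: perplexity only needs the logits from one forward pass and the reference token ids. The helper names below (`log_softmax`, `perplexity`) are hypothetical, and this is a plain-Python illustration of the formula, not the tool's actual metric implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def perplexity(logits_per_position, target_ids):
    """exp of the mean negative log-likelihood of the reference tokens."""
    if not target_ids or len(logits_per_position) != len(target_ids):
        raise ValueError("need one target id per logit row, and at least one row")
    nll = 0.0
    for logits, target in zip(logits_per_position, target_ids):
        nll -= log_softmax(logits)[target]
    return math.exp(nll / len(target_ids))


# Uniform logits over a 4-token vocabulary: the model is guessing at random.
uniform_ppl = perplexity([[0.0, 0.0, 0.0, 0.0]] * 3, [0, 1, 2])
```

With uniform logits over a vocabulary of size V, the perplexity equals V. No token-by-token sampling loop is involved, which is where the validation speedup comes from.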

Answer selected by AZ777xx