About RepLLaMA #103

Open
sunxiaojie99 opened this issue Jan 11, 2024 · 19 comments

Comments

sunxiaojie99 commented Jan 11, 2024

Hi~ I am trying to reproduce the results of RepLLaMA. I have an A800 GPU, and training RepLLaMA from scratch with your code looks like it will take around 80 hours. Is this normal? If possible, I would also like to know the time cost when you trained RepLLaMA (LoRA) on the MS MARCO passage and document datasets. Thank you very much. @MXueguang

MXueguang commented Jan 11, 2024

Hi Xiaojie,
I trained RepLLaMA (passage) on 16 V100 32G GPUs, which took around 1 day, so I think 80 hours on a single A800 GPU is a reasonable time.
On MS MARCO document, with the max input length set to 2048, it takes about 3 days on 16 GPUs.

sunxiaojie99 commented Jan 12, 2024

Hi Xueguang, @MXueguang

Thank you very much for sharing your code. However, when I tested it on a small MS MARCO passage test corpus (the first 100 passages), I encountered an issue: after encoding, the embeddings of some passages turned out to be NaN. Have you experienced this problem?

The part of your code that I modified is the xformers attention call, attn_output = xops.memory_efficient_attention(...). I made these changes for two reasons: 1) xformers was not functioning correctly in my environment, and I would like to know why you replace the forward function and whether this step is necessary; 2) the attention_mask input of the custom forward function does not seem to be used in the subsequent code. Does this mean that the padding positions still receive attention?

Please forgive my limited experience in this area. Your insights would be greatly appreciated.

Here are the changes I made:

# Original code
        attn_weights = None
        attn_output = xops.memory_efficient_attention(
            query_states.transpose(1, 2), key_states.transpose(1, 2), value_states.transpose(1, 2),
            attn_bias=xops.LowerTriangularMask()
        ).reshape(bsz, q_len, self.hidden_size)

Modified to:

        # Scale queries for dot-product attention
        query_states = query_states / (self.head_dim ** 0.5)

        # Dot-product attention:
        # [bsz, num_heads, q_len, head_dim] x [bsz, num_heads, head_dim, kv_len]
        attn_scores = torch.matmul(query_states, key_states.transpose(-2, -1))

        # Apply the causal (lower-triangular) mask; only square score matrices
        # (i.e. a full-sequence forward pass) need it
        if attn_scores.size(-2) == attn_scores.size(-1):
            mask = torch.tril(torch.ones_like(attn_scores.float())).type_as(attn_scores)
            attn_scores = attn_scores.masked_fill(mask == 0, float('-inf'))

        # Apply the padding attention mask
        if attention_mask is not None:
            attn_scores = attn_scores + attention_mask

        attn_probs = torch.softmax(attn_scores, dim=-1)
        attn_output = torch.matmul(attn_probs, value_states)

        attn_output = attn_output.transpose(1, 2).reshape(bsz, q_len, self.hidden_size)
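
For completeness, I am also considering a variant of the same masking/softmax step that avoids the -inf fill and upcasts the softmax to float32, roughly following what I understand the stock Hugging Face LLaMA attention does. This is only a sketch on my side (same tensor names and shapes as above), not something I have verified:

        # Same scores -> mask -> softmax step as above, but (1) use the dtype's finite
        # minimum instead of -inf for masked positions, and (2) run the softmax in
        # float32 before casting back, which is usually friendlier to fp16.
        min_value = torch.finfo(attn_scores.dtype).min

        if attn_scores.size(-2) == attn_scores.size(-1):
            causal = torch.tril(torch.ones_like(attn_scores)).bool()
            attn_scores = attn_scores.masked_fill(~causal, min_value)

        if attention_mask is not None:
            attn_scores = attn_scores + attention_mask
            attn_scores = torch.clamp(attn_scores, min=min_value)

        attn_probs = torch.softmax(attn_scores.float(), dim=-1).to(value_states.dtype)
        attn_output = torch.matmul(attn_probs, value_states)
        attn_output = attn_output.transpose(1, 2).reshape(bsz, q_len, self.hidden_size)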

MXueguang commented Jan 12, 2024

My transformers version is 4.31.0; I think later versions have some issues here.
It is OK to remove the flash attention replacement and use the default LLaMA class.
I'll update the code to fit the latest transformers, and I am working on a refactor here: https://github.com/texttron/tevatron/tree/refactor

BTW, the RepLLaMA code in Tevatron is a re-implementation, and due to limited resources I didn't get a chance to run very detailed tests. Feel free to let me know about any issues you find.

sunxiaojie99 commented:

OK~ So I only need to comment out the call to replace_with_xformers_attention() in train.py? I will run it again to check whether everything is normal. Thank you!

MXueguang commented Jan 12, 2024

So I only need to comment out the call to replace_with_xformers_attention() in train.py?

Yes, in both train.py and encode.py.

sunxiaojie99 commented:

Hi Xueguang, I think I've found the cause of the NaN embeddings: the problem occurs when we use fp16 during encoding, but everything seems fine when we switch to fp32. By the way, could I ask you to provide the training data (or the CoCondenser hard negatives) for MS MARCO passage/doc used in your paper 'Fine-Tuning LLaMA for Multi-Stage Text Retrieval'?

MXueguang commented:

It's a bit weird that fp16 doesn't work... the model was fine-tuned with fp16. I'll take a look.

I created training data for RepLLaMA in Tevatron format; it can be downloaded here:
https://www.dropbox.com/scl/fi/pkm1mtgfobae9kuesp7dr/train-tevatron.jsonl?rlkey=2thutc4zkozr9jp4zbbrz5rvi&dl=0
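
To sanity-check the download, something like this should show the structure of one example (I am writing the field names from memory, so double-check against the actual file):

import json

# print the top-level fields and a rough size summary of the first training example
with open("train-tevatron.jsonl") as f:
    example = json.loads(f.readline())

print(sorted(example.keys()))
print(example.get("query"))
print(len(example.get("positive_passages", [])), "positives,",
      len(example.get("negative_passages", [])), "negatives")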

MXueguang commented:

Hi @sunxiaojie99, are you getting a similar training log to #104?

sunxiaojie99 commented:

Hi @sunxiaojie99, are you getting a similar training log to #104?

I just completed the test on the small corpus. I will run the entire process later and then confirm this.

sunxiaojie99 commented:

It's a bit weird that fp16 doesn't work... the model was fine-tuned with fp16. I'll take a look.

I created training data for RepLLaMA in Tevatron format; it can be downloaded here: https://www.dropbox.com/scl/fi/pkm1mtgfobae9kuesp7dr/train-tevatron.jsonl?rlkey=2thutc4zkozr9jp4zbbrz5rvi&dl=0

Thanks for sharing! Does this JSONL file contain both the MS MARCO passage and document datasets?
By the way, bf16 is actually used during fine-tuning, and when I encode with bf16 the NaN issue doesn't appear either. So I guess the fine-tuning process will run smoothly.
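
For what it's worth, I suspect this is just the smaller dynamic range of fp16; a quick sanity check on my side (not specific to the model):

import torch

# bf16 keeps float32's exponent range, while fp16 tops out around 6.5e4,
# so intermediate values that are fine in bf16/fp32 can overflow to inf
# in fp16 and then turn into NaN in later ops.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e38

x = torch.tensor([300.0])
print((x * x).to(torch.float16))        # tensor([inf], dtype=torch.float16)
print((x * x).to(torch.bfloat16))       # still representable in bf16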

MXueguang commented:

I trained RepLLaMA on V100 GPUs, which only support fp16. When I added the implementation to Tevatron I was working on A6000s, so bf16 also works. But the released model was trained with fp16. I'll take a look at the NaN issue next week.

The data in the above link is the training data for passage ranking.
The document data is bigger; I'll upload it later.

sunxiaojie99 commented:

I trained RepLLaMA on V100 GPUs, which only support fp16. When I added the implementation to Tevatron I was working on A6000s, so bf16 also works. But the released model was trained with fp16. I'll take a look at the NaN issue next week.

The data in the above link is the training data for passage ranking. The document data is bigger; I'll upload it later.

Okay, I sincerely appreciate your help! Please let me know when the document data is ready.

sunxiaojie99 commented:

Hi Xueguang,

Sorry to bother you again. I have completed the training process for RepLLaMA. However, it seems that encoding the MS MARCO passage corpus requires at least 300 hours, and I've noticed that Tevatron doesn't support multi-GPU encoding. Could you tell me how long the encoding process took for you? Also, is the document data ready? Haha.

MXueguang commented:

Hi Xiaojie,

300 hours on a single GPU is reasonable.
Tevatron doesn't support multi-GPU encoding, but an efficient approach is to encode the corpus shard by shard and run the shards in parallel.
An example is below.

mkdir beir_embedding_scifact
for s in 0 1 2 3;
do
CUDA_VISIBLE_DEVICES=$s python encode.py \
  --output_dir=temp \
  --model_name_or_path castorini/repllama-v1-7b-lora-passage \
  --tokenizer_name meta-llama/Llama-2-7b-hf \
  --fp16 \
  --per_device_eval_batch_size 16 \
  --p_max_len 512 \
  --dataset_name Tevatron/beir-corpus:scifact \
  --encoded_save_path beir_embedding_scifact/corpus_scifact.${s}.pkl \
  --encode_num_shard 4 \
  --encode_shard_index ${s} &
done
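
After the shards finish, you can merge and search them yourself. A rough sketch below; I am assuming each .pkl holds an (embeddings, ids) tuple the way the encoder saves them, and the query embedding path is just a placeholder for a separate query encoding run:

import glob
import pickle

import faiss
import numpy as np

def load_reps(pattern):
    """Load encoded representations from one or more encoded .pkl shards."""
    embeddings, ids = [], []
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            shard_embeddings, shard_ids = pickle.load(f)
        embeddings.append(np.asarray(shard_embeddings, dtype=np.float32))
        ids.extend(shard_ids)
    return np.concatenate(embeddings, axis=0), ids

corpus_embeddings, corpus_ids = load_reps("beir_embedding_scifact/corpus_scifact.*.pkl")
query_embeddings, query_ids = load_reps("beir_embedding_scifact/queries_scifact.pkl")  # placeholder path

# flat inner-product index over the merged corpus shards
index = faiss.IndexFlatIP(corpus_embeddings.shape[1])
index.add(corpus_embeddings)

scores, indices = index.search(query_embeddings, 100)
for qid, hit_rows, hit_scores in zip(query_ids, indices, scores):
    top3 = [(corpus_ids[i], float(s)) for i, s in zip(hit_rows[:3], hit_scores[:3])]
    print(qid, top3)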

Oops... thanks for the reminder. Uploading the document data now.

MXueguang commented:

Hi Xiaojie, the processed training data for document ranking is big and hard to upload.
Below is a slim version, with a processed corpus and training data, but it needs a conversion step into the Tevatron format:
https://www.dropbox.com/scl/fi/rbxa9u0dusa4g3fh8sn9j/repllama-doc-slim-corpus.jsonl?rlkey=8ddybs8xt8lq723hks0y2uhku&dl=0
https://www.dropbox.com/scl/fi/sz3oqve6tln2hird03cxv/repllama-doc-slim-train.jsonl?rlkey=t1kjx1wdxky4zjo3zglo6yxzq&dl=0
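
The conversion is mostly a matter of joining the training file back with the corpus to restore document contents. A rough sketch of what I mean (the field names here are placeholders, so map them to whatever the two files actually use):

import json

# load the slim corpus into memory: docid -> {"docid", "title", "text", ...}
corpus = {}
with open("repllama-doc-slim-corpus.jsonl") as f:
    for line in f:
        doc = json.loads(line)
        corpus[doc["docid"]] = doc

# attach document contents to each training example and write Tevatron-style jsonl
with open("repllama-doc-slim-train.jsonl") as fin, \
     open("msmarco-doc-train-tevatron.jsonl", "w") as fout:
    for line in fin:
        example = json.loads(line)
        converted = {
            "query_id": example["query_id"],
            "query": example["query"],
            "positive_passages": [corpus[d] for d in example["positives"] if d in corpus],
            "negative_passages": [corpus[d] for d in example["negatives"] if d in corpus],
        }
        fout.write(json.dumps(converted) + "\n")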

sunxiaojie99 commented:

Hi Xiaojie, the processed training data for document ranking is big and hard to upload. Below is a slim version, with a processed corpus and training data, but it needs a conversion step into the Tevatron format: https://www.dropbox.com/scl/fi/rbxa9u0dusa4g3fh8sn9j/repllama-doc-slim-corpus.jsonl?rlkey=8ddybs8xt8lq723hks0y2uhku&dl=0 https://www.dropbox.com/scl/fi/sz3oqve6tln2hird03cxv/repllama-doc-slim-train.jsonl?rlkey=t1kjx1wdxky4zjo3zglo6yxzq&dl=0

Ok, thanks! Actually, I think I only need the CoCondenser-MaxP hard negatives for the document ranking data to reliably reproduce the results of the paper. By the way, is the slim version obtained by sampling a smaller proportion?

MXueguang commented:

The hard negatives should be the top-100 BM25 and top-100 CoCondenser results, but document contents are not saved in the training data, to save space.

sunxiaojie99 commented:

The hard negatives should be the top-100 BM25 and top-100 CoCondenser results, but document contents are not saved in the training data, to save space.

Okay~ Would it be convenient to share the other parameters, such as the size of p?

MXueguang commented:

Hi @sunxiaojie99, sorry I missed your latest comment.
What do you mean by the size of p? The truncation size? For MS MARCO document, we split each document into segments of 10 sentences, with a sliding window of 5 sentences.
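
Roughly, the segmentation looks like this (a simplified sketch of the idea rather than the exact preprocessing script; the sentence splitter here is just an example):

import nltk  # needs the punkt tokenizer: nltk.download("punkt")

def segment_document(text, window=10, stride=5):
    """Split a document into overlapping passages of `window` sentences,
    advancing `stride` sentences between passages."""
    sentences = nltk.sent_tokenize(text)
    passages = []
    for start in range(0, max(len(sentences) - window + stride, 1), stride):
        chunk = sentences[start:start + window]
        if chunk:
            passages.append(" ".join(chunk))
    return passages

# e.g. a 23-sentence document yields passages covering sentences
# 0-9, 5-14, 10-19, and 15-22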
