Problem with contrastive loss in pretrain stage #94

Open
tien-ngnvan opened this issue Nov 7, 2023 · 1 comment
tien-ngnvan commented Nov 7, 2023

Thanks for your great work. I ran into a problem when reusing the hyperparameters from the NQ example for the second pre-training stage of coCondenser (we call it the uptrain stage, trained with a contrastive loss). Each training instance contains 1 query, 1 positive, and 10 negative passages, loaded through a custom dataloader over a streaming dataset (two languages, 25M triplets). Our model is based on bert-base-multilingual-cased and has already been further pre-trained with the MLM loss. The contrastive pre-training does not seem to converge; here is the training script:

python -m torch.distributed.launch --nproc_per_node=8 -m asymmetric.train \
    --model_name_or_path 'asymmetric/checkpoint-10000' \
    --streaming \
    --output $saved_path \
    --do_train \
    --train_dir 'data/train' \
    --max_steps 10000 \
    --per_device_train_batch_size 32 \
    --dataset_num_proc 2 \
    --train_n_passages 8 \
    --gc_q_chunk_size 8 \
    --gc_p_chunk_size 64 \
    --untie_encoder \
    --negatives_x_device \
    --learning_rate 5e-4 \
    --weight_decay 1e-2 \
    --warmup_ratio 0.1 \
    --save_steps 1000 \
    --save_total_limit 20 \
    --logging_steps 50 \
    --q_max_len 128 \
    --p_max_len 384 \
    --fp16 \
    --report_to 'wandb' \
    --overwrite_output_dir
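
For context, the objective optimized in this stage is a standard contrastive (InfoNCE) loss: each query is scored against its own positive plus every other passage in the batch, with passages gathered across GPUs when --negatives_x_device is set. Below is a minimal PyTorch sketch of that loss; the gather_across_devices helper, shapes, and temperature are illustrative assumptions, not the repo's exact code.

import torch
import torch.nn.functional as F
import torch.distributed as dist

def gather_across_devices(t: torch.Tensor) -> torch.Tensor:
    """Concatenate embeddings from all ranks (what --negatives_x_device enables)."""
    if not (dist.is_available() and dist.is_initialized()):
        return t
    gathered = [torch.empty_like(t) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, t)
    gathered[dist.get_rank()] = t  # keep the local tensor so gradients still flow
    return torch.cat(gathered, dim=0)

def contrastive_loss(q_reps: torch.Tensor, p_reps: torch.Tensor,
                     n_passages: int, temperature: float = 1.0) -> torch.Tensor:
    """q_reps: (B, d); p_reps: (B * n_passages, d), positive first in each group."""
    q_reps = gather_across_devices(q_reps)
    p_reps = gather_across_devices(p_reps)
    # similarity of every query against every passage in the (global) batch
    scores = q_reps @ p_reps.T / temperature
    # the positive for query i sits at column i * n_passages
    target = torch.arange(q_reps.size(0), device=scores.device) * n_passages
    return F.cross_entropy(scores, target)

Note that with this formulation the effective number of negatives grows with the global batch size, which is why cross-device negatives and the learning rate interact: a larger effective batch usually tolerates (and often needs) a different learning rate than a single-GPU run.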

[screenshot: wandb training loss curve for the contrastive stage, showing the loss failing to converge]
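
As an aside on the --gc_q_chunk_size / --gc_p_chunk_size flags in the script: these control gradient caching (Gao et al., 2021, "Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup"), which encodes the batch in small chunks while still computing the contrastive loss over the full batch. A rough sketch of the technique follows; the grad_cache_step helper, the single-encoder setup, and loss_fn are simplifications for illustration, not the library's actual API.

import torch

def grad_cache_step(encoder, inputs: torch.Tensor, chunk_size: int, loss_fn):
    chunks = inputs.split(chunk_size)
    # 1) forward pass without a graph to get all representations cheaply
    with torch.no_grad():
        reps = torch.cat([encoder(c) for c in chunks])
    reps = reps.detach().requires_grad_()
    # 2) full-batch contrastive loss; gradients land on `reps`, not the encoder
    loss = loss_fn(reps)
    loss.backward()
    cached_grads = reps.grad.split(chunk_size)
    # 3) re-encode chunk by chunk with the graph and inject the cached gradients
    for c, g in zip(chunks, cached_grads):
        sub_reps = encoder(c)
        sub_reps.backward(gradient=g)
    return loss.detach()

This trades one extra forward pass per chunk for the ability to use a much larger effective batch than would fit in memory at once, which matters here because contrastive quality depends on batch size.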


luyug commented Dec 22, 2023

To understand this better, could you elaborate on what hardware you are using?
