
Fix MoE EP rank when TP is set at the same time #1957

Re-run triggered May 13, 2024 19:48
Status: Failure
Total duration: 1h 49m 42s

cicd-main.yml

on: pull_request
cicd-cluster-clean: 3s
gpu-test: 3s
cicd-test-container-setup: 10m 33s
L0_Unit_Tests_GPU: 20m 0s
L0_Unit_Tests_CPU: 23m 53s
L2_Community_LLM_Checkpoints_tests_Llama: 1m 14s
L2_Community_LLM_Checkpoints_tests_StarCoder: 1m 2s
L2_Community_LLM_Checkpoints_tests_Falcon: 38s
ASR_dev_run_Speech_to_Text: 39s
ASR_dev_run_Speech_to_Text_WPE_-_CitriNet: 39s
ASR_dev_run_Speech_Pre-training_-_CitriNet: 37s
ASR_dev_run_Speech_To_Text_Finetuning: 46s
ASR_dev_run_Speech_to_Text_WPE_-_Conformer: 36s
ASR_dev_run-part_two_Speech_to_Text_WPE_-_Squeezeformer: 35s
L2_Speech_to_Text_EMA: 59s
L2_Speaker_dev_run_Speaker_Recognition: 33s
L2_Speaker_dev_run_Speaker_Diarization: 37s
L2_Speaker_dev_run_Speech_to_Label: 34s
L2_Speaker_dev_run_Speaker_Diarization_with_ASR_Inference: 1m 1s
L2_Speaker_dev_run_Clustering_Diarizer_Inference: 32s
L2_Speaker_dev_run_Neural_Diarizer_Inference: 1m 5s
L2_Speaker_dev_run_Multispeaker_ASR_Data_Simulation: 31s
L2_ASR_Multi-dataloader_dev_run_Speech_to_Text_multi-dataloader: 40s
L2_ASR_Multi-dataloader_dev_run_Speech_to_Label_multi-dataloader: 35s
L2_ASR_Adapters_Linear_Adapters: 37s
L2_ASR_Adapters_RelPos_MHA_Adapters: 38s
L2_Speech_Transcription_Speech_to_Text_Transcribe: 34s
L2_Transducer_alignment_Running_pytest: 1m 26s
L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Eng_CitriNet_with_wav: 2m 25s
L2_Segmentation_Tool_Parallel_ctc_segmentation_test_L2_Ru_QN_with_mp3: 2m 40s
L2_G2P_Models_G2P_Conformer_training_evaluation_and_inference: 1m 0s
L2_G2P_Models_HeteronymClassificationModel_training_evaluation_and_inference: 1m 53s
L2_Dialogue_Classification_Intent_and_slot_classification_using_SGDQA: 49s
L2_Dialogue_Classification_Intent_and_slot_classification_using_IntentSlotClassificationModel: 1m 34s
L2_Dialogue_Classification_Intent_classification_using_ZeroShotIntentModel: 1m 55s
L2_Dialogue_Classification_Design_Intent_classification_using_ZeroShotIntentModel: 1m 24s
L2_Dialogue_Classification_Design_Intent_classification_using_ZeroShotIntentModel_BART_Classifier: 1m 8s
L2_Dialogue_Classification_Design_Intent_classification_using_DialogueNearestNeighbourModel: 38s
L2_Dialogue_Generation_Dialogue_Answer_Extender_using_DialogueS2SGenerationModel: 53s
L2_Dialogue_Generation_Dialogue_SGD_Based_Answer_Extender_using_DialogueS2SGenerationModel: 39s
L2_COPY_Dialogue_Answer_Extender_using_DialogueGPTGenerationModel: 54s
L2_Duplex_Text_Normalization_with_Tarred_dataset: 1m 29s
L2_BERT_Text_Classification_with_BERT_Test: 39s
L2_Parallel_BERT_Question-Answering_SQUAD_v1_1: 38s
L2_Parallel_BERT_Question-Answering_SQUAD_v2_0: 36s
L2_Parallel_BART_Question-Answering_SQUAD_v1_1: 42s
L2_Parallel_BART_Question-Answering_SQUAD_v2_0: 38s
L2_Parallel_GPT2_Question-Answering_SQUAD_v1_1: 41s
L2_Parallel_GPT2_Question-Answering_SQUAD_v2_0: 37s
L2_Intent_and_Slot_Classification_Tasks_Intent_and_Slot_Classification: 37s
L2_Intent_and_Slot_Classification_Tasks_Multi-Label_Intent_and_Slot_Classification: 45s
L2_Parallel_NLP_Examples2_NER_finetuning_from_pretrained_Test: 43s
L2_Parallel_NLP_Examples2_Punctuation_and_capitalization_finetuning_from_pretrained_test: 43s
L2_Parallel_NLP_Examples2_NER_with_TurkuNLP__bert-base-finnish-cased-v1: 36s
L2_Parallel_NLP_Examples2_Evaluation_script_for_Token_Classification: 40s
L2_Parallel_NLP_Examples2_Evaluation_script_for_Punctuation: 1m 8s
L2_Parallel_NLP_Examples2_Punctuation_Capitalization_2GPUs_with_DistilBERT_Finetuning_on_other_data: 1m 46s
Punctuation_Capitalization_tarred_dataset_create_and_use_tarred_dataset: 2m 1s
Punctuation_Capitalization_Using_model-common_datasets_parameters-label_vocab_dir: 1m 45s
Punctuation_Capitalization_inference_Restore_punctuation_and_capitalization_in_long_text: 1m 18s
L2_Pretraining_BERT_pretraining_from_Text: 39s
L2_Pretraining_BERT_from_Preprocessed: 45s
L2_Entity_Linking_Self_Alignment_Pretraining_BERT: 2m 6s
L2_NMT_Attention_is_All_You_Need_Training_NMT_Training_Post-LN: 1m 7s
L2_NMT_Attention_is_All_You_Need_Training_NMT_Training_Pre-LN: 51s
L2_NMT_Attention_is_All_You_Need_Training_NMT_Multi-Validation: 1m 1s
L2_NMT_Attention_is_All_You_Need_Inference: 1m 25s
L2_NMT_Attention_is_All_You_Need_Finetuning: 1m 9s
L2_NMT_Tarred_Dataset_Creation_Auto_Tarred_Dataset_Creation: 49s
L2_NMT_Tarred_Dataset_Creation_Script_Tarred_Dataset_Creation: 43s
L2_Megatron_NMT_Training_TP2: 3m 54s
L2_Megatron_BART_Perceiver_MIM_Training_TP2: 1m 52s
L2_Megatron_Bert_Pretraining_and_Resume_Training_with_Pipeline_Parallelism: 1m 47s
L2_Megatron_Bert_Pretraining_and_Resume_Training: 2m 1s
L2_Megatron_Core_Bert_Pretraining_and_Resume_Training: 2m 19s
L2_Megatron_RETRO_Pretraining_and_Resume_Training: 7m 10s
L2_Legacy_Megatron_RETRO_Pretraining_and_Resume_Training: 1m 37s
L2_BioMegatron_Bert_NER_Task: 1m 55s
L2_Megatron_GPT_Pretraining_and_Resume_Training_TP2: 3m 57s
L2_Megatron_GPT_with_Rope_Pretraining_and_Resume_Training_TP2: 2m 10s
L2_Megatron_GPT_with_ALiBi_Pretraining_and_Resume_Training_TP2: 2m 24s
L2_Megatron_GPT_with_KERPLE_Pretraining_and_Resume_Training_TP2: 2m 5s
L2_Megatron_GPT_Pretraining_and_Resume_Training_PP2: 4m 9s
L2_Megatron_GPT_Finetuning_PP2: 3m 28s
L2_Megatron_GPT_Finetuning_StarCoder_PP1: 49s
L2_Megatron_GPT_Embedding: 1m 34s
L2_Megatron_GPT_PEFT_Lora_PP2: 2m 43s
L2_Megatron_GPT_PEFT_Lora_TP2: 2m 0s
L2_Megatron_GPT_Eval: 49s
L2_Megatron_GPT_Eval_PP2: 1m 30s
L2_Megatron_GPT_SFT_Eval_inference_seq_len_greaterThan_training_seq_len: 1m 15s
L2_Megatron_Change_Partitions_Reduce_TP_Num_Partitions_-2_to_1-_and_PP_Num_Partitions_-1_to_2: 36s
L2_Megatron_Change_Partitions_Increase_TP_Num_Partitions_-2_to_4-_and_PP_Num_Partitions_-1_to_2: 1m 9s
L2_Megatron_T5_Pretraining_and_Resume_Training_TP2: 1m 39s
L2_Megatron_T5_with_ALiBi_Pretraining_and_Resume_Training_TP2: 1m 53s
L2_Megatron_T5_with_KERPLE_Pretraining_and_Resume_Training_TP2: 1m 38s
L2_Megatron_T5_Pretraining_and_Resume_Training_PP2: 2m 30s
L2_Megatron_T5_w_Mixture_of_Expert_Pretraining: 1m 27s
L2_Megatron_UL2_Pretraining_and_Resume_Training_TP2: 2m 30s
L2_Megatron_T5_Eval: 29s
L2_Megatron_BART_Pretraining_and_Resume_Training_TP2: 1m 36s
L2_Megatron_BART_Pretraining_and_Resume_Training_PP2: 2m 30s
L2_Megatron_T5_GLUE_RTE: 44s
L2_Megatron_T5_GLUE_XNLI: 42s
L2_Megatron_T5_PEFT_Lora_TP2: 1m 47s
L2_Megatron_Mock_Data_Generation_MockGPTDataset: 2m 34s
L2_Megatron_Mock_Data_Generation_MockT5Dataset: 39s
L2_TTS_Fast_dev_runs_1_Tacotron_2: 56s
L2_TTS_Fast_dev_runs_1_WaveGlow: 37s
L2_TTS_Fast_dev_runs_1_FastPitch: 55s
L2_TTS_Fast_dev_runs_1_Mixer-TTS: 1m 0s
L2_TTS_Fast_dev_runs_1_Hifigan: 40s
Speech_Checkpoints_tests: 10m 3s
L0_Setup_Test_Data_And_Models: 13s
L2_Community_LLM_Checkpoints_tests_Llama3: 59s
L2_PTQ_Llama2_Export_Only: 1m 2s
OPTIONAL_ASR_dev_run_Speech_To_Text_HF_Finetuning: 1m 18s
Nemo_CICD_Test: 0s

Annotations

2 errors and 2 warnings
Speech_Checkpoints_tests
The job running on runner azure-gpu-vm-runner4 has exceeded the maximum execution time of 10 minutes.
Speech_Checkpoints_tests
The operation was canceled.
L2_Community_LLM_Checkpoints_tests_Llama3
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v2. For more information see: https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/.
L2_Community_LLM_Checkpoints_tests_Llama3
The following actions uses node12 which is deprecated and will be forced to run on node16: actions/checkout@v2. For more info: https://github.blog/changelog/2023-06-13-github-actions-all-actions-will-run-on-node16-instead-of-node12-by-default/