Value Error Occurred in training dataset with Jasper Model #145

Open
secrisa11 opened this issue Jul 8, 2021 · 2 comments

secrisa11 commented Jul 8, 2021

python ./bin/main.py model=jasper train=jasper_train train.dataset_path=$DATASET_PATH train.transcripts_path=$TRANSCRIPTS_PATH
./bin/main.py:175: UserWarning:
'audio/fbank' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
main()
[2021-07-09 15:36:37,587][kospeech.utils][INFO] - audio:
  audio_extension: pcm
  sample_rate: 16000
  frame_length: 20
  frame_shift: 10
  normalize: true
  del_silence: true
  feature_extract_by: kaldi
  time_mask_num: 4
  freq_mask_num: 2
  spec_augment: true
  input_reverse: false
  transform_method: fbank
  n_mels: 80
  freq_mask_para: 18
model:
  architecture: jasper
  teacher_forcing_ratio: 1.0
  teacher_forcing_step: 0.01
  min_teacher_forcing_ratio: 0.9
  dropout: 0.3
  bidirectional: false
  joint_ctc_attention: false
  max_len: 400
  version: 10x5
train:
  dataset: kspon
  dataset_path: /home/suresoft/KoSpeech-1.3/dataset/kspon
  transcripts_path: /home/suresoft/KoSpeech-1.3/dataset/kspon/transcripts.txt
  output_unit: character
  batch_size: 32
  save_result_every: 1000
  checkpoint_every: 5000
  print_every: 10
  mode: train
  num_workers: 4
  use_cuda: true
  init_lr_scale: 0.01
  final_lr_scale: 0.05
  max_grad_norm: 400
  weight_decay: 0.001
  seed: 777
  resume: false
  optimizer: novograd
  reduction: sum
  init_lr: 0.001
  final_lr: 0.0001
  peak_lr: 0.001
  warmup_steps: 0
  num_epochs: 10
  lr_scheduler: tri_stage_lr_scheduler

[2021-07-09 15:36:37,755][kospeech.utils][INFO] - Operating System : Linux 4.9.201-tegra
[2021-07-09 15:36:37,756][kospeech.utils][INFO] - Processor : aarch64
[2021-07-09 15:36:37,797][kospeech.utils][INFO] - device : NVIDIA Tegra X1
[2021-07-09 15:36:37,797][kospeech.utils][INFO] - CUDA is available : True
[2021-07-09 15:36:37,798][kospeech.utils][INFO] - CUDA version : 10.2
[2021-07-09 15:36:37,798][kospeech.utils][INFO] - PyTorch version : 1.6.0
[2021-07-09 15:36:37,827][kospeech.utils][INFO] - split dataset start !!
[2021-07-09 15:36:41,561][kospeech.utils][INFO] - Applying Spec Augmentation...
[2021-07-09 15:36:45,168][kospeech.utils][INFO] - Applying Spec Augmentation...
Error executing job with overrides: ['model=jasper', 'train=jasper_train', 'train.dataset_path=/home/suresoft/KoSpeech-1.3/dataset/kspon', 'train.transcripts_path=/home/suresoft/KoSpeech-1.3/dataset/kspon/transcripts.txt']
Traceback (most recent call last):
File "./bin/main.py", line 175, in <module>
main()
File "/home/suresoft/miniforge3/envs/KoSpeech_Py36/lib/python3.6/site-packages/hydra/main.py", line 53, in decorated_main
config_name=config_name,
File "/home/suresoft/miniforge3/envs/KoSpeech_Py36/lib/python3.6/site-packages/hydra/_internal/utils.py", line 368, in _run_hydra
lambda: hydra.run(
File "/home/suresoft/miniforge3/envs/KoSpeech_Py36/lib/python3.6/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
raise ex
File "/home/suresoft/miniforge3/envs/KoSpeech_Py36/lib/python3.6/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
return func()
File "/home/suresoft/miniforge3/envs/KoSpeech_Py36/lib/python3.6/site-packages/hydra/_internal/utils.py", line 371, in <lambda>
overrides=args.overrides,
File "/home/suresoft/miniforge3/envs/KoSpeech_Py36/lib/python3.6/site-packages/hydra/_internal/hydra.py", line 110, in run
_ = ret.return_value
File "/home/suresoft/miniforge3/envs/KoSpeech_Py36/lib/python3.6/site-packages/hydra/core/utils.py", line 233, in return_value
raise self._return_value
File "/home/suresoft/miniforge3/envs/KoSpeech_Py36/lib/python3.6/site-packages/hydra/core/utils.py", line 160, in run_job
ret.return_value = task_function(task_cfg)
File "./bin/main.py", line 170, in main
last_model_checkpoint = train(config)
File "./bin/main.py", line 99, in train
epoch_time_step, trainset_list, validset = split_dataset(config, config.train.transcripts_path, vocab)
File "/home/suresoft/KoSpeech-1.3/kospeech/data/data_loader.py", line 306, in split_dataset
audio_extension=config.audio.audio_extension
File "/home/suresoft/KoSpeech-1.3/kospeech/data/data_loader.py", line 67, in __init__
self.shuffle()
File "/home/suresoft/KoSpeech-1.3/kospeech/data/data_loader.py", line 102, in shuffle
self.audio_paths, self.transcripts, self.augment_methods = zip(*tmp)
ValueError: not enough values to unpack (expected 3, got 0)

@sooftware
Owner

Hi @Daeyeop-Kim. This repository is archived. Further development is underway here.

@pvodopija

I had the same problem and managed to work around it like this.
In kospeech/data/data_loader.py, change the shuffle function to the following:

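    def shuffle(self):
        """ Shuffle dataset """
        tmp = list(zip(self.audio_paths, self.transcripts, self.augment_methods))
        random.shuffle(tmp)

        # This kinda works: the lists are updated in place, and the loop
        # simply does nothing when tmp is empty.
        for i, (audio_path, transcript, augment_method) in enumerate(tmp):
            self.audio_paths[i] = audio_path
            self.transcripts[i] = transcript
            self.augment_methods[i] = augment_method

        # This doesn't work: when tmp is empty, zip(*tmp) yields nothing, so
        # unpacking into three names raises the ValueError from the traceback.
        # self.audio_paths, self.transcripts, self.augment_methods = zip(*tmp)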
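
For anyone who wants to see why the original line fails, here is a minimal sketch that reproduces the error outside KoSpeech and shows an alternative with an explicit empty check. `ToyDataset` and `unpack_fails_on_empty` are hypothetical names made up for this example, not part of kospeech, and the sketch assumes the three attributes are plain Python lists of equal length. Note that an empty list probably means no samples were loaded for that dataset, so skipping the shuffle may only hide a problem with the dataset path or transcripts file.

```python
import random


class ToyDataset:
    """Hypothetical stand-in for the dataset class in data_loader.py."""

    def __init__(self, audio_paths, transcripts, augment_methods):
        self.audio_paths = list(audio_paths)
        self.transcripts = list(transcripts)
        self.augment_methods = list(augment_methods)

    def shuffle(self):
        """Shuffle the three parallel lists together, tolerating an empty dataset."""
        tmp = list(zip(self.audio_paths, self.transcripts, self.augment_methods))
        if not tmp:
            # zip(*[]) yields nothing, so a three-way unpack would raise ValueError.
            return
        random.shuffle(tmp)
        audio_paths, transcripts, augment_methods = zip(*tmp)
        # zip(*tmp) produces tuples; convert back so the attributes stay lists.
        self.audio_paths = list(audio_paths)
        self.transcripts = list(transcripts)
        self.augment_methods = list(augment_methods)


def unpack_fails_on_empty():
    """Return the message of the ValueError raised by unpacking zip(*[])."""
    try:
        a, b, c = zip(*[])
    except ValueError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(unpack_fails_on_empty())

    empty = ToyDataset([], [], [])
    empty.shuffle()  # no exception, lists stay empty
    print(empty.audio_paths)
```

If the empty case shows up in a real run, it may be worth logging the length of `audio_paths` before shuffling to check whether the dataset was read at all.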
