Add RNNLM recipe
ddwkim committed Apr 18, 2024
1 parent 8db74e9 commit 74eba89
Showing 3 changed files with 123 additions and 1 deletion.
recipes/KsponSpeech/LM/README.md (4 changes: 3 additions, 1 deletion)
@@ -12,13 +12,15 @@ Also, set data_folder in the yaml file to the result of ksponspeech_prepare.py.
Run the following to start training the language model.

```bash
python train.py hparams/transformer.yaml
python train.py hparams/transformer.yaml # transformerLM
python train.py hparams/RNNLM.yaml # RNNLM
```
# Results

| Release | hyperparams file | eval clean loss | eval other loss | Model link | GPUs |Training time|
|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
|01-23-23|transformer.yaml|4.40|4.67|[Dropbox](https://www.dropbox.com/sh/egv5bdn8b5i45eo/AAB7a8gFt2FqbnO4yhL6DQ8na?dl=0)|1xA100 80GB|17 hours 2 mins|
|04-16-24|RNNLM.yaml|4.59|4.94|[Dropbox](https://www.dropbox.com/sh/egv5bdn8b5i45eo/AAB7a8gFt2FqbnO4yhL6DQ8na?dl=0)|1xA100 80GB|50 mins|

# About SpeechBrain
- Website: https://speechbrain.github.io/
recipes/KsponSpeech/LM/hparams/RNNLM.yaml (117 changes: 117 additions, 0 deletions)
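The results table reports evaluation losses rather than perplexities. Assuming each loss is the mean per-token negative log-likelihood in nats (an assumption; the README does not say how the loss is averaged), perplexity is simply `exp(loss)`. A minimal sketch using the numbers from the table:

```python
import math

# Evaluation losses copied from the README results table.
# Assumption: each value is a mean per-token NLL in nats.
RESULTS = {
    "transformer.yaml": {"eval_clean": 4.40, "eval_other": 4.67},
    "RNNLM.yaml": {"eval_clean": 4.59, "eval_other": 4.94},
}


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean per-token NLL (natural log) into perplexity."""
    return math.exp(mean_nll_nats)


if __name__ == "__main__":
    for hparams, losses in RESULTS.items():
        ppl = {split: round(perplexity(loss), 1) for split, loss in losses.items()}
        print(hparams, ppl)
```

Because the ratio of two perplexities is `exp` of the loss difference, under this assumption the RNNLM's perplexity is about exp(0.19) ≈ 1.21 times the Transformer LM's on eval clean and about exp(0.27) ≈ 1.31 times on eval other, against a reported training time of 50 mins instead of 17 hours.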
@@ -0,0 +1,117 @@
# ############################################################################
# Model: RNNLM of E2E ASR
# Tokens: unigram
# Losses: NLL
# Training: KsponSpeech 965.2h transcript
# Authors: Ju-Chieh Chou 2020, Jianyuan Zhong 2021, Dong Won Kim 2024
# ############################################################################

# Seed needs to be set at top of yaml, before objects with parameters are made
seed: 2226
__set_seed: !apply:torch.manual_seed [!ref <seed>]
output_folder: !ref results/RNN/<seed>
save_folder: !ref <output_folder>/save
train_log: !ref <output_folder>/train_log.txt

# Data files
# The data_folder is needed because we train the LM on the training
# transcriptions of KsponSpeech.
data_folder: !PLACEHOLDER # e.g., /path/to/KsponSpeech
train_csv: !ref <data_folder>/train.csv
valid_csv: !ref <data_folder>/dev.csv
test_csv:
    - !ref <data_folder>/eval_clean.csv
    - !ref <data_folder>/eval_other.csv

# Tokenizer model
tokenizer_file: ddwkim/asr-conformer-transducer-rnnlm-ksponspeech/tokenizer.ckpt

####################### Training Parameters ####################################
number_of_epochs: 20
batch_size: 128
lr: 0.001
weight_decay: 0.01
num_workers: 10
grad_accumulation_factor: 1 # Gradient accumulation to simulate large batch training
ckpt_interval_minutes: 15 # save checkpoint every N min

# Dataloader options
train_dataloader_opts:
    batch_size: !ref <batch_size>
    shuffle: True
    num_workers: !ref <num_workers>

valid_dataloader_opts:
    batch_size: 1

test_dataloader_opts:
    batch_size: 1

####################### Model Parameters #######################################
emb_size: 256
activation: !name:torch.nn.LeakyReLU
dropout: 0.0
rnn_layers: 6
rnn_neurons: 512
dnn_blocks: 1
dnn_neurons: 256

# Outputs
output_neurons: 5000
# blank_index: 0
bos_index: 1
eos_index: 2
# pad_index: 0


# Functions
model: !new:speechbrain.lobes.models.RNNLM.RNNLM
    output_neurons: !ref <output_neurons>
    embedding_dim: !ref <emb_size>
    activation: !ref <activation>
    dropout: !ref <dropout>
    rnn_layers: !ref <rnn_layers>
    rnn_neurons: !ref <rnn_neurons>
    dnn_blocks: !ref <dnn_blocks>
    dnn_neurons: !ref <dnn_neurons>

modules:
    model: !ref <model>

checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
    checkpoints_dir: !ref <save_folder>
    recoverables:
        model: !ref <model>
        scheduler: !ref <lr_annealing>
        counter: !ref <epoch_counter>

log_softmax: !new:speechbrain.nnet.activations.Softmax
    apply_log: True

optimizer: !name:torch.optim.AdamW
    lr: !ref <lr>
    betas: (0.9, 0.998)
    eps: 0.000000001
    weight_decay: !ref <weight_decay>

lr_annealing: !new:speechbrain.nnet.schedulers.NoamScheduler
    lr_initial: !ref <lr>
    n_warmup_steps: 2000
    model_size: 1

epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter
    limit: !ref <number_of_epochs>

compute_cost: !name:speechbrain.nnet.losses.nll_loss

train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger
    save_file: !ref <train_log>

tokenizer: !new:sentencepiece.SentencePieceProcessor

pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        tokenizer: !ref <tokenizer>
    paths:
        tokenizer: !ref <tokenizer_file>
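`lr_annealing` configures SpeechBrain's `NoamScheduler` with `lr_initial: 0.001`, `n_warmup_steps: 2000` and `model_size: 1`. The sketch below reimplements the standard Noam (Transformer) schedule, lr = lr_initial · model_size^(-0.5) · min(step^(-0.5), step · warmup^(-1.5)), to show what these values imply. It is an assumption that `NoamScheduler` uses exactly this formula when `model_size` is given; `speechbrain/nnet/schedulers.py` is the authoritative source.

```python
import math


def noam_lr(step: int, lr_initial: float, n_warmup_steps: int, model_size: float) -> float:
    """Standard Noam schedule (assumed to match NoamScheduler with model_size set).

    The LR rises linearly during warmup, peaks at step == n_warmup_steps,
    and then decays proportionally to step ** -0.5.
    """
    if step < 1:
        raise ValueError("step is 1-indexed")
    scale = model_size ** -0.5 * min(step ** -0.5, step * n_warmup_steps ** -1.5)
    return lr_initial * scale


# Values taken from RNNLM.yaml.
LR_INITIAL = 0.001
WARMUP = 2000
MODEL_SIZE = 1

if __name__ == "__main__":
    for step in (1, 500, 2000, 8000, 32000):
        print(step, f"{noam_lr(step, LR_INITIAL, WARMUP, MODEL_SIZE):.3e}")
```

If `NoamScheduler` does follow this formula, `model_size: 1` puts the peak learning rate, reached at step 2000, at lr_initial / sqrt(2000) ≈ 2.24e-5 rather than at the nominal `lr: 0.001`; the commit does not say whether that scaling is intended.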
tests/recipes/KsponSpeech.csv (3 changes: 3 additions, 0 deletions)
@@ -1,4 +1,7 @@
Task,Dataset,Script_file,Hparam_file,Data_prep_file,Readme_file,Result_url,HF_repo,test_debug_flags,test_debug_checks,performance
ASR,KsponSpeech,recipes/KsponSpeech/ASR/transformer/train.py,recipes/KsponSpeech/ASR/transformer/hparams/conformer_medium.yaml,recipes/KsponSpeech/ASR/transformer/ksponspeech_prepare.py,recipes/KsponSpeech/ASR/transformer/README.md,https://www.dropbox.com/sh/uibokbz83o8ybv3/AACtO5U7mUbu_XhtcoOphAjza?dl=0,https://huggingface.co/speechbrain/asr-conformer-transformerlm-ksponspeech,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=[tests/samples/annotation/ASR_train.csv] --number_of_epochs=2 --skip_prep=True --num_encoder_layers=3 --num_decoder_layers=3,"file_exists=[wer_ASR_train.txt,train_log.txt,log.txt,env.log,train.py,hyperparams.yaml,save/tokenizer.ckpt,save/lm.ckpt]",Test-clean-WER=20.78% Test-others-WER=25.73%
ASR,KsponSpeech,recipes/KsponSpeech/ASR/transformer/train.py,recipes/KsponSpeech/ASR/transformer/hparams/conformer_small.yaml,recipes/KsponSpeech/ASR/transformer/ksponspeech_prepare.py,recipes/KsponSpeech/ASR/transformer/README.md,https://www.dropbox.com/sh/uibokbz83o8ybv3/AACtO5U7mUbu_XhtcoOphAjza?dl=0,https://huggingface.co/dddwkim/asr-conformer-small-transformerlm-ksponspeech,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=[tests/samples/annotation/ASR_train.csv] --number_of_epochs=2 --skip_prep=True --num_encoder_layers=3 --num_decoder_layers=3,"file_exists=[wer_ASR_train.txt,train_log.txt,log.txt,env.log,train.py,hyperparams.yaml,save/tokenizer.ckpt,save/lm.ckpt]",Test-clean-WER=21.78% Test-others-WER=26.48%
ASR,KsponSpeech,recipes/KsponSpeech/ASR/transformer/train.py,recipes/KsponSpeech/ASR/transformer/hparams/brachformer_medium.yaml,recipes/KsponSpeech/ASR/transformer/ksponspeech_prepare.py,recipes/KsponSpeech/ASR/transformer/README.md,https://www.dropbox.com/sh/uibokbz83o8ybv3/AACtO5U7mUbu_XhtcoOphAjza?dl=0,https://huggingface.co/dddwkim/asr-branchformer-transformerlm-ksponspeech,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=[tests/samples/annotation/ASR_train.csv] --number_of_epochs=2 --skip_prep=True --num_encoder_layers=3 --num_decoder_layers=3,"file_exists=[wer_ASR_train.txt,train_log.txt,log.txt,env.log,train.py,hyperparams.yaml,save/tokenizer.ckpt,save/lm.ckpt]",Test-clean-WER=21.01% Test-others-WER=25.68%
LM,KsponSpeech,recipes/KsponSpeech/LM/train.py,recipes/KsponSpeech/LM/hparams/transformer.yaml,recipes/KsponSpeech/LM/ksponspeech_prepare.py,recipes/KsponSpeech/LM/README.md,https://www.dropbox.com/sh/egv5bdn8b5i45eo/AAB7a8gFt2FqbnO4yhL6DQ8na?dl=0,,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=[tests/samples/annotation/ASR_train.csv] --number_of_epochs=2 --d_model=120 --d_ffn=96,"file_exists=[train_log.txt,log.txt,env.log,train.py,hyperparams.yaml,save/tokenizer.ckpt]",
LM,KsponSpeech,recipes/KsponSpeech/LM/train.py,recipes/KsponSpeech/LM/hparams/RNNLM.yaml,recipes/KsponSpeech/LM/ksponspeech_prepare.py,recipes/KsponSpeech/LM/README.md,https://www.dropbox.com/sh/egv5bdn8b5i45eo/AAB7a8gFt2FqbnO4yhL6DQ8na?dl=0,,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=[tests/samples/annotation/ASR_train.csv] --number_of_epochs=2 --emb_size=48 --rnn_neurons=96 --dnn_neurons=48,"file_exists=[train_log.txt,log.txt,env.log,train.py,hyperparams.yaml,save/tokenizer.ckpt]",
Tokenizer,KsponSpeech,recipes/KsponSpeech/Tokenizer/train.py,recipes/KsponSpeech/Tokenizer/hparams/5K_unigram_subword_bpe.yaml,recipes/KsponSpeech/Tokenizer/ksponspeech_prepare.py,recipes/KsponSpeech/Tokenizer/README.md,https://www.dropbox.com/sh/prnqt09e7xpc1kr/AAB-HkfUazPifn7kXnKnAJSga?dl=0,,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --skip_prep=True --token_output=23,"file_exists=[23_unigram.model,23_unigram.vocab,log.txt,ASR_train.txt,env.log,train.py,hyperparams.yaml]",
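Each row of `tests/recipes/KsponSpeech.csv` pairs a recipe with `test_debug_flags`, command-line overrides that shrink the run for testing; the new RNNLM row reduces `emb_size`, `rnn_neurons` and `dnn_neurons`. As a minimal sketch (the helper `parse_debug_flags` is hypothetical and not part of SpeechBrain's test tooling), those flags can be split into key/value overrides with the standard library:

```python
import shlex


def parse_debug_flags(flags: str) -> dict[str, str]:
    """Split '--key=value' tokens into a dict (hypothetical helper)."""
    overrides = {}
    for token in shlex.split(flags):
        if not token.startswith("--"):
            raise ValueError(f"unexpected token: {token!r}")
        key, sep, value = token[2:].partition("=")
        if not sep:
            raise ValueError(f"flag without value: {token!r}")
        overrides[key] = value
    return overrides


# test_debug_flags copied from the RNNLM row of the CSV.
RNNLM_FLAGS = (
    "--data_folder=tests/samples/ASR/ "
    "--train_csv=tests/samples/annotation/ASR_train.csv "
    "--valid_csv=tests/samples/annotation/ASR_train.csv "
    "--test_csv=[tests/samples/annotation/ASR_train.csv] "
    "--number_of_epochs=2 --emb_size=48 --rnn_neurons=96 --dnn_neurons=48"
)

if __name__ == "__main__":
    for key, value in parse_debug_flags(RNNLM_FLAGS).items():
        print(f"{key} = {value}")
```

When passed to `train.py`, such overrides presumably take precedence over the YAML values, so the test run would use, for example, 48-dimensional embeddings instead of `emb_size: 256`.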
