Releases: NVIDIA/NeMo

NVIDIA Neural Modules 1.16.0

08 Mar 04:35
1631118

Highlights

NeMo ASR

  • ASR Evaluator
  • Multi-channel dereverberation algorithm
  • Hybrid ASR-TTS Models
  • Flashlight Decoder Beam Search
  • FastConformer Encoder with 8x subsampling

NeMo TTS

  • SSL Voice Conversion
  • Spectrogram Enhancer
  • VITS

NeMo Megatron

  • Per microbatch dataloader for GPT and BERT
  • Adapters compatible with Faster Transformer

NeMo Core

  • Nested model support

NeMo Tools

  • NeMo Forced Aligner

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:23.01
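
Each release on this page lists its own container tag. As a minimal sketch, the pairing can be written as a lookup table. The tags below are copied from the Container sections on this page; the names `NEMO_CONTAINER_TAGS` and `pull_command` are hypothetical and not part of NeMo or the NGC tooling.

```python
# Release-to-container tags, copied from the Container sections on this page.
NEMO_CONTAINER_TAGS = {
    "1.16.0": "23.01",
    "1.15.0": "22.12",
    "1.14.0": "22.11",
    "1.13.0": "22.09",
    "1.12.0": "22.08",
    "1.11.0": "22.07",
    "1.10.0": "22.05",
    "1.9.0": "22.04",
}


def pull_command(nemo_version: str) -> str:
    """Build the docker pull command for a NeMo release (illustrative helper)."""
    try:
        tag = NEMO_CONTAINER_TAGS[nemo_version]
    except KeyError:
        # Releases without a Container section on this page (e.g. 1.8.x) land here.
        raise ValueError(f"no container tag listed for NeMo {nemo_version}") from None
    return f"docker pull nvcr.io/nvidia/nemo:{tag}"


print(pull_command("1.16.0"))
```

Releases that do not list a container on this page are deliberately left out of the table rather than guessed.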

ASR

Changelog

TTS

Changelog
  • [TTS] Update Spanish TTS model to 1.15 by @rlangman :: PR: #5742
  • [TTS][DE] refine grapheme-based tokenizer and fastpitch training recipe on thorsten's neutral datasets. by @XuesongYang :: PR: #5753
  • No-script TS export, prepared for ONNX export by @borisfom :: PR: #5653
  • Fixing masking in RadTTS bottleneck layer by @borisfom :: PR: #5771
  • Port Riva's mel cepstral distortion w/ dynamic time warping notebook by @redoctopus :: PR: #5778
  • Update radtts' infer path by @blisc :: PR: #5788
  • [TTS][DE] Augment tokenization/G2P to preserve capitalization of words and mix phonemes with word-level graphemes for an input text. by @XuesongYang :: PR: #5805
  • [TTS] porting VITS implementation by @treacker :: PR: #5600
  • [TTS][DE] updated IPA dictionary and heteronyms by @XuesongYang :: PR: #5860
  • [TTS] GAN-based spectrogram enhancer by @racoiaws :: PR: #5565
  • TTS inference with Heteronym classification model, hc model inference refactoring by @ekmb :: PR: #5768
  • Remove MCD_DTW tarball by @redoctopus :: PR: #5889
  • Hybrid ASR-TTS models by @artbataev :: PR: #5659
  • Moved eval notebook data to aws by @redoctopus :: PR: #5911
  • [G2P] fixed typos and broken import library. by @XuesongYang :: PR: #5978
  • [G2P] backward compatibility for english tokenizer and bugfix by @XuesongYang :: PR: #5980
  • fix links, add missing file by @ekmb :: PR: #6044
  • [TTS] Spectrogram Enhancer: correct dim for length when loading data by @racoiaws :: PR: #6048
  • [TTS] bugfix for fastpitch German tutorial by @XuesongYang :: PR: #6051
  • [TTS] bugfix Chinese Fastpitch tutorial by @XuesongYang :: PR: #6055
  • Fix enhancer usage by @artbataev :: PR: #6059
  • [TTS] Spectrogram Enhancer: support arbitrary input length by @racoiaws :: PR: #6060
  • Fix enhancer usage in ASR-TTS examples by @artbataev :: PR: #6116
  • [TTS] Spectrogram Enhancer: add option to zero out the initial tensor by @racoiaws :: PR: #6136

NLP / NMT

Changelog
  • Fix P-Tuning Truncation by @vadam5 :: PR: #5663
  • Adithyare/prompt learning seed by @arendu :: PR: #5749
  • Add extra data args to support proper finetuning of HF converted T5 checkpoints by @MaximumEntropy :: PR: #5719
  • Don't add output directory twice when creating shared sentencepiece tokenizer by @pks :: PR: #5737
  • add constraint info on batch size for tar dataset by @yzhang123 :: PR: #5812
  • remove transformer version upper bound by @Zhilin123 :: PR: #5831
  • Adithyare/adapter new placement by @arendu :: PR: #5791
  • Add SSL import functionality for Audio Lexical PNC Models by @trias702 :: PR: #5834
  • validation batch sizing and drop_last controls by @arendu :: PR: #5830
  • Remove ending newlines when encoding strings w/ sentencepiece tokenizer by @pks :: PR: #5739
  • Fix segmenting for pcla inference by @jubick1337 :: PR: #5849
  • RETRO model finetuning by @yidong72 :: PR: #5800
  • Optimizing distributed Adam when running with one work queue by @timmoon10 :: PR: #5560
  • Add option to disable distributed parameters in distributed Adam optimizer by @timmoon10 :: PR: #5685
  • set max_steps for lr decay through config by @anmolgupt :: PR: #5780
  • Fix Prompt text space issue by @aklife97 :: PR: #5983
  • Add batch_size to prompt_learning generate by @aklife97 :: PR: #6091

NeMo Tools

Changelog

Export

Changelog

General Improvements

Changelog

NVIDIA Neural Modules 1.15.0

02 Feb 00:49
8c785ec

Highlights

NeMo ASR

  • HybridTransducer-CTC ASR
  • Greedy timestamp decoding with inference script
  • MHA adapters
  • Conformer local attention (longformer)
  • High level beam search API
  • Multiblank transducer
  • Multi-channel audio processing model
  • AIstore for ASR datasets

NeMo Megatron

  • ALiBi position embeddings support for T5

NeMo TTS

  • Chinese TTS pipeline with polyphone disambiguation

NeMo Core

  • Optimizer based EMA
  • MLFlow logger support

Models

  • stt_eo_conformer_ctc_large (HF, NGC) Esperanto ASR model.
  • stt_eo_conformer_transducer_large (HF, NGC) Esperanto ASR model.

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.12

ASR

Changelog

TTS

Changelog
  • Add support for MHA adapters to ASR by @titu1994 :: PR: #5396
  • [TTS] fix ranges of char set for accented letters. by @XuesongYang :: PR: #5607
  • [TTS] add type hints and change variable names for tokenizers and g2p by @XuesongYang :: PR: #5602
  • Fixed RadTTS unit test by @borisfom :: PR: #5572
  • [TTS][ZH] Disambiguate polyphones with augmented dict and Jieba segmenter for Chinese FastPitch by @yuekaizhang :: PR: #5541
  • Add duration padding support for RADTTS inference by @kevjshih :: PR: #5650
  • [TTS] add tts dict cust notebook by @ekmb :: PR: #5662
  • [TN/TTS docs] TN customization, g2p docs moved to tts by @ekmb :: PR: #5683
  • typo and link fixed by @ekmb :: PR: #5741
  • link fixed by @ekmb :: PR: #5745
  • Update Tacotron2 NGC checkpoint load to latest version by @redoctopus :: PR: #5760
  • Docs g2p update by @ekmb :: PR: #5769
  • [TTS][ZH] bugfix import jieba errors. by @XuesongYang :: PR: #5776

NLP / NMT

Changelog

Export

Changelog
  • Add keep_initializers_as_inputs to _export method by @pks :: PR: #5731
  • Megatron export triton update by @Davood-M :: PR: #5766

General Improvements

Changelog

NVIDIA Neural Modules 1.14.0

24 Dec 02:49

Highlights

NeMo ASR

  • Hybrid CTC + Transducer loss ASR #5364
  • Sampled Softmax RNNT (Enables large vocab RNNT, for speech translation and multilingual ASR) #5216
  • ASR Adapters hyper parameter search scripts #5159
  • RNNT {ONNX, TorchScript} x GPU export infer #5248
  • Exportable MelSpectrogram (TorchScript) #5512
  • Audio To Audio Dataset Processor #5196
  • Multi Channel Audio Transcription #5479
  • Silence Augmentation #5476

NeMo Megatron

  • Support for the Mixture of Experts for T5
  • Fix PTL model size output for GPT-3 and BERT
  • BERT with Tensor Parallelism & Pipeline Parallel Support

NeMo Core

  • Hydra Multirun core support + NeMo HP optim in YAML #5159

NeMo Models

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.11

ASR

Changelog
  • [Tools][ASR] Tool for generating data using simulated RIRs by @anteju :: PR: #5158
  • Modernize RNNT ONNX export and add TS export by @titu1994 :: PR: #5248
  • Add Gradio App to ASR Docs by @titu1994 :: PR: #5270
  • Add support for Sampled Softmax for RNNT Joint by @titu1994 :: PR: #5216
  • Speed up HF data processing script for ASR by @titu1994 :: PR: #5330
  • bugfix in volume loss for CTC models by @bmwshop :: PR: #5348
  • Add cpWER for evaluation of ASR with diarization by @tango4j :: PR: #5279
  • Fix for getting tokenizer in character-based ASR models when using tarred dataset by @jonghwanhyeon :: PR: #5442
  • Refactor/unify ASR offline and buffered inference by @fayejf :: PR: #5440
  • Standalone diarization+ASR evaluation script by @tango4j :: PR: #5439
  • [ASR] Transcribe for multi-channel signals by @anteju :: PR: #5479
  • Add Silence Augmentation by @fayejf :: PR: #5476
  • add exportable mel spec by @1-800-BAD-CODE :: PR: #5512
  • add RNN-T loss implemented by PyTorch and test code by @hainan-xv :: PR: #5312
  • [ASR] AudioToAudio datasets and related test by @anteju :: PR: #5196
  • Add StreamingFeatureBufferer class for real-life streaming decoding by @tango4j :: PR: #5534
  • Pool stats with padding by @1-800-BAD-CODE :: PR: #5403
  • Adding Hybrid RNNT-CTC model by @VahidooX :: PR: #5364
  • Fix ASR Buffered inference scripts by @titu1994 :: PR: #5552
  • Add wer details - insertion, deletion, substitution rate by @fayejf :: PR: #5557
  • Add support for Time Stamp calculation using transcribe_speech.py by @titu1994 :: PR: #5568
  • [STT] Add Esperanto (Eo) ASR Conformer-CTC and Conformer-Transducer models by @andrusenkoau :: PR: #5639

TTS

Changelog
  • [TTS] Fastpitch energy condition and refactoring by @subhankar-ghosh :: PR: #5218
  • [TTS] HiFi-TTS Download Script by @oleksiivolk :: PR: #5241
  • [TTS] Add Mandarin/English Bilingual Recipe for Training Fastpitch Models by @yuekaizhang :: PR: #5208
  • [TTS] fixed type of filepath and rename openslr. by @XuesongYang :: PR: #5276
  • [TTS] replace obsolete torch_tts unit test marker with run_only_on('CPU') by @XuesongYang :: PR: #5307
  • [TTS] bugfix IPAG2P and refactor to remove duplicate process. by @XuesongYang :: PR: #5304
  • Update path to get_data.py in TTS tutorial by @redoctopus :: PR: #5311
  • [TTS] Replace IPA lambda arguments with locale string by @rlangman :: PR: #5298
  • [TTS] expand to support flexible dictionary entry formats in IPAG2P. by @XuesongYang :: PR: #5318
  • [TTS] update organization of model checkpoints and their pointers. by @XuesongYang :: PR: #5327
  • [TTS] bugfix for the script of generating mels from fastpitch. by @XuesongYang :: PR: #5344
  • [TTS] Add Spanish model documentation by @rlangman :: PR: #5390
  • [TTS] Add Spanish FastPitch training configs by @rlangman :: PR: #5383
  • [TTS] replace pitch normalization params with ??? by @XuesongYang :: PR: #5392
  • [TTS] Create script for processing TTS training audio by @rlangman :: PR: #5262
  • [TTS] remove useless logic for set_tokenizer. by @XuesongYang :: PR: #5430
  • [TTS] Fixing RADTTS training - removing view buffer and fixing accuracy issue by @borisfom :: PR: #5358
  • JOC Optimization in FastPitch by @subhankar-ghosh :: PR: #5450
  • [TTS] Support speaker level pitch normalization by @rlangman :: PR: #5455
  • TTS tutorial update: use speaker 9017 instead of 6097 by @redoctopus :: PR: #5532
  • [TTS] Remove unused TTS eval function by @redoctopus :: PR: #5605
  • [TTS][ZH] add fastpitch and hifigan model NGC urls and update NeMo docs. by @XuesongYang :: PR: #5596
  • [TTS][DOC] add notes about automatic conversion to target sampling ra… by @XuesongYang :: PR: #5624
  • [TTS][ZH] bugfix for the tutorial and add NGC CLI installation guide. by @XuesongYang :: PR: #5643
  • [TTS][ZH] bugfix for ngc cli installation. by @XuesongYang :: PR: #5652
  • [TTS][ZH] fix broken link for the script. by @XuesongYang :: PR: #5666

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog
  • [ITN] fix year date graph, cardinals extension for hundreds by @ekmb :: PR: #5435
  • [TN] raise NotImplementedError for unsupported languages and other minor fixes by @XuesongYang :: PR: #5414

Export

Changelog

General Improvements

Changelog

NVIDIA Neural Modules 1.13.0

07 Dec 21:14

Highlights

NeMo ASR

  • Spoken Language Understanding (SLU) models based on Conformer encoder and transformer decoder
  • Support for codeswitched manifests during training
  • Support for Language ID during inference for ML models
  • Support of cache-aware streaming for offline models
  • Word confidence estimation for CTC & RNNT greedy decoding

NeMo Megatron

  • Interleaved Pipeline schedule
  • Transformer Engine for GPT
  • HF T5v1.1 -> NeMo-Megatron conversion and finetuning/p-tuning
  • IA3 and Adapter Tuning (Tensor + Pipeline Parallel)
  • Pipeline Parallel Support for T5 Prompt Learning
  • MegatronNMT export

NeMo TTS

  • TTS introductory tutorial
  • Phonemizer/espeak removal (Spanish/German)
  • Char-only support for Spanish/German models
  • Documentation Refactor

NeMo Core

  • Upgrade to NGC PyTorch 22.09 container
  • Add pre-commit hooks
  • Exponential moving average (EMA) of weights during training
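
For readers unfamiliar with the last item, the general idea of an exponential moving average of weights can be sketched in a few lines. This is the textbook update rule, assuming a plain list of floats and a hypothetical `ema_update` helper; it is not NeMo's implementation, whose details and defaults may differ.

```python
def ema_update(ema_weights, weights, decay=0.999):
    """Return the new running average: decay * old_average + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Toy example: two "training steps" that both produce the same weights.
ema = [0.0, 0.0]
for step_weights in ([1.0, 2.0], [1.0, 2.0]):
    ema = ema_update(ema, step_weights, decay=0.5)

print(ema)
```

A decay close to 1 makes the average change slowly, which smooths out step-to-step noise in the weights at the cost of lagging behind them.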

NeMo Models

Detailed Changelogs

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.09

Known Issues

Issues
  • The pytest tests for RadTTSModel_export_to_torchscript fail intermittently due to random input values. Fixed in main.

ASR

Changelog

TTS

Changelog

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog

NeMo Tools

Changelog

Export

Changelog
  • Fix export bug by @VahidooX :: PR: #5009
  • RADTTS model changes to accommodate export with batch size > 1 by @borisfom :: PR: #4947
  • Support TorchScript export for Squeezeformer by @titu1994 :: PR: #5164
  • Expose keep_initializers_as_inputs to Exportable class by @pks :: PR: #5052
  • Fix the self-attention export bug for cache-aware streaming Conformer by @VahidooX :: PR: #5114
  • replace ColumnParallelLinear with nn.Linear in export_utils by @arendu :: PR: #5217
  • Megatron Export Update by @Davood-M :: PR: #5343
  • Fix Conformer Export in 1.13.0 (cherry-pick from main) by @artbataev :: PR: #5446
  • export_utils bugfix by @Davood-M :: PR: #5480
  • Export fixes for Riva by @borisfom :: PR: #5496

General Improvements and Bugfixes

Changelog

NVIDIA Neural Modules 1.12.0

10 Oct 22:11
dd9a30f

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.08

ASR

Changelog

TTS

Changelog
  • [TTS] use consistent spline interpolation for fastpitch and hifigan. by @XuesongYang :: PR: #4679
  • TTS tokenizers moved to collections.common.tokenizers by @AlexGrinch :: PR: #4690
  • [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
  • ARP to IPA mapping, g2p_encode for IPATokenizer by @ekmb :: PR: #4850
  • IPA G2P bugfixes by @redoctopus :: PR: #4869
  • [TTS] add missing WikiHomograph data entries to CMUdict, updates to match new ipa set by @ekmb :: PR: #4886
  • [TTS] fix wrong g2p path. by @XuesongYang :: PR: #4902
  • [TTS] FastPitch training: speed up align_prior_matrix calculation by @racoiaws :: PR: #4718
  • [TTS] fix broken tutorial for MixerTTS. by @XuesongYang :: PR: #4949
  • [TTS] bugfix 'EnglishPhonemesTokenizer' object has no attribute 'encode_from_g2p' by @XuesongYang :: PR: #4992
  • [TTS] added missing German phoneme tokenizer by @XuesongYang :: PR: #5070
  • [TTS] fixed wrong val loss for epoch 0 and inconsistent metrics names by @XuesongYang :: PR: #5087

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog
  • [TTS] Fix text normalizer bugs in TTS data loader by @rlangman :: PR: #4781
  • [Chinese text normalization] Chinese TN part in text_normalization by @mzxcpp :: PR: #4826
  • Fix zh tn by @yzhang123 :: PR: #5035
  • Bug fixes for parallel mp3 to wav conversion, PC notebook, update Readme for TN requirements by @ekmb :: PR: #5047
  • Added P&C lexical audio model by @jubick1337 :: PR: #4802

Export

Changelog

General Improvements

Changelog

NVIDIA Neural Modules 1.11.0

08 Sep 17:06

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.07

ASR

Changelog
  • Add ASR CTC Decoding module by @titu1994 :: PR: #4342
  • Fixing bugs in calling method ctc_decoder_predictions_tensor. by @VahidooX :: PR: #4414
  • Fixed WER initialization in ASR_with_Nemo notebook by @anteju :: PR: #4523
  • Update signature of Hypothesis alignments by @titu1994 :: PR: #4511
  • Add support for ASR Adapter Auxiliary Losses by @titu1994 :: PR: #4480
  • Catalan ASR NGC Resource by @stevehuang52 :: PR: #4576
  • Add kw asr models, add itn ru checkpoint (tagger-based) by @bene-ges :: PR: #4595
  • Add DALI char dataset support to SSL model by @piraka9011 :: PR: #4592
  • Customize arguments for trimming the leading/trailing silence by @XuesongYang :: PR: #4582
  • Update Offline ASR with CTC Decoding by @titu1994 :: PR: #4608
  • Add Squeezeformer to ASR by @titu1994 :: PR: #4416
  • Fix ASR notebooks by @titu1994 :: PR: #4738
  • Add pretrained ASR models for Croatian by @anteju :: PR: #4682
  • Dataloader, collector, loss and metric for multiscale diarization decoder by @tango4j :: PR: #4187
  • Multilingual VAD model by @fayejf :: PR: #4734
  • Adding support for models trained with full context for cache-aware streaming. by @VahidooX :: PR: #4687
  • Fp16 support for Conformer by @bmwshop :: PR: #4571
  • Tiny VAD refactoring for postprocessing by @fayejf :: PR: #4625
  • Add silence handling for speaker diarization pipeline by @nithinraok :: PR: #4512
  • Add Bucketing support to TarredAudioToClassificationLabelDataset by @entn-at :: PR: #4465

TTS

Changelog

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog

Export

Changelog

Bugfixes

Changelog
  • Wrong order of returned tuple for general_collate_fn. by @XuesongYang :: PR: #4388
  • Pitch, voiced_mask, prob_voiced have the same values which is not expected. by @XuesongYang :: PR: #4392
  • Fix tarred dataset len when num shards is not divisible by workers by @itzsimpl :: PR: #4553
  • Fix multiple dev/test datasets after restoring from checkpoint by @PeganovAnton :: PR: #4636
  • Fix/need different cache dirs for different datasets by @PeganovAnton :: PR: #4640
  • Improve mAES algorithm with patches by @titu1994 :: PR: #4662

General Improvements

Changelog

NVIDIA Neural Modules 1.10.0

01 Jul 22:14

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.05

Known Issues

Issues
  • Tutorial: Fastpitch_Training_GermanTTS.ipynb is experimental and still being tested.

ASR

Changelog

TTS

Changelog

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog
  • [TN] WFST to normalize punctuation by @ekmb :: PR: #4108
  • [TN/TTS] Add graph to tag IPA words/sentences in square brackets and leave them unchanged by @ekmb :: PR: #4323
  • Tn tutorial by @yzhang123 :: PR: #4090
  • Tn add rules by @yzhang123 :: PR: #4302
  • Tn install by @yzhang123 :: PR: #4055
  • Fix electronic bug, new time ITN rule by @ekmb :: PR: #4355
  • [TN] Bug fix: expand serial coverage of unknown symbol, remove constraints from word graph by @ekmb :: PR: #4463
  • Configure T5 finetuning metrics by @MaximumEntropy :: PR: #4122

Export

Changelog

Core

Changelog

General Improvements and Fixes

Changelog

NVIDIA Neural Modules 1.9.0

03 Jun 20:40

Container

For additional information regarding NeMo containers, please visit: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo

docker pull nvcr.io/nvidia/nemo:22.04

ASR

Changelog
  • Fix changed function name in offline vad asr notebook by @fayejf :: PR: #4007
  • NeMo Adapters Support + ASR Adapters by @titu1994 :: PR: #3942
  • Update ASR configs with num_workers and pin_memory by @titu1994 :: PR: #4270
  • Verbose k2 install, skip if failed by @GNroy :: PR: #4289
  • Torch conversion for VAD-Diarization pipeline by @tango4j :: PR: #3930
  • Multiprocess improvements by @nithinraok :: PR: #4127

TTS

Changelog

NLP / NMT

Changelog

Text Normalization / Inverse Text Normalization

Changelog

NeMo Tools

Changelog
  • Added exception handling for audio player in SDE by @vsl9 :: PR: #4077

NeMo Core

Changelog
  • Support pre-extracted nemo checkpoint for restoration by @titu1994 :: PR: #4061
  • Fix type checking to be compatible with named tuples by @artbataev :: PR: #3986
  • Update num worker calculation due to PTL flag changes by @redoctopus :: PR: #4056
  • Refresh NeMo documentation to Sphinx Book Theme by @titu1994 :: PR: #3996
  • Generalize adapter merge strategy for future adapters by @titu1994 :: PR: #4091

General Improvements

Changelog

NVIDIA Neural Modules 1.8.2

26 Apr 21:29

Known Issues

  • Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.

TTS

NVIDIA Neural Modules 1.8.1

22 Apr 05:06
2ef2892

Known Issues

  • Megatron BERT export does not currently work in the NVIDIA NGC PyTorch 22.03 container. The issue will be fixed in the NGC PyTorch 22.04 container.

TTS

Hugging Face Hub Integration

Bug Fixes