
This repo provides the pretrained DeepSpeech model in MATLAB. The model is compatible with transfer learning and C/C++ code generation.


matlab-deep-learning/deepspeech


Speech-to-Text Transcription Using Deep Speech

This repo enables you to load a pretrained Deep Speech model into MATLAB® and perform speech-to-text transcription [1].


Creator: MathWorks® Development

Requirements

To accelerate transcription, a GPU and the following toolbox are recommended:

  • Parallel Computing Toolbox™

To evaluate word error rate (WER), the following toolbox is recommended:

Get Started

Download or clone this repository to your machine and open it in MATLAB®.

Run deepspeech_inference.mlx to perform speech-to-text conversion on a specified audio file. The script plays the audio file on your default sound card and returns the transcribed text.

Run deepspeech_streaming.mlx to perform speech-to-text conversion on streaming audio input.

Run deepspeech_deployment.mlx to generate plain C code from the speech-to-text system. The script generates a MEX file that you can run from MATLAB to verify the results.

Run deepspeech_transferlearning.mlx to learn how to retrain the network weights so that speech-to-text performance is optimized for your own data.

The following files are included in the repo:

Building blocks:

  • deepspeechFeatures.m - Extract features for DeepSpeech network
  • deepspeechBuffer.m - Buffer features for DeepSpeech network
  • deepspeech.m - Load DeepSpeech speech-to-text network
  • deepspeechPostprocess.m - Decode output from DeepSpeech network
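deepspeechPostprocess.m turns the network output into text. Deep Speech models are trained with connectionist temporal classification (CTC), and a common way to decode CTC output is greedy (best-path) decoding: pick the most likely class in each frame, merge consecutive repeats, and drop the blank symbol. The Python sketch below illustrates only that idea. It assumes a 29-class output (space, the letters a to z, and an apostrophe, followed by the blank as the last class); the repo's deepspeechPostprocess.m may use a different class ordering or a different decoder.

```python
from typing import List, Optional, Sequence

# Assumed alphabet: index 0 is space, 1-26 are 'a'-'z', 27 is the apostrophe.
# The CTC blank is assumed to be the last class (index 28).
ALPHABET = " abcdefghijklmnopqrstuvwxyz'"
BLANK = len(ALPHABET)  # 28


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> str:
    """Best-path CTC decoding of a (time x classes) score matrix."""
    chars: List[str] = []
    prev: Optional[int] = None
    for scores in frame_scores:
        # Most likely class in this frame (first one wins on ties).
        best = max(range(len(scores)), key=lambda k: scores[k])
        # Emit a character only when the class changes and is not the blank.
        if best != prev and best != BLANK:
            chars.append(ALPHABET[best])
        prev = best
    return "".join(chars)


def one_hot(index: int, n: int = BLANK + 1) -> List[float]:
    """Helper that builds a frame where a single class has all the probability."""
    return [1.0 if k == index else 0.0 for k in range(n)]


# Example: '_' marks a blank frame; the blank separates the two 'l' runs.
char_index = {c: i for i, c in enumerate(ALPHABET)}
frames = [one_hot(BLANK) if c == "_" else one_hot(char_index[c]) for c in "hhe_l_llo"]
print(ctc_greedy_decode(frames))  # hello
```

Without the blank between the two runs of "l", greedy decoding would collapse them into a single letter, which is why CTC needs the blank class.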

All-in-one:

  • deepspeech2text.m - Transcribe speech to text using DeepSpeech


  • deepspeech2text_stream.m - Transcribe streaming speech to text using DeepSpeech
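In streaming mode, audio arrives incrementally rather than as one complete file. One way to picture the input side is to cut the incoming signal into fixed-length blocks and hand each block to the recognizer as soon as it is available. The Python generator below is a hypothetical sketch of that framing step only. The 16 kHz sample rate and the 320-sample block length (20 ms) are illustrative assumptions, and the sketch does not model how deepspeech2text_stream.m carries network state from one block to the next.

```python
from typing import Iterator, List, Sequence

SAMPLE_RATE = 16_000  # Hz; assumed input sample rate
BLOCK_SIZE = 320      # samples per block (20 ms at 16 kHz); hypothetical choice


def stream_blocks(samples: Sequence[float], block_size: int = BLOCK_SIZE) -> Iterator[List[float]]:
    """Yield consecutive fixed-length blocks, zero-padding the final partial block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(samples), block_size):
        block = list(samples[start:start + block_size])
        # Pad the last block so that every block has the same length.
        block.extend([0.0] * (block_size - len(block)))
        yield block


# Example: 1000 samples split into 320-sample blocks gives 4 blocks; the last is padded.
audio = [0.1] * 1000
blocks = list(stream_blocks(audio))
print(len(blocks), len(blocks[-1]))  # 4 320
```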


Network Details

The model provided in this repo corresponds to the pretrained Deep Speech model released in [2]. The model was trained on the Fisher, LibriSpeech, Switchboard, and Common Voice English datasets, plus approximately 1700 hours of transcribed WAMU (NPR) radio shows explicitly licensed for use as training corpora.


Metrics and Evaluation

Accuracy Metrics

[2] reports a 5.97% word error rate (WER) on the LibriSpeech clean test set. This WER was obtained by combining the acoustic model (included in this repo) with a language model (not included in this repo).
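WER counts the word substitutions S, deletions D, and insertions I needed to turn a transcription into its reference, divided by the number of reference words N: WER = (S + D + I) / N. A WER of 5.97% therefore corresponds to roughly 6 word errors per 100 reference words. The Python sketch below computes WER with a word-level Levenshtein distance. It is a generic illustration, not the evaluation code used by [2] or by this repo, and it assumes simple whitespace tokenization with no text normalization.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[j] holds the edit distance between the current reference prefix and hyp[:j].
    dist = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev_diag, dist[0] = dist[0], i
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            prev_diag, dist[j] = dist[j], min(
                dist[j] + 1,       # deletion: reference word missing from hypothesis
                dist[j - 1] + 1,   # insertion: extra word in hypothesis
                prev_diag + cost,  # substitution (cost 1) or match (cost 0)
            )
    return dist[len(hyp)] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```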

Size

The total size of the model is 167 MB.

License

The license is available in the License.txt file in this repository.

References

[1] Hannun, A., et al. "Deep Speech: Scaling up end-to-end speech recognition." arXiv:1412.5567, 2014.

[2] Mozilla DeepSpeech, release v0.7.1. https://github.com/mozilla/DeepSpeech/releases/tag/v0.7.1

Copyright 2022 The MathWorks, Inc.
