deepspeech

Speech-to-Text Transcription Using Deep Speech

This repo enables you to load a pretrained Deep Speech model into MATLAB® and perform speech-to-text transcription [1].

Creator: MathWorks® Development

Requirements

To accelerate transcription, a GPU and the following toolbox is recommended:

Parallel Computing Toolbox™

To evaluate word error rate (WER), the following toolbox is recommended:

Text Analytics Toolbox™

Get Started

Download or clone this repositiory to your machine and open it in MATLAB®.

Run deepspeech_inference.mlx to perform speech-to-text conversion on a specified audio file. The script plays the audio file to your default sound card and returns the text.

Run deepspeech_streaming.mlx to perform speech-to-text conversion on streaming audio input.

Run deepspeech_deployment.mlx to generate plain C code from the speech-to-text system. The script generates a MEX file which you can run from MATLAB to verify results.

Run deepspeech_transferlearning.mlx to learn how to retrain weights on the network so that speech-to-text performance is optimized for your needs.

The following files are included in the repo:

Building blocks:

deepspeechFeatures.m - Extract features for DeepSpeech network
deepspeechBuffer.m - Buffer features for DeepSpeech network
deepspeech.m - Load DeepSpeech speech-to-text network
deepspeechPostprocess.m - Decode output from DeepSpeech network

All-in-one:

deepspeech2text.m - Transcribe speech to text using DeepSpeech

deepspeech2text_stream.m - Transcribe streaming speech to text using DeepSpeech

Network Details

The model provided in this example corresponds to the pretrained Deep Speech model provided by [2]. The model was trained using the Fisher, LibriSpeech, Switchboard, and Common Voice English datasets, and approximately 1700 hours of transcribed WAMU (NPR) radio shows explicitly licensed to use as training corpora.

Metrics and Evaluation

Accuracy Metrics

[2] reports a 5.97% word error rate (WER) on the LibriSpeech clean test set. The WER corresponds to the acoustic model (included in this repo) and a language model, which is not included in this repo.

Size

The total size of the model is 167 MB.

License

The license is available in the License.txt file in this repository.

References

[1] Hannun, A. "DeepSpeech: Scaling up end-to-end speech recognition", 2014.

[2] https://github.com/mozilla/DeepSpeech/releases/tag/v0.7.1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

deepspeech

Speech-to-Text Transcription Using Deep Speech

Requirements

Get Started

Network Details

Metrics and Evaluation

Accuracy Metrics

Size

License

References

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
+deepspeechUtility		+deepspeechUtility
images		images
002.flac		002.flac
License.txt		License.txt
README.md		README.md
SECURITY.md		SECURITY.md
deepspeech.m		deepspeech.m
deepspeech2text.m		deepspeech2text.m
deepspeech2text_stream.m		deepspeech2text_stream.m
deepspeechBuffer.m		deepspeechBuffer.m
deepspeechFeatures.m		deepspeechFeatures.m
deepspeechPostprocess.m		deepspeechPostprocess.m
deepspeech_deployment.mlx		deepspeech_deployment.mlx
deepspeech_inference.mlx		deepspeech_inference.mlx
deepspeech_streaming.mlx		deepspeech_streaming.mlx
deepspeech_transferlearning.mlx		deepspeech_transferlearning.mlx

License

matlab-deep-learning/deepspeech

Folders and files

Latest commit

History

Repository files navigation

deepspeech

Speech-to-Text Transcription Using Deep Speech

Requirements

Get Started

Network Details

Metrics and Evaluation

Accuracy Metrics

Size

License

References

About

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages