Releases: PaddlePaddle/PaddleSpeech

PaddleSpeech r1.4.1

14 Apr 09:36
9d61b8c

Others

PaddleSpeech r1.4.0

15 Mar 08:10
d103cb8

S2T

T2S

Server

Engine

Audio

Demos

Docs

Others

  • Remove fluid API in ASR. #2944 #2859 #2852 by @zxcd
  • Add a simple Python Adadelta optimizer. #2925 by @zxcd
  • Add encoding=utf-8 for text. #2896 by @zxcd #2865 by @yt605155624
  • Replace Tensor.numpy()[0] with float(Tensor) to support 0-D tensors. #2884 by @zhouwei25
  • Fix libsndfile.so not found in ubuntu18-cpu/Dockerfile. #2763 by @linkec
  • Fix AttributeError "module 'distutils' has no attribute 'ccompiler'" in setup.py in ctc_decoders. #2745 by @GreatV
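
The encoding=utf-8 item above refers to passing an explicit encoding when opening text files, so that reads do not depend on the platform's default encoding. A minimal sketch of the idea follows; the helper name `read_transcript` and the sample file contents are hypothetical and are not taken from the PaddleSpeech code base.

```python
import tempfile
from pathlib import Path


def read_transcript(path):
    """Read a text file as UTF-8, independent of the platform default encoding.

    Hypothetical helper for illustration only.
    """
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "text.txt"
    # Write mixed Chinese/English text with an explicit encoding as well.
    sample.write_text("utt1 你好世界\nutt2 hello\n", encoding="utf-8")
    lines = read_transcript(sample)
```

Without the explicit `encoding` argument, `open` falls back to a locale-dependent default, which can fail on non-ASCII transcripts on some systems.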

New Contributors

Full Changelog: r1.3.0...r1.4.0

PaddleSpeech r1.3.0

14 Dec 06:38
c54c950

Highlight

S2T

T2S

Audio

Demo

New Contributors

Full Changelog: r1.2.0...r1.3.0

PaddleSpeech r1.2.0

10 Oct 03:31
15ca007

S2T

T2S

Text

  • Create preprocess.py for Punctuation Restoration. #2295 by @THUzyt21

Demo

Server

  • Add num_decoding_left_chunks in streaming_asr_server's config. #2337 by @THUzyt21
  • Remove unused spk_id in speech_server and streaming_tts_server; support Chinese-English mixed TTS server engine. #2380 by @WongLaw

Doc

Test

Other

Acknowledgements

Special thanks to @yt605155624 @lym0302 @THUzyt21 @iftaken @Zth9730 @zhoupc2015 @WongLaw @david-95 @pengzhendong @kslz @HighCWu @yuehuayingxueluo @sneaxiy @SmileGoat

New Contributors

Full Changelog: r1.1.0...r1.2.0

PaddleSpeech r1.1.0

19 Aug 10:58
aab5412

S2T

  • Add wer tools. #1709
  • Add optimized attention cache for attention; use 0-dim tensor for model export. #2124
  • Fix cnn cache dy2st shape. #2168

TTS

Speechx

  • Add custom ASR script. #1946
  • Refactor frontend. #2003
  • Export DeepSpeech2 to ONNX. #2034
  • Refactor audio/data/feature cache. #1638
  • Refactor frontend. #1640
  • Fix nnet itf header. #1641
  • Refactor speech egs. #1707
  • Refactor egs and more egs for TLG wfst graph build. #1715
  • Speed up ngram building. #1729
  • Update speechx install doc. #1736
  • Fix nnet input and output name. #1740
  • Update wfst graph. #1742
  • Fix model params path name. #1750
  • Remove fluid tools for onnx export. #2116

Audio

  • Refactor paddleaudio to paddlespeech.audio. #2007
  • Add webdataset in paddlespeech.audio. #2062

Server

  • Remove extra logs. #2111 #2113
  • Change streaming TTS servers' sample rate (fs) from a fixed 24k to each model's fs. #2121
  • Fix bug in engine_warmup. #2171 by @Betterman-qs
  • Replace default vocoder in server with mb_melgan. #2214
  • Fix bug in streaming_asr_server with punctuation restoration. #2244
  • Rename time_s and time_ns to time_b and time_nb. #2133
  • Improve decoding accuracy. #2128

CLI

  • Add paddlespeech.resource module. #1917
  • Dynamic cli commands registration. #1959
  • Fix unnecessary download. #2103
  • Remove extra logs. #2084 #2085 #2107
  • Add Chinese English mixed TTS CLI. #2249
  • Add onnxruntime infer for CLI. #2222

Demo

  • Add speech web demo. #2039 #2080
  • Add kws cli and demo. #2063
  • Use paddle web for streaming asr. #2105
  • Add custom ASR script. #1946
  • More cli for speech demos. #2138

Doc

  • Add API doc. #2075
  • Format tts doc string for read the docs. #2115

Others

Acknowledgements

Special thanks to @buchongyu2 @BrightXiaoHan @BarryKCL @Betterman-qs @david-95 @jerryuhoo @QingshuChen @iftaken @zh794390558 @Jackwaterveg @lym0302 @SmileGoat @yt605155624

New Contributors

Full Changelog: r1.0.0...r1.1.0

PaddleSpeech r1.0.0

13 May 10:25
44b7e51

Highlight

More

ASR

  • DeepSpeech2 streaming model, AISHELL CER: 6.66%
  • DeepSpeech2 streaming model, WenetSpeech CER: 15.2% (test_net, w/o LM), 24.17% (test_meeting, w/o LM), 5.3% (aishell, w/ LM)
  • Conformer, AISHELL CER: 4.64%
  • Conformer streaming model, AISHELL CER: 5.44%
  • Conformer streaming model, WenetSpeech CER: 11.0% (test_net), 18.79% (test_meeting)
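
The figures above are character error rates (CER): the character-level edit distance between the reference transcript and the recognized text, divided by the reference length. The sketch below shows one common way to compute it; it is an illustration, not PaddleSpeech's scoring code, and the function names and the choice to strip spaces before scoring are assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; spaces are ignored (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# One substituted character out of six reference characters.
example_cer = cer("今天天气很好", "今天天汽很好")
```

Because insertions count as errors, CER can exceed 100% when the hypothesis is much longer than the reference.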

Speechx

KWS

Audio

  • [Audio] Rename paddleaudio to audio, since it conflicts with the package name, by @zh794390558 in #1758
  • [Audio] Fix mcd issue. by @KPatr1ck in #1658
  • [Audio] Remove mcd. by @KPatr1ck in #1659
  • [Audio] Add VoxCeleb dataset for speaker recognition.
  • [Audio] Add HeySnips dataset for keyword spotting.

What's Changed

Full Changelog: r1.0.0a...r1.0.0

PaddleSpeech r1.0.0a

28 Apr 04:59
b5fb276

Highlight

  • Release streaming ASR and streaming TTS systems for industrial applications.
  • Support KWS model.
  • DeepSpeech2 streaming model, AISHELL CER: 6.66%
  • Conformer, AISHELL CER: 4.64%
  • Conformer streaming model, AISHELL CER: 5.44%
  • SpeechX DeepSpeech2 streaming with WFST.

What's Changed

PaddleSpeech r0.2.0

01 Apr 07:48
05b8ba8

S2T

  • Replace kaldi_fbank with paddleaudio. #1612
  • Support online CTC decoder. #821 #1626
  • Improve accuracy of Conformer. Support using Kaiming uniform as the default initialization. #1577

TTS

  • Add SpeedySpeech multi-speaker support for synthesize_e2e.py. #1370 by @jerryuhoo
  • Add WaveRNN for CSMSC dataset. #1379
  • Add Tacotron2 for CSMSC / LJSpeech datasets. #1314 / #1416
  • Add GE2E Tacotron2 Voice Cloning for AISHELL3 dataset. #1419
  • Update text frontend. #1506
  • Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets. #1549 / #1581 / #1587
  • Add NPU support for TransformerTTS. #1593 by @windstamp
  • Add CNN Decoder for Streaming Fastspeech2. #1634

Audio

  • Add paddleaudio.compliance modules that offer audio feature APIs aligned with Kaldi and Librosa. #1518
  • Unittest and benchmark for audio feature APIs. #1548
  • [Audio] Refactor audio architecture. #1494 by @zh794390558
  • [Audio] Add DTW metric. #1493 by @zh794390558
  • [Audio] Fix compliance bug. #1597 by @zh794390558

Deployment

server

vector

  • [Vector] Add ECAPA-TDNN on VoxCeleb. #1523 by @Honei

CLI

  • Batch input supported. #1460
  • TTS: Add WaveRNN for CSMSC dataset.
  • TTS: Add HiFiGAN for LJSpeech / AISHELL-3 / VCTK datasets.
  • Vector: Add speaker verification demo and doc. #1605 by @Honei

Demo

  • [vec][search] Update client image URL. #1628 by @qingen
  • [server] Add server demo. #1480 by @lym0302
  • [vec][search] Add audio similarity search. #1609 by @qingen

Acknowledgements

Special thanks to @WilliamZhang06 @yt605155624 @windstamp @Jackwaterveg @Honei @SmileGoat @KPatr1ck @zh794390558 @lym0302 @qingen

PaddleSpeech r0.1.2

25 Feb 03:03
c7a9650

Bug Fix

  1. Pin librosa to version 0.8.1 (librosa==0.8.1) to resolve the compatibility issue caused by the librosa upgrade. #1426

PaddleSpeech r0.1.1

14 Jan 03:27
3d5aac6

New Features

CLI

  • Add cli stats. #1274
  • Add unit test. #1321
  • ASR: Support English: Add transformer_librispeech model. #1297
  • ASR: Support 4 decoding methods: ctc_greedy_search, ctc_beam_search, attention, attention_rescoring. #1297
  • ASR & ST: Use the unified config. #1305 / #1312
  • ASR: Refactor the code. #1260 by @AdamBear
  • TTS: Support long input text by default. #1241
  • TTS: Add Style MelGAN and HiFiGAN. #1241
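
Of the four ASR decoding methods listed above, ctc_greedy_search is the simplest: take the best-scoring token at each frame, merge consecutive repeats, then drop blanks. The sketch below illustrates that general technique only; it is not the PaddleSpeech implementation, and the function signature, the plain-list input, and `blank_id=0` are assumptions made for the example.

```python
def ctc_greedy_search(frame_scores, blank_id=0):
    """Greedy CTC decoding over per-frame score lists.

    frame_scores: list of lists, one score per vocabulary entry for each frame.
    blank_id: index of the CTC blank symbol (assumed to be 0 here).
    """
    # Step 1: pick the highest-scoring token id for every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    # Step 2: collapse consecutive repeats, then remove blanks.
    tokens = []
    prev = None
    for tok in best_path:
        if tok != prev and tok != blank_id:
            tokens.append(tok)
        prev = tok
    return tokens


# Frame-wise argmax is 1, 1, blank, 1, 2, 2, blank.
scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.2, 0.7],
    [0.9, 0.05, 0.05],
]
decoded = ctc_greedy_search(scores)
```

The blank between the two runs of token 1 is what lets the same token appear twice in a row in the output; without it the repeats would be merged into one.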

ASR

  • Refactor configs in examples. #1225

TTS

ST

  • Refactor configs in examples. #1225

Text

  • Refactor Punctuation Restoration example. #1215

Docs

  • Add topic note on releasing Python packages.
  • Add TTS papers. #1330
  • Add Frontend G2P topic. #1254

Others

  • Update released models and results. #1306

Acknowledgements

@zh794390558 @yt605155624 @Jackwaterveg @KPatr1ck @Mingxue-Xu @JiehangXie @grasswolfs @jerryuhoo @AdamBear @LittleChenCc @JamesLim-sy