Request to re-test sherpa-ncnn #2

Open
csukuangfj opened this issue Apr 24, 2023 · 4 comments

@csukuangfj

The model small-2023-01-09 is not our best-performing model.

Please have a look at our latest streaming zipformer models at
https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/zipformer-transucer-models.html

They get a reasonable WER even without an LM and are quite fast.

@csukuangfj
Author

Here is the command for testing

./build/bin/sherpa-ncnn \
  ./sherpa-ncnn-streaming-zipformer-en-2023-02-13/tokens.txt \
  ./sherpa-ncnn-streaming-zipformer-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-streaming-zipformer-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-streaming-zipformer-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-streaming-zipformer-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \
  ./sherpa-ncnn-streaming-zipformer-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \
  ./sherpa-ncnn-streaming-zipformer-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \
  ./test-files_en_speech_jfk_11s.wav \
  1 \
  greedy_search

And here is the result

Disable fp16 for Zipformer encoder
Don't Use GPU. has_gpu: 0, config.use_vulkan_compute: 1
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-en-2023-02-13/tokens.txt", encoder num_threads=4, decoder num_threads=4, joiner num_threads=4), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False)
wav filename: ./test-files_en_speech_jfk_11s.wav
wav duration (s): 11
Started!
Done!
Recognition result for ./test-files_en_speech_jfk_11s.wav
text:  AND SAW MY FELLOW AMERICANS ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY
timestamps: 0.8 1.28 1.44 1.68 1.8 1.92 2 2.12 2.2 2.36 2.52 2.8 4 4.2 4.44 5.76 6.08 6.32 6.6 6.84 7.08 7.36 7.64 8.64 8.8 9.04 9.32 9.6 9.8 10 10.16 10.44 10.76
Elapsed seconds: 1.150 s
Real time factor (RTF): 1.150 / 11.000 = 0.105
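
The real time factor in the last line of the log is the elapsed processing time divided by the audio duration, so values below 1 mean faster than real time. A minimal sketch of that arithmetic follows; the function name `real_time_factor` is hypothetical and is not part of sherpa-ncnn.

```python
def real_time_factor(elapsed_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (hypothetical helper)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return elapsed_seconds / audio_seconds


# Values taken from the log above: 1.150 s elapsed for an 11 s wav file.
rtf = real_time_factor(1.150, 11.0)
print(f"RTF: {rtf:.3f}")  # prints "RTF: 0.105"
```

An RTF of about 0.105 means the 11 second clip was decoded roughly 9.5 times faster than real time on that machine.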

@csukuangfj
Author

Note: the above test was run on macOS, but it can also be run on a Raspberry Pi.

@fquirin
Owner

fquirin commented Apr 24, 2023

I will test the new models soon, thanks for mentioning 👍

@fquirin
Owner

fquirin commented Apr 25, 2023

Did a quick test-run, results are definitely much better! 😎👍

Some examples:

Old: PLAY HARD WIFE HERSELF DESTRUCTS BY THE TALLICA
New: PLAY HARD WIRE TO SELF DISTRACTS BY METELICA (pretty close)

Old: WHOM HE WAY WHO THE TRAIN
New: SHOW ME THE WAY FROM NEW YORK TO CHICAGO WITH THE TRAIN (nailed it)

Old: SAID WHEN HE WILL DECREASE
New: SAID THE TWO TO TWENTY ONE DEGREES
Original: "Set the heater to 21 degrees" 😑
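
To put a rough number on the heater example, here is a small sketch of word error rate (WER) computed as word-level edit distance divided by the reference length. It assumes the reference is normalized to "SET THE HEATER TO TWENTY ONE DEGREES" (writing "21" out as words is my assumption), and `word_error_rate` is a hypothetical helper, not a sherpa-ncnn function.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.upper().split()
    hyp = hypothesis.upper().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Assumed normalization of the spoken sentence (digits written as words).
reference = "SET THE HEATER TO TWENTY ONE DEGREES"
old = word_error_rate(reference, "SAID WHEN HE WILL DECREASE")
new = word_error_rate(reference, "SAID THE TWO TO TWENTY ONE DEGREES")
print(f"old WER: {old:.2f}, new WER: {new:.2f}")
```

On this single sentence the old output shares no word with the reference, while the new output differs in two of seven words. One utterance is far too little data for a meaningful WER; the sketch only illustrates how the comparison could be scored.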

Do you have instructions on how to include language models, or maybe a way to add or emphasize custom vocabulary somehow (dynamic graph etc.)?
