A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
-
Updated
May 25, 2024 - Python
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
End-to-End Speech Processing Toolkit
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
The dataset of Speech Recognition
A realtime speech transcription and translation application using Whisper OpenAI and free translation API. Interface made using Tkinter. Code written fully in Python.
Tracking the progress in end-to-end speech translation
Zero -- A neural machine translation system
Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
🖍️ This project combines multiple operations in Microsoft Azure Cognitive Services into one GUI, including QnA Maker, LUIS, Computer Vision, Custom Vision, Face, Form Recognizer, Text To Speech, Speech To Text and Speech Translation. It's very user-friendly for users to implement any operation mentioned above.
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".
code for paper "Cross-modal Contrastive Learning for Speech Translation" (NAACL 2022)
Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.
Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models"
Code for NeurIPS 2023 paper "DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation".
Pushing the Limits of Zero-shot End-to-End Speech Translation
A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/abs/1905.11235.
This repository contains the data resources for the LacunaFund supported project, Multimodal datasets for the Bemba Language of Zambia.
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
This is an implementation of paper "End-to-end Speech Translation via Cross-modal Progressive Training" (Interspeech2021)
Add a description, image, and links to the speech-translation topic page so that developers can more easily learn about it.
To associate your repository with the speech-translation topic, visit your repo's landing page and select "manage topics."