Speech2Text-for-Long-Audio-Files

Speech recognition is a fun task. Many speech recognition APIs are available on the market today, which makes it easy for users to pick one or another. However, processing lengthy audio files is considerably more challenging. I have used the Google Speech-to-Text API to perform this operation.

A simple Demo:

(Use Google Chrome or Microsoft Edge to view the demo.)

Speech2Text-Demo.mp4

Google Speech-to-Text offers three types of API requests, depending on the audio content:

1. Synchronous Request:

The audio content should be approximately 1 minute or shorter to make a synchronous request. With this type of request, the user does not have to upload the data to Google Cloud. This gives users the flexibility to store the audio file on their local computer or server and send it to the API directly to get the text.

2. Asynchronous Request:

The audio content can be up to approximately 480 minutes (8 hours) long. With this type of request, the user has to upload their data to Google Cloud. (This is the approach I am using here.)

3. Streaming Request:

This is suitable for streaming data, where the user talks directly into a microphone and needs the speech transcribed. This type of request is a good fit for chatbots. Again, the streaming data should be approximately a minute long for this type of request.
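For pre-recorded files, the choice between these request types comes down mainly to audio length. The sketch below, assuming WAV input and the approximate limits quoted above (confirm them against Google's current quota documentation), reads the duration with Python's standard `wave` module and picks a request type. The function names and limit constants are hypothetical and are not taken from this repository's scripts.

```python
import wave

# Approximate limits as described above (assumption: confirm the current
# quotas in Google's documentation before relying on them).
SYNC_LIMIT_SECONDS = 60
ASYNC_LIMIT_SECONDS = 480 * 60


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def choose_request_type(duration_seconds: float) -> str:
    """Pick a request type for a pre-recorded file of the given length."""
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "synchronous"
    if duration_seconds <= ASYNC_LIMIT_SECONDS:
        return "asynchronous"
    raise ValueError("audio exceeds the asynchronous limit; split it first")
```

For a WAV file longer than a minute (and under 8 hours), `choose_request_type(wav_duration_seconds(path))` returns `"asynchronous"`, which is the case this project targets. Streaming is left out because it applies to live microphone input rather than stored files.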

Initial Setup

Before we begin, we need to do some initial setup: creating the API client and storing the credential details that you will need later. Please follow this link.
Once the API client is created, the next step is to create a storage bucket.
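One possible way to wire up the credentials and the bucket in Python is sketched below. It assumes a service-account JSON key downloaded during the setup above and the `google-cloud-storage` package; the key path, bucket name, and helper names are hypothetical placeholders. Setting `GOOGLE_APPLICATION_CREDENTIALS` lets the Google client libraries find the key through Application Default Credentials.

```python
import os


def configure_credentials(key_path: str) -> None:
    """Point Google Cloud client libraries at a service-account key file.

    Application Default Credentials read this environment variable, so
    clients such as speech.SpeechClient() or storage.Client() created
    afterwards will authenticate with this key.
    """
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path


def ensure_bucket(storage_client, bucket_name: str):
    """Return the named bucket, creating it first if it does not exist.

    `storage_client` is expected to be a google.cloud.storage.Client;
    lookup_bucket() returns None when the bucket is missing.
    """
    bucket = storage_client.lookup_bucket(bucket_name)
    if bucket is None:
        bucket = storage_client.create_bucket(bucket_name)
    return bucket


# Example usage (hypothetical key path and bucket name):
# configure_credentials("path/to/service-account-key.json")
# from google.cloud import storage
# bucket = ensure_bucket(storage.Client(), "my-speech2text-bucket")
```

Bucket names are globally unique in Google Cloud Storage, so `create_bucket` can fail if someone else already owns the name; pick a project-specific one.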

My methodology for converting speech to text:

Import the necessary packages.
Audio file encoding: You can read about it here.
Audio file specifications: Another limitation is that, with the default single-channel configuration, the API does not accept stereo audio files. So we need to convert a stereo file to a mono file before using the API. In addition, we also have to provide the frame rate (sample rate) of the audio file. I already implemented a function in the code to convert the audio files to .wav format.
Upload files to Google Storage: To perform an asynchronous request, the file is uploaded to Google Cloud Storage.
Delete files in Google Storage: Once the speech-to-text operation is completed, the file can be deleted from Google Cloud Storage to avoid unnecessary costs.
Transcribe: Convert the speech to plain text and save each result as a separate transcript (text file). A sample transcript looks like this:
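The mono conversion and frame-rate step can be sketched with only the standard library, assuming 16-bit PCM WAV input. This is an illustrative alternative, not the repository's own conversion function (which may rely on a third-party audio library); the helper names are hypothetical. The returned frame rate is the value to pass as `sample_rate_hertz` in the recognition config.

```python
import struct
import wave


def stereo_to_mono_16bit(frames: bytes) -> bytes:
    """Average the left/right channels of interleaved 16-bit little-endian PCM."""
    count = len(frames) // 2  # number of 16-bit samples
    samples = struct.unpack(f"<{count}h", frames[: count * 2])
    # Samples alternate L, R, L, R...; average each pair into one mono sample.
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, count - 1, 2)]
    return struct.pack(f"<{len(mono)}h", *mono)


def convert_wav_to_mono(src_path: str, dst_path: str) -> int:
    """Write a mono copy of a 16-bit WAV file and return its frame rate."""
    with wave.open(src_path, "rb") as src:
        channels = src.getnchannels()
        sampwidth = src.getsampwidth()
        framerate = src.getframerate()
        frames = src.readframes(src.getnframes())

    if sampwidth != 2:
        raise ValueError("this sketch only handles 16-bit PCM audio")
    if channels == 2:
        frames = stereo_to_mono_16bit(frames)
    elif channels != 1:
        raise ValueError("expected mono or stereo audio")

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(framerate)
        dst.writeframes(frames)
    return framerate
```

Averaging the two channels keeps every value inside the 16-bit range, so no clipping step is needed.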
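The upload, transcribe, and delete steps listed above map onto the `google-cloud-storage` and `google-cloud-speech` client libraries roughly as follows. This is a minimal sketch, assuming mono 16-bit LINEAR16 WAV files and credentials already configured; the function names, bucket name, language code, and timeout are hypothetical. The Google imports sit inside the functions so the pure helpers work without those packages installed.

```python
import os


def upload_blob(bucket_name: str, source_path: str, blob_name: str) -> str:
    """Upload a local file to Cloud Storage and return its gs:// URI."""
    from google.cloud import storage  # requires google-cloud-storage
    bucket = storage.Client().bucket(bucket_name)
    bucket.blob(blob_name).upload_from_filename(source_path)
    return f"gs://{bucket_name}/{blob_name}"


def delete_blob(bucket_name: str, blob_name: str) -> None:
    """Delete the uploaded file once transcription is done (avoids storage costs)."""
    from google.cloud import storage
    storage.Client().bucket(bucket_name).blob(blob_name).delete()


def join_transcript(response) -> str:
    """Concatenate the top alternative of every result into one string."""
    return " ".join(
        result.alternatives[0].transcript.strip()
        for result in response.results
        if result.alternatives
    )


def transcribe_long_audio(gcs_uri: str, sample_rate_hertz: int,
                          language_code: str = "en-US", timeout: int = 3600) -> str:
    """Run an asynchronous (long-running) recognition on a gs:// URI."""
    from google.cloud import speech  # requires google-cloud-speech
    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(uri=gcs_uri)
    operation = client.long_running_recognize(config=config, audio=audio)
    return join_transcript(operation.result(timeout=timeout))


def write_transcript(text: str, out_dir: str, name: str) -> str:
    """Save one transcript as <out_dir>/<name>.txt and return the path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, name + ".txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path


# Example flow (hypothetical bucket and file names):
# uri = upload_blob("my-speech2text-bucket", "audio_wav/sample.wav", "sample.wav")
# text = transcribe_long_audio(uri, sample_rate_hertz=16000)
# write_transcript(text, "Transcripts", "sample")
# delete_blob("my-speech2text-bucket", "sample.wav")
```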

What if I have more than one speaker in my audio file, like in a conversation?

Speaker diarization is the process of distinguishing between speakers in an audio file. I used the Google Speech-to-Text API to perform speaker diarization, which is provided as a separate script. The final transcripts generated by Google after speaker diarization look like this:
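Enabling diarization is mostly a configuration change. The sketch below, assuming the `google-cloud-speech` v1 client and a conversation with exactly two speakers, builds a diarized recognition config and groups the tagged words into speaker turns. According to Google's documentation, the word list of the last result's top alternative contains all words with their `speaker_tag`, which is why the usage comment reads `response.results[-1]`. The helper names and speaker counts are assumptions, not the repository's script.

```python
def diarization_config(sample_rate_hertz: int, min_speakers: int = 2,
                       max_speakers: int = 2, language_code: str = "en-US"):
    """Build a RecognitionConfig with speaker diarization enabled."""
    from google.cloud import speech  # requires google-cloud-speech
    return speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
        diarization_config=speech.SpeakerDiarizationConfig(
            enable_speaker_diarization=True,
            min_speaker_count=min_speakers,
            max_speaker_count=max_speakers,
        ),
    )


def group_by_speaker(words) -> list:
    """Merge consecutive words that share a speaker_tag into speaker turns."""
    turns = []  # list of (speaker_tag, [words...])
    for w in words:
        if turns and turns[-1][0] == w.speaker_tag:
            turns[-1][1].append(w.word)
        else:
            turns.append((w.speaker_tag, [w.word]))
    return [f"Speaker {tag}: {' '.join(ws)}" for tag, ws in turns]


# Example usage (hypothetical; `client` and `audio` as in a long-running request):
# response = client.long_running_recognize(
#     config=diarization_config(16000), audio=audio).result(timeout=3600)
# words = response.results[-1].alternatives[0].words
# print("\n".join(group_by_speaker(words)))
```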

Now, why don't you go ahead, record some of your voice notes or meetings, and transcribe them using Speech2Text? :)

prateekralhan/Speech2Text-for-Long-Audio-Files

Folders and files

Latest commit

History

Repository files navigation

Speech2Text-for-Long-Audio-Files

A simple Demo:

1. Synchronous Request:

2. Asynchronous Request:

3. Streaming Request:

Initial Setup

What if I have more than 1 speaker in my audio file? Like a conversation!!?

Now, why not you go ahead and record some voice notes of yours or some meetings and transcribe them using Speech2Text?? :)

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages