
How to get audio? #8

Open
asker-github opened this issue Nov 26, 2020 · 1 comment

@asker-github

Hello, I'm trying to use your model to test my own video, but the localization results I get are poor. How do you process a video to get its audio?

This is how I extract audio:

```python
from moviepy.editor import AudioFileClip

audioclip = AudioFileClip(video_path)             # read the video's audio track
audioclip.write_audiofile(audio_path, fps=48000)  # save as wav
```

The audio read back this way has shape (n, 2) (stereo), so I averaged the two channels to make the program run:

```python
# extractor.py
rate, sample = wavfile.read(aud_path)
sample = np.mean(sample, axis=1)  # TODO: I added this myself
```

Since my localization results are poor, I'd like to know how you extract audio from your own videos.
In addition, do you get good localization results?

@kyuyeonpooh
Owner

kyuyeonpooh commented Nov 29, 2020

Hi,

Thank you for your interest in my code and project.


To extract audio files from videos, I used ffmpeg.
Keep in mind that people usually use mono (single-channel) audio to obtain audio features.

The command below is what I used:

```shell
ffmpeg -y -i <input_video.mp4> -ac 1 -ar <sampling_rate> -vn <output_audio.wav>
```

Please also consider using ffmpeg-python if you want a Python wrapper for ffmpeg.
The code below (URL) is an example of extracting wav files from videos with ffmpeg-python:
https://github.com/kyuyeonpooh/VAT-Net/blob/54ba38c45f40f22c9e15fb67e0c24aa22469184c/extract.py#L92-L109


In utils/extractor.py, there is code that preprocesses audio files into spectrograms:

```python
def extract_spectrogram(
    self, aud_file, sr=48000, winsize=480, overlap=0.5, nfft=512, logscale=True, eps=1e-7, **kwargs
):
    # parse audio ID from audio file path
    aud_path = os.path.join(self.src_aud_dir, aud_file)
    aud_id = os.path.splitext(aud_file)[0][len(self.aud_fname_head):]
    # audio file reading with validity check on arguments
    try:
        rate, sample = wavfile.read(aud_path)
    except:
        print("Failed to open wav file, aud_id: {}".format(aud_id))
        return False
    if rate != sr:
        print("Given sampling rate does not match, aud_id: {}".format(aud_id))
        return False
    duration = len(sample) / sr
    if self.start_pos + self.interval * self.nseg > duration:
        print("Error in audio file or in method arguments, aud_id: {}".format(aud_id))
        return False
    # extract spectrograms
    spec_dict = dict()
    seg_count = 0
    start = self.start_pos
    end = start + self.interval
    try:
        while seg_count < self.nseg:
            cur_sample = sample[int(start * sr) : int(end * sr)]
            freq, time, spectrogram = signal.spectrogram(
                cur_sample, fs=sr, nperseg=winsize, noverlap=winsize * overlap, nfft=nfft
            )
            # convert into log-scale spectrogram (magnitude to decibel)
            if logscale:
                spectrogram = 10 * np.log10(spectrogram + eps)
            # update interval pointers
            spec_dict[str(seg_count)] = spectrogram
            start += self.interval
            end += self.interval
            seg_count += 1
    except:
        print("Error occurs when extracting a spectrogram from audio, aud_id: {}".format(aud_id))
        return False
    # save into npz file
    np.savez_compressed(os.path.join(self.dst_aud_dir, aud_id + ".npz"), **spec_dict)
    return True
```

Here is a more detailed explanation of the source code above:

  1. Read an audio file with wavfile.read().
  2. Extract a particular 1-second interval.
  3. Convert the audio interval into a spectrogram using scipy.signal.spectrogram().
  4. Convert the spectrogram to log scale.
  5. Before feeding the spectrogram into the network, I normalized each spectrogram with its mean and standard deviation. (Please refer to this code.)
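The five steps above can be sketched end to end on a synthetic signal. This is only a sketch: the synthetic sine tone stands in for wavfile.read() output, the spectrogram parameters match the defaults of extract_spectrogram(), and the step-5 standardization is written out explicitly here rather than taken from the repo:

```python
import numpy as np
from scipy import signal

sr, winsize, nfft, eps = 48000, 480, 512, 1e-7

# 1-2. a synthetic 1-second mono interval stands in for a slice of wavfile.read() output
t = np.arange(sr) / sr
sample = np.sin(2 * np.pi * 440 * t).astype(np.float32)

# 3. spectrogram with the same parameters as extract_spectrogram()
freq, time, spec = signal.spectrogram(
    sample, fs=sr, nperseg=winsize, noverlap=winsize // 2, nfft=nfft
)

# 4. magnitude -> decibel
spec = 10 * np.log10(spec + eps)

# 5. per-spectrogram standardization before feeding it to the network
spec = (spec - spec.mean()) / (spec.std() + eps)
```

With these parameters, the resulting spectrogram has nfft // 2 + 1 = 257 frequency bins.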

In addition, since mel-spectrograms have become standard in audio processing,
please also consider using librosa to convert wav files into mel-spectrograms.
For this, you can refer to the code below (URL):
https://github.com/kyuyeonpooh/VAT-Net/blob/54ba38c45f40f22c9e15fb67e0c24aa22469184c/datasets/VGGSound.py#L114-L120


I hope this answer helps you understand the procedure of extracting and preprocessing audio.

If you have any more questions, please do not hesitate to leave an issue. Thanks.
