
How to get audio? #8

Open
asker-github opened this issue Nov 26, 2020 · 1 comment

@asker-github

Hello, I'm trying to use your model to test my own video, but the localization results I get are poor. How do you process a video to get its audio?

This is how I extract audio:

```python
from moviepy.editor import AudioFileClip

audioclip = AudioFileClip(video_path)             # read the video's audio track
audioclip.write_audiofile(audio_path, fps=48000)  # save as wav
```

The audio read back this way has shape (n, 2) (stereo), so I averaged the two channels to make the program run:

```python
# extractor.py
rate, sample = wavfile.read(aud_path)
sample = np.mean(sample, axis=1)  # TODO: I added this myself
```

Since my localization results are poor, I'd like to know how you extract audio from your own videos.
In addition, do you get good localization results?

@kyuyeonpooh
Owner

kyuyeonpooh commented Nov 29, 2020

Hi,

Thank you for your interest in my code and project.


To extract audio files from videos, I used ffmpeg.
Keep in mind that people usually use mono (single-channel) audio to obtain audio features.

The command below is what I used:

```shell
ffmpeg -y -i <input_video.mp4> -ac 1 -ar <sampling_rate> -vn <output_audio.wav>
```

Please also consider using ffmpeg-python if you want a Python wrapper for ffmpeg.
The code below (URL) is an example of extracting wav files from videos with ffmpeg-python:
https://github.com/kyuyeonpooh/VAT-Net/blob/54ba38c45f40f22c9e15fb67e0c24aa22469184c/extract.py#L92-L109


In utils/extractor.py, there is code that preprocesses audio files into spectrograms:

```python
def extract_spectrogram(
    self, aud_file, sr=48000, winsize=480, overlap=0.5, nfft=512, logscale=True, eps=1e-7, **kwargs
):
    # parse audio ID from audio file path
    aud_path = os.path.join(self.src_aud_dir, aud_file)
    aud_id = os.path.splitext(aud_file)[0][len(self.aud_fname_head):]
    # audio file reading with validity check on arguments
    try:
        rate, sample = wavfile.read(aud_path)
    except:
        print("Failed to open wav file, aud_id: {}".format(aud_id))
        return False
    if rate != sr:
        print("Given sampling rate does not match, aud_id: {}".format(aud_id))
        return False
    duration = len(sample) / sr
    if self.start_pos + self.interval * self.nseg > duration:
        print("Error in audio file or in method arguments, aud_id: {}".format(aud_id))
        return False
    # extract spectrograms
    spec_dict = dict()
    seg_count = 0
    start = self.start_pos
    end = start + self.interval
    try:
        while seg_count < self.nseg:
            cur_sample = sample[int(start * sr) : int(end * sr)]
            freq, time, spectrogram = signal.spectrogram(
                cur_sample, fs=sr, nperseg=winsize, noverlap=winsize * overlap, nfft=nfft
            )
            # convert into log-scale spectrogram (magnitude to decibel)
            if logscale:
                spectrogram = 10 * np.log10(spectrogram + eps)
            # update interval pointers
            spec_dict[str(seg_count)] = spectrogram
            start += self.interval
            end += self.interval
            seg_count += 1
    except:
        print("Error occurs when extracting a spectrogram from audio, aud_id: {}".format(aud_id))
        return False
    # save into npz file
    np.savez_compressed(os.path.join(self.dst_aud_dir, aud_id + ".npz"), **spec_dict)
    return True
```

Here is a more detailed explanation of the source code above:

  1. Read an audio file with wavfile.read().
  2. Extract a particular 1-second interval.
  3. Convert the audio interval into a spectrogram using scipy.signal.spectrogram().
  4. Convert the spectrogram to log scale.
  5. Before feeding the spectrogram into the network, I normalized each spectrogram with its mean and standard deviation. (Please refer to this code.)
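The five steps above can be sketched end to end on a synthetic signal. This is only a sketch: the synthetic sine tone stands in for wavfile.read() output, the spectrogram parameters match the defaults of extract_spectrogram(), and the step-5 standardization is written out explicitly here rather than taken from the repo:

```python
import numpy as np
from scipy import signal

sr, winsize, nfft, eps = 48000, 480, 512, 1e-7

# 1-2. a synthetic 1-second mono interval stands in for a slice of wavfile.read() output
t = np.arange(sr) / sr
sample = np.sin(2 * np.pi * 440 * t).astype(np.float32)

# 3. spectrogram with the same parameters as extract_spectrogram()
freq, time, spec = signal.spectrogram(
    sample, fs=sr, nperseg=winsize, noverlap=winsize // 2, nfft=nfft
)

# 4. magnitude -> decibel
spec = 10 * np.log10(spec + eps)

# 5. per-spectrogram standardization before feeding it to the network
spec = (spec - spec.mean()) / (spec.std() + eps)
```

With these parameters, the resulting spectrogram has nfft // 2 + 1 = 257 frequency bins.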

In addition, since mel-spectrograms have become standard in audio processing,
please also consider using librosa to convert wav files into mel-spectrograms.
For this, you can refer to the code below (URL):
https://github.com/kyuyeonpooh/VAT-Net/blob/54ba38c45f40f22c9e15fb67e0c24aa22469184c/datasets/VGGSound.py#L114-L120


I hope this answer helps you understand the procedure of extracting and preprocessing audio.

If you have any more questions, please do not hesitate to leave an issue. Thanks.
