Hum2Guitar

Acoustic Guitar Timbre Transfer using Pix2PixHD architecture

✨ Inspiration

I have ALWAYS loved playing the guitar🎸 since I was young. (I especially enjoy fingerstyle playing.) When I first started practicing, my fingers wouldn't cooperate, so I once dreamed, "If only my humming could turn into guitar sound." It was a wild dream at the time, so I just practiced the guitar harder. 😂 This project began simply out of curiosity about that memory.

Approach

✅ Timbre-Transfer using Image-to-Image translation techniques

  • I used the mel spectrogram as the input image, since it shows the time-frequency characteristics of sound.
  • However, the mel spectrogram contains only magnitude information, so I used the Griffin-Lim algorithm as a baseline for phase reconstruction (see the sketch below).
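Below is a minimal sketch of this front end, assuming a librosa-based pipeline: a magnitude-only log-mel spectrogram as the model's "image", and librosa's Griffin-Lim mel inversion as the phase-reconstruction baseline. The parameter values (sr, n_fft, hop_length, n_mels) are illustrative assumptions, not the repository's exact settings.

import librosa
import numpy as np

SR, N_FFT, HOP, N_MELS = 22050, 1024, 256, 128   # illustrative settings

def audio_to_mel(y: np.ndarray) -> np.ndarray:
    """Magnitude-only mel spectrogram used as the 'image' input."""
    mel = librosa.feature.melspectrogram(
        y=y, sr=SR, n_fft=N_FFT, hop_length=HOP, n_mels=N_MELS, power=1.0
    )
    return np.log1p(mel)   # log-compress so the GAN sees a bounded dynamic range

def mel_to_audio(log_mel: np.ndarray) -> np.ndarray:
    """Griffin-Lim baseline: invert the mel filterbank, then estimate the phase."""
    mel = np.expm1(log_mel)
    return librosa.feature.inverse.mel_to_audio(
        mel, sr=SR, n_fft=N_FFT, hop_length=HOP, power=1.0, n_iter=60
    )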

✅ Semantic Harmonics as input semantic label

  • I extract the fundamental frequency (F0) from the audio signal and use the fact that its positive integer multiples are harmonics to create a semantic label (sketched below).
  • I'll refer to these artificially generated harmonics as "Semantic Harmonics".
  • This way, we can create our own paired dataset. As you may have guessed, not only humming but any pitched sound can be transformed into a guitar sound!
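The sketch below illustrates the Semantic Harmonics idea under my own assumptions (librosa's pyin for F0 estimation, a binary mask on the mel frequency axis). It is an illustrative reconstruction, not the repository's exact code; the function name semantic_harmonics and its parameters are hypothetical.

import librosa
import numpy as np

def semantic_harmonics(y, sr=22050, n_fft=1024, hop=256, n_mels=128, n_harmonics=20):
    """Estimate F0 per frame, then mark its integer multiples on a mel-bin grid."""
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"),
        sr=sr, frame_length=n_fft, hop_length=hop
    )
    freqs = librosa.mel_frequencies(n_mels=n_mels, fmin=0.0, fmax=sr / 2)  # Hz per mel bin
    label = np.zeros((n_mels, len(f0)), dtype=np.float32)
    for t, (hz, v) in enumerate(zip(f0, voiced)):
        if not v or np.isnan(hz):
            continue                                   # unvoiced frame: leave the column empty
        for k in range(1, n_harmonics + 1):            # F0, 2*F0, 3*F0, ...
            if k * hz >= sr / 2:
                break
            label[np.argmin(np.abs(freqs - k * hz)), t] = 1.0
    return label   # binary harmonic mask aligned with the mel spectrogram frames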

✅ Pix2PixHD Architecture

  • At first, I tried the Pix2Pix architecture, but it didn't represent local information well, resulting in a lack of sharpness in the output audio.
  • Therefore, I switched to the Pix2PixHD architecture, which is known for capturing fine-grained local detail (see the sketch below). (Other SOTA architectures are also worth trying.)
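As an illustration of what Pix2PixHD adds over Pix2Pix, here is a minimal PyTorch sketch of its multi-scale discriminator from [1]: the same PatchGAN judges the full-resolution mel "image" and progressively downsampled copies, which encourages both global structure and local sharpness. Layer sizes are illustrative and not the repository's exact configuration.

import torch
import torch.nn as nn

def patch_discriminator(in_ch: int) -> nn.Sequential:
    """A small PatchGAN discriminator producing per-patch real/fake logits."""
    def block(ci, co, norm=True):
        layers = [nn.Conv2d(ci, co, 4, stride=2, padding=1)]
        if norm:
            layers.append(nn.InstanceNorm2d(co))
        layers.append(nn.LeakyReLU(0.2, inplace=True))
        return layers
    return nn.Sequential(
        *block(in_ch, 64, norm=False), *block(64, 128), *block(128, 256),
        nn.Conv2d(256, 1, 4, padding=1)
    )

class MultiScaleDiscriminator(nn.Module):
    def __init__(self, in_ch: int, num_scales: int = 2):
        super().__init__()
        self.nets = nn.ModuleList([patch_discriminator(in_ch) for _ in range(num_scales)])
        self.down = nn.AvgPool2d(3, stride=2, padding=1)

    def forward(self, x):
        outs = []
        for net in self.nets:
            outs.append(net(x))
            x = self.down(x)   # the next discriminator sees a coarser view of the input
        return outs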

✅ Data Segmentation Instead of Data Augmentation

  • Only 180 "solo" samples are available in the GuitarSet dataset, which is extremely small. I thought more audio samples covering various pitches were required.
  • Although the dataset was very small, I didn't apply any augmentation, because playing low notes on the guitar is not simply a matter of lowering the pitch: the resonance when plucking the strings also changes. Low notes produce more "buzzing", and a simple pitch shift does not reflect this.
  • Instead, I segmented the audio files into 5-second chunks and stored them to enable more weight updates (see the sketch below).
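A minimal sketch of this segmentation step, assuming librosa/soundfile for audio I/O; the file naming and output layout are hypothetical, not the repository's exact scheme.

import librosa
import soundfile as sf
from pathlib import Path

def segment_file(path, out_dir, sr=22050, seconds=5.0):
    """Split one recording into consecutive fixed-length (5 s) chunks."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    chunk = int(sr * seconds)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for n, start in enumerate(range(0, len(y) - chunk + 1, chunk)):
        piece = y[start : start + chunk]          # exactly 5 s; a shorter tail is dropped
        sf.write(out_dir / f"{Path(path).stem}_{n:03d}.wav", piece, sr)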

Data Pre-Processing

Training

Reconstructing the Audio Signal from the Synthesized Mel Spectrogram


Dataset

GuitarSet

MTG-QBH


Run

$ cd hum2guitar
$ python source/train.py --guitar_dir GUITARSET_DIR --humming_dir HUMMING_DIR  
  • Check utils/env.py and args.py for more training details.

Results

Humming to Guitar

+) Guitar to Guitar

How close is the sound restored from the guitar's own semantic harmonics (rather than humming semantic harmonics) to the original guitar recording?

  • ❗️ These examples give a sense of how accurately the model can restore audio from semantic harmonics.
  • ❗️ Comparing the synthesized audio with the actual guitar input, the synthesized audio is "similar" to the real guitar sound, but there are distinct differences in timbre.
  • ❗️ To locate the source of these differences, I thought it would be reasonable to also convert the real mel spectrograms back into audio.
  • ❗️ I confirmed that even when converting the real guitar's mel spectrogram into audio, the timbre of the input is not reproduced accurately. Therefore, to reproduce the guitar's timbre more faithfully, better methods for restoring the "phase information" are needed.
  • ❗️ In other words, I need either a better inversion method than the GLA or better features than the mel spectrogram.

Outro

  • What I obtained from this project was not an exact guitar performance but rather a sound resembling a guitar. My model produces a sound similar to when I first started playing the guitar, with a gentle plucking feel, as if using fingers instead of a pick.
  • Of course, more advanced generation algorithms could achieve a sound closer to a real guitar. To address this, I need to keep exploring various approaches and continue to improve.
  • Anyway, I felt happy during this project because it made me feel like I fulfilled my childhood dream on my own. 😆

+) To do: Key points I need to focus on

  • The sound of a guitar is influenced by various factors:
    • the physical elements of the guitar itself: wood, strings, string height ...
    • the elements related to "playing": the timbre of techniques such as sliding and hammering ...
  • It would be beneficial to first explore the frequency-domain characteristics and also investigate the impulse response of an acoustic guitar.
  • Hyper-parameter optimization
  • Attempts with different generative models
  • Mel-to-audio inversion methods (focusing on phase reconstruction)

Reference

[1] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro. "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs", in CVPR, 2018.
[2] Pix2PixHD official repository: https://github.com/NVIDIA/pix2pixHD/tree/master
[3] PyCeps: https://github.com/hwang9u/pyceps

Cite

If you want to use this code, please cite as follows:

@misc{hwang9u-hum2guitar,
  author = {Kim, Seonju},
  title = {hum2guitar},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/hwang9u/hum2guitar}},
}
