WaveSimSiam

Self-Supervised Learning for General Audio Representation of Raw Waveform

Description

자기지도학습은 데이터가 내포하고 있는 일반적인 표현을 학습하기 위한 방법론이다. 본 프로젝트에서는 SimSiam 모델을 기반으로 원시 오디오 파형에서 일반적인 표현을 학습할 수 있는 WaveSimSiam모델을 제안한다. WaveSimSiam은 오디오의 일반적인 표현을 학습할 수 있도록 원시 오디오 파형 데이터에 적합한 Augmentation Layer와 Encoding Layer를 설계하였다. 제안한 모델을 평가하기 위해 선형 분류기 평가 방식으로 분류와 인식 기반의 실제 작업을 진행하였다. WaveSimSiam은 대부분의 실제 작업에서 기존 논문들보다 향상된 성능결과를 보였다.

Getting Started

Datasets

FSD50K - Training Pretext task
UrbanSound8K - Downstream task
VoxCeleb1 - Downstream task
SpeechCommandV2 - Downstream task
RAVDESS - Downstream task
VoxForge - Downsteam task

Dependencies

Linux Ubuntu, Nvidia Docker, Python
adamp 0.3.0
scikit-learn 1.0.2
numpy 1.21.6
tensorboard 2.8.0
torch 1.12.0
torchvision 0.13.0
torchaudio 0.12.0
tqdm 4.63.1
sox 1.4.1
soundfile 0.10.3
natsort 8.1.0
WavAugment

Pretext Task

Make up own your configuration file. (There is an pretext example in the config folder)
You can modify this part at train.py

...
...

os.environ['CUDA_VISIBLE_DEVICES'] = '1' # input tranning cuda device number


...
...


def main():
    parser = argparse.ArgumentParser(description='waverdeep - WaveSimSiam')
    parser.add_argument("--configuration", required=False,
                        default='./config/write down your configuration file name')
                        
...
...

And then, start pretext task training!

python train.py

Downstream Task

Currently, only transfer learning is implemented in this project.

Make up own your configuration file. (There is an transfer learning example in the config folder)
You can modify this part at train.py

...
...

os.environ['CUDA_VISIBLE_DEVICES'] = '1' # input tranning cuda device number


...
...


def main():
    parser = argparse.ArgumentParser(description='waverdeep - WaveSimSiam')
    parser.add_argument("--configuration", required=False,
                        default='./config/write down your configuration file name')
                        
...
...

And then, start pretext task training!

python train.py

Authors

waverDeep

Version History

* Initial Release

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
config_FT01		config_FT01
config_FT02		config_FT02
config_T01		config_T01
config_T01_DRP		config_T01_DRP
src		src
License		License
README.md		README.md
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

config_FT01

config_FT01

config_FT02

config_FT02

config_T01

config_T01

config_T01_DRP

config_T01_DRP

src

src

License

License

README.md

README.md

train.py

train.py

Repository files navigation

WaveSimSiam

Description

Getting Started

Datasets

Dependencies

Pretext Task

Downstream Task

Authors

Version History

License

Acknowledgments

About

Releases

Packages

Languages

License

waverdeep/WaveSimSiam

Folders and files

Latest commit

History

Repository files navigation

WaveSimSiam

Description

Getting Started

Datasets

Dependencies

Pretext Task

Downstream Task

Authors

Version History

License

Acknowledgments

About

Topics

Resources

License

Stars

Watchers

Forks

Languages