[CVPR 2024] AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion


Kiran Chhatre · Radek Daněček · Nikos Athanasiou
Giorgio Becherini · Christopher Peters · Michael J. Black · Timo Bolkart


Project Page · Paper PDF · Intro Video · Poster PDF



This is the repository for AMUSE: Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion. AMUSE generates realistic emotional 3D body gestures directly from a speech sequence. It also provides user control over the generated emotion by combining the driving speech with a different emotional audio.

News 🚩

  • [2024/06/12] Code is available.
  • [2024/02/27] AMUSE has been accepted for CVPR 2024! Working on code release.
  • [2023/12/08] The arXiv preprint is available.

Setup

Main Repo Setup

# Clone the main repo and the two helper repos it expects under dm/utils:
git clone https://github.com/kiranchhatre/amuse.git
cd amuse/dm/utils/
git clone https://github.com/kiranchhatre/sk2torch.git
git clone -b init https://github.com/kiranchhatre/PyMO.git
cd ../..
git submodule update --remote --merge --init --recursive
git submodule sync

# Register the helper repos as submodules; git reuses the clones made above:
git submodule add https://github.com/kiranchhatre/sk2torch.git dm/utils/sk2torch
git submodule add -b init https://github.com/kiranchhatre/PyMO.git dm/utils/PyMO

git submodule update --init --recursive

# Stage the submodule configuration:
git add .gitmodules dm/utils/sk2torch dm/utils/PyMO
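
If the commands above succeeded, both helper repos are registered as submodules. A quick, optional sanity check (not part of the original instructions):

git submodule status dm/utils/sk2torch dm/utils/PyMO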

Environment Setup

conda create -n amuse python=3.8
conda activate amuse
# The CUDA 11.3 location below is specific to the authors' cluster; point it at your install:
export CUDA_HOME=/is/software/nvidia/cuda-11.3
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
conda env update --file amuse.yml --prune
module load cuda/11.3   # cluster module system; skip if CUDA 11.3 is already on your PATH
conda install anaconda::gxx_linux-64   # installs gxx 11.2.0
FORCE_CUDA=1 pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1110/download.html
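
As an optional sanity check (not part of the original instructions), confirm that PyTorch was built against CUDA 11.3, that a GPU is visible, and that pytorch3d imports cleanly:

python -c "import torch, pytorch3d; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"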

Blender Setup

conda deactivate
conda env create -f blender.yaml
AMUSEPATH=$(pwd)
cd ~
wget https://download.blender.org/release/Blender3.4/blender-3.4.1-linux-x64.tar.xz
tar -xvf ./blender-3.4.1-linux-x64.tar.xz
cd ~/blender-3.4.1-linux-x64/3.4
# Swap Blender's bundled Python for the "blender" conda env created above;
# point the symlink at that env's location in your own conda installation:
mv python/ _python/
ln -s ~/anaconda3/envs/blender ./python
cd "$AMUSEPATH"
cd scripts
conda activate amuse
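
To verify the symlinked interpreter (an optional check, not part of the original instructions; adjust the Blender path if you unpacked it elsewhere):

~/blender-3.4.1-linux-x64/blender --background --python-expr "import sys; print(sys.executable)"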

Data Setup and Blender Resources

Follow the instructions at https://amuse.is.tue.mpg.de/download.php.


Tasks

Once the setup above is complete, you can run the following tasks:

  • train_audio (training step 1/2)
    Train the speech disentanglement model.

    cd $AMUSEPATH/scripts
    python main.py --fn train_audio
  • train_gesture (training step 2/2)
    Train the gesture generation model.

    cd $AMUSEPATH/scripts
    python main.py --fn train_gesture
  • infer_gesture
    Run AMUSE inference on a single 10 s WAV monologue audio sequence.
    Place the audio in $AMUSEPATH/viz_dump/test/speech.
    The video of the generated gesture will be written to $AMUSEPATH/viz_dump/test/gesture. A worked end-to-end example follows this list.

    cd $AMUSEPATH/scripts
    python main.py --fn infer_gesture
  • edit_gesture
    COMING SOON

    cd $AMUSEPATH/scripts
    python main.py --fn edit_gesture
  • bvh2smplx_
    Convert BVH to SMPL-X (works only with the BMAP presets provided on the AMUSE website download page).
    Highly experimental and unsupported. Place the BVH file inside $AMUSEPATH/data/beat-rawdata-eng/beat_rawdata_english/<actor_id>, where actor_id is between 1 and 30. The converted file will be written to $AMUSEPATH/viz_dump/smplx_conversions.

    cd $AMUSEPATH/scripts
    python main.py --fn bvh2smplx_

    Once converted, import the file into Blender using the SMPL-X Blender add-on. Remember to set the target FPS (24 FPS for the current file) in the import-animation window when importing the NPZ file.

  • prepare_data
    Prepare data to train AMUSE on BEAT 0.2.1, BEAT-X, or a custom dataset. COMING SOON: conversion script and dataloader LMDB file creation.

    cd $AMUSEPATH/scripts
    python main.py --fn prepare_data
  • other
    COMING SOON
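
A worked end-to-end example of the infer_gesture task above (a sketch, not part of the original instructions: my_talk.wav is a placeholder filename, and the mono conversion is an assumption; AMUSE expects a 10 s monologue WAV):

# Trim a recording to a 10 s clip with ffmpeg (mono conversion is an assumption):
ffmpeg -i my_talk.wav -t 10 -ac 1 speech_10s.wav
mkdir -p "$AMUSEPATH/viz_dump/test/speech"
cp speech_10s.wav "$AMUSEPATH/viz_dump/test/speech/"
cd "$AMUSEPATH/scripts"
python main.py --fn infer_gesture
# The rendered gesture video is written here:
ls "$AMUSEPATH/viz_dump/test/gesture"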


Citation

@InProceedings{Chhatre_2024_CVPR,
    author    = {Chhatre, Kiran and Daněček, Radek and Athanasiou, Nikos and Becherini, Giorgio and Peters, Christopher and Black, Michael J. and Bolkart, Timo},
    title     = {{AMUSE}: Emotional Speech-driven {3D} Body Animation via Disentangled Latent Diffusion},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {1942-1953},
    url = {https://amuse.is.tue.mpg.de},
}

Contact

For any inquiries, please contact [email protected]. Feel free to use this project and to contribute improvements.