
take a look at this #31

Closed · fenghe12 opened this issue Mar 26, 2024 · 20 comments
@fenghe12

https://github.com/Zejun-Yang/AniPortrait is a new open-source talking-head generation repo. It seems very similar to Emote.

@johndpope (Owner)

I did see it - I'll add it to the readme. Seems a complete rip-off of MooreThreads' AnimateAnyone PoseGuider: https://github.com/MooreThreads/Moore-AnimateAnyone/blob/master/train_stage_1.py#L54
The disappointing thing is there's no training code, so all the models are locked up.

@fenghe12 (Author)

Maybe you can refer to their model code - at least we have that. The training code seems to be just like AnimateAnyone's.

@johndpope (Owner)

Looking at the saved models - they look less scary than this mess:
MStypulkowski/diffused-heads#21

Can probably just load these into a simpler architecture.
[Screenshot from 2024-03-27 12-51-48]
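If we go that route, a sensible first step is just inspecting the released state dicts before committing to an architecture. A minimal sketch (checkpoint filename taken from the AniPortrait release; `my_simpler_unet` is a hypothetical target module):

```python
import torch

# Inspect what's actually inside the released checkpoint first.
state = torch.load("./pretrained_model/denoising_unet.pth", map_location="cpu")
for name, tensor in list(state.items())[:10]:
    print(name, tuple(tensor.shape))

# Then load into whatever architecture matches; strict=False reports
# missing/unexpected keys instead of failing outright.
# missing, unexpected = my_simpler_unet.load_state_dict(state, strict=False)
```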

I think the reader/writer is just there to run the parallel UNet - the reader digs the features out of ReferenceNet and throws them to the backbone:
https://github.com/Zejun-Yang/AniPortrait/blob/main/train_stage_1.py#L53
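If I'm reading the training script right, this is the write/read spatial-attention trick from AnimateAnyone. A minimal sketch of the idea (module and method names are hypothetical, not the repo's actual classes):

```python
import torch
import torch.nn as nn

class ReferenceAttention(nn.Module):
    """Sketch of the ReferenceNet write/read idea: the reference UNet
    'writes' its spatial features into a bank, and the denoising UNet
    'reads' them by concatenating along the token axis before attention."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.bank: list[torch.Tensor] = []  # filled during the reference pass

    def write(self, ref_feat: torch.Tensor) -> None:
        # ref_feat: (B, C, H, W) from the matching ReferenceNet block
        self.bank.append(ref_feat.flatten(2).transpose(1, 2))  # (B, HW, C)

    def read(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) from the denoising UNet block
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)          # (B, HW, C)
        ctx = torch.cat([tokens] + self.bank, dim=1)   # append reference tokens
        out, _ = self.attn(tokens, ctx, ctx)           # attend over both
        return out.transpose(1, 2).reshape(b, c, h, w)
```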

@fenghe12 (Author)

fenghe12 commented Mar 27, 2024 via email

@johndpope (Owner)

johndpope commented Mar 27, 2024

Did you see this? #28 - I think we can just piggyback off the Alibaba pretrained UNet model.

@fenghe12 (Author)

fenghe12 commented Mar 27, 2024 via email

@johndpope (Owner)

johndpope commented Mar 27, 2024

That's why I'm thinking it will be plug and play. Got all the models for AniPortrait.
Check this helper out:
xmu-xiaoma666/External-Attention-pytorch#115

@fenghe12 (Author)

fenghe12 commented Mar 27, 2024 via email

@johndpope (Owner)

johndpope commented Mar 27, 2024

AniPortrait is good. I thought ControlNetMediaPipeFace might be the best solution - #23.

@chris-crucible - it seems they have enhanced the lips beyond the base MediaPipe face mesh.
Maybe worth retraining the model? Here's the default sample:
https://drive.google.com/file/d/198TWE631UX1z_YzbT31ItdF6yhSyLycL/view?usp=sharing

https://github.com/Zejun-Yang/AniPortrait/blob/bfa15742af3233c297c72b8bb5d7637c5ef5984a/src/utils/draw_util.py#L36


```python
FACEMESH_LIPS_OUTER_BOTTOM_LEFT = [(61,146),(146,91),(91,181),(181,84),(84,17)]
FACEMESH_LIPS_OUTER_BOTTOM_RIGHT = [(17,314),(314,405),(405,321),(321,375),(375,291)]

FACEMESH_LIPS_INNER_BOTTOM_LEFT = [(78,95),(95,88),(88,178),(178,87),(87,14)]
FACEMESH_LIPS_INNER_BOTTOM_RIGHT = [(14,317),(317,402),(402,318),(318,324),(324,308)]

FACEMESH_LIPS_OUTER_TOP_LEFT = [(61,185),(185,40),(40,39),(39,37),(37,0)]
FACEMESH_LIPS_OUTER_TOP_RIGHT = [(0,267),(267,269),(269,270),(270,409),(409,291)]

FACEMESH_LIPS_INNER_TOP_LEFT = [(78,191),(191,80),(80,81),(81,82),(82,13)]
FACEMESH_LIPS_INNER_TOP_RIGHT = [(13,312),(312,311),(311,310),(310,415),(415,308)]
```

I get that there are still expression issues here, but the result is quite good.

The head rotations may be a nice branch for getting emotion into the video.
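For anyone wanting to visualize those enhanced lip connections, here's a minimal sketch of drawing them over a frame (hypothetical helper, not AniPortrait's actual draw_util; assumes MediaPipe normalized landmarks with x, y in [0, 1]):

```python
import cv2
import numpy as np

# The connection lists above, merged into one set of (start, end) index pairs.
LIP_CONNECTIONS = (
    [(61,146),(146,91),(91,181),(181,84),(84,17)] +
    [(17,314),(314,405),(405,321),(321,375),(375,291)] +
    [(78,95),(95,88),(88,178),(178,87),(87,14)] +
    [(14,317),(317,402),(402,318),(318,324),(324,308)] +
    [(61,185),(185,40),(40,39),(39,37),(37,0)] +
    [(0,267),(267,269),(269,270),(270,409),(409,291)] +
    [(78,191),(191,80),(80,81),(81,82),(82,13)] +
    [(13,312),(312,311),(311,310),(310,415),(415,308)]
)

def draw_lips(frame: np.ndarray, landmarks) -> np.ndarray:
    """landmarks: sequence of MediaPipe normalized face-mesh landmarks."""
    h, w = frame.shape[:2]
    for start, end in LIP_CONNECTIONS:
        p1 = (int(landmarks[start].x * w), int(landmarks[start].y * h))
        p2 = (int(landmarks[end].x * w), int(landmarks[end].y * h))
        cv2.line(frame, p1, p2, color=(0, 255, 0), thickness=2)
    return frame
```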

@johndpope mentioned this issue Mar 28, 2024
@johndpope (Owner)

[Screenshot from 2024-03-29 16-32-04]

@fenghe12 (Author)

fenghe12 commented Mar 29, 2024 via email

@johndpope (Owner)

```bash
python ./scripts/vid2vid.py --config ./configs/prompts/animation_facereenac.yaml -W 512 -H 512 -L 256
```

animation_facereenac.yaml:

```yaml
pretrained_base_model_path: '/media/2TB/Emote-hack/pretrained_models/StableDiffusion'
pretrained_vae_path: "stabilityai/sd-vae-ft-mse"
image_encoder_path: '/media/oem/12TB/AniPortrait/pretrained_model/image_encoder'

denoising_unet_path: "./pretrained_model/denoising_unet.pth"
reference_unet_path: "./pretrained_model/reference_unet.pth"
pose_guider_path: "./pretrained_model/pose_guider.pth"
motion_module_path: "./pretrained_model/motion_module.pth"

inference_config: "./configs/inference/inference_v2.yaml"
weight_dtype: 'fp16'

test_cases:
  "./configs/inference/ref_images/lyl.png":
    - '/media/2TB/Emote-hack/junk/M2Ohb0FAaJU_1.mp4'
```

There's no speed embedding, so vanilla image-to-video will mostly hold the face still - but because they're using the AnimateAnyone framework, they get video2video out of the box, allowing this:
https://drive.google.com/file/d/1HaHPZbllOVPhbGkvV3aHLtcEew9CZGUV/view
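For context, the speed embedding in EMO conditions the temporal layers on head-motion velocity, so without it the model has no handle on how fast the head should move. A rough sketch of that kind of conditioning, assuming bucketed speeds embedded like timesteps (all names hypothetical; EMO's code isn't released):

```python
import torch
import torch.nn as nn

class SpeedEmbedding(nn.Module):
    """Sketch: map a scalar head-rotation speed to an embedding that is
    added to the temporal layers, like timestep embeddings condition a UNet."""

    def __init__(self, num_buckets: int = 9, dim: int = 320, max_speed: float = 1.0):
        super().__init__()
        self.num_buckets = num_buckets
        self.max_speed = max_speed
        self.embed = nn.Embedding(num_buckets, dim)
        self.proj = nn.Sequential(nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, speed: torch.Tensor) -> torch.Tensor:
        # speed: (B,) scalar head-rotation speed per clip, in [0, max_speed]
        bucket = (speed.clamp(0, self.max_speed) / self.max_speed
                  * (self.num_buckets - 1)).long()
        return self.proj(self.embed(bucket))  # (B, dim), added to temporal features
```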

@johndpope (Owner)

hey @fenghe12

I had some success with MegaPortraits:
https://github.com/johndpope/MegaPortrait-hack

and am now attempting to integrate it into VASA on this branch:
https://github.com/johndpope/VASA-1-hack/tree/MegaPortraits

@fenghe12 (Author)

VASA adopts a DiT as its backbone denoising network, but the paper lacks details about how to integrate the conditions into the DiT. I attempted to replace the UNet in Moore AnimateAnyone with a DiT (Latte, a video generation model), but the results were not satisfactory. We are now trying to train a talking-face video generation model based on Open-Sora-Plan.
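On getting conditions into a DiT: the two usual routes are cross-attention over condition tokens or adaLN-Zero modulation as in the original DiT paper. A minimal sketch of the adaLN-Zero route (hypothetical block; assumes the audio/pose conditions are already pooled into one vector per clip):

```python
import torch
import torch.nn as nn

class AdaLNZeroDiTBlock(nn.Module):
    """Sketch: a DiT block conditioned via adaLN-Zero. A pooled condition
    vector regresses per-block scale/shift/gate parameters; the gates start
    at zero so each block begins as the identity function."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.adaln = nn.Linear(dim, 6 * dim)
        nn.init.zeros_(self.adaln.weight)  # adaLN-Zero: start as identity
        nn.init.zeros_(self.adaln.bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) latent tokens; cond: (B, dim) pooled condition vector
        shift1, scale1, gate1, shift2, scale2, gate2 = self.adaln(cond).chunk(6, dim=-1)
        h = self.norm1(x) * (1 + scale1.unsqueeze(1)) + shift1.unsqueeze(1)
        x = x + gate1.unsqueeze(1) * self.attn(h, h, h)[0]
        h = self.norm2(x) * (1 + scale2.unsqueeze(1)) + shift2.unsqueeze(1)
        return x + gate2.unsqueeze(1) * self.mlp(h)
```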

@fenghe12 (Author)

I guess DiT will be the mainstream video generation architecture, because of Sora.

@fenghe12 (Author)

Maybe I can offer some help.

> hey @fenghe12
>
> I had some success with MegaPortraits: https://github.com/johndpope/MegaPortrait-hack
>
> and am now attempting to integrate it into VASA on this branch: https://github.com/johndpope/VASA-1-hack/tree/MegaPortraits

@johndpope (Owner)

johndpope commented Jun 12, 2024

I'm attempting to port MatMul-free LM to PyTorch:

https://github.com/ridgerchu/matmulfreellm

I sent you a link - could be more exciting if I can get the CUDA code working.
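For anyone following along: the core trick in MatMul-free LM is ternary weights in {-1, 0, +1}, so each "matmul" collapses to additions and subtractions. A simplified sketch of a ternary linear layer (the repo's real kernels are fused and also quantize activations):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Module):
    """Sketch: weights are quantized to {-1, 0, +1} * scale on the fly,
    with a straight-through estimator so the latent fp weights still train."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean()                        # per-tensor scale
        w_q = torch.round(w / (scale + 1e-8)).clamp(-1, 1) * scale
        w_q = w + (w_q - w).detach()                  # straight-through estimator
        # In a real kernel this matmul becomes pure add/sub over {-1, 0, +1}.
        return F.linear(x, w_q)
```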

@fenghe12 (Author)

fenghe12 commented Jun 13, 2024 via email

@johndpope (Owner)

Sorry, that project was three days down the gurgler - but I learned how to compile CUDA code.

Not sure how to handle the audio stuff - wav2vec:
https://github.com/johndpope/VASA-1-hack/tree/MegaPortraits
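If it helps, the usual recipe is to feed raw 16 kHz audio through a pretrained Wav2Vec2 and take the hidden states as per-frame audio features (~50 fps, 20 ms hop). A minimal sketch with Hugging Face transformers (the checkpoint choice is an assumption; VASA-1 doesn't name one):

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h").eval()

def audio_features(waveform: torch.Tensor, sr: int = 16000) -> torch.Tensor:
    """waveform: (num_samples,) mono audio at 16 kHz.
    Returns (T, hidden_size) features, one row per ~20 ms of audio."""
    inputs = extractor(waveform.numpy(), sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        out = model(inputs.input_values)
    return out.last_hidden_state.squeeze(0)
```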

@fenghe12 (Author)

How can I help you?
