
[Feature]: Fix the realistic person's tongue #439

Open
timmyhk852 opened this issue Apr 30, 2024 · 11 comments
Labels
enhancement New feature or request 🔬 research

Comments

@timmyhk852

Feature description

The tongue cannot be generated when using this extension, so the person cannot stick their tongue out.

timmyhk852 added the enhancement (New feature or request) and new labels on Apr 30, 2024
@johndpope

This PR I drafted some time back - the plan was to ignore the bottom half of the face in the mask.
Click Face Mask Correction to see the difference.
#292

@timmyhk852
Author

> This PR I drafted some time back - the plan was to ignore the bottom half of the face in the mask. Click Face Mask Correction to see the difference. #292

Same result after ticking Face Mask Correction.

@johndpope

@timmyhk852 - I tested with the tongue out and it failed. I have had results with the mouth open, though it's not good enough.
I was looking at this again today - maybe MediaPipe could be used to create the mask and cut the bottom of the mask off at the top lips, but the results will probably still disappoint.
I'm wondering if the insightface / onnx / mapper backing model is simply inadequate, or whether it needs another model to help here.
The ReActor / roop stuff works fantastically, but it fails miserably in this use case.
I'm wondering if inswapper_128.onnx could be translated back to PyTorch and the result of the faceswap somehow passed back into the pipeline, like making the onnx model a LoRA of sorts operating in the latent space.

For the simpler cosmetic approach - @Gourieff - did you use MediaPipe?
Not sure how to articulate this, but I'm not clear on how I could pass MediaPipe coordinates to create a different mask:

# Process the image to detect face landmarks with MediaPipe Face Mesh.
# (fragment; assumes `import mediapipe as mp` and an RGB image in `image_rgb`)
self.mp_face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=True, refine_landmarks=True)
results = self.mp_face_mesh.process(image_rgb)

img_h, img_w, _ = image.shape
face_3d = []
face_2d = []

if results.multi_face_landmarks:
    for face_landmarks in results.multi_face_landmarks:
        ...

https://github.com/johndpope/Emote-hack/blob/main/Net.py#L941

apply_face_mask_with_exclusion(swapped_image=swapped_image,target_image=result,target_face=target_face,entire_mask_image=entire_mask_image,MEDIA_PIPE_LANDMARK_MASK_WITH_HEAD_CUT_TO_TOP_LIPS)

This is more advanced detection of lips, from a project I was reviewing the other week:
https://github.com/Zejun-Yang/AniPortrait/blob/cb86caa741d6ab1e119ea7ac2554eb28aabc631b/src/utils/face_landmark.py#L133

It's possible I could have this contained and wired up to just do this augmentation of the mask.
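
To make the "cut the mask to the top lips" idea concrete, here is a minimal sketch (not the actual extension code): given one set of MediaPipe Face Mesh landmarks, like those produced by the snippet above, it builds a full-face mask and zeroes it out below the upper lip. The helper name and landmark index 13 (upper inner lip) are assumptions for illustration.

import cv2
import numpy as np

UPPER_LIP_IDX = 13  # assumed MediaPipe Face Mesh index for the upper inner lip

def mask_cut_to_top_lip(face_landmarks, img_w, img_h):
    # Pixel coordinates for all detected landmarks
    pts = np.array([(int(p.x * img_w), int(p.y * img_h))
                    for p in face_landmarks.landmark], dtype=np.int32)
    mask = np.zeros((img_h, img_w), dtype=np.uint8)
    # Full-face mask from the convex hull of all landmarks
    cv2.fillConvexPoly(mask, cv2.convexHull(pts), 255)
    # Drop everything below the top lip so the mouth / tongue area is excluded
    mask[pts[UPPER_LIP_IDX][1]:, :] = 0
    return mask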

@Gourieff
Owner

> I'm wondering if inswapper_128.onnx could be translated back to PyTorch and the result of the faceswap somehow passed back into the pipeline, like making the onnx model a LoRA of sorts operating in the latent space.

I've been thinking about this as well... We would need to "reverse engineer" the inswapper model to improve it and make a new model with a 256 or 512 target input (it would be great for the community to have a truly free-licensed model with HQ output), maybe with an additional masking input or, as you suggested, as a kind of LoRA.

About masking of parts... There is something like this in FaceFusion. I've not tested it with tongues, but it works with lips and teeth.
So I plan to implement such segmenting for ReActor in future updates; I just need to find free time for it.
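
A rough sketch of what such per-part masking could look like, under the assumption that some face-parsing model returns a label map with one integer class per facial part; the parser itself, the label ids, and the helper name below are all illustrative, not taken from FaceFusion or ReActor.

import numpy as np

LIPS, TEETH = 12, 13  # assumed label ids; they depend on the chosen face parser

def part_exclusion_mask(label_map, exclude=(LIPS, TEETH)):
    # 255 where the swapped face should be applied, 0 over the excluded parts
    keep = ~np.isin(label_map, exclude)
    return keep.astype(np.uint8) * 255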

@Gourieff
Owner

> https://github.com/johndpope/Emote-hack/blob/main/Net.py#L941
> apply_face_mask_with_exclusion(swapped_image=swapped_image,target_image=result,target_face=target_face,entire_mask_image=entire_mask_image,MEDIA_PIPE_LANDMARK_MASK_WITH_HEAD_CUT_TO_TOP_LIPS)
> This is more advanced detection of lips, from a project I was reviewing the other week:
> https://github.com/Zejun-Yang/AniPortrait/blob/cb86caa741d6ab1e119ea7ac2554eb28aabc631b/src/utils/face_landmark.py#L133
> It's possible I could have this contained and wired up to just do this augmentation of the mask.

Hm... Rather interesting... 🧐

@johndpope

johndpope commented May 1, 2024

@johndpope

had a play with ConsistentID - IT WORKS!!!!
after some faffing around - JackAILab/ConsistentID#18

@timmyhk852
Author

> had a play with ConsistentID - IT WORKS!!!! after some faffing around - JackAILab/ConsistentID#18

I don't understand... so is it possible for the swapped face to stick its tongue out now?

@Gourieff
Owner

Gourieff commented May 4, 2024

> had a play with ConsistentID - IT WORKS!!!! after some faffing around - JackAILab/ConsistentID#18

Nice! I'll take a look next week.
Maybe we can combine your PR with this feature; that would be super good.

@johndpope

ConsistentID works by introducing a new Stable Diffusion pipeline:
https://github.com/JackAILab/ConsistentID/blob/main/infer.py
I need to review other automatic1111 plugins to get my head around this flow.
@Gourieff - does any plugin come to mind?

For my needs, just plugging into infer.py is fine - just select the SD model, and you can add LoRAs.

import os
import torch
# ConsistentIDStableDiffusionPipeline is defined in the ConsistentID repo

# Base SD model and pretrained ConsistentID model
device = "cuda"
base_model_path = "SG161222/Realistic_Vision_V6.0_B1_noVAE"  # alternative: "philz1337/epicrealism"
consistentID_path = "./ConsistentID_model_facemask_pretrain_50w.bin"  # pretrained ConsistentID model

# Absolute path of the current script
script_directory = os.path.dirname(os.path.realpath(__file__))

### Load base model
pipe = ConsistentIDStableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    use_safetensors=False
).to(device)

I had initially used Marilyn Monroe and the results were quite good, but now the jury is out - I'm using different LoRAs and faces and the results are a bit off. They have plans to increase the input from a single image to an array of faces.
@timmyhk852 - basically the insightface model can't handle tongues / open mouths, so we need to explore some "photoshopping" cut-and-paste work with masks: save the original image, then merge the two images.
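
As a rough illustration of that cut-and-paste idea (the helper below is hypothetical, not part of ReActor): composite the original mouth region back over the swapped result with a feathered mask, e.g. one built from MediaPipe landmarks as sketched earlier.

import cv2
import numpy as np

def merge_original_mouth(swapped_bgr, original_bgr, mouth_mask):
    # mouth_mask: uint8, 255 where the original mouth / tongue should be kept
    soft = cv2.GaussianBlur(mouth_mask, (31, 31), 0).astype(np.float32) / 255.0
    soft = soft[..., None]  # broadcast the single-channel mask over BGR
    merged = original_bgr * soft + swapped_bgr * (1.0 - soft)
    return merged.astype(np.uint8)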

@TimEyerish

Bumping the detection threshold above 0.86 is hit and miss after the 3rd or 4th generation in a batch; mostly it loses the mask even when it does work. There's more consistency above 0.90, but by then there is no mask at all. Maybe there's something to be tweaked there?
