
[Feature]: Fix the realistic person's tongue #439

Open
timmyhk852 opened this issue Apr 30, 2024 · 11 comments
Labels
enhancement New feature or request 🔬 research

Comments

@timmyhk852

Feature description

The tongue cannot be generated when using this extension, so the person cannot stick their tongue out.

timmyhk852 added the enhancement (New feature or request) and new labels on Apr 30, 2024
@johndpope

This PR I drafted some time back - the plan was to ignore the bottom half of the face in the mask.
Click Face Mask Correction to see the difference.
#292

@timmyhk852
Author

> This PR I drafted some time back - the plan was to ignore the bottom half of the face in the mask. Click Face Mask Correction to see the difference. #292

Same result after ticking Face Mask Correction.

@johndpope

@timmyhk852 - I tested with the tongue out and it failed. I have had results with the mouth open, though it's not good enough.
I was looking at this again today - maybe MediaPipe could be used to create the mask and cut the bottom of the mask off at the top lips, but the results will probably still disappoint.
I'm wondering if the insightface / onnx / mapper backing model is simply inadequate, or whether it needs another model to help here.
The ReActor / roop stuff works fantastically, but it fails miserably in this use case.
I'm wondering if inswapper_128.onnx could be translated back to PyTorch and the result of the faceswap somehow passed back into the pipeline, like making the onnx model a LoRA of sorts operating in the latent space.

For the simpler cosmetic approach - @Gourieff - did you use MediaPipe?
Not sure how to articulate this, but I'm not clear on how I could pass MediaPipe coordinates to create a different mask:

# Process the image to detect face landmarks with MediaPipe Face Mesh.
# (fragment; assumes `import mediapipe as mp` and an RGB image in `image_rgb`)
self.mp_face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=True, refine_landmarks=True)
results = self.mp_face_mesh.process(image_rgb)

img_h, img_w, _ = image.shape
face_3d = []
face_2d = []

if results.multi_face_landmarks:
    for face_landmarks in results.multi_face_landmarks:
        ...

https://github.com/johndpope/Emote-hack/blob/main/Net.py#L941

apply_face_mask_with_exclusion(swapped_image=swapped_image,target_image=result,target_face=target_face,entire_mask_image=entire_mask_image,MEDIA_PIPE_LANDMARK_MASK_WITH_HEAD_CUT_TO_TOP_LIPS)

This is more advanced detection of lips, from a project I was reviewing the other week:
https://github.com/Zejun-Yang/AniPortrait/blob/cb86caa741d6ab1e119ea7ac2554eb28aabc631b/src/utils/face_landmark.py#L133

It's possible I could have this contained and wired up to just do this augmentation of the mask.
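
To make the "cut the mask to the top lips" idea concrete, here is a minimal sketch (not the actual extension code): given one set of MediaPipe Face Mesh landmarks, like those produced by the snippet above, it builds a full-face mask and zeroes it out below the upper lip. The helper name and landmark index 13 (upper inner lip) are assumptions for illustration.

import cv2
import numpy as np

UPPER_LIP_IDX = 13  # assumed MediaPipe Face Mesh index for the upper inner lip

def mask_cut_to_top_lip(face_landmarks, img_w, img_h):
    # Pixel coordinates for all detected landmarks
    pts = np.array([(int(p.x * img_w), int(p.y * img_h))
                    for p in face_landmarks.landmark], dtype=np.int32)
    mask = np.zeros((img_h, img_w), dtype=np.uint8)
    # Full-face mask from the convex hull of all landmarks
    cv2.fillConvexPoly(mask, cv2.convexHull(pts), 255)
    # Drop everything below the top lip so the mouth / tongue area is excluded
    mask[pts[UPPER_LIP_IDX][1]:, :] = 0
    return mask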

@Gourieff
Owner

> I'm wondering if inswapper_128.onnx could be translated back to PyTorch and the result of the faceswap somehow passed back into the pipeline, like making the onnx model a LoRA of sorts operating in the latent space.

I've been thinking about this as well... We would need to "reverse engineer" the inswapper model to improve it and make a new model with a 256 or 512 target input (it would be great for the community to have a truly free-licensed model with HQ output), maybe with an additional masking input or, as you suggested, as a kind of LoRA.

About masking of parts... There is something like this in FaceFusion. I've not tested it with tongues, but it works with lips and teeth.
So I plan to implement such segmenting for ReActor in future updates; I just need to find free time for it.
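
A rough sketch of what such per-part masking could look like, under the assumption that some face-parsing model returns a label map with one integer class per facial part; the parser itself, the label ids, and the helper name below are all illustrative, not taken from FaceFusion or ReActor.

import numpy as np

LIPS, TEETH = 12, 13  # assumed label ids; they depend on the chosen face parser

def part_exclusion_mask(label_map, exclude=(LIPS, TEETH)):
    # 255 where the swapped face should be applied, 0 over the excluded parts
    keep = ~np.isin(label_map, exclude)
    return keep.astype(np.uint8) * 255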

@Gourieff
Owner

> https://github.com/johndpope/Emote-hack/blob/main/Net.py#L941
> apply_face_mask_with_exclusion(swapped_image=swapped_image,target_image=result,target_face=target_face,entire_mask_image=entire_mask_image,MEDIA_PIPE_LANDMARK_MASK_WITH_HEAD_CUT_TO_TOP_LIPS)
> This is more advanced detection of lips, from a project I was reviewing the other week:
> https://github.com/Zejun-Yang/AniPortrait/blob/cb86caa741d6ab1e119ea7ac2554eb28aabc631b/src/utils/face_landmark.py#L133
> It's possible I could have this contained and wired up to just do this augmentation of the mask.

Hm... Rather interesting... 🧐

@johndpope

johndpope commented May 1, 2024

@johndpope

had a play with ConsistentID - IT WORKS!!!!
after some faffing around - JackAILab/ConsistentID#18

@timmyhk852
Author

> had a play with ConsistentID - IT WORKS!!!! after some faffing around - JackAILab/ConsistentID#18

I don't understand... so is it possible for the swapped face to stick its tongue out now?

@Gourieff
Owner

Gourieff commented May 4, 2024

> had a play with ConsistentID - IT WORKS!!!! after some faffing around - JackAILab/ConsistentID#18

Nice! I'll take a look next week.
Maybe we can combine your PR with this feature; that would be super good.

@johndpope

ConsistentID works by introducing a new Stable Diffusion pipeline:
https://github.com/JackAILab/ConsistentID/blob/main/infer.py
I need to review other automatic1111 plugins to get my head around this flow.
@Gourieff - does any plugin come to mind?

For my needs, just plugging into infer.py is fine - just select the SD model, and you can add LoRAs.

import os
import torch
# ConsistentIDStableDiffusionPipeline is defined in the ConsistentID repo

# Base SD model and pretrained ConsistentID model
device = "cuda"
base_model_path = "SG161222/Realistic_Vision_V6.0_B1_noVAE"  # alternative: "philz1337/epicrealism"
consistentID_path = "./ConsistentID_model_facemask_pretrain_50w.bin"  # pretrained ConsistentID model

# Absolute path of the current script
script_directory = os.path.dirname(os.path.realpath(__file__))

### Load base model
pipe = ConsistentIDStableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    use_safetensors=False
).to(device)

I had initially used Marilyn Monroe and the results were quite good, but now the jury is out - I'm using different LoRAs and faces and the results are a bit off. They have plans to increase the input from a single image to an array of faces.
@timmyhk852 - basically the insightface model can't handle tongues / open mouths, so we need to explore some "photoshopping" cut-and-paste work with masks: save the original image, then merge the two images.
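
As a rough illustration of that cut-and-paste idea (the helper below is hypothetical, not part of ReActor): composite the original mouth region back over the swapped result with a feathered mask, e.g. one built from MediaPipe landmarks as sketched earlier.

import cv2
import numpy as np

def merge_original_mouth(swapped_bgr, original_bgr, mouth_mask):
    # mouth_mask: uint8, 255 where the original mouth / tongue should be kept
    soft = cv2.GaussianBlur(mouth_mask, (31, 31), 0).astype(np.float32) / 255.0
    soft = soft[..., None]  # broadcast the single-channel mask over BGR
    merged = original_bgr * soft + swapped_bgr * (1.0 - soft)
    return merged.astype(np.uint8)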

@TimEyerish

Bumping the detection threshold above 0.86 is hit and miss after the 3rd or 4th generation in a batch; mostly it loses the mask even when it does work. There's more consistency above 0.90, but by then there is no mask at all. Maybe there's something to be tweaked there?
