
How to recognize more than one face in one image? #216

Open
bit-scientist opened this issue Dec 7, 2023 · 1 comment

Comments

@bit-scientist

The inference pipeline given in infer.py deals with one face per image. I have several images where each image contains many people, and I would like to recognize every unique person individually across all images.
I have put together an example case below:

(images belong to their respective owners)

[image]

Here, I would like to assign IDs (0 to 5) to each character in these images, and I have been thinking about how to accomplish this.

For now, I can think of one approach: loop all images through MTCNN (keep_all=True) to get the face crops (6 per image), compute their embeddings with the ResNet, then compute their pairwise distance matrix using:

import pandas as pd

dists = [[(e1 - e2).norm().item() for e2 in embeddings] for e1 in embeddings]
print(pd.DataFrame(dists, columns=names, index=names))

But I don't know what to do with these numbers afterwards.
Since each face crop is (for now) treated as unique, I will get a 36x36 matrix, right?
How should I proceed from here?
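For context, one common way to turn such a distance matrix into person IDs is single-linkage grouping under a threshold: any two crops whose embedding distance is below the threshold are assigned the same ID. A minimal sketch with union-find (the `assign_ids` helper, the 0.9 cutoff, and the fake embeddings are illustrative assumptions, not part of facenet-pytorch):

```python
# Sketch: turn an N x N distance matrix into person IDs by grouping
# crops whose pairwise distance falls below a threshold (single-linkage,
# implemented with union-find).
import numpy as np

def assign_ids(dists, threshold=0.9):
    """Return a 0-based ID per face crop; crops linked by any chain of
    below-threshold distances share an ID."""
    n = len(dists)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dists[i][j] < threshold:
                parent[find(i)] = find(j)  # merge the two groups

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# toy check: two tight clusters of fake 512-dim "embeddings"
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.01, (3, 512)),
                 rng.normal(5.0, 0.01, (3, 512))])
d = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
print(assign_ids(d))  # → [0, 0, 0, 1, 1, 1]
```

The threshold depends on the embedding model; for the VGGFace2 InceptionResnetV1 embeddings, same-person distances are typically well below 1.0, but it is worth calibrating on your own data.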

I hope to get some help, as I am quite new to the face recognition domain. Thanks!

@Aldhanekaa

Hello @bit-scientist, I just figured out how to recognize more than one face using this facenet-pytorch face recognition approach.

As the example in infer.py shows, it extracts an embedding for each person's face and collects them in a list, as in the code below:

aligned = []
names = []
for x, y in loader:
    x_aligned, prob = mtcnn(x, return_prob=True)
    if x_aligned is not None:
        print('Face detected with probability: {:8f}'.format(prob))
        aligned.append(x_aligned)
        names.append(dataset.idx_to_class[y])

aligned = torch.stack(aligned).to(device)
embeddings = resnet(aligned).detach().cpu()

Then you can save both the embeddings and the names to a file:

data = [embeddings, names]
torch.save(data, 'Facenet Pytorch Finetuning embeddings.pt')  # save embeddings + names together

I tried this to classify 5 different people with dozens of images of each person, and the saved file is only about 1.5 MB.

Next up, here is how to use the saved embeddings:

# importing libraries

from facenet_pytorch import MTCNN, InceptionResnetV1
import torch
from torchvision import datasets
from torch.utils.data import DataLoader
from PIL import Image
import cv2


"""Initializing global variables"""

def get_device():
    """Returns the best available device for PyTorch."""

    if torch.backends.mps.is_available():
        device = torch.device("mps")
    elif torch.cuda.is_available():
        device = torch.device("cuda:0")
    else:
        device = torch.device("cpu")

    return device
device = get_device()
load_data = torch.load('Facenet Pytorch Finetuning embeddings.pt',map_location=device) 
embedding_list = load_data[0] 
name_list = load_data[1] 
print(embedding_list.shape)

resnet = InceptionResnetV1(pretrained='vggface2').eval()
# resnet.to(device)
mtcnn = MTCNN(image_size=160, margin=0, min_face_size=20,
    thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True,
    keep_all=True)  # keep_all=True so every face in the frame is returned


cam = cv2.VideoCapture(0) 

while True:
    ret, frame = cam.read()
    if not ret:
        print("fail to grab frame, try again")
        break
        
    img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))  # OpenCV gives BGR; MTCNN expects RGB
    img_cropped_list, prob_list = mtcnn(img, return_prob=True) 
    
    if img_cropped_list is not None:
        boxes, _ = mtcnn.detect(img)
                
        for i, prob in enumerate(prob_list):
            if prob>0.90:
                emb = resnet(img_cropped_list[i].unsqueeze(0)).detach() 
                
                dist_list = [] # list of matched distances, minimum distance is used to identify the person
                
                for idx, emb_db in enumerate(embedding_list):
                    dist = torch.dist(emb.to(device=device), emb_db).item()
                    dist_list.append(dist)

                min_dist = min(dist_list)  # minimum distance
                min_dist_idx = dist_list.index(min_dist)  # index of the minimum distance
                name = name_list[min_dist_idx]  # name corresponding to the minimum distance
                
                box = boxes[i] 

                if min_dist < 0.65:
                    score = 1 - min_dist  # crude similarity score for display only
                    frame = cv2.putText(frame, name + ' ' + str(score), (int(box[0]), int(box[1])), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 1)
                    print(f"{name} {score}")
                else:
                    frame = cv2.putText(frame, "Unknown", (int(box[0]),int(box[1])), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,255,0),1)
                
                frame = cv2.rectangle(frame, (int(box[0]),int(box[1])) , (int(box[2]),int(box[3])), (255,0,0), 2)

    cv2.imshow("IMG", frame)
    if cv2.waitKey(1) == ord('q'):
        break        
        
        
cam.release()
cv2.destroyAllWindows()
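A side note on the design: the inner Python loop above computes one distance per stored embedding; `torch.cdist` can compare all detected faces against the whole database in a single call. A minimal sketch (the `match_faces` helper and the toy 4-dim vectors are illustrative assumptions, not part of the script above):

```python
import torch

def match_faces(face_embs, db_embs, db_names, threshold=0.65):
    """face_embs: (F, D) embeddings of detected faces; db_embs: (N, D)
    stored embeddings. Returns one (name_or_Unknown, distance) per face."""
    d = torch.cdist(face_embs, db_embs)  # (F, N) pairwise L2 distances
    min_dist, idx = d.min(dim=1)         # nearest stored embedding per face
    return [(db_names[i] if md < threshold else "Unknown", round(md.item(), 3))
            for i, md in zip(idx.tolist(), min_dist)]

# toy check with made-up 4-dim "embeddings"
db = torch.tensor([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0]])
faces = torch.tensor([[0.9, 0.1, 0.0, 0.0],   # close to the first entry
                      [0.0, 0.0, 1.0, 0.0]])  # close to neither
print(match_faces(faces, db, ["alice", "bob"]))
```

With real embeddings, `face_embs` would be `resnet(img_cropped_list).detach()` and `db_embs` the loaded `embedding_list`, so the per-face `for idx, emb_db in ...` loop disappears.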

Thanks! Hope it helps 🤠
