In Search of the Robust Facial Expressions Recognition Model: The Visual Cross-Corpus Study


(Demo GIFs: emotion recognition on Aff-Wild2 test videos)

In this paper we present the largest visual emotion recognition cross-corpus study to date. We propose a novel and effective end-to-end emotion recognition framework consisting of two key elements, each serving a different function:

(1) the backbone emotion recognition model, which is based on the VGGFace2 (Cao et al., 2018) ResNet50 model (He et al., 2016), trained in a balanced way, and able to predict emotion from a raw image with high performance;

(2) the temporal block stacked on top of the backbone model and trained on dynamic visual emotional datasets (RAVDESS (Livingstone et al., 2018), CREMA-D (Cao et al., 2014), SAVEE (Haq et al., 2008), RAMAS (Perepelkina et al., 2018), IEMOCAP (Busso et al., 2008), Aff-Wild2 (Kollias et al., 2018)) using the cross-corpus protocol to demonstrate its reliability and effectiveness (a schematic sketch of the architecture follows below).
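For illustration only, the overall architecture can be sketched in Keras as a frozen per-frame backbone wrapped in `TimeDistributed`, followed by an LSTM classifier. The stand-in ResNet50, window length, LSTM size, and class count below are assumptions, not the exact settings of the released models:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 7   # assumed number of emotion classes
SEQ_LEN = 10      # assumed frame-window length

# Stand-in for the fine-tuned VGGFace2 ResNet50 backbone released with this
# repository (weights=None keeps the sketch self-contained).
backbone = tf.keras.applications.ResNet50(weights=None, include_top=False,
                                           pooling="avg",
                                           input_shape=(224, 224, 3))
backbone.trainable = False  # the backbone acts as a frozen feature extractor

model = models.Sequential([
    # Apply the backbone to every frame in the window independently.
    layers.TimeDistributed(backbone, input_shape=(SEQ_LEN, 224, 224, 3)),
    # Temporal block: an LSTM aggregates the per-frame features.
    layers.LSTM(256),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.summary()
```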

During the research, the backbone model was fine-tuned on AffectNet (Mollahosseini et al., 2019), the largest facial expression dataset of static images. Our backbone model achieved an accuracy of 66.4% on the AffectNet validation set, and 66.5% when trained with the label smoothing technique.
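The 66.5% figure refers to training with label smoothing; in Keras this can be enabled directly in the cross-entropy loss. The smoothing factor below is illustrative, not necessarily the value used in the paper:

```python
import tensorflow as tf

# With label_smoothing=0.1 and 7 classes, the one-hot target becomes
# 1 - 0.1 + 0.1/7 for the true class and 0.1/7 for every other class.
loss_fn = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)

y_true = tf.one_hot([3], depth=7)                 # one sample, class 3
y_pred = tf.nn.softmax(tf.random.normal((1, 7)))  # dummy prediction
print(float(loss_fn(y_true, y_pred)))
```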

In this GitHub repository we provide (for research purposes only) the backbone emotion recognition model and six LSTM models obtained from the leave-one-corpus-out cross-validation experiment.

Table. Results (Unweighted average recall, UAR) of leave-one-corpus-out cross-validation

| Train datasets | Test dataset | Model name | UAR, % |
| --- | --- | --- | --- |
| RAVDESS, CREMA-D, SAVEE, RAMAS, IEMOCAP | Aff-Wild2 | Aff-Wild2 | 51.6 |
| Aff-Wild2, CREMA-D, SAVEE, RAMAS, IEMOCAP | RAVDESS | RAVDESS | 65.8 |
| Aff-Wild2, RAVDESS, SAVEE, RAMAS, IEMOCAP | CREMA-D | CREMA-D | 60.6 |
| Aff-Wild2, RAVDESS, CREMA-D, RAMAS, IEMOCAP | SAVEE | SAVEE | 76.1 |
| Aff-Wild2, RAVDESS, CREMA-D, SAVEE, IEMOCAP | RAMAS | RAMAS | 44.3 |
| Aff-Wild2, RAVDESS, CREMA-D, SAVEE, RAMAS | IEMOCAP | IEMOCAP | 25.1 |

We provide two static (backbone) models trained with the TensorFlow framework. Both TensorFlow models have also been converted to TorchScript. To evaluate all four models on the AffectNet validation set, run check_tf_torch_models_on_Affectnet.ipynb.
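As a minimal sketch of how a TorchScript backbone can be loaded and applied to a single face crop (the file names and preprocessing below are assumptions; the notebook implements the exact evaluation pipeline):

```python
import cv2
import numpy as np
import torch

MODEL_PATH = "backbone_torchscript.pth"  # placeholder for a released checkpoint
IMAGE_PATH = "face_crop.jpg"             # placeholder for a cropped face image

model = torch.jit.load(MODEL_PATH).eval()

# Basic preprocessing: resize to the backbone input size and scale to [0, 1];
# the released models may use a different normalization.
img = cv2.cvtColor(cv2.imread(IMAGE_PATH), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (224, 224)).astype(np.float32) / 255.0
x = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)  # NCHW

with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)[0]
print("predicted class id:", int(probs.argmax()),
      "probability:", float(probs.max()))
```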

To test the static (backbone) models with a webcam, run check_backbone_models_by_webcam. Webcam result:

(Webcam demo GIF)

We provide six temporal (LSTM) models trained with the TensorFlow framework. All TensorFlow models have also been converted to TorchScript. To test the backbone and temporal models together (CNN+LSTM) with a webcam, run check_temporal_models_by_webcam. Webcam result:

(Webcam demo GIF)
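Conceptually, the CNN+LSTM demo buffers the backbone output for a short window of frames and feeds the sequence to the temporal model. A rough sketch of that loop (the checkpoint names, window length, and the assumption that the TorchScript backbone returns the vector the LSTM expects are all placeholders; the notebook implements the exact pipeline):

```python
from collections import deque

import cv2
import numpy as np
import torch

backbone = torch.jit.load("backbone_torchscript.pth").eval()  # placeholder path
lstm = torch.jit.load("lstm_torchscript.pth").eval()          # placeholder path

WINDOW = 10                      # assumed sequence length
features = deque(maxlen=WINDOW)  # rolling buffer of per-frame outputs

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # The real notebook detects and crops the face first; the full frame is
    # used here for brevity.
    rgb = cv2.cvtColor(cv2.resize(frame, (224, 224)), cv2.COLOR_BGR2RGB)
    x = torch.from_numpy(rgb.astype(np.float32) / 255.0).permute(2, 0, 1)[None]
    with torch.no_grad():
        features.append(backbone(x))  # shape (1, feature_dim), by assumption
    label = "warming up"
    if len(features) == WINDOW:
        seq = torch.stack(list(features), dim=1)  # (1, WINDOW, feature_dim)
        with torch.no_grad():
            probs = torch.softmax(lstm(seq), dim=-1)
        label = f"emotion id: {int(probs.argmax())}"
    cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                1, (0, 255, 0), 2)
    cv2.imshow("CNN+LSTM demo", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```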

To predict emotions for all videos in a folder, run `python run.py --path_video video/ --path_save report/`.

To demonstrate how our pipeline works, we ran it on several videos from the RAVDESS corpus. The output is:

(Example emotion prediction reports for the RAVDESS videos)

To render a new video file with the emotion prediction visualized for each frame, run `python visualization.py`. Below are examples on test videos:

Example videos: 01-01-03-02-02-01-01, 01-01-05-02-01-02-14, 01-01-07-02-02-02-06

Citation

If you use EMO-AffectNetModel in your research, please consider citing the research paper. Here is an example BibTeX entry:

@article{RYUMINA2022,
  title        = {In Search of a Robust Facial Expressions Recognition Model: A Large-Scale Visual Cross-Corpus Study},
  author       = {Elena Ryumina and Denis Dresvyanskiy and Alexey Karpov},
  journal      = {Neurocomputing},
  year         = {2022},
  doi          = {10.1016/j.neucom.2022.10.013},
  url          = {https://www.sciencedirect.com/science/article/pii/S0925231222012656},
}

Links to papers