I am using the MELD dataset and the MELD features from the pickle file in https://github.com/declare-lab/conv-emotion/blob/master/DialogueRNN/DialogueRNN_features.zip

The original DialogueRNN paper mentions using the openSMILE toolkit to extract the audio features, as (300,) or (600,) vectors for each utterance.

Could you explain the preprocessing done on the raw .mp4 files to get these vectors with the toolkit? I have the .mp4 files and would like to use openSMILE to compute the embeddings myself so that I can build an inference system.
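For context, this is the kind of pipeline I have been experimenting with. It is only a sketch, not the paper's exact preprocessing: it assumes `ffmpeg` is on the PATH and the `opensmile` Python package is installed, the clip filename is made up, and the ComParE_2016 functionals give a 6373-dimensional vector per file, so the reduction down to 300/600 dimensions is exactly the step I am missing:

```python
# Sketch: .mp4 utterance clip -> fixed-length openSMILE feature vector.
# Assumes ffmpeg on PATH and `pip install opensmile`; filenames are illustrative.
import subprocess

import opensmile


def mp4_to_wav(mp4_path: str, wav_path: str) -> None:
    """Strip the audio track to 16 kHz mono WAV, which openSMILE can read."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", mp4_path, "-vn", "-ac", "1", "-ar", "16000", wav_path],
        check=True,
    )


# ComParE_2016 functionals yield one 6373-d vector per file; whatever maps
# this to the paper's 300-d / 600-d utterance vectors (feature selection,
# a trained dense layer, ...) is the part I cannot reproduce.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)

mp4_to_wav("dia0_utt0.mp4", "dia0_utt0.wav")  # hypothetical MELD clip name
df = smile.process_file("dia0_utt0.wav")      # pandas DataFrame, one row
vec = df.to_numpy().squeeze()                 # shape: (6373,)
print(vec.shape)
```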