Q: For what languages do we want to collect data? #5

Open
omarsar opened this issue Jun 28, 2020 · 18 comments

Comments

@omarsar
Member

omarsar commented Jun 28, 2020

Please include the languages that you think we should collect data for. If you have experience working in a specific language, that will be useful and you can propose collecting emotion-related data in that language.

@omarsar
Member Author

omarsar commented Jun 28, 2020

I have worked with both English and Spanish. I am also looking at my dialect, Creole.

@fmplaza
Collaborator

fmplaza commented Jun 28, 2020

In my PhD I'm working with both English and Spanish too, but I focus more on Spanish as it is my mother tongue. I have experience collecting Twitter messages.

@maraimm

maraimm commented Jun 29, 2020

I will contribute for Arabic.

Maybe it would be good to discuss the data collection in the next meeting. Are we creating new resources or making use of existing ones?

@KhalidAlt
Collaborator

I would like to contribute to both Arabic and English.

@omarsar
Member Author

omarsar commented Jul 1, 2020

> In my PhD I'm working with both English and Spanish too, but I focus more on Spanish as it is my mother tongue. I have experience collecting Twitter messages.

@fmplaza do you know of any large-scale dataset for Spanish? I haven't come across any.

@omarsar
Member Author

omarsar commented Jul 1, 2020

> I will contribute for Arabic.
>
> Maybe it would be good to discuss the data collection in the next meeting. Are we creating new resources or making use of existing ones?

@maraimm we are creating new resources. I will emphasize the data collection part at the next meeting. Thanks. Arabic data will be great as well. Have you looked around to see if there are any existing datasets for emotion recognition?

@KhalidAlt, feel free to share any information you come across as well.

Let's have some updates on this for our next meeting.

@KwasiArhin
Collaborator

I will look to see if I can find any datasets in Twi. Otherwise I can only participate with English, haha.

@fmplaza
Collaborator

fmplaza commented Jul 2, 2020

> > In my PhD I'm working with both English and Spanish too, but I focus more on Spanish as it is my mother tongue. I have experience collecting Twitter messages.
>
> @fmplaza do you know of any large-scale dataset for Spanish? I haven't come across any.

@omarsar I know of three emotion datasets for Spanish labeled at the tweet level, but none of them is large:

  1. EmoEvent: A Multilingual Emotion Corpus based on different Events.
    I'm one of the authors of this paper, which was recently published at the LREC conference.
    The Spanish version of EmoEvent dataset contains 8,409 tweets. Labels: anger, fear, sadness, joy, disgust, surprise, other.

  2. Datasets from SemEval-2018 Task 1: Affect in Tweets (AIT). The AIT dataset comprises the datasets used in the following subtasks:

    1. E-c (multi-label classification). The dataset contains 7,094 tweets, but each tweet can carry several labels. Labels: anger, anticipation, disgust, fear, joy, love, optimism, pessimism, sadness, surprise, trust, neutral, or no emotion.

    2. EI-oc (emotion intensity ordinal classification) and EI-reg (emotion intensity regression) subtasks. The dataset contains 7,953 tweets. Labels: anger, fear, sadness, joy.
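To compare the three Spanish datasets at a glance, here is a minimal Python sketch that records the sizes and label sets listed above and computes the labels they have in common. The class and function names are hypothetical, and treating "neutral or no emotion" as a single E-c label is an assumption. The tweet total is only an upper bound, since the SemEval subtask datasets may share tweets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionDataset:
    """Size and label set of one dataset, as quoted in this thread."""
    name: str
    num_tweets: int
    labels: frozenset


# Figures and labels copied from the list above (Spanish portions only).
SPANISH_DATASETS = [
    EmotionDataset(
        "EmoEvent (Spanish)", 8409,
        frozenset({"anger", "fear", "sadness", "joy",
                   "disgust", "surprise", "other"}),
    ),
    EmotionDataset(
        "SemEval-2018 Task 1 E-c", 7094,
        # Assumption: "neutral or no emotion" is treated as one label.
        frozenset({"anger", "anticipation", "disgust", "fear", "joy",
                   "love", "optimism", "pessimism", "sadness",
                   "surprise", "trust", "neutral or no emotion"}),
    ),
    EmotionDataset(
        "SemEval-2018 Task 1 EI-oc/EI-reg", 7953,
        frozenset({"anger", "fear", "sadness", "joy"}),
    ),
]


def shared_labels(datasets):
    """Return the labels present in every dataset."""
    return frozenset.intersection(*(d.labels for d in datasets))


def total_tweets(datasets):
    """Sum the dataset sizes (an upper bound if datasets overlap)."""
    return sum(d.num_tweets for d in datasets)


if __name__ == "__main__":
    print("Shared labels:", sorted(shared_labels(SPANISH_DATASETS)))
    print("Total tweets (upper bound):", total_tweets(SPANISH_DATASETS))
```

Under these assumptions the only labels common to all three are anger, fear, joy, and sadness, so a combined Spanish resource would either be limited to those four or need a label mapping.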

@maraimm

maraimm commented Jul 13, 2020

For Arabic, there have been many efforts, and most of them resulted in small datasets. The following are the datasets I found in the first phase of the search.

@omarsar
Member Author

omarsar commented Jul 14, 2020

@maraimm those are great findings. Do you mind giving us a short overview of your findings in the next meeting? It doesn't have to be a long presentation. We would just like an update.

@maraimm

maraimm commented Jul 15, 2020

@omarsar Yeah, sure. I am not sure when the next meeting is. What are the date and time?

@omarsar
Member Author

omarsar commented Jul 17, 2020

@maraimm it's scheduled for next Saturday (25 July 2020, 15:00 CEST). I will send the Zoom link in our Slack group.

@maraimm

maraimm commented Jul 17, 2020

Thanks, @omarsar. Unfortunately, I am not sure I will be able to join the call on Saturday. In case I am not able to, shall I prepare something today to share with the team tomorrow (a summary, for example)?

Will the session be recorded?

@omarsar
Member Author

omarsar commented Jul 17, 2020

@maraimm the summary would be excellent. If it's a recording, even better; then I can share it with the group when we meet again. All sessions are being recorded.

@maraimm

maraimm commented Jul 18, 2020

Hi @omarsar,

I emailed you a short recording.

Thanks

@omarsar
Member Author

omarsar commented Jul 18, 2020

@maraimm Thank you for the video recording. I have added it to our meeting notes.

@cahya-wirawan

cahya-wirawan commented Jul 25, 2020

Hi,
Sorry for coming late. I found a paper from late 2018 about emotion classification on Indonesian Twitter: https://www.researchgate.net/publication/330674171_Emotion_Classification_on_Indonesian_Twitter_Dataset
They collected and annotated 7500 tweets with 5 emotions: love, joy, anger, sadness, and fear.

@rfazeli

rfazeli commented Sep 10, 2020

I can collect data for Persian.

@manisnesan manisnesan removed their assignment Apr 15, 2023