SemEval-2022 Shared Task 10: Structured Sentiment Analysis

This GitHub repository hosts the data and baseline models for SemEval-2022 Shared Task 10 on structured sentiment analysis. Here you will find the datasets, baselines, and other useful information about the shared task.

Table of contents:

  1. Problem description
  2. Subtasks
    1. Monolingual
      1. Data
    2. Cross-lingual
  3. Data format
  4. Resources
  5. Submission via Codalab
  6. Baselines
  7. Frequently Asked Questions
  8. Task organizers

Problem description

The task is to predict all structured sentiment graphs in a text (see the examples below). We can formalize this as finding all of the opinion tuples O = O_1, ..., O_n in a text. Each opinion O_i is a tuple (h, t, e, p), where h is a holder who expresses a polarity p towards a target t through a sentiment expression e, implicitly defining the relationships between the elements of a sentiment graph.

The two examples below (first in English, then in Basque) give a visual representation of these sentiment graphs.

[Figure: multilingual example of sentiment graphs (English and Basque)]

Participants can approach this either as a sequence-labelling task or as a graph prediction task.
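
As a rough illustration (a minimal sketch, not part of the official data or baseline code), one way to hold such an opinion tuple in Python is:

from dataclasses import dataclass
from typing import List

# Illustrative only: one opinion tuple O_i = (h, t, e, p) from a sentiment graph.
# Each span element is a list of strings because holders, targets, and
# expressions may be discontinuous or absent (see the data format below).
@dataclass
class Opinion:
    holder: List[str]       # h: who expresses the sentiment
    target: List[str]       # t: what the sentiment is about
    expression: List[str]   # e: the words carrying the sentiment
    polarity: str           # p: e.g. "positive" or "negative"

# The negative opinion from the annotated sentence shown under "Data format" below:
opinion = Opinion(holder=["I"], target=["this hotel"],
                  expression=["would not recommend"], polarity="negative")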

Subtasks

Monolingual

This track assumes that you train and test on the same language. Participants will need to submit results for seven datasets in five languages. The evaluation will report the Sentiment F1 for each dataset, as well as the average over all seven datasets. The winning submission will be the one with the highest average Sentiment F1.

The datasets can be found in the data directory.

Data

| Dataset        | Language  | # sents | # holders | # targets | # expr. |
|----------------|-----------|---------|-----------|-----------|---------|
| NoReC_fine     | Norwegian | 11437   | 1128      | 8923      | 11115   |
| MultiBooked_eu | Basque    | 1521    | 296       | 1775      | 2328    |
| MultiBooked_ca | Catalan   | 1678    | 235       | 2336      | 2756    |
| OpeNER_es      | Spanish   | 2057    | 255       | 3980      | 4388    |
| OpeNER_en      | English   | 2494    | 413       | 3850      | 4150   |
| MPQA           | English   |         |           |           |         |
| Darmstadt_unis | English   | 2803    | 86        | 1119      | 1119    |

Cross-lingual

This track will explore how well models can generalize across languages. The test data will be the MultiBooked datasets (Catalan and Basque) and the OpeNER Spanish dataset. For training, you can use any of the other datasets, as well as any other resource that does not come directly from the test datasets.

Data format

We provide the data in json lines format.

Each line is an annotated sentence, represented as a dictionary with the following keys and values:

  • 'sent_id': a unique sentence identifier

  • 'text': raw text version of the previously tokenized sentence

  • 'opinions': a list of all opinions (dictionaries) in the sentence

Additionally, each opinion in a sentence is a dictionary with the following keys and values:

  • 'Source': a list of text and character offsets for the opinion holder

  • 'Target': a list of text and character offsets for the opinion target

  • 'Polar_expression': a list of text and character offsets for the opinion expression

  • 'Polarity': sentiment label ('negative', 'positive', 'neutral')

  • 'Intensity': sentiment intensity ('average', 'strong', 'weak')

{
    "sent_id": "../opener/en/kaf/hotel/english00164_c6d60bf75b0de8d72b7e1c575e04e314-6",
    "text": "Even though the price is decent for Paris , I would not recommend this hotel .",
    "opinions": [
        {
            "Source": [["I"], ["44:45"]],
            "Target": [["this hotel"], ["66:76"]],
            "Polar_expression": [["would not recommend"], ["46:65"]],
            "Polarity": "negative",
            "Intensity": "average"
        },
        {
            "Source": [[], []],
            "Target": [["the price"], ["12:21"]],
            "Polar_expression": [["decent"], ["25:31"]],
            "Polarity": "positive",
            "Intensity": "average"
        }
    ]
}

You can import the data using the json library in Python:

>>> import json
>>> with open("data/norec/train.json") as infile:
...     norec_train = json.load(infile)
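
Assuming the file parses to a list of sentence dictionaries shaped like the example above (a sketch, not official task tooling), the annotated spans can be recovered from their "start:end" character offsets like this:

import json

with open("data/norec/train.json") as infile:
    norec_train = json.load(infile)

def spans(opinion, role, text):
    """Return (substring, start, end) for each span of one opinion role."""
    _span_texts, offsets = opinion[role]
    result = []
    for offset in offsets:
        start, end = (int(i) for i in offset.split(":"))
        result.append((text[start:end], start, end))
    return result

for sentence in norec_train[:3]:
    text = sentence["text"]
    for opinion in sentence["opinions"]:
        print(spans(opinion, "Source", text),
              spans(opinion, "Target", text),
              spans(opinion, "Polar_expression", text),
              opinion["Polarity"])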

Resources

The task organizers provide training data, but participants are free to use other resources (word embeddings, pretrained models, sentiment lexicons, translation lexicons, translation datasets, etc). We do ask that participants document and cite their resources well.

We also provide some links to what we believe could be helpful resources:

  1. pretrained word embeddings
  2. pretrained language models
  3. translation data
  4. sentiment resources
  5. syntactic parsers

Submission via Codalab

Submissions will be handled through our Codalab competition website.

Baselines

The task organizers provide two baselines: one that takes a sequence-labelling approach and a second that converts the problem to a dependency graph parsing task.
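
For the sequence-labelling view, one common reduction (an illustrative sketch under our own assumptions, not the organizers' baseline code) is to project each opinion's character offsets onto token-level BIO tags, with one tag sequence per role (holder, target, expression):

from typing import List, Tuple

def char_spans_to_bio(tokens: List[Tuple[int, int]],
                      offsets: List[str], label: str) -> List[str]:
    """Project "start:end" character-offset spans onto token-level BIO tags.

    tokens holds the (start, end) character positions of each token; tokens
    outside every annotated span are tagged "O".
    """
    tags = ["O"] * len(tokens)
    for offset in offsets:
        start, end = (int(i) for i in offset.split(":"))
        inside = False
        for i, (tok_start, tok_end) in enumerate(tokens):
            # A token belongs to the span if the two overlap at all.
            if tok_start < end and tok_end > start:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return tags

# Example on the sentence from the data format section (whitespace tokenization,
# since the text is already tokenized):
text = "Even though the price is decent for Paris , I would not recommend this hotel ."
tokens, pos = [], 0
for tok in text.split(" "):
    tokens.append((pos, pos + len(tok)))
    pos += len(tok) + 1

print(char_spans_to_bio(tokens, ["46:65"], "expr"))
# "would", "not", "recommend" receive B-expr, I-expr, I-expr; every other token is O.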

Frequently asked questions

Q: How do I participate?

A: Sign up on our Codalab website, download the data, train the baselines, and submit your results to the Codalab website.

Task organizers
