semeval22_structured_sentiment/data at master · pmhalvor/semeval22_structured_sentiment

Subtask 1: Monolingual structured sentiment

This track assumes that models are trained and tested on the same language. It uses the following datasets:

norec (Norwegian professional reviews in multiple domains)
multibooked_ca (Catalan hotel reviews)
multibooked_eu (Basque hotel reviews)
opener_en (English hotel reviews)
opener_es (Spanish hotel reviews)
darmstadt_unis (English online university reviews)

Subtask 2: Cross-lingual structured sentiment

In this track, models are instead trained only on a high-resource language (English) and tested on several other languages.

Train:

opener_en

Test:

opener_es
multibooked_ca
multibooked_eu
Several surprise languages that will not be available until the evaluation phase.

That means that the cross-lingual models should be able to adapt quickly to new languages.
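
The two tracks can be written down as a small configuration. This is only a sketch: the names `MONOLINGUAL`, `CROSS_LINGUAL`, `train_path`, and `experiment_pairs` are hypothetical, and the `data/<dataset>/train.json` layout is inferred from the loading example at the end of this README. The surprise languages are left out because they are not known in advance.

```python
from pathlib import Path

# Datasets named in Subtask 1 (train and test on the same dataset).
MONOLINGUAL = [
    "norec",
    "multibooked_ca",
    "multibooked_eu",
    "opener_en",
    "opener_es",
    "darmstadt_unis",
]

# Subtask 2: train on English only, test on the other listed languages.
CROSS_LINGUAL = {
    "train": ["opener_en"],
    "test": ["opener_es", "multibooked_ca", "multibooked_eu"],
}


def train_path(dataset, root="data"):
    """Hypothetical helper: assumes the layout data/<dataset>/train.json."""
    return Path(root) / dataset / "train.json"


def experiment_pairs(config=CROSS_LINGUAL):
    """List every (train dataset, test dataset) pair for the cross-lingual track."""
    return [(train, test) for train in config["train"] for test in config["test"]]
```

Keeping the split in one place makes it easy to append further test sets once the surprise languages are released.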

Data and formatting

We provide the data in json lines format.

Each line is an annotated sentence, represented as a dictionary with the following keys and values:

'sent_id': unique sentence identifier
'text': raw text
'opinions': list of all opinions (dictionaries) in the sentence

Additionally, each opinion in a sentence is a dictionary with the following keys and values:

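'Source': a pair of lists, the text span(s) and their character offsets (as "start:end" strings), for the opinion holder; both lists are empty when no holder is expressed
'Target': a pair of lists, the text span(s) and their character offsets, for the opinion target
'Polar_expression': a pair of lists, the text span(s) and their character offsets, for the opinion expression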
'Polarity': sentiment label ('negative', 'positive', 'neutral')
'Intensity': sentiment intensity ('average', 'strong', 'weak')

{
    "sent_id": "../opener/en/kaf/hotel/english00164_c6d60bf75b0de8d72b7e1c575e04e314-6",
    "text": "Even though the price is decent for Paris , I would not recommend this hotel .",
    "opinions": [
        {
            "Source": [["I"], ["44:45"]],
            "Target": [["this hotel"], ["66:76"]],
            "Polar_expression": [["would not recommend"], ["46:65"]],
            "Polarity": "negative",
            "Intensity": "average"
        },
        {
            "Source": [[], []],
            "Target": [["the price"], ["12:21"]],
            "Polar_expression": [["decent"], ["25:31"]],
            "Polarity": "positive",
            "Intensity": "average"
        }
    ]
}
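
In the example above, each offset string behaves like a Python slice over 'text': "46:65" selects "would not recommend". The sketch below checks that property for one sentence. It assumes offsets are "start:end" with an exclusive end, which holds for this example but is not stated as a rule elsewhere in this README. The helper names `parse_offset` and `check_spans` are hypothetical, and the 'sent_id' is shortened for illustration.

```python
from typing import List, Tuple


def parse_offset(offset: str) -> Tuple[int, int]:
    """Turn an offset string such as "46:65" into (start, end) integers."""
    start, end = offset.split(":")
    return int(start), int(end)


def check_spans(sentence: dict) -> List[str]:
    """Describe every span whose offsets do not reproduce its text."""
    problems = []
    for opinion in sentence["opinions"]:
        for role in ("Source", "Target", "Polar_expression"):
            texts, offsets = opinion[role]
            for span_text, offset in zip(texts, offsets):
                start, end = parse_offset(offset)
                if sentence["text"][start:end] != span_text:
                    problems.append(f"{role} {offset}: {span_text!r}")
    return problems


example = {
    "sent_id": "example-6",  # shortened, illustrative id
    "text": "Even though the price is decent for Paris , I would not recommend this hotel .",
    "opinions": [
        {
            "Source": [["I"], ["44:45"]],
            "Target": [["this hotel"], ["66:76"]],
            "Polar_expression": [["would not recommend"], ["46:65"]],
            "Polarity": "negative",
            "Intensity": "average",
        },
        {
            "Source": [[], []],  # no explicit opinion holder
            "Target": [["the price"], ["12:21"]],
            "Polar_expression": [["decent"], ["25:31"]],
            "Polarity": "positive",
            "Intensity": "average",
        },
    ],
}

print(check_spans(example))  # prints [] because every span matches its offsets
```

An empty 'Source' contributes nothing to the check, since both of its lists are empty.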

You can load the data with Python's json library:

>>> import json
>>> with open("data/norec/train.json", encoding="utf-8") as infile:
...     norec_train = json.load(infile)
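
This README describes the files as JSON Lines, while the snippet above reads the whole file with `json.load`, which expects a single JSON document. If it is unclear which layout a given file uses, a loader can accept both. The following is a sketch under that assumption; `load_sentences` and `polarity_counts` are hypothetical helper names, not part of the repository.

```python
import json
from collections import Counter


def load_sentences(path):
    """Read annotated sentences from a JSON array file or a JSON Lines file."""
    with open(path, encoding="utf-8") as infile:
        content = infile.read()
    try:
        # Case 1: the whole file is one JSON document (typically a list).
        data = json.loads(content)
    except json.JSONDecodeError:
        # Case 2: one JSON object per non-empty line.
        return [json.loads(line) for line in content.split("\n") if line.strip()]
    return data if isinstance(data, list) else [data]


def polarity_counts(sentences):
    """Count the 'Polarity' labels over all opinions in the given sentences."""
    return Counter(
        opinion["Polarity"]
        for sentence in sentences
        for opinion in sentence["opinions"]
    )
```

With the data in place, `polarity_counts(load_sentences("data/norec/train.json"))` would give a quick overview of the label distribution.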
