You can explore 5 types of model errors:
- Mistaken recognition - one type of entity is recognized as another
- Missed entity - an entity is not recognized at all
- Misspelling - the original text doesn't contain the predicted entity
- Overprediction - the model predicts an entity that is absent from the annotation
- Conflicting predictions - the predicted entities can't be mapped to unambiguous positions in the text
```python
import pandas as pd
prediction = pd.read_json('prediction.json')
prediction.head(3)
```
|   | id | extracted | target |
|---|---|---|---|
| 0 | 8_1443820.tsv | {'Drugname': [], 'Drugclass': [], 'Drugform': ['таблетки'], 'DI': [], 'ADR': [], 'Finding': []} | {'Drugname': [], 'Drugclass': [], 'Drugform': ['таблетки'], 'DI': [], 'ADR': [], 'Finding': []} |
| 1 | 1_2555494.tsv | {'Drugname': ['Римантадин'], 'Drugclass': [], 'Drugform': ['сиропе'], 'DI': [], 'ADR': [], 'Finding': []} | {'Drugname': ['Римантадин'], 'Drugclass': [], 'Drugform': ['сиропе'], 'DI': [], 'ADR': [], 'Finding': []} |
| 2 | 1_618967.tsv | {'Drugname': [], 'Drugclass': [], 'Drugform': [], 'DI': [], 'ADR': [], 'Finding': []} | {'Drugname': [], 'Drugclass': [], 'Drugform': [], 'DI': [], 'ADR': [], 'Finding': []} |
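If you don't have a `prediction.json` at hand, a DataFrame with the same schema (as far as it can be inferred from the preview above: an `id` column plus `extracted` and `target` columns holding label-to-spans dicts) can be built manually to try the utilities:

```python
import pandas as pd

# Assumed schema: 'extracted' and 'target' map each entity label
# to a list of predicted / annotated spans; 'id' names the sample's file.
prediction = pd.DataFrame([
    {'id': '8_1443820.tsv',
     'extracted': {'Drugname': [], 'Drugform': ['таблетки']},
     'target': {'Drugname': [], 'Drugform': ['таблетки']}},
])
```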
```python
from error_analysis.utils import aggregate_errors_from_dataframe
aggregate_errors_from_dataframe(prediction)
```
```
>>> {'total': 1443,
 'fp': 282,
 'fn': 373,
 'mistaken_recognitions': defaultdict(list,
             {'Finding': [('нервные срывы', 'ADR', '4_2671902.tsv'),
               ('профилактике', 'DI', '0_1484511.tsv'),
               ('гриппа', 'DI', '0_1484511.tsv'),
               ('дифтерия', 'DI', '8_2394715.tsv'),
               ('столбняк', 'DI', '8_2394715.tsv'),
               ('коклюш', 'DI', '8_2394715.tsv'),
               ('бородавка на подушечке указательного пальца',
                'DI',
                '6_1410682.tsv'), ...
```
The `mistaken_recognitions` dictionary is populated according to the scheme below:

```python
mistaken_recognitions[real_target].append((text, predicted_target, sample_id))
```
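A minimal sketch of that aggregation for a single sample, assuming exact string matching between predicted and gold spans (the library may match spans differently):

```python
from collections import defaultdict

def collect_mistaken_recognitions(extracted, target, sample_id,
                                  mistaken_recognitions=None):
    """Record gold spans that the model assigned to a different label."""
    if mistaken_recognitions is None:
        mistaken_recognitions = defaultdict(list)
    for real_target, gold_spans in target.items():
        for text in gold_spans:
            for predicted_target, pred_spans in extracted.items():
                if predicted_target != real_target and text in pred_spans:
                    mistaken_recognitions[real_target].append(
                        (text, predicted_target, sample_id))
    return mistaken_recognitions
```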
For example, the error

```
{'Finding': [('нервные срывы', 'ADR', '4_2671902.tsv')]}
```

corresponds to

```
predicted: {'Drugname': [],
 'Drugclass': [],
 'Drugform': [],
 'DI': [],
 'ADR': ['нервные срывы'],
 'Finding': []}
target: {'Drugname': [],
 'Drugclass': [],
 'Drugform': [],
 'DI': [],
 'ADR': [],
 'Finding': ['нервные срывы']}
```
```python
from error_analysis.utils import plot_confusion_matrix_from_dataframe
plot_confusion_matrix_from_dataframe(prediction)
```

In percent (no. of mistaken recognitions / no. of entities of this type):

```python
plot_confusion_matrix_from_dataframe(prediction, in_percent=True)
```
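The percent view presumably divides each cell by the number of gold entities of the row's type, as the formula above suggests. A sketch of that normalization (the row indexing by real type is my assumption, not a documented detail):

```python
import numpy as np

def to_percent(cm, entities_per_type):
    """Convert raw mistaken-recognition counts to percentages:
    cell (i, j) -> 100 * count / number of gold entities of type i."""
    cm = np.asarray(cm, dtype=float)
    totals = np.asarray(entities_per_type, dtype=float).reshape(-1, 1)
    # avoid division by zero for types with no gold entities
    return np.divide(cm * 100.0, totals,
                     out=np.zeros_like(cm), where=totals != 0)
```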
Input text: "Paris Hilton visits Paris"
Predictions: `{'LOC': ['Paris']}`

Which of the two occurrences of the word "Paris" corresponds to LOC (location)? LLMs used for NER tasks usually don't generate the position of the extracted entity in the original text. Use `aggregate_conflicting_predictions` from `error_analysis.utils` to analyze this type of error.
```python
from error_analysis.utils import aggregate_conflicting_predictions
conflicting_predictions = aggregate_conflicting_predictions(extracted, texts)
```

Output for the example above:

```
>>> {'total': 1, 'errors_by_sample_id': {0: [('Paris', 1, 2, 'LOC')]}}
```
Another conflict arises when a span occurs in the text fewer times than it was extracted (No. of occurrences < Sum No. of extracted):

Input text: "Paris Hilton"
Predictions: `{'LOC': ['Paris'], 'PER': ['Paris']}`
Output: `{'total': 1, 'errors_by_sample_id': {0: [('Paris', 2, 1)]}}`
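Both conflict checks can be sketched with plain substring counting. The tuple layout (span, times extracted, times found in text[, label]) mirrors the outputs above, but it is my reading of them, not a documented format:

```python
from collections import defaultdict

def find_conflicts(extracted, text):
    """Flag spans whose positions in the text are ambiguous:
    - extracted once but occurring several times (which occurrence?);
    - extracted more times (across labels) than they occur at all."""
    n_extracted = defaultdict(int)
    labels = defaultdict(list)
    for label, spans in extracted.items():
        for span in spans:
            n_extracted[span] += 1
            labels[span].append(label)
    conflicts = []
    for span, n_ext in n_extracted.items():
        n_occ = text.count(span)
        if n_ext == 1 and n_occ > 1:
            conflicts.append((span, n_ext, n_occ, labels[span][0]))
        elif n_ext > n_occ:
            conflicts.append((span, n_ext, n_occ))
    return conflicts
```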
Example for the RuDReC dataset:

```
{'total': 8,
 'errors_by_sample_id': defaultdict(list,
             {'2_2527424.tsv': [('антибиотик', 2, 1)],
              '4_880567.tsv': [('капсула', 2, 1)],
              '1_6275749.tsv': [('темп', 1, 2, 'Drugname')],
              '1_2719942.tsv': [('антибиотик', 1, 3, 'Drugclass')],
              '3_269906.tsv': [('насморк', 1, 2, 'DI')],
              '3_2519035.tsv': [('таблеток', 1, 2, 'Drugform')],
              '3_877244.tsv': [('стоматит', 1, 2, 'DI')],
              '4_614513.tsv': [('капсул', 1, 2, 'Drugform')]})}
```