
CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation

Introduction

CTRLEval is an unsupervised, reference-free metric that evaluates coherence, consistency, and attribute relevance in controlled text generation tasks. You can read our paper for more details. This project is a PyTorch implementation of our work.

Dependencies

  • Python 3.7
  • NumPy
  • PyTorch 1.4.0
  • Transformers (Hugging Face) 4.4.2
  • NLTK 3.5
  • SciPy
  • tqdm
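
If you use pip, one possible way to install these dependencies is the command below (the exact command is an assumption; versions follow the list above, and newer versions are untested):

pip install numpy torch==1.4.0 transformers==4.4.2 nltk==3.5 scipy tqdm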

Data

We build an evaluation set for controlled text generation, which is stored in ./data. The data files data_senti.txt and data_topic.txt, which are used in sentiment-controlled and topic-controlled text generation, respectively, share the same format. Specifically, attribute labels, content prefixes, generated texts, coherence scores, consistency scores, and attribute relevance scores are separated by \t in each line.
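
As a minimal sketch for loading one of these data files (the field order and numeric types follow the description above; the helper and field names are hypothetical), one could do:

# Minimal sketch: load one evaluation data file (field order assumed as described above).
def load_eval_set(path):
    samples = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            fields = line.rstrip('\n').split('\t')
            label, prefix, text, coh, cons, ar = fields[:6]
            samples.append({
                'label': label,               # attribute label
                'prefix': prefix,             # content prefix
                'text': text,                 # generated text
                'coherence': float(coh),      # human coherence score
                'consistency': float(cons),   # human consistency score
                'attr_relevance': float(ar),  # human attribute relevance score
            })
    return samples

samples = load_eval_set('./data/data_senti.txt')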

Prompt and Verbalizer

For attribute relevance, we design additional prompts and verbalizers, which are stored in ./prompt. In prompt_senti.txt and prompt_topic.txt, each line contains a prompt with two special placeholders, i.e., <gen_result> (for generated texts) and <mask_token> (for mask tokens). In verbal_senti.txt and verbal_topic.txt, the first line (with fields separated by \t) contains the name of each attribute label, and each of the following lines indicates a sequence of label words. You can refer to Section 4.2 and Appendix A in our paper for more details.
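
For illustration only, filling the two placeholders of a prompt line might look like the sketch below (the prompt string and mask token here are made-up examples; the actual prompt construction happens inside CTRLEval):

# Illustrative sketch only: substitute the placeholders of one prompt line.
# The prompt string and mask token below are hypothetical examples.
prompt_line = 'It is a <mask_token> book: <gen_result>'
gen_result = 'The book is about NLP. It depicts fancy models.'
mask_token = '<mask_1>'  # assumption: a PEGASUS-style mask token

filled_prompt = prompt_line.replace('<gen_result>', gen_result).replace('<mask_token>', mask_token)
print(filled_prompt)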

Quick Start for Evaluation with CTRLEval

We provide main.py to reproduce our overall results. You can run it directly to obtain the evaluation results:

python main.py

You can also refer to this code to evaluate your own generated texts with CTRLEval. We give an example below (note: run it from the CTRLEval directory):

>>> from ctrleval import CTRLEval
>>> task = 'senti' # evaluation for sentiment-controlled text generation
>>> scorer = CTRLEval(iwf_dir='iwf_full.txt', prompt_dir='./prompt/prompt_{}.txt'.format(task), verbal_dir='./prompt/verbal_{}.txt'.format(task), model_name_or_path='google/pegasus-large')
>>> data = ['The book is about NLP. It depicts fancy models.']
>>> prefix = ['The book']
>>> label = ['positive']
>>> scorer.score(aspect='coh', data=data, batch_size=1) # evaluation of coherence
[-4.723989181567967]
>>> scorer.score(aspect='cons', data=data, prefix=prefix, batch_size=1) # evaluation of consistency
[-7.385560478830747]
>>> scorer.score(aspect='ar', data=data, label=label, batch_size=1) # evaluation of attribute relevance
array([0.8916258], dtype=float32)
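
Building on the example above, the sketch below shows one way to compare metric scores with the human scores in the evaluation set (it reuses the scorer from the example and the hypothetical load_eval_set helper sketched in the Data section; main.py may compute its statistics differently):

# Sketch: correlate CTRLEval coherence scores with human coherence scores.
# Assumes the scorer from the example above and the load_eval_set helper sketched earlier.
from scipy.stats import pearsonr, spearmanr

samples = load_eval_set('./data/data_senti.txt')
texts = [s['text'] for s in samples]
human_coh = [s['coherence'] for s in samples]

metric_coh = scorer.score(aspect='coh', data=texts, batch_size=8)
print('Pearson:', pearsonr(metric_coh, human_coh)[0])
print('Spearman:', spearmanr(metric_coh, human_coh)[0])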

Disclaimer

The evaluation set aims to facilitate research on evaluating controlled text generation. The texts in this evaluation set are generated by four representative models, including CTRL, PPLM, GeDi, and CoCon. We directly use the hyper-parameters from the original papers to generate samples. Although we asked annotators to check whether there is inappropriate content in the data, there is no guarantee that all the inappropriate samples have been filtered out. None of the content in this evaluation set represents the authors' opinions.

Citation

@inproceedings{ke-etal-2022-ctrleval,
    title = "{CTRLE}val: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation",
    author = "Ke, Pei  and Zhou, Hao  and Lin, Yankai  and Li, Peng  and Zhou, Jie  and Zhu, Xiaoyan  and Huang, Minlie",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    pages = "2306--2319",
}

Please kindly cite our paper if you find the paper and the code helpful.
