Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts

We aim to answer four main questions in this work. (1) How does scaling the size of LMs affect their ability to understand the concept of negation? (2) Are LMs explicitly trained to follow instructions (T0, InstructGPT) better at understanding negated instructions? (3) Can In-Context Learning or Fine-tuning help mitigate this problem? (4) How do existing approaches compare with actual humans in understanding negation, and how large is the performance gap that we should focus on closing?

The answers can be found in our case study paper Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts! Come check it out! :)

Dependencies

You can use pip install -r requirements.txt to install the required libraries.

OpenAI Beta

To use GPT-3 you must use OpenAI Beta, which has limited access. You can apply for access here. Once you have access, you will need to point score.py to your API key with the --key argument, or put your key in api.key, which is the default path.
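The key lookup described above can be sketched as follows. This is only an illustration, not the code in score.py: the helper names read_api_key and parse_args are hypothetical, and the sketch assumes that --key takes a file path and that the key file is plain text containing nothing but the key.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse only the --key option; the default mirrors the api.key path named above."""
    parser = argparse.ArgumentParser()
    # Assumption: --key is a path to a file that holds the API key.
    parser.add_argument("--key", default="api.key",
                        help="path to a file containing the OpenAI API key")
    return parser.parse_args(argv)


def read_api_key(key_path="api.key"):
    """Read the API key from a plain-text file (hypothetical helper)."""
    path = Path(key_path)
    if not path.is_file():
        raise FileNotFoundError(f"No API key file found at {path}")
    key = path.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file {path} is empty")
    return key
```

Failing early with a clear message when the key file is missing or empty avoids a less obvious authentication error later in the run.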

Running Scorers

Once you have a dataset downloaded, running all the zero-shot scoring strategies at once is as simple as:

CUDA_VISIBLE_DEVICES=[gpu device ids] python score.py --dataset [huggingface dataset name] --dataset_config [huggingface dataset config] --promptsource --sample [num of samples] --batch [num of samples in a batch] --prompt_name [prompt name from promptsource] --model [model name]

For example, running inference with OPT-125m on the ARC-Easy dataset can be done as follows:

CUDA_VISIBLE_DEVICES=0 python score.py --dataset ai2_arc --dataset_config ARC-Easy --promptsource --use_csv --sample 300 --batch 8 --prompt_name "q&a" --model opt-125m

If --dataset and --dataset_config are unclear, look in score.py to see how dataset selection works. --model is the name of an OPT, T0, GPT-2, or GPT-3 model (e.g. xl, davinci); check score.py for the full list of supported LMs. To speed things up, you can use a larger --batch if you have enough GPU memory. For the full list of --dataset, --dataset_config, and --prompt_name values used in the paper, refer to the run_configs.txt file.
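When running several configurations, it can help to assemble the command line programmatically. The sketch below only builds and prints commands using the flags shown above; the helper build_score_command and the RUNS and MODELS lists are hypothetical, and the single entry is the ARC-Easy example from this README rather than anything read from run_configs.txt.

```python
import shlex


def build_score_command(dataset, dataset_config, prompt_name, model,
                        sample=300, batch=8, use_csv=True):
    """Return the score.py invocation as an argument list (hypothetical helper)."""
    cmd = ["python", "score.py",
           "--dataset", dataset,
           "--dataset_config", dataset_config,
           "--promptsource"]
    if use_csv:
        # Only meaningful for the datasets that ship with the paper's CSV files.
        cmd.append("--use_csv")
    cmd += ["--sample", str(sample),
            "--batch", str(batch),
            "--prompt_name", prompt_name,
            "--model", model]
    return cmd


# Illustrative values only: extend these lists with entries from run_configs.txt.
RUNS = [("ai2_arc", "ARC-Easy", "q&a")]
MODELS = ["opt-125m"]

for dataset, config, prompt in RUNS:
    for model in MODELS:
        # shlex.join quotes arguments such as q&a so the line is safe to paste into a shell.
        print(shlex.join(build_score_command(dataset, config, prompt, model)))
```

Keeping the command as a list means it can also be passed directly to subprocess.run without shell quoting issues.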

Other

To use a dataset other than the 9 datasets used in the paper, remove the --use_csv flag from the run, and the code will automatically load the dataset from the Hugging Face Hub.
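The two loading paths can be pictured roughly as below. This is a sketch under stated assumptions, not the logic in score.py or data_loaders.py: the function load_examples, the csv_path parameter, and the choice of the "test" split are all hypothetical, and the CSV layout used by the repository is not documented here.

```python
import csv


def load_examples(dataset, dataset_config=None, use_csv=False,
                  csv_path=None, split="test"):
    """Load evaluation examples from a local CSV or from the Hugging Face Hub.

    Hypothetical helper: csv_path and split are assumptions for illustration.
    """
    if use_csv:
        if csv_path is None:
            raise ValueError("csv_path is required when use_csv is True")
        with open(csv_path, newline="", encoding="utf-8") as f:
            # Each row becomes a dict keyed by the CSV header.
            return list(csv.DictReader(f))
    # Imported lazily so the CSV path works without the datasets library installed.
    from datasets import load_dataset
    return load_dataset(dataset, dataset_config, split=split)
```

With use_csv=False the call goes to the Hub and needs network access, which matches the behavior described above for datasets outside the paper's set.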

joeljang/negated-prompts-for-llms
