How many shots are used for evaluation of HumanEval? #39

Open

zhimin-z opened this issue Dec 11, 2023 · 3 comments

Comments

@zhimin-z

Is that 0-shot and 3-shot CoT?

@EthanC111
Collaborator

EthanC111 commented Dec 15, 2023

Hi @zhimin-z. If you are referring to the Self-Check baselines, then yes. Please refer to https://github.com/GAIR-NLP/factool/blob/main/factool/utils/prompts/self_check.yaml
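
If you want to check the number of demonstrations yourself, here is a minimal sketch that loads that YAML file from a local clone of the repo and prints each prompt. It assumes the top level of the file is a mapping of prompt names to templates (not confirmed) and that PyYAML is installed:

```python
# Minimal sketch: inspect the Self-Check prompt file in a local clone of
# GAIR-NLP/factool and print each prompt so the in-context demonstrations
# (shots) can be counted by eye.
# Assumptions: the repo is cloned into the working directory, and the YAML
# top level is a mapping of prompt names to templates.
import yaml

with open("factool/utils/prompts/self_check.yaml") as f:
    prompts = yaml.safe_load(f)

for name, prompt in prompts.items():
    print(f"=== {name} ===")
    print(prompt if isinstance(prompt, str) else yaml.dump(prompt))
```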

@zhimin-z
Author

> Hi @zhimin-z. If you are referring to the Self-Check baselines, then yes. Please refer to https://github.com/GAIR-NLP/factool/blob/main/factool/utils/prompts/self_check.yaml

Thanks for your quick replies.
Are these evaluation results from 0-shot or 3-shot CoT?
[image: evaluation results]

@EthanC111
Collaborator

EthanC111 commented Dec 16, 2023

Hi @zhimin-z. These results were evaluated using neither 0-shot nor 3-shot prompting; they were evaluated using FacTool.
If you are referring to how the individual modules in FacTool are implemented, please refer to our prompts at https://github.com/GAIR-NLP/factool/tree/main/factool/utils/prompts

For some modules in certain tasks we did provide demonstrations (e.g., query generation for KB-QA), while for others we used 0-shot prompting.
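
For a concrete sense of that distinction, here is an illustrative sketch only; these templates and examples are hypothetical and are not copied from the actual prompts in factool/utils/prompts:

```python
# Illustrative only: 0-shot prompting vs. prompting with demonstrations.
# The templates and the example claim below are hypothetical.

# 0-shot: the model sees only the instruction and the input.
ZERO_SHOT_PROMPT = (
    "Extract the verifiable factual claims from the following text.\n"
    "Text: {text}\n"
    "Claims:"
)

# Few-shot (e.g., query generation for KB-QA): demonstrations are prepended
# before the actual input so the model can imitate the expected output format.
FEW_SHOT_PROMPT = (
    "Generate search queries that would help verify the claim.\n\n"
    "Claim: The Great Wall of China is visible from the Moon.\n"
    'Queries: ["Is the Great Wall of China visible from the Moon?"]\n\n'
    "Claim: {claim}\n"
    "Queries:"
)

print(ZERO_SHOT_PROMPT.format(text="..."))
print(FEW_SHOT_PROMPT.format(claim="..."))
```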
