[ACL 2024] Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
benchmark
framework
evaluation
dataset
hallucination
aquila
unconstrained
baichuan
gpt-3
hallucinations
gpt-4
large-language-models
llm
chatgpt
chatglm
internlm
qwen
hallucination-detection
truthfulqa
acl2024
Updated May 21, 2024 - Python