[Bug] Long text evaluation parameters are not clear #1035
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
python 3.10.1
OpenCompass 0.2.3
vllm 0.2.3
Reproduces the problem - code/configuration sample
configs/models/chatglm/vllm_chatglm2_6b_32k.py
```python
from opencompass.models import VLLM

models = [
    dict(
        type=VLLM,
        abbr='chatglm2-6b-32k-vllm',
        path='THUDM/chatglm2-6b-32k',
        max_out_len=512,
        max_seq_len=4096,
        batch_size=32,
        generation_kwargs=dict(temperature=0),
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]
```
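One thing worth noting about the config above (an assumption on my part, not a confirmed diagnosis): `max_seq_len=4096` leaves far less room for LongBench/LEval prompts than the 32k context the model supports, since part of the sequence budget is typically reserved for generation. A minimal sketch of the arithmetic, with a hypothetical helper not taken from OpenCompass:

```python
def effective_prompt_budget(max_seq_len: int, max_out_len: int) -> int:
    """Tokens left for the input prompt once generation room is reserved.

    Illustrative helper only; the actual truncation logic inside
    OpenCompass may differ.
    """
    return max_seq_len - max_out_len

# With the config above: 4096 - 512 = 3584 prompt tokens,
# versus the ~32k context chatglm2-6b-32k supports.
print(effective_prompt_budget(4096, 512))  # → 3584
```

If the documented long-text scores were produced with a larger `max_seq_len`, heavy prompt truncation could plausibly account for a gap of this size.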
Reproduces the problem - command or script
python run.py --model vllm_chatglm2_6b_32k --datasets longbench leval
Reproduces the problem - error message
The scores I obtain differ from the long-text evaluation results published in the documentation by about 20 points; I cannot reproduce the documented scores.
Other information
No response