random assertion errors due to evaluate_nochat #1600

Open
Blacksuan19 opened this issue May 6, 2024 · 13 comments

@Blacksuan19 (Contributor) commented May 6, 2024

When using the Docker image, I randomly get assertion errors when making a request from the Gradio UI; sometimes a request works and sometimes it does not. The raised error is below.

This occurs with the latest two Docker images, tagged 4059a2c9 and 7297519c.

Full error
thread exception: Traceback (most recent call last):
  File "/workspace/src/utils.py", line 502, in run
    self._return = self._target(*self._args, **self._kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__
    return self.invoke(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke
    raise e
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 137, in _call
    output, extra_return_dict = self.combine_docs(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 244, in combine_docs
    return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 293, in predict
    return self(kwargs, callbacks=callbacks)[self.output_key]
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__
    return self.invoke(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke
    raise e
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 103, in _call
    response = self.generate([inputs], run_manager=run_manager)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 115, in generate
    return self.llm.generate_prompt(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 633, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 803, in generate
    output = self._generate_helper(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 670, in _generate_helper
    raise e
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 657, in _generate_helper
    self._generate(
  File "/workspace/src/gpt_langchain.py", line 2339, in _generate
    rets = super()._generate(prompts, stop=stop, run_manager=run_manager, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_community/llms/huggingface_pipeline.py", line 267, in _generate
    responses = self.pipeline(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 240, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1223, in __call__
    outputs = list(final_iterator)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
 File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1149, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/workspace/src/h2oai_pipeline.py", line 271, in _forward
    return self.__forward(model_inputs, **generate_kwargs)
  File "/workspace/src/h2oai_pipeline.py", line 309, in __forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._greedy_search(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 2494, in _greedy_search
    outputs = self(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1158, in forward
    outputs = self.model(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/model.py", line 127, in forward
    h, _, _ = layer(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/block.py", line 123, in forward
    attn_output, _, past_key_value = self.attn.forward(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 235, in forward
    xq, xk = self.rope.forward(xq, xk, self.start_pos, seqlen)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 60, in forward
    freqs_cis = self.reshape_for_broadcast(freqs_cis, xq_).to(xq_.device)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 48, in reshape_for_broadcast
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
AssertionError

make stop: Traceback (most recent call last):
  File "/workspace/src/utils.py", line 502, in run
    self._return = self._target(*self._args, **self._kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__
    return self.invoke(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke
    raise e
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 137, in _call
    output, extra_return_dict = self.combine_docs(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 244, in combine_docs
    return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 293, in predict
    return self(kwargs, callbacks=callbacks)[self.output_key]
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__
    return self.invoke(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke
    raise e
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 103, in _call
    response = self.generate([inputs], run_manager=run_manager)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 115, in generate
    return self.llm.generate_prompt(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 633, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 803, in generate
    output = self._generate_helper(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 670, in _generate_helper
    raise e
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 657, in _generate_helper
    self._generate(
  File "/workspace/src/gpt_langchain.py", line 2339, in _generate
    rets = super()._generate(prompts, stop=stop, run_manager=run_manager, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_community/llms/huggingface_pipeline.py", line 267, in _generate
    responses = self.pipeline(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 240, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1223, in __call__
    outputs = list(final_iterator)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1149, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/workspace/src/h2oai_pipeline.py", line 271, in _forward
    return self.__forward(model_inputs, **generate_kwargs)
  File "/workspace/src/h2oai_pipeline.py", line 309, in __forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._greedy_search(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 2494, in _greedy_search
    outputs = self(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1158, in forward
    outputs = self.model(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/model.py", line 127, in forward
    h, _, _ = layer(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/block.py", line 123, in forward
    attn_output, _, past_key_value = self.attn.forward(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 235, in forward
    xq, xk = self.rope.forward(xq, xk, self.start_pos, seqlen)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 60, in forward
    freqs_cis = self.reshape_for_broadcast(freqs_cis, xq_).to(xq_.device)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 48, in reshape_for_broadcast
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
AssertionError

hit stop
evaluate_nochat exception: : ('', '', '', True, 'open_chat', "{   'PreInput': None,\n    'PreInstruct': 'GPT4 User: ',\n    'PreResponse': 'GPT4 Assistant:',\n    'botstr': 'GPT4 Assistant:',\n    'can_handle_system_prompt': False,\n
  'chat_sep': '<|end_of_turn|>',\n    'chat_turn_sep': '<|end_of_turn|>',\n    'generates_leading_space': False,\n    'humanstr': 'GPT4 User: ',\n    'promptA': '',\n    'promptB': '',\n    'system_prompt': '',\n    'terminate_response
': ['GPT4 Assistant:', '<|end_of_turn|>']}", 0, 1, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0.0, True, '', '', 'UserData', True, 'Query', [], 10, True, 512, 'Relevant', ['/workspace/user_path/9b999f43-2ade-4148-97cf-d2448125168c/r
es/e6a9ce98_user_upload_protocols.pdf'], [], [], [], [], 'Pay attention and remember the information below, which will help to answer the question or imperative after the context ends.', 'According to only the information in the docume
nt sources provided within the context above, write an insightful and well-structured response to: ', 'In order to write a concise single-paragraph or bulleted list summary, pay attention to the following text.', 'Using only the inform
ation in the document sources above, write a condensed and concise summary of key results (preferably as about 10 bullet points).', 'Answer this question with vibrant details in order for some NLP embedding model to use that answer as
better query than original question: ', 'Who are you and what do you do?', 'Ensure your entire response is outputted as a single piece of strict valid JSON text.', 'Ensure your response is strictly valid JSON text.', 'Ensure your entir
e response is outputted as strict valid JSON text inside a Markdown code block with the json language identifier.   Ensure all JSON keys are less than 64 characters, and ensure JSON key names are made of only alphanumerics, underscores
, or hyphens.', 'Ensure you follow this JSON schema:\n```json\n{properties_schema}\n```', 'auto', ['OCR', 'DocTR', 'Caption', 'ASR'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', [], [], '', False, '[]', '[]', 'best_near_prompt', 51
2, -1.0, -1.0, 'split_or_merge', '\n\n', 0, 'auto', False, False, '[]', 'None', None, [], 1.0, None, None, 'text', '', '', '', '', {'model': 'model', 'tokenizer': 'tokenizer', 'device': 'cuda', 'base_model': 'TheBloke/openchat_3.5-16k-
AWQ', 'tokenizer_base_model': '', 'lora_weights': '[]', 'inference_server': '[]', 'prompt_type': 'open_chat', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': 'GPT4 User: ', 'PreInput': None, 'PreResponse': 'GPT4 Assistant:
', 'terminate_response': ['GPT4 Assistant:', '<|end_of_turn|>'], 'chat_sep': '<|end_of_turn|>', 'chat_turn_sep': '<|end_of_turn|>', 'humanstr': 'GPT4 User: ', 'botstr': 'GPT4 Assistant:', 'generates_leading_space': False, 'system_promp
t': '', 'can_handle_system_prompt': False}, 'visible_models': 0, 'h2ogpt_key': None}, {'MyData': [None, '90711427-650d-458e-ac69-bc1629b452be', 'test']}, {'langchain_modes': ['Disabled', 'LLM', 'UserData'], 'langchain_mode_paths': {'Us
erData': '/workspace/user_path/'}, 'langchain_mode_types': {'UserData': 'shared', 'github h2oGPT': 'shared', 'DriverlessAI docs': 'shared', 'wiki': 'shared', 'wiki_full': '', 'LLM': 'personal', 'Disabled': 'personal'}}, {'headers': '',
 'host': '0.0.0.0:7850', 'username': 'test', 'connection': 'Upgrade', 'pragma': 'no-cache', 'cache-control': 'no-cache', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/
537.36 Edg/124.0.0.0', 'upgrade': 'websocket', 'origin': 'http://0.0.0.0:7850', 'sec-websocket-version': '13', 'accept-encoding': 'gzip, deflate', 'accept-language': 'en-US,en;q=0.9,ar;q=0.8', 'cookie': 'access-token-unsecure-hhN8p
y5JLVRfL-0OTPND8TGcb3qhs2GvSJQ8qV1LI50=vrLRNuXKqoKCZDSCqo1OHg; access-token-unsecure-s-dRx26Pws-xf2TfvaYIjqwWsGjiH9960S06PrlT6tg=AnrezJi1hR1NjfFx29n_bg; access-token-unsecure-SF0CZ7POfi6Imk0jDfN44qO9W9VB0hu3nUcGevVPMYw=SU1SQYZL79hpAN43
hEDgIQ; access-token-unsecure-9LIDZewsE4If1yY7ixHa-yOZJO20M-PQVSDjJtfYQYA=o8YMAhHGtoLQDjMVZVITsQ; access-token-unsecure-qS0zsQdPdQYJsrMX4RXh3HQwEDeknaNz0RppngdPvGY=AGmuVQm8_KVKkMg8HdQtqg; access-token-unsecure--qfFGcbj-JQc0O0MamjIfNGlf
gUrb6t7xyB3hRUL1I8=NVbKjP5O7Q3xJxHYvaiUfw; access-token-unsecure-YeY4iDfE2-hlA1izGtL7vBNbLbCosRLpSAJFo-j6_e0=xkWJTIiCTZGbhG1H60OTBg; access-token-unsecure-BwVTmtTwIzOYqtTpvsZkHQvnjr8N60WJaX_V6njwUAw=8uPW51j557W7S8ZO_e5iSQ', 'sec-websoc
ket-key': 'CS4lXFJi7AM2jwkdWyhKyQ==', 'sec-websocket-extensions': 'permessage-deflate; client_max_window_bits', 'host2': '14.1.206.49', 'picture': 'None'}, {}, [['summarize the given document', '']])
Docker command

Command used to run h2oGPT:

export CONTEXT_LENGTH=16384
export IMAGE_TAG=7297519c

docker run \
 --init \
--gpus all \
--runtime=nvidia \
--shm-size=2g \
-p 7850:7860 \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u $(id -u):$(id -g) \
gcr.io/vorvan/h2oai/h2ogpt-runtime:$IMAGE_TAG /workspace/generate.py \
--page_title="GenNet AI" \
--favicon_path="/workspace/assets/gennet_logo.svg" \
--height=700 \
--gradio_size="medium" \
--enable_heap_analytics=False \
--document_choice_in_sidebar=True \
--actions_in_sidebar=True \
--openai_server=False \
--use_gpu_id=False \
--score_model=None \
--prompt_type=open_chat \
--base_model=TheBloke/openchat_3.5-16k-AWQ \
--compile_model=True \
--use_cache=True \
--use_flash_attention_2=True \
--attention_sinks=True \
--sink_dict="{'num_sink_tokens': 4, 'window_length': $CONTEXT_LENGTH }" \
--save_dir='/workspace/save/' \
--user_path='/workspace/user_path/' \
--langchain_mode="UserData" \
--langchain_modes="['UserData', 'LLM']" \
--visible_langchain_actions="['Query']" \
--visible_langchain_agents="[]" \
--use_llm_if_no_docs=True \
--max_seq_len=$CONTEXT_LENGTH \
--enable_ocr=True \
--enable_tts=False \
--enable_stt=False
@pseudotensor (Collaborator)

Hi, I see the issue is from awq:

File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 48, in reshape_for_broadcast
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])

It's likely a bug in awq, perhaps when combined with attention sinks, flash attention, or model compilation. While we expose those options from transformers, I cannot be sure arbitrary combinations work.
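
For intuition only, here is a small sketch of the kind of shape mismatch that trips this assert, in plain PyTorch with made-up sizes (this is not the actual awq code): the fused attention slices a fixed-length precomputed rotary table at the running start position, and once start_pos + seqlen runs past the end of the table, the slice no longer matches (seqlen, rotary_dim).

# Hedged illustration only: mimics the shape check from awq/modules/fused/attn.py
# with made-up sizes. The fused module precomputes a rotary table of fixed length
# and slices it as table[start_pos : start_pos + seqlen] on each forward pass.
import torch

max_pos, rotary_dim = 2048, 64                 # assumed precomputed table size
freqs_full = torch.randn(max_pos, rotary_dim)  # stand-in for the cached freqs_cis

start_pos, seqlen = 2000, 128                  # request runs 80 positions past the table
freqs = freqs_full[start_pos:start_pos + seqlen]   # only 48 rows come back

xq_ = torch.randn(1, seqlen, 32, rotary_dim)   # (batch, seqlen, heads, rotary_dim)
print(freqs.shape, "vs", (xq_.shape[1], xq_.shape[-1]))   # torch.Size([48, 64]) vs (128, 64)
print(freqs.shape == (xq_.shape[1], xq_.shape[-1]))       # False -> the AssertionError above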

If I run:

python generate.py \
--enable_heap_analytics=False \
--document_choice_in_sidebar=True \
--actions_in_sidebar=True \
--openai_server=False \
--use_gpu_id=False \
--score_model=None \
--prompt_type=open_chat \
--base_model=TheBloke/openchat_3.5-16k-AWQ \
--compile_model=True \
--use_cache=True \
--use_flash_attention_2=True \
--attention_sinks=True \
--sink_dict="{'num_sink_tokens': 4, 'window_length': 16384 }" \
--use_llm_if_no_docs=True \
--max_seq_len=16384 \
--enable_ocr=True

I don't hit a general issue when running this; I removed the options that shouldn't be relevant to the awq problem.

[screenshot]

However, when I upload some text and then ask a question, I get the same issue:

/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `__call__` was deprecated in LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.
  warn_deprecated(
thread exception: Traceback (most recent call last):
  File "/home/jon/h2ogpt/src/utils.py", line 502, in run
    self._return = self._target(*self._args, **self._kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__
    return self.invoke(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke
    raise e
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 137, in _call
    output, extra_return_dict = self.combine_docs(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 244, in combine_docs
    return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 293, in predict
    return self(kwargs, callbacks=callbacks)[self.output_key]
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__
    return self.invoke(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke
    raise e
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 103, in _call
    response = self.generate([inputs], run_manager=run_manager)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 115, in generate
    return self.llm.generate_prompt(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 633, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 803, in generate
    output = self._generate_helper(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 670, in _generate_helper
    raise e
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 657, in _generate_helper
    self._generate(
  File "/home/jon/h2ogpt/src/gpt_langchain.py", line 2339, in _generate
    rets = super()._generate(prompts, stop=stop, run_manager=run_manager, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_community/llms/huggingface_pipeline.py", line 267, in _generate
    responses = self.pipeline(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 240, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1223, in __call__
    outputs = list(final_iterator)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1149, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/jon/h2ogpt/src/h2oai_pipeline.py", line 271, in _forward
    return self.__forward(model_inputs, **generate_kwargs)
  File "/home/jon/h2ogpt/src/h2oai_pipeline.py", line 309, in __forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._greedy_search(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 2494, in _greedy_search
    outputs = self(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1158, in forward
    outputs = self.model(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/model.py", line 119, in forward
    h, _, past_key_value = layer(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/block.py", line 113, in forward
    attn_output, _, past_key_value = self.attn.forward(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 210, in forward
    xq, xk = self.rope.forward(xq, xk, self.start_pos, seqlen)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 62, in forward
    freqs_cis = self.reshape_for_broadcast(freqs_cis, xq_).to(xq_.device)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 50, in reshape_for_broadcast
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
AssertionError

@pseudotensor (Collaborator)

This does the same thing:

python generate.py \
--enable_heap_analytics=False \
--document_choice_in_sidebar=True \
--actions_in_sidebar=True \
--openai_server=False \
--use_gpu_id=False \
--score_model=None \
--prompt_type=open_chat \
--base_model=TheBloke/openchat_3.5-16k-AWQ \
--attention_sinks=True \
--sink_dict="{'num_sink_tokens': 4, 'window_length': 16384 }" \
--use_llm_if_no_docs=True \
--max_seq_len=16384 \
--enable_ocr=True

@pseudotensor (Collaborator)

As does this:

python generate.py \
--enable_heap_analytics=False \
--document_choice_in_sidebar=True \
--actions_in_sidebar=True \
--openai_server=False \
--use_gpu_id=False \
--score_model=None \
--prompt_type=open_chat \
--base_model=TheBloke/openchat_3.5-16k-AWQ \
--use_llm_if_no_docs=True \
--max_seq_len=16384 \
--enable_ocr=True

So it seems to be a pure awq issue.

The latest 0.2.5 does the same thing. Reducing to (say) 15000 does the same thing.

@pseudotensor (Collaborator)

A small script does the same thing, so it's not related to h2oGPT itself.
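
For reference, a standalone reproduction along these lines (my sketch under assumptions, not the exact script used; it assumes autoawq and transformers are installed and a CUDA GPU is available) exercises the same fused AWQ attention path without any h2oGPT code:

# Minimal AutoAWQ sketch to hit the fused attention path outside h2oGPT.
# Assumption: a sufficiently long prompt/generation pushes past the fused
# module's precomputed rotary length on affected autoawq versions.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "TheBloke/openchat_3.5-16k-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True)  # fused modules, as in the trace

# A long prompt, similar to what the RAG "stuff" chain builds from an uploaded document.
prompt = "GPT4 User: " + "summarize this document. " * 400 + "GPT4 Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))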

@Blacksuan19 (Contributor, Author)

I'm getting a similar error with a Llama 3 GGUF model as well (the same model mentioned in the FAQ); this one only includes the evaluate_nochat exception from the log above.

Error running Llama 3
evaluate_nochat exception: : ('', '', '', True, 'unknown', "{   'PreInput': None,\n    'PreInstruct': None,\n    'PreResponse': None,\n    'botstr': None,\n    'can_handle_system_prompt': False,\n    'chat_sep': '\\n',\n    'chat_turn_
sep': '\\n',\n    'generates_leading_space': False,\n    'humanstr': None,\n    'promptA': None,\n    'promptB': None,\n    'system_prompt': '',\n    'terminate_response': []}", 0, 1, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, Tr
ue, '', '', 'LLM', True, 'Query', None, 10, True, 512, 'Relevant', ['All'], None, None, None, None, 'Pay attention and remember the information below, which will help to answer the question or imperative after the context ends.', 'Acco
rding to only the information in the document sources provided within the context above, write an insightful and well-structured response to: ', 'In order to write a concise single-paragraph or bulleted list summary, pay attention to t
he following text.', 'Using only the information in the document sources above, write a condensed and concise summary of key results (preferably as about 10 bullet points).', 'Answer this question with vibrant details in order for some
 NLP embedding model to use that answer as better query than original question: ', 'Who are you and what do you do?', 'Ensure your entire response is outputted as a single piece of strict valid JSON text.', 'Ensure your response is str
ictly valid JSON text.', 'Ensure your entire response is outputted as strict valid JSON text inside a Markdown code block with the json language identifier.   Ensure all JSON keys are less than 64 characters, and ensure JSON key names are made of only alphanumerics, underscores, or hyphens.', 'Ensure you follow this JSON schema:\n```json\n{properties_schema}\n```', 'auto', ['DocTR', 'Caption', 'ASR'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', [], None, '', False, '[]', '[]', 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, 'auto', False, False, '[]', 'None', None, None, 1, None, None, 'text', '', '', '', '', {'model': 'model', 'tokenizer': 'tokenizer', 'device': 'cpu', 'base_model': 'llama', 'tokenizer_base_model': '', 'lora_weights': '[]', 'inference_server': '[]', 'prompt_type': 'unknown', 'prompt_dict': {'promptA': None, 'promptB': None, 'PreInstruct': None, 'PreInput': None, 'PreResponse': None, 'terminate_response': [], 'chat_sep': '\n', 'chat_turn_sep': '\n', 'humanstr': None, 'botstr': None, 'generates_leading_space': False, 'system_prompt': '', 'can_handle_system_prompt': False}, 'visible_models': 0, 'h2ogpt_key': None}, {'MyData': [None, '6164574a-865b-4659-8fda-d35faa6a1d09', 'test']}, {'langchain_modes': ['Disabled', 'LLM', 'MyData', 'UserData'], 'langchain_mode_paths': {'UserData': None}, 'langchain_mode_types': {'UserData': 'shared', 'github h2oGPT': 'shared', 'DriverlessAI docs': 'shared', 'wiki': 'shared', 'wiki_full': '', 'MyData': 'personal', 'LLM': 'personal', 'Disabled': 'personal'}}, {'headers': '', 'host': '0.0.0.0:7850', 'username': 'test', 'connection': 'keep-alive', 'content-length': '117', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0', 'dnt': '1', 'content-type': 'application/json', 'accept': '*/*', 'origin': 'http://0.0.0.0:7850', 'referer': 'http://0.0.0.0:7850/', 'accept-encoding': 'gzip, deflate', 'accept-language': 'en-US,en;q=0.9,ar;q=0.8', 'cookie': 'access-token-unsecure-hhN8py5JLVRfL-0OTPND8TGcb3qhs2GvSJQ8qV1LI50=vrLRNuXKqoKCZDSCqo1OHg; access-token-unsecure-s-dRx26Pws-xf2TfvaYIjqwWsGjiH9960S06PrlT6tg=AnrezJi1hR1NjfFx29n_bg; access-token-unsecure-SF0CZ7POfi6Imk0jDfN44qO9W9VB0hu3nUcGevVPMYw=SU1SQYZL79hpAN43hEDgIQ; access-token-unsecure-9LIDZewsE4If1yY7ixHa-yOZJO20M-PQVSDjJtfYQYA=o8YMAhHGtoLQDjMVZVITsQ; access-token-unsecure-qS0zsQdPdQYJsrMX4RXh3HQwEDeknaNz0RppngdPvGY=AGmuVQm8_KVKkMg8HdQtqg; access-token-unsecure--qfFGcbj-JQc0O0MamjIfNGlfgUrb6t7xyB3hRUL1I8=NVbKjP5O7Q3xJxHYvaiUfw; access-token-unsecure-YeY4iDfE2-hlA1izGtL7vBNbLbCosRLpSAJFo-j6_e0=xkWJTIiCTZGbhG1H60OTBg; access-token-unsecure-BwVTmtTwIzOYqtTpvsZkHQvnjr8N60WJaX_V6njwUAw=8uPW51j557W7S8ZO_e5iSQ; access-token-unsecure-JSzZdmZ5Fn4S9ekIB_5lXnXTrnwvQu1X7IyivtmRjuk=mm2CzLGIw9b3H9xFfS1KpQ; access-token-unsecure-IxTC1FBXOKLvW0SXsNRzMYxrHxvTPTIwwB4y69dHG9A=fYRfZinU_x99RK1k11fvIA; access-token-unsecure-eJjGwBq3ju0P30aflFl-P8uUU2QqEgAIsxgw-FsdJgU=fDM1Je-16fNS5ndkdNRL6g; access-token-unsecure-uiZ2ybGZZJgrCTxTV22rFbgZVNssg73T8oL2xiUqp1I=VrTDb6L_l-gQy27srKSq0Q; access-token-unsecure-U2bTOnaNSeVnj1LFMlrf1Mtm6lWYkZALH_MqD44PYNU=lt481IiB7f9YGqWkkOinIA; access-token-unsecure-RxkNbfFX2paq-I07CliUNsj55vZP1qWOWwp2u-TDNCc=mmNdzgtQmfvlDlQe6i4PvA; access-token-unsecure-zfJ9zz3Jn9sTfeKgGIdQFP7gIY4kjYQ8rW5jEkOaylQ=IsBrbZibgh9mmFvTDmEOAg; access-token-unsecure-L9Xt7VY0kFHSJqZAK8-I90p4rp6XNVtxCYJp3nAmxfs=LZu9e7Ggvl2vYBE1AuZbLQ', 'host2': '14.1.246.124', 'picture': 'None'}, {}, [['hello', '']])
Traceback (most recent call last):
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/queueing.py", line 566, in process_events
    response = await route_utils.call_process_api(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/blocks.py", line 1788, in process_api
    result = await self.call_function(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 595, in async_iteration
    return await iterator.__anext__()
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 588, in __anext__
    return await anyio.to_thread.run_sync(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 571, in run_sync_iterator_async
    return next(iterator)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 754, in gen_wrapper
    response = next(iterator)
  File "/workspace/src/gradio_runner.py", line 5053, in bot
    for res in get_response(fun1, history, chatbot_role1, speaker1, tts_language1, roles_state1,
  File "/workspace/src/gradio_runner.py", line 4948, in get_response
    for output_fun in fun1():
  File "/workspace/src/gen.py", line 4402, in evaluate
    prompt_basic = prompter.generate_prompt(data_point, context_from_history=False)
  File "/workspace/src/prompter.py", line 1729, in generate_prompt
    assert self.use_chat_template
AssertionError
Run command
export IMAGE_TAG=4059a2c9
export HF_TOKEN=hf_xxx

docker run \
 --init \
--gpus all \
--runtime=nvidia \
--shm-size=2g \
-p 7850:7860 \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u $(id -u):$(id -g) \
gcr.io/vorvan/h2oai/h2ogpt-runtime:$IMAGE_TAG /workspace/generate.py \
--openai_server=False \
--auth="/workspace/auth/users.json" \
--h2ogpt_api_keys="/workspace/auth/api_keys.json" \
--use_gpu_id=False \
--score_model=None \
--base_model=llama \
--model_path_llama=https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf?download=true
--tokenizer_base_model=meta-llama/Meta-Llama-3-8B-Instruct \
--save_dir='/workspace/save/' \
--user_path='/workspace/user_path/' \
--langchain_mode="UserData" \
--langchain_modes="['UserData', 'LLM']" \
--visible_langchain_actions="['Query']" \
--visible_langchain_agents="[]" \
--use_llm_if_no_docs=True \
--enable_ocr=True \
--enable_tts=False \
--enable_stt=False

@pseudotensor (Collaborator)

It should fail and require you to pass:

--max_seq_len=8192

as well.

e.g.

python generate.py --openai_server=False --score_model=None --base_model=llama --model_path_llama=https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf?download=true --tokenizer_base_model=meta-llama/Meta-Llama-3-8B-Instruct --use_llm_if_no_docs=True --max_seq_len=8192

gives:

[screenshot]

I don't see the error you see. And when I debug the code with the above command on the latest h2oGPT, I see that chat_template is True.

Perhaps you are using an older Docker image or older h2oGPT?
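
If it helps to compare environments, a quick check that the tokenizer actually ships a chat template (a sketch; hf_xxx is a placeholder token, and this is roughly what the assert self.use_chat_template path relies on) would be:

# Check that the Llama 3 tokenizer exposes a chat template and that it renders.
# "hf_xxx" is a placeholder; the model id matches --tokenizer_base_model above.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", token="hf_xxx")
print(tok.chat_template is not None)   # expect True on recent tokenizer revisions

msgs = [{"role": "user", "content": "hello"}]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))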

@Blacksuan19 (Contributor, Author) commented May 7, 2024

There was a missing backslash in the command after --model_path_llama, which is fixed now. However, I'm still unable to run Llama 3 with the Docker image because it is a gated model. I tried passing --use_auth_token=hf_xxx and setting the environment variables HUGGING_FACE_HUB_TOKEN, HF_TOKEN, and HUGGINGFACE_TOKEN, but I still cannot access it.

This occurs on both the latest and previous Docker images, tagged c25144e9 and 4059a2c9.

@pseudotensor (Collaborator)

Can you share a stack trace of where it's failing?

@Blacksuan19 (Contributor, Author) commented May 7, 2024

Here is the full stack trace. I'm authenticated with huggingface-cli, have exported HUGGING_FACE_HUB_TOKEN in the environment, and am passing --use_auth_token to docker run.

The error is not raised while starting the container; it is only raised after sending a query.

evaluate_nochat exception: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct.
401 Client Error. (Request ID: Root=1-6639f802-164d6a07579d391a189244e2;12303a0c-d620-412a-b17f-58372bd127d5)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/generation_config.json.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.: ('', '', '', True, 'unknown', "{   'PreInput': None,\n    'PreInstruct': None,\n    'PreResponse': None,\n    'botstr': None,\n
    'can_handle_system_prompt': False,\n    'chat_sep': '\\n',\n    'chat_turn_sep': '\\n',\n    'generates_leading_space': False,\n    'humanstr': None,\n    'promptA': None,\n    'promptB': None,\n    'system_prompt': '',\n    'termi
nate_response': []}", 0, 1, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'UserData', True, 'Query', None, 10, True, 512, 'Relevant', ['/workspace/user_path/9b999f43-2ade-4148-97cf-d2448125168c/res/b1a173b2_user_upload
_Abubakar-Yusif-March-2024-Progress-Report.pdf'], None, None, None, None, 'Pay attention and remember the information below, which will help to answer the question or imperative after the context ends.', 'According to only the informat
ion in the document sources provided within the context above, write an insightful and well-structured response to: ', 'In order to write a concise single-paragraph or bulleted list summary, pay attention to the following text.', 'Usin
g only the information in the document sources above, write a condensed and concise summary of key results (preferably as about 10 bullet points).', 'Answer this question with vibrant details in order for some NLP embedding model to us
e that answer as better query than original question: ', 'Who are you and what do you do?', 'Ensure your entire response is outputted as a single piece of strict valid JSON text.', 'Ensure your response is strictly valid JSON text.', '
Ensure your entire response is outputted as strict valid JSON text inside a Markdown code block with the json language identifier.   Ensure all JSON keys are less than 64 characters, and ensure JSON key names are made of only alphanume
rics, underscores, or hyphens.', 'Ensure you follow this JSON schema:\n```json\n{properties_schema}\n```', 'auto', ['OCR', 'DocTR', 'Caption', 'ASR'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', [], None, '', False, '[]', '[]', 'be
st_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, 'auto', False, False, '[]', 'None', None, None, 1, None, None, 'text', '', '', '', '', {'model': 'model', 'tokenizer': 'tokenizer', 'device': 'cpu', 'base_model': 'llama', 'tok
enizer_base_model': 'meta-llama/Meta-Llama-3-8B-Instruct', 'lora_weights': '[]', 'inference_server': '[]', 'prompt_type': 'unknown', 'prompt_dict': {'promptA': None, 'promptB': None, 'PreInstruct': None, 'PreInput': None, 'PreResponse'
: None, 'terminate_response': [], 'chat_sep': '\n', 'chat_turn_sep': '\n', 'humanstr': None, 'botstr': None, 'generates_leading_space': False, 'system_prompt': '', 'can_handle_system_prompt': False}, 'visible_models': 0, 'h2ogpt_key':
None}, {'MyData': [None, '2db742f5-7880-4708-bfcc-061f995f6c51', 'test']}, {'langchain_modes': ['Disabled', 'LLM', 'UserData'], 'langchain_mode_paths': {'UserData': '/workspace/user_path/'}, 'langchain_mode_types': {'UserData': 'shared
', 'github h2oGPT': 'shared', 'DriverlessAI docs': 'shared', 'wiki': 'shared', 'wiki_full': '', 'LLM': 'personal', 'Disabled': 'personal'}}, {'headers': '', 'host': '0.0.0.0:7850', 'username': 'test', 'connection': 'keep-alive', 'c
ontent-length': '155', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0', 'dnt': '1', 'content-type': 'application/json', 'accept': '*/*', 'origin': 'htt
p://0.0.0.0:7850', 'referer': 'http://0.0.0.0:7850/', 'accept-encoding': 'gzip, deflate', 'accept-language': 'en-US,en;q=0.9,ar;q=0.8', 'cookie': 'access-token-unsecure-hhN8py5JLVRfL-0OTPND8TGcb3qhs2GvSJQ8qV1LI50=vrLRNuXKqoKCZD
SCqo1OHg; access-token-unsecure-s-dRx26Pws-xf2TfvaYIjqwWsGjiH9960S06PrlT6tg=AnrezJi1hR1NjfFx29n_bg; access-token-unsecure-SF0CZ7POfi6Imk0jDfN44qO9W9VB0hu3nUcGevVPMYw=SU1SQYZL79hpAN43hEDgIQ; access-token-unsecure-9LIDZewsE4If1yY7ixHa-yO
ZJO20M-PQVSDjJtfYQYA=o8YMAhHGtoLQDjMVZVITsQ; access-token-unsecure-qS0zsQdPdQYJsrMX4RXh3HQwEDeknaNz0RppngdPvGY=AGmuVQm8_KVKkMg8HdQtqg; access-token-unsecure--qfFGcbj-JQc0O0MamjIfNGlfgUrb6t7xyB3hRUL1I8=NVbKjP5O7Q3xJxHYvaiUfw; access-tok
en-unsecure-YeY4iDfE2-hlA1izGtL7vBNbLbCosRLpSAJFo-j6_e0=xkWJTIiCTZGbhG1H60OTBg; access-token-unsecure-BwVTmtTwIzOYqtTpvsZkHQvnjr8N60WJaX_V6njwUAw=8uPW51j557W7S8ZO_e5iSQ; access-token-unsecure-JSzZdmZ5Fn4S9ekIB_5lXnXTrnwvQu1X7IyivtmRjuk
=mm2CzLGIw9b3H9xFfS1KpQ; access-token-unsecure-IxTC1FBXOKLvW0SXsNRzMYxrHxvTPTIwwB4y69dHG9A=fYRfZinU_x99RK1k11fvIA; access-token-unsecure-eJjGwBq3ju0P30aflFl-P8uUU2QqEgAIsxgw-FsdJgU=fDM1Je-16fNS5ndkdNRL6g; access-token-unsecure-uiZ2ybGZ
ZJgrCTxTV22rFbgZVNssg73T8oL2xiUqp1I=VrTDb6L_l-gQy27srKSq0Q; access-token-unsecure-U2bTOnaNSeVnj1LFMlrf1Mtm6lWYkZALH_MqD44PYNU=lt481IiB7f9YGqWkkOinIA; access-token-unsecure-RxkNbfFX2paq-I07CliUNsj55vZP1qWOWwp2u-TDNCc=mmNdzgtQmfvlDlQe6i4
PvA; access-token-unsecure-zfJ9zz3Jn9sTfeKgGIdQFP7gIY4kjYQ8rW5jEkOaylQ=IsBrbZibgh9mmFvTDmEOAg; access-token-unsecure-L9Xt7VY0kFHSJqZAK8-I90p4rp6XNVtxCYJp3nAmxfs=LZu9e7Ggvl2vYBE1AuZbLQ; access-token-unsecure-Q1siuNCSLCiNRxIYSoa5j3shaosW
MURmotg6HLCC7_U=GnFkZ58wi1tNt8jvtK_UXw; access-token-unsecure-18dc1X-LAJmB7lwYCfNUZLG0jDJ14hzU62wssf0wc68=sl8WTC1xPw4lEirkcWQTwg; access-token-unsecure-EjGzYB6qR9D1LPBN82KeJcwOVChluLaElinAd1OgeSk=jzCYJhyTI3Aka1vIsd-b8g; access-token-un
secure-X7gaKAyAK8MUU18su8lDmyVdbuDVILxO9xngYdFnvgo=1JT2Rzr5QHd5rLGbEgqASg; access-token-unsecure-SBEZ_P_Ugyx9zNJcGUYmCbOTE7jSx0BgJFTIyH3OkSg=_X_Xbu1W23MfAgZgdLlWZg; access-token-unsecure-wxxoptBvH1YNkOHaad2CST68Cg5SrC23tRQMsGy2TOc=Vl3F
HsOce3Q6zMWHJq79tA; access-token-unsecure-6WqVOYaEXoF6RktDUq-aghEiDdfSuCY2tkPQHCqGjck=N4QVkzJs0_AgRFQRdx5Y-Q; access-token-unsecure-PX6qZPvMN8mpgv3RwObslqNOspATAkytwLj-5mrO12Q=CxUwwnlESuufP4AeTnOh3Q; access-token-unsecure-a8SnDigFzjafo
M84LMb3tCUsrk44E9VoXdpmv36kpNo=DjNp22jvVqPfWo9k6fWxUg; access-token-unsecure-xy6fU_AFL6Tp1nllr3w_QAOlrvJRFgRn0bTPJnCrTAo=Nqt1rhRGgIiZD_m8zaERrg; access-token-unsecure-9JU3qRVH6hK4ZC54rM0J035t62b8-m26tT3T5I-wEWA=ESa5PlhZ1RUHvb1Klzxujg',
 'host2': '14.1.246.126', 'picture': 'None'}, {}, [['summarize the given data', '']])
Traceback (most recent call last):
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/generation_config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/utils/hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1823, in _raise_on_head_call_error
    raise head_call_error
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
    r = _request_wrapper(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
    response = _request_wrapper(
File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
    hf_raise_for_status(response)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
    raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-6639f802-164d6a07579d391a189244e2;12303a0c-d620-412a-b17f-58372bd127d5)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/generation_config.json.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/queueing.py", line 566, in process_events
    response = await route_utils.call_process_api(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/route_utils.py", line 261, in call_process_api
    output = await app.get_blocks().process_api(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/blocks.py", line 1788, in process_api
    result = await self.call_function(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 595, in async_iteration
    return await iterator.__anext__()
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 588, in __anext__
    return await anyio.to_thread.run_sync(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 571, in run_sync_iterator_async
    return next(iterator)
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 754, in gen_wrapper
    response = next(iterator)
  File "/workspace/src/gradio_runner.py", line 5053, in bot
    for res in get_response(fun1, history, chatbot_role1, speaker1, tts_language1, roles_state1,
  File "/workspace/src/gradio_runner.py", line 4948, in get_response
    for output_fun in fun1():
  File "/workspace/src/gen.py", line 4278, in evaluate
    prompter = Prompter(prompt_type, prompt_dict, debug=debug, stream_output=stream_output,
  File "/workspace/src/prompter.py", line 1706, in __init__
    self.terminate_response = update_terminate_responses(self.terminate_response,
  File "/workspace/src/stopping.py", line 25, in update_terminate_responses
    generate_eos_token_id = GenerationConfig.from_pretrained(tokenizer.name_or_path).eos_token_id
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/configuration_utils.py", line 843, in from_pretrained
    resolved_config_file = cached_file(
  File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/utils/hub.py", line 416, in cached_file
    raise EnvironmentError(
OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct.
401 Client Error. (Request ID: Root=1-6639f802-164d6a07579d391a189244e2;12303a0c-d620-412a-b17f-58372bd127d5)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/generation_config.json.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.

@pseudotensor (Collaborator)

I see. But if you pass the env var HUGGING_FACE_HUB_TOKEN through, it should still work here.

e.g. the docker run command would add:

-e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
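
To confirm the token itself can reach the gated file named in the 401 (a quick sketch run on the host; hf_xxx is a placeholder), something like this should succeed before you pass the token into docker:

# Sanity-check gated-repo access with the same token before launching the container.
# generation_config.json is the exact file the 401 in the trace was raised on.
from huggingface_hub import hf_hub_download, whoami

token = "hf_xxx"
print(whoami(token=token)["name"])   # confirms the token is valid at all

path = hf_hub_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    filename="generation_config.json",
    token=token,
)
print(path)   # a local cache path means the gated repo is accessible with this token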

pseudotensor added a commit that referenced this issue May 7, 2024
@pseudotensor (Collaborator)

But I made some changes to that particular piece of code.

@Blacksuan19 (Contributor, Author)

I can confirm that passing the environment variable to the Docker image with -e, or passing --use_auth_token, works on the latest image.
