Error: Warmup(Generation("'bool' object has no attribute 'dtype'")) #422

Open
1 of 4 tasks
KrisWongz opened this issue Apr 18, 2024 · 3 comments
Labels
bug Something isn't working

@KrisWongz

System Info

LoRAX latest Docker image, 2 A100 GPUs, Ubuntu 22.04

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

The error occurs during warmup when loading Qwen1.5-32B-Chat:

2024-04-18T01:17:55.125611Z INFO lorax_router: router/src/main.rs:296: Warming up model
2024-04-18T01:17:55.167710Z INFO lorax_launcher: flash_causal_lm.py:793 Warming up to max_total_tokens: 488
2024-04-18T01:17:55.167711Z INFO lorax_launcher: flash_causal_lm.py:793 Warming up to max_total_tokens: 488
2024-04-18T01:17:55.548798Z ERROR lorax_launcher: interceptor.py:41 Method Warmup encountered an error.
Traceback (most recent call last):
File "/opt/conda/bin/lorax-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.10/site-packages/lorax_server/cli.py", line 89, in serve
server.serve(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 321, in serve
asyncio.run(
File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
return await self.intercept(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/interceptor.py", line 38, in intercept
return await response
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
raise error
File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
return await behavior(request_or_iterator, context)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 81, in Warmup
max_supported_total_tokens = self.model.warmup(batch, request.max_new_tokens)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 796, in warmup
_, batch = self.generate_token(batch, is_warmup=True)
File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 936, in generate_token
raise e
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 933, in generate_token
out = self.forward(batch, adapter_data)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_causal_lm.py", line 890, in forward
logits = model.forward(
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_qwen2_modeling.py", line 484, in forward
hidden_states = self.model(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_qwen2_modeling.py", line 428, in forward
hidden_states, residual = layer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_qwen2_modeling.py", line 351, in forward
attn_output = self.self_attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/custom_modeling/flash_qwen2_modeling.py", line 213, in forward
qkv = self.query_key_value(hidden_states, adapter_data)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/layers.py", line 606, in forward
result = self.base_layer(input)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/layers.py", line 402, in forward
return self.linear.forward(x)
File "/opt/conda/lib/python3.10/site-packages/lorax_server/utils/layers.py", line 288, in forward
if self.bias is not None and self.bias.dtype != x.dtype:
AttributeError: 'bool' object has no attribute 'dtype'

2024-04-18T01:17:55.549099Z ERROR warmup{max_input_length=1024 max_prefill_tokens=1024 max_total_tokens=1512}:warmup: lorax_client: router/client/src/lib.rs:34: Server error: 'bool' object has no attribute 'dtype'
2024-04-18T01:17:55.559823Z ERROR lorax_launcher: interceptor.py:41 Method Warmup encountered an error.

2024-04-18T01:17:55.560073Z ERROR warmup{max_input_length=1024 max_prefill_tokens=1024 max_total_tokens=1512}:warmup: lorax_client: router/client/src/lib.rs:34: Server error: 'bool' object has no attribute 'dtype'
Error: Warmup(Generation("'bool' object has no attribute 'dtype'"))
2024-04-18T01:17:55.643388Z ERROR lorax_launcher: Webserver Crashed

Expected behavior

none

@magdyksaleh
Collaborator

@KrisWongz Can you please share the arguments to the command you used when you ran into this issue?

@KrisWongz
Author

Sure @magdyksaleh

docker run --gpus all \
  --shm-size 1g \
  -p 8081:80 \
  -v /home/unionlab001/Model/qwen-32b:/data \
  ghcr.io/predibase/lorax:latest \
  --model-id /data/Qwen1_5-32B-Chat \
  --trust-remote-code \
  --quantize bitsandbytes-nf4 \
  --max-batch-prefill-tokens 1024 \
  --max-input-length 1024 \
  --max-total-tokens 1512 \
  --num-shard 4

@tgaddair
Contributor

Looks like this may be an issue specific to Qwen + nf4 quantization. We'll take a look.
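
For readers following the traceback, here is a minimal sketch of the failure pattern. It is not the LoRAX code. The only line taken from the thread is the check at `layers.py` line 288; `FakeTensor`, `bias_needs_cast`, and `bias_needs_cast_guarded` are hypothetical names used for illustration. The sketch assumes that `self.bias` holds a Python bool (for example `False`) rather than `None` or a tensor, which would be consistent with the error message. Where that bool comes from is not established in this thread.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeTensor:
    """Stand-in for a tensor; only the `dtype` attribute matters here."""
    dtype: str


def bias_needs_cast(bias: Any, x: FakeTensor) -> bool:
    """Same shape as the check shown in the traceback."""
    return bias is not None and bias.dtype != x.dtype


def bias_needs_cast_guarded(bias: Any, x: FakeTensor) -> bool:
    """Hypothetical guard: anything without a `dtype` counts as 'no bias'."""
    return hasattr(bias, "dtype") and bias.dtype != x.dtype


x = FakeTensor(dtype="float16")

# None short-circuits the original check, so nothing is raised.
none_result = bias_needs_cast(None, x)

# False is not None, so the original check goes on to read `.dtype`.
try:
    bias_needs_cast(False, x)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# The guarded variant tolerates None, bools, and tensor-like objects.
guarded_results = [
    bias_needs_cast_guarded(None, x),
    bias_needs_cast_guarded(False, x),
    bias_needs_cast_guarded(FakeTensor(dtype="float32"), x),
]
print(guarded_results)
```

A guard like this would only hide the symptom. If the assumption holds, the more useful fix is probably at the point where the quantized layer is constructed, so that a missing bias is stored as `None`.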

@magdyksaleh added the bug label on May 23, 2024