
RuntimeError when setting up self hosted model + langchain integration #9

Open

dcavadia opened this issue Feb 25, 2023 · 28 comments

@dcavadia

dcavadia commented Feb 25, 2023

I'm hitting this bug when trying to set up a model on a Lambda Cloud instance, running SelfHostedHuggingFaceLLM() after the rh.cluster() call.

```python
from langchain.llms import SelfHostedPipeline, SelfHostedHuggingFaceLLM
from langchain import PromptTemplate, LLMChain
import runhouse as rh

gpu = rh.cluster(name="rh-a10", instance_type="A10:1").save()

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
llm = SelfHostedHuggingFaceLLM(model_id="gpt2", hardware=gpu, model_reqs=["pip:./", "transformers", "torch"])
```

[screenshot of the error]

I confirmed with sky check that the Lambda credentials are set, but the error I get in the log is the following, which I haven't been able to solve.

[screenshot of the log error]

Any help solving this would be appreciated.

@dongreenberg

Hi! Thanks for raising this. It looks like the GPU type you're specifying is "A10", which is not a valid GPU type. Can you try "A100:1"? To see all the GPU types available, you can run sky show-gpus or sky show-gpus --cloud lambda.
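
Roughly, the change would look like this (just a sketch, assuming the rest of your snippet stays the same; the cluster name "rh-a100" is only illustrative):

```python
# Sketch: same setup as the original snippet, but requesting a GPU type
# that the SkyPilot catalog recognizes ("A100:1" here).
import runhouse as rh

gpu = rh.cluster(name="rh-a100", instance_type="A100:1").save()
```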

@dongreenberg

Cc @concretevitamin - it looks to me like the accelerator validation for Lambda isn't catching this properly?

@concretevitamin

Hey, thanks for the report. This bug showed up when the Lambda console already contained existing instances in addition to the ones SkyPilot launched.

This has been fixed in SkyPilot main branch.

@dongreenberg

Ya, I spun up an A10 shortly after I wrote the above and realized it works and just wasn't in the catalogue 😄. Excellent, glad to hear it's fixed. @dcavadia I can help you get set up on the SkyPilot main branch if that's helpful, or use an existing Lambda instance you have up if you'd prefer to do that instead.

@concretevitamin

concretevitamin commented Feb 27, 2023

@dongreenberg It's a quirk on our end: sky show-gpus --cloud lambda shows NVIDIA_GPU which really means an opinionated list of "common NVIDIA GPUs". If you pass --all / -a to the above, all supported GPUs in the catalog will be shown including A10, A6000, RTX6000, etc. Let us know if showing all GPUs by default or at least A10 is a good idea.

@dongreenberg

Based on my intuition to run the show-gpus command to see if a particular variant exists in the catalogue, my bias would be to either show all by default or print a warning that this is only common GPUs and that you can run -a to see the full list. Maybe a middle ground would be that if I just run sky show-gpus it shows only the common hardware variants so it's not a mess of cloud-specific hardware (with a warning about running -a), but if I run with --cloud it shows the full catalogue for the given cloud.

@dcavadia

dcavadia commented Feb 27, 2023

> Ya, I spun up an A10 shortly after I wrote the above and realized it works and just wasn't in the catalogue 😄. Excellent, glad to hear it's fixed. @dcavadia I can help you get set up on the SkyPilot main branch if that's helpful, or use an existing Lambda instance you have up if you'd prefer to do that instead.

Hi! I'm glad you found the issue, thanks a lot. I just installed the SkyPilot main branch with pip install git+https://github.com/skypilot-org/skypilot and that solved my earlier problem. It now sets up the instance on Lambda and I can launch it, but I get a new error while running the function, which looks like an InactiveRpcError. Any idea on this?

[screenshots of the InactiveRpcError]

@dongreenberg

dongreenberg commented Feb 27, 2023

Great! Glad this worked.

It's because your working directory (referenced in reqs by "./") is being detected as "ubuntu" (I assume your home directory), and the pip modifier in front of it is telling the grpc server to try to pip install it. Try changing "pip:./" to just "./" or "local:./" to avoid pip installing it. Sorry, the notebook you're using was inside the langchain directory (meaning langchain was the working directory) so it needed to be pip installed. You'll probably need to add "langchain" into the reqs too if you haven't already to make sure to install it on the server.

If for some reason you're getting an error about gRPC not finding methods, the gRPC server on your instance went down from this error. You can restart it by running gpu.restart_grpc_server().
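
Roughly, the change would look something like this (a sketch, assuming the gpu object from your earlier snippet is still defined):

```python
# Sketch: send the working directory without the "pip:" modifier and add
# langchain to the server reqs, per the suggestion above.
from langchain.llms import SelfHostedHuggingFaceLLM

llm = SelfHostedHuggingFaceLLM(
    model_id="gpt2",
    hardware=gpu,  # the rh.cluster object defined earlier
    model_reqs=["./", "transformers", "torch", "langchain"],
)

# If the gRPC server went down from the earlier error, restart it first:
# gpu.restart_grpc_server()
```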

@dcavadia

> Great! Glad this worked.
>
> It's because your working directory (referenced in reqs by "./") is being detected as "ubuntu" (I assume your home directory), and the pip modifier in front of it is telling the grpc server to try to pip install it. Try changing "pip:./" to just "./" or "local:./" to avoid pip installing it. Sorry, the notebook you're using was inside the langchain directory (meaning langchain was the working directory) so it needed to be pip installed. You'll probably need to add "langchain" into the reqs too if you haven't already to make sure to install it on the server.
>
> If for some reason you're getting an error about gRPC not finding methods, the gRPC server on your instance went down from this error. You can restart it by running gpu.restart_grpc_server().

Thanks for the quick reply. I created a new instance and set it all up again with the ./ and langchain reqs, as SelfHostedHuggingFaceLLM(model_id="gpt2", hardware=gpu, model_reqs=["./", "transformers", "torch", "langchain"]), and it seems the setup finally succeeded.

I don't get a gRPC error, but it hangs at Running _generate_text via gRPC. I'm not sure if it's normal for the model integration to take >20 minutes on an A10 Lambda instance.

[screenshot]

@dongreenberg

Great! But no, it shouldn't take nearly that long with a small model like gpt2. One way to see what's happening on the server is to call the RPC with stream_logs=True (though that's not integrated into langchain in a user-facing way). Can you halt that and try running the following:

```python
llm.client(pipeline=llm.pipeline_ref, prompt="My prompt...", stream_logs=True)
```

If that doesn't work, there's a way to inspect the server logs directly that I can point you to. Thank you for bearing with us!

@dcavadia

Yes, it would be great if you could point me to where I can get the server logs directly. Thanks!

@dongreenberg

dongreenberg commented Feb 27, 2023

If you ssh into the cluster (you can just run ssh rh-a10 from your command line) and then type screen -r, you can view the screen in which the server is running. Just be careful not to Ctrl-C to exit or you'll kill the server (not a big deal, you can just restart it with the restart_grpc_server call I mentioned above). Ctrl-A D detaches from screen without killing the server. Happy to live debug too, I'm free pretty much the rest of the day.

@dcavadia

dcavadia commented Feb 27, 2023

Great, I can now look at the server logs. I noticed a message saying no available node can fulfill the resource request.

[screenshot of the server logs]

And this is the ray status within the server instance.
[screenshot of ray status output]

@dongreenberg

dongreenberg commented Feb 27, 2023

That would indeed cause the thread to hang. It's confusing why Ray would be halting that when the resources are clearly available in ray status. Could you try running gpu.restart_grpc_server(restart_ray=True)?

@dcavadia

That did something; now at least the message is sent, but it's still giving some errors.

```
INFO | 2023-02-27 19:58:17,693 | Running _generate_text via gRPC
INFO | 2023-02-27 19:58:18,463 | Time to send message: 0.77 seconds
ERROR | 2023-02-27 19:58:18,464 | Error inside function call: 'str' object is not callable.
ERROR | 2023-02-27 19:58:18,464 | Traceback: Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/lib/python3.10/site-packages/runhouse/grpc_handler/unary_server.py", line 184, in RunModule
    res = call_fn_by_type(fn, fn_type, fn_name, module_path, args, kwargs)
  File "/home/ubuntu/miniconda3/lib/python3.10/site-packages/runhouse/rns/run_module_utils.py", line 28, in call_fn_by_type
    res = fn(*args, **kwargs)
  File "/home/ubuntu/miniconda3/lib/python3.10/site-packages/langchain/llms/self_hosted_hugging_face.py", line 31, in _generate_text
    response = pipeline(prompt, *args, **kwargs)
TypeError: 'str' object is not callable

ERROR | 2023-02-27 19:58:18,564 | Internal Python error in the inspect module.
Below is the traceback from this internal error.

TypeError: 'str' object is not callable

During handling of the above exception, another exception occurred:

AttributeError: 'TypeError' object has no attribute 'render_traceback'

During handling of the above exception, another exception occurred:

AssertionError
INFO | 2023-02-27 19:58:18,567 |
Unfortunately, your original traceback can not be constructed.
```

@dongreenberg

OK great - your local llm object is still using the pipeline reference string stored in the previous Ray KV store that we killed. You should be able to fix this by rerunning the cells that create the llm and LLMChain objects, which will recreate the pipeline in the Ray KV store.
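
In other words, something like this (a sketch, assuming the prompt and gpu objects from your earlier cells are still defined):

```python
# Sketch: rerun these cells so the pipeline is re-registered in the freshly
# restarted Ray KV store. Assumes prompt and gpu exist from the earlier cells.
from langchain import LLMChain
from langchain.llms import SelfHostedHuggingFaceLLM

llm = SelfHostedHuggingFaceLLM(
    model_id="gpt2",
    hardware=gpu,
    model_reqs=["./", "transformers", "torch", "langchain"],
)
llm_chain = LLMChain(prompt=prompt, llm=llm)
```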

@dcavadia

> OK great - your local llm object is still using the pipeline reference string stored in the previous Ray KV store that we killed. You should be able to fix this by rerunning the cells that create the llm and LLMChain objects, which will recreate the pipeline in the Ray KV store.

Oh I see. I reran the cells, but I'm back to hanging at Running _generate_text via gRPC, now with different log info.

```
(raylet) [2023-02-27 20:17:05,913 E 68936 68936] (raylet) worker_pool.cc:502: Some workers of the worker process(69328) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet) Traceback (most recent call last):
(raylet)   File "/home/ubuntu/miniconda3/lib/python3.10/site-packages/ray/_private/workers/default_worker.py", line 8, in <module>
(raylet)     import ray
(raylet)   File "/home/ubuntu/ubuntu/.local/lib/python3.8/site-packages/ray/__init__.py", line 101, in <module>
(raylet)     _configure_system()
(raylet)   File "/home/ubuntu/ubuntu/.local/lib/python3.8/site-packages/ray/__init__.py", line 98, in _configure_system
(raylet)     CDLL(so_path, ctypes.RTLD_GLOBAL)
(raylet)   File "/home/ubuntu/miniconda3/lib/python3.10/ctypes/__init__.py", line 374, in __init__
(raylet)     self._handle = _dlopen(self._name, mode)
(raylet) OSError: /home/ubuntu/ubuntu/.local/lib/python3.8/site-packages/ray/_raylet.so: undefined symbol: _Py_CheckRecursionLimit
```

@dongreenberg

dongreenberg commented Feb 27, 2023

Ok that's a new one - notebooks are funny, I think something is sticking in memory. Would it be possible to restart the notebook kernel, run from the top, and run gpu.restart_grpc_server(restart_ray=True) after defining the gpu object? (Also, I'd just try running normally through langchain, not through llm.client with stream logs.)

@dcavadia


Yes, it's been funny. I even tried with a new instance; this is the code so far:

```python
from langchain.llms import SelfHostedPipeline, SelfHostedHuggingFaceLLM
from langchain import PromptTemplate, LLMChain
import runhouse as rh

gpu = rh.cluster(name="rh-a10", instance_type="A100:1").save()
gpu.restart_grpc_server(restart_ray=True)

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
llm = SelfHostedHuggingFaceLLM(model_id="gpt2", hardware=gpu, model_reqs=["./", "transformers", "torch", "langchain"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
question = "What NFL team won the Super Bowl in the year Justin Beiber was born?"
llm_chain.run(question)
```

I'm trying to run this on my virtual machine instead, but I'm still setting that up while investigating this issue within the notebook.

@dongreenberg

Hm, that code makes sense, it won't run?

@dcavadia

> Hm, that code makes sense, it won't run?

It just gets hung at Running _generate_text via gRPC... Is it normal to see no resources when going to the https://api.run.house/ dashboard?

@dongreenberg

If you've logged into runhouse (i.e. you've run runhouse login and have your API token saved in the ~/.rh/config.yaml), calling .save() on any resource should save it in the resource naming system and it should show up in the dashboard. I see that you have no resources saved on my side as well. If you're not logged in, your resource metadata should be saving to an rh/ directory inside your working directory.

Would you mind confirming on the server whether the RPC hang is the Ray resource insufficiency again? If so, I'll raise it with the Ray team, because it looks like a bug.
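
For reference, the login-and-save flow is roughly this (a sketch; the cluster spec is just the one from your snippet):

```python
# Sketch, assuming `runhouse login` has already stored your token in ~/.rh/config.yaml.
# Saving the cluster should then make it show up in the dashboard; without a login,
# the metadata is written to an rh/ directory in your working directory instead.
import runhouse as rh

gpu = rh.cluster(name="rh-a10", instance_type="A10:1")
gpu.save()
```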

@dcavadia

> If you've logged into runhouse (i.e. you've run runhouse login and have your API token saved in the ~/.rh/config.yaml), calling .save() on any resource should save it in the resource naming system and it should show up in the dashboard. I see that you have no resources saved on my side as well. If you're not logged in, your resource metadata should be saving to an rh/ directory inside your working directory.
>
> Would you mind confirming on the server whether the RPC hang is the Ray resource insufficiency again? If so, I'll raise it with the Ray team, because it looks like a bug.

Oh I see. And yes, the resources problem doesn't seem to appear anymore. These are the logs from the server:

```
INFO | 2023-02-27 22:59:07,628 | Reloaded module langchain.llms.self_hosted_hugging_face
(raylet) [2023-02-27 22:59:41,974 E 69555 69555] (raylet) worker_pool.cc:502: Some workers of the worker process(69854) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet) Traceback (most recent call last):
(raylet)   File "/home/ubuntu/miniconda3/lib/python3.10/site-packages/ray/_private/workers/default_worker.py", line 8, in <module>
(raylet)     import ray
(raylet)   File "/home/ubuntu/ubuntu/.local/lib/python3.8/site-packages/ray/__init__.py", line 101, in <module>
(raylet)     _configure_system()
(raylet)   File "/home/ubuntu/ubuntu/.local/lib/python3.8/site-packages/ray/__init__.py", line 98, in _configure_system
(raylet)     CDLL(so_path, ctypes.RTLD_GLOBAL)
(raylet)   File "/home/ubuntu/miniconda3/lib/python3.10/ctypes/__init__.py", line 374, in __init__
(raylet)     self._handle = _dlopen(self._name, mode)
(raylet) OSError: /home/ubuntu/ubuntu/.local/lib/python3.8/site-packages/ray/_raylet.so: undefined symbol: _Py_CheckRecursionLimit
```

And this is the ray status:

```
======== Autoscaler status: 2023-02-27 22:57:07.933530 ========
Node status
---------------------------------------------------------------
Healthy:
 1 node_0ba826e055591e93b1eedf2ca00b44c0c8e2ac28fa7b77053bca62f9
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 9.999999999976694e-05/30.0 CPU
 9.999999999998899e-05/1.0 GPU
 0.0/1.0 accelerator_type:A10
 0.00/127.547 GiB memory
 0.00/58.654 GiB object_store_memory

Demands:
 {'CPU': 0.0001, 'GPU': 0.0001}: 1+ pending tasks/actor
```

@dongreenberg

Perfect, thank you. I'll report this to Ray; it looks like a bug. The requested resources are clearly less than the available resources, so I'm not sure why Ray is blocking. I've run your code and it worked for me (also on Lambda):

[screenshot of successful run]

@dcavadia

dcavadia commented Mar 1, 2023

> Perfect, thank you. I'll report this to Ray; it looks like a bug. The requested resources are clearly less than the available resources, so I'm not sure why Ray is blocking. I've run your code and it worked for me (also on Lambda): [screenshot]

Mhm, can you make sure you are setting the exact same requirements as me?

```
pip install runhouse
pip install langchain
pip install git+https://github.com/skypilot-org/skypilot
pip install -U pyOpenSSL
mkdir -p ~/.lambda_cloud
echo "api_key = <your_api_key_here>" > ~/.lambda_cloud/lambda_keys
```

@dcavadia

dcavadia commented Mar 2, 2023

> Perfect, thank you. I'll report this to Ray; it looks like a bug. The requested resources are clearly less than the available resources, so I'm not sure why Ray is blocking. I've run your code and it worked for me (also on Lambda): [screenshot]

Let me know. I appreciate all the help so far.

@dongreenberg

dongreenberg commented Mar 3, 2023

Thanks for your patience and sorry for the delay. I filed the issue above with Ray. While filing it I noticed that your traceback shows both Python 3.10 (miniconda) and Python 3.8 (your user ~/.local install), and is probably calling different Ray versions through different layers. Do you know why that would be?
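
If it helps, here's a quick way to check (just a diagnostic sketch of mine, run in whichever environment the server uses):

```python
# Diagnostic sketch: print which interpreter and Ray installation are actually
# in use, to spot a miniconda (3.10) vs ~/.local (3.8) mismatch.
import sys

import ray

print("interpreter:", sys.executable)
print("python version:", sys.version)
print("ray version:", ray.__version__)
print("ray location:", ray.__file__)
```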

@dcavadia

dcavadia commented Mar 3, 2023

> Thanks for your patience and sorry for the delay. I filed the issue above with Ray. While filing it I noticed that your traceback shows both Python 3.10 (miniconda) and Python 3.8 (your user ~/.local install), and is probably calling different Ray versions through different layers. Do you know why that would be?

Interesting, I didn't notice that. I'm not sure why that would happen, but I'll dig into it right now. On the other hand, can you confirm you used these same libraries/requirements on your Lambda instance as me?

```
pip install runhouse
pip install langchain
pip install git+https://github.com/skypilot-org/skypilot
pip install -U pyOpenSSL
mkdir -p ~/.lambda_cloud
echo "api_key = <your_api_key_here>" > ~/.lambda_cloud/lambda_keys
```

Thanks
