[Feature Request] Allow users to connect pr_agent to an existing SageMaker inference endpoint #609
Comments
@krrishdholakia do you think this request is feasible?
Hey @mattiaciollaro @mrT23, we already support SageMaker — https://docs.litellm.ai/docs/providers/aws_sagemaker. What am I missing?
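For reference, a minimal sketch of how LiteLLM's SageMaker provider is typically invoked per the docs linked above: the model string is the endpoint name prefixed with `sagemaker/`. The endpoint name here is a placeholder, and the import is done lazily so the helper works without litellm installed (this is an illustration, not pr-agent's actual wiring):

```python
def sagemaker_model_string(endpoint_name: str) -> str:
    # LiteLLM routes to SageMaker when the model string has this prefix.
    return f"sagemaker/{endpoint_name}"

def ask(endpoint_name: str, prompt: str):
    # Lazy import: requires litellm installed and AWS credentials configured.
    from litellm import completion
    return completion(
        model=sagemaker_model_string(endpoint_name),
        messages=[{"role": "user", "content": prompt}],
    )
```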
@mattiaciollaro is this PR still relevant?
Sorry for the delay guys.
https://docs.litellm.ai/docs/providers/aws_sagemaker seems to support SageMaker JumpStart models specifically. I am thinking of a different situation, where a model is already deployed via SageMaker and a reference to the inference endpoint name is available (as in here). In that case, how can we instruct pr-agent to leverage the LLM behind that pre-existing endpoint? I am not sure I see a way of doing this via https://docs.litellm.ai/docs/providers/aws_sagemaker.

In the context of a POC with my team, the way we accomplished this was to hack pr-agent's default AI handler (which is the LiteLLM AI handler) and use the sagemaker SDK (specifically, the HF predictor) to make requests to the pre-existing SageMaker endpoint. I imagine a cleaner solution would be to implement a dedicated AI handler for this use case?
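The hack described above might look roughly like this sketch: flatten the chat messages into the `{"inputs": ..., "parameters": {...}}` payload many HF text-generation containers expect, then send it through the sagemaker SDK's `HuggingFacePredictor`. The prompt format and parameter names are assumptions about the deployed container, not pr-agent's actual handler code:

```python
def build_payload(messages, max_new_tokens=512, temperature=0.2):
    """Flatten chat messages into a single prompt string and wrap it in the
    payload shape commonly accepted by HF text-generation endpoints."""
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }

def call_endpoint(endpoint_name, messages):
    # Lazy import: requires the sagemaker SDK and AWS credentials; the
    # payload helper above stays usable without them.
    from sagemaker.huggingface.model import HuggingFacePredictor
    predictor = HuggingFacePredictor(endpoint_name=endpoint_name)
    return predictor.predict(build_payload(messages))
```

A dedicated AI handler would essentially wrap `call_endpoint` behind pr-agent's AI handler interface instead of patching the LiteLLM handler in place.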
I don't have a PR out for this, but yes: I think the feature request is still relevant :) My apologies again for the delay!
Context: assume a user has a pre-configured LLM inference endpoint in SageMaker (for example, a self-hosted Llama model as described here). It would be nice to allow the user to configure pr-agent to leverage that endpoint, e.g. by means of a dedicated AI handler.
Discord chat: https://discord.com/channels/1057273017547378788/1057273018084237344/1197261978591309884
cc: @krrishdholakia