
🎅 I WISH LITELLM HAD... #361

Open
krrishdholakia opened this issue Sep 13, 2023 · 136 comments

Comments

@krrishdholakia
Contributor

krrishdholakia commented Sep 13, 2023

This is a ticket to track a wishlist of items you wish LiteLLM had.

COMMENT BELOW 👇

With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs

Respond with ❤️ to any request you would also like to see

P.S.: Come say hi 👋 on the Discord

@krrishdholakia krrishdholakia pinned this issue Sep 13, 2023
@krrishdholakia
Contributor Author

[LiteLLM Client] Add new models via UI

Thinking aloud, it seems intuitive that you'd be able to add new models or remap completion calls to different models via a UI. Unsure what the real problem is, though.

@krrishdholakia
Contributor Author

User / API Access Management

Different users have access to different models. It'd be helpful if there were a way to leverage the BudgetManager to gate access. E.g. GPT-4 is expensive: I don't want to expose it to my free users, but I do want my paid users to be able to use it.
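A minimal sketch of that kind of gating, done in application code before the request reaches LiteLLM. The tier names, the `ALLOWED_MODELS` mapping and `check_model_access` are all hypothetical; this does not use the BudgetManager.

```python
# Hypothetical tier -> allowed-models mapping; not a LiteLLM API.
ALLOWED_MODELS: dict[str, set[str]] = {
    "free": {"gpt-3.5-turbo"},
    "paid": {"gpt-3.5-turbo", "gpt-4"},
}


class ModelAccessError(Exception):
    """Raised when a user's tier is not allowed to call a model."""


def check_model_access(user_tier: str, model: str) -> None:
    """Raise ModelAccessError unless `model` is allowed for `user_tier`."""
    allowed = ALLOWED_MODELS.get(user_tier, set())
    if model not in allowed:
        raise ModelAccessError(f"tier {user_tier!r} cannot use model {model!r}")
```

Calling `check_model_access(user.tier, model)` right before `completion(...)` keeps GPT-4 away from free users; unknown tiers get no models at all.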

@krrishdholakia
Contributor Author

krrishdholakia commented Sep 13, 2023

cc: @yujonglee @WilliamEspegren @zakhar-kogan @ishaan-jaff @PhucTranThanh feel free to add any requests / ideas here.

@ishaan-jaff
Contributor

ishaan-jaff commented Sep 13, 2023

[Spend Dashboard] View analytics for spend per llm and per user

  • This lets me see which LLMs are the most expensive and which users are using LiteLLM heavily

@ishaan-jaff
Contributor

Auto select the best LLM for a given task

If it's a simple task like responding to "hello", LiteLLM should auto-select a cheaper, faster LLM like j2-light
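As a rough illustration only, the simplest form of this is a heuristic picked before calling `completion`. The model names, the `pick_model` helper and the word-count threshold are hypothetical; word count is a crude stand-in for task difficulty.

```python
# Hypothetical model names, purely for illustration.
CHEAP_MODEL = "j2-light"
STRONG_MODEL = "gpt-4"


def pick_model(prompt: str, max_cheap_words: int = 20) -> str:
    """Route short, simple prompts to a cheaper model.

    Word count is only a crude proxy for difficulty; a real router
    would need a better signal (e.g. a classifier).
    """
    if len(prompt.split()) <= max_cheap_words:
        return CHEAP_MODEL
    return STRONG_MODEL
```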

@Pipboyguy

Integration with NLP Cloud

@krrishdholakia
Copy link
Contributor Author

That's awesome @Pipboyguy - DM'ing you on LinkedIn to learn more!

@krrishdholakia krrishdholakia changed the title LiteLLM Wishlist 🎅 I WISH LITELLM ADDED... Sep 14, 2023
@krrishdholakia krrishdholakia changed the title 🎅 I WISH LITELLM ADDED... 🎅 I WISH LITELLM HAD... Sep 14, 2023
@krrishdholakia
Contributor Author

krrishdholakia commented Sep 14, 2023

@ishaan-jaff check out this truncate param in the Cohere API

This looks super interesting, and similar to your token trimmer: if the prompt exceeds the context window, it gets trimmed in a particular manner.

I would maybe only run trimming on user/assistant messages. Not touch the system prompt (works for RAG scenarios as well).
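A minimal sketch of that policy: drop the oldest user/assistant messages until the history fits, never touching system messages. `trim_history` and the whitespace token counter are hypothetical stand-ins (a real version would use the model's tokenizer), and this is not how LiteLLM's token trimmer is implemented.

```python
def count_tokens(text: str) -> int:
    # Crude whitespace approximation; swap in a real tokenizer.
    return len(text.split())


def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest user/assistant messages until the total fits.

    System messages are never removed, and the most recent message is
    always kept, even if it alone exceeds the budget.
    """
    trimmed = list(messages)

    def total() -> int:
        return sum(count_tokens(m["content"]) for m in trimmed)

    while total() > max_tokens:
        # Oldest non-system message, excluding the last one.
        idx = next(
            (i for i, m in enumerate(trimmed[:-1]) if m["role"] != "system"),
            None,
        )
        if idx is None:
            break  # only system messages and the latest message remain
        del trimmed[idx]
    return trimmed
```

Keeping the system prompt intact is what makes this safe for RAG setups where retrieved context lives there.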

@haseeb-heaven
Contributor

Option to use Inference API so we can use any model from Hugging Face 🤗

@krrishdholakia
Contributor Author

krrishdholakia commented Sep 17, 2023

@haseeb-heaven you can already do this. LiteLLM calls the Hugging Face Inference API at https://api-inference.huggingface.co/models/{model}:

from litellm import completion

response = completion(model="huggingface/gpt2", messages=[{"role": "user", "content": "Hey, how's it going?"}])
print(response)

@haseeb-heaven
Contributor


Wow, great, thanks! It's working. Nice feature.

@smig23

smig23 commented Sep 18, 2023

Support for inferencing using models hosted on Petals swarms (https://github.com/bigscience-workshop/petals), both public and private.

@ishaan-jaff
Contributor

@smig23 what are you trying to use Petals for? We found it to be quite unstable, and it would not consistently pass our tests.

@shauryr
Copy link
Contributor

shauryr commented Sep 18, 2023

A fine-tuning wrapper for OpenAI, Hugging Face, etc.

@krrishdholakia
Contributor Author

@shauryr I created an issue to track this - feel free to add any missing details here

@smig23

smig23 commented Sep 18, 2023


Specifically for my aims: I'm running a private swarm as an experiment, with a view to implementing it within a private organization that has idle, but distributed, GPU resources. The initial target would be inference, and if LiteLLM could be the abstraction layer, it would give us the flexibility to go in another direction with hosting in the future.

@ranjancse26

I wish LiteLLM had direct support for fine-tuning models. Based on the links below, I understand that to fine-tune, one needs a specific understanding of the LLM provider and must follow its instructions or library for fine-tuning. Why not have LiteLLM provide the abstraction and handle the fine-tuning aspects as well?

https://docs.litellm.ai/docs/tutorials/finetuned_chat_gpt
https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset

@ranjancse26

I wish LiteLLM had support for open-source embeddings like sentence-transformers, hkunlp/instructor-large, etc.

Based on the documentation below, it seems only OpenAI embeddings are supported.

https://docs.litellm.ai/docs/embedding/supported_embedding

@ranjancse26

I wish LiteLLM had an integration with the Cerebrium platform. Please check the link below for the prebuilt models.

https://docs.cerebrium.ai/cerebrium/prebuilt-models

@ishaan-jaff
Contributor

@ranjancse26 which models on Cerebrium do you want to use with LiteLLM?

@ranjancse26

@ishaan-jaff Cerebrium has a lot of prebuilt models. The focus should first be on consuming the open-source models, e.g. Llama 2, GPT4All, Falcon, FlanT5, etc. I am mentioning this as a first step. Beyond that, it would be good for LiteLLM to also handle communication with custom-built models, based on the API that Cerebrium exposes.


@ishaan-jaff
Contributor

@smig23 We've added support for Petals to LiteLLM: https://docs.litellm.ai/docs/providers/petals

@ranjancse26

I wish LiteLLM had built-in support for the majority of provider operations rather than targeting text generation alone. For example, Cohere's chat endpoint below lets users have conversations with a Large Language Model (LLM) from Cohere.

https://docs.cohere.com/reference/post_chat

@ranjancse26

I wish LiteLLM had plenty of support and examples for developing apps with the RAG pattern. Following standard best practices is essential, and we would all like to have that support.

@ranjancse26

I wish LiteLLM had use-case-driven examples for beginners. With day-to-day use cases in mind, it would be good to have a sample that covers the following:

  • Text classification
  • Text summarization
  • Text translation
  • Text generation
  • Code generation

@ranjancse26

I wish LiteLLM supported various popular vector DBs. Here are a few to begin with:

  • Pinecone
  • Qdrant
  • Weaviate
  • Milvus
  • DuckDB
  • Sqlite

@ranjancse26

ranjancse26 commented Sep 21, 2023

I wish LiteLLM had built-in support for web scraping or fetching real-time data through a known provider like SerpApi. It would help users build custom AI models or integrate with LLMs for retrieval-augmented generation.

https://serpapi.com/blog/llms-vs-serpapi/#serpapi-google-local-results-parser
https://colab.research.google.com/drive/1Q9VvVzjZJja7_y2Ls8qBkE_NApbLiqly?usp=sharing

@bsu3338

bsu3338 commented Feb 11, 2024

Please add the redisvl module to requirements.txt for semantic Redis caching, so I do not have to build a custom Docker container. Thank you, and thanks for adding the feature! I just noticed in the commit history that it was added and then removed. Will it be coming back?

@nivibilla

sglang support pretty please!

@ranjancse26

It would be great if you could add support for Groq. Groq provides an OpenAI-compatible interface.

@hlohaus

hlohaus commented Feb 26, 2024

Is support for the g4f package planned?
If wanted, I can create a pull request.

@s-jse

s-jse commented Mar 23, 2024

I wish LiteLLM would show a progress bar for batch_completion(). It is nice to have when working with large batch jobs.
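Until something like this exists, a caller-side workaround is to fan the requests out yourself and report progress as each one finishes. This is a hypothetical helper, not how `batch_completion()` is implemented; `fn` would be something like a function that calls `completion(model=..., messages=item)` for one conversation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable


def run_with_progress(
    fn: Callable[[Any], Any], items: list[Any], max_workers: int = 8
) -> list[Any]:
    """Run fn over items concurrently, printing progress as each finishes.

    Results are returned in input order, not completion order.
    """
    results: list[Any] = [None] * len(items)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn, item): i for i, item in enumerate(items)}
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            print(f"\r{done}/{len(items)} completed", end="", flush=True)
    print()
    return results
```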

@rlippmann

rlippmann commented Mar 24, 2024

Not sure if this is already implemented, but...

Proactive routing: instead of routing, failing, and falling back, keep track of each model's max tokens so the router can tell beforehand whether the inference will fail anyway.

Also, perhaps a maximum number of requests that can be sent to an endpoint simultaneously. That way it could round-robin across idle endpoints instead of overloading one endpoint and then failing over.
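One way the per-endpoint cap could work, sketched as a hypothetical `EndpointPool` (not LiteLLM's router): round-robin over endpoints, skip any that are at `max_parallel` in-flight requests, and report "all busy" instead of overloading one.

```python
import itertools
import threading
from typing import Optional


class EndpointPool:
    """Round-robin over endpoints, skipping any at their concurrency cap."""

    def __init__(self, endpoints: list[str], max_parallel: int) -> None:
        self._endpoints = endpoints
        self._max_parallel = max_parallel  # per-endpoint in-flight limit
        self._in_flight = {e: 0 for e in endpoints}
        self._cycle = itertools.cycle(endpoints)
        self._lock = threading.Lock()

    def acquire(self) -> Optional[str]:
        """Return an endpoint with spare capacity, or None if all are full."""
        with self._lock:
            for _ in range(len(self._endpoints)):
                endpoint = next(self._cycle)
                if self._in_flight[endpoint] < self._max_parallel:
                    self._in_flight[endpoint] += 1
                    return endpoint
            return None

    def release(self, endpoint: str) -> None:
        """Mark one request on `endpoint` as finished."""
        with self._lock:
            self._in_flight[endpoint] -= 1
```

A caller would `acquire()` before sending, `release()` in a `finally`, and queue or back off when `acquire()` returns `None`.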

@andaldanar

andaldanar commented Mar 25, 2024

I wish LiteLLM could support Cohere's Rerank API endpoint - thank you!

https://docs.cohere.com/docs/reranking

@krrishdholakia
Contributor Author

@rlippmann

pre-call checks for max tokens is live - https://docs.litellm.ai/docs/routing#pre-call-checks-context-window

Max parallel requests: can you explain how this might work? Do you want to set a maximum number of parallel requests per endpoint?

@K-J-VV

K-J-VV commented Mar 27, 2024

Any plans to add Private-GPT's API? https://github.com/zylon-ai/private-gpt

@nileshtrivedi

I wish LiteLLM had a client library for Elixir, removing the need for me to run a separate proxy server.

@RobertLiu0905

I wish LiteLLM had a simple serverless option, since some proxy services are not used continuously.

@ishaan-jaff
Contributor

@RobertLiu0905 Cloudflare Python Workers are here, and we have an active issue to get LiteLLM supported on Cloudflare Workers: cloudflare/workerd#1943

Is this what you wanted? Open to suggestions on other approaches.

cc @TranquilMarmot

@zuberahmed1987

I wish LiteLLM had support for IBM watsonx.ai.
https://ibm.github.io/watsonx-ai-python-sdk/fm_model.html

Thanks

@nbaav1

nbaav1 commented Apr 17, 2024

I wish the LiteLLM Proxy server had a config setting for proxy_base_url, e.g. hosting the server at
http://0.0.0.0:4000/<proxy_base_url>, such as http://0.0.0.0:4000/abc/xyz.
That way I could do something like:
litellm --model gpt-3.5-turbo --proxy_base_url abc/xyz
And then:

import openai
client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000/abc/xyz"
)

response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

This would simplify our infrastructure in AWS and still comply with company policies.
Thanks!

@twardoch

WISH: Expand batching

The Google AI Studio API for Gemini Pro 1.5 has very harsh RPM and TPM restrictions (https://ai.google.dev/pricing), but you get a free tier, or a paid LLM API at $7 input + $21 output per 1M tokens (1M context, 8k output).

The NEW OpenAI Batch API is 50% cheaper than the normal API, so GPT-4 Turbo costs $5 input + $15 output per 1M tokens (128k context, 4k output). However, it schedules processing, runs it asynchronously on their end, and delivers results "later".

https://help.openai.com/en/articles/9197833-batch-api-faq

It would be great to have an OpenAI-compatible Batch API abstraction that uses OpenAI's Batch API directly for OpenAI models, but for other models uses local batching, pooling, RPM/TPM limiting, etc., and works in a similar way.

I imagine that other API providers may follow suit with their native, cheaper batch API, so an abstraction would be highly desirable.

I know LiteLLM has its own batching already (which is slightly different in concept), so my request might be an extension to that.

Why?

Well, many of us have use cases for MASS LLM processing: translation, summarization, rewriting (like coreference resolution, NER, etc.). We don't need "ASAP async" for those, but cheaper is always better 😃
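For providers without a native batch endpoint, the "local batching with RPM limiting" half of this wish could look roughly like the sketch below. `run_batch` is hypothetical, not a LiteLLM API: `call` is any async function (for example one wrapping `litellm.acompletion`), and the sketch only paces request starts to stay under an RPM budget with a concurrency cap. TPM limiting and OpenAI's native Batch API are not covered.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


async def run_batch(
    call: Callable[[Any], Awaitable[Any]],
    requests: list[Any],
    rpm: int,
    concurrency: int = 4,
) -> list[Any]:
    """Process requests locally, starting at most `rpm` per 60 seconds."""
    interval = 60.0 / rpm  # minimum spacing between request starts
    semaphore = asyncio.Semaphore(concurrency)  # cap on in-flight requests
    lock = asyncio.Lock()
    next_start = time.monotonic()

    async def worker(req: Any) -> Any:
        nonlocal next_start
        async with semaphore:
            async with lock:
                # Reserve the next free start slot, then wait for it.
                now = time.monotonic()
                start = max(now, next_start)
                next_start = start + interval
            await asyncio.sleep(start - now)
            return await call(req)

    # gather preserves input order in its results.
    return await asyncio.gather(*(worker(r) for r in requests))
```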

@motin

motin commented Apr 24, 2024

I wish it were possible to specify which callbacks LiteLLM uses on a per-request basis (i.e. without modifying global state).
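A minimal caller-side sketch of the idea: pass callbacks with the call instead of registering them globally. `call_with_callbacks` and the `(kwargs, result)` callback shape are hypothetical and not LiteLLM's callback interface; `fn` could be `litellm.completion`.

```python
from typing import Any, Callable, Optional

# Hypothetical callback shape: receives the request kwargs and the result.
Callback = Callable[[dict, Any], None]


def call_with_callbacks(
    fn: Callable[..., Any],
    callbacks: Optional[list[Callback]] = None,
    **kwargs: Any,
) -> Any:
    """Call fn(**kwargs), then run only this request's callbacks.

    Nothing global is modified, so concurrent requests can use
    different callbacks.
    """
    result = fn(**kwargs)
    for cb in callbacks or []:
        cb(kwargs, result)
    return result
```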

@andersskog

I wish the LiteLLM logger supported JSON logging, with a more succinct message and the longer strings moved into extra fields. Logs of requests to LLM providers are especially long and unformatted.
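For reference, the shape being asked for can be produced with the standard library alone: a short `message` plus structured fields, one JSON object per line. This is a generic `logging` sketch, not LiteLLM's logger; the `fields` convention and the `llm_requests` logger name are assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per record, with long payloads in extra fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via extra={"fields": {...}} are merged in as-is.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, default=str)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("llm_requests")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("request sent", extra={"fields": {"model": "gpt-3.5-turbo"}})
```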

@andersskog

I wish LiteLLM would implement stronger typing for methods.

As an example, when I call:

response = await litellm.acompletion(stream=True, **kwargs)

I need to do the following assertions:

assert isinstance(response, litellm.CustomStreamWrapper)
async for chunk in response:
    assert isinstance(chunk, litellm.ModelResponse)
    assert isinstance(chunk.choices[0], litellm.utils.StreamingChoices)

since I'm working in a typed codebase enforced with pyright.
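Until the signatures themselves are typed more precisely (for example with `typing.overload` on `stream: Literal[True]` vs `Literal[False]`), a small user-side helper can replace the scattered asserts. `expect` is hypothetical; pyright narrows its return type to the class passed in.

```python
from typing import Any, Type, TypeVar

T = TypeVar("T")


def expect(obj: Any, cls: Type[T]) -> T:
    """Return obj typed as cls, raising TypeError if it isn't one.

    Pyright infers the return type as `cls`, so call sites need no asserts.
    """
    if not isinstance(obj, cls):
        raise TypeError(f"expected {cls.__name__}, got {type(obj).__name__}")
    return obj
```

Usage would look like `stream = expect(response, litellm.CustomStreamWrapper)`, with the check still enforced at runtime.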

@krrishdholakia
Contributor Author

Hey @andersskog just pushed the v1 for json logging - b46db8b

You can enable it with litellm.json_logs = True. It currently just logs the raw request sent by litellm. Open to feedback on this.

@zhaoninge

zhaoninge commented May 3, 2024

I wish litellm had an API to check available models from providers in real time.

@QwertyJack

I wish LiteLLM had support for Sambaverse.
https://docs.sambanova.ai/sambaverse/latest/index.html

Thanks

@horahoradev

Discord alerting would be nice

@ggallotti

Wildcard for the model_name property in model_list:

model_list:
  - model_name: "vertex_ai/*"
    litellm_params:
      model: "vertex_ai/*"
      vertex_project: os.environ/VERTEXAI_PROJECT
  - model_name: "anthropic/*"
    litellm_params:
      model: "anthropic/*"
      api_key: os.environ/ANTHROPIC_API_KEY      
  - model_name: "gemini/*"
    litellm_params:
      model: "gemini/*"
      api_key: os.environ/GEMINI_API_KEY
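To make the requested semantics concrete, here is a hypothetical resolution rule for such a config (not how the LiteLLM proxy implements it): the first `model_name` pattern that matches the requested model wins, and its `*` in `litellm_params.model` is filled with the requested suffix. API keys are omitted from the sketch.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Mirrors the model_list above (API keys omitted).
MODEL_LIST = [
    {"model_name": "vertex_ai/*", "litellm_params": {"model": "vertex_ai/*"}},
    {"model_name": "anthropic/*", "litellm_params": {"model": "anthropic/*"}},
    {"model_name": "gemini/*", "litellm_params": {"model": "gemini/*"}},
]


def resolve(requested: str) -> Optional[dict]:
    """Return litellm_params for the first pattern matching `requested`."""
    for entry in MODEL_LIST:
        pattern = entry["model_name"]
        if fnmatchcase(requested, pattern):
            params = dict(entry["litellm_params"])
            # Replace the target's "*" with whatever the pattern's "*" matched.
            suffix = requested[len(pattern.split("*", 1)[0]):]
            params["model"] = params["model"].replace("*", suffix, 1)
            return params
    return None
```

Because each entry carries its own `litellm_params`, each provider keeps its own key, which is the gap described for the OpenAI-only wildcard.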

@krrishdholakia
Contributor Author

@ggallotti would that be similar to how we do it for openai today -

https://docs.litellm.ai/docs/providers/openai#2-start-the-proxy

@ggallotti


Thanks for the response.
But that configuration does not work, as it forces the OpenAI API key for other models.

@ducnvu

ducnvu commented May 17, 2024

A streamlined way to call vision and non-vision models would be great. Being LLM-agnostic is a big reason why I use the package, but currently I still have to handle different request formats depending on which model the request goes to.

For example, when calling GPT-4 Vision, messages.content is an array. Using the same code to call Azure's Command R+ results in:

litellm.exceptions.APIError: OpenAIException - Error code: 400 - {'message': 'invalid type: parameter messages.content is of type array but should be of type string.'}

I'm aware this is on the model provider's side, but GPT's non-vision models, for example, support both formats.
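A client-side workaround in the meantime is to flatten text-only content arrays into plain strings before calling a text-only model. `flatten_content` is a hypothetical helper, not a LiteLLM function, and joining parts with a newline is an arbitrary choice.

```python
from typing import Any


def flatten_content(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Convert text-only content arrays into plain strings.

    Messages with non-text parts (e.g. images) are left unchanged, since a
    text-only model cannot accept them anyway.
    """
    out = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list) and all(
            isinstance(p, dict) and p.get("type") == "text" for p in content
        ):
            # Copy the message rather than mutating the caller's dict.
            msg = {**msg, "content": "\n".join(p["text"] for p in content)}
        out.append(msg)
    return out
```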

@krrishdholakia
Contributor Author

@ducnvu seems like something we need to fix - can you share the command r call?

@ducnvu

ducnvu commented May 17, 2024

@krrishdholakia Thanks for the prompt response. The call is something like the one below. I don't have access to all the models LiteLLM supports to test, but so far OpenAI models work with both a string messages.content and the format below; Command R is where I first encountered this error. All my calls go through Azure.

import litellm

# BASE and KEY are placeholders for the Azure endpoint and API key.
params = {'temperature': 0.7, 'n': 1, 'presence_penalty': 0, 'frequency_penalty': 0, 'messages': [{'role': 'system', 'content': [{'type': 'text', 'text': "You are Command R Plus, answer as concisely as possible (e.g. don't be verbose). When writing code, specify the language as per the markdown format."}]}, {'role': 'user', 'content': [{'type': 'text', 'text': 'hi'}]}], 'timeout': 600, 'stream': True, 'model': 'azure/command-r-plus', 'api_base': BASE, 'api_key': KEY}

response = await litellm.acompletion(**params)
