
CLI serving tool #702

Draft · rlouf wants to merge 2 commits into main
Conversation

@rlouf (Member) commented Feb 22, 2024

In this PR I introduce an outlines command-line interface that allows users to serve JSON-structured generation workflows locally. A workflow consists of a prompt template, an LLM and a Pydantic model. The API's parameters are the prompt template's arguments, and it returns a JSON object that respects the JSON Schema implicitly defined by the Pydantic model.

The use of the CLI is as follows:

outlines serve [SCRIPT] [--port 8000] [--name fn]
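
For illustration, a request against the served workflow might look like the sketch below; the endpoint path and payload shape are assumptions, since the HTTP API is not yet specified in this PR:

import requests

# The served function is assumed to be exposed at a path derived from --name;
# the prompt template's arguments are passed as the JSON body.
response = requests.post(
    "http://localhost:8000/fn",
    json={"arg": "some input text"},
)
print(response.json())  # a JSON object matching the Pydantic model's schema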

I am still not sure whether serving should happen via llama.cpp or vLLM. The interface to define the API using Outlines is also not completely defined:

from pydantic import BaseModel
import outlines


# Proposed interface: a Function bundles a model, a prompt template and a schema.
fn = outlines.Function("mistralai/Mistral-7B-v0.1")

@fn.prompt
def prompt_template(arg):
    """{arg}"""

@fn.schema
class Schema(BaseModel):
    foo: int
    bar: str

It should also be possible to call the function defined this way from another script.
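
For example, calling it from another script might look like the following; the module name is illustrative, and it assumes the decorated fn becomes directly callable with the template's arguments:

from my_workflow import fn  # hypothetical module where fn is defined

# Calling fn would render the prompt, generate with the model, and return
# output constrained to Schema's JSON Schema.
result = fn(arg="some input text")
print(result)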

@wang-haoxian

Hello Rémi,
Nice work!

I just checked llama.cpp's JSON schema feature:
https://github.com/abetlen/llama-cpp-python?tab=readme-ov-file#json-schema-mode
It seems that llama.cpp is working on its own way of structuring output.
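
For reference, a rough sketch of llama-cpp-python's JSON schema mode from the linked README; the model path is a placeholder and the exact arguments may differ between versions:

from llama_cpp import Llama

llm = Llama(model_path="./mistral-7b-v0.1.Q4_K_M.gguf", chat_format="chatml")
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Return foo and bar as JSON."}],
    # Constrain the output to a JSON object with the same fields as the
    # Pydantic Schema above (foo: int, bar: str).
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {"foo": {"type": "integer"}, "bar": {"type": "string"}},
            "required": ["foo", "bar"],
        },
    },
)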

For the sake of efficiency, I am thinking of vLLM.
Here is an interesting benchmark that I found:
https://www.inferless.com/learn/exploring-llms-speed-benchmarks-independent-analysis---part-2

From the point of view of scalability, I think vLLM is a good choice because:

  • it's highly optimized for efficiency
  • we can leverage Triton to serve vLLM

I will be glad to contribute.
