NVIDIA Triton Inference Server using the Python backend with the Hugging Face Transformers library.
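For reference, a Python-backend model wrapping a Transformers pipeline looks roughly like the following minimal sketch. The input name `prompt` matches the request example below; the checkpoint (`dslim/bert-base-NER`) and the output tensor name are assumptions for illustration, not necessarily what this repository uses:

```python
# model.py -- minimal sketch of a Triton Python-backend model wrapping a
# Hugging Face NER pipeline. Checkpoint and output tensor name are assumed.
import json

import numpy as np
import triton_python_backend_utils as pb_utils
from transformers import pipeline


class TritonPythonModel:
    def initialize(self, args):
        # Load the pipeline once when Triton loads the model.
        self.ner = pipeline(
            "token-classification",
            model="dslim/bert-base-NER",  # assumed checkpoint
            aggregation_strategy="simple",
        )

    def execute(self, requests):
        responses = []
        for request in requests:
            # "prompt" is the BYTES input tensor from the curl example below.
            prompts = pb_utils.get_input_tensor_by_name(request, "prompt").as_numpy()
            texts = [p.decode("utf-8") for p in prompts.flatten()]
            # Serialize the entity predictions to JSON strings.
            results = [json.dumps(self.ner(t), default=str) for t in texts]
            out = pb_utils.Tensor("output", np.array(results, dtype=object))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```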
Example request using curl:
```bash
curl -X 'POST' \
  'http://0.0.0.0:8080/v2/models/ner-model/infer' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": [
      {
        "name": "prompt",
        "shape": [1],
        "datatype": "BYTES",
        "data": ["My name is Wolfgang and I live in Berlin"]
      }
    ]
  }'
```
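The same call can be made from Python; a minimal sketch using the `requests` library, with the endpoint and tensor layout copied from the curl example above:

```python
# Send an inference request via the KServe v2 HTTP API.
import requests

payload = {
    "inputs": [
        {
            "name": "prompt",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["My name is Wolfgang and I live in Berlin"],
        }
    ]
}
resp = requests.post("http://0.0.0.0:8080/v2/models/ner-model/infer", json=payload)
resp.raise_for_status()
print(resp.json())
```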
To serve the Triton ensemble model (transformers), the model first has to be exported to the ONNX format. Use the onnx-export.sh script, which relies on the optimum library.
The required libraries need to be installed via pip. Then run `bash onnx-export.sh`.
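Neither the dependency list nor the script itself is reproduced here; assuming the script drives the optimum ONNX exporter, the step might look like this sketch (checkpoint and output path are hypothetical):

```bash
# Hypothetical sketch of onnx-export.sh; the real script may differ.
pip install "optimum[exporters]"   # assumed dependency set

# Export the checkpoint to ONNX inside the Triton model repository.
optimum-cli export onnx \
  --model dslim/bert-base-NER \
  --task token-classification \
  model_repository/model/1/
```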
Example request for the transformers ensemble:
```bash
curl -X 'POST' \
  'http://0.0.0.0:8080/v2/models/transformers/infer' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": [
      {
        "name": "text",
        "shape": [1],
        "datatype": "BYTES",
        "data": ["My name is Wolfgang and I live in Berlin"]
      }
    ]
  }'
```
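For context, a Triton ensemble typically chains a tokenizer, the exported ONNX model, and a postprocessing step inside the server, so the client only sends raw text. Below is a hypothetical config.pbtxt sketch of such an ensemble; apart from the ensemble name `transformers` and the input name `text`, every model and tensor name is an assumption:

```
# Hypothetical config.pbtxt for the "transformers" ensemble; names assumed.
name: "transformers"
platform: "ensemble"
max_batch_size: 0
input [
  {
    name: "text"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_STRING
    dims: [ 1 ]
  }
]
ensemble_scheduling {
  step [
    {
      # Tokenizer (Python backend): text -> input_ids / attention_mask
      model_name: "tokenizer"
      model_version: -1
      input_map  { key: "text"           value: "text" }
      output_map { key: "input_ids"      value: "input_ids" }
      output_map { key: "attention_mask" value: "attention_mask" }
    },
    {
      # Exported ONNX model: token IDs -> logits
      model_name: "model"
      model_version: -1
      input_map  { key: "input_ids"      value: "input_ids" }
      input_map  { key: "attention_mask" value: "attention_mask" }
      output_map { key: "logits"         value: "logits" }
    },
    {
      # Postprocessing (Python backend): logits -> NER entities as JSON
      model_name: "postprocess"
      model_version: -1
      input_map  { key: "logits"         value: "logits" }
      output_map { key: "entities"       value: "output" }
    }
  ]
}
```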