flexchar/tiktoken-counter

Tiktoken Counter

Tiktoken Counter is an HTTP API that counts the tokens in a piece of text using the Tiktoken library. The API is built with Flask and served with Gunicorn. The Docker image is available at ghcr.io/flexchar/tiktoken-counter.

Purpose

This API lets developers count the tokens in a given text from another service or programming language. That is particularly useful when working with OpenAI's GPT models, such as GPT-3.5 (ChatGPT) and GPT-4. It uses the Tiktoken library to count tokens for a specific encoding.

For example, I use it from a Laravel (PHP) codebase where I need to estimate token counts without calling any APIs outside the server. Running it in Docker makes it easy to add as a microservice.

Building the Docker Image

To build the Docker image, simply run the following command in your terminal:

make build

This command will build the Docker image with the tag tiktoken-counter.

Running the API

To run the API, use the following command:

make run

This command will start the API on port 8000.

Pushing the Docker Image

To push the Docker image to GitHub Container Registry, use the following command:

make push

This command will tag the image with your GitHub username and push it to the registry.

Example Usage

Here is an example of how to use the API for counting tokens:

curl -X POST -H "Content-Type: application/json" -d '{"text": "Hello, World!", "encoding": "cl100k_base"}' http://localhost:8000/count

This request will return the number of tokens in the given text.

{
    "tokens": 5
}

In this example, the text parameter contains the text for which you want to count tokens, and the encoding parameter specifies the encoding used for tokenization. The default encoding is cl100k_base.
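The same request can be made from application code. Below is a minimal Python sketch using only the standard library. The helper names build_count_request and count_tokens, the BASE_URL constant, and the 5-second timeout are illustrative assumptions, not part of this project. The endpoint, JSON fields, and port are taken from the curl example above.

```python
import json
import urllib.request

# Assumed base URL: port 8000 comes from `make run`; localhost is a guess for local use.
BASE_URL = "http://localhost:8000"


def build_count_request(text, encoding="cl100k_base", base_url=BASE_URL):
    """Build the same POST /count request as the curl example (hypothetical helper)."""
    payload = json.dumps({"text": text, "encoding": encoding}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/count",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def count_tokens(text, encoding="cl100k_base", base_url=BASE_URL, timeout=5.0):
    """Send the request and return the value of the "tokens" field (hypothetical helper)."""
    request = build_count_request(text, encoding, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["tokens"]


if __name__ == "__main__":
    # Requires the API to be running, for example via `make run`.
    print(count_tokens("Hello, World!"))
```

Keeping request construction separate from sending makes it possible to inspect or unit-test the payload without a running container.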