Torch Batcher

Serve batched PyTorch inference requests using Redis; throughput scales linearly by increasing the number of workers per device and across devices.
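The repository's exact wire format isn't reproduced here, but the general pattern — clients enqueue requests onto a Redis list and a worker drains them in batches for a single forward pass — can be sketched as follows. The queue name, result-key prefix, pickled payload encoding, and the stand-in model are all illustrative assumptions, not the project's actual API.

    # Hypothetical batching worker; queue/key names and the model are
    # illustrative assumptions, not the repository's actual API.
    import pickle

    import redis
    import torch

    QUEUE_KEY = "torch_batcher:requests"      # assumed request queue
    RESULT_PREFIX = "torch_batcher:result:"   # assumed result key prefix
    MAX_BATCH = 32

    r = redis.Redis()
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.nn.Linear(128, 10).eval().to(device)  # stand-in model

    while True:
        # Block for the first request, then drain up to MAX_BATCH - 1
        # more without blocking.
        items = [r.blpop(QUEUE_KEY)[1]]
        while len(items) < MAX_BATCH:
            raw = r.lpop(QUEUE_KEY)
            if raw is None:
                break
            items.append(raw)

        ids, tensors = zip(*(pickle.loads(item) for item in items))
        batch = torch.stack(tensors).to(device)

        with torch.no_grad():
            out = model(batch).cpu()

        # Fan results back out, one key per request id.
        pipe = r.pipeline()
        for req_id, row in zip(ids, out):
            pipe.set(RESULT_PREFIX + req_id, pickle.dumps(row), ex=60)
        pipe.execute()

Draining the queue without blocking after the first item keeps latency low under light load while still forming full batches under heavy load.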

Dependencies

The commands below assume the following are installed:

  • PyTorch
  • Redis (with the redis Python client)
  • Supervisor

Usage

  • For linear scaling, start nvidia-cuda-mps-control; see Section 2.1.1 (GPU Utilization) of the NVIDIA MPS documentation for details.

    nvidia-cuda-mps-control -d # Start the MPS daemon
    
    # To exit MPS after stopping the server:
    nvidia-cuda-mps-control # Enters the MPS command prompt
    quit # Enter this command to quit
  • Start Redis

    redis-server --save "" --appendonly no # Disable persistence; the queue is ephemeral
  • Start Batch-Serving

# Start 3 workers on a single GPU
    supervisord -c supervisor.conf # Start 3 workers on a single GPU
  • Start the batch benchmark (a minimal client sketch follows this list)

    python3 bench_batched.py
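For reference, a client in the style of bench_batched.py could pair with the hypothetical worker sketched above like this; the key names and the pickled (request id, tensor) payload are the same assumptions as in the worker sketch:

    # Hypothetical client matching the worker sketch above.
    import pickle
    import time
    import uuid

    import redis
    import torch

    QUEUE_KEY = "torch_batcher:requests"
    RESULT_PREFIX = "torch_batcher:result:"

    r = redis.Redis()

    def infer(x, timeout=5.0):
        """Enqueue one tensor, then poll until its result key appears."""
        req_id = uuid.uuid4().hex
        r.rpush(QUEUE_KEY, pickle.dumps((req_id, x)))
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            raw = r.get(RESULT_PREFIX + req_id)
            if raw is not None:
                r.delete(RESULT_PREFIX + req_id)
                return pickle.loads(raw)
            time.sleep(0.001)
        raise TimeoutError(f"no result for request {req_id}")

    if __name__ == "__main__":
        print(infer(torch.randn(128)).shape)  # one result row from the worker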
