If a Piper HTTP server comes under heavy load, GPU memory usage can spike by multiple GB and remains high until the server is stopped. Requests can sometimes fail with OOM errors if memory usage climbs too far.
I'm not sure whether these are bugs or expected behavior:
- Why does memory usage remain permanently high instead of decreasing when the inference load decreases?
- Initial memory usage for a loaded model is around 500 MB, much larger than the low/medium-quality model file itself (about 50 MB).
To reproduce, run the HTTP server and send it a high number of requests per second:

```
python3 -m piper.http_server -m en_US-lessac-medium --cuda --port 6000
```
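For context on the growth-without-release behavior: Piper runs its voice models through ONNX Runtime, whose CUDA execution provider allocates GPU memory into an arena that by default grows under load and is not returned to the driver until the session is destroyed. A minimal sketch of capping that arena via provider options is below — note this is an illustrative workaround, not Piper's actual session setup, and the model filename is just the one from the repro command:

```python
# Hypothetical workaround sketch: cap ONNX Runtime's CUDA memory arena.
# These provider options are real ONNX Runtime CUDAExecutionProvider options;
# whether piper.http_server exposes a way to pass them is an open question.
cuda_provider_options = {
    # Hard cap on the CUDA memory arena, in bytes (here: 2 GB).
    "gpu_mem_limit": 2 * 1024 * 1024 * 1024,
    # Grow the arena only by the requested amount instead of doubling,
    # which reduces the size of load-induced spikes.
    "arena_extend_strategy": "kSameAsRequested",
}

providers = [
    ("CUDAExecutionProvider", cuda_provider_options),
    "CPUExecutionProvider",  # fallback if CUDA is unavailable
]

# With onnxruntime installed, the session would be created like:
# import onnxruntime as ort
# session = ort.InferenceSession("en_US-lessac-medium.onnx", providers=providers)
```

Even with a cap, memory held by the arena is only released when the `InferenceSession` is dropped, which would explain usage staying high after load subsides.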