FastAPI, multiple models and emotions on an RTX 3090: stress-testing with more than 10 concurrent requests drives GPU utilization to 100% and freezes the inference program #1065
I don't dare run inference concurrently myself either; the processes seem to fight over VRAM and eventually error out. Is the concurrency support in this project mature yet? My own setup was a thread pool issuing API calls, but it throws errors. @baddogly how did you do it?
@BOCEAN-FENG Enable streaming with "streaming_mode": true. Each request takes up a share of VRAM, and the requests split the GPU's compute evenly between them.
Isn't streaming_mode just the flag that controls whether the response is streamed? I looked at the fast_inference branch and there happens to be a parameter for it there.
Could you explain how to modify the code to support multiple models and multiple emotions? Thanks 🙏
For multiple models you have to switch on the fly: if the current character is A, switch to A's model; for character B, switch to B's model. For multiple emotions you need to prepare a reference audio clip for each character's every emotion (happy, angry, sad, and so on).

In the code, have the two endpoints /set_gpt_weights and /set_sovits_weights load the weights into global variables, and have the inference code read from those same globals. @TC10127 @BOCEAN-FENG
Would you mind sharing the code? 🙏
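A minimal sketch of the approach described above: cache weights per character in a global registry so the /set_gpt_weights-style endpoints load once and inference reads from the same place, and keep one reference audio path per (character, emotion). The `ModelRegistry` class, `loader` callable, and the file paths are illustrative assumptions, not the project's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRegistry:
    """Caches loaded weights per character so repeated requests skip reloads."""
    loader: callable            # hypothetical: wraps the real weight-loading call
    _cache: dict = field(default_factory=dict)
    load_count: int = 0         # only here to observe cache behavior

    def get(self, character: str):
        # Load on first use, then serve from the global cache, as the
        # comment above suggests for /set_gpt_weights and /set_sovits_weights.
        if character not in self._cache:
            self._cache[character] = self.loader(character)
            self.load_count += 1
        return self._cache[character]


# One reference clip per (character, emotion), matching the advice above.
# The paths are hypothetical placeholders.
REF_AUDIO = {
    ("A", "happy"): "refs/A_happy.wav",
    ("A", "angry"): "refs/A_angry.wav",
}


def pick_ref_audio(character: str, emotion: str) -> str:
    return REF_AUDIO[(character, emotion)]
```

With this layout, switching characters is just a registry lookup; the expensive load happens only the first time a character is requested.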
GPUs are natively parallel. Spinning up multiple workers to get multi-instance concurrency just makes the instances fight over the GPU's compute, which actually worsens the average response time per request.
Stress-test concurrency: 20. After about 5 minutes, GPU utilization hit 100% and the program froze.
Inference parameters:
```json
{
  "text": "优化设置,降低游戏的图形设置或调整应用程序的性能参数,以减轻GPU的负担关闭不必要的后台程序",
  "prompt_lang": "zh",
  "top_k": 5,
  "top_p": 1,
  "temperature": 1,
  "text_split_method": "cut5",
  "batch_size": 1,
  "batch_threshold": 0.75,
  "split_bucket": true,
  "speed_factor": 1.0,
  "fragment_interval": 0.3,
  "seed": -1,
  "media_type": "wav",
  "streaming_mode": true,
  "parallel_infer": true,
  "repetition_penalty": 1.35
}
```
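For reference, the payload above could be sent as a streaming request roughly like this. The endpoint URL is an assumption about the local server setup, and the payload is copied verbatim from the issue (it may still be missing server-required fields such as a reference audio path):

```python
import json

# Payload copied from the stress test above.
payload = {
    "text": "优化设置,降低游戏的图形设置或调整应用程序的性能参数,以减轻GPU的负担关闭不必要的后台程序",
    "prompt_lang": "zh",
    "top_k": 5,
    "top_p": 1,
    "temperature": 1,
    "text_split_method": "cut5",
    "batch_size": 1,
    "batch_threshold": 0.75,
    "split_bucket": True,
    "speed_factor": 1.0,
    "fragment_interval": 0.3,
    "seed": -1,
    "media_type": "wav",
    "streaming_mode": True,
    "parallel_infer": True,
    "repetition_penalty": 1.35,
}


def stream_tts(url: str, payload: dict):
    """Yield audio chunks from a streaming TTS endpoint (URL is hypothetical)."""
    import requests  # third-party; imported lazily so the payload is usable without it

    with requests.post(url, json=payload, stream=True) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=4096):
            yield chunk
```

Because "streaming_mode" is true, consuming the response chunk by chunk lets playback start before synthesis finishes.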