After fresh install of Windows, roop-unleashed is running incredibly slow, and CPU is at 70% usage, GPU ~5%. #638

Open
2 of 9 tasks
Hunt3rseeker opened this issue Apr 26, 2024 · 9 comments

Comments

@Hunt3rseeker

Describe the bug
I've been using roop-unleashed for well over a month, and it's been working pretty well. I recently did a fresh install of Windows, and now, after installing everything, the WebUI is incredibly slow and takes many seconds to respond. It's as if the WebUI freezes for 10 seconds, works for maybe 3-4 seconds, then freezes again. As soon as I start the .bat file, my CPU goes from 2% usage to 70%, and from 38 to 70 °C. Basically, my computer is lifting some heavy shit, but I'm not sure why it acts this way. Something is clearly not right.

After I noticed that roop is not working as intended, I looked at the installation wiki to see if I might have misread something. The following part

CUDA (Nvidia)
Install CUDA Toolkit 11.8 and cuDNN for CUDA 11.x
https://developer.nvidia.com/cuda-11-8-0-download-archive
https://developer.nvidia.com/rdp/cudnn-archive
pip uninstall onnxruntime onnxruntime-gpu
pip install onnxruntime-gpu==1.15.1
CoreML Apple Silicon
pip uninstall onnxruntime
pip install onnxruntime-silicon
Pip package and custom wheels from:
https://github.com/cansik/onnxruntime-silicon/

was not super clear to me. I thought maybe you're supposed to run not only pip uninstall onnxruntime onnxruntime-gpu and pip install onnxruntime-gpu==1.15.1, but also the other commands. So I also ran pip uninstall onnxruntime and pip install onnxruntime-silicon. This seemed to do nothing. The issue stayed the same, and since I only ran this after I saw that roop didn't work as intended, I won't include it in the reproduction steps.
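
Since several onnxruntime variants were installed and uninstalled here, one thing worth ruling out is a mixed install. The sketch below lists which of the distributions named in the wiki excerpt are present in the active environment. The function name `installed_onnxruntime_variants` is made up for illustration, and the idea that a mixed install matters is an assumption drawn from the wiki's "uninstall both, then install one" sequence, not something confirmed in this thread.

```python
from importlib import metadata

# Distribution names taken from the wiki excerpt quoted above.
CANDIDATES = ("onnxruntime", "onnxruntime-gpu", "onnxruntime-silicon")


def installed_onnxruntime_variants(candidates=CANDIDATES):
    """Return {distribution name: version} for each candidate pip has installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # not installed in this environment
    return found


if __name__ == "__main__":
    variants = installed_onnxruntime_variants()
    print(variants or "no onnxruntime distribution found in this environment")
    if len(variants) > 1:
        # Assumption: more than one variant side by side is worth cleaning up.
        print("More than one variant is installed; consider uninstalling all and reinstalling one.")
```

Run it with the same Python interpreter that launches roop-unleashed, otherwise it inspects a different environment.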

PS: I have no background in programming at all. I've been teaching myself for a little over a month, since I found Stable Diffusion, so I'm quite a novice when it comes to programming languages and the like. Just a heads up!

To Reproduce
Steps to reproduce the behavior:

  1. Have the following software installed:
    Python 3.10.6, Git 2.44.0.windows.1, Visual Studio 17.9.6, CUDA 11.8.0, cuDNN for CUDA 11.x (9.1.0), ffmpeg version N-114902-g277f051ff6-20240421 (this should be everything, although I could have missed something). CUDA, cuDNN, Python, Git and ffmpeg are also on the PATH environment variable.

  2. Go to roop-unleashed, then head to the installation in the wiki.

  3. Follow the manual installation as follows:
    git clone https://github.com/C0untFloyd/roop-unleashed
    preferably create a venv or conda environment (I created the environment with python -m venv /path/to/new/virtual/environment)
    cd roop-unleashed
    pip install -r requirements.txt

  4. Run the windows_run.bat file to install roop-unleashed. When complete, run pip uninstall onnxruntime onnxruntime-gpu, followed by pip install onnxruntime-gpu==1.15.1. Then run the windows_run.bat file again to start roop-unleashed.

  5. See error
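
Because the steps above mix a venv with pip commands, it can help to confirm which interpreter actually runs them. This is a minimal sketch using only the standard library; `in_virtualenv` is a hypothetical helper name. It detects environments created with `python -m venv`, but not conda environments.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (prefix differs from the base install)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    # The path printed here shows which Python the pip commands are acting on.
    print("interpreter:", sys.executable)
    print("inside a venv:", in_virtualenv())
```

If it reports that no venv is active, any pip install or uninstall run in that terminal modified the system-wide Python rather than the roop-unleashed environment.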

Details
What OS are you using?

  • Linux
  • Linux in WSL
  • Windows (10)
  • Mac

Are you using a GPU?

  • No. CPU FTW
  • NVIDIA
  • AMD
  • Intel
  • Mac

Which version of roop unleashed are you using?
3.9.0

Screenshots
Even though I'm using my GPU (NVIDIA GeForce RTX 3070 Ti), as you can see, the GPU temperature, usage and VRAM have hardly changed at all after I launched roop-unleashed (whereas the CPU usage goes from about 10-20% to 70%). It looks like my CPU is doing all the lifting. I find this strange because I use my GPU with CUDA every day in Stable Diffusion Forge.
[Screenshot: roop]

I also caught this in my terminal. I'm quite a novice when it comes to programming, so I have no clue what it means.

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu118
Ignoring torch: markers 'sys_platform == "darwin"' don't match your environment
Ignoring torchvision: markers 'sys_platform == "darwin"' don't match your environment
Ignoring onnxruntime: markers 'sys_platform == "darwin" and platform_machine != "arm64"' don't match your environment
Ignoring onnxruntime-silicon: markers 'sys_platform == "darwin" and platform_machine == "arm64"' don't match your environment

I believe the output below is part of the launch, showing my providers. Not sure if it should look like this, though.

To create a public link, set `share=True` in `launch()`.
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CUDAExecutionProvider': {'cudnn_conv_algo_search': 'EXHAUSTIVE', 'device_id': '0', 'cudnn_conv1d_pad_to_nc1d': '0', 'has_user_compute_stream': '0', 'gpu_external_alloc': '0', 'enable_cuda_graph': '0', 'gpu_mem_limit': '18446744073709551615', 'gpu_external_free': '0', 'gpu_external_empty_cache': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'do_copy_in_default_stream': '1', 'cudnn_conv_use_max_workspace': '1', 'tunable_op_enable': '0', 'tunable_op_tuning_enable': '0', 'tunable_op_max_tuning_duration_ms': '0', 'enable_skip_layer_norm_strict_mode': '0'}, 'CPUExecutionProvider': {}}
find model: C:\Stable_Diffusion\roop-unleashed\installer\roop-unleashed\models\buffalo_l\1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CUDAExecutionProvider': {'cudnn_conv_algo_search': 'EXHAUSTIVE', 'device_id': '0', 'cudnn_conv1d_pad_to_nc1d': '0', 'has_user_compute_stream': '0', 'gpu_external_alloc': '0', 'enable_cuda_graph': '0', 'gpu_mem_limit': '18446744073709551615', 'gpu_external_free': '0', 'gpu_external_empty_cache': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'do_copy_in_default_stream': '1', 'cudnn_conv_use_max_workspace': '1', 'tunable_op_enable': '0', 'tunable_op_tuning_enable': '0', 'tunable_op_max_tuning_duration_ms': '0', 'enable_skip_layer_norm_strict_mode': '0'}, 'CPUExecutionProvider': {}}
find model: C:\Stable_Diffusion\roop-unleashed\installer\roop-unleashed\models\buffalo_l\2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CUDAExecutionProvider': {'cudnn_conv_algo_search': 'EXHAUSTIVE', 'device_id': '0', 'cudnn_conv1d_pad_to_nc1d': '0', 'has_user_compute_stream': '0', 'gpu_external_alloc': '0', 'enable_cuda_graph': '0', 'gpu_mem_limit': '18446744073709551615', 'gpu_external_free': '0', 'gpu_external_empty_cache': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'do_copy_in_default_stream': '1', 'cudnn_conv_use_max_workspace': '1', 'tunable_op_enable': '0', 'tunable_op_tuning_enable': '0', 'tunable_op_max_tuning_duration_ms': '0', 'enable_skip_layer_norm_strict_mode': '0'}, 'CPUExecutionProvider': {}}
find model: C:\Stable_Diffusion\roop-unleashed\installer\roop-unleashed\models\buffalo_l\det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CUDAExecutionProvider': {'cudnn_conv_algo_search': 'EXHAUSTIVE', 'device_id': '0', 'cudnn_conv1d_pad_to_nc1d': '0', 'has_user_compute_stream': '0', 'gpu_external_alloc': '0', 'enable_cuda_graph': '0', 'gpu_mem_limit': '18446744073709551615', 'gpu_external_free': '0', 'gpu_external_empty_cache': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'do_copy_in_default_stream': '1', 'cudnn_conv_use_max_workspace': '1', 'tunable_op_enable': '0', 'tunable_op_tuning_enable': '0', 'tunable_op_max_tuning_duration_ms': '0', 'enable_skip_layer_norm_strict_mode': '0'}, 'CPUExecutionProvider': {}}
find model: C:\Stable_Diffusion\roop-unleashed\installer\roop-unleashed\models\buffalo_l\genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], with options: {'CUDAExecutionProvider': {'cudnn_conv_algo_search': 'EXHAUSTIVE', 'device_id': '0', 'cudnn_conv1d_pad_to_nc1d': '0', 'has_user_compute_stream': '0', 'gpu_external_alloc': '0', 'enable_cuda_graph': '0', 'gpu_mem_limit': '18446744073709551615', 'gpu_external_free': '0', 'gpu_external_empty_cache': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'do_copy_in_default_stream': '1', 'cudnn_conv_use_max_workspace': '1', 'tunable_op_enable': '0', 'tunable_op_tuning_enable': '0', 'tunable_op_max_tuning_duration_ms': '0', 'enable_skip_layer_norm_strict_mode': '0'}, 'CPUExecutionProvider': {}}
find model: C:\Stable_Diffusion\roop-unleashed\installer\roop-unleashed\models\buffalo_l\w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
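
For anyone who wants to check a log like the one above without reading each line by eye, here is a small sketch that pulls the provider list out of the "Applied providers" lines. The helper names are hypothetical, and the sample line is shortened from the log above. As a side note, the `gpu_mem_limit` value 18446744073709551615 equals 2**64 - 1, the largest unsigned 64-bit integer, which I read as "no explicit memory cap".

```python
import ast
import re

# Matches the bracketed list that directly follows "Applied providers:".
_PROVIDERS_RE = re.compile(r"Applied providers:\s*(\[[^\]]*\])")


def applied_providers(log_line):
    """Extract the provider list from one 'Applied providers: [...]' log line."""
    match = _PROVIDERS_RE.search(log_line)
    if match is None:
        return []
    return ast.literal_eval(match.group(1))


def cuda_was_applied(log_text):
    """True if every 'Applied providers' line in the log lists the CUDA provider."""
    lines = [line for line in log_text.splitlines() if "Applied providers:" in line]
    return bool(lines) and all(
        "CUDAExecutionProvider" in applied_providers(line) for line in lines
    )


# Shortened copy of a line from the log above.
sample = (
    "Applied providers: ['CUDAExecutionProvider', 'CPUExecutionProvider'], "
    "with options: {'CUDAExecutionProvider': {'device_id': '0'}, 'CPUExecutionProvider': {}}"
)
print(applied_providers(sample))  # ['CUDAExecutionProvider', 'CPUExecutionProvider']
print(cuda_was_applied(sample))   # True
```

A list that contains `CUDAExecutionProvider` only shows the provider was accepted when the model loaded; it does not by itself prove that the heavy work runs on the GPU.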
@lysxelapsed

As far as I understand, you did both a manual install and an automatic one with windows_run.bat. It's quite possible that this messed things up in your environment variables or elsewhere. You don't need to install python / git / ffmpeg when using windows_run.bat. Do either step 1/3 or step 4 from your list, not both. If you do a manual installation (do the git clone inside the activated venv, as explained in the first part of #210), you need to start it with python run.py (or create a simple .bat file, also explained at the end of #210), not with windows_run.bat.

My suggestion: Start fresh. Delete the roop-unleashed folder(s).
Either uninstall python / git / ffmpeg (and remove their PATH entries), leave Visual Studio, CUDA and cuDNN as they are, and just run windows_run.bat.
Or: Do a manual install with venv / git clone and run it with python run.py.
You can ignore the "don't match your environment" messages.
I never had to reinstall the onnx runtime. Only run pip uninstall onnxruntime onnxruntime-gpu followed by pip install onnxruntime-gpu==1.15.1 if CUDA doesn't show up under providers in the settings tab.
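
The same check can be made from Python, outside the settings tab. This is a sketch, assuming the settings tab reflects what onnxruntime itself reports; `available_providers` and `cuda_available` are illustrative names, while `onnxruntime.get_available_providers()` is the library's own call.

```python
def available_providers():
    """Return the providers onnxruntime reports, or [] if it cannot be imported."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return ort.get_available_providers()


def cuda_available(providers):
    """True if the CUDA execution provider is in the given list."""
    return "CUDAExecutionProvider" in providers


if __name__ == "__main__":
    providers = available_providers()
    print("providers:", providers)
    print("CUDA available:", cuda_available(providers))
```

If CUDA is missing from the list, that is the case where the reinstall above applies.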

Last, but not least: No matter if you do a manual or automatic install, do it in a short folder on C: or wherever (C:\roop for example) to avoid problems with Windows' 260-character path limit (MAX_PATH).
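
To see whether an existing install already runs into the path limit, a quick scan like the following can help. It is a sketch, assuming the classic Windows MAX_PATH value of 260; the limit is a parameter, and `long_paths` is a hypothetical helper name.

```python
import os

MAX_PATH = 260  # classic Windows limit; assumed here, adjust if your setup differs


def long_paths(root, limit=MAX_PATH):
    """Yield absolute paths under root whose length reaches or exceeds the limit."""
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if len(full) >= limit:
                yield full


if __name__ == "__main__":
    # Scans the current directory; run it from inside the install folder.
    for path in long_paths("."):
        print(len(path), path)
```

An empty result means no file or folder under the install directory is near the limit.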

@Hunt3rseeker
Author

Thank you for replying!

I probably should have pointed this out earlier, but I already had python / git / ffmpeg installed, which I use for Stable Diffusion Forge. Doesn't Windows recognize that I already have certain parts, like python / git / ffmpeg, and therefore skip them?

Anyway, I'm gonna try installing it fresh via your steps. I'll be back!

@lysxelapsed

Alright, if you already have all that installed for Stable Diffusion, don't bother with windows_run.bat; do a manual install and run it with python run.py. It's definitely cleaner.
Report back if you encounter further problems 😉

@Hunt3rseeker
Author

Alright, said and done. I managed to get everything right with the manual installation, after running into some minor problems (like not understanding that cd.. is actually a command lol). Roop is functioning better this time, but only around 5-10% better. It starts faster and there's a little less lag in the WebUI, but my processor is still at 70% usage and 70 °C, while my GPU is doing nothing at all. That seems to be my only problem at the moment.

I'm thinking that maybe I should reinstall python, git and ffmpeg, so I get a real clean install. Apart from that, I'm out of ideas 😩

@lysxelapsed

lysxelapsed commented Apr 26, 2024

cd stands for "change directory" 😉 With the "..", it changes to the parent directory.
Now let's figure out why it's not using your GPU. I have some additional questions:

  • Do you have an additional integrated GPU?
  • Are you trying to run your stable diffusion stuff at the same time?
  • How many threads are you running? With your 3070 Ti (8 GB VRAM, right?) you should limit the threads to 1 or 2.
  • Is cuda selected as provider in the settings tab?
  • What does your terminal output say? It should look like this:
    [Screenshot: cuda]

@Hunt3rseeker
Author

Alright!

  • I only have one GPU.
  • I'm only running roop-unleashed. I've even been closing all other programs to see if it improved, with no success.
  • I was running 2 threads at first, then I changed it to 3 to see if something would improve. It did not, and then I kinda forgot about it 😎 I'll put it on 1 and see if that's better.
  • Yes, CUDA is selected in the settings tab.
  • Yeah, CUDA is activated in my terminal. In fact, it's almost the only thing showing up in my terminal. If I recall correctly, there should be more lines when launching roop, no?
    [Screenshot: launchroop]

@lysxelapsed

Yeah, Cuda is activated in my terminal. In fact, it's almost the only thing showing up in my terminal. If I recall correctly, there should be more lines when launching roop, no?

No, that's what it's supposed to look like. Everything in your screenshots looks fine, actually.
Any luck with 1 thread? I think that will be your limit with most enhancers. I'm at 9.7 GB total VRAM usage with 2 threads and Codeformer. That would exceed your available amount, and then the speed / GPU usage tanks, as explained in the FAQ.
Aside from that I'm out of ideas 🤷‍♂️
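
The VRAM point can be checked with numbers rather than by watching a monitor. Below is a sketch that parses the CSV output of `nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits` and compares a required amount against the card's total. The helper names are hypothetical, and the 8192 MiB total in the usage note is an assumed figure for an 8 GB card, not a measurement from this thread.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse nvidia-smi CSV rows into dicts (utilization in %, memory in MiB)."""
    stats = []
    for row in csv_text.strip().splitlines():
        util, used, total = (int(field.strip()) for field in row.split(","))
        stats.append({"util_pct": util, "used_mib": used, "total_mib": total})
    return stats


def read_gpu_stats():
    """Run nvidia-smi and return one dict per GPU (requires the NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)


def fits_in_vram(needed_gib, total_mib):
    """True if a workload needing needed_gib GiB fits into total_mib MiB."""
    return needed_gib * 1024 <= total_mib
```

For example, `fits_in_vram(9.7, 8192)` is `False`: the 9.7 GB reported for 2 threads with Codeformer does not fit on an 8 GB card, which matches the advice to drop to 1 thread. Calling `read_gpu_stats()` while a job is running shows whether GPU utilization moves at all.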

@Verdufake

I am experiencing the same problem, did you find a solution? Thank you

@Hunt3rseeker
Author

Unfortunately, I did not solve the problem. Instead, I moved on to the Reactor extension for Stable Diffusion. You can get some really good results with it if you tweak the settings. I recommend checking it out!
