
Windows + python3.9 + OpenVoice v2 = not possible without CUDA? #188

Open
dusekdan opened this issue Apr 26, 2024 · 4 comments

Comments

@dusekdan

Hi,

I followed the Windows installation guide and tried both the latest Python 3.12 and Python 3.9.12 (the guide recommends Python 3.9).

When I attempt to run the v2 example from demo_part3.ipynb, I get an error originating from the following line:

target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=False)

This is the error:

Traceback (most recent call last):
  File "C:\Users\user\Source\VoiceCloningTests\OpenVoice\demov2_.py", line 23, in <module>
    target_se, audio_name = se_extractor.get_se(reference_speaker, tone_color_converter, vad=False)
  File "C:\Users\user\Source\VoiceCloningTests\OpenVoice\openvoice\se_extractor.py", line 146, in get_se
    wavs_folder = split_audio_whisper(audio_path, target_dir=target_dir, audio_name=audio_name)
  File "C:\Users\user\Source\VoiceCloningTests\OpenVoice\openvoice\se_extractor.py", line 22, in split_audio_whisper
    model = WhisperModel(model_size, device="cuda", compute_type="float16")
  File "C:\Users\user\Source\VoiceCloningTests\OpenVoice\env39\lib\site-packages\faster_whisper\transcribe.py", line 128, in __init__
    self.model = ctranslate2.models.Whisper(
RuntimeError: CUDA failed with error CUDA driver version is insufficient for CUDA runtime version

At the beginning of my script I follow the demo and set up my device variable the same way, so that it falls back to cpu. But when I open the se_extractor.py file from the traceback above, I see that the device is hardcoded to cuda, which is where the error comes from.

My machine uses integrated Intel graphics, so as far as I know it is not CUDA-capable at all. Does this mean I cannot run OpenVoice v2 without an NVIDIA GPU?

This is the code from the library (se_extractor.py) with the hardcoded cuda string that raises the error:

def split_audio_whisper(audio_path, audio_name, target_dir='processed'):
    global model
    if model is None:
        model = WhisperModel(model_size, device="cuda", compute_type="float16")
    # ... 
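
For anyone patching this locally, here is a minimal sketch of a device-aware version (assuming torch is importable here, as it is elsewhere in OpenVoice, and that model_size mirrors the module-level default in se_extractor.py):

import torch
from faster_whisper import WhisperModel

model = None
model_size = "medium"  # assumption: the module-level default in se_extractor.py

def split_audio_whisper(audio_path, audio_name, target_dir='processed'):
    global model
    if model is None:
        if torch.cuda.is_available():
            # NVIDIA GPU available: float16 inference works on CUDA.
            model = WhisperModel(model_size, device="cuda", compute_type="float16")
        else:
            # CPU fallback: float16 is not supported on CPU, so use float32.
            model = WhisperModel(model_size, device="cpu", compute_type="float32")
    # ...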
@jicka

jicka commented Apr 26, 2024

Hello,

I just ran into the same issue. I replaced the line in question with:
model = WhisperModel(model_size, device="cpu", compute_type="float32")
and it works. Hope it helps you too!
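
If you want to sanity-check that faster-whisper itself runs on CPU before re-running the whole demo, a tiny standalone test could look like this (the audio path is just a placeholder, substitute any speech file you have):

from faster_whisper import WhisperModel

# Load the model on CPU with a compute type that CPU inference supports.
model = WhisperModel("medium", device="cpu", compute_type="float32")

# "reference.mp3" is a placeholder path.
segments, info = model.transcribe("reference.mp3")
print(info.language, info.language_probability)
for segment in segments:
    print(segment.start, segment.end, segment.text)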

@dusekdan

dusekdan commented Apr 26, 2024

Lol, just came back to say I iterated towards the same solution and got it working. Thanks for the tip.

For everyone who comes to this issue looking for the same thing, here's how I iterated towards the solution:

  1. I looked through the files in the traceback. In the code above, you can see the error comes from WhisperModel.
  2. I located WhisperModel in my environment (installed by pip into the virtualenv) under venv/Lib/site-packages/faster_whisper. I knew to look in the faster_whisper module because WhisperModel is imported from it at the very beginning of the example.
  3. The error came from transcribe.py, so that's the file I opened to look for the class definition.

There are two parameters of interest there: device and compute_type. When I previously tried hardcoding just cpu for device, I was told that float16 is unsupported. So my line of thinking was to look up which other compute types are supported and try those combinations.
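
Rather than guessing, you can also ask CTranslate2 directly what your machine supports (faster-whisper is built on CTranslate2, so it is already installed):

import ctranslate2

# Lists the compute types usable for CPU inference on this machine.
# On a typical Intel x64 CPU this should include float32 but not float16.
print(ctranslate2.get_supported_compute_types("cpu"))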

Line 91 in transcribe.py contains a really long docstring; these are the most important parts:

"""
Initializes the Whisper model.

        Args:
          [...]
          device: Device to use for computation ("cpu", "cuda", "auto").
          compute_type: Type to use for computation.
            See https://opennmt.net/CTranslate2/quantization.html.
          [...]
"""

The quantization link provides a reference table of implicit type conversions on load, where I could look up what float16 falls back to when loading on CPU. For my architecture (Intel, x64) it is float32.

I changed the corresponding line to hardcode cpu for the device and float32 for compute_type, and got output.
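
If I read the quantization docs correctly, an alternative that avoids looking the type up by hand is compute_type="auto", which lets CTranslate2 pick the fastest type supported on the selected device:

# Alternative (untested here): let CTranslate2 choose the compute type.
model = WhisperModel(model_size, device="cpu", compute_type="auto")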

Reference table for your convenience:
[screenshot of the implicit type conversion table from https://opennmt.net/CTranslate2/quantization.html]

@jicka

jicka commented Apr 26, 2024

haha you went about it much more professionally than I did :)
Happy we both found a solution.

@mambari

mambari commented Apr 28, 2024

Hello, I have the same error even with the fix:
[screenshot of the error]
