
How to have it always listening without start/stop buttons? #73

Open
ElectroGamesDev opened this issue Feb 2, 2024 · 5 comments
Labels
question Further information is requested

Comments

@ElectroGamesDev

I would like to add voice commands to my game. How can I have it always listening, without having to click start and stop buttons?
I tried checking the microphone volume using audioClip.GetData, but that seems to break after I call microphoneRecord.StartRecord().
How could this be done? Thanks!
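The volume check described above is usually done as a simple energy threshold: compute the RMS (root-mean-square) level of each chunk of samples and treat chunks above a threshold as speech. Unity's `AudioClip.GetData` returns floats in the range [-1, 1]. The following is a minimal, engine-agnostic sketch in Python, assuming mono float samples already split into chunks; the threshold and "hangover" values are arbitrary illustrative choices, not settings from whisper.unity.

```python
import math
from typing import List, Sequence


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square level of a chunk of samples in [-1, 1]."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def detect_speech(chunks: List[Sequence[float]], threshold: float = 0.02,
                  hangover: int = 2) -> List[bool]:
    """Mark each chunk as speech (True) or silence (False).

    A chunk counts as speech if its RMS exceeds `threshold`. After speech
    ends, the next `hangover` quiet chunks are still marked as speech so that
    short pauses between words don't split one utterance in two.
    Both parameters are hypothetical tuning values.
    """
    flags: List[bool] = []
    quiet_left = 0
    for chunk in chunks:
        if rms(chunk) > threshold:
            quiet_left = hangover      # loud chunk: reset the hangover counter
            flags.append(True)
        elif quiet_left > 0:
            quiet_left -= 1            # quiet, but still inside the hangover
            flags.append(True)
        else:
            flags.append(False)
    return flags
```

For example, one quiet chunk, one loud chunk and three quiet chunks yield `[False, True, True, True, False]` with the defaults. A gate like this only decides *when* to hand audio to Whisper; it does not replace the recognition step.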

@Macoron
Owner

Macoron commented Feb 2, 2024

Right now, there are two ways you can do that:

  1. Use streaming input. Start it once and listen to the OnSegmentUpdated and OnSegmentFinished events. Check the streaming example scene for more details.
  2. Use a circular microphone buffer. There is a really good implementation, with built-in command detection, in PR #52 ("Added voice commands demo").
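The circular-buffer idea in option 2 can be sketched like this: the microphone keeps writing into a fixed-size ring, overwriting the oldest audio, and at any moment you can copy out the most recent window to pass to Whisper. This is a generic Python sketch; the class and method names are hypothetical and this is not the implementation from PR #52.

```python
from typing import List, Sequence


class RingBuffer:
    """Fixed-capacity buffer that always holds the most recent samples."""

    def __init__(self, capacity: int) -> None:
        self.data = [0.0] * capacity
        self.capacity = capacity
        self.write_pos = 0   # index where the next sample will be written
        self.count = 0       # number of valid samples stored (<= capacity)

    def push(self, samples: Sequence[float]) -> None:
        """Append samples, overwriting the oldest ones once the ring is full."""
        for s in samples:
            self.data[self.write_pos] = s
            self.write_pos = (self.write_pos + 1) % self.capacity
            self.count = min(self.count + 1, self.capacity)

    def latest(self, n: int) -> List[float]:
        """Return up to the last n samples in chronological order."""
        n = min(n, self.count)
        start = (self.write_pos - n) % self.capacity
        return [self.data[(start + i) % self.capacity] for i in range(n)]
```

Whisper models expect 16 kHz audio, so keeping the last 2 seconds would need a capacity of 32,000 samples. Because memory use stays constant, the buffer can run for as long as the game does without a start/stop cycle.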

A third option would be to use a simpler network to detect an activation word (like "Alexa" or "Siri") and only start Whisper speech recognition after that. However, there is no built-in solution for word spotting.
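The activation-word approach boils down to a small state machine: a cheap detector runs on every chunk, and only after it fires is audio collected for Whisper, until the speaker goes quiet. A sketch in Python, where `is_wake_word` and `is_silence` are hypothetical stand-ins for a real keyword spotter and a voice-activity check (whisper.unity provides neither).

```python
from typing import Callable, List, Sequence

Chunk = Sequence[float]


def gate_utterances(chunks: List[Chunk],
                    is_wake_word: Callable[[Chunk], bool],
                    is_silence: Callable[[Chunk], bool]) -> List[List[Chunk]]:
    """Return the groups of chunks that would be sent to speech recognition.

    IDLE: only the cheap wake-word detector runs on each chunk.
    LISTENING: chunks are collected until a silent chunk ends the utterance.
    """
    utterances: List[List[Chunk]] = []
    listening = False
    current: List[Chunk] = []
    for chunk in chunks:
        if not listening:
            if is_wake_word(chunk):
                listening = True       # switch to LISTENING, start a new utterance
                current = []
        elif is_silence(chunk):
            if current:
                utterances.append(current)
            listening = False          # back to IDLE
        else:
            current.append(chunk)
    if listening and current:
        utterances.append(current)     # input ended mid-utterance
    return utterances
```

The expensive Whisper call then runs once per returned utterance instead of continuously, which is the main appeal of this design on modest hardware.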

@Macoron Macoron added the question Further information is requested label Feb 2, 2024
@ElectroGamesDev
Author

Thanks.
I checked both of them out, but I'm running into issues with each.

With the streaming input solution, streaming seems to stop after it has been enabled for about 1 minute. To work around this, in OnStreamFinished I stopped the recording and then restarted the stream and the recording, so that whenever streaming stops it starts back up. However, after the initial stop at ~1 minute, this made it constantly stop and restart. This solution also froze the editor every so often. In addition, after the first few segments, OnFinishSegment() started taking around 20 seconds to run, even though I was only talking for a second; when streaming first started, it took only 1-2 seconds.

With the second solution, I tried the Voice Commands Demo PR, but it's very delayed. Sometimes inference took 2 seconds to complete, other times 12 seconds, whereas your test video looks nearly instant. I'm sure your PC specs are better than mine, but 12 seconds to run inference on two words doesn't seem right.

@Macoron
Owner

Macoron commented Feb 2, 2024

> With the Streaming Input solution, it seems the streaming stops after it being enabled for ~1 minute, so OnStreamFinished I tried stopping the recording and starting the stream and recording so then when ever the streaming stops, it will be started back up, but this caused it to constantly be stopping and starting after the initial streaming stop after ~1 minute.

The streaming example scene should have Loop mode set to true in the MicrophoneRecord script. This allows you to record audio for longer than 1 minute (the Max Length Sec parameter). Double-check that it's set to true.
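With looping enabled, the recording clip of Max Length Sec behaves like a ring: the write position jumps back to 0 when it reaches the end (in Unity, `Microphone.GetPosition` reports this position in samples). Any code that polls for new audio therefore has to handle the wrap-around. A minimal Python sketch of that index arithmetic, assuming you know the previous and current write positions; it illustrates the general idea and is not MicrophoneRecord's actual code.

```python
from typing import List, Sequence, Tuple


def new_sample_range(prev_pos: int, cur_pos: int,
                     clip_len: int) -> List[Tuple[int, int]]:
    """Return the [start, end) slices of the clip written since prev_pos.

    If cur_pos < prev_pos, the write head wrapped past the end of the looping
    clip, so the new audio is split into two slices. Assumes less than one
    full clip length was written between two polls.
    """
    if cur_pos == prev_pos:
        return []
    if cur_pos > prev_pos:
        return [(prev_pos, cur_pos)]
    slices = [(prev_pos, clip_len)]   # tail of the clip
    if cur_pos > 0:
        slices.append((0, cur_pos))   # head of the clip after wrapping
    return slices


def read_new_samples(clip: Sequence[float], prev_pos: int,
                     cur_pos: int) -> List[float]:
    """Copy the newly written samples out of the looping clip, in order."""
    out: List[float] = []
    for start, end in new_sample_range(prev_pos, cur_pos, len(clip)):
        out.extend(clip[start:end])
    return out
```

Without Loop mode there is no wrap: recording simply stops once the clip is full, which matches the roughly one-minute cutoff described in the issue.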

> This solution also seemed to freeze the editor every so often. Also after the first few segments, it started taking like 20 seconds to run the OnFinishSegment() despite me only talking for a second and it only taking 1-2 seconds when it was first started.

What model weights are you using (tiny, base, large, etc.)? Could you share your hardware specs? Are you using CPU or GPU inference?

@ElectroGamesDev
Author

Ah, I never noticed that Loop option; that should fix the issue.

I'm using the Tiny model. My CPU is a Ryzen 5 2600 and my GPU is a GTX 970 (obviously not the best specs, but they shouldn't produce results as unreliable as these). I'm using whatever the default is; I don't see an option to choose between GPU and CPU.

@Macoron
Owner

Macoron commented Feb 3, 2024

You can try CUDA inference. It might be faster on your hardware, but you would need to install the CUDA Toolkit.

You can also try enabling the "Speed Up" setting in the WhisperManager script. It could improve performance at the cost of slightly reduced quality.

Finally, you can experiment with streaming settings like StepSec or LengthSec in WhisperManager. You may find a configuration that works better for your use case.
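To reason about StepSec and LengthSec, it helps to list which audio windows get processed. The sketch below *assumes* they work like the step/length parameters of whisper.cpp's stream example: every StepSec seconds, inference runs on the most recent audio, up to LengthSec seconds long. That is an assumption about their semantics, not a description of WhisperManager's code, and the function names are hypothetical.

```python
from typing import List, Tuple


def stream_windows(total_sec: float, step_sec: float,
                   length_sec: float) -> List[Tuple[float, float]]:
    """List the (start, end) windows processed under the assumed scheme.

    An inference is triggered at each multiple of step_sec and covers at most
    the last length_sec seconds of audio.
    """
    windows: List[Tuple[float, float]] = []
    k = 1
    while k * step_sec <= total_sec + 1e-9:   # tolerance for float steps
        end = k * step_sec
        windows.append((max(0.0, end - length_sec), end))
        k += 1
    return windows


def processed_seconds(windows: List[Tuple[float, float]]) -> float:
    """Total audio fed to the model across all windows (a rough cost proxy)."""
    return sum(end - start for start, end in windows)
```

Under this model, 5 seconds of speech with step 1 s and length 3 s feeds 12 seconds of audio to the model, while step 2 s over 6 seconds feeds only 8. A smaller step lowers latency but multiplies the work per second of speech, which matters on older CPUs or GPUs.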
