Applying for residency - "speech-to-text" first-party plugin **open source** (will update later) #126
base: master
Create test_speech_to_text_plugin.py
Create speech_to_text_plugin.py
Revert "Create speech_to_text_plugin.py"
Create speech_to_text_plugin.py
updated "/src/autogpt_plugins/speech_to_text/__init__.py" after initial PR:
    from typing import Any, Dict, List, Optional, Tuple, TypedDict, TypeVar
    from auto_gpt_plugin_template import AutoGPTPluginTemplate
    PromptGenerator = TypeVar("PromptGenerator")
    class SpeechToTextPlugin(AutoGPTPluginTemplate):
.DS_Store
Outdated
Remove and add to .gitignore
src/.DS_Store
Outdated
remove
src/autogpt_plugins/.DS_Store
Outdated
remove
remove
To install the plugins, follow these steps:

1. Clone this repository: `git clone https://github.com/Frame-Tech-Solutions-Ltd-Co/Auto-GPT-Plugins.git`
Swap this to the Significant-Gravitas repo
    def post_prompt(self, prompt: PromptGenerator) -> PromptGenerator:
        prompt.add_command(
            "Transcribe spoken input",
            "transcribe_audio",
            {
                "audio": "<audio>",
            },
            transcribe_audio,
        )
        return prompt

    # Add more methods as needed, such as can_handle_on_response, on_response, etc.
Should this be a prompt command that the AI calls or should it replace keyboard entry?
    def post_prompt(self, prompt: PromptGenerator) -> PromptGenerator:
        prompt.add_command(
            "Transcribe spoken input",
            "transcribe_audio",
            {
                "audio": "<audio>",
            },
            transcribe_audio,
        )
        return prompt
Only register if the environment variables you need exist
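One way to act on this comment is sketched below. It is a minimal illustration, not the plugin's actual code: `SpeechToTextPluginSketch`, `RecordingPrompt`, `missing_env_vars` and the placeholder `transcribe_audio` are hypothetical stand-ins, and the only variable checked is `GOOGLE_APPLICATION_CREDENTIALS`, which is the one the reviewed diff sets. The real class would subclass `AutoGPTPluginTemplate`.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Assumption: this is the only variable the plugin needs.
REQUIRED_ENV_VARS = ("GOOGLE_APPLICATION_CREDENTIALS",)


def missing_env_vars(
    required: Sequence[str] = REQUIRED_ENV_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def transcribe_audio(audio: str) -> str:
    """Placeholder for the plugin's real transcription function."""
    raise NotImplementedError


class RecordingPrompt:
    """Hypothetical fake of PromptGenerator that only records command names."""

    def __init__(self) -> None:
        self.commands: List[str] = []

    def add_command(self, label, name, args, function) -> None:
        self.commands.append(name)


class SpeechToTextPluginSketch:
    """Stand-in for the plugin class, kept free of external imports."""

    def __init__(self, environ: Optional[Mapping[str, str]] = None) -> None:
        self._missing = missing_env_vars(environ=environ)

    def can_handle_post_prompt(self) -> bool:
        # Opt out of the hook entirely when configuration is incomplete.
        return not self._missing

    def post_prompt(self, prompt):
        # Guard again so a direct call cannot register a broken command.
        if self._missing:
            return prompt
        prompt.add_command(
            "Transcribe spoken input",
            "transcribe_audio",
            {"audio": "<audio>"},
            transcribe_audio,
        )
        return prompt
```

Passing `environ` explicitly keeps the check testable without touching the real process environment.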
from google.cloud.speech_v1p1beta1 import types
import autogpt

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/credentials.json'
This line should be fixed up a bit. Look at other examples in the repo of how we read things in
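A possible shape for that fix is to read the path from the environment instead of overwriting it with a hard-coded placeholder. The Google client libraries look up `GOOGLE_APPLICATION_CREDENTIALS` themselves, so the plugin only needs to confirm it points at a real file. The helper name `credentials_path` is an assumption for illustration, and this sketch does not claim to match how the other plugins in the repo read their settings.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def credentials_path(environ: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the configured credentials file, or None if unset or missing.

    Hypothetical helper: reads GOOGLE_APPLICATION_CREDENTIALS rather than
    assigning a placeholder path to os.environ at import time.
    """
    env = os.environ if environ is None else environ
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
    if not raw:
        return None
    path = Path(raw).expanduser()
    # Only accept the value when it points at an existing regular file.
    return path if path.is_file() else None
```

A caller can treat a `None` result as "plugin not configured" and skip registration.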
Does this load successfully for you? I'm not seeing several of the required methods implemented
Remove file, add to .gitignore
Dear @ntindle,
Thank you for your suggestions and your guidance. Your experience is a valuable part of this project. The next steps to bring this plug-in to life would be to reopen the closed pull requests and edit the original codebase with our suggestions. Please respond at your earliest convenience. I look forward to seeing the pull requests reopened. Thanks!
Best regards,
Nigel Phillips.
Founder, FRAME TECH SOLUTIONS LTD., CO. 框架技術解決方案
remove
remove
remove
@@ -0,0 +1,65 @@
# Changes:
This repository contains various plugins developed for use with the AutoGPT model. These plugins extend the functionality of AutoGPT by providing additional features, such as speech-to-text transcription, integration with external APIs, and more.
One plugin per folder, clarify this to be only for speech-to-text
Nigel Phillips a.k.a. Swooshcode
Founder of Frame Tech Solutions Ltd., Co. 框架技術解決方案
For inquiries: https://tinyurl.com/nigelphillips
https://discord.com/channels/1092243196446249134/1104075991191666899/1104086687040163941
Files requested to be merged (locations):
/src/autogpt_plugins/speech_to_text @swooshcode
Changes:
This repository contains various plugins developed for use with the AutoGPT model. These plugins extend the functionality of AutoGPT by providing additional features, such as speech-to-text transcription, integration with external APIs, and more.
Table of Contents
Speech-to-Text Plugin
The speech-to-text plugin allows users to transcribe spoken input in real-time and feed the transcribed text into the AutoGPT model for processing. This plugin uses the Google Cloud Speech-to-Text API for transcription and PyAudio for real-time audio recording from the user's microphone.
Features
Usage
`speech_to_text_plugin.py` file to use the correct path to your API credentials.
pip install google-cloud-speech pyaudio
`speech_to_text_plugin.py` file to start recording and transcribing audio input.
Installation
To install the plugins, follow these steps:
git clone https://github.com/Frame-Tech-Solutions-Ltd-Co/Auto-GPT-Plugins.git
`src/autogpt_plugins` directory.
Contributing
Nigel Phillips a.k.a. Swooshcode
Software Developer
Founder of Frame Tech Solutions Ltd., Co. 框架技術解決方案
For inquiries: https://tinyurl.com/nigelphillips
License
MIT License
Copyright (c) 2023 Toran Bruce Richards
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
The `__init__.py` file contains the `SpeechToTextPlugin` class, which extends the `AutoGPTPluginTemplate` class. This plugin is designed to transcribe spoken input in real time and process it through the AutoGPT model. The class includes a constructor that initializes the plugin's name, version, and description.

The `SpeechToTextPlugin` class also implements methods such as `can_handle_post_prompt`, `post_prompt`, and other methods based on your requirements. These methods enable the plugin to interact with the AutoGPT model and add functionality such as transcribing spoken input, processing the transcribed text, and handling responses.

The plugin integrates with the AutoGPT model by adding commands through the `PromptGenerator` class. The `__init__.py` file also imports the `transcribe_audio` function from the `speech_to_text_plugin.py` file to transcribe spoken input.

This code uses the PyAudio library to record audio from your built-in microphone in real time and transcribe it using the Google Cloud Speech-to-Text API. The transcribed text is then processed by your AutoGPT model. To use the PyAudio library, install it from your command line (a shell such as bash is required):
pip install pyaudio
Please note that on Mac M1, you may need to follow additional installation steps for the PyAudio library due to compatibility issues. You can find a solution here.
Once you have the necessary dependencies installed, you can run the updated code to test the real-time speech-to-text transcription and integration with your AutoGPT model.
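The record-and-transcribe flow described above could look roughly like the following. This is a sketch under stated assumptions, not the plugin's code: the function names, the 16 kHz sample rate, the 100 ms chunk size, and the fixed recording length are all illustrative, and it uses the `google.cloud.speech` client rather than the `speech_v1p1beta1` module imported in the diff. The third-party imports are deferred so the two helper functions can be exercised without a microphone or credentials.

```python
from typing import Iterable, Iterator, List

RATE = 16000        # sample rate in Hz (assumed)
CHUNK = RATE // 10  # 100 ms of audio per read (assumed)


def microphone_chunks(stream, chunk_size: int, max_chunks: int) -> Iterator[bytes]:
    """Yield raw audio buffers read from an already-open input stream."""
    for _ in range(max_chunks):
        yield stream.read(chunk_size)


def final_transcripts(responses: Iterable) -> List[str]:
    """Collect the top alternative of every result marked final."""
    transcripts = []
    for response in responses:
        for result in response.results:
            if result.is_final and result.alternatives:
                transcripts.append(result.alternatives[0].transcript)
    return transcripts


def transcribe_microphone(seconds: int = 5, language_code: str = "en-US") -> List[str]:
    """Record for a fixed time and stream the audio to Speech-to-Text."""
    # Imported lazily so the helpers above work without these packages.
    import pyaudio
    from google.cloud import speech

    client = speech.SpeechClient()  # uses GOOGLE_APPLICATION_CREDENTIALS
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code=language_code,
    )
    streaming_config = speech.StreamingRecognitionConfig(config=config)

    audio = pyaudio.PyAudio()
    stream = audio.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=RATE,
        input=True,
        frames_per_buffer=CHUNK,
    )
    try:
        chunks = microphone_chunks(stream, CHUNK, max_chunks=seconds * RATE // CHUNK)
        requests = (
            speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in chunks
        )
        responses = client.streaming_recognize(
            config=streaming_config, requests=requests
        )
        return final_transcripts(responses)
    finally:
        # Release the audio device even if the API call fails.
        stream.stop_stream()
        stream.close()
        audio.terminate()
```

Keeping the response handling in `final_transcripts` separates the logic that can be unit tested from the parts that need hardware and network access.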
My use case is real-time voice commands. Google Cloud Speech-to-Text provides low-latency transcription, in contrast to conventional speech-to-text services such as the Hugging Face audio-to-text models, which cannot transcribe spoken input in near real time and are therefore less suitable for my use case. Hugging Face audio-to-text models also require training to learn legal and medical terminology: conventional models must be fine-tuned on a domain-specific corpus of speech data. For example, if your voice commands relate to finance, you must fine-tune the model on a corpus of financial speech data. Google Cloud Speech-to-Text is versatile and already developed.
This test suite contains two unit tests:
test_transcribe_streaming: This test checks the transcribe_streaming function by mocking the Google Cloud Speech-to-Text API response and verifying that the returned transcript is correct.
test_process_transcribed_text: This test checks the process_transcribed_text function by mocking your AutoGPT model's process_input function and verifying that the returned processed text is correct.
To run the test suite, simply execute the test_speech_to_text_plugin.py file. Note that due to the real-time audio recording nature of the plugin, it might be challenging to write a test for the record_audio function. Therefore, manual testing of the complete system is recommended to ensure the proper functioning of audio recording, transcription, and processing.
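The two tests described above might be structured as in the sketch below. The bodies of `transcribe_streaming` and `process_transcribed_text` are hypothetical stand-ins, since the PR page does not show their real signatures; only the function names and the model's `process_input` method come from the description. The mocking pattern is the point of the example.

```python
import unittest
from types import SimpleNamespace
from unittest.mock import MagicMock


def transcribe_streaming(client, requests) -> str:
    """Hypothetical stand-in: join the top transcript of each result."""
    responses = client.streaming_recognize(requests=requests)
    return " ".join(
        result.alternatives[0].transcript
        for response in responses
        for result in response.results
    )


def process_transcribed_text(model, text: str) -> str:
    """Hypothetical stand-in: hand the transcript to the model."""
    return model.process_input(text)


class SpeechToTextPluginTests(unittest.TestCase):
    def test_transcribe_streaming(self):
        # Build a fake API response shaped like results -> alternatives.
        result = SimpleNamespace(
            alternatives=[SimpleNamespace(transcript="open the file")]
        )
        client = MagicMock()
        client.streaming_recognize.return_value = [
            SimpleNamespace(results=[result])
        ]

        self.assertEqual(transcribe_streaming(client, requests=[]), "open the file")
        client.streaming_recognize.assert_called_once_with(requests=[])

    def test_process_transcribed_text(self):
        # The model is mocked, so no AutoGPT instance is needed.
        model = MagicMock()
        model.process_input.return_value = "processed: open the file"

        self.assertEqual(
            process_transcribed_text(model, "open the file"),
            "processed: open the file",
        )
        model.process_input.assert_called_once_with("open the file")
```

Saved as a test module, a suite like this can be run with `python -m unittest`.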