Applying for residency - "speech-to-text" first-party plugin **open source** (will update later) #126

Open
wants to merge 20 commits into
base: master

swooshcode

Nigel Phillips a.k.a. Swooshcode
Founder of Frame Tech Solutions Ltd., Co. 框架技術解決方案
For inquiries: https://tinyurl.com/nigelphillips
https://discord.com/channels/1092243196446249134/1104075991191666899/1104086687040163941

Files requested to be merged (locations):

  1. .github/CODEOWNERS
  2. /src/autogpt_plugins/speech_to_text/README.md
  3. /src/autogpt_plugins/speech_to_text/__init__.py
  4. /src/autogpt_plugins/speech_to_text/speech_to_text_plugin/speech_to_text_plugin.py
  5. /src/autogpt_plugins/speech_to_text/speech_to_text_plugin/test_speech_to_text_plugin.py

  1. CODEOWNERS
    /src/autogpt_plugins/speech_to_text @swooshcode

  2. README.md
    Changes:
    This repository contains various plugins developed for use with the AutoGPT model. These plugins extend the functionality of AutoGPT by providing additional features, such as speech-to-text transcription, integration with external APIs, and more.

Table of Contents

  1. Speech-to-Text Plugin
  2. Installation
  3. Contributing
  4. License

Speech-to-Text Plugin

The speech-to-text plugin allows users to transcribe spoken input in real-time and feed the transcribed text into the AutoGPT model for processing. This plugin uses the Google Cloud Speech-to-Text API for transcription and PyAudio for real-time audio recording from the user's microphone.

Features

  • Real-time audio recording from the user's microphone
  • Transcription of spoken input using Google Cloud Speech-to-Text API
  • Integration with the AutoGPT model for processing transcribed text

Usage

  1. Set up the Google Cloud Speech-to-Text API and obtain your API credentials as a JSON file.
  2. Update the speech_to_text_plugin.py file to use the correct path to your API credentials.
  3. Install the required dependencies: pip install google-cloud-speech pyaudio
  4. Run the speech_to_text_plugin.py file to start recording and transcribing audio input.
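
The recording and transcription steps above can be sketched roughly as follows. This is a minimal, non-streaming illustration and not the plugin's actual code: the function names `record_audio`, `join_transcripts` and `transcribe_bytes`, the 16 kHz mono 16-bit settings, and the 5-second default are assumptions made for the example. It also assumes `GOOGLE_APPLICATION_CREDENTIALS` is already set in the environment rather than hard-coded in the source.

```python
"""Sketch: record a short clip from the microphone and transcribe it."""

RATE = 16000   # sample rate in Hz (assumed for this example)
CHUNK = 1024   # frames read per call (assumed for this example)


def record_audio(seconds: float = 5.0) -> bytes:
    """Record mono 16-bit PCM audio from the default input device."""
    import pyaudio  # imported lazily so the module loads without PyAudio

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    try:
        n_chunks = int(RATE / CHUNK * seconds)
        frames = [stream.read(CHUNK) for _ in range(n_chunks)]
    finally:
        # Always release the audio device, even if reading fails.
        stream.stop_stream()
        stream.close()
        pa.terminate()
    return b"".join(frames)


def join_transcripts(results) -> str:
    """Join the top alternative of every recognition result into one string."""
    return " ".join(
        r.alternatives[0].transcript.strip() for r in results if r.alternatives
    )


def transcribe_bytes(pcm: bytes, language_code: str = "en-US") -> str:
    """Send raw PCM audio to Google Cloud Speech-to-Text (synchronous call)."""
    from google.cloud import speech  # imported lazily, needs google-cloud-speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code=language_code,
    )
    audio = speech.RecognitionAudio(content=pcm)
    response = client.recognize(config=config, audio=audio)
    return join_transcripts(response.results)


# Example use (needs a microphone and valid credentials):
#     print(transcribe_bytes(record_audio()))
```

The synchronous `recognize` call is only suitable for short clips; continuous voice commands would need the streaming API, which this sketch does not cover.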

Installation

To install the plugins, follow these steps:

  1. Clone this repository: git clone https://github.com/Frame-Tech-Solutions-Ltd-Co/Auto-GPT-Plugins.git
  2. Navigate to the src/autogpt_plugins directory.
  3. Install the required dependencies for each plugin as specified in their respective README files or source code comments.

Contributing

Nigel Phillips a.k.a. Swooshcode
Software Developer
Founder of Frame Tech Solutions Ltd., Co. 框架技術解決方案
For inquiries: https://tinyurl.com/nigelphillips

License

MIT License

Copyright (c) 2023 Toran Bruce Richards

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


  3. __init__.py
    The __init__.py file contains the SpeechToTextPlugin class, which is an extension of the AutoGPTPluginTemplate class. This plugin is designed to transcribe spoken input in real-time and process it through the AutoGPT model. The class includes a constructor that initializes the plugin's name, version, and description.

The SpeechToTextPlugin class also implements methods such as can_handle_post_prompt, post_prompt, and other methods based on your requirements. These methods enable the plugin to interact with the AutoGPT model and add functionalities like transcribing spoken input, processing the transcribed text, and handling responses.

The plugin integrates with the AutoGPT model by adding commands and functionalities using the PromptGenerator class. The __init__.py file also imports the transcribe_audio function from the speech_to_text_plugin.py file to transcribe spoken input.


  4. speech_to_text_plugin.py
    This code uses the PyAudio library to record audio from your built-in microphone in real time and transcribes it using the Google Cloud Speech-to-Text API. The transcribed text is then processed by your AutoGPT model. To use the PyAudio library, install it from your command line (a shell such as bash is required):

pip install pyaudio

Please note that on Mac M1, you may need to follow additional installation steps for the PyAudio library due to compatibility issues. You can find a solution here.

Once you have the necessary dependencies installed, you can run the updated code to test the real-time speech-to-text transcription and integration with your AutoGPT model.

My use case is real-time voice commands. Google Cloud Speech-to-Text provides low-latency transcription, whereas conventional speech-to-text services such as the Hugging Face audio-to-text models cannot transcribe spoken input in near real time and are less suitable for my use case. Hugging Face audio-to-text models require training to learn legal and medical terminology, and conventional models in general require fine-tuning on a domain-specific corpus of speech data. For example, if your voice commands are related to finance, you must fine-tune the model on a corpus of financial speech data. Google Cloud Speech-to-Text is versatile and already developed.


  5. test_speech_to_text_plugin.py
    This test suite contains two unit tests:

  • test_transcribe_streaming: checks the transcribe_streaming function by mocking the Google Cloud Speech-to-Text API response and verifying that the returned transcript is correct.
  • test_process_transcribed_text: checks the process_transcribed_text function by mocking your AutoGPT model's process_input function and verifying that the returned processed text is correct.

To run the test suite, execute the test_speech_to_text_plugin.py file. Because the plugin records audio in real time, it is difficult to write an automated test for the record_audio function. Manual testing of the complete system is therefore recommended to confirm that audio recording, transcription, and processing work together.
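
The mocking approach described for test_transcribe_streaming could look roughly like the sketch below. The `transcribe_streaming` function here is a hypothetical stand-in written for the example (the PR's real implementation is not shown in this thread), and the fake response objects only imitate the `results[...].alternatives[...].transcript` shape of a Speech-to-Text response.

```python
import unittest
from types import SimpleNamespace
from unittest import mock


def transcribe_streaming(client, config, requests) -> str:
    """Hypothetical stand-in: return the first transcript from a streaming call."""
    for response in client.streaming_recognize(config=config, requests=requests):
        for result in response.results:
            if result.alternatives:
                return result.alternatives[0].transcript
    return ""


def fake_response(text: str) -> SimpleNamespace:
    """Build an object shaped like a streaming response with one transcript."""
    alternative = SimpleNamespace(transcript=text)
    return SimpleNamespace(results=[SimpleNamespace(alternatives=[alternative])])


class TranscribeStreamingTest(unittest.TestCase):
    def test_returns_first_transcript(self):
        client = mock.Mock()
        client.streaming_recognize.return_value = iter(
            [fake_response("turn on the lights")]
        )
        text = transcribe_streaming(client, config=None, requests=[])
        self.assertEqual(text, "turn on the lights")
        client.streaming_recognize.assert_called_once()

    def test_returns_empty_string_without_results(self):
        client = mock.Mock()
        client.streaming_recognize.return_value = iter([])
        self.assertEqual(transcribe_streaming(client, config=None, requests=[]), "")


# Run with: python -m unittest <this_module>
```

Because the client is passed in as an argument, no network access or credentials are needed to run these tests.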

portal-140902.mp4

swooshcode requested review from a team May 6, 2023 13:37

swooshcode commented May 6, 2023

Updated "/src/autogpt_plugins/speech_to_text/__init__.py" after initial PR:


from typing import Any, Dict, List, Optional, Tuple, TypedDict, TypeVar

from auto_gpt_plugin_template import AutoGPTPluginTemplate
from .speech_to_text_plugin import transcribe_audio

PromptGenerator = TypeVar("PromptGenerator")

class SpeechToTextPlugin(AutoGPTPluginTemplate):
    """
    This is the Auto-GPT Speech-to-Text plugin.
    """

    def __init__(self):
        super().__init__()
        self._name = "Auto-GPT-Speech-to-Text-Plugin"
        self._version = "0.0.1"
        self._description = "Auto-GPT Speech-to-Text Plugin: Transcribe spoken input in real-time."

    def can_handle_post_prompt(self) -> bool:
        return True

    def post_prompt(self, prompt: PromptGenerator) -> PromptGenerator:
        prompt.add_command(
            "Transcribe spoken input",
            "transcribe_audio",
            {
                "audio": "<audio>",
            },
            transcribe_audio,
        )
        return prompt

    # Add more methods as needed, such as can_handle_on_response, on_response, etc.

.DS_Store Outdated
Member

Remove and add to .gitignore

src/.DS_Store Outdated
Member

remove

Member

remove

Member

remove


To install the plugins, follow these steps:

1. Clone this repository: `git clone https://github.com/Frame-Tech-Solutions-Ltd-Co/Auto-GPT-Plugins.git`
Member

Swap this to the Significant-Gravitas repo

src/autogpt_plugins/speech_to_text/README.md Outdated
Comment on lines +22 to +33
def post_prompt(self, prompt: PromptGenerator) -> PromptGenerator:
    prompt.add_command(
        "Transcribe spoken input",
        "transcribe_audio",
        {
            "audio": "<audio>",
        },
        transcribe_audio,
    )
    return prompt

# Add more methods as needed, such as can_handle_on_response, on_response, etc.
Member

Should this be a prompt command that the AI calls or should it replace keyboard entry?

Comment on lines +22 to +31
def post_prompt(self, prompt: PromptGenerator) -> PromptGenerator:
    prompt.add_command(
        "Transcribe spoken input",
        "transcribe_audio",
        {
            "audio": "<audio>",
        },
        transcribe_audio,
    )
    return prompt
Member

Only register if the environment variables you need exist
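
One way to act on this suggestion is sketched below. It is an illustration under assumptions, not the repository's convention: `REQUIRED_ENV_VARS`, `missing_env_vars`, `SpeechToTextPluginSketch` and `transcribe_audio_stub` are hypothetical names, and the class is a plain stand-in rather than a subclass of `AutoGPTPluginTemplate`. The `add_command` call mirrors the four arguments used in the hunk above.

```python
import os

# Assumed requirement: the Google client library reads this variable.
REQUIRED_ENV_VARS = ("GOOGLE_APPLICATION_CREDENTIALS",)


def missing_env_vars(required=REQUIRED_ENV_VARS, environ=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def transcribe_audio_stub(audio: str) -> str:
    """Placeholder for the plugin's real transcribe_audio function."""
    return ""


class SpeechToTextPluginSketch:
    """Stand-in for the plugin class; only the registration logic is shown."""

    def __init__(self, environ=None):
        self._missing = missing_env_vars(environ=environ)

    def can_handle_post_prompt(self) -> bool:
        # Only offer the command when every required variable is present.
        return not self._missing

    def post_prompt(self, prompt):
        if self._missing:
            return prompt  # leave the prompt untouched
        prompt.add_command(
            "Transcribe spoken input",
            "transcribe_audio",
            {"audio": "<audio>"},
            transcribe_audio_stub,
        )
        return prompt
```

With this shape, a missing credential means the command is never advertised to the model, instead of failing later when the command is called.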

from google.cloud.speech_v1p1beta1 import types
import autogpt

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/credentials.json'
Member

This line should be fixed up a bit. Look at other examples in the repo of how we read things in
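
As a rough illustration of the alternative to assigning a hard-coded path, the credentials location can be read from the existing environment and validated. How other plugins in the repository read their settings is not shown in this thread, so the helper name `credentials_path` and its behaviour are assumptions for the sketch.

```python
import os
from pathlib import Path
from typing import Optional


def credentials_path(environ=None) -> Optional[Path]:
    """Return the credentials file named by the environment, or None.

    Reads GOOGLE_APPLICATION_CREDENTIALS instead of overwriting it, and
    returns None when the variable is unset, empty, or not a real file.
    """
    env = os.environ if environ is None else environ
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
    if not raw:
        return None
    path = Path(raw).expanduser()
    return path if path.is_file() else None
```

The caller can then decide what to do when `None` comes back, for example skipping plugin registration and logging a message.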

Member

ntindle left a comment

Does this load successfully for you? I'm not seeing several of the required methods implemented
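
A quick way to check for unimplemented methods is sketched below. It assumes the plugin template declares its hooks with `abc.abstractmethod`; `PluginTemplateSketch` is a hypothetical stand-in with only four hooks, not the real `AutoGPTPluginTemplate`, and `unimplemented_methods` is a helper name invented for the example.

```python
import abc


class PluginTemplateSketch(abc.ABC):
    """Hypothetical stand-in for a template with abstract hook methods."""

    @abc.abstractmethod
    def can_handle_post_prompt(self) -> bool: ...

    @abc.abstractmethod
    def post_prompt(self, prompt): ...

    @abc.abstractmethod
    def can_handle_on_response(self) -> bool: ...

    @abc.abstractmethod
    def on_response(self, response: str) -> str: ...


class PartialPlugin(PluginTemplateSketch):
    """Implements only the post_prompt pair, like the class in this PR."""

    def can_handle_post_prompt(self) -> bool:
        return True

    def post_prompt(self, prompt):
        return prompt


def unimplemented_methods(cls) -> list:
    """List abstract methods that cls has not overridden yet."""
    return sorted(getattr(cls, "__abstractmethods__", ()))


# Instantiating PartialPlugin raises TypeError until the list is empty.
print(unimplemented_methods(PartialPlugin))
```

Running the same helper against the real plugin class would show which hooks still need at least a trivial implementation before the plugin can be instantiated.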

Member

Remove file, add to .gitignore

Author

Dear @ntindle ,

Thank you for your suggestions and your guidance. Your experience is a valuable part of this project. The next steps to bring this plug-in to life would be to reopen the closed pull requests and edit the original codebase with our suggestions. Please respond at your earliest convenience. I look forward to seeing the pull requests reopened. Thanks!

Best regards,

Nigel Phillips.
Founder, FRAME TECH SOLUTIONS LTD., CO. 框架技術解決方案

Member

remove

Member

remove

Member

remove

@@ -0,0 +1,65 @@
# Changes:
This repository contains various plugins developed for use with the AutoGPT model. These plugins extend the functionality of AutoGPT by providing additional features, such as speech-to-text transcription, integration with external APIs, and more.
Member

One plugin per folder, clarify this to be only for speech-to-text

2 participants