Applying for residency - "speech-to-text" first-party plugin open source (will update later) #126

swooshcode · 2023-05-06T13:37:29Z

Nigel Phillips a.k.a. Swooshcode
Founder of Frame Tech Solutions Ltd., Co. 框架技術解決方案
For inquiries: https://tinyurl.com/nigelphillips
https://discord.com/channels/1092243196446249134/1104075991191666899/1104086687040163941

Files requested to be merged (locations):

.github/CODEOWNERS
/src/autogpt_plugins/speech_to_text/README.md
/src/autogpt_plugins/speech_to_text/__init__.py
/src/autogpt_plugins/speech_to_text/speech_to_text_plugin/speech_to_text_plugin.py
/src/autogpt_plugins/speech_to_text/speech_to_text_plugin/test_speech_to_text_plugin.py

CODEOWNERS
/src/autogpt_plugins/speech_to_text @swooshcode

README.md
Changes:
This repository contains various plugins developed for use with the AutoGPT model. These plugins extend the functionality of AutoGPT by providing additional features, such as speech-to-text transcription, integration with external APIs, and more.

Speech-to-Text Plugin

The speech-to-text plugin lets users transcribe spoken input in real time and feed the transcribed text into the AutoGPT model for processing. It uses the Google Cloud Speech-to-Text API for transcription and PyAudio for real-time audio recording from the user's microphone.

Features

Real-time audio recording from the user's microphone
Transcription of spoken input using Google Cloud Speech-to-Text API
Integration with the AutoGPT model for processing transcribed text

Usage

1. Set up the Google Cloud Speech-to-Text API and obtain your API credentials as a JSON file.
2. Update the speech_to_text_plugin.py file to use the correct path to your API credentials.
3. Install the required dependencies: `pip install google-cloud-speech pyaudio`
4. Run the speech_to_text_plugin.py file to start recording and transcribing audio input.

Installation

To install the plugins, follow these steps:

1. Clone this repository: `git clone https://github.com/Frame-Tech-Solutions-Ltd-Co/Auto-GPT-Plugins.git`
2. Navigate to the src/autogpt_plugins directory.
3. Install the required dependencies for each plugin as specified in their respective README files or source code comments.

Contributing

Nigel Phillips a.k.a. Swooshcode
Software Developer
Founder of Frame Tech Solutions Ltd., Co. 框架技術解決方案
For inquiries: https://tinyurl.com/nigelphillips

License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

__init__.py
The __init__.py file contains the SpeechToTextPlugin class, which extends the AutoGPTPluginTemplate class. The plugin is designed to transcribe spoken input in real time and process it through the AutoGPT model. The class includes a constructor that initializes the plugin's name, version, and description.

The SpeechToTextPlugin class also implements can_handle_post_prompt and post_prompt, and can implement further template methods as required. These methods let the plugin interact with the AutoGPT model and add functionality such as transcribing spoken input, processing the transcribed text, and handling responses.

The plugin integrates with the AutoGPT model by adding commands and functionalities using the PromptGenerator class. The __init__.py file also imports the transcribe_audio function from the speech_to_text_plugin.py file to transcribe spoken input.

speech_to_text_plugin.py
This code uses the PyAudio library to record audio from your built-in microphone in real time and transcribe it using the Google Cloud Speech-to-Text API. The transcribed text is then processed by your AutoGPT model. To use PyAudio, install it from the command line (a Bash shell is assumed):

pip install pyaudio

Please note that on Mac M1, you may need to follow additional installation steps for the PyAudio library due to compatibility issues. You can find a solution here.

Once you have the necessary dependencies installed, you can run the updated code to test the real-time speech-to-text transcription and integration with your AutoGPT model.
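As a rough illustration of the record-then-transcribe flow described above (not the code in this PR), the sketch below splits raw 16-bit PCM audio into chunks and streams them to Google Cloud Speech-to-Text. The names `audio_chunks` and `transcribe_stream` are hypothetical, and the 16 kHz sample rate, 3200-byte chunk size, and `en-US` language code are assumptions. The `google-cloud-speech` import is deferred so the chunking helper can be tried without that dependency or any credentials.

```python
from typing import Iterable, Iterator


def audio_chunks(raw: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Yield fixed-size slices of raw PCM audio (the last slice may be shorter).

    3200 bytes is about 100 ms of 16 kHz, 16-bit mono audio (an assumption).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(raw), chunk_size):
        yield raw[start:start + chunk_size]


def transcribe_stream(chunks: Iterable[bytes], sample_rate: int = 16000) -> Iterator[str]:
    """Yield final transcripts for a stream of LINEAR16 audio chunks.

    Needs google-cloud-speech and valid credentials; imported lazily so the
    rest of this module works without them.
    """
    from google.cloud import speech  # deferred: optional dependency

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,
        language_code="en-US",  # assumption: English voice commands
    )
    streaming_config = speech.StreamingRecognitionConfig(config=config)
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in chunks
    )
    responses = client.streaming_recognize(config=streaming_config, requests=requests)
    for response in responses:
        for result in response.results:
            # Only report results the service has marked as final.
            if result.is_final and result.alternatives:
                yield result.alternatives[0].transcript
```

In a live setup the chunks would come from a PyAudio input stream rather than an in-memory buffer; feeding a buffer first makes it easier to check the transcription path separately from microphone handling.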

My use case is real-time voice commands. Google Cloud Speech-to-Text provides low-latency transcription, whereas conventional speech-to-text options such as the Hugging Face audio-to-text models cannot transcribe spoken input in near real time and are less suitable for this use case. Those models also need training to learn legal and medical terminology; more generally, they require fine-tuning on a domain-specific corpus of speech data. For example, if your voice commands relate to finance, you must fine-tune the model on a corpus of financial speech data. Google Cloud Speech-to-Text, by contrast, is versatile and already developed.

test_speech_to_text_plugin.py
This test suite contains two unit tests:

test_transcribe_streaming: checks the transcribe_streaming function by mocking the Google Cloud Speech-to-Text API response and verifying that the returned transcript is correct.
test_process_transcribed_text: checks the process_transcribed_text function by mocking your AutoGPT model's process_input function and verifying that the returned processed text is correct.

To run the test suite, execute the test_speech_to_text_plugin.py file. Because the plugin records audio in real time, the record_audio function is hard to cover with a unit test, so manual testing of the complete system is recommended to confirm that audio recording, transcription, and processing work together.
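The mocking approach described for test_transcribe_streaming could look roughly like the sketch below. It is not the test file from this PR: the `transcribe_streaming` body here is a hypothetical stand-in that takes the client as a parameter, and `fake_response` is a helper invented for illustration.

```python
import unittest
from unittest.mock import MagicMock


def transcribe_streaming(client, requests) -> str:
    """Hypothetical stand-in: join the top alternative of every final result."""
    parts = []
    for response in client.streaming_recognize(requests=requests):
        for result in response.results:
            if result.is_final:
                parts.append(result.alternatives[0].transcript)
    return " ".join(parts)


def fake_response(text: str, is_final: bool = True) -> MagicMock:
    """Build a mock shaped like a streaming response with one result."""
    alternative = MagicMock(transcript=text)
    result = MagicMock(is_final=is_final, alternatives=[alternative])
    return MagicMock(results=[result])


class TestTranscribeStreaming(unittest.TestCase):
    def test_returns_only_final_transcripts(self):
        # The mocked client replaces the real API, so no network or
        # credentials are needed.
        client = MagicMock()
        client.streaming_recognize.return_value = [
            fake_response("hello"),
            fake_response("ignored", is_final=False),
            fake_response("world"),
        ]
        self.assertEqual(transcribe_streaming(client, requests=[]), "hello world")
        client.streaming_recognize.assert_called_once()
```

Saved as a test module, this can be run with `python -m unittest`. Passing the client in as an argument is a design choice made for the sketch; patching the client class with `unittest.mock.patch` would work as well.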


swooshcode · 2023-05-06T13:45:11Z

Updated "/src/autogpt_plugins/speech_to_text/__init__.py" after the initial PR:

from typing import Any, Dict, List, Optional, Tuple, TypedDict, TypeVar

from auto_gpt_plugin_template import AutoGPTPluginTemplate
from .speech_to_text_plugin import transcribe_audio

PromptGenerator = TypeVar("PromptGenerator")

class SpeechToTextPlugin(AutoGPTPluginTemplate):
    """
    This is the Auto-GPT Speech-to-Text plugin.
    """

    def __init__(self):
        super().__init__()
        self._name = "Auto-GPT-Speech-to-Text-Plugin"
        self._version = "0.0.1"
        self._description = "Auto-GPT Speech-to-Text Plugin: Transcribe spoken input in real-time."

    def can_handle_post_prompt(self) -> bool:
        return True

    def post_prompt(self, prompt: PromptGenerator) -> PromptGenerator:
        prompt.add_command(
            "Transcribe spoken input",
            "transcribe_audio",
            {
                "audio": "<audio>",
            },
            transcribe_audio,
        )
        return prompt

    # Add more methods as needed, such as can_handle_on_response, on_response, etc.

ntindle · 2023-05-10T02:01:06Z

.DS_Store

Remove and add to .gitignore

ntindle · 2023-05-10T02:01:15Z

src/.DS_Store

ntindle · 2023-05-10T02:01:18Z

src/autogpt_plugins/.DS_Store

ntindle · 2023-05-10T02:01:21Z

src/autogpt_plugins/speech_to_text/.DS_Store

ntindle · 2023-05-10T02:01:54Z

src/autogpt_plugins/speech_to_text/README.md

+
+To install the plugins, follow these steps:
+
+1. Clone this repository: `git clone https://github.com/Frame-Tech-Solutions-Ltd-Co/Auto-GPT-Plugins.git`


Swap this to the Significant-Gravitas repo

ntindle · 2023-05-10T02:03:10Z

src/autogpt_plugins/speech_to_text/__init__.py

+    def post_prompt(self, prompt: PromptGenerator) -> PromptGenerator:
+        prompt.add_command(
+            "Transcribe spoken input",
+            "transcribe_audio",
+            {
+                "audio": "<audio>",
+            },
+            transcribe_audio,
+        )
+        return prompt
+
+    # Add more methods as needed, such as can_handle_on_response, on_response, etc.


Should this be a prompt command that the AI calls or should it replace keyboard entry?
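For context on the question above: the PR as submitted takes the first option and registers a command for the AI to call. The second option, replacing keyboard entry, might look roughly like the sketch below. `read_user_input`, `transcriber`, and `keyboard` are hypothetical names introduced for illustration and are not part of Auto-GPT or this PR.

```python
from typing import Callable, Optional


def read_user_input(
    prompt_text: str,
    transcriber: Optional[Callable[[], str]] = None,
    keyboard: Callable[[str], str] = input,
) -> str:
    """Prefer spoken input when a transcriber is configured.

    Falls back to the keyboard when no transcriber is given or when the
    transcriber returns nothing usable (for example, silence).
    """
    if transcriber is not None:
        spoken = transcriber().strip()
        if spoken:
            return spoken
    return keyboard(prompt_text)
```

For example, `read_user_input("> ", transcriber=lambda: "open the report")` returns the spoken text without prompting, while omitting `transcriber` behaves like a plain `input()` call.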

ntindle · 2023-05-10T02:03:41Z

src/autogpt_plugins/speech_to_text/__init__.py

+    def post_prompt(self, prompt: PromptGenerator) -> PromptGenerator:
+        prompt.add_command(
+            "Transcribe spoken input",
+            "transcribe_audio",
+            {
+                "audio": "<audio>",
+            },
+            transcribe_audio,
+        )
+        return prompt


Only register if the environment variables you need exist
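One way to act on this request is sketched below, under stated assumptions: `GOOGLE_APPLICATION_CREDENTIALS` is taken to be the only variable the plugin needs, `FakePromptGenerator` is a hypothetical stand-in for the real prompt generator, and `transcribe_audio` is a placeholder. The `env` parameter exists only to make the check easy to exercise.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Assumption: the only variable this plugin depends on.
REQUIRED_ENV_VARS = ("GOOGLE_APPLICATION_CREDENTIALS",)


def missing_env_vars(
    required: Sequence[str] = REQUIRED_ENV_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


class FakePromptGenerator:
    """Hypothetical stand-in that records the names of registered commands."""

    def __init__(self) -> None:
        self.commands: List[str] = []

    def add_command(self, label, name, args=None, function=None) -> None:
        self.commands.append(name)


def transcribe_audio(audio):
    """Placeholder for the plugin's transcription function."""
    raise NotImplementedError


def post_prompt(prompt, env: Optional[Mapping[str, str]] = None):
    """Register the command only when the environment is fully configured."""
    missing = missing_env_vars(env=env)
    if missing:
        print(f"Speech-to-text plugin not registered; missing: {', '.join(missing)}")
        return prompt
    prompt.add_command(
        "Transcribe spoken input",
        "transcribe_audio",
        {"audio": "<audio>"},
        transcribe_audio,
    )
    return prompt
```

Returning the prompt unchanged when configuration is missing means the AI is never offered a command that cannot work.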

ntindle · 2023-05-10T02:04:12Z

src/autogpt_plugins/speech_to_text/speech_to_text_plugin/speech_to_text_plugin.py

+from google.cloud.speech_v1p1beta1 import types
+import autogpt
+
+os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/credentials.json'


This line should be fixed up a bit. Look at other examples in the repo of how we read things in
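A possible shape for that fix, offered as a sketch rather than the repository's actual convention: instead of overwriting `GOOGLE_APPLICATION_CREDENTIALS` with a hardcoded placeholder, read the value the user has already set and check that it points at a real file. `credentials_path` is a hypothetical helper name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def credentials_path(env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the credentials file named in the environment, or None.

    None is returned when the variable is unset, empty, or does not point
    at an existing file, so callers can disable the plugin cleanly.
    """
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
    if not value:
        return None
    path = Path(value).expanduser()
    return path if path.is_file() else None
```

Because Google's client libraries read `GOOGLE_APPLICATION_CREDENTIALS` themselves, the plugin does not need to assign the variable at all; it only needs to confirm the setting is usable before registering.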


ntindle

Does this load successfully for you? I'm not seeing several of the required methods implemented
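To illustrate why missing methods can stop a plugin from loading: if the template declares its hooks as abstract methods (an assumption here; `TemplateStandIn` below is a hypothetical stand-in, not the real `AutoGPTPluginTemplate`), Python refuses to instantiate any subclass that leaves one unimplemented.

```python
from abc import ABC, abstractmethod
from typing import List


class TemplateStandIn(ABC):
    """Hypothetical base class with four abstract hooks."""

    @abstractmethod
    def can_handle_post_prompt(self) -> bool: ...

    @abstractmethod
    def post_prompt(self, prompt): ...

    @abstractmethod
    def can_handle_on_response(self) -> bool: ...

    @abstractmethod
    def on_response(self, response: str) -> str: ...


class PartialPlugin(TemplateStandIn):
    """Implements only the post_prompt pair, like the class in this PR."""

    def can_handle_post_prompt(self) -> bool:
        return True

    def post_prompt(self, prompt):
        return prompt


class CompletePlugin(PartialPlugin):
    """Adds the remaining hooks as no-ops so the class can be instantiated."""

    def can_handle_on_response(self) -> bool:
        return False

    def on_response(self, response: str) -> str:
        return response


def missing_hooks(cls) -> List[str]:
    """List the abstract methods a class has not yet implemented."""
    return sorted(getattr(cls, "__abstractmethods__", ()))
```

Calling `PartialPlugin()` raises `TypeError`, while `missing_hooks(PartialPlugin)` names the hooks that still need at least a no-op implementation.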

ntindle · 2023-05-30T15:45:22Z

.DS_Store

Remove file, add to .gitignore

swooshcode · 2023-06-02

Dear @ntindle,

Thank you for your suggestions and your guidance. Your experience is a valuable part of this project. The next steps to bring this plug-in to life would be to reopen the closed pull requests and edit the original codebase with our suggestions. Please respond at your earliest convenience. I look forward to seeing the pull requests reopened. Thanks!

Best regards,

Nigel Phillips.
Founder, FRAME TECH SOLUTIONS LTD., CO. 框架技術解決方案

ntindle · 2023-05-30T15:45:31Z

src/.DS_Store

ntindle · 2023-05-30T15:45:35Z

src/autogpt_plugins/.DS_Store

ntindle · 2023-05-30T15:45:40Z

src/autogpt_plugins/speech_to_text/.DS_Store

ntindle · 2023-05-30T15:46:10Z

src/autogpt_plugins/speech_to_text/README.md

@@ -0,0 +1,65 @@
+# Changes: 
+This repository contains various plugins developed for use with the AutoGPT model. These plugins extend the functionality of AutoGPT by providing additional features, such as speech-to-text transcription, integration with external APIs, and more.


One plugin per folder, clarify this to be only for speech-to-text

swooshcode and others added 12 commits May 6, 2023 08:10

Create test_speech_to_text_plugin.py

eafac8f

Merge pull request #1 from swooshcode/swooshcode-patch-2

1d1ecdb

Create test_speech_to_text_plugin.py

Create speech_to_text_plugin.py

578406c

Merge pull request #2 from swooshcode/swooshcode-patch-1

c39a08e

Create speech_to_text_plugin.py

Revert "Create speech_to_text_plugin.py"

bdfa7b6

Merge pull request #3 from swooshcode/revert-2-swooshcode-patch-1

ea1c0f8

Revert "Create speech_to_text_plugin.py"

Create speech_to_text_plugin.py

313aed7

Merge pull request #5 from swooshcode/swooshcode-patch-3

f88edfa

Create speech_to_text_plugin.py

here is the framework

360e406

Create __init__.py

5457774

will update later

defd7ff

Update CODEOWNERS

2853229

swooshcode requested review from a team May 6, 2023 13:37

Update __init__.py

0ef84ba

ntindle requested changes May 10, 2023


swooshcode and others added 7 commits May 13, 2023 11:54

Update src/autogpt_plugins/speech_to_text/README.md

f63291e

Co-authored-by: Nicholas Tindle <[email protected]>

Update README.md

be40dc6

Update README.md

7072aaf

resolved

Update README.md

81379c5

resolved

Update README.md

7e05469

resolved

Update speech_to_text_plugin.py

1aaa763

resolved.

Merge branch 'master' into master

e2cb74e

ntindle requested changes May 30, 2023
