A simple UI tool written in Python, for recording audio from a microphone and automatically transcribing the recording using OpenAI's Whisper model via OpenAI's API.

interactive-applications/speech-to-clipboard

Speech To Clipboard

A tool for recording audio from a microphone, transcribing the recording, and copying the transcription to the clipboard.

Developed by Claus Helfenschneider Interactive Applications.

Features

  • The transcription is copied to the clipboard for easy pasting into other applications.
  • Comes with a CLI and a UI, and is also usable as a Python module.
  • Supports configurable text replacements, similar to the voice recording feature on iOS.
    For example, it can replace the text "new line" with an actual new line or "bullet point" with "•".
  • If ffmpeg is installed (optional), the audio will be converted to mp3 prior to transcription, for faster uploads when using the OpenAI API.
  • Configurable via a config file (config.ini), command line arguments (CLI), and replacements-mapping file (replacements.json).
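
The optional ffmpeg step from the feature list could look roughly like the sketch below. It is not taken from the project's source: the function names ffmpeg_available and prepare_upload are hypothetical, and using pydub (listed under Third-Party Licenses) for the conversion is an assumption.

```python
import shutil
from pathlib import Path


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


def prepare_upload(wav_path, has_ffmpeg=None) -> Path:
    """Return the file to upload: an mp3 copy if ffmpeg is present, else the wav.

    Hypothetical helper; `has_ffmpeg` can be passed explicitly to skip detection.
    """
    wav_path = Path(wav_path)
    if has_ffmpeg is None:
        has_ffmpeg = ffmpeg_available()
    if not has_ffmpeg:
        # Without ffmpeg, fall back to the original recording.
        return wav_path

    # Imported lazily so the fallback path works without pydub installed.
    from pydub import AudioSegment

    mp3_path = wav_path.with_suffix(".mp3")
    AudioSegment.from_wav(str(wav_path)).export(str(mp3_path), format="mp3")
    return mp3_path
```

Because the conversion is skipped when ffmpeg is missing, the tool stays usable either way; the mp3 only reduces upload size.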

Transcription backend

Two transcription backends are supported. You can use either one or both:

  • Local whisper model. For this, openai-whisper must be installed.
  • OpenAI's Whisper model via the OpenAI API. For this, an OpenAI API key is required.

Screenshots

[Screenshots: CLI in verbose mode, CLI in silent mode, and the UI.]

Installation

  1. Clone the repository.
  2. (Optional but recommended) Set up a virtual environment. Requires Python 3.11 or higher.
  3. Install the requirements:
    • pip install -r requirements.in for the latest versions (recommended), or
    • pip install -r requirements.txt for pinned versions.
  4. To build an .exe, install the dev requirements:
    • pip install -r requirements-dev.in
  5. Set the environment variable WHISPER_KEYBOARD_API_KEY to your OpenAI API key. You can set it in your global environment, add it to a .env file, or specify it in the config.ini file under [openai] api_key.
    • Note: An environment variable takes precedence over the value set in the config file.
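
The precedence rule from step 5 can be sketched as follows. This is a minimal illustration, not the project's code: the function name resolve_api_key is hypothetical, and .env handling is left out. The variable name WHISPER_KEYBOARD_API_KEY and the [openai] api_key entry are the ones named above.

```python
import configparser
import os

ENV_VAR = "WHISPER_KEYBOARD_API_KEY"


def resolve_api_key(config_path="config.ini", env=None):
    """Return the API key, preferring the environment over config.ini.

    Hypothetical helper; `env` defaults to os.environ and exists for testing.
    """
    env = os.environ if env is None else env

    # 1. The environment variable takes precedence when it is set and non-empty.
    key = env.get(ENV_VAR)
    if key:
        return key

    # 2. Otherwise fall back to [openai] api_key in the config file.
    parser = configparser.ConfigParser()
    parser.read(config_path)  # a missing file is skipped silently
    return parser.get("openai", "api_key", fallback=None) or None
```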

Usage

CLI (Command Line Interface)

  1. Run the command line interface: python speech_to_clipboard_cli.py
  2. For available options, run python speech_to_clipboard_cli.py --help.

UI (User Interface)

  1. Run the UI: python speech_to_clipboard_ui.pyw
  2. Select your preferred microphone from the dropdown menu.
  3. Press REC to start recording.
  4. Press Stop Recording to end the recording. The audio will be sent to the OpenAI API for transcription, and the result will be copied to your clipboard.
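
For reference, a microphone dropdown like the one in step 2 could be populated along these lines. This is a sketch under assumptions: it uses python-sounddevice (listed under Third-Party Licenses), and the helper names input_devices and list_microphones are hypothetical, not part of the tool.

```python
def input_devices(devices):
    """Return (index, name) pairs for devices that can record audio.

    `devices` is an iterable of dicts shaped like the entries returned by
    sounddevice.query_devices(), with 'name' and 'max_input_channels' keys.
    """
    return [
        (index, device["name"])
        for index, device in enumerate(devices)
        if device["max_input_channels"] > 0
    ]


def list_microphones():
    """Query the system's audio devices and keep only the inputs."""
    import sounddevice as sd  # lazy import: needs PortAudio at runtime

    return input_devices(sd.query_devices())
```

Keeping the filter separate from the hardware query makes it testable without an audio device attached.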

Python Module

from settings import Settings
from core.speech_to_clipboard import SpeechToClipboard

speech_to_clip = SpeechToClipboard(
    audio_file_path=Settings.AUDIO_FILE_PATH,
    config_file_path=Settings.CONFIG_FILE_PATH,
    replacements_file_path=Settings.REPLACEMENTS_FILE_PATH,
    openai_api_key_env_var=Settings.OPENAI_API_KEY_ENV_VAR,
)

speech_to_clip.start_recording()
print("Recording...")
input("Press Enter to stop recording...")
speech_to_clip.stop_and_save_recording()
transcription = speech_to_clip.transcribe_recording()
print(transcription)

Create Executable With AutoPyToExe

To build an executable file (.exe on Windows) using AutoPyToExe, follow these steps:

  1. Install the dev requirements: pip install -r requirements-dev.in
  2. Depending on whether you want to build the UI or the CLI app, choose the corresponding configuration file:
  3. The configuration file contains some absolute paths, which you must replace with the path to your local project. Alternatively, you can use the config file as a reference and adjust the settings in the auto-py-to-exe UI.
  4. Execute auto-py-to-exe -c <YOUR_CONFIG_FILE> with the adjusted config file.
  5. Click Convert .py to .exe.
  6. Note: To build both the UI and the CLI, build them separately. Afterwards, you can move both executables into the same directory so that they share the same config file and resources, and then delete the now-obsolete build directory.

Text Replacer

The tool features a simple text replacement system. When enabled via the Replacer checkbox, it can replace certain expressions as follows:

Expression    Replacement
new line      \n
bullet point  •
en dash       –
...           ...

Configure or edit the replacements in the resources/replacements.json file.
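
A replacement pass of this kind can be sketched as below. The sketch assumes that replacements.json is a flat JSON object mapping each spoken expression to its replacement. That layout, the function names, and the case-insensitive whole-word matching are all assumptions for illustration, so check them against the actual file.

```python
import json
import re


def load_replacements(path):
    """Load an {expression: replacement} mapping from a JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def apply_replacements(text, replacements):
    """Replace each expression in `text`, matching whole words case-insensitively."""
    # Longer expressions go first so a short one cannot break up a longer match.
    for expression in sorted(replacements, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(expression) + r"\b", re.IGNORECASE)
        replacement = replacements[expression]
        # A function replacement inserts the text literally (no backslash escapes).
        text = pattern.sub(lambda _match, r=replacement: r, text)
    return text
```

For example, with the mapping {"new line": "\n", "bullet point": "•"}, the phrase "milk new line bullet point eggs" becomes "milk", a space, a line break, a space, and then "• eggs". The surrounding spaces are kept, so a real implementation may want to trim them.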

Contributions & Feedback

Contributions and feedback are welcome! Please open an issue or submit a pull request.

Credits

By Claus Helfenschneider Interactive Applications @ www.interactive-applications.com

If you enjoy this project, please consider buying me a coffee, checking out my website, or reaching out to me. I'd love to hear from you!

I am open for hire/commissions.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Third-Party Licenses

This project uses the following third-party packages. Please refer to the respective license files for more details.

Package             License        License File
CustomTkinter       MIT            Link
OpenAI              MIT            Link
python-sounddevice  MIT            Link
python-soundfile    BSD-3-Clause   Link
numpy               BSD-3-Clause   Link
pyperclip           BSD-3-Clause   Link
pydub               MIT            Link
humanize            MIT            Link
auto-py-to-exe      MIT            Link

Development dependencies:

Package  License           License File
pylint   GPL-2.0-or-later  Link
yapf     Apache-2.0        Link
