A tool for recording audio from a microphone, transcribing the recording, and copying the transcription to the clipboard.
Developed by Claus Helfenschneider Interactive Applications.
- The transcription is copied to the clipboard for easy pasting into other applications.
- Comes with a CLI and a UI, and can also be used as a Python module.
- Supports configurable text replacements, similar to the voice recording feature on iOS.
For example, it can replace the text "new line" with an actual line break, or "bullet point" with "•".
- If ffmpeg is installed (optional), the audio is converted to MP3 before transcription, for faster uploads when using the OpenAI API.
- Configurable via a config file (`config.ini`), command line arguments (CLI), and a replacements mapping file (`replacements.json`).
Two transcription backends are supported; you can use either one or both:
- A local Whisper model. This requires the `openai-whisper` package to be installed.
- OpenAI's Whisper model via the OpenAI API. This requires an OpenAI API key.
Screenshots: CLI (verbose mode and silent mode) and UI.
- Clone the repository.
- (Optional but recommended) Set up a virtual environment. The project requires Python 3.11 or higher.
- Install the requirements: `pip install -r requirements.in` for the latest versions (recommended), or `pip install -r requirements.txt` for pinned versions.
- To build an `.exe`, install the dev requirements: `pip install -r requirements-dev.in`
- If you use the OpenAI API backend, set the environment variable `WHISPER_KEYBOARD_API_KEY` to your OpenAI API key. You can set it in your global environment, add it to a `.env` file, or specify it in the `config.ini` file under `[openai]` as `api_key`.
- Note: The environment variable takes precedence over the value set in the config file.
- Run the command line interface: `python speech_to_clipboard_cli.py`
- For available options, run `python speech_to_clipboard_cli.py --help`.
- Run the UI: `python speech_to_clipboard_ui.pyw`
- Select your preferred microphone from the dropdown menu.
- Press REC to start recording.
- Press Stop Recording to end the recording. The audio is then transcribed (via the OpenAI API or the local Whisper model, depending on your configuration), and the result is copied to your clipboard.
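The microphone dropdown needs a list of devices that can record. A minimal sketch, assuming `sounddevice.query_devices()` from python-sounddevice (listed among the dependencies), whose entries are dictionaries with `name` and `max_input_channels` keys and whose positions are the device indices; `input_devices` is a hypothetical helper, not part of the project.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def input_devices(devices: Iterable[Mapping[str, Any]]) -> list[tuple[int, str]]:
    """Return (device index, name) for every device with at least one input channel."""
    return [
        (index, str(device["name"]))
        for index, device in enumerate(devices)  # position in the list = device index
        if device.get("max_input_channels", 0) > 0  # output-only devices are skipped
    ]
```

For example, `input_devices(sounddevice.query_devices())` yields `(index, name)` pairs that could populate the dropdown.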
Example of using the tool as a Python module:

```python
from settings import Settings
from core.speech_to_clipboard import SpeechToClipboard

# Create the recorder/transcriber with the project's default paths and settings.
speech_to_clip = SpeechToClipboard(
    audio_file_path=Settings.AUDIO_FILE_PATH,
    config_file_path=Settings.CONFIG_FILE_PATH,
    replacements_file_path=Settings.REPLACEMENTS_FILE_PATH,
    openai_api_key_env_var=Settings.OPENAI_API_KEY_ENV_VAR,
)

# Record until the user presses Enter, then save the audio file.
speech_to_clip.start_recording()
print("Recording...")
input("Press Enter to stop recording...")
speech_to_clip.stop_and_save_recording()

# Transcribe the saved recording and print the result.
transcription = speech_to_clip.transcribe_recording()
print(transcription)
```
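In your own scripts you may want the recording to be stopped and saved even when waiting is interrupted (for example by Ctrl+C), and the text to be placed on the clipboard. A minimal sketch, assuming only the three `SpeechToClipboard` methods used in the module example plus `pyperclip.copy` for the clipboard; `record_once` and the `Recorder` protocol are hypothetical names. Whether `transcribe_recording()` already copies to the clipboard is not stated here, so the explicit copy may be redundant.

```python
from collections.abc import Callable
from typing import Protocol


class Recorder(Protocol):
    """The three SpeechToClipboard methods used in the module example."""

    def start_recording(self) -> None: ...
    def stop_and_save_recording(self) -> None: ...
    def transcribe_recording(self) -> str: ...


def record_once(
    recorder: Recorder,
    wait_for_stop: Callable[[], object],
    copy_to_clipboard: Callable[[str], object],
) -> str:
    """Record until wait_for_stop returns, transcribe, copy the text, and return it."""
    recorder.start_recording()
    try:
        wait_for_stop()  # e.g. block on input() until the user presses Enter
    finally:
        # Stop and save even if waiting was interrupted (e.g. KeyboardInterrupt).
        recorder.stop_and_save_recording()
    text = recorder.transcribe_recording()
    copy_to_clipboard(text)
    return text
```

With the `speech_to_clip` instance from the module example, a call could look like `record_once(speech_to_clip, lambda: input("Press Enter to stop recording..."), pyperclip.copy)`.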
To build an executable file (`.exe` on Windows) using auto-py-to-exe, follow these steps:
- Install the dev requirements: `pip install -r requirements-dev.in`
- Depending on whether you want to build the UI or the CLI app, choose the corresponding configuration file:
- The configuration file contains some absolute paths, which must be replaced with the path to your local project. Alternatively, you can use the config file as a reference and adjust the settings in the auto-py-to-exe UI.
- Run `auto-py-to-exe -c <YOUR_CONFIG_FILE>` with the adjusted config file.
- Click *Convert .py to .exe*.
- Note: If you want to build both the UI and the CLI, build them separately. Afterwards, you can move both executables into the same directory so that they share the same config file and resources, and delete the now-obsolete second build directory.
The tool features a simple text replacement system. When enabled via the Replacer checkbox, it can replace certain expressions as follows:
Expression | Replacement |
---|---|
new line | `\n` |
bullet point | • |
en dash | – |
... | ... |
Configure or edit the replacements in the resources/replacements.json file.
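A minimal sketch of such a replacer, assuming `replacements.json` holds a flat JSON object mapping each spoken expression to its replacement (the actual file format may differ) and that every expression starts and ends with a letter or digit. It matches whole words case-insensitively and tries longer expressions first; surrounding whitespace is left untouched. `load_replacements` and `apply_replacements` are hypothetical names, not the project's API.

```python
import json
import re


def load_replacements(json_text: str) -> dict[str, str]:
    """Parse a flat {"expression": "replacement"} JSON object (assumed file format)."""
    data = json.loads(json_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping expressions to replacements")
    return {str(key): str(value) for key, value in data.items()}


def apply_replacements(text: str, replacements: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive occurrences of each expression."""
    if not replacements:
        return text
    lookup = {expression.lower(): replacement for expression, replacement in replacements.items()}
    # Longest expressions first, so a longer phrase wins over any shorter one it contains.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(expr) for expr in alternatives) + r")\b",
        re.IGNORECASE,
    )
    # A function replacement is inserted literally, so "\n" becomes a real line break.
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)
```

For example, given the three mappings from the table, `"pages 3 en dash 5"` becomes `"pages 3 – 5"`, while a word like "golden dash" is left alone because "en dash" does not start at a word boundary there.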
Contributions and feedback are welcome! Please open an issue or submit a pull request.
By Claus Helfenschneider Interactive Applications @ www.interactive-applications.com
If you enjoy this project, please consider buying me a coffee, checking out my website, and reaching out to me. I'd love to hear from you!
I am available for hire and commissions.
This project is licensed under the MIT License. See the LICENSE file for details.
This project uses the following third-party packages. Please refer to the respective license files for more details.
Package | License | License File |
---|---|---|
CustomTkinter | MIT | Link |
OpenAI | MIT | Link |
python-sounddevice | MIT | Link |
python-soundfile | BSD-3-Clause | Link |
numpy | BSD-3-Clause | Link |
pyperclip | BSD-3-Clause | Link |
pydub | MIT | Link |
humanize | MIT | Link |
auto-py-to-exe | MIT | Link |
Development dependencies:
Package | License | License File |
---|---|---|
pylint | GPL-2.0-or-later | Link |
yapf | Apache-2.0 | Link |