restructure project add screen ai
bigsk1 committed Jun 18, 2024
1 parent 2196491 commit 1ed0d0a
Showing 29 changed files with 214 additions and 95 deletions.
48 changes: 24 additions & 24 deletions .env.sample
@@ -1,39 +1,39 @@
# Conditional API Usage: Depending on the value of MODEL_PROVIDER, that's what will be used when ran
# use either ollama or openai, can mix and match, use local olllama with openai speech or use openai model with local xtts, ect..
# Conditional API Usage: Depending on the value of MODEL_PROVIDER, that's what will be used when run.
# You can mix and match; use local Ollama with OpenAI speech or use OpenAI model with local XTTS, etc.

# openai or ollama
# Model Provider: openai or ollama
MODEL_PROVIDER=ollama

# Enter charactor name to use - samantha, wizard, pirate, valleygirl, newscaster1920s, alien_scientist, cyberpunk, detective,
CHARACTER_NAME=pirate
# Character to use - Options: samantha, wizard, pirate, valleygirl, newscaster1920s, alien_scientist, cyberpunk, detective
CHARACTER_NAME=wizard

# Text-to-Speech Provider - (xtts local uses the custom charactor .wav) or (openai text to speech uses openai tts voice)
# xtts or openai
TTS_PROVIDER=xtts
# Text-to-Speech Provider - Options: xtts (local uses the custom character .wav) or openai (uses OpenAI TTS voice)
TTS_PROVIDER=xtts

# The voice speed for xtts only ( 1.0 - 1.5 , default 1.1)
XTTS_SPEED=1.1
# OpenAI TTS Voice - When TTS_PROVIDER is set to openai above, it will use the chosen voice.
# If MODEL_PROVIDER is ollama, then it will use the .wav in the character folder.
# Voice options: alloy, echo, fable, onyx, nova, shimmer
OPENAI_TTS_VOICE=onyx

# OpenAI TTS Voice - When TTS Provider is set to openai above it will use the chosen voice
# Examples here https://platform.openai.com/docs/guides/text-to-speech
# Choose the desired voice options are - alloy, echo, fable, onyx, nova, and shimmer
OPENAI_TTS_VOICE=onyx


# SET THESE BELOW AND NO NEED TO CHANGE OFTEN #

# Endpoints
# Endpoints (set these below and no need to change often)
OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
OPENAI_TTS_URL=https://api.openai.com/v1/audio/speech
OLLAMA_BASE_URL=http://localhost:11434

# OpenAI API Key for models and speech
OPENAI_API_KEY=sk-11111111
# OpenAI API Key for models and speech (replace with your actual API key)
OPENAI_API_KEY=sk-proj-1111111111

# Models to use - llama3 works good for local
# Models to use - llama3 works well for local usage.
# OPTIONAL: For screen analysis, if MODEL_PROVIDER is ollama, llava will be used by default.
# Ensure you have llava downloaded with Ollama. If OpenAI is used, gpt-4o works well.
OPENAI_MODEL=gpt-4o
OLLAMA_MODEL=llama3

# The voice speed for XTTS only (1.0 - 1.5, default is 1.1)
XTTS_SPEED=1.2



# NOTES:
# List of trigger phrases to have the model view your desktop (desktop, browser, images, etc.).
# It will describe what it sees, and you can ask questions about it:
# "what's on my screen", "take a screenshot", "show me my screen", "analyze my screen",
# "what do you see on my screen", "screen capture", "screenshot"
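The trigger phrases listed in the notes above could be checked against transcribed speech roughly as follows. This is only a sketch: the function name `is_screen_request` and the case-insensitive substring matching are assumptions for illustration, since the commit's actual matching logic is not shown in this diff.

```python
# Hypothetical helper; the phrase list is copied from the .env.sample notes,
# but the matching strategy (case-insensitive substring search) is an assumption.
SCREEN_TRIGGER_PHRASES = (
    "what's on my screen",
    "take a screenshot",
    "show me my screen",
    "analyze my screen",
    "what do you see on my screen",
    "screen capture",
    "screenshot",
)


def is_screen_request(transcript: str) -> bool:
    """Return True if the transcribed speech contains any trigger phrase."""
    # Normalize case and curly apostrophes so "What’s" matches "what's".
    text = transcript.lower().replace("\u2019", "'")
    return any(phrase in text for phrase in SCREEN_TRIGGER_PHRASES)
```

For example, `is_screen_request("Hey, take a screenshot please")` is true, while an ordinary question such as "tell me a joke" is not treated as a screen request.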
32 changes: 14 additions & 18 deletions README.md
@@ -9,10 +9,13 @@ Voice Chat AI is a project that allows you to interact with different AI charact

## Features

- Supports both OpenAI and Ollama language models.
- Provides text-to-speech synthesis using XTTS or OpenAI TTS.
- Analyzes user mood and adjusts AI responses accordingly.
- Easy configuration through environment variables.
- **Supports both OpenAI and Ollama language models**: Choose the model that best fits your needs.
- **Provides text-to-speech synthesis using XTTS or OpenAI TTS**: Enjoy natural and expressive voices.
- **No typing needed, just speak!**: Hands-free interaction makes conversations smooth and effortless.
- **Analyzes user mood and adjusts AI responses accordingly**: Get personalized responses based on your mood.
- **You can, just by speaking, have the AI analyze your screen and chat about it**: Seamlessly integrate visual context into your conversations.
- **Easy configuration through environment variables**: Customize the application to suit your preferences with minimal effort.


## Installation

@@ -21,7 +24,7 @@ Voice Chat AI is a project that allows you to interact with different AI charact
- Python 3.10
- CUDA-enabled GPU
- Microphone
- A sence of humor
- A sense of humor

### Steps

@@ -41,7 +44,6 @@ Voice Chat AI is a project that allows you to interact with different AI charact

Or use conda; just make sure the environment uses Python 3.10:


```bash
conda create --name voice-chat-ai python=3.10
conda activate voice-chat-ai
@@ -56,14 +58,12 @@ Voice Chat AI is a project that allows you to interact with different AI charact

3. Install dependencies:


For GPU (CUDA) version:

```bash
pip install -r requirements.txt
```


For CPU-only version:

```bash
@@ -74,8 +74,7 @@ Voice Chat AI is a project that allows you to interact with different AI charact

You need to download the checkpoints for the models used in this project. You can download them from the GitHub releases page and extract the zip into the project folder.

- [Download EN Checkpoint](https://github.com/bigsk1/voice-chat-ai/releases/download/models/checkpoints.zip)

- [Download Checkpoint](https://github.com/bigsk1/voice-chat-ai/releases/download/models/checkpoints.zip)
- [Download XTTS-v2](https://github.com/bigsk1/voice-chat-ai/releases/download/models/XTTS-v2.zip)

After downloading, place the folders as follows:
@@ -99,7 +98,6 @@ voice-chat-ai/

You can use the following commands to download and extract the files directly into the project directory:


```sh
# Navigate to the project directory
cd /path/to/your/voice-chat-ai
@@ -113,7 +111,6 @@ wget https://github.com/bigsk1/voice-chat-ai/releases/download/models/XTTS-v2.zi
unzip XTTS-v2.zip -d .
```


## Configuration

1. Rename `.env.sample` to `.env` in the root directory of the project and configure it with the necessary environment variables. The app's behavior is controlled by the variables you set.
@@ -140,7 +137,6 @@ unzip XTTS-v2.zip -d .
# Choose the desired voice options are - alloy, echo, fable, onyx, nova, and shimmer
OPENAI_TTS_VOICE=onyx
# SET THESE BELOW AND NO NEED TO CHANGE OFTEN #
# Endpoints
@@ -156,7 +152,6 @@ unzip XTTS-v2.zip -d .
OLLAMA_MODEL=llama3
```
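As a rough illustration of how these variables might drive the app, the sketch below reads them from the environment, validates the provider names, and clamps `XTTS_SPEED` to the documented 1.0 to 1.5 range. The function name `load_settings`, the returned dictionary shape, and the clamping behavior are assumptions for illustration; how the project actually loads `.env` and handles invalid values is not shown in this diff.

```python
import os

# Allowed values taken from the comments in .env.sample.
VALID_MODEL_PROVIDERS = {"openai", "ollama"}
VALID_TTS_PROVIDERS = {"openai", "xtts"}


def load_settings(env=os.environ) -> dict:
    """Hypothetical helper: read and validate the provider settings."""
    model_provider = env.get("MODEL_PROVIDER", "ollama").lower()
    tts_provider = env.get("TTS_PROVIDER", "xtts").lower()
    if model_provider not in VALID_MODEL_PROVIDERS:
        raise ValueError(f"MODEL_PROVIDER must be one of {sorted(VALID_MODEL_PROVIDERS)}")
    if tts_provider not in VALID_TTS_PROVIDERS:
        raise ValueError(f"TTS_PROVIDER must be one of {sorted(VALID_TTS_PROVIDERS)}")

    # Pick the model name that matches the chosen provider.
    if model_provider == "openai":
        model = env.get("OPENAI_MODEL", "gpt-4o")
    else:
        model = env.get("OLLAMA_MODEL", "llama3")

    # Clamp the XTTS speed to the documented range (assumed behavior).
    speed = min(max(float(env.get("XTTS_SPEED", "1.1")), 1.0), 1.5)

    return {
        "model_provider": model_provider,
        "tts_provider": tts_provider,
        "model": model,
        "xtts_speed": speed,
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out, for example `load_settings({"MODEL_PROVIDER": "openai", "TTS_PROVIDER": "xtts"})`.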


## Usage

Run the application:
@@ -171,18 +166,19 @@ python app.py

## Adding New Characters

1. Create a new folder for the character in the project directory.
1. Create a new folder for the character in the project's characters directory.
2. Add a text file with the character's prompt (e.g., `wizard/wizard.txt`).
3. Add a JSON file with mood prompts (e.g., `wizard/prompts.json`).

## Example Character Configuration

`wizard/wizard.txt`

```
You are a wise and ancient wizard who speaks with a mystical and enchanting tone. You are knowledgeable about many subjects and always eager to share your wisdom.
```

### `wizard/prompts.json`
`wizard/prompts.json`

```json
{
@@ -196,9 +192,9 @@ You are a wise and ancient wizard who speaks with a mystical and enchanting tone
"disgusted": "RESPOND WITH UNDERSTANDING AND COMFORT, LIKE A WISE OLD SAGE WHO KNOWS THAT DISGUST IS A PART OF LIFE."
}
```
For XTTS, find a .wav voice sample, add it to the wizard folder, and name it `wizard.wav`; the sample only needs to be 6 seconds long. When the app runs, it automatically finds and uses the .wav file that matches the character's name.

For XTTS, find a .wav voice sample, add it to the wizard folder, and name it `wizard.wav`; the sample only needs to be 6 seconds long. When the app runs, it automatically finds and uses the .wav file that matches the character's name.
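
The character layout described above (a prompt text file, a `prompts.json` file, and an optional `.wav` named after the character) could be loaded along these lines. This is a minimal sketch: the function name `load_character`, the default `characters` directory argument, and the returned dictionary are assumptions for illustration, not the project's actual loader.

```python
import json
from pathlib import Path


def load_character(name: str, characters_dir: str = "characters") -> dict:
    """Hypothetical loader for a character folder such as characters/wizard/."""
    folder = Path(characters_dir) / name

    # <name>.txt holds the character's system prompt.
    prompt = (folder / f"{name}.txt").read_text(encoding="utf-8").strip()

    # prompts.json maps a detected mood to an extra instruction.
    mood_prompts = json.loads((folder / "prompts.json").read_text(encoding="utf-8"))

    # <name>.wav is the XTTS voice sample; it is optional in this sketch.
    wav_path = folder / f"{name}.wav"

    return {
        "prompt": prompt,
        "mood_prompts": mood_prompts,
        "voice_wav": wav_path if wav_path.exists() else None,
    }
```

With the wizard example, `load_character("wizard")` would return the prompt text, the mood dictionary, and the path to `wizard.wav` if that file is present.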

## License

This project is licensed under the MIT License.
This project is licensed under the MIT License.