restructure project add screen ai

bigsk1 · Jun 18, 2024 · 1ed0d0a
1 parent 2196491

Showing 29 changed files with 214 additions and 95 deletions.
diff --git a/.env.sample b/.env.sample
@@ -1,39 +1,39 @@
-# Conditional API Usage: Depending on the value of MODEL_PROVIDER, that's what will be used when ran 
-# use either ollama or openai, can mix and match, use local olllama with openai speech or use openai model with local xtts, ect..
+# Conditional API Usage: Depending on the value of MODEL_PROVIDER, that's what will be used when run.
+# You can mix and match; use local Ollama with OpenAI speech or use OpenAI model with local XTTS, etc.
 
-# openai or ollama
+# Model Provider: openai or ollama
 MODEL_PROVIDER=ollama
 
-# Enter charactor name to use - samantha, wizard, pirate, valleygirl, newscaster1920s, alien_scientist, cyberpunk, detective, 
-CHARACTER_NAME=pirate
+# Character to use - Options: samantha, wizard, pirate, valleygirl, newscaster1920s, alien_scientist, cyberpunk, detective
+CHARACTER_NAME=wizard
 
-# Text-to-Speech Provider - (xtts local uses the custom charactor .wav) or (openai text to speech uses openai tts voice)
-# xtts or openai
-TTS_PROVIDER=xtts 
+# Text-to-Speech Provider - Options: xtts (local uses the custom character .wav) or openai (uses OpenAI TTS voice)
+TTS_PROVIDER=xtts
 
-# The voice speed for xtts only ( 1.0 - 1.5 , default 1.1)
-XTTS_SPEED=1.1
+# OpenAI TTS Voice - When TTS_PROVIDER is set to openai above, it will use the chosen voice.
+# If TTS_PROVIDER is xtts, then it will use the .wav in the character folder.
+# Voice options: alloy, echo, fable, onyx, nova, shimmer
+OPENAI_TTS_VOICE=onyx
 
-# OpenAI TTS Voice - When TTS Provider is set to openai above it will use the chosen voice
-# Examples here https://platform.openai.com/docs/guides/text-to-speech
-# Choose the desired voice options are - alloy, echo, fable, onyx, nova, and shimmer
-OPENAI_TTS_VOICE=onyx 
-
-
-# SET THESE BELOW AND NO NEED TO CHANGE OFTEN #
-
-# Endpoints
+# Endpoints (set these below and no need to change often)
 OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
 OPENAI_TTS_URL=https://api.openai.com/v1/audio/speech
 OLLAMA_BASE_URL=http://localhost:11434
 
-# OpenAI API Key for models and speech
-OPENAI_API_KEY=sk-11111111
+# OpenAI API Key for models and speech (replace with your actual API key)
+OPENAI_API_KEY=sk-proj-1111111111
 
-# Models to use - llama3 works good for local
+# Models to use - llama3 works well for local usage.
+# OPTIONAL: For screen analysis, if MODEL_PROVIDER is ollama, llava will be used by default.
+# Ensure you have llava downloaded with Ollama. If OpenAI is used, gpt-4o works well.
 OPENAI_MODEL=gpt-4o
 OLLAMA_MODEL=llama3
 
+# The voice speed for XTTS only (1.0 - 1.5, default is 1.1)
+XTTS_SPEED=1.2
 
-
-
+# NOTES:
+# List of trigger phrases to have the model view your desktop (desktop, browser, images, etc.).
+# It will describe what it sees, and you can ask questions about it:
+# "what's on my screen", "take a screenshot", "show me my screen", "analyze my screen", 
+# "what do you see on my screen", "screen capture", "screenshot"
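Reviewer note: the NOTES block in `.env.sample` lists trigger phrases but the diff shown here does not include the code that matches them. A minimal sketch of how such matching could work, assuming simple case-insensitive substring matching on the transcribed speech, follows. The names `TRIGGER_PHRASES`, `normalize`, and `is_screen_request` are hypothetical and are not taken from the project; only the phrases themselves come from the file above.

```python
import re

# Trigger phrases copied from the NOTES block in .env.sample.
TRIGGER_PHRASES = (
    "what's on my screen",
    "take a screenshot",
    "show me my screen",
    "analyze my screen",
    "what do you see on my screen",
    "screen capture",
    "screenshot",
)


def normalize(text: str) -> str:
    """Lowercase, replace curly apostrophes, and collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()


def is_screen_request(transcript: str) -> bool:
    """Return True if the transcript contains any trigger phrase.

    Hypothetical helper: the project's real matching logic may differ.
    """
    cleaned = normalize(transcript)
    return any(phrase in cleaned for phrase in TRIGGER_PHRASES)
```

Because "screenshot" is itself a trigger phrase, any sentence containing that word matches under this approach, so a stricter matcher may be preferable if false positives become a problem.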
diff --git a/README.md b/README.md
@@ -9,10 +9,13 @@ Voice Chat AI is a project that allows you to interact with different AI charact
 
 ## Features
 
-- Supports both OpenAI and Ollama language models.
-- Provides text-to-speech synthesis using XTTS or OpenAI TTS.
-- Analyzes user mood and adjusts AI responses accordingly.
-- Easy configuration through environment variables.
+- **Supports both OpenAI and Ollama language models**: Choose the model that best fits your needs.
+- **Provides text-to-speech synthesis using XTTS or OpenAI TTS**: Enjoy natural and expressive voices.
+- **No typing needed, just speak!**: Hands-free interaction makes conversations smooth and effortless.
+- **Analyzes user mood and adjusts AI responses accordingly**: Get personalized responses based on your mood.
+- **You can, just by speaking, have the AI analyze your screen and chat about it**: Seamlessly integrate visual context into your conversations.
+- **Easy configuration through environment variables**: Customize the application to suit your preferences with minimal effort.
+
 
 ## Installation
 
@@ -21,7 +24,7 @@ Voice Chat AI is a project that allows you to interact with different AI charact
 - Python 3.10
 - CUDA-enabled GPU
 - Microphone
-- A sence of humor
+- A sense of humor
 
 ### Steps
 
@@ -41,7 +44,6 @@ Voice Chat AI is a project that allows you to interact with different AI charact
 
  or use conda just make it python 3.10
 
-
  ```bash
  conda create --name voice-chat-ai python=3.10
  conda activate voice-chat-ai
@@ -56,14 +58,12 @@ Voice Chat AI is a project that allows you to interact with different AI charact
 
 3. Install dependencies:
 
-
  For GPU (CUDA) version:
 
  ```bash
  pip install -r requirements.txt
  ```
 
-
  For CPU-only version:
 
  ```bash
@@ -74,8 +74,7 @@ Voice Chat AI is a project that allows you to interact with different AI charact
 
 You need to download the checkpoints for the models used in this project. You can download them from the GitHub releases page and extract the zip into the project folder.
 
-- [Download EN Checkpoint](https://github.com/bigsk1/voice-chat-ai/releases/download/models/checkpoints.zip)
-
+- [Download Checkpoint](https://github.com/bigsk1/voice-chat-ai/releases/download/models/checkpoints.zip)
 - [Download XTTS-v2](https://github.com/bigsk1/voice-chat-ai/releases/download/models/XTTS-v2.zip)
 
 After downloading, place the folders as follows:
@@ -99,7 +98,6 @@ voice-chat-ai/
 
 You can use the following commands to download and extract the files directly into the project directory:
 
-
 ```sh
 # Navigate to the project directory
 cd /path/to/your/voice-chat-ai
@@ -113,7 +111,6 @@ wget https://github.com/bigsk1/voice-chat-ai/releases/download/models/XTTS-v2.zi
 unzip XTTS-v2.zip -d .
 ```
 
-
 ## Configuration
 
 1. Rename the .env.sample to `.env` in the root directory of the project and configure it with the necessary environment variables: - The app is controlled based on the variables you add.
@@ -140,7 +137,6 @@ unzip XTTS-v2.zip -d .
  # Choose the desired voice options are - alloy, echo, fable, onyx, nova, and shimmer
  OPENAI_TTS_VOICE=onyx 
 
-
  # SET THESE BELOW AND NO NEED TO CHANGE OFTEN #
 
  # Endpoints
@@ -156,7 +152,6 @@ unzip XTTS-v2.zip -d .
  OLLAMA_MODEL=llama3
  ```
 
-
 ## Usage
 
 Run the application:
@@ -171,18 +166,19 @@ python app.py
 
 ## Adding New Characters
 
-1. Create a new folder for the character in the project directory.
+1. Create a new folder for the character in the project's characters directory.
 2. Add a text file with the character's prompt (e.g., `wizard/wizard.txt`).
 3. Add a JSON file with mood prompts (e.g., `wizard/prompts.json`).
 
 ## Example Character Configuration
 
 `wizard/wizard.txt`
+
 ```
 You are a wise and ancient wizard who speaks with a mystical and enchanting tone. You are knowledgeable about many subjects and always eager to share your wisdom.
 ```
 
-### `wizard/prompts.json`
+`wizard/prompts.json`
 
 ```json
 {
@@ -196,9 +192,9 @@ You are a wise and ancient wizard who speaks with a mystical and enchanting tone
  "disgusted": "RESPOND WITH UNDERSTANDING AND COMFORT, LIKE A WISE OLD SAGE WHO KNOWS THAT DISGUST IS A PART OF LIFE."
 }
 ```
-For XTTS find a .wav voice and add it to the wizard folder and name it as wizard.wav , the voice only needs to be 6 seconds long. Running the app will automaticly find the .wav when it has the characters name and use it. 
 
+For XTTS, find a .wav voice sample, add it to the wizard folder, and name it wizard.wav. The sample only needs to be 6 seconds long. When the app runs, it automatically finds the .wav that matches the character's name and uses it.
 
 ## License
 
-This project is licensed under the MIT License. 
+This project is licensed under the MIT License.
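
Reviewer note: the README changes above describe a per-character layout (a prompt text file, a `prompts.json` with mood prompts, and, for XTTS, a `.wav` named after the character) inside the project's characters directory. A minimal sketch of a check for that layout is below. It assumes the directory is literally named `characters`, which the README implies but does not spell out, and the helpers `character_files` and `missing_files` are hypothetical names, not functions from the project.

```python
from pathlib import Path


def character_files(name: str, root: str = "characters") -> dict[str, Path]:
    """Build the expected file paths for one character.

    Assumes the layout described in the README:
    <root>/<name>/<name>.txt, prompts.json, and <name>.wav.
    """
    folder = Path(root) / name
    return {
        "prompt": folder / f"{name}.txt",
        "moods": folder / "prompts.json",
        "voice": folder / f"{name}.wav",
    }


def missing_files(name: str, root: str = "characters",
                  need_voice: bool = True) -> list[str]:
    """List the labels of expected files that do not exist on disk.

    Set need_voice=False when TTS_PROVIDER is openai, since the .wav
    sample is only described as necessary for XTTS.
    """
    files = character_files(name, root)
    if not need_voice:
        files.pop("voice")
    return [label for label, path in files.items() if not path.is_file()]
```

Running `missing_files("wizard")` from the project root before starting the app would report which of the three files still need to be added for a new character.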