build webui
bigsk1 committed Jun 21, 2024
1 parent 3ea068e commit 4effe26
Showing 12 changed files with 1,104 additions and 15 deletions.
14 changes: 9 additions & 5 deletions .env.sample
@@ -1,13 +1,16 @@
# Conditional API Usage: Depending on the value of MODEL_PROVIDER, that's what will be used when run.
# You can mix and match; use local Ollama with OpenAI speech or use OpenAI model with local XTTS, etc.

# Model Provider: openai or ollama
# Model Provider: openai or ollama - once set, it can't be changed in the web UI until you stop and restart the server
# openai or ollama
MODEL_PROVIDER=ollama

# Character to use - Options: samantha, wizard, pirate, valleygirl, newscaster1920s, alien_scientist, cyberpunk, detective
CHARACTER_NAME=wizard

# Text-to-Speech Provider - Options: xtts (local uses the custom character .wav) or openai (uses OpenAI TTS voice)

# Text-to-Speech Provider - Options: xtts (local uses the custom character .wav) or openai (uses OpenAI TTS voice) - once set, it can't be changed in the web UI until you stop and restart the server
# openai or xtts
TTS_PROVIDER=xtts

# OpenAI TTS Voice - When TTS_PROVIDER is set to openai above, it will use the chosen voice.
@@ -20,9 +23,6 @@ OPENAI_BASE_URL=https://api.openai.com/v1/chat/completions
OPENAI_TTS_URL=https://api.openai.com/v1/audio/speech
OLLAMA_BASE_URL=http://localhost:11434

# OpenAI API Key for models and speech (replace with your actual API key)
OPENAI_API_KEY=sk-proj-1111111111

# Models to use - llama3 works well for local usage.
# OPTIONAL: For screen analysis, if MODEL_PROVIDER is ollama, llava will be used by default.
# Ensure you have llava downloaded with Ollama. If OpenAI is used, gpt-4o works well.
@@ -32,6 +32,10 @@ OLLAMA_MODEL=llama3
# The voice speed for XTTS only (1.0 - 1.5, default is 1.1)
XTTS_SPEED=1.2


# OpenAI API Key for models and speech (replace with your actual API key)
OPENAI_API_KEY=sk-proj-1111111

# NOTES:
# List of trigger phrases to have the model view your desktop (desktop, browser, images, etc.).
# It will describe what it sees, and you can ask questions about it:
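The `.env.sample` comments above describe which values each setting accepts. As a rough illustration only, the sketch below shows one way such settings could be read and validated in Python. The function name `load_settings`, the fallback provider values, and the rule that `OPENAI_API_KEY` is required only when a provider is `openai` are assumptions made for this example; they are not taken from the project's code.

```python
import os

# Allowed values, taken from the comments in .env.sample.
MODEL_PROVIDERS = {"openai", "ollama"}
TTS_PROVIDERS = {"openai", "xtts"}


def load_settings(env=None):
    """Read and validate provider settings from an environment-like mapping.

    Hypothetical helper for illustration; not part of the project.
    """
    env = os.environ if env is None else env

    # Fallbacks mirror the sample file's values (an assumption).
    model_provider = env.get("MODEL_PROVIDER", "ollama").strip().lower()
    if model_provider not in MODEL_PROVIDERS:
        raise ValueError(f"MODEL_PROVIDER must be one of {sorted(MODEL_PROVIDERS)}")

    tts_provider = env.get("TTS_PROVIDER", "xtts").strip().lower()
    if tts_provider not in TTS_PROVIDERS:
        raise ValueError(f"TTS_PROVIDER must be one of {sorted(TTS_PROVIDERS)}")

    # The sample file documents a 1.0 - 1.5 range with a default of 1.1.
    xtts_speed = float(env.get("XTTS_SPEED", "1.1"))
    if not 1.0 <= xtts_speed <= 1.5:
        raise ValueError("XTTS_SPEED must be between 1.0 and 1.5")

    # Assumption: the key only matters when a provider is set to openai.
    api_key = env.get("OPENAI_API_KEY", "")
    if "openai" in (model_provider, tts_provider) and not api_key:
        raise ValueError("OPENAI_API_KEY is required when a provider is openai")

    return {
        "model_provider": model_provider,
        "tts_provider": tts_provider,
        "xtts_speed": xtts_speed,
        "openai_api_key": api_key,
    }
```

Calling `load_settings()` with no argument reads from `os.environ`; passing a plain dict makes the function easy to exercise without touching the real environment.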
32 changes: 22 additions & 10 deletions README.md
@@ -15,6 +15,7 @@ Voice Chat AI is a project that allows you to interact with different AI charact
- **Analyzes user mood and adjusts AI responses accordingly**: Get personalized responses based on your mood.
- **You can, just by speaking, have the AI analyze your screen and chat about it**: Seamlessly integrate visual context into your conversations.
- **Easy configuration through environment variables**: Customize the application to suit your preferences with minimal effort.
- **WebUI or Terminal usage**: Can be run with either.


## Installation
@@ -72,7 +73,7 @@ Voice Chat AI is a project that allows you to interact with different AI charact
pip install -r cpu_requirements.txt
```

Need to have Microsoft C++ Build Tools for TTS
Need to have Microsoft C++ Build Tools on Windows for TTS
[Microsoft Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)

### Download Checkpoints
@@ -166,8 +167,17 @@ XTTS_SPEED=1.2

Run the application:

Web UI
```bash
python app.py
uvicorn app.main:app --host 0.0.0.0 --port 8000
```
Then open http://localhost:8000/ in your browser.


CLI Only

```bash
python cli.py
```

### Commands
@@ -211,28 +221,30 @@ You are a wise and ancient wizard who speaks with a mystical and enchanting tone
}
```

For XTTS find a .wav voice and add it to the wizard folder and name it as wizard.wav , the voice only needs to be 6 seconds long. Running the app will automaticly find the .wav when it has the characters name and use it.
For XTTS, find a .wav voice, add it to the wizard folder, and name it wizard.wav. The voice only needs to be 6 seconds long. Running the app will automatically find the .wav when it has the character's name and use it. If you only use OpenAI speech, a .wav isn't needed.


## Watch the Demos

GPU - 100% local - ollama llama3, xtts-v2
Webui - OpenAI and Ollama

[![Watch the video](https://img.youtube.com/vi/WsWbYnITdCo/maxresdefault.jpg)](https://youtu.be/WsWbYnITdCo)
[![Watch the video](https://img.youtube.com/vi/bgdQkzGltdk/maxresdefault.jpg)](https://youtu.be/bgdQkzGltdk)



CPU Only mode
CLI

Alien conversation using openai gpt4o and openai speech for tts.
GPU - 100% local - ollama llama3, xtts-v2

[![Watch the video](https://img.youtube.com/vi/d5LbRLhWa5c/maxresdefault.jpg)](https://youtu.be/d5LbRLhWa5c)
[![Watch the video](https://img.youtube.com/vi/WsWbYnITdCo/maxresdefault.jpg)](https://youtu.be/WsWbYnITdCo)


Valley girl conversation using ollama llama3, openai tts

[![Watch the video](https://img.youtube.com/vi/HSEFH0UnZEk/maxresdefault.jpg)](https://youtu.be/HSEFH0UnZEk)
CPU Only mode CLI

Alien conversation using openai gpt4o and openai speech for tts.

[![Watch the video](https://img.youtube.com/vi/d5LbRLhWa5c/maxresdefault.jpg)](https://youtu.be/d5LbRLhWa5c)


## License
Empty file added app/__init__.py
Empty file.
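
The README change in this commit starts the web UI with `uvicorn app.main:app`. Uvicorn reads that argument as `module:attribute`, so it imports the module `app.main` and serves the ASGI callable named `app` inside it; the empty `app/__init__.py` added here marks the `app` directory as a regular package. The real `app/main.py` is not shown on this page, so the following is only a stand-in sketch using the standard library, with a hypothetical `call_once` helper that drives the callable without a server:

```python
import asyncio


async def app(scope, receive, send):
    """Minimal ASGI application: answers every HTTP request with plain text."""
    if scope["type"] != "http":
        # Other scope types (for example lifespan) are ignored in this sketch.
        return
    body = b"webui placeholder"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


def call_once(path="/"):
    """Invoke the app once with a fake GET request and return what it sent."""
    sent = []

    async def receive():
        # A single empty request body, as a server would deliver for GET.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    asyncio.run(app(scope, receive, send))
    return sent
```

Saved as `app/main.py` next to the empty `app/__init__.py`, a callable shaped like this is what the `uvicorn app.main:app --host 0.0.0.0 --port 8000` command from the README would look up. The project's actual module may well use a web framework instead.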