NeuroSandboxWebUI

(Windows/Linux) Local WebUI for neural network models (LLM, Stable Diffusion, AudioCraft, AudioLDM 2, TTS, Bark, Whisper, Demucs, LibreTranslate, ZeroScope 2, TripoSR, Shap-E, GLIGEN, Wav2Lip, Roop, Rembg, CodeFormer, Moondream 2), written in Python with a Gradio interface

Description:

A simple and convenient interface for using various neural network models. You can communicate with LLM and Moondream 2 using text, voice and image input; use Stable Diffusion to generate images; ZeroScope 2 to generate videos; TripoSR and Shap-E to generate 3D objects; AudioCraft and AudioLDM 2 to generate music and audio; CoquiTTS and SunoBark for text-to-speech; OpenAI Whisper for speech-to-text; Wav2Lip for lip-sync; Roop for face swapping; Rembg for background removal; CodeFormer for face restoration; LibreTranslate for text translation; and Demucs for audio source separation. You can also download LLM and Stable Diffusion models, change the application settings inside the interface and check system sensors

The goal of the project is to create the easiest possible application for using neural network models

Screenshots of each tab: LLM, TTS-STT, SunoBark, LibreTranslate, Wav2Lip, StableDiffusion, ZeroScope 2, TripoSR, Shap-E, AudioCraft, AudioLDM 2, Demucs, ModelDownloader, Settings, System

Features:

  • Easy installation via Install.bat (Windows) or Install.sh (Linux)
  • You can use the application from your mobile device on localhost (via IPv4) or anywhere online (via Share)
  • Flexible and optimized interface (built with Gradio)
  • Authentication via admin:admin (you can enter your own login details in the GradioAuth.txt file; see the sketch after this list)
  • Support for Transformers and llama.cpp models (LLM)
  • Support for diffusers and safetensors models (StableDiffusion) - txt2img, img2img, depth2img, pix2pix, controlnet, upscale, inpaint, gligen, animatediff, video, cascade and extras tabs
  • AudioCraft support (models: musicgen, audiogen and magnet)
  • AudioLDM 2 support (models: audio and music)
  • Support for TTS and Whisper models (for LLM and TTS-STT)
  • Support for Lora, Textual inversion (embedding), Vae, Img2img, Depth, Pix2Pix, Controlnet, Upscale, Inpaint, GLIGEN, AnimateDiff, Videos, Cascade, Rembg, CodeFormer and Roop models (for StableDiffusion)
  • Support for the Multiband Diffusion model (for AudioCraft)
  • Support for LibreTranslate (local API)
  • Support for ZeroScope 2
  • Support for SunoBark
  • Support for Demucs
  • Support for Shap-E
  • Support for TripoSR
  • Support for Wav2Lip
  • Support for multimodal input (Moondream 2), LoRA (Transformers) and web search (with Google Search) for LLM
  • Model settings inside the interface
  • ModelDownloader (for LLM and StableDiffusion)
  • Application settings
  • Ability to see system sensors
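
For reference, the admin:admin authentication and Share mode map onto Gradio's standard launch() options. A minimal sketch follows; the GradioAuth.txt file name comes from this project, but the login:password parsing shown here is an assumption and the project's actual wiring may differ:

```python
# Illustrative sketch: credentials from GradioAuth.txt and the Share setting
# passed to Gradio's launch(). The project's actual code may differ.
import gradio as gr

def read_auth(path="GradioAuth.txt"):
    # Assumed format: a single "login:password" line, e.g. "admin:admin"
    try:
        with open(path, encoding="utf-8") as f:
            login, password = f.read().strip().split(":", 1)
        return login, password
    except (FileNotFoundError, ValueError):
        return "admin", "admin"  # default credentials mentioned above

with gr.Blocks() as demo:
    gr.Markdown("NeuroSandboxWebUI")

# auth= enables the login prompt; share=True creates a public Gradio link;
# server_name="0.0.0.0" exposes the app on your local network (IPv4)
demo.launch(auth=read_auth(), share=False, server_name="0.0.0.0")
```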

Required Dependencies:

Minimum System Requirements:

  • System: Windows or Linux
  • GPU: 6GB+ VRAM or CPU: 8 cores at 3.2 GHz
  • RAM: 16GB+
  • Disk space: 20GB+
  • Internet connection for installation and for downloading models

How to install:

Windows

  1. Clone the repository to any location: git clone https://github.com/Dartvauder/NeuroSandboxWebUI.git
  2. Run Install.bat and wait for the installation to finish
  3. After installation, run Start.bat
  4. Select the file version and wait for the application to launch
  5. Now you can start generating!

To update, run Update.bat. To work with the virtual environment through the terminal, run Venv.bat

Linux

  1. Clone the repository to any location: git clone https://github.com/Dartvauder/NeuroSandboxWebUI.git
  2. In the terminal, run ./Install.sh and wait for all dependencies to install
  3. After installation, run ./Start.sh
  4. Wait for the application to launch
  5. Now you can start generating!

To update, run ./Update.sh. To work with the virtual environment through the terminal, run ./Venv.sh

How to use:

The interface has fifteen tabs: LLM, TTS-STT, SunoBark, LibreTranslate, Wav2Lip, StableDiffusion, ZeroScope 2, TripoSR, Shap-E, AudioCraft, AudioLDM 2, Demucs, ModelDownloader, Settings and System. Select the one you need and follow the instructions below

LLM:

  1. First upload your models to the folder: inputs/text/llm_models
  2. Select your model from the drop-down list
  3. Select model type (transformers or llama)
  4. Set up the model according to the parameters you need
  5. Type (or speak) your request
  6. Click the Submit button to receive the generated text and audio response
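
For reference, a model of the llama type (a GGUF file) is loaded roughly like this through llama-cpp-python. A minimal sketch, with a placeholder model file name and prompt; the project's own loading code may differ:

```python
# Minimal llama.cpp-style generation via llama-cpp-python; file name and prompt are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="inputs/text/llm_models/your_model.gguf", n_ctx=4096)
output = llm("Q: What is the capital of France? A:", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])
```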

Optional: you can enable TTS mode and select the voice and language to receive an audio response; enable multimodal and upload an image to get its description; enable websearch for Internet access; or enable libretranslate to translate the response. You can also choose a LoRA model to improve generation

Voice samples = inputs/audio/voices

LORA = inputs/text/llm_models/lora

The voice must be pre-processed (22050 Hz, mono, WAV)
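
If your recording is not yet in that format, here is a minimal conversion sketch using librosa and soundfile (the input and output file names are illustrative):

```python
# Resample a voice recording to 22050 Hz mono WAV for the voices folder.
import librosa
import soundfile as sf

audio, sr = librosa.load("my_recording.wav", sr=22050, mono=True)  # resample + downmix
sf.write("inputs/audio/voices/my_voice.wav", audio, sr, subtype="PCM_16")
```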

TTS-STT:

  1. Type text for text to speech
  2. Input audio for speech to text
  3. Click the Submit button to receive the generated text and audio response

Voice samples = inputs/audio/voices

The voice must be pre-processed (22050 Hz, mono, WAV)
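
For reference, the speech-to-text side relies on OpenAI Whisper. A minimal sketch of a transcription call with the whisper package (model size and file name are illustrative; the tab handles this for you):

```python
# Transcribe an audio file with OpenAI Whisper.
import whisper

model = whisper.load_model("base")        # model size is illustrative
result = model.transcribe("speech.wav")   # path is illustrative
print(result["text"])
```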

SunoBark:

  1. Type your request
  2. Set up the model according to the parameters you need
  3. Click the Submit button to receive the generated audio response

LibreTranslate:

  1. Select source and target languages
  2. Click the Submit button to get the translation

Optional: you can save the translation history by turning on the corresponding button
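
For reference, a locally running LibreTranslate instance can also be called directly over its API. A minimal sketch with requests; the port and text are illustrative and assume the local instance is already running:

```python
# Send a translation request to a locally running LibreTranslate instance.
import requests

response = requests.post(
    "http://localhost:5000/translate",  # default LibreTranslate port; yours may differ
    json={"q": "Hello, world!", "source": "en", "target": "es", "format": "text"},
    timeout=30,
)
print(response.json()["translatedText"])
```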

Wav2Lip:

  1. Upload the initial image of a face
  2. Upload the initial audio of a voice
  3. Set up the model according to the parameters you need
  4. Click the Submit button to receive the lip-synced video

StableDiffusion - has twelve sub-tabs:

txt2img:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Enter your request (+ and - for prompt weighting)
  6. Click the Submit button to get the generated image

Optional: you can select your vae, embedding and lora models to improve the generation; you can also enable upscale to increase the size of the generated image

vae = inputs/image/sd_models/vae

lora = inputs/image/sd_models/lora

embedding = inputs/image/sd_models/embedding
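
For reference, loading a safetensors checkpoint through diffusers looks roughly like this under the hood. A minimal sketch with a placeholder model file name and prompt; the project's own pipeline setup may differ, and the tab does all of this for you:

```python
# Minimal txt2img sketch with diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "inputs/image/sd_models/your_model.safetensors",  # placeholder file name
    torch_dtype=torch.float16,
).to("cuda")  # use "cpu" and torch.float32 on machines without a GPU

image = pipe(
    prompt="a cozy cabin in a snowy forest",
    negative_prompt="blurry, low quality",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("outputs/txt2img.png")
```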

img2img:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Upload the initial image that the generation will be based on
  6. Enter your request (+ and - for prompt weighting)
  7. Click the Submit button to get the generated image

Optional: You can select your vae model

vae = inputs/image/sd_models/vae

depth2img:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Enter your request (+ and - for prompt weighting)
  4. Click the Submit button to get the generated image

pix2pix:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Enter your request (+ and - for prompt weighting)
  4. Click the Submit button to get the generated image

controlnet:

  1. First upload your stable diffusion models to the folder: inputs/image/sd_models
  2. Upload the initial image
  3. Select your stable diffusion and controlnet models from the drop-down lists
  4. Set up the models according to the parameters you need
  5. Enter your request (+ and - for prompt weighting)
  6. Click the Submit button to get the generated image

upscale:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the upscaled image

inpaint:

  1. First upload your models to the folder: inputs/image/sd_models/inpaint
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Upload the image to be edited into both the initial image and the mask image fields
  6. In the mask image, select the brush, then the palette, and change the color to #FFFFFF
  7. Paint over the area to be regenerated and enter your request (+ and - for prompt weighting)
  8. Click the Submit button to get the inpainted image

Optional: You can select your vae model

vae = inputs/image/sd_models/vae

gligen:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Enter your prompt (+ and - for prompt weighting) and the GLIGEN phrases (in quotation marks, one per box)
  6. Enter the GLIGEN boxes (e.g. [0.1387, 0.2051, 0.4277, 0.7090] for one box); see the sketch after this list
  7. Click the Submit button to get the generated image
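
For reference, phrases and boxes map onto the GLIGEN pipeline in diffusers roughly like this. A minimal sketch; the checkpoint name follows the diffusers documentation and may differ from what this project actually uses:

```python
# Minimal GLIGEN text-box sketch with diffusers: one phrase per normalized box.
import torch
from diffusers import StableDiffusionGLIGENPipeline

pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-generation-text-box",  # checkpoint name from the diffusers docs
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a birthday cake on a wooden table",
    gligen_phrases=["a birthday cake"],               # one phrase per box
    gligen_boxes=[[0.1387, 0.2051, 0.4277, 0.7090]],  # normalized [x0, y0, x1, y1]
    gligen_scheduled_sampling_beta=1.0,
    num_inference_steps=50,
).images[0]
image.save("outputs/gligen.png")
```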

animatediff:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Set up the model according to the parameters you need
  4. Enter your request (+ and - for prompt weighting)
  5. Click the Submit button to get the generated animation

video:

  1. Upload the initial image
  2. Enter your request (for I2VGen-XL)
  3. Set up the model according to the parameters you need
  4. Click the Submit button to get the video generated from the image

cascade:

  1. Enter your request
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated image

extras:

  1. Upload the initial image
  2. Select the options you need
  3. Click the Submit button to get the modified image

ZeroScope 2:

  1. Enter your request
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated video
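
For reference, ZeroScope 2 text-to-video can be driven through diffusers roughly like this. A minimal sketch; the model ID, prompt and output path are illustrative:

```python
# Minimal ZeroScope 2 text-to-video sketch with diffusers.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w", torch_dtype=torch.float16
).to("cuda")

result = pipe("a drone shot of a waterfall at sunset", num_frames=24)
frames = result.frames[0]  # first video in the batch (indexing differs slightly across diffusers versions)
export_to_video(frames, "outputs/zeroscope.mp4")
```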

TripoSR:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated 3D object

Shap-E:

  1. Enter your request or upload the initial image
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated 3D object
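
For reference, Shap-E text-to-3D runs through diffusers roughly like this. A minimal sketch that renders a turntable GIF preview; the prompt and output path are illustrative:

```python
# Minimal Shap-E text-to-3D sketch with diffusers, rendered as a turntable GIF.
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16).to("cuda")

images = pipe(
    "a red racing car",
    guidance_scale=15.0,
    num_inference_steps=64,
    frame_size=256,
).images
export_to_gif(images[0], "outputs/shap_e.gif")
```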

AudioCraft:

  1. Select a model from the drop-down list
  2. Select model type (musicgen or audiogen)
  3. Set up the model according to the parameters you need
  4. Enter your request
  5. (Optional) upload the initial audio if you are using a melody model
  6. Click the Submit button to get the generated audio

Optional: You can enable multiband diffusion to improve the generated audio
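
For reference, a minimal MusicGen sketch with the audiocraft library; the model size, prompt and output name are illustrative, and the tab exposes the same kind of parameters through the interface:

```python
# Minimal MusicGen sketch with the audiocraft library.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")  # model size is illustrative
model.set_generation_params(duration=8)                     # seconds of audio

wav = model.generate(["an upbeat lo-fi hip hop beat"])      # batch of one prompt
audio_write("outputs/musicgen_sample", wav[0].cpu(), model.sample_rate, strategy="loudness")
```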

AudioLDM 2:

  1. Select a model from the drop-down list
  2. Set up the model according to the parameters you need
  3. Enter your request
  4. Click the Submit button to get the generated audio
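
For reference, AudioLDM 2 runs through diffusers roughly like this. A minimal sketch; the prompt, step count and output path are illustrative:

```python
# Minimal AudioLDM 2 sketch with diffusers; the output is 16 kHz audio.
import scipy.io.wavfile
import torch
from diffusers import AudioLDM2Pipeline

pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2", torch_dtype=torch.float16).to("cuda")

audio = pipe(
    "gentle rain on a tin roof",
    num_inference_steps=200,
    audio_length_in_s=10.0,
).audios[0]
scipy.io.wavfile.write("outputs/audioldm2.wav", rate=16000, data=audio)
```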

Demucs:

  1. Upload the initial audio to separate
  2. Click the Submit button to get the separated audio
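
For reference, Demucs separation can also be run directly from Python by passing CLI-style arguments, as described in the Demucs README. A minimal sketch; the file name and options are illustrative:

```python
# Separate vocals from the rest of a track with Demucs (CLI-style arguments).
import demucs.separate

demucs.separate.main(["--two-stems", "vocals", "-n", "htdemucs", "inputs/audio/track.mp3"])
```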

ModelDownloader:

  • Here you can download LLM and StableDiffusion models. Just choose the model from the drop-down list and click the Submit button

LLM models are downloaded here: inputs/text/llm_models

StableDiffusion models are downloaded here: inputs/image/sd_models
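
For reference, downloading a single model file from Hugging Face into these folders looks roughly like this. A minimal sketch; the repository and file names are placeholders, not necessarily what ModelDownloader fetches:

```python
# Fetch one model file from the Hugging Face Hub into the folder the interface reads.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",   # placeholder LLM repository
    filename="llama-2-7b-chat.Q4_K_M.gguf",    # placeholder model file
    local_dir="inputs/text/llm_models",
)
```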

Settings:

  • Here you can change the application settings. For now you can only change Share mode to True or False

System:

  • Here you can see the indicators of your computer's sensors by clicking on the Submit button
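
For reference, this kind of reading can be collected with psutil. A minimal sketch, not necessarily the exact metrics this tab reports:

```python
# Read basic system sensors with psutil.
import psutil

print(f"CPU usage: {psutil.cpu_percent(interval=1)}%")
mem = psutil.virtual_memory()
print(f"RAM: {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB")
disk = psutil.disk_usage("/")
print(f"Disk: {disk.used / 1e9:.1f} / {disk.total / 1e9:.1f} GB")
```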

Additional Information:

  1. All generations are saved in the outputs folder
  2. You can press the Clear button to reset your selection
  3. To stop the generation process, click the Stop generation button
  4. You can turn off the application using the Close terminal button
  5. You can open the outputs folder by clicking on the Folder button

Where can I get models and voices?

  • LLM models can be taken from HuggingFace or downloaded via ModelDownloader inside the interface
  • StableDiffusion, vae, inpaint, embedding and lora models can be taken from CivitAI or downloaded via ModelDownloader inside the interface
  • AudioCraft, AudioLDM 2, TTS, Whisper, Wav2Lip, SunoBark, MoonDream2, Upscale, GLIGEN, Depth, Pix2Pix, Controlnet, AnimateDiff, Videos, Cascade, Rembg, Roop, CodeFormer, TripoSR, Shap-E, Demucs, ZeroScope and Multiband Diffusion models are downloaded automatically into the inputs folder when they are used
  • You can take voices from anywhere: record your own, take a recording from the Internet, or simply use the ones already included in the project. The main thing is that the voice is pre-processed!

Wiki

Acknowledgment to developers

Many thanks to these projects; thanks to their applications and libraries, I was able to create my application:

First of all, I want to thank the developers of PyCharm and GitHub. With the help of their applications, I was able to create and share my code

Third Party Licenses:

Many models have their own license for use. Before using them, I advise you to familiarize yourself with their licenses:

Donation

If you liked my project and want to donate, here are the options. Thank you very much in advance!

  • CryptoWallet(BEP-20) - 0x3d86bdb5f50b92d0d7Eb44F1a833acC5e91aAEcA

  • "Buy Me A Coffee"
