NeuroSandboxWebUI

(Windows/Linux) Local WebUI for neural network models (LLM, Stable Diffusion, AudioCraft, AudioLDM 2, TTS, Bark, Whisper, Demucs, LibreTranslate, ZeroScope 2, TripoSR, Shap-E, GLIGEN, Wav2Lip, Roop, Rembg, CodeFormer, Moondream 2), written in Python with a Gradio interface

Description:

A simple and convenient interface for using various neural network models. You can communicate with LLM and Moondream 2 using text, voice and image input; use Stable Diffusion to generate images; ZeroScope 2 to generate videos; TripoSR and Shap-E to generate 3D objects; AudioCraft and AudioLDM 2 to generate music and audio; CoquiTTS and SunoBark for text-to-speech; OpenAI Whisper for speech-to-text; Wav2Lip for lip-sync; Roop for face swapping; Rembg for background removal; CodeFormer for face restoration; LibreTranslate for text translation; and Demucs for audio source separation. You can also download LLM and Stable Diffusion models, change the application settings inside the interface and check system sensors

The goal of the project is to create the easiest possible application for using neural network models

Screenshots of each tab: LLM, TTS-STT, SunoBark, LibreTranslate, Wav2Lip, StableDiffusion, ZeroScope 2, TripoSR, Shap-E, AudioCraft, AudioLDM 2, Demucs, ModelDownloader, Settings, System

Features:

  • Easy installation via Install.bat (Windows) or Install.sh (Linux)
  • You can use the application from your mobile device on localhost (via IPv4) or anywhere online (via Share)
  • Flexible and optimized interface (built with Gradio)
  • Authentication via admin:admin (you can enter your own login details in the GradioAuth.txt file; see the sketch after this list)
  • Support for Transformers and llama.cpp models (LLM)
  • Support for diffusers and safetensors models (StableDiffusion) - txt2img, img2img, depth2img, pix2pix, controlnet, upscale, inpaint, gligen, animatediff, video, cascade and extras tabs
  • AudioCraft support (models: musicgen, audiogen and magnet)
  • AudioLDM 2 support (models: audio and music)
  • Support for TTS and Whisper models (for LLM and TTS-STT)
  • Support for Lora, Textual inversion (embedding), Vae, Img2img, Depth, Pix2Pix, Controlnet, Upscale, Inpaint, GLIGEN, AnimateDiff, Videos, Cascade, Rembg, CodeFormer and Roop models (for StableDiffusion)
  • Support for the Multiband Diffusion model (for AudioCraft)
  • Support for LibreTranslate (local API)
  • Support for ZeroScope 2
  • Support for SunoBark
  • Support for Demucs
  • Support for Shap-E
  • Support for TripoSR
  • Support for Wav2Lip
  • Support for multimodal input (Moondream 2), LoRA (Transformers) and web search (with Google Search) for LLM
  • Model settings inside the interface
  • ModelDownloader (for LLM and StableDiffusion)
  • Application settings
  • Ability to see system sensors
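
For reference, the admin:admin authentication and Share mode map onto Gradio's standard launch() options. A minimal sketch follows; the GradioAuth.txt file name comes from this project, but the login:password parsing shown here is an assumption and the project's actual wiring may differ:

```python
# Illustrative sketch: credentials from GradioAuth.txt and the Share setting
# passed to Gradio's launch(). The project's actual code may differ.
import gradio as gr

def read_auth(path="GradioAuth.txt"):
    # Assumed format: a single "login:password" line, e.g. "admin:admin"
    try:
        with open(path, encoding="utf-8") as f:
            login, password = f.read().strip().split(":", 1)
        return login, password
    except (FileNotFoundError, ValueError):
        return "admin", "admin"  # default credentials mentioned above

with gr.Blocks() as demo:
    gr.Markdown("NeuroSandboxWebUI")

# auth= enables the login prompt; share=True creates a public Gradio link;
# server_name="0.0.0.0" exposes the app on your local network (IPv4)
demo.launch(auth=read_auth(), share=False, server_name="0.0.0.0")
```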

Required Dependencies:

Minimum System Requirements:

  • System: Windows or Linux
  • GPU: 6GB+ VRAM or CPU: 8 cores at 3.2 GHz
  • RAM: 16GB+
  • Disk space: 20GB+
  • Internet connection for installation and for downloading models

How to install:

Windows

  1. Clone the repository to any location: git clone https://github.com/Dartvauder/NeuroSandboxWebUI.git
  2. Run Install.bat and wait for the installation to finish
  3. After installation, run Start.bat
  4. Select the file version and wait for the application to launch
  5. Now you can start generating!

To update, run Update.bat. To work with the virtual environment through the terminal, run Venv.bat

Linux

  1. Clone the repository to any location: git clone https://github.com/Dartvauder/NeuroSandboxWebUI.git
  2. In the terminal, run ./Install.sh and wait for all dependencies to install
  3. After installation, run ./Start.sh
  4. Wait for the application to launch
  5. Now you can start generating!

To update, run ./Update.sh. To work with the virtual environment through the terminal, run ./Venv.sh

How to use:

The interface has fifteen tabs: LLM, TTS-STT, SunoBark, LibreTranslate, Wav2Lip, StableDiffusion, ZeroScope 2, TripoSR, Shap-E, AudioCraft, AudioLDM 2, Demucs, ModelDownloader, Settings and System. Select the one you need and follow the instructions below

LLM:

  1. First upload your models to the folder: inputs/text/llm_models
  2. Select your model from the drop-down list
  3. Select model type (transformers or llama)
  4. Set up the model according to the parameters you need
  5. Type (or speak) your request
  6. Click the Submit button to receive the generated text and audio response
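
For reference, a model of the llama type (a GGUF file) is loaded roughly like this through llama-cpp-python. A minimal sketch, with a placeholder model file name and prompt; the project's own loading code may differ:

```python
# Minimal llama.cpp-style generation via llama-cpp-python; file name and prompt are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="inputs/text/llm_models/your_model.gguf", n_ctx=4096)
output = llm("Q: What is the capital of France? A:", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])
```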

Optional: you can enable TTS mode and select the voice and language to receive an audio response; enable multimodal and upload an image to get its description; enable websearch for Internet access; or enable libretranslate to translate the response. You can also choose a LoRA model to improve generation

Voice samples = inputs/audio/voices

LORA = inputs/text/llm_models/lora

The voice must be pre-processed (22050 Hz, mono, WAV)
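
If your recording is not yet in that format, here is a minimal conversion sketch using librosa and soundfile (the input and output file names are illustrative):

```python
# Resample a voice recording to 22050 Hz mono WAV for the voices folder.
import librosa
import soundfile as sf

audio, sr = librosa.load("my_recording.wav", sr=22050, mono=True)  # resample + downmix
sf.write("inputs/audio/voices/my_voice.wav", audio, sr, subtype="PCM_16")
```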

TTS-STT:

  1. Type text for text to speech
  2. Input audio for speech to text
  3. Click the Submit button to receive the generated text and audio response

Voice samples = inputs/audio/voices

The voice must be pre-processed (22050 Hz, mono, WAV)
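
For reference, the speech-to-text side relies on OpenAI Whisper. A minimal sketch of a transcription call with the whisper package (model size and file name are illustrative; the tab handles this for you):

```python
# Transcribe an audio file with OpenAI Whisper.
import whisper

model = whisper.load_model("base")        # model size is illustrative
result = model.transcribe("speech.wav")   # path is illustrative
print(result["text"])
```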

SunoBark:

  1. Type your request
  2. Set up the model according to the parameters you need
  3. Click the Submit button to receive the generated audio response

LibreTranslate:

  1. Select source and target languages
  2. Click the Submit button to get the translation

Optional: you can save the translation history by turning on the corresponding button
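
For reference, a locally running LibreTranslate instance can also be called directly over its API. A minimal sketch with requests; the port and text are illustrative and assume the local instance is already running:

```python
# Send a translation request to a locally running LibreTranslate instance.
import requests

response = requests.post(
    "http://localhost:5000/translate",  # default LibreTranslate port; yours may differ
    json={"q": "Hello, world!", "source": "en", "target": "es", "format": "text"},
    timeout=30,
)
print(response.json()["translatedText"])
```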

Wav2Lip:

  1. Upload the initial image of a face
  2. Upload the initial audio of a voice
  3. Set up the model according to the parameters you need
  4. Click the Submit button to receive the lip-synced video

StableDiffusion - has twelve sub-tabs:

txt2img:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Enter your request (+ and - for prompt weighting)
  6. Click the Submit button to get the generated image

Optional: you can select your vae, embedding and lora models to improve the generation; you can also enable upscale to increase the size of the generated image

vae = inputs/image/sd_models/vae

lora = inputs/image/sd_models/lora

embedding = inputs/image/sd_models/embedding
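
For reference, loading a safetensors checkpoint through diffusers looks roughly like this under the hood. A minimal sketch with a placeholder model file name and prompt; the project's own pipeline setup may differ, and the tab does all of this for you:

```python
# Minimal txt2img sketch with diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "inputs/image/sd_models/your_model.safetensors",  # placeholder file name
    torch_dtype=torch.float16,
).to("cuda")  # use "cpu" and torch.float32 on machines without a GPU

image = pipe(
    prompt="a cozy cabin in a snowy forest",
    negative_prompt="blurry, low quality",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("outputs/txt2img.png")
```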

img2img:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Upload the initial image that the generation will be based on
  6. Enter your request (+ and - for prompt weighting)
  7. Click the Submit button to get the generated image

Optional: You can select your vae model

vae = inputs/image/sd_models/vae

depth2img:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Enter your request (+ and - for prompt weighting)
  4. Click the Submit button to get the generated image

pix2pix:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Enter your request (+ and - for prompt weighting)
  4. Click the Submit button to get the generated image

controlnet:

  1. First upload your stable diffusion models to the folder: inputs/image/sd_models
  2. Upload the initial image
  3. Select your stable diffusion and controlnet models from the drop-down lists
  4. Set up the models according to the parameters you need
  5. Enter your request (+ and - for prompt weighting)
  6. Click the Submit button to get the generated image

upscale:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the upscaled image

inpaint:

  1. First upload your models to the folder: inputs/image/sd_models/inpaint
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Upload the image to be edited into both the initial image and the mask image fields
  6. In the mask image, select the brush, then the palette, and change the color to #FFFFFF
  7. Paint over the area to be regenerated and enter your request (+ and - for prompt weighting)
  8. Click the Submit button to get the inpainted image

Optional: You can select your vae model

vae = inputs/image/sd_models/vae

gligen:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Select model type (SD, SD2 or SDXL)
  4. Set up the model according to the parameters you need
  5. Enter your prompt (+ and - for prompt weighting) and the GLIGEN phrases (in quotation marks, one per box)
  6. Enter the GLIGEN boxes (e.g. [0.1387, 0.2051, 0.4277, 0.7090] for one box); see the sketch after this list
  7. Click the Submit button to get the generated image
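
For reference, phrases and boxes map onto the GLIGEN pipeline in diffusers roughly like this. A minimal sketch; the checkpoint name follows the diffusers documentation and may differ from what this project actually uses:

```python
# Minimal GLIGEN text-box sketch with diffusers: one phrase per normalized box.
import torch
from diffusers import StableDiffusionGLIGENPipeline

pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-generation-text-box",  # checkpoint name from the diffusers docs
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a birthday cake on a wooden table",
    gligen_phrases=["a birthday cake"],               # one phrase per box
    gligen_boxes=[[0.1387, 0.2051, 0.4277, 0.7090]],  # normalized [x0, y0, x1, y1]
    gligen_scheduled_sampling_beta=1.0,
    num_inference_steps=50,
).images[0]
image.save("outputs/gligen.png")
```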

animatediff:

  1. First upload your models to the folder: inputs/image/sd_models
  2. Select your model from the drop-down list
  3. Set up the model according to the parameters you need
  4. Enter your request (+ and - for prompt weighting)
  5. Click the Submit button to get the generated animation

video:

  1. Upload the initial image
  2. Enter your request (for I2VGen-XL)
  3. Set up the model according to the parameters you need
  4. Click the Submit button to get the video generated from the image

cascade:

  1. Enter your request
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated image

extras:

  1. Upload the initial image
  2. Select the options you need
  3. Click the Submit button to get the modified image

ZeroScope 2:

  1. Enter your request
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated video
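
For reference, ZeroScope 2 text-to-video can be driven through diffusers roughly like this. A minimal sketch; the model ID, prompt and output path are illustrative:

```python
# Minimal ZeroScope 2 text-to-video sketch with diffusers.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "cerspense/zeroscope_v2_576w", torch_dtype=torch.float16
).to("cuda")

result = pipe("a drone shot of a waterfall at sunset", num_frames=24)
frames = result.frames[0]  # first video in the batch (indexing differs slightly across diffusers versions)
export_to_video(frames, "outputs/zeroscope.mp4")
```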

TripoSR:

  1. Upload the initial image
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated 3D object

Shap-E:

  1. Enter your request or upload the initial image
  2. Set up the model according to the parameters you need
  3. Click the Submit button to get the generated 3D object
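
For reference, Shap-E text-to-3D runs through diffusers roughly like this. A minimal sketch that renders a turntable GIF preview; the prompt and output path are illustrative:

```python
# Minimal Shap-E text-to-3D sketch with diffusers, rendered as a turntable GIF.
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_gif

pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16).to("cuda")

images = pipe(
    "a red racing car",
    guidance_scale=15.0,
    num_inference_steps=64,
    frame_size=256,
).images
export_to_gif(images[0], "outputs/shap_e.gif")
```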

AudioCraft:

  1. Select a model from the drop-down list
  2. Select model type (musicgen or audiogen)
  3. Set up the model according to the parameters you need
  4. Enter your request
  5. (Optional) upload the initial audio if you are using a melody model
  6. Click the Submit button to get the generated audio

Optional: You can enable multiband diffusion to improve the generated audio
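
For reference, a minimal MusicGen sketch with the audiocraft library; the model size, prompt and output name are illustrative, and the tab exposes the same kind of parameters through the interface:

```python
# Minimal MusicGen sketch with the audiocraft library.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")  # model size is illustrative
model.set_generation_params(duration=8)                     # seconds of audio

wav = model.generate(["an upbeat lo-fi hip hop beat"])      # batch of one prompt
audio_write("outputs/musicgen_sample", wav[0].cpu(), model.sample_rate, strategy="loudness")
```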

AudioLDM 2:

  1. Select a model from the drop-down list
  2. Set up the model according to the parameters you need
  3. Enter your request
  4. Click the Submit button to get the generated audio
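
For reference, AudioLDM 2 runs through diffusers roughly like this. A minimal sketch; the prompt, step count and output path are illustrative:

```python
# Minimal AudioLDM 2 sketch with diffusers; the output is 16 kHz audio.
import scipy.io.wavfile
import torch
from diffusers import AudioLDM2Pipeline

pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2", torch_dtype=torch.float16).to("cuda")

audio = pipe(
    "gentle rain on a tin roof",
    num_inference_steps=200,
    audio_length_in_s=10.0,
).audios[0]
scipy.io.wavfile.write("outputs/audioldm2.wav", rate=16000, data=audio)
```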

Demucs:

  1. Upload the initial audio to separate
  2. Click the Submit button to get the separated audio
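
For reference, Demucs separation can also be run directly from Python by passing CLI-style arguments, as described in the Demucs README. A minimal sketch; the file name and options are illustrative:

```python
# Separate vocals from the rest of a track with Demucs (CLI-style arguments).
import demucs.separate

demucs.separate.main(["--two-stems", "vocals", "-n", "htdemucs", "inputs/audio/track.mp3"])
```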

ModelDownloader:

  • Here you can download LLM and StableDiffusion models. Just choose the model from the drop-down list and click the Submit button

LLM models are downloaded here: inputs/text/llm_models

StableDiffusion models are downloaded here: inputs/image/sd_models
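
For reference, downloading a single model file from Hugging Face into these folders looks roughly like this. A minimal sketch; the repository and file names are placeholders, not necessarily what ModelDownloader fetches:

```python
# Fetch one model file from the Hugging Face Hub into the folder the interface reads.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",   # placeholder LLM repository
    filename="llama-2-7b-chat.Q4_K_M.gguf",    # placeholder model file
    local_dir="inputs/text/llm_models",
)
```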

Settings:

  • Here you can change the application settings. For now you can only change Share mode to True or False

System:

  • Here you can see the indicators of your computer's sensors by clicking on the Submit button
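
For reference, this kind of reading can be collected with psutil. A minimal sketch, not necessarily the exact metrics this tab reports:

```python
# Read basic system sensors with psutil.
import psutil

print(f"CPU usage: {psutil.cpu_percent(interval=1)}%")
mem = psutil.virtual_memory()
print(f"RAM: {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB")
disk = psutil.disk_usage("/")
print(f"Disk: {disk.used / 1e9:.1f} / {disk.total / 1e9:.1f} GB")
```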

Additional Information:

  1. All generations are saved in the outputs folder
  2. You can press the Clear button to reset your selection
  3. To stop the generation process, click the Stop generation button
  4. You can turn off the application using the Close terminal button
  5. You can open the outputs folder by clicking on the Folder button

Where can I get models and voices?

  • LLM models can be taken from HuggingFace or downloaded via ModelDownloader inside the interface
  • StableDiffusion, vae, inpaint, embedding and lora models can be taken from CivitAI or downloaded via ModelDownloader inside the interface
  • AudioCraft, AudioLDM 2, TTS, Whisper, Wav2Lip, SunoBark, MoonDream2, Upscale, GLIGEN, Depth, Pix2Pix, Controlnet, AnimateDiff, Videos, Cascade, Rembg, Roop, CodeFormer, TripoSR, Shap-E, Demucs, ZeroScope and Multiband Diffusion models are downloaded automatically into the inputs folder when they are used
  • You can take voices from anywhere: record your own, take a recording from the Internet, or simply use the ones already included in the project. The main thing is that the voice is pre-processed!

Wiki

Acknowledgment to developers

Many thanks to these projects; thanks to their applications and libraries, I was able to create my application:

First of all, I want to thank the developers of PyCharm and GitHub. With the help of their applications, I was able to create and share my code

Third Party Licenses:

Many models have their own license for use. Before using them, I advise you to familiarize yourself with their licenses:

Donation

If you liked my project and want to donate, here are the options. Thank you very much in advance!

  • CryptoWallet(BEP-20) - 0x3d86bdb5f50b92d0d7Eb44F1a833acC5e91aAEcA

  • "Buy Me A Coffee"
