benbenz/django-llm-portal

features

  • same features as ryan-blunden/django-chatgpt-clone
  • safer handling of client conversations (not using window.localStorage)
  • conversations stored in the database, per user
  • added R.A.G. mode using LlamaIndex
  • document preview + highlighting
  • "dynamic applications" (DynApps): upload a set of documents
  • connection to the MS Graph API for live SharePoint ingestion
  • sequential ingestion (repetitive), with recovery
  • parametrization of RAG through settings: chat behavior, query type (summarize, etc.)
  • handling of OpenAI, Azure OpenAI and local LLMs
  • handling of multiple indexing configurations (loading params, chunking params, index type, embeddings, language, etc.)
  • handling of different vector stores: simple (default), Qdrant, pgvector, Chroma, Elasticsearch
  • handling of multiple tenants + applications from the same code base
  • possibility of mutualizing the embeddings when creating indexes across multiple stores
  • search feature (powered by Elasticsearch)
  • ASGI + asyncio implementation (as well as WSGI + sync)
  • full offline mode with local LLM/embeddings and local static files
  • TODO: different levels of security and sharding per tenant

install

docker-compose

Make sure:

  1. The HOST environment variable is set in the .env file.
  2. The HOST value also matches the ./.certs/HOST/ path, which must exist for Nginx to work.
  3. The logs, data and storage directories exist.
  4. apps.json and llms.json are present in the chat fixtures, users.json in the users fixtures, and the .env file is ready (see the setup sketch below).
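
A minimal prerequisite sketch (hypothetical values; replace llm.example.com with your actual HOST, and use real certificates instead of the localhost script for real deployments):

# example only: adapt the host name and paths to your deployment
echo 'HOST=llm.example.com' >> .env
mkdir -p ./.certs/llm.example.com logs data storage
cp src/chat/fixtures/apps_default.json src/chat/fixtures/apps.json
cp src/chat/fixtures/llms_default.json src/chat/fixtures/llms.json
cp src/users/fixtures/users_default.json src/users/fixtures/users.json
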
# install docker
https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository

# generate localhost certificates if you want to test locally
./make_certs_localhost.sh

# run
docker-compose up -d
# or (using the plugin)
docker compose up -d

# build and run
docker-compose up -d --build

# if issues: force rebuild with debug output
docker-compose build --progress=plain --no-cache

# zoom in (open a shell in the app container)
docker-compose run --rm -it django_app bash

locally

applications / env

nix-shell # if not using nix-shell, you will need the openssl library, libjpeg, zlib and docker (if using qdrant)
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements/local.txt 
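
If you are not using nix-shell, the system libraries mentioned above can be installed on Debian/Ubuntu with something like this (the usual dev package names; adjust for your distribution):

sudo apt-get install -y libssl-dev libjpeg-dev zlib1g-dev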

Init the *.json fixture files and the portal image

cp src/chat/fixtures/apps_default.json src/chat/fixtures/apps.json 
cp src/chat/fixtures/llms_default.json src/chat/fixtures/llms.json 
cp src/chat/fixtures/tenants_default.json src/chat/fixtures/tenants.json 
cp src/users/fixtures/users_default.json src/users/fixtures/users.json 
curl http://...... --output src/llm_portal/static/images/portal_image.svg

# edit the files to your liking ...

init data

python manage.py migrate
python manage.py inittenants
python manage.py initusers
python manage.py initapps
python manage.py initllms

set config

cp sample.env .env

... edit the .env config ... (API keys, system prompts, options, etc.)

Index the data

Edit the ./idx files to your liking:

cp idx/rag.idx.sample idx/rag.idx
cp idx/ragdyn.idx.sample idx/ragdyn.idx
# use the no cache config to index ...
DJANGO_SETTINGS_MODULE=llm_portal.settings.production_no_cache python manage.py index YOUR_APP

Build the Tailwind files / deploy etc.

./reset_deploy.sh

run server + indexing + cleaning process for dynapps & files

DynApps indexing process

python manage.py index_dynapps --use_loop

DynApps and Files cleaning process

python manage.py clean_all --use_loop
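
For local development you can run all three processes side by side, e.g. as below (a rough sketch; in production you would typically run each one under a process supervisor or in its own container):

# background the two loops, then start the dev server
python manage.py index_dynapps --use_loop &
python manage.py clean_all --use_loop &
python manage.py runserver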

Sync

python manage.py runserver

gunicorn -w 8 src.llm_portal.wsgi

Test sync concurrency with: gunicorn -w 1 src.llm_portal.wsgi (with a single sync worker, requests should actually NOT be served concurrently)

Important note: the streaming response is currently blocking; if you only have one worker registered with the server, it will block the other requests until the streaming response is finished!
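
One possible mitigation (an assumption, not something documented by the project): run gunicorn with threaded workers so a blocking stream only ties up one thread instead of a whole worker:

gunicorn -w 4 --threads 4 --worker-class gthread src.llm_portal.wsgi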

Async

python -m uvicorn --loop asyncio src.llm_portal.asgi:application

Test async concurrency with: python -m uvicorn --workers 1 --loop asyncio src.llm_portal.asgi:application

The async URLs are /async/chat and /async/files

Important Note: if you want to run the portal with the async methods, you should use uvicorn instead of daphne. The streaming is choppy with daphne but not with uvicorn...
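
For a production-style async run with several workers (mirroring the load-testing command further below):

python -m uvicorn --workers 4 --loop asyncio --host 0.0.0.0 --port 8000 src.llm_portal.asgi:application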

vector stores

For more information about the vector stores, please see here
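
To try one of the non-default stores locally, Qdrant can be started with Docker (generic Qdrant usage, not project-specific configuration; the connection details still have to be provided through the portal settings/.env):

# local Qdrant instance, HTTP API on port 6333, data persisted under ./data/qdrant
docker run -d -p 6333:6333 -v $(pwd)/data/qdrant:/qdrant/storage qdrant/qdrant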

Using Nginx

When using Nginx, the following configuration is recommended:

server {
    listen 443 ssl;

    server_name YOUR_DOMAIN;
    index index.php index.html index.htm;

    ssl_certificate      PATH_TO_FULLCHAIN_PEM;
    ssl_certificate_key  PATH_TO_PRIVKEY_PEM;

    gzip on;
    gzip_disable "msie6";

    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript application/pdf;

    location / {
        proxy_pass YOUR_UVICORN_HOST_PORT;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header Host $http_host;
        proxy_buffering off;
        proxy_redirect off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    error_page 404 /404.html;
    error_page 500 502 503 504 /50x.html;
    
    location = /50x.html {
        root /usr/share/nginx/html;
    }
}
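
After editing the configuration, validate it and reload Nginx:

sudo nginx -t
sudo systemctl reload nginx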

Testing

  1. Make sure production docker containers are running

  2. Start PGBouncer (if using llm_portal.settings.loadtesting)

cd conf/pgbouncer
pgbouncer pgbouncer.simple.ini
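
For reference, a minimal pgbouncer configuration of this shape might look like the following (a hypothetical sketch, not the actual file shipped with the repo; the listen port and database aliases follow the .env example below, while the backend dbnames are placeholders):

[databases]
db_bouncer_pgsql    = host=127.0.0.1 port=5432 dbname=YOUR_DJANGO_DB
db_bouncer_pgvector = host=127.0.0.1 port=5432 dbname=YOUR_PGVECTOR_DB

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = plain
auth_file = userlist.txt
max_client_conn = 10000
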
  3. Init users and apps, using users_default.conf/apps_default.conf instead of users.conf/apps.conf:
DJANGO_SETTINGS_MODULE=llm_portal.settings.loadtesting python manage.py initusers --use_default
# for v2+
DJANGO_SETTINGS_MODULE=llm_portal.settings.loadtesting python manage.py initapps --use_default
# And init the LLMs now (for v2)
DJANGO_SETTINGS_MODULE=llm_portal.settings.loadtesting python manage.py initllms
  4. Index the 'testing' application
DJANGO_SETTINGS_MODULE=llm_portal.settings.loadtesting python manage.py index testing
# or, older way (v1)
DJANGO_SETTINGS_MODULE=llm_portal.settings.loadtesting APP_NAME=testing python manage.py

Note: you need to have some data in the data/testing directory ...

  5. Run the server

For v1: make sure the .env settings connect to pgbouncer and not to the databases directly. You can use the same users/passwords in pgbouncer's userlist.txt, which avoids having to rewrite the user/password in the .env file every time you want to switch back to direct connections.

...
PGVECTOR_HOST="127.0.0.1"
PGVECTOR_DB="db_bouncer_pgvector" # HERE
PGVECTOR_PORT="6432" # HERE
PGVECTOR_USER="pgvectoruser"
PGVECTOR_PASSWORD="pgvectorpassword"
PGVECTOR_TEXT_SEARCH_CONFIG="english"
...
DB_ENGINE="django.db.backends.postgresql"
DB_HOST="localhost"
DB_NAME="db_bouncer_pgsql" # HERE
DB_PORT="6432" # HERE
DB_USER="pgsql"
DB_PASSWORD="pgsqlpassword"
...
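
The pgbouncer userlist.txt mentioned above holds one "user" "password" pair per line; with the credentials from this example it would contain (plain-text auth assumed):

"pgsql" "pgsqlpassword"
"pgvectoruser" "pgvectorpassword"
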
# with uvicorn (async)
DJANGO_SETTINGS_MODULE=llm_portal.settings.loadtesting APP_NAME=testing python -m  uvicorn --workers 4 --port 8000 --host 0.0.0.0 --loop asyncio src.llm_portal.asgi:application

# with gunicorn (sync)
DJANGO_SETTINGS_MODULE=llm_portal.settings.loadtesting APP_NAME=testing gunicorn -w 4 -b :8000 --timeout 320 src.llm_portal.wsgi
  6. Functional testing
DJANGO_SETTINGS_MODULE=llm_portal.settings.testing python manage.py test
  7. Load testing with Locust
locust -f test/load/locustfile.py

Note 1: Make sure that the variables LOCUST_*** are correct in the .env file.

Note 2: If the client can't login and you see this error in the Locust debug: "gaierror(8, 'nodename nor servname provided, or not known')", you may need to add your domain to the /etc/hosts file. For example, for http://testing.localhost:8000/, add 127.0.0.1 testing.localhost

Note 3: Do NOT add a trailing slash at the end of the server URL
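
For unattended runs, Locust can also be started headless with standard Locust options (example values only):

# 50 simulated users, spawned 5 per second, for 5 minutes
locust -f test/load/locustfile.py --headless -u 50 -r 5 --run-time 5m --host http://testing.localhost:8000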

Notes (from older project ... to be updated)

  1. PostgreSQL:
  • Ubuntu: /etc/postgresql/15/main/postgresql.conf

  • set max db connections in PostgreSQL to 100

  • set work_mem to 96M maybe?

  • Don't forget to make sure these are set:
    ALTER ROLE DB_USER SET client_encoding TO 'utf8';
    ALTER ROLE DB_USER SET default_transaction_isolation TO 'read committed';
    ALTER ROLE DB_USER SET timezone TO 'UTC';

    For PostgreSQL 15: ALTER DATABASE DB_NAME OWNER TO DB_USER;

  • Tuner for PostgreSQL: https://pgtune.leopard.in.ua/#/

  2. pgbouncer (new terminal window):
  • set max_client_conn in pgbouncer to 10000
  • set ulimit to 10212 (as suggested by pgbouncer at startup): ulimit -n 10212
  • pgbouncer pgbouncer.simple.ini -q
  3. Memcached (2 GB memory allocated + 5000 concurrent connections) (new terminal window):
  • ulimit -n 10212
  • memcached -m 2048 -c 5000 -vvv start, or
  • memcached -m 2048 -c 5000 -s ../3rdparty/memcached/memcached.sock -vvv -a 0770 start, and change settings.py so that the location is the ./3rdparty/memcached.sock file
  4. Django (new terminal window):
  • python manage.py makemigrations
  • python manage.py migrate (issue with PostgreSQL 15: https://stackoverflow.com/questions/74110708/postgres-15-permission-denied-for-schema-public (SCHEMA PUBLIC etc.))
  • python manage.py initdata_massive
  • 'CONN_MAX_AGE': 0 in the DB settings
  • ulimit -n 10212 (before starting the server too) (for files)
  • launchctl limit maxproc 2000 2048 (for processes/threads)
  • ulimit -u 1000 (for processes/threads) (note: sometimes these changes won't be accepted: open a new window and try again)
  • source .venv/bin/activate
  • DJANGO_SETTINGS_MODULE=t3mp3st_api.settings.loadtesting python manage.py runserver, or
  • DJANGO_SETTINGS_MODULE=t3mp3st_api.settings.loadtesting gunicorn --workers=8 --threads=2 --worker-connections=3000 t3mp3st_api.wsgi:application, or
  • DJANGO_SETTINGS_MODULE=t3mp3st_api.settings.loadtesting uwsgi --http :8000 --wsgi-file ./t3mp3st_api/wsgi.py --master --processes 4, or
  • DJANGO_SETTINGS_MODULE=t3mp3st_api.settings.loadtesting daphne -b 0.0.0.0 -p 8001 t3mp3st_api.asgi:application, or
  • DJANGO_SETTINGS_MODULE=t3mp3st_api.settings.loadtesting uvicorn --workers=5 --lifespan off t3mp3st_api.asgi:application, or
  • DJANGO_SETTINGS_MODULE=t3mp3st_api.settings.loadtesting uvicorn --workers=9 --lifespan off --host 0.0.0.0 t3mp3st_api.asgi:application

If uvicorn runs on AWS/EC2, use the --host 0.0.0.0 option.

  5. Locust (new terminal window):
  • source .venv/bin/activate
  • raise the ulimit as well (as suggested by Locust), otherwise it won't be able to access the hosts file and you will get "gaierror(8, 'nodename nor servname provided, or not known')": ulimit -n 10000
  • locust -f ../test/load/locustfile.py
  6. Run the test:

LOAD TEST ISSUES:

  • use the loadtesting settings (based on the production settings) to avoid more "too many open files" issues
  • too many open files >> ulimit -n 10212 (for django, memcached, locust and pgbouncer)
  • too many DB connections >> use pgbouncer
  • too many threads:

Using the async server with daphne:
DJANGO_SETTINGS_MODULE=t3mp3st_api.settings.loadtesting daphne -b 0.0.0.0 -p 8001 t3mp3st_api.asgi:application

Using the async server with uvicorn:
DJANGO_SETTINGS_MODULE=t3mp3st_api.settings.loadtesting uvicorn --workers=9 --lifespan off t3mp3st_api.asgi:application

On the local machine:

memcached -m 2048 -c 5000 -s ../3rdparty/memcached/memcached.sock -a 0770 start
pgbouncer pgbouncer.simple.ini -q

sudo -u postgres /Library/PostgreSQL/11/bin/pg_ctl -D /Library/PostgreSQL/11/data restart

More about system limits: https://unix.stackexchange.com/questions/108174/how-to-persistently-control-maximum-system-resource-consumption-on-mac
