
Add docker support #1239

Draft · wants to merge 7 commits into master
Conversation

@nopperl commented May 31, 2023

Description

Add support for running the UI using Docker.
Since the previous PRs (#403, #844) stalled, I merged their approaches and fixed the remaining issues.

Notes

To improve security, the process is run as a non-root user inside the container. Since bind mounts are owned by root inside the container, the entrypoint.sh script changes their ownership to the non-root user to make them writable.
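A minimal sketch of that pattern (not the actual entrypoint.sh from this PR; the variable names and the use of gosu are illustrative assumptions):

#!/usr/bin/env bash
set -e

# Bind mounts are created as root, so hand them to the unprivileged user first
chown -R "$RUN_UID:$RUN_UID" "$DATA_DIR"

# Activate the bundled venv, then drop privileges and start the UI
. "$INSTALLDIR/venv/bin/activate"
exec gosu "$RUN_UID" python launch.py --data-dir "$DATA_DIR" "$@"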

Environment and Testing

  • Ubuntu 22.04
  • Docker 24.0.2
  • Nvidia Container Toolkit 1.13.1

@vladmandic (Owner)

thanks for picking this up!

tcmalloc is amazing, but i don't want to go down the path of installing it myself. can you remove all mentions of it?
and yes, tcmalloc should make its way into the faq, but that's beside the point

don't modify README.md - better to create a Wiki page for Docker, and then it can be as short or as long as you want
i can create a link on README.md that points to the Wiki page

do we need default ./data at all?
i totally agree that --data-dir should be specified, but why default to ./data?

@nopperl (Author) commented Jun 2, 2023

@vladmandic I have now removed tcmalloc and the changes to the README.md.

Regarding --data-dir, I thought that using a subdir of the workdir of the container would be a sane default. But using /data or something else as default also works.

@vladmandic (Owner)

just an idea - having data inside the container is really against the concept of containers.
how about making --data-dir mandatory instead?

for example:

RUN [ -z "$DATA_DIR" ] && echo "Must specify data directory" && exit 1 || true
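Since a RUN instruction executes at build time and never sees runtime flags, the actual check presumably belongs in entrypoint.sh; a sketch, assuming the DATA_DIR environment variable used in the compose file later in this thread:

if [ -z "$DATA_DIR" ]; then
    echo "Must specify a data directory (DATA_DIR / --data-dir)" >&2
    exit 1
fi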

@nopperl (Author) commented Jun 2, 2023

Good idea, I have made it mandatory now

@vladmandic (Owner)

looks good to me, but please tell me you've actually tested it? :)

@nopperl (Author) commented Jun 2, 2023

I built it again from scratch and noticed an error ^^
The requirements.txt file was ignored due to the /*.txt entry in the .dockerignore file. Now it works.

@FullBleed

Can you guys talk about the security benefits/pros and cons of using this?

@vladmandic (Owner)

Can you guys talk about the security benefits/pros and cons of using this?

talk about benefits of using docker in general? not really, that's really outside the scope of this pr - this is to provide a simple-to-use template.

@nopperl (Author) commented Jun 20, 2023

@vladmandic I think it's ready to be merged.

@staff0rd commented Jul 5, 2023

I merged master into this and have the following findings regarding docker compose up:

  • --skip-update appears to be no longer valid and should be removed
  • On recreate, installation is attempted again, including downloading all packages (torch, torchvision, etc.); the venv, or wherever those get installed, should probably be a volume as well

@Kubuxu (Contributor) commented Jul 6, 2023

I would also suggest making the first argument to the entrypoint webui and setting it by default with RUN ["webui"]; if the first argument is different from webui, exec the arguments directly.

environment:
  DATA_DIR: "./data"
volumes:
  - ./data:/webui/data
@staff0rd commented Jul 6, 2023

Solves the reinstall problem noted here.

Suggested change (docker-compose.yml volumes):
  - ./data:/webui/data
  - ./venv/lib:/webui/venv/lib
  - ./repositories:/webui/repositories


Even with the above, I still see the following on re-create, but I'm not sure where these are coming from:

Downloading (…)olve/main/vocab.json: 100% 961k/961k [00:00<00:00, 34.7MB/s]
Downloading (…)olve/main/merges.txt: 100% 525k/525k [00:00<00:00, 43.1MB/s]
Downloading (…)cial_tokens_map.json: 100% 389/389 [00:00<00:00, 2.82MB/s]
Downloading (…)okenizer_config.json: 100% 905/905 [00:00<00:00, 3.26MB/s]
Downloading (…)lve/main/config.json: 100% 4.52k/4.52k [00:00<00:00, 13.2MB/s]

@djmaze commented Jul 11, 2023

It would be cleaner to declare those folders as VOLUMEs in the Dockerfile. Then you could even leave out the bind mounts in the compose file, so they are created as anonymous volumes at launch.
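For example, a sketch of what that could look like in the Dockerfile (paths taken from the suggestion above):

VOLUME ["/webui/data", "/webui/venv/lib", "/webui/repositories"]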

(Contributor)

/webui/venv/lib contains files that were installed when building the image, so I think it would require additional changes to mount it

@nopperl (Author) commented Jul 6, 2023

It's very unfortunate that the --skip-update flag was removed; thanks for bringing it to my attention @staff0rd. I think solving this indirectly by storing the packages and repositories in a bind-mounted directory is suboptimal, since they're not application state and should be stored within the container. @vladmandic, is there a plan to bring --skip-update back, or is there an equivalent feature?

@nopperl (Author) commented Jul 6, 2023

@Kubuxu thanks for the suggestions, I've fixed the env vars.

I would also suggest making the first argument to the entrypoint webui and setting it by default with RUN ["webui"]; if the first argument is different from webui, exec the arguments directly.

Could you clarify what you meant by this? Essentially running webui.sh instead of python launch.py in entrypoint.sh per default (with the possibility of specifying other commands)?

@Kubuxu (Contributor) commented Jul 8, 2023

Could you clarify what you meant by this? Essentially running webui.sh instead of python launch.py in entrypoint.sh per default (with the possibility of specifying other commands)?

python launch.py is fine (even better as webui.sh is not needed). I didn't notice that you didn't use webui.sh.

Correction, not RUN but CMD.

But in essence, have the default run command in CMD, either as "python", "launch.py" or as a webui "alias" which is handled by entrypoint.sh; this then allows one to override it.

So for example

ENTRYPOINT ["/bin/bash", "-c", "${INSTALLDIR}/entrypoint.sh \"$0\" \"$@\""] # same as today
CMD ["webui"] 

Then entrypoint.sh should detect webui at $1, activate the env and call python launch.py; otherwise it executes the given command.
See the postgres entrypoint as an example:

#!/usr/bin/env bash
set -e

if [ "$1" = 'postgres' ]; then
    chown -R postgres "$PGDATA"

    if [ -z "$(ls -A "$PGDATA")" ]; then
        gosu postgres initdb
    fi
    shift
    exec gosu postgres "$@"
fi

exec "$@"

This allows the user both to pass params to launch.py, like docker run image webui --api --backend diffusers, and to run custom commands to test the image, e.g. docker run --rm image nvidia-smi.
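Adapted to this image, a sketch of that pattern (INSTALLDIR and python launch.py are taken from elsewhere in this thread; the rest is illustrative):

#!/usr/bin/env bash
set -e

if [ "$1" = 'webui' ]; then
    # Drop the "webui" alias, activate the venv and start the UI with any remaining args
    shift
    . "$INSTALLDIR/venv/bin/activate"
    exec python launch.py "$@"
fi

# Anything else is executed verbatim, e.g. nvidia-smi
exec "$@"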

@nopperl (Author) commented Jul 10, 2023

@Kubuxu I think what you want to do here is already possible using the --entrypoint flag of docker run. So, for your example, you can do docker run --rm --entrypoint nvidia-smi image to override the entrypoint.

@Kubuxu (Contributor) commented Jul 10, 2023

Yeah, this is another way of doing this. We can go down the --entrypoint path instead.

Dockerfile (outdated):
@@ -0,0 +1,51 @@
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
@djmaze commented Jul 11, 2023

I propose allowing the use of different CUDA versions by using the following instead (adapted from llama.cpp):

ARG UBUNTU_VERSION=22.04
# This needs to generally match the container host's environment.
ARG CUDA_VERSION=11.8.0
# Target the CUDA runtime image
ARG BASE_CUDA_CONTAINER=nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_CONTAINER}

@nopperl (Author)

I have implemented this now, although the user needs to be careful to specify compatible CUDA and Ubuntu versions.
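For example, to build against a different base (the tag values are only illustrative and must exist on Docker Hub):

docker build --build-arg CUDA_VERSION=12.1.0 --build-arg UBUNTU_VERSION=20.04 -t sdnext:cuda12.1 .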

fi

# Ensure that potentially bind-mounted directories are owned by the user that runs the service
chown -R $RUN_UID:$RUN_UID $DATA_DIR

I propose to add these lines as well. Otherwise, image generation works only if the option "Always save all generated images" is enabled.

# Create directory for temporary files and assign it to the user that runs the service
mkdir /tmp/gradio
chown -R $RUN_UID:$RUN_UID /tmp/gradio


# Install automatic1111 dependencies (installer.py)
RUN . $INSTALLDIR/venv/bin/activate && \
    python installer.py && \
(Contributor)

I don't think this does anything; installer.py looks like it doesn't run any code if one tries to run it directly.

I think python launch.py --test is what you would want, except it installs the CPU-only version of Torch because docker compose build doesn't support runtimes, if I understood it correctly. There is a --use-cuda flag, but it doesn't actually force the use of CUDA; perhaps installer.py could be modified so that it does?

@vladmandic (Owner)

there are plenty of flags that can be used as-is, there is even --skip-torch. but installing packages AND skipping torch is not viable since plenty of packages down the list require torch, so they would pull it in.

(Contributor)

I think the goal here is to install torch and other dependencies while building the docker image, so skipping it wouldn't be of much use :) Bypassing the nvidia-smi check when --use-cuda is present makes this work for me at least.

The arg description says "force use nVidia CUDA backend", so IMO it would be ok to skip the check and crash if it doesn't work, but it's of course up to you to decide how you want your app to behave.

@vladmandic (Owner)

Don't assume the docker image is for Nvidia only, but it's OK to require one of the --use-xxx params to be provided.

@hazrpg commented Jul 30, 2023

I had issues building this from within Ubuntu (20.04). I'm going to document my experience so that you can see the troubles I had along the way, to hopefully help me fix them, but ultimately to fix it for others who might use it once this has been merged. Please don't take this as negative criticism at all, because I really do appreciate all the hard work you guys are putting into this! I hope my experiences can help get this accepted. I just wish I knew more to help move things along.

I kept getting the error:

$ docker-compose up
ERROR: The Compose file './docker-compose.yml' is invalid because:
'name' does not match any of the regexes: '^x-'

You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.

which I fixed by removing name: sd-automatic from the docker-compose.yml file, since version 3.9 is defined on line 1 and name: is not supported there. Please see: https://docs.docker.com/compose/compose-file/compose-file-v3/

You can also confirm it using the command docker-compose config which will tell you if the compose file is formatted correctly.

After I got past that error by removing the name variable, this was the error I got:

$ docker-compose up      
Building nvidia
Sending build context to Docker daemon   38.6MB
Step 1/17 : ARG UBUNTU_VERSION=22.04     CUDA_VERSION=11.8.0     BASE_CUDA_CONTAINER=nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION}
Step 2/17 : FROM ${BASE_CUDA_CONTAINER}
invalid reference format
ERROR: Service 'nvidia' failed to build : Build failed

For some reason BASE_CUDA_CONTAINER=nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION} isn't being evaluated properly. I had to fix this by hardcoding it in the file, so the line was:

ARG UBUNTU_VERSION=22.04 \
    CUDA_VERSION=11.8.0 \
    BASE_CUDA_CONTAINER=nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu20.04

And I changed it to:

ARG UBUNTU_VERSION=20.04 \
    CUDA_VERSION=12.1.0 \
    BASE_CUDA_CONTAINER=nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu20.04

Although I changed it to 20.04 and 12.1.0 (which I confirmed by going to: https://hub.docker.com/r/nvidia/cuda/tags?page=1&name=12.1.0-cudnn8-runtime-ubuntu), I'm pretty sure changing it to:

ARG UBUNTU_VERSION=22.04 \
    CUDA_VERSION=11.8.0 \
    BASE_CUDA_CONTAINER=nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04

Would work fine since that does exist too: https://hub.docker.com/r/nvidia/cuda/tags?page=1&name=11.8.0-cudnn8-runtime-ubuntu

The main issue seems to be that BASE_CUDA_CONTAINER doesn't accept the variables ${CUDA_VERSION} and ${UBUNTU_VERSION}, even though in my mind that looks sane. I tried putting quotes in so that the full line was BASE_CUDA_CONTAINER="nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION}" but that didn't work.
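A likely cause, though not verified here: declaring several variables in a single ARG instruction is only supported by newer BuildKit-based builders, so the legacy builder used by docker-compose v1 appears to leave BASE_CUDA_CONTAINER empty, and FROM then fails with "invalid reference format". Splitting the declaration into one ARG per line should work with both builders:

ARG UBUNTU_VERSION=22.04
ARG CUDA_VERSION=11.8.0
ARG BASE_CUDA_CONTAINER=nvidia/cuda:${CUDA_VERSION}-cudnn8-runtime-ubuntu${UBUNTU_VERSION}

FROM ${BASE_CUDA_CONTAINER}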

The next issue is tzdata. It would be good to set a default during installation with an ENV so that you can set your own, since just doing docker-compose up without any commands drops you into this dialogue after installing all the apt packages:

Configuring tzdata
------------------

Please select the geographic area in which you live. Subsequent configuration
questions will narrow this down by presenting a list of cities, representing
the time zones in which they are located.

  1. Africa      4. Australia  7. Atlantic  10. Pacific  13. Etc
  2. America     5. Arctic     8. Europe    11. SystemV
  3. Antarctica  6. Asia       9. Indian    12. US
Geographic area:

But when you type in 8 and hit enter, nothing happens. I had to stop the instance in portainer and recreate it with the -it flags so that I could interact with it in an attached tty window. That then allowed me to enter the required continent, followed by the required city.
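A common way to avoid the interactive tzdata prompt at build time (a sketch, not something this Dockerfile currently does) is to preset the frontend and timezone before installing packages:

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Etc/UTC
RUN apt-get update && apt-get install -y --no-install-recommends tzdata && rm -rf /var/lib/apt/lists/*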

But once those were in and it finished setting up, it just stopped running. Trying to re-run it, it obviously continues where it left off (all the packages are installed and tzdata is already set up) and then stops straight away. When I tried to diagnose what the last message was, docker said there were no logs it could access for it.

Re-running it in the terminal again to make sure I didn't miss anything and I get:

$ docker run 5f270feee059 

==========
== CUDA ==
==========

CUDA Version 12.1.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
    https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

Oh! So it must be the GPU permission, but still:

$ docker run 5f270feee059 --gpus=all

==========
== CUDA ==
==========

CUDA Version 12.1.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
    https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [arguments ...]] [redirection ...]

Slightly more information: I tried it with the runtime=nvidia parameter, as per the nvidia documentation for CUDA:

$ docker run 5f270feee059 --gpus all --runtime=nvidia

==========
== CUDA ==
==========

CUDA Version 12.1.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
    https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: --: invalid option
exec: usage: exec [-cl] [-a name] [command [arguments ...]] [redirection ...]

Hmmm... I tried the nvidia test using the same base CUDA image I used for the installation, nvidia/cuda:12.1.0-base-ubuntu20.04:

$ sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.1.0-base-ubuntu20.04 nvidia-smi
[sudo] password for hazrpg: 
Unable to find image 'nvidia/cuda:12.1.0-base-ubuntu20.04' locally
12.1.0-base-ubuntu20.04: Pulling from nvidia/cuda
56e0351b9876: Already exists 
b0f696c0aebb: Pull complete 
e627444df06f: Pull complete 
dcf21018e934: Pull complete 
a2855a2ef2e0: Pull complete 
Digest: sha256:d0bf043a20ecc11940c5a452f67f239f9dec34a01d8f5583d2af93cf0da0f072
Status: Downloaded newer image for nvidia/cuda:12.1.0-base-ubuntu20.04
Sun Jul 30 02:40:24 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        Off | 00000000:01:00.0  On |                  N/A |
|  0%   49C    P5              16W / 170W |   1572MiB / 12288MiB |     13%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

So everything is set up fine for docker, but the image still isn't working. Not sure where I am going wrong, but I feel like I'm close!

Note that I pulled this from the master branch on nopperl:master to test this out.

Edit: I realised after submitting that I hadn't tried the proper image for the nvidia test, nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu20.04, so I changed it but still got the same result as above. I also realised that I hadn't used sudo when running the pre-built compose image (like I had for the nvidia test image), so I re-ran sudo docker run 5f270feee059 --gpus all --runtime=nvidia to make sure the issue wasn't a permissions problem accessing the hardware, but that still gave me the same results as before. So I'm not overly sure what's going on.
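One note on the docker run commands above: options such as --gpus all and --runtime=nvidia have to come before the image ID; anything placed after the image is passed to the container's entrypoint as a command, which is what produces the exec: --: invalid option error. A corrected invocation would look like:

sudo docker run --rm --gpus all --runtime=nvidia 5f270feee059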

@hazrpg commented Jul 30, 2023

I didn't want to give up, so I tried one more time - scrapped and purged everything, reset the repo back to how it was, and did docker-compose up again. BASE_CUDA_CONTAINER was still an issue, so instead of setting it to my Ubuntu version and the CUDA version I have installed, I just used the 22.04 and 11.8.0 from the original file and hardcoded BASE_CUDA_CONTAINER to those versions instead (figuring maybe that was why there was an issue).

This time I got a lot further! It installed correctly and ran through everything. But this time in the terminal it looked like it had stopped doing anything after Available models: ./data/models/Stable-diffusion 0.

I started up another terminal, attached to the running container, and saw a different message saying Download the default model? (y/N), so I typed in y and hit enter. It started downloading the SD 1.5 model - perfect!

Then afterwards I got:

nvidia_1  | 03:39:52-863637 ERROR    Module load: /webui/extensions-builtin/sd-webui-controlnet/scripts/api.py: AttributeError

Followed by a long traceback log, but it looked like it was still going and did...

nvidia_1  | Image Browser: ImageReward is not installed, cannot be used.
nvidia_1  | 03:40:15-057529 INFO     Loading UI theme: name=black-orange style=Auto                                                                                            
nvidia_1  | Image Browser: Creating database
nvidia_1  | Image Browser: Database created
nvidia_1  | 03:40:16-030004 ERROR    Failed reading extension data from Git repository: a1111-sd-webui-lycoris: HEAD is a detached symbolic reference as it points to          
nvidia_1  |                          'b0d24ca645b6a5cb9752169691a1c6385c6fe6ae'                                                                                                
nvidia_1  | 03:40:16-036250 ERROR    Failed reading extension data from Git repository: clip-interrogator-ext: HEAD is a detached symbolic reference as it points to           
nvidia_1  |                          '9e6bbd9b8931bbe869a8e28e7005b0e13c2efff0'                                                                                                
nvidia_1  | 03:40:16-045836 ERROR    Failed reading extension data from Git repository: multidiffusion-upscaler-for-automatic1111: HEAD is a detached symbolic reference as it 
nvidia_1  |                          points to '70b3c5ea3c9f684d04e7ff59167565974415735c'                                                                                      
nvidia_1  | 03:40:16-053253 ERROR    Failed reading extension data from Git repository: sd-dynamic-thresholding: HEAD is a detached symbolic reference as it points to         
nvidia_1  |                          'f02cacfc923e8bbf73f25327d722d50c458d66bb'                                                                                                
nvidia_1  | 03:40:16-066565 ERROR    Failed reading extension data from Git repository: sd-extension-system-info: HEAD is a detached symbolic reference as it points to        
nvidia_1  |                          '8046b1544513cea06d1c41748c22727c930323ab'                                                                                                
nvidia_1  | 03:40:16-075336 ERROR    Failed reading extension data from Git repository: sd-webui-controlnet: HEAD is a detached symbolic reference as it points to             
nvidia_1  |                          '7b707dc1f03c3070f8a506ff70a2b68173d57bb5'                                                                                                
nvidia_1  | 03:40:16-085855 ERROR    Failed reading extension data from Git repository: sd-webui-model-converter: HEAD is a detached symbolic reference as it points to        
nvidia_1  |                          'f6e0fa5386fb82ef44feac74d66958af951fcc48'                                                                                                
nvidia_1  | 03:40:16-097230 ERROR    Failed reading extension data from Git repository: stable-diffusion-webui-images-browser: HEAD is a detached symbolic reference as it     
nvidia_1  |                          points to '75af6d0c32b72350b2f140f186cd8ce0e24dda10'                                                                                      
nvidia_1  | 03:40:16-111035 ERROR    Failed reading extension data from Git repository: stable-diffusion-webui-rembg: HEAD is a detached symbolic reference as it points to    
nvidia_1  |                          '657ae9f5486019a94dbe11d3560b28cccf35a0fd'                                                                                                
nvidia_1  | 03:40:16-147008 INFO     Setting Torch parameters: dtype=torch.float16 vae=torch.float16 unet=torch.float16                                                        
Loading weights: /webui/data/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.0/4.3 GB -:--:--
nvidia_1  | LatentDiffusion: Running in eps-prediction mode
nvidia_1  | DiffusionWrapper has 859.52 M params.
Downloading (…)olve/main/vocab.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 961k/961k [00:00<00:00, 2.82MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████████████████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 1.84MB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████| 389/389 [00:00<00:00, 2.08MB/s]
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████| 905/905 [00:00<00:00, 5.89MB/s]
Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████| 4.52k/4.52k [00:00<00:00, 23.9MB/s]
nvidia_1  | 03:40:19-248309 INFO     Model created from config: /webui/configs/v1-inference.yaml                                                                               
Calculating model hash: /webui/data/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 GB 0:00:00
nvidia_1  | 03:40:39-639737 INFO     Applying scaled dot product cross attention optimization                                                                                  
nvidia_1  | 03:40:39-649293 INFO     Embeddings loaded: 0 []                                                                                                                   
nvidia_1  | 03:40:39-661568 INFO     Model loaded in 23.5s (load=0.6s create=2.5s hash=2.2s apply=17.4s vae=0.5s move=0.3s)                                                    
nvidia_1  | 03:40:40-197750 INFO     Model load finished: {'ram': {'used': 9.04, 'total': 62.59}, 'gpu': {'used': 3.36, 'total': 11.75}, 'retries': 0, 'oom': 0}               
nvidia_1  | Running on local URL:  http://0.0.0.0:7860
nvidia_1  | 
nvidia_1  | To create a public link, set `share=True` in `launch()`.
nvidia_1  | 03:40:40-532231 INFO     Local URL: http://localhost:7860/                                                                                                         
nvidia_1  | 03:40:40-533238 INFO     API Docs: http://localhost:7860/docs                                                                                                      
nvidia_1  | 03:40:40-533900 INFO     Initializing middleware                                                                                                                   
nvidia_1  | ╭─────────────────────────────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────────────────────────────╮
nvidia_1  | │ /webui/launch.py:149 in <module>                                                                                                                                │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │   148                                                                                                                                                           │
nvidia_1  | │ ❱ 149     instance = start_server(immediate=True, server=None)                                                                                                  │
nvidia_1  | │   150     while True:                                                                                                                                           │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │ /webui/launch.py:129 in start_server                                                                                                                            │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │   128         else:                                                                                                                                             │
nvidia_1  | │ ❱ 129             server = server.webui()                                                                                                                       │
nvidia_1  | │   130     if args.profile:                                                                                                                                      │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │ /webui/webui.py:274 in webui                                                                                                                                    │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │   273     start_common()                                                                                                                                        │
nvidia_1  | │ ❱ 274     start_ui()                                                                                                                                            │
nvidia_1  | │   275     load_model()                                                                                                                                          │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │ /webui/webui.py:265 in start_ui                                                                                                                                 │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │   264     modules.progress.setup_progress_api(app)                                                                                                              │
nvidia_1  | │ ❱ 265     create_api(app)                                                                                                                                       │
nvidia_1  | │   266     ui_extra_networks.add_pages_to_demo(app)                                                                                                              │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │ /webui/webui.py:166 in create_api                                                                                                                               │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │   165     log.debug('Creating API')                                                                                                                             │
nvidia_1  | │ ❱ 166     from modules.api.api import Api                                                                                                                       │
nvidia_1  | │   167     api = Api(app, queue_lock)                                                                                                                            │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │ /webui/modules/api/api.py:17 in <module>                                                                                                                        │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │    16 from modules import errors, shared, sd_samplers, deepbooru, sd_hijack, images, scripts,                                                                   │
nvidia_1  | │ ❱  17 from modules.api.models import * # pylint: disable=unused-wildcard-import, wildcard-impo                                                                  │
nvidia_1  | │    18 from modules.processing import StableDiffusionProcessingTxt2Img, StableDiffusionProcessi                                                                  │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │ /webui/modules/api/models.py:106 in <module>                                                                                                                    │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │   105     ]                                                                                                                                                     │
nvidia_1  | │ ❱ 106 ).generate_model()                                                                                                                                        │
nvidia_1  | │   107                                                                                                                                                           │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │ /webui/modules/api/models.py:91 in generate_model                                                                                                               │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │    90         DynamicModel = create_model(self._model_name, **model_fields)                                                                                     │
nvidia_1  | │ ❱  91         DynamicModel.__config__.allow_population_by_field_name = True                                                                                     │
nvidia_1  | │    92         DynamicModel.__config__.allow_mutation = True                                                                                                     │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │ /webui/venv/lib/python3.10/site-packages/pydantic/_internal/_model_construction.py:205 in __getattr__                                                           │
nvidia_1  | │                                                                                                                                                                 │
nvidia_1  | │   204                         return getattr(self, '__pydantic_core_schema__')                                                                                  │
nvidia_1  | │ ❱ 205             raise AttributeError(item)                                                                                                                    │
nvidia_1  | │   206                                                                                                                                                           │
nvidia_1  | ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
nvidia_1  | AttributeError: __config__
nvidia_1  | stable-diffusion-automatic-xl-docker_nvidia_1 exited with code 1

And that's when it exited.

Re-running docker-compose up, or even just running the image directly, gives me all the same errors (except this time it isn't downloading anything; it looks like it's just trying to use what it had).

So, still not working, but at least it was a derp moment on my part for putting in a lower Ubuntu version and a higher CUDA version. There does appear to be an issue getting some of the needed dependencies, such as the extensions (although technically not required to get it working), loading the /webui/extensions-builtin/sd-webui-controlnet/scripts/api.py script, and also running the middleware - the middleware being the thing that crashes it.

@JohanAR (Contributor) commented Jul 30, 2023

I think docker-compose has been deprecated in favour of "docker compose". IIRC that ought to solve the top-level name tag error.

@hazrpg commented Jul 31, 2023

@JohanAR Sure, you're not wrong that "docker compose" is the preferred method and that "docker-compose" is deprecated, kept only as a legacy stub for "docker compose" in the latest versions of docker.

However, NVIDIA CUDA Toolkit is only supported on Docker 20.10.x (ref: nvidia install guide), which meant I had to downgrade to 20.10 a long while back to get anything CUDA-related working without some hacky workaround.

And the docker command does not support docker compose on version 20.10.x:

$ docker compose
docker: 'compose' is not a docker command.
See 'docker --help'

Which means most people should be running docker 20.10.x if they want the CUDA toolkit working properly on Linux, or even in the cloud for that matter. And I believe those on Windows will likely experience similar issues, since that route recommends going through WSL2.

There are obviously workarounds to this on the latest version of docker, which as far as I understand crashes on the latest-latest (which means you always have to run a slightly older version of 23.x.x or 24.x.x), but that would mean this repo would need to support said workarounds, or other people will post issue after issue that it isn't working for them.

I'm going through the process of upgrading back to the latest version - cos I would love to be proved wrong - and will report back my findings, but I suspect I will end up having to figure a bunch of workarounds to get it to work properly.

@djmaze commented Jul 31, 2023

However, NVIDIA CUDA Toolkit is only supported on Docker 20.10.x

I used it with Docker 23 as well as now 24, with Ubuntu 22.04 and now 23.04, using the apt package source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64.

It worked flawlessly out-of-the-box and I did not experience problems. IMHO there is no reason to keep using an old Docker version.

@hleroy commented Aug 1, 2023

I'm running into the same issue as @hazrpg: it fails when "Initializing middleware". I'm not sure what the Python code is doing, but it seems to be missing some configuration attributes, maybe?

Configuration

Ubuntu 22.04.2 LTS
Docker version 24.0.5
Docker Compose version v2.20.2
NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0

Also, is it possible to pass a flag to avoid the prompt "Download the default model? (y/N)"? The reason I'm asking is that it's quite uncommon to have to attach to a running container to answer setup questions. It works, but it's not usual with Docker builds.

sd-automatic-nvidia-1  | Running on local URL:  http://0.0.0.0:7860
sd-automatic-nvidia-1  | 
sd-automatic-nvidia-1  | To create a public link, set `share=True` in `launch()`.
sd-automatic-nvidia-1  | 14:55:30-633627 INFO     Local URL: http://localhost:7860/                      
sd-automatic-nvidia-1  | 14:55:30-637451 INFO     API Docs: http://localhost:7860/docs                   
sd-automatic-nvidia-1  | 14:55:30-640605 INFO     Initializing middleware                                
sd-automatic-nvidia-1  | ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
sd-automatic-nvidia-1  | │ /webui/launch.py:149 in <module>                                             │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │   148                                                                        │
sd-automatic-nvidia-1  | │ ❱ 149     instance = start_server(immediate=True, server=None)               │
sd-automatic-nvidia-1  | │   150     while True:                                                        │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │ /webui/launch.py:129 in start_server                                         │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │   128         else:                                                          │
sd-automatic-nvidia-1  | │ ❱ 129             server = server.webui()                                    │
sd-automatic-nvidia-1  | │   130     if args.profile:                                                   │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │ /webui/webui.py:274 in webui                                                 │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │   273     start_common()                                                     │
sd-automatic-nvidia-1  | │ ❱ 274     start_ui()                                                         │
sd-automatic-nvidia-1  | │   275     load_model()                                                       │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │ /webui/webui.py:265 in start_ui                                              │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │   264     modules.progress.setup_progress_api(app)                           │
sd-automatic-nvidia-1  | │ ❱ 265     create_api(app)                                                    │
sd-automatic-nvidia-1  | │   266     ui_extra_networks.add_pages_to_demo(app)                           │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │ /webui/webui.py:166 in create_api                                            │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │   165     log.debug('Creating API')                                          │
sd-automatic-nvidia-1  | │ ❱ 166     from modules.api.api import Api                                    │
sd-automatic-nvidia-1  | │   167     api = Api(app, queue_lock)                                         │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │ /webui/modules/api/api.py:17 in <module>                                     │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │    16 from modules import errors, shared, sd_samplers, deepbooru, sd_hijack, │
sd-automatic-nvidia-1  | │ ❱  17 from modules.api.models import * # pylint: disable=unused-wildcard-imp │
sd-automatic-nvidia-1  | │    18 from modules.processing import StableDiffusionProcessingTxt2Img, Stabl │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │ /webui/modules/api/models.py:106 in <module>                                 │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │   105     ]                                                                  │
sd-automatic-nvidia-1  | │ ❱ 106 ).generate_model()                                                     │
sd-automatic-nvidia-1  | │   107                                                                        │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │ /webui/modules/api/models.py:91 in generate_model                            │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │    90         DynamicModel = create_model(self._model_name, **model_fields)  │
sd-automatic-nvidia-1  | │ ❱  91         DynamicModel.__config__.allow_population_by_field_name = True  │
sd-automatic-nvidia-1  | │    92         DynamicModel.__config__.allow_mutation = True                  │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │ /webui/venv/lib/python3.10/site-packages/pydantic/_internal/_model_construct │
sd-automatic-nvidia-1  | │ ion.py:205 in __getattr__                                                    │
sd-automatic-nvidia-1  | │                                                                              │
sd-automatic-nvidia-1  | │   204                         return getattr(self, '__pydantic_core_schema__ │
sd-automatic-nvidia-1  | │ ❱ 205             raise AttributeError(item)                                 │
sd-automatic-nvidia-1  | │   206                                                                        │
sd-automatic-nvidia-1  | ╰──────────────────────────────────────────────────────────────────────────────╯
sd-automatic-nvidia-1  | AttributeError: __config__
sd-automatic-nvidia-1  | 

@Nuullll (Contributor) commented Aug 2, 2023

Also, is it possible to pass a flag to avoid the prompt "Download the default model? (y/N)" ?

@hleroy --no-download

@aeberts commented Aug 7, 2023

Firstly, thanks to everyone for the great work put into vladmandic/automatic! I'm recording my experiences trying to use the Dockerfile with vast.ai in case it is useful for others. My apologies if the approach I took was not best practice or just plain wrong - I'm fairly new to docker, so please take the following as the experiences of a naive end-user trying to get this to work on a GPU cloud provider.

My use case is that I have a Macbook Pro but I would like to build and use a docker image of vladmandic/automatic that can be used on a GPU cloud provider like vast.ai or runpod.io.

My config:

OS: MacOS Monterey 12.6
Docker engine: 24.0.2
Docker Compose: 2.19.1

Steps:

  • clone nopperl/automatic to my MBP
  • modify the Dockerfile FROM instruction: FROM --platform=linux/amd64 ${BASE_CUDA_CONTAINER}
  • run docker build -t alexeberts/stable-diffusion:sdnext-test-2 .
  • wait 30 mins
  • run docker push alexeberts/stable-diffusion:sdnext-test-2
  • setup template on vast.ai using alexeberts/stable-diffusion:sdnext-test-2
  • create instance on vast.ai using the ssh login option.
  • ssh into the instance and run entrypoint.sh

Results:

  • The container args (INSTALLDIR etc.) are not automatically added to the new environment
  • After setting up the args manually and running entrypoint.sh, the server starts, but with the same errors @hleroy and @hazrpg ran into.
  • I was not able to get a running instance of automatic.
  • I considered building the image with docker compose build to see if I was missing configuration info from docker-compose.yml, but I could not figure out how to ensure that docker compose build would produce a linux/amd64 container (adding platform: linux/amd64 to the docker-compose.yml resulted in an error; see the note below this list).
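One approach commonly used to force the architecture with Compose v2 (not verified against this particular compose file) is to set the default platform through an environment variable when building:

DOCKER_DEFAULT_PLATFORM=linux/amd64 docker compose build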

I'm happy to continue testing on vast.ai if someone can provide a linux image or instructions for how to successfully build a linux image from this repo on a MBP.

@hazrpg commented Aug 9, 2023

However, NVIDIA CUDA Toolkit is only supported on Docker 20.10.x

I used it with Docker 23 as well as now 24, with Ubuntu 22.04 and now 23.04, using the apt package source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64.

It worked flawlessly out-of-the-box and I did not experience problems. IMHO there is no reason to keep using an old Docker version.

I did eventually try the upstream docker apt packages, instead of the Canonical/Debian ones. Looks like although the Nvidia toolkit says it doesn't support newer versions, the lovely docker peeps must have gotten around that and made sure it still works. So I stand corrected, thank you for pointing it out.

However I'm still stuck at the middleware stage sadly even with the newer docker and using docker compose.

@AIWintermuteAI

Why did the PR stall? Was there a technical difficulty?

@vladmandic (Owner)

moving status to draft until comments are incorporated and a maintainer is found.

@vladmandic marked this pull request as draft on November 18, 2023, 16:25
@ilkersigirci

What is the status of this PR? Using SD.Next with a docker install would be a huge win IMHO.

@vladmandic (Owner)

there are plenty of users running sdnext inside a docker container, but having an official dockerfile is tricky as everyone has their own idea of what the docker config should look like, and it also varies by platform.

@FoxxMD commented Jan 11, 2024

On that note for anyone looking for a "one-click" docker deploy -- I have contributed to and am using grokuku/stable-diffusion on a linux host with nvidia gpu. It "just works" and stays up-to-date with master branch automatically. Read the readme ofc but an example run command:

docker run -d -p 9000:9000 -e "PUID=1000" -e "PGID=1000" -e "WEBUI_VERSION=04" -v /path/on/host/data:/config --runtime=nvidia --gpus all holaflenain/stable-diffusion

@ilkersigirci

On that note for anyone looking for a "one-click" docker deploy -- I have contributed to and am using grokuku/stable-diffusion on a linux host with nvidia gpu. It "just works" and stays up-to-date with master branch automatically. Read the readme ofc but an example run command:

docker run -d -p 9000:9000 -e "PUID=1000" -e "PGID=1000" -e "WEBUI_VERSION=04" -v /path/on/host/data:/config --runtime=nvidia --gpus all holaflenain/stable-diffusion

Thanks for the link, will try soon.

@abhiaagarwal

I currently have a WIP branch right here for an NVIDIA CUDA-based docker image. It also installs onediff and tensorflow, and I'm working on adding TensorRT support. It boots and works perfectly, but I'm still tweaking it. Contributions are welcome from anyone who wants to add AMD/Intel/whatever support; I only have NVIDIA hardware, so I can't test on other platforms.

Broadly, the installer.py file is hard to work with (I had to basically reverse engineer it and convert it into declarative docker statements); perhaps it should be converted to a pyproject.toml file with dependencies declared declaratively rather than in the current imperative form. I think poetry is probably the best tool for this, with its mature dependency group functionality; conda doesn't have the right capability.

abhiaagarwal@37944e9
