
fail to create a docker container #1072

Closed
MuYi086 opened this issue Dec 18, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@MuYi086

MuYi086 commented Dec 18, 2023

Describe the bug
First I installed the CUDA Toolkit:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3

Then I ran the docker command, and it threw an error:

docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cuda
/usr/local/bin/com.docker.cli: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
ERRO[0000] error waiting for container: context canceled 

Information about your version

TabbyML/tabby latest

Information about your GPU

description: 3D controller
       product: GP107M [GeForce GTX 1050 Ti Mobile] [10DE:1C8C]
       vendor: NVIDIA Corporation [10DE]
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:146 memory:a3000000-a3ffffff memory:90000000-9fffffff memory:a0000000-a1ffffff ioport:4000(size=128) memory:a4000000-a407ffff
  *-display
       description: VGA compatible controller
       product: UHD Graphics 630 (Mobile) [8086:3E9B]
       vendor: Intel Corporation [8086]
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 00
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:137 memory:a2000000-a2ffffff memory:80000000-8fffffff ioport:5000(size=64) memory:c0000-dffff

Additional context

system:  linux  deepin Community 20.9
processor:  Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz
docker-desktop:  v4.26.1
@jbigler

jbigler commented Dec 28, 2023

I think you need to install the nvidia-container-toolkit. That was how I got it running on my system.
I run Docker Engine rather than Docker Desktop, but I'm not sure whether that makes a difference.
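(For reference, on Ubuntu/Debian the install is roughly the following, once NVIDIA's apt repository has been added as described in their install guide; treat this as a sketch and double-check against the current docs:)

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker   # registers the nvidia runtime in /etc/docker/daemon.json
sudo systemctl restart docker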

@faritor

faritor commented Jan 1, 2024

Hello, I've found a problem. I run it with docker compose, using the NVIDIA GPU:

version: '3.5'

services:
  tabby:
    restart: always
    image: tabbyml/tabby
    container_name: tabby
    command: serve --model TabbyML/StarCoder-1B --device cuda
    volumes:
      - "./data:/data"
    ports:
      - 8080:8080
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - NVIDIA_VISIBLE_DEVICES=0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
The NVIDIA runtime itself works when tested directly:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Mon Jan  1 14:07:33 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
...

I installed the NVIDIA Container Toolkit following:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

When I run docker compose up -d, the container is created successfully, but it does not stay running, and there are no logs at all.

I also tried running the binary directly,

version:https://github.com/TabbyML/tabby/releases/tag/v0.7.0

./tabby_x86_64-manylinux2014 serve --model TabbyML/StarCoder-1B --device cuda

or

./tabby_x86_64-manylinux2014-cuda117 serve --model TabbyML/StarCoder-1B --device cuda

output

error while loading shared libraries: libssl.so.10: cannot open shared object file: No such file or directory

@jbigler

jbigler commented Jan 6, 2024

When I run docker compose up -d, the container is created successfully, but it does not stay running, and there are no logs at all.

docker compose logs doesn't produce any output?
I see that you used sudo to run the standalone docker command to test the NVIDIA support; do you need to use sudo with docker compose as well? Have you configured docker to run without root privileges?
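(If docker has not been set up for non-root use, the usual steps from Docker's post-install guide are roughly the following; run them as the account that invokes compose:)

sudo groupadd docker              # may already exist
sudo usermod -aG docker $USER
newgrp docker                     # or log out and back in
docker run hello-world            # sanity check that the group change took effect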

I also tried running the binary directly,

version:https://github.com/TabbyML/tabby/releases/tag/v0.7.0

./tabby_x86_64-manylinux2014 serve --model TabbyML/StarCoder-1B --device cuda

or

./tabby_x86_64-manylinux2014-cuda117 serve --model TabbyML/StarCoder-1B --device cuda

output

error while loading shared libraries: libssl.so.10: cannot open shared object file: No such file or directory

This sounds like you are missing the OpenSSL libraries on your local system; it shouldn't be related to the docker error.
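(A quick way to confirm which shared libraries the binary cannot resolve, assuming the file name matches the one you downloaded:)

ldd ./tabby_x86_64-manylinux2014-cuda117 | grep "not found"

libssl.so.10 is the OpenSSL 1.0.x soname used on RHEL/CentOS-style systems, so on Debian/Ubuntu-based distributions it typically has to come from a compatibility package or an older libssl build.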

Have you tried running it directly via docker without compose?
sudo docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cuda

@NicoMandel

We are experiencing similar issues. We are trying to run tabby on a local server for multiple people to use, alongside a few other docker instances (e.g. Overleaf).
On a local machine with the following specs it works fine:

  • Ubuntu 20.04.3 LTS
  • Kernel 5.15.0-91-generic
  • 1x GeForce GTX 1080 Ti
  • Cuda 11.5

But on the local server machine with the following specs, it fails silently and we can't see where it fails:

  • Ubuntu 20.04.6 LTS
  • Kernel 5.15.0-91-generic
  • 2x GeForce RTX 2080 Ti
  • Cuda 12.3

Any hints?

@wsxiaoys
Member

Hi, could you share the log output? It would also be helpful to provide CPU information from /proc/cpuinfo.
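(For example, a trimmed dump is enough; something along these lines:)

grep -m1 "model name" /proc/cpuinfo
grep -m1 flags /proc/cpuinfo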

@NicoMandel

There is unfortunately no output from the logs - that's the major issue! It just repeats

(base) root@rk6:~/tabby# docker compose up
[+] Running 1/0
 ✔ Container tabby  Created                                                                                                            0.0s
Attaching to tabby
tabby  | 2024-01-12T12:50:01.192111Z  INFO tabby::serve: crates/tabby/src/serve.rs:111: Starting server, this might takes a few minutes...
tabby exited with code 0

because it is set to restart in the docker compose file.
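(One way to see the exit status directly, bypassing the restart policy, is something like the following; this assumes the compose service is named tabby, as in the file posted earlier in the thread:)

docker compose run --rm tabby serve --model TabbyML/StarCoder-1B --device cuda
echo "exit code: $?"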

@NicoMandel

Hi, here is the output of cpuinfo. I omitted the 38 processors in between, because that's just redundant info:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping	: 4
microcode	: 0x42e
cpu MHz		: 1200.000
cache size	: 25600 KB
physical id	: 0
siblings	: 20
core id		: 0
cpu cores	: 10
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags	: vnmi preemption_timer posted_intr invvpid ept_x_only ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips	: 4988.31
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

[...]

processor	: 39
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping	: 4
microcode	: 0x42e
cpu MHz		: 1200.000
cache size	: 25600 KB
physical id	: 1
siblings	: 20
core id		: 12
cpu cores	: 10
apicid		: 57
initial apicid	: 57
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags	: vnmi preemption_timer posted_intr invvpid ept_x_only ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips	: 4991.28
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

@faritor

faritor commented Jan 28, 2024

docker compose logs doesn't produce any output? I see that you used sudo to run the standalone docker command to test the NVIDIA support; do you need to use sudo with docker compose as well? Have you configured docker to run without root privileges?

[screenshot]

I tried running it with sudo, but it's the same.

This sounds like you are missing the OpenSSL libraries on your local system; it shouldn't be related to the docker error.

Have you tried running it directly via docker without compose? sudo docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cuda

[screenshot]

I tried it, but it doesn't output anything.

@jbigler

jbigler commented Feb 5, 2024

@faritor Does it work if you run it on the CPU instead of the GPU?
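(For example, dropping the GPU flags and letting it run on the CPU; this assumes --device cpu is accepted here the same way --device cuda is:)

docker run -it -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cpu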

@faritor

faritor commented Feb 6, 2024

[screenshot]

Still the same, @jbigler.

@wsxiaoys
Member

wsxiaoys commented Feb 6, 2024

Hi, here is the output of cpuinfo. I omitted the 38 processors in between, because that's just redundant info: [...]

Sorry for missing your response. The reason it exits immediately is that your CPU doesn't support AVX2, which is required by our llama.cpp build at the moment.

This is tracked in #1142
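(A quick way to check whether a CPU advertises AVX2, for anyone hitting the same silent exit:)

grep -o -m1 avx2 /proc/cpuinfo || echo "no AVX2 support"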

@wsxiaoys wsxiaoys added bug Something isn't working and removed bug-unconfirmed labels Feb 6, 2024
@wsxiaoys wsxiaoys closed this as completed Feb 6, 2024