
fail to create a docker container #1072

Closed
MuYi086 opened this issue Dec 18, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@MuYi086

MuYi086 commented Dec 18, 2023

Describe the bug
First I installed the CUDA Toolkit:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3

Then I ran the docker command, and it threw an error:

docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cuda
/usr/local/bin/com.docker.cli: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
ERRO[0000] error waiting for container: context canceled 

Information about your version

TabbyML/tabby latest

Information about your GPU

description: 3D controller
       product: GP107M [GeForce GTX 1050 Ti Mobile] [10DE:1C8C]
       vendor: NVIDIA Corporation [10DE]
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:146 memory:a3000000-a3ffffff memory:90000000-9fffffff memory:a0000000-a1ffffff ioport:4000(size=128) memory:a4000000-a407ffff
  *-display
       description: VGA compatible controller
       product: UHD Graphics 630 (Mobile) [8086:3E9B]
       vendor: Intel Corporation [8086]
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 00
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:137 memory:a2000000-a2ffffff memory:80000000-8fffffff ioport:5000(size=64) memory:c0000-dffff

Additional context

system:  linux  deepin Community 20.9
processor:  Intel(R) Core(TM) i5-8300H CPU @ 2.30GHz
docker-desktop:  v4.26.1
@jbigler

jbigler commented Dec 28, 2023

I think you need to install the nvidia-container-toolkit. That was how I got it running on my system.
I run Docker Engine rather than Docker Desktop, but I'm not sure whether that makes a difference.
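(For reference, on Ubuntu/Debian the install is roughly the following, once NVIDIA's apt repository has been added as described in their install guide; treat this as a sketch and double-check against the current docs:)

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker   # registers the nvidia runtime in /etc/docker/daemon.json
sudo systemctl restart docker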

@faritor

faritor commented Jan 1, 2024

Hello, I've found a problem. I run it with docker compose, using the NVIDIA GPU:

version: '3.5'

services:
  tabby:
    restart: always
    image: tabbyml/tabby
    container_name: tabby
    command: serve --model TabbyML/StarCoder-1B --device cuda
    volumes:
      - "./data:/data"
    ports:
      - 8080:8080
    environment:
      - CUDA_VISIBLE_DEVICES=0
      - NVIDIA_VISIBLE_DEVICES=0
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
The NVIDIA runtime itself works when tested directly:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Mon Jan  1 14:07:33 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
...

I installed the NVIDIA Container Toolkit following:
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

When I run docker compose up -d, the container is created successfully, but it does not stay running, and there are no logs at all.

I also tried running the binary directly,

version:https://github.com/TabbyML/tabby/releases/tag/v0.7.0

./tabby_x86_64-manylinux2014 serve --model TabbyML/StarCoder-1B --device cuda

or

./tabby_x86_64-manylinux2014-cuda117 serve --model TabbyML/StarCoder-1B --device cuda

output

error while loading shared libraries: libssl.so.10: cannot open shared object file: No such file or directory

@jbigler

jbigler commented Jan 6, 2024

When I run docker compose up -d, the container is created successfully, but it does not stay running, and there are no logs at all.

docker compose logs doesn't produce any output?
I see that you used sudo to run the standalone docker command to test the NVIDIA support; do you need to use sudo with docker compose as well? Have you configured docker to run without root privileges?
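(If docker has not been set up for non-root use, the usual steps from Docker's post-install guide are roughly the following; run them as the account that invokes compose:)

sudo groupadd docker              # may already exist
sudo usermod -aG docker $USER
newgrp docker                     # or log out and back in
docker run hello-world            # sanity check that the group change took effect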

I also tried running the binary directly,

version:https://github.com/TabbyML/tabby/releases/tag/v0.7.0

./tabby_x86_64-manylinux2014 serve --model TabbyML/StarCoder-1B --device cuda

or

./tabby_x86_64-manylinux2014-cuda117 serve --model TabbyML/StarCoder-1B --device cuda

output

error while loading shared libraries: libssl.so.10: cannot open shared object file: No such file or directory

This sounds like you are missing the OpenSSL libraries on your local system; it shouldn't be related to the docker error.
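(A quick way to confirm which shared libraries the binary cannot resolve, assuming the file name matches the one you downloaded:)

ldd ./tabby_x86_64-manylinux2014-cuda117 | grep "not found"

libssl.so.10 is the OpenSSL 1.0.x soname used on RHEL/CentOS-style systems, so on Debian/Ubuntu-based distributions it typically has to come from a compatibility package or an older libssl build.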

Have you tried running it directly via docker without compose?
sudo docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cuda

@NicoMandel

We are experiencing similar issues. We are trying to run tabby on a local server for multiple people to use, alongside a few other docker instances (e.g. Overleaf).
On a local machine with the following specs it works fine:

  • Ubuntu 20.04.3 LTS
  • Kernel 5.15.0-91-generic
  • 1x GeForce GTX 1080 Ti
  • Cuda 11.5

But on the local server machine with the following specs, it fails silently and we can't see where it fails:

  • Ubuntu 20.04.6 LTS
  • Kernel 5.15.0-91-generic
  • 2x GeForce RTX 2080 Ti
  • Cuda 12.3

Any hints?

@wsxiaoys
Member

Hi, could you share the log output? It would also be helpful to provide CPU information from /proc/cpuinfo.
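(For example, a trimmed dump is enough; something along these lines:)

grep -m1 "model name" /proc/cpuinfo
grep -m1 flags /proc/cpuinfo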

@NicoMandel

There is unfortunately no output from the logs - that's the major issue! It just repeats

(base) root@rk6:~/tabby# docker compose up
[+] Running 1/0
 ✔ Container tabby  Created                                                                                                            0.0s
Attaching to tabby
tabby  | 2024-01-12T12:50:01.192111Z  INFO tabby::serve: crates/tabby/src/serve.rs:111: Starting server, this might takes a few minutes...
tabby exited with code 0

because it is set to restart in the docker compose file.
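(One way to see the exit status directly, bypassing the restart policy, is something like the following; this assumes the compose service is named tabby, as in the file posted earlier in the thread:)

docker compose run --rm tabby serve --model TabbyML/StarCoder-1B --device cuda
echo "exit code: $?"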

@NicoMandel

Hi, here is the output of cpuinfo. I omitted the 38 processors in between, because that's just redundant info:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping	: 4
microcode	: 0x42e
cpu MHz		: 1200.000
cache size	: 25600 KB
physical id	: 0
siblings	: 20
core id		: 0
cpu cores	: 10
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags	: vnmi preemption_timer posted_intr invvpid ept_x_only ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips	: 4988.31
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

[...]

processor	: 39
vendor_id	: GenuineIntel
cpu family	: 6
model		: 62
model name	: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping	: 4
microcode	: 0x42e
cpu MHz		: 1200.000
cache size	: 25600 KB
physical id	: 1
siblings	: 20
core id		: 12
cpu cores	: 10
apicid		: 57
initial apicid	: 57
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear flush_l1d
vmx flags	: vnmi preemption_timer posted_intr invvpid ept_x_only ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_unknown
bogomips	: 4991.28
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

@faritor

faritor commented Jan 28, 2024

docker compose logs doesn't produce any output? I see that you used sudo to run the standalone docker command to test the NVIDIA support; do you need to use sudo with docker compose as well? Have you configured docker to run without root privileges?

[screenshot]

I tried running it with sudo, but it's the same.

This sounds like you are missing the OpenSSL libraries on your local system; it shouldn't be related to the docker error.

Have you tried running it directly via docker without compose? sudo docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cuda

[screenshot]

I tried it, but it doesn't output anything.

@jbigler

jbigler commented Feb 5, 2024

@faritor Does it work if you run it on the CPU instead of the GPU?
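(For example, dropping the GPU flags and letting it run on the CPU; this assumes --device cpu is accepted here the same way --device cuda is:)

docker run -it -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cpu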

@faritor

faritor commented Feb 6, 2024

[screenshot]

Still the same, @jbigler.

@wsxiaoys
Member

wsxiaoys commented Feb 6, 2024

Hi, here is the output of cpuinfo. I omitted the 38 processors in between, because that's just redundant info: [...]

Sorry for missing your response. The reason it exits immediately is that your CPU doesn't support AVX2, which is required by our llama.cpp build at the moment.

This is tracked in #1142
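(A quick way to check whether a CPU advertises AVX2, for anyone hitting the same silent exit:)

grep -o -m1 avx2 /proc/cpuinfo || echo "no AVX2 support"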

@wsxiaoys wsxiaoys added bug Something isn't working and removed bug-unconfirmed labels Feb 6, 2024
@wsxiaoys wsxiaoys closed this as completed Feb 6, 2024