
Phi-3 not starting on TGI 2.0.3 in kubernetes cluster #1907

Closed
1 of 4 tasks
Cyb4Black opened this issue May 16, 2024 · 3 comments

Comments

@Cyb4Black

System Info

When deploying with the latest TGI release (2.0.3), the container fails with:

{"timestamp":"2024-05-16T11:17:36.832915Z","level":"ERROR","fields":{"message":"Error when initializing model\nTraceback (most recent call last):\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n File \"/opt/conda/lib/python3.10/site-packages/typer/main.py\", line 311, in __call__\n return get_command(self)(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1157, in __call__\n return self.main(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/typer/core.py\", line 778, in main\n return _main(\n File \"/opt/conda/lib/python3.10/site-packages/typer/core.py\", line 216, in _main\n rv = self.invoke(ctx)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1688, in invoke\n return _process_result(sub_ctx.command.invoke(sub_ctx))\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1434, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 783, in invoke\n return __callback(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/typer/main.py\", line 683, in wrapper\n return callback(**use_params) # type: ignore\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 90, in serve\n server.serve(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 258, in serve\n asyncio.run(\n File \"/opt/conda/lib/python3.10/asyncio/runners.py\", line 44, in run\n return loop.run_until_complete(main)\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 636, in run_until_complete\n self.run_forever()\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 603, in run_forever\n self._run_once()\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 1909, in _run_once\n handle._run()\n File \"/opt/conda/lib/python3.10/asyncio/events.py\", line 80, in _run\n self._context.run(self._callback, *self._args)\n> File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 222, in serve_inner\n model = get_model(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py\", line 420, in get_model\n return FlashLlama(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_llama.py\", line 84, in __init__\n model = FlashLlamaForCausalLM(prefix, config, weights)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 368, in __init__\n self.model = FlashLlamaModel(prefix, config, weights)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 292, in __init__\n [\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 293, in <listcomp>\n FlashLlamaLayer(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 232, in __init__\n self.self_attn = FlashLlamaAttention(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 108, in __init__\n self.query_key_value = load_attention(config, prefix, weights)\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/flash_llama_modeling.py\", line 43, in load_attention\n bias = 
config.attention_bias\n File \"/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py\", line 263, in __getattribute__\n return super().__getattribute__(key)\nAttributeError: 'Phi3Config' object has no attribute 'attention_bias'\n"},"target":"text_generation_launcher"}
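The last two frames show the root cause: TGI 2.0.3's FlashLlama code path reads config.attention_bias unconditionally, while the Phi3Config shipped by the model repo at the time did not define that attribute. Below is a minimal sketch of the failure and a defensive getattr fallback, assuming only transformers and access to the Hub; it paraphrases load_attention for illustration and is not the fix that landed upstream:

from transformers import AutoConfig

# Build the same kind of config object TGI loads for the model.
config = AutoConfig.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
)

# What TGI 2.0.3 effectively did; this attribute access raises
# AttributeError when the checkpoint's Phi3Config lacks the field:
#   bias = config.attention_bias

# Defensive variant: default to False (the LLaMA convention of no
# attention bias) when the attribute is absent.
bias = getattr(config, "attention_bias", False)
print(bias)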

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Deploy to the cluster with:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: phi-3-mini-128k
spec:
  replicas: 1
  selector:
    matchLabels:
      app: phi-3-mini-128k
  template:
    metadata:
      labels:
        app: phi-3-mini-128k
    spec:
      containers:
      - name: textgen
        image: ghcr.io/huggingface/text-generation-inference:{{TEXT_GENERATION_VERSION}}
        args:
        - "--model-id"
        - "microsoft/Phi-3-mini-128k-instruct"
        - "--max-total-tokens"
        - "16000"
        - "--max-input-tokens"
        - "15500"
        - "--max-batch-prefill-tokens"
        - "15500"
        - "--rope-factor"
        - "32"
        - "--rope-scaling"
        - "dynamic"
        - "--json-output"
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /data
          name: data-volume
        - mountPath: /dev/shm
          name: shm
        resources:
          limits:
            memory: "24Gi"
            cpu: "4"
            nvidia.com/gpu: "1"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                - NVIDIA-RTX-A4000
      volumes:
      - name: data-volume
        persistentVolumeClaim:
          claimName: hf-data-pvc
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: 16Gi
---
apiVersion: v1
kind: Service
metadata:
  name: phi-3-mini-128k
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: phi-3-mini-128k

Expected behavior

The container starts and serves the model.

@arunpatala

I am also facing a similar issue. I am using a LoRA-finetuned model merged from microsoft/Phi-3-mini-128k-instruct, running locally with Docker.

@manikawnth

@Cyb4Black Workaround here: #1930
Point it at a specific model revision until it gets fixed upstream, as in the sketch below.
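In the Deployment above, that means passing TGI's --revision flag in the container args. A sketch only; the sha below is a placeholder, substitute the actual pre-change revision referenced in #1930:

        args:
        - "--model-id"
        - "microsoft/Phi-3-mini-128k-instruct"
        - "--revision"
        - "<pinned-commit-sha>" # placeholder: use the revision from #1930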

@Cyb4Black
Author

Ah, ok, thanks.
