Add KServe document. (#984)
Signed-off-by: Syulin7 <[email protected]>
Syulin7 committed Aug 29, 2023
1 parent 2029700 commit 14fa45c
Showing 4 changed files with 290 additions and 0 deletions.
5 changes: 5 additions & 0 deletions docs/serving/index.md
@@ -40,3 +40,8 @@ If you want to use arena to manage serving jobs, this guide is for you. we have

* I want to [submit a nvidia triton serving job which use gpus](triton/serving.md).
* I want to [update a nvidia triton serving job after deployed](triton/update-serving.md).

## KServe Job Guide

* I want to [submit a KServe job with a supported serving runtime](kserve/sklearn.md).
* I want to [submit a KServe job with a custom serving runtime](kserve/custom.md).
142 changes: 142 additions & 0 deletions docs/serving/kserve/custom.md
@@ -0,0 +1,142 @@
# KServe job with custom serving runtime

This guide walks through the steps to deploy and serve a custom serving runtime with KServe.

1\. Setup

Follow the [KServe Guide](https://kserve.github.io/website/master/admin/serverless/serverless/) to install KServe.
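
To quickly verify the installation (a sketch; the namespace depends on how KServe was installed):

$ kubectl get pods -n kserve
$ kubectl get crd inferenceservices.serving.kserve.io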

2\. Submit your serving job to KServe

First, create a PVC named 'training-data', and then download the 'bloom-560m' model from HuggingFace into the PVC.
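
A minimal sketch of the PVC, assuming a default StorageClass is available (the size is illustrative; the claim name must match the STORAGE_URI used below):

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
EOF

You can then populate the PVC, for example with a temporary pod that mounts it and downloads bigscience/bloom-560m from HuggingFace into a bloom-560m directory at the PVC root. KServe mounts the PVC at /mnt/models inside the predictor pod, which is why the command below refers to /mnt/models/bloom-560m.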

Deploy an InferenceService with a predictor that loads the BLOOM model with text-generation-inference.

$ arena serve kserve \
--name=bloom-560m \
--image=ghcr.io/huggingface/text-generation-inference:1.0.2 \
--gpus=1 \
--cpu=12 \
--memory=50Gi \
--port=8080 \
--env=STORAGE_URI=pvc://training-data \
"text-generation-launcher --disable-custom-kernels --model-id /mnt/models/bloom-560m --num-shard 1 -p 8080"

inferenceservice.serving.kserve.io/bloom-560m created
INFO[0010] The Job bloom-560m has been submitted successfully
INFO[0010] You can run `arena serve get bloom-560m --type kserve -n default` to check the job status

3\. Check the status of the KServe job

$ arena serve list
NAME TYPE VERSION DESIRED AVAILABLE ADDRESS PORTS
bloom-560m KServe 00001 1 1 http://bloom-560m.default-group.example.com :80 1

$ arena serve get bloom-560m
Name: bloom-560m
Namespace: default
Type: KServe
Version: 00001
Desired: 1
Available: 1
Age: 7m
Address: http://bloom-560m.default.example.com
Port: :80
GPU: 1

LatestRevision: bloom-560m-predictor-00001
LatestPrecent: 100

Instances:
NAME STATUS AGE READY RESTARTS GPU NODE
---- ------ --- ----- -------- --- ----
bloom-560m-predictor-00001-deployment-56b8bdbf87-sg8v8 Running 7m 2/2 0 1 192.168.5.241

4\. Perform inference

You can curl the ingress gateway external IP, using the Host header to route the request to the service.
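
If INGRESS_HOST is not set yet, one common way to look it up (assuming Istio is used as the ingress gateway) is:

$ export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')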

$ curl -H "Host: bloom-560m.default.example.com" http://${INGRESS_HOST}:80/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17}}' \
-H 'Content-Type: application/json'

{"generated_text":" Deep Learning is a new type of machine learning that is used to solve complex problems."}

5\. Update the InferenceService with the canary rollout strategy

Add the canaryTrafficPercent field to the predictor component and update the command to use the new model path /mnt/models/bloom-560m-v2.

$ arena serve update kserve \
--name bloom-560m \
--canary-traffic-percent=10 \
"text-generation-launcher --disable-custom-kernels --model-id /mnt/models/bloom-560m-v2 --num-shard 1 -p 8036"

After rolling out the canary model, traffic is split between the latest ready revision 2 and the previously rolled out revision 1.

$ arena serve get bloom-560m
Name: bloom-560m
Namespace: default
Type: KServe
Version: 00002
Desired: 2
Available: 2
Age: 26m
Address: http://bloom-560m.default.example.com
Port: :80

LatestRevision: bloom-560m-predictor-00002
LatestPrecent: 10
PrevRevision: bloom-560m-predictor-00001
PrevPrecent: 90

Instances:
NAME STATUS AGE READY RESTARTS GPU NODE
---- ------ --- ----- -------- --- ----
bloom-560m-predictor-00001-deployment-56b8bdbf87-sg8v8 Running 19m 2/2 0 1 192.168.5.241
bloom-560m-predictor-00002-deployment-84dbb64cc4-647wx Running 2m 2/2 0 1 192.168.5.239
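
You can also inspect the traffic split on the underlying InferenceService directly (a sketch; the columns shown depend on your KServe version):

$ kubectl get inferenceservice bloom-560m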

6\. Promote the canary model

If the canary model is healthy/passes your tests, you can set canary-traffic-percent to 100.

$ arena serve update kserve \
--name bloom-560m \
--canary-traffic-percent=100

Now all traffic goes to revision 2 for the new model. The pods for revision 1 automatically scale down to 0 as they are no longer receiving traffic.

$ arena serve get bloom-560m
Name: bloom-560m
Namespace: default
Type: KServe
Version: 00002
Desired: 2
Available: 2
Age: 26m
Address: http://bloom-560m.default.example.com
Port: :80

LatestRevision: bloom-560m-predictor-00002
LatestPrecent: 100

Instances:
NAME STATUS AGE READY RESTARTS GPU NODE
---- ------ --- ----- -------- --- ----
bloom-560m-predictor-00001-deployment-56b8bdbf87-sg8v8 Terminating 22m 1/2 0 0 192.168.5.241
bloom-560m-predictor-00002-deployment-84dbb64cc4-647wx Running 5m 2/2 0 1 192.168.5.239

7\. Delete the KServe job

$ arena serve delete bloom-560m

140 changes: 140 additions & 0 deletions docs/serving/kserve/sklearn.md
@@ -0,0 +1,140 @@
# KServe job with supported serving runtime

This guide walks through the steps to deploy and serve a supported serving runtime with KServe.

1\. Setup

Follow the [KServe Guide](https://kserve.github.io/website/master/admin/serverless/serverless/) to install KServe.

2\. Submit your serving job to KServe

Deploy an InferenceService with a predictor that loads a scikit-learn model.

$ arena serve kserve \
--name=sklearn-iris \
--model-format=sklearn \
--storage-uri=gs://kfserving-examples/models/sklearn/1.0/model

inferenceservice.serving.kserve.io/sklearn-iris created
INFO[0009] The Job sklearn-iris has been submitted successfully
INFO[0009] You can run `arena serve get sklearn-iris --type kserve -n default` to check the job status
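
For reference, the resulting resource is roughly equivalent to an InferenceService like the following (a sketch, not necessarily the exact manifest arena generates):

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model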

3\. Check the status of the KServe job

$ arena serve list
NAME TYPE VERSION DESIRED AVAILABLE ADDRESS PORTS
sklearn-iris KServe 00001 1 1 http://sklearn-iris.default.example.com :80

$ arena serve get sklearn-iris
Name: sklearn-iris
Namespace: default
Type: KServe
Version: 00001
Desired: 1
Available: 1
Age: 3m
Address: http://sklearn-iris.default.example.com
Port: :80

LatestRevision: sklearn-iris-predictor-00001
LatestPrecent: 100

Instances:
NAME STATUS AGE READY RESTARTS NODE
---- ------ --- ----- -------- ----
sklearn-iris-predictor-00001-deployment-7b4677c6b7-8cr84 Running 3m 2/2 0 192.168.5.239

4\. Perform inference

First, prepare your inference input request inside a file:

$ cat <<EOF > "./iris-input.json"
{
"instances": [
[6.8, 2.8, 4.8, 1.4],
[6.0, 3.4, 4.5, 1.6]
]
}
EOF

You can curl the ingress gateway external IP, using the Host header to route the request to the service.

$ curl -H "Host: sklearn-iris.default.example.com" http://${INGRESS_HOST}:80/v1/models/sklearn-iris:predict -d @./iris-input.json
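
The response should look similar to the following (the exact predictions depend on the model):

{"predictions": [1, 1]}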

5\. Update the InferenceService with the canary rollout strategy

Add the canaryTrafficPercent field to the predictor component and update the storageUri to use a new/updated model.

$ arena serve update kserve \
--name sklearn-iris \
--canary-traffic-percent=10 \
--storage-uri=gs://kfserving-examples/models/sklearn/1.0/model-2

After rolling out the canary model, traffic is split between the latest ready revision 2 and the previously rolled out revision 1.

$ arena serve get sklearn-iris
Name: sklearn-iris
Namespace: default
Type: KServe
Version: 00002
Desired: 2
Available: 2
Age: 26m
Address: http://sklearn-iris.default.example.com
Port: :80

LatestRevision: sklearn-iris-predictor-00002
LatestPrecent: 10
PrevRevision: sklearn-iris-predictor-00001
PrevPrecent: 90

Instances:
NAME STATUS AGE READY RESTARTS NODE
---- ------ --- ----- -------- ----
sklearn-iris-predictor-00001-deployment-7b4677c6b7-8cr84 Running 25m 2/2 0 192.168.5.239
sklearn-iris-predictor-00002-deployment-7f677b9fd6-2dtpg Running 3m 2/2 0 192.168.5.241

6\. Promote the canary model

If the canary model is healthy/passes your tests, you can set canary-traffic-percent to 100.

$ arena serve update kserve \
--name sklearn-iris \
--canary-traffic-percent=100

Now all traffic goes to revision 2 for the new model. The pods for revision 1 automatically scale down to 0 as they are no longer receiving traffic.

$ arena serve get sklearn-iris
Name: sklearn-iris
Namespace: default
Type: KServe
Version: 00002
Desired: 1
Available: 1
Age: 32m
Address: http://sklearn-iris.default.example.com
Port: :80

LatestRevision: sklearn-iris-predictor-00002
LatestPrecent: 100

Instances:
NAME STATUS AGE READY RESTARTS NODE
---- ------ --- ----- -------- ----
sklearn-iris-predictor-00001-deployment-7b4677c6b7-8cr84 Terminating 31m 1/2 0 192.168.5.239
sklearn-iris-predictor-00002-deployment-7f677b9fd6-2dtpg Running 9m 2/2 0 192.168.5.241

7\. Delete the KServe job

$ arena serve delete sklearn-iris

3 changes: 3 additions & 0 deletions pkg/serving/delete.go
@@ -19,6 +19,9 @@ func DeleteServingJob(namespace, name, version string, jobType types.ServingJobT
return err
}
nameWithVersion := fmt.Sprintf("%v-%v", job.Name(), job.Version())
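// KServe resources are created with the job name only (no version suffix), so delete them by the plain job name.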
if job.Type() == types.KServeJob {
nameWithVersion = job.Name()
}
servingType := string(job.Type())
err = workflow.DeleteJob(nameWithVersion, namespace, servingType)
if err != nil {
