diff --git a/docs/serving/index.md b/docs/serving/index.md
index d269f4f00..07b9ed83d 100644
--- a/docs/serving/index.md
+++ b/docs/serving/index.md
@@ -40,3 +40,8 @@ If you want to use arena to manage serving jobs, this guide is for you. we have
 * I want to [submit a nvidia triton serving job which use gpus](triton/serving.md).
 * I want to [update a nvidia triton serving job after deployed](triton/update-serving.md).
+
+## KServe Job Guide
+
+* I want to [submit a KServe job with a supported serving runtime](kserve/sklearn.md).
+* I want to [submit a KServe job with a custom serving runtime](kserve/custom.md).
diff --git a/docs/serving/kserve/custom.md b/docs/serving/kserve/custom.md
new file mode 100644
index 000000000..7bfdd0a61
--- /dev/null
+++ b/docs/serving/kserve/custom.md
@@ -0,0 +1,142 @@
# KServe job with a custom serving runtime

This guide walks through the steps to deploy and serve a custom serving runtime with KServe.

1\. Setup

Follow the [KServe Guide](https://kserve.github.io/website/master/admin/serverless/serverless/) to install KServe.

2\. Submit your serving job to KServe

First create a PVC named 'training-data' and download the 'bloom-560m' model from Hugging Face into it.

Then deploy an InferenceService with a predictor that loads the bloom model with text-generation-inference:

    $ arena serve kserve \
        --name=bloom-560m \
        --image=ghcr.io/huggingface/text-generation-inference:1.0.2 \
        --gpus=1 \
        --cpu=12 \
        --memory=50Gi \
        --port=8080 \
        --env=STORAGE_URI=pvc://training-data \
        "text-generation-launcher --disable-custom-kernels --model-id /mnt/models/bloom-560m --num-shard 1 -p 8080"

    inferenceservice.serving.kserve.io/bloom-560m created
    INFO[0010] The Job bloom-560m has been submitted successfully
    INFO[0010] You can run `arena serve get bloom-560m --type kserve -n default` to check the job status

3\. Check the status of the KServe job

    $ arena serve list
    NAME        TYPE    VERSION  DESIRED  AVAILABLE  ADDRESS                                 PORTS  GPU
    bloom-560m  KServe  00001    1        1          http://bloom-560m.default.example.com   :80    1

    $ arena serve get bloom-560m
    Name:       bloom-560m
    Namespace:  default
    Type:       KServe
    Version:    00001
    Desired:    1
    Available:  1
    Age:        7m
    Address:    http://bloom-560m.default.example.com
    Port:       :80
    GPU:        1

    LatestRevision:     bloom-560m-predictor-00001
    LatestPrecent:      100

    Instances:
      NAME                                                    STATUS   AGE  READY  RESTARTS  GPU  NODE
      ----                                                    ------   ---  -----  --------  ---  ----
      bloom-560m-predictor-00001-deployment-56b8bdbf87-sg8v8  Running  7m   2/2    0         1    192.168.5.241

4\. Perform inference

You can curl the ingress gateway's external IP, passing the Host header:

    $ curl -H "Host: bloom-560m.default.example.com" http://${INGRESS_HOST}:80/generate \
        -X POST \
        -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17}}' \
        -H 'Content-Type: application/json'

    {"generated_text":" Deep Learning is a new type of machine learning that is used to solve complex problems."}

5\. Update the InferenceService with the canary rollout strategy

Add the canaryTrafficPercent field to the predictor component and update the command to use the new model path /mnt/models/bloom-560m-v2:

    $ arena serve update kserve \
        --name bloom-560m \
        --canary-traffic-percent=10 \
        "text-generation-launcher --disable-custom-kernels --model-id /mnt/models/bloom-560m-v2 --num-shard 1 -p 8080"

After rolling out the canary model, traffic is split between the latest ready revision 2 and the previously rolled out revision 1.
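You can also verify the split against the underlying InferenceService object. The check below is a minimal sketch, assuming `kubectl` access to the cluster and the `default` namespace; on recent KServe versions, the `PREV` and `LATEST` printer columns show the traffic percentages of the previous and latest revisions:

    $ kubectl get inferenceservice bloom-560m -n default

arena reports the same split: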
    $ arena serve get bloom-560m
    Name:       bloom-560m
    Namespace:  default
    Type:       KServe
    Version:    00002
    Desired:    2
    Available:  2
    Age:        26m
    Address:    http://bloom-560m.default.example.com
    Port:       :80

    LatestRevision:     bloom-560m-predictor-00002
    LatestPrecent:      10
    PrevRevision:       bloom-560m-predictor-00001
    PrevPrecent:        90

    Instances:
      NAME                                                    STATUS   AGE  READY  RESTARTS  GPU  NODE
      ----                                                    ------   ---  -----  --------  ---  ----
      bloom-560m-predictor-00001-deployment-56b8bdbf87-sg8v8  Running  19m  2/2    0         1    192.168.5.241
      bloom-560m-predictor-00002-deployment-84dbb64cc4-647wx  Running  2m   2/2    0         1    192.168.5.239

6\. Promote the canary model

If the canary model is healthy and passes your tests, you can set canary-traffic-percent to 100:

    $ arena serve update kserve \
        --name bloom-560m \
        --canary-traffic-percent=100

Now all traffic goes to revision 2, which serves the new model. The pods for revision 1 automatically scale down to 0, since they no longer receive traffic:

    $ arena serve get bloom-560m
    Name:       bloom-560m
    Namespace:  default
    Type:       KServe
    Version:    00002
    Desired:    2
    Available:  2
    Age:        26m
    Address:    http://bloom-560m.default.example.com
    Port:       :80

    LatestRevision:     bloom-560m-predictor-00002
    LatestPrecent:      100

    Instances:
      NAME                                                    STATUS       AGE  READY  RESTARTS  GPU  NODE
      ----                                                    ------       ---  -----  --------  ---  ----
      bloom-560m-predictor-00001-deployment-56b8bdbf87-sg8v8  Terminating  22m  1/2    0         0    192.168.5.241
      bloom-560m-predictor-00002-deployment-84dbb64cc4-647wx  Running      5m   2/2    0         1    192.168.5.239

7\. Delete the KServe job

    $ arena serve delete bloom-560m
diff --git a/docs/serving/kserve/sklearn.md b/docs/serving/kserve/sklearn.md
new file mode 100644
index 000000000..450b48ce9
--- /dev/null
+++ b/docs/serving/kserve/sklearn.md
@@ -0,0 +1,140 @@
# KServe job with a supported serving runtime

This guide walks through the steps to deploy and serve a supported serving runtime with KServe.

1\. Setup

Follow the [KServe Guide](https://kserve.github.io/website/master/admin/serverless/serverless/) to install KServe.

2\. Submit your serving job to KServe

Deploy an InferenceService with a predictor that loads a scikit-learn model:

    $ arena serve kserve \
        --name=sklearn-iris \
        --model-format=sklearn \
        --storage-uri=gs://kfserving-examples/models/sklearn/1.0/model

    inferenceservice.serving.kserve.io/sklearn-iris created
    INFO[0009] The Job sklearn-iris has been submitted successfully
    INFO[0009] You can run `arena serve get sklearn-iris --type kserve -n default` to check the job status

3\. Check the status of the KServe job

    $ arena serve list
    NAME          TYPE    VERSION  DESIRED  AVAILABLE  ADDRESS                                   PORTS
    sklearn-iris  KServe  00001    1        1          http://sklearn-iris.default.example.com  :80

    $ arena serve get sklearn-iris
    Name:       sklearn-iris
    Namespace:  default
    Type:       KServe
    Version:    00001
    Desired:    1
    Available:  1
    Age:        3m
    Address:    http://sklearn-iris.default.example.com
    Port:       :80

    LatestRevision:     sklearn-iris-predictor-00001
    LatestPrecent:      100

    Instances:
      NAME                                                      STATUS   AGE  READY  RESTARTS  NODE
      ----                                                      ------   ---  -----  --------  ----
      sklearn-iris-predictor-00001-deployment-7b4677c6b7-8cr84  Running  3m   2/2    0         192.168.5.239

4\. Perform inference

First, prepare your inference input request inside a file:

    $ cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8, 2.8, 4.8, 1.4],
        [6.0, 3.4, 4.5, 1.6]
      ]
    }
    EOF

You can then curl the ingress gateway's external IP, passing the Host header.
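If `INGRESS_HOST` is not already set in your shell, you can resolve it from the ingress gateway service. This is a sketch that assumes the Istio ingress gateway from the serverless install referenced above, exposed through a LoadBalancer service; adjust the namespace and service name to your ingress setup:

    $ export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

With `INGRESS_HOST` set, send the prediction request: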
    $ curl -H "Host: sklearn-iris.default.example.com" http://${INGRESS_HOST}:80/v1/models/sklearn-iris:predict -d @./iris-input.json

5\. Update the InferenceService with the canary rollout strategy

Add the canaryTrafficPercent field to the predictor component and update the storageUri to point at the new model:

    $ arena serve update kserve \
        --name sklearn-iris \
        --canary-traffic-percent=10 \
        --storage-uri=gs://kfserving-examples/models/sklearn/1.0/model-2

After rolling out the canary model, traffic is split between the latest ready revision 2 and the previously rolled out revision 1.

    $ arena serve get sklearn-iris
    Name:       sklearn-iris
    Namespace:  default
    Type:       KServe
    Version:    00002
    Desired:    2
    Available:  2
    Age:        26m
    Address:    http://sklearn-iris.default.example.com
    Port:       :80

    LatestRevision:     sklearn-iris-predictor-00002
    LatestPrecent:      10
    PrevRevision:       sklearn-iris-predictor-00001
    PrevPrecent:        90

    Instances:
      NAME                                                      STATUS   AGE  READY  RESTARTS  NODE
      ----                                                      ------   ---  -----  --------  ----
      sklearn-iris-predictor-00001-deployment-7b4677c6b7-8cr84  Running  25m  2/2    0         192.168.5.239
      sklearn-iris-predictor-00002-deployment-7f677b9fd6-2dtpg  Running  3m   2/2    0         192.168.5.241

6\. Promote the canary model

If the canary model is healthy and passes your tests, you can set canary-traffic-percent to 100:

    $ arena serve update kserve \
        --name sklearn-iris \
        --canary-traffic-percent=100

Now all traffic goes to revision 2, which serves the new model. The pods for revision 1 automatically scale down to 0, since they no longer receive traffic:

    $ arena serve get sklearn-iris
    Name:       sklearn-iris
    Namespace:  default
    Type:       KServe
    Version:    00002
    Desired:    1
    Available:  1
    Age:        32m
    Address:    http://sklearn-iris.default.example.com
    Port:       :80

    LatestRevision:     sklearn-iris-predictor-00002
    LatestPrecent:      100

    Instances:
      NAME                                                      STATUS       AGE  READY  RESTARTS  NODE
      ----                                                      ------       ---  -----  --------  ----
      sklearn-iris-predictor-00001-deployment-7b4677c6b7-8cr84  Terminating  31m  1/2    0         192.168.5.239
      sklearn-iris-predictor-00002-deployment-7f677b9fd6-2dtpg  Running      9m   2/2    0         192.168.5.241

7\. Delete the KServe job

    $ arena serve delete sklearn-iris
diff --git a/pkg/serving/delete.go b/pkg/serving/delete.go
index 658600eb8..c0a7ef47a 100644
--- a/pkg/serving/delete.go
+++ b/pkg/serving/delete.go
@@ -19,6 +19,9 @@ func DeleteServingJob(namespace, name, version string, jobType types.ServingJobT
 		return err
 	}
 	nameWithVersion := fmt.Sprintf("%v-%v", job.Name(), job.Version())
+	if job.Type() == types.KServeJob {
+		nameWithVersion = job.Name()
+	}
 	servingType := string(job.Type())
 	err = workflow.DeleteJob(nameWithVersion, namespace, servingType)
 	if err != nil {
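The behavior this hunk adds can be read as a small naming helper. The sketch below is illustrative only, not code from the patch: `servingJob` stands in for arena's actual job type, and the `"KServe"` string stands in for `types.KServeJob`.

    package serving

    import "fmt"

    // servingJob is a stand-in for the job interface visible in delete.go;
    // only the methods used by the patch are assumed here.
    type servingJob interface {
    	Name() string
    	Version() string
    	Type() string
    }

    // deleteTargetName mirrors the rule in the hunk above: a KServe job is
    // deleted by its bare name (the docs show "bloom-560m" keeping one name
    // from Version 00001 to 00002, since KServe tracks revisions itself),
    // while other serving types keep the "<name>-<version>" convention.
    func deleteTargetName(job servingJob) string {
    	if job.Type() == "KServe" {
    		return job.Name() // e.g. "sklearn-iris"
    	}
    	return fmt.Sprintf("%v-%v", job.Name(), job.Version())
    }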