
Cannot run Milvus in our OpenShift cluster with (runAsUser, runAsGroup, fsGroup) - How to remove them? #43

Open
tmechineni12 opened this issue Dec 20, 2023 · 11 comments

@tmechineni12

Hi Team,

We tried to use your Helm templates (https://github.com/zilliztech/milvus-helm/tree/master/charts/milvus) to deploy Milvus on our OpenShift cluster. Our OpenShift team / Kubernetes cluster admins won't let us specify any security context (runAsUser, runAsGroup, fsGroup) for pods/deployments/replicasets/statefulsets, so we should not be specifying the values below:
# runAsUser: 1000
# runAsGroup: 1000
# fsGroup: 1000

So I commented them out and tried to install Milvus, but it does not work.

None of my pods start, and each shows the same errors (see the attached screenshots, not reproduced here).

Please advise on how to proceed further.

Thanks!
Tharun M

@wyfeng001
Contributor

I thought you used MinIO as the local storage?
Milvus doesn't set this securityContext itself. It should be the third-party charts such as MinIO, etcd, Pulsar, etc. that set it; they use it to avoid root execution and make things more secure.
I found these settings in minio/values.yaml:

## Add stateful containers to have security context, if enabled MinIO will run as this
## user and group NOTE: securityContext is only enabled if persistence.enabled=true
securityContext:
  enabled: true
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000

You could use OpenShift cloud storage to store the data instead: disable MinIO and configure an external bucket in your values, then give it a try.

minio:
  enabled: false

externalS3:
  enabled: true
  host: xxxxx
  port: 443
  rootPath: xxxxx
  bucketName: xxxx
  cloudProvider: xxx
  useSSL: true
  accessKey: "xxx"
  secretKey: "xxx"

Alternatively, you can find the related minio.persistence settings in values.yaml, disable persistence, and try again.

You can find more info about the MinIO settings in the bundled minio chart (https://github.com/zilliztech/milvus-helm/blob/master/charts/milvus/charts/minio-8.0.17.tgz).
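Following the NOTE quoted from minio/values.yaml above (the securityContext is only enabled when persistence.enabled=true), a minimal override sketch might look like this; the key paths are assumptions based on the bundled minio chart, so verify them against its values.yaml:

```yaml
# Hypothetical values override: disabling MinIO persistence should also
# disable the securityContext, per the NOTE in minio/values.yaml above.
minio:
  persistence:
    enabled: false
```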

@tmechineni12
Author

Hi, Thanks for your response.

I can try that method for the third-party components, but before that, how do I get the Milvus components working?

In the deployments for each of those components (datacoord, datanode, indexcoord, indexnode, proxy, querycoord, querynode, rootcoord), it looks like you are using two images:

  1. Init container: milvusdb/milvus-config-tool:v0.1.2
  2. Main container: milvusdb/milvus:v2.3.3

In our case, none of these 8 Milvus-specific deployments/pods are starting, and each one of them has the exact same error: `2023/12/21 14:55:03 write failed: open /milvus/configs/milvus.yaml: permission denied`. See screenshots.

### Overall:

Each of the component pods (overview plus Data Coord, Index Coord, Indexnode, Datanode, Proxy, QueryCoord, Querynode, rootCoord) shows the same error in its screenshot (not reproduced here).

Upon further investigation, I see that the config-tool image has uid/gid 65532:65532 baked in (see screenshot).

Can you help me understand what I can do here to get these pods into a Running state?
Please note that our OpenShift cluster won't let us run containers as any specific user, and won't allow a user/group such as 65532:65532 to be baked into images.

Also, can you elaborate on whether this is hardcoded in the config-tool image or in the Milvus images? Why am I getting permission denied on each of those pods?

Thanks for your support.
Tharun

@wyfeng001
Contributor

Could you write down your steps? For example, which files you modified and which methods you used to deploy, so that I can better understand your changes.

@avishka40

I have also faced the same issue with an OpenShift cluster. I managed to get some of these issues sorted out by adding a service account with an extra SCC allowing it to run as root; however, the third-party providers require running as root as well.
Is there a way we could get around those too? I was trying to override some of these Helm charts manually but am having some trouble.
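For reference, granting an SCC to a dedicated service account on OpenShift 4.x can be done with `oc adm policy add-scc-to-user`, or declaratively with a ClusterRoleBinding. A minimal sketch, assuming a service account named `milvus` in a `milvus` namespace (both names are hypothetical):

```yaml
# Sketch: bind the ServiceAccount used by the pods to the anyuid SCC.
# The ClusterRole name follows OpenShift's system:openshift:scc:<name>
# convention.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: milvus-anyuid
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:scc:anyuid
subjects:
  - kind: ServiceAccount
    name: milvus       # hypothetical service account name
    namespace: milvus  # hypothetical namespace
```

Note that granting anyuid broadly weakens the cluster's security posture; scoping it to a single service account, as above, limits the blast radius.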

@guimou

guimou commented Feb 22, 2024

There are multiple things to adjust for Milvus to deploy properly on OpenShift:

  1. Milvus container image needs some modification to make the folder /milvus writable by gid 0. I just made a PR for that. If you want to test, a container image implementing this change is available here.
  2. Security Contexts have to be fixed for Pulsar and a few other deployments or statefulsets.
  3. Pulsar must be updated to version 2.10.5. The current version, 2.8.2, cannot run as non-root.
  4. The port Pulsar uses for Prometheus, 80, cannot be bound on OpenShift; it must be changed to a higher one, like 8080.
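Points 3 and 4 above can be expressed as Helm values overrides. A rough sketch; the exact key paths depend on the Pulsar subchart bundled with milvus-helm, so treat these as illustrative assumptions and check them against that chart's values.yaml:

```yaml
# Illustrative overrides for the Pulsar version bump (point 3).
# Key paths are assumptions; verify against the bundled Pulsar chart.
pulsar:
  images:
    broker:
      tag: 2.10.5   # 2.10.x is the first line that can run as non-root
    proxy:
      tag: 2.10.5
```

The Prometheus port change (point 4) would similarly be a values override, under whatever key the chart version in use exposes for the metrics port.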

I generated a manifest for a full cluster deployment, made the modifications, and generated a diff file (attached, renamed to .txt to allow upload) reflecting all those changes.
From there, if it's OK with you @wyfeng001, or whoever is in charge, we could look into incorporating all those changes into the chart, or creating a specific one for OpenShift.
milvus_manifest.diff.txt

@haorenfsa
Collaborator

Hi @guimou! Thank you very much for providing a solution!

I just checked the manifest.diff. The default Milvus image will be updated after your Milvus PR is merged and released.

It's unlikely that we'll change the default Pulsar version in the short term, because v2.8.2 is quite stable and sufficient for the current Milvus version, and a two-major-version upgrade may be too aggressive. However, we can maintain a Pulsar image repo for Milvus to solve this, like we did for the etcd image: https://github.com/milvus-io/bitnami-docker-etcd/.

@haorenfsa
Collaborator

/assign
/assign @LoveEachDay

@guimou

guimou commented Feb 23, 2024

@haorenfsa Yeah, I did not like updating the Pulsar version either, but within the limits of the tests I did, it worked! 😄
The changes were introduced only in version 2.10.0, with this PR.
Would your suggestion be to fork the Pulsar repo from 2.8.2 and apply those changes (if feasible without impact), then rebuild and use this patched version of Pulsar for OpenShift deployments?

@haorenfsa
Collaborator

@guimou Yes. In fact, it's not the version of Pulsar we need to change, but the Dockerfile the Pulsar image is built with.

However, I just found that the Milvus community is considering upgrading Pulsar to 3.0 due to apache/pulsar#14779, which may significantly affect the stability of Milvus.

I'll keep posting updates on the Milvus community's plans here; based on those, we'll then decide whether we need to fork and patch Pulsar 2.8.2 to solve this in the short term.

@guimou

guimou commented Mar 1, 2024

Meanwhile, for people still struggling, a full recipe to deploy Milvus (standalone or cluster mode) on OpenShift is available here (as well as example notebooks for ingestion/query, RAG chatbot recipes, etc., in the same repo).

@haorenfsa
Collaborator

> However, I just found that the Milvus community is considering upgrading pulsar to 3.0 due to this issue apache/pulsar#14779, which may influence the stability of Milvus significantly.

Looks like this isn't going to happen within the next few months. Let's consider polishing the current Pulsar v2.8.2 Docker image instead.

Things to be done, from my rough estimate:

  • maintain a Dockerfile to polish the Pulsar image, and push the result to milvusdb/pulsar
  • update milvus-helm's default image to milvusdb/pulsar
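Once a patched image is published, the second step would amount to changing the default image reference in the chart's values. A sketch, with key paths assumed from the Pulsar subchart and a hypothetical tag:

```yaml
# Sketch: point the Pulsar components at a patched milvusdb/pulsar image.
# Key paths are assumptions; check the bundled chart's values.yaml.
pulsar:
  images:
    broker:
      repository: milvusdb/pulsar
      tag: 2.8.2-patched   # hypothetical tag for the polished image
```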
