Odigos instrumentation does not work on RedHat Openshift #1128

Open
esara opened this issue Apr 18, 2024 · 1 comment

esara commented Apr 18, 2024

Describe the bug
We are trying to instrument applications running on Red Hat OpenShift (OCP 4.15, the latest release as of now). There are a couple of problems:

a) the odiglet and odigos-data-collection components require the privileged SCC

$ kubectl get events -n odigos-system
LAST SEEN   TYPE      REASON              OBJECT                                      MESSAGE
3m11s       Warning   FailedCreate        daemonset/odiglet                           Error creating: pods "odiglet-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, provider restricted-v2: .containers[0].privileged: Invalid value: true: Privileged containers are not allowed, provider restricted-v2: .containers[0].capabilities.add: Invalid value: "SYS_PTRACE": capability may not be added, provider restricted-v2: .containers[0].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[0].hostPID: Invalid value: true: Host PID is not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]

b) other odigos services require at least the anyuid SCC

$ kubectl get events -n odigos-system
41s         Warning   FailedCreate        replicaset/odigos-gateway-54d466ddc9        Error creating: pods "odigos-gateway-54d466ddc9-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .containers[0].runAsUser: Invalid value: 10000: must be in the ranges: [1000710000, 1000719999], provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
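
For context, the `runAsUser` rejection in this event is a plain range check: `restricted-v2` only admits UIDs from the block allocated to the namespace, which OpenShift normally records in the `openshift.io/sa.scc.uid-range` annotation as `<start>/<size>`. A minimal sketch of that check follows. The value `1000710000/10000` is an assumption inferred from the range printed in the event, not read from the cluster, and `uid_in_range` is a hypothetical helper.

```shell
#!/bin/sh
# uid_in_range UID START/SIZE  (hypothetical helper)
# Succeeds when UID falls inside the block START .. START+SIZE-1.
uid_in_range() {
  uid=$1
  start=${2%/*}   # text before the slash
  size=${2#*/}    # text after the slash
  end=$((start + size - 1))
  [ "$uid" -ge "$start" ] && [ "$uid" -le "$end" ]
}

# On a live cluster the range could be read with (not run here):
#   oc get namespace odigos-system \
#     -o jsonpath='{.metadata.annotations.openshift\.io/sa\.scc\.uid-range}'
range="1000710000/10000"   # assumed value, inferred from the event message

if uid_in_range 10000 "$range"; then
  echo "UID 10000 is admitted by restricted-v2"
else
  echo "UID 10000 is outside $range, so a more permissive SCC is needed"
fi
```

With the assumed range, the hard-coded UID 10000 of the gateway falls outside the block, which matches the `must be in the ranges: [1000710000, 1000719999]` message.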

c) the odiglet ClusterRole is missing Kubernetes permissions

{"level":"error","ts":1713183591.5206065,"caller":"runtime_details/shared.go:121","msg":"Failed to update runtime info","name":"custom-app","kind":"Deployment","namespace":"default","error":"instrumentedapplications.odigos.io \"custom-app\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , <nil>","stacktrace":"github.com/keyval-dev/odigos/odiglet/pkg/kube/runtime_details.persistRuntimeResults\n\t/go/src/github.com/keyval-dev/odigos/odiglet/pkg/kube/runtime_details/shared.go:121\ngithub.com/keyval-dev/odigos/odiglet/pkg/kube/runtime_details.inspectRuntimesOfRunningPods\n\t/go/src/github.com/keyval-dev/odigos/odiglet/pkg/kube/runtime_details/shared.go:43\ngithub.com/keyval-dev/odigos/odiglet/pkg/kube/runtime_details.(*DeploymentsReconciler).Reconcile\n\t/go/src/github.com/keyval-dev/odigos/odiglet/pkg/kube/runtime_details/deployments.go:37\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}

d) the files mounted from the virtual device are not readable inside the application container

$ kubectl logs -f pod-85b5976d68
Starting JVM...
Executing startup command: "java -javaagent:/var/odigos/java/javaagent.jar -Dotel.traces.sampler=always_on -Dotel.exporter.otlp.endpoint=http://192.168.1.30:4317 -jar action-orchestrator-8.12.0.jar"
Error opening zip file or JAR manifest missing : /var/odigos/java/javaagent.jar
JVMJ9TI064E Agent initialization function Agent_OnLoad failed for library instrument, return code -1
JVMJ9VM015W Initialization error for library j9jvmti29(-3): JVMJ9VM009E J9VMDllMain failed

and if you exec into the pod:

$ mount|grep sda4
/dev/sda4 on /var/odigos/java type xfs (ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
$ ls -al /var/odigos/java/javaagent.jar
ls: cannot access '/var/odigos/java/javaagent.jar': Permission denied
$ cat /var/odigos/java/javaagent.jar
cat: /var/odigos/java/javaagent.jar: Permission denied

You can also see the denial in the SELinux audit.log:

[root@master core]# grep denied /var/log/audit/audit.log
type=AVC msg=audit(1713583026.394:24529): avc:  denied  { read } for  pid=868736 comm="java" name="javaagent.jar" dev="sda4" ino=421528988 scontext=system_u:system_r:container_t:s0:c25,c26 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=0
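
To make the denial easier to read: the record says a process in the source domain (`scontext`) was refused `read` on a file with the target label (`tcontext`). A small sketch that pulls the type out of each context, using the record above verbatim; `selinux_type` is a hypothetical helper, not part of any SELinux tooling.

```shell
#!/bin/sh
# selinux_type FIELD LINE  (hypothetical helper)
# Prints the type (third colon-separated part) of the scontext or tcontext
# field in an AVC record.
selinux_type() {
  printf '%s\n' "$2" | sed -n "s/.* $1=[^:]*:[^:]*:\([^:]*\):.*/\1/p"
}

avc='type=AVC msg=audit(1713583026.394:24529): avc:  denied  { read } for  pid=868736 comm="java" name="javaagent.jar" dev="sda4" ino=421528988 scontext=system_u:system_r:container_t:s0:c25,c26 tcontext=system_u:object_r:var_t:s0 tclass=file permissive=0'

echo "source domain: $(selinux_type scontext "$avc")"   # container_t
echo "target type:   $(selinux_type tcontext "$avc")"   # var_t
```

So the `java` process runs as `container_t`, while `javaagent.jar` carries the generic `var_t` label of the host path.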

Expected behavior
The instrumentedapplications CR should be created, and the instrumentation should be injected and readable by the application.


esara commented Apr 18, 2024

The solution for the first two problems is to grant each odigos component a more permissive SCC.
The gateway, odiglet and data-collector components need to run with the 'anyuid' SCC, and the odiglet and odigos-data-collection additionally need the 'privileged' SCC:

oc adm policy add-scc-to-group anyuid system:serviceaccounts:odigos-system
oc adm policy add-scc-to-user privileged -z odiglet -n odigos-system
oc adm policy add-scc-to-user privileged -z odigos-data-collection -n odigos-system

The solution to the third problem is to use the generated ClusterRole for the odiglet (specifically, to allow modifying the finalizers of the controllers):
https://github.com/keyval-dev/odigos-charts/pull/39/files
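
As a rough way to confirm the permission gap, the `cannot set blockOwnerDeletion` error usually means the caller lacks `update` on the `finalizers` subresource of the owning object (here a Deployment). The sketch below assumes the service account is `odiglet` in `odigos-system` (the name used in the `add-scc-to-user` command) and that the logged-in user may impersonate it. `sa_user` and `check_finalizer_access` are hypothetical helpers, and the cluster call is wrapped in a function so nothing contacts the API server until you call it.

```shell
#!/bin/sh
# sa_user NAMESPACE NAME  (hypothetical helper)
# Prints the RBAC user name Kubernetes assigns to a service account.
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# check_finalizer_access NAMESPACE SERVICEACCOUNT  (hypothetical helper)
# Asks the API server whether the service account may update the finalizers
# subresource of Deployments in any namespace.
check_finalizer_access() {
  oc auth can-i update deployments.apps --subresource=finalizers \
    --as="$(sa_user "$1" "$2")" --all-namespaces
}

echo "would check as: $(sa_user odigos-system odiglet)"
```

Calling `check_finalizer_access odigos-system odiglet` before and after applying the updated ClusterRole should show whether the rule took effect. The same check can be repeated for other workload kinds if needed.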

Lastly, the fourth problem is caused by SELinux running in enforcing mode, which is the default on OCP nodes.

[core@master ~]$ sestatus 
Current mode:                   enforcing
Mode from config file:          enforcing

which results in

$ cat /var/odigos/java/javaagent.jar
cat: /var/odigos/java/javaagent.jar: Permission denied

but if you run

[root@master core]# semanage fcontext -a -t container_ro_file_t '/var/odigos(/.*)?'
[root@master core]# restorecon -r /var/odigos

then the host-mounted files inside the container become readable and the applications start.

But this is not a good enough solution; let me research how to do this safely in OCP.
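
For reference while evaluating that workaround: the path specification passed to `semanage fcontext` is a regular expression that, as far as I know, is matched against the whole path. The sketch below approximates that with `grep -E` and explicit anchors to show which host paths `/var/odigos(/.*)?` would cover. `fcontext_matches` is a hypothetical helper and the extra paths are made-up examples, so treat this as an illustration rather than the exact matching SELinux performs.

```shell
#!/bin/sh
# fcontext_matches PATTERN PATH  (hypothetical helper)
# Succeeds when PATH matches PATTERN as a whole-path extended regex.
fcontext_matches() {
  printf '%s\n' "$2" | grep -Eq "^$1\$"
}

pattern='/var/odigos(/.*)?'
# The last two paths are invented counter-examples.
for path in /var/odigos /var/odigos/java/javaagent.jar /var/odigos-backup /var/lib/odigos; do
  if fcontext_matches "$pattern" "$path"; then
    echo "relabelled:   $path"
  else
    echo "not affected: $path"
  fi
done
```

Under that assumption the rule covers `/var/odigos` itself and everything beneath it, and nothing else on the node.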
