missing kubernetes pods metric labels #347

Open
Artiach opened this issue Sep 14, 2023 · 2 comments

Artiach commented Sep 14, 2023

Hi! I am testing scaphandre by deploying it in a Kubernetes cluster with the --containers flag, and the metrics provided are not showing any of the extra labels for Kubernetes pods mentioned in the docs (kubernetes_node_name, kubernetes_pod_name, etc.).

Example of the metrics I am getting:

scaph_process_power_consumption_microwatts{cmdline="nginx: worker process",pid="2152",exe="nginx"} 0
scaph_process_power_consumption_microwatts{cmdline="nginx: worker process",pid="2151",exe="nginx"} 0
scaph_process_power_consumption_microwatts{pid="2150",exe="nginx",cmdline="nginx: worker process"} 0
scaph_process_power_consumption_microwatts{pid="2149",exe="nginx",cmdline="nginx: worker process"} 0
scaph_process_power_consumption_microwatts{exe="nginx",pid="2148",cmdline="nginx: worker process"} 0
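
For comparison, if the feature were working as described in the docs, I would expect each line to also carry extra labels such as kubernetes_node_name and kubernetes_pod_name, roughly like this (label values here are made up purely for illustration):

    scaph_process_power_consumption_microwatts{cmdline="nginx: worker process",pid="2152",exe="nginx",kubernetes_node_name="kind-worker",kubernetes_pod_name="nginx-6d4cf56db6-abcde"} 0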

I am running scaphandre inside a Kind multi-node cluster deployed with the following config file:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30000
    hostPort: 30000
    protocol: TCP
  - containerPort: 31000
    hostPort: 31000
    protocol: TCP
  - containerPort: 32000
    hostPort: 32000
    protocol: TCP
  extraMounts:
    - hostPath: /var/run/docker.sock
      containerPath: /var/run/docker.sock
- role: worker
  extraMounts:
    - hostPath: /var/run/docker.sock
      containerPath: /var/run/docker.sock
- role: worker
  extraMounts:
    - hostPath: /var/run/docker.sock
      containerPath: /var/run/docker.sock

Note that I am mounting the host's docker.sock as a volume so that scaphandre has access to it.

Then I install scaphandre using the helm chart:

$ helm install scaphandre helm/scaphandre

Scaphandre logs:

Scaphandre prometheus exporter
Sending ⚡ metrics
scaphandre::exporters::prometheus: 2023-09-14T13:22:37: Starting Prometheus exporter
Press CTRL-C to stop scaphandre
scaphandre::exporters::prometheus: 2023-09-14T13:23:42: Refresh topology
scaphandre::sensors: Before refresh procs init.
scaphandre::exporters::prometheus: 2023-09-14T13:23:42: Refresh data
scaphandre::exporters: 2023-09-14T13:23:42: Get self metrics
scaphandre::exporters: 2023-09-14T13:23:42: Get host metrics
scaphandre::exporters: 2023-09-14T13:23:42: Get socket metrics
scaphandre::exporters: 2023-09-14T13:23:42: Get system metrics
scaphandre::exporters: 2023-09-14T13:23:42: Get process metrics
scaphandre::exporters: First check done on pods.
scaphandre::exporters::prometheus: 2023-09-14T13:23:53: Refresh topology
scaphandre::sensors: Before refresh procs init.
scaphandre::exporters::prometheus: 2023-09-14T13:23:53: Refresh data
scaphandre::exporters: 2023-09-14T13:23:53: Get self metrics
scaphandre::exporters: 2023-09-14T13:23:53: Get host metrics
scaphandre::exporters: 2023-09-14T13:23:53: Get socket metrics
scaphandre::exporters: 2023-09-14T13:23:53: Get system metrics
scaphandre::exporters: 2023-09-14T13:23:53: Get process metrics
scaphandre::exporters::prometheus: 2023-09-14T13:24:35: Refresh topology
scaphandre::sensors: Before refresh procs init.
scaphandre::exporters::prometheus: 2023-09-14T13:24:35: Refresh data
scaphandre::exporters: 2023-09-14T13:24:35: Get self metrics
scaphandre::exporters: 2023-09-14T13:24:35: Get host metrics
scaphandre::exporters: 2023-09-14T13:24:35: Get socket metrics
scaphandre::exporters: 2023-09-14T13:24:35: Get system metrics
scaphandre::exporters: 2023-09-14T13:24:35: Get process metrics
scaphandre::exporters: Just refreshed pod list ! last: 1694697822 now: 1694697875, diff: 53

I would like to know if this feature is working as of today and, if so, what the Kubernetes deployment requirements or recommended environment for it are, meaning:

  1. type of deployment (using kubeadm, microk8s, k3s, kind...)
  2. container runtime (containerd, cri-docker,...)
  3. kubernetes version
  4. host operating system version
  5. anything else?
Artiach added the bug label on Sep 14, 2023

Artiach commented Sep 20, 2023

Hi,
Looking into the scaphandre code, I have seen that when looking for the container name, a regex is used to inspect the cgroups. The regex is:
Regex::new(r"^/kubepods.*$").unwrap();

So this is looking for something starting with /kubepods, while the cgroups of my container processes look like this:
/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/

So it is not able to get the container ID and thus cannot resolve the pod name.
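
To illustrate (a minimal standalone Rust snippet, not the actual scaphandre code), the current pattern never matches the cgroup path that Kind produces:

    use regex::Regex;

    fn main() {
        // Pattern quoted above, used by scaphandre to detect Kubernetes pod cgroups
        let re = Regex::new(r"^/kubepods.*$").unwrap();

        // cgroup path of one of my container processes under Kind
        let kind_path = "/kubelet.slice/kubelet-kubepods.slice/kubelet-kubepods-besteffort.slice/";

        // The path starts with /kubelet.slice, not /kubepods, so this prints "false"
        println!("{}", re.is_match(kind_path));
    }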

I think this naming difference in the cgroup hierarchy might be related to Kind. Any idea if this is the case? And if so, is Kind going to be supported in the future?

I have also seen that cgroups v2 has been introduced in newer versions of Linux distros. Is that something that could potentially affect the proper functioning of scaphandre? If so, is it going to be supported in the future as well?

I would also like to ask again for your recommended Kubernetes cluster requirements in order to test all of scaphandre's features properly.

Kind regards and thanks for your awesome project

@damienvergnaud

Hello,
I've faced the same issue in my context.
It seems to be related to this line, as you stated:
https://github.com/hubblo-org/scaphandre/blob/main/src/sensors/utils.rs#L422

Scaphandre seems to use the /proc/{PID}/cgroup file to figure out per-process extra information to provide to the Prometheus exporter as labels.

Documentation:
https://man7.org/linux/man-pages/man7/cgroups.7.html
It states:

   ... 
   /proc files
       /proc/cgroups (since Linux 2.6.24)
              ... 

       /proc/pid/cgroup (since Linux 2.6.24)
              This file describes control groups to which the process
              with the corresponding PID belongs.  The displayed
              information differs for cgroups version 1 and version 2
              hierarchies.

              For each cgroup hierarchy of which the process is a
              member, there is one entry containing three colon-
              separated fields:

                  hierarchy-ID:controller-list:cgroup-path

              For example:

                  5:cpuacct,cpu,cpuset:/daemons

              The colon-separated fields are, from left to right:

              [1]  For cgroups version 1 hierarchies, this field
                   contains a unique hierarchy ID number that can be
                   matched to a hierarchy ID in /proc/cgroups.  For the
                   cgroups version 2 hierarchy, this field contains the
                   value 0.

              [2]  For cgroups version 1 hierarchies, this field
                   contains a comma-separated list of the controllers
                   bound to the hierarchy.  For the cgroups version 2
                   hierarchy, this field is empty.

              [3]  This field contains the pathname of the control group
                   in the hierarchy to which the process belongs.  This
                   pathname is relative to the mount point of the
                   hierarchy.

In my case, the cgroup was something like this:
0::/kubepods/burstable/pod348e8c15-e2a8-41d4-ae41-64dd1b6248df/d8e314cffefd00e08ab729a482563d237b27a74f524aa6df936b5bc50a8fde50

Scaphandre then grabs the container ID from this content, according to this line:
https://github.com/hubblo-org/scaphandre/blob/dev/src/sensors/utils.rs#L421

(It seems to only take the last value of a "/" split before going to the next steps.)
I think the ID is then used to request the namespace and other information, but I haven't dug further.
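
As a rough sketch of what that extraction step appears to do (illustrative only, not the actual scaphandre code):

    fn main() {
        // /proc/{PID}/cgroup content observed in my case (cgroups v2 format: "0::<path>")
        let line = "0::/kubepods/burstable/pod348e8c15-e2a8-41d4-ae41-64dd1b6248df/d8e314cffefd00e08ab729a482563d237b27a74f524aa6df936b5bc50a8fde50";

        // Third colon-separated field is the cgroup path (the controller list is empty under v2)
        let cgroup_path = line.splitn(3, ':').nth(2).unwrap();

        // Last "/" segment is the container ID, presumably used afterwards to look up
        // the pod name, namespace and the other Kubernetes labels
        let container_id = cgroup_path.split('/').last().unwrap();
        println!("{}", container_id);
        // prints: d8e314cffefd00e08ab729a482563d237b27a74f524aa6df936b5bc50a8fde50
    }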

For us, the issue was that --containers does not parse /proc/{PID}/cgroup reliably enough to match in all the valid cases.

In short, we recompiled Scaphandre with this modification to the regex:

  • from: "^/kubepods.*$"
  • to: "/kubepods.*$"

and it worked, because the unanchored pattern tolerates the leading 0:: that the current version does not, but it's a dirty quick fix.
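
To show the difference concretely (again a standalone snippet, not the actual scaphandre source):

    use regex::Regex;

    fn main() {
        let line = "0::/kubepods/burstable/pod348e8c15-e2a8-41d4-ae41-64dd1b6248df/d8e314cffefd00e08ab729a482563d237b27a74f524aa6df936b5bc50a8fde50";

        // Current pattern: anchored at the start of the string, so the leading "0::" makes it fail
        let anchored = Regex::new(r"^/kubepods.*$").unwrap();
        // Patched pattern: unanchored, so it matches "/kubepods..." anywhere in the line
        let unanchored = Regex::new(r"/kubepods.*$").unwrap();

        println!("{}", anchored.is_match(line));   // false
        println!("{}", unanchored.is_match(line)); // true
    }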

Maybe you could check your own /proc/{PID}/cgroup to see whether your cgroup file content is formatted the same way. Take a PID from ps -faux on a node of your cluster, corresponding to a process running inside a container of your Kubernetes cluster.

bpetit added this to Triage in General on Mar 6, 2024