Dashboard with external prometheus #14169
-
Regarding the RGW instances in the dashboard, I think @rkachach has done some investigation into that. Regarding the monitoring labels, you can provide additional labels on the mgr using this mechanism: https://rook.io/docs/rook/latest-release/CRDs/Cluster/ceph-cluster-crd/#annotations-and-labels
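For example, labels placed under `labels.monitoring` in the CephCluster CRD are applied to the monitoring resources; a minimal sketch, where the `team: storage` label is a placeholder (see the linked doc for the exact semantics):

```yaml
# Hypothetical CephCluster excerpt: labels under spec.labels.monitoring are
# propagated by Rook to the monitoring resources it creates.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  labels:
    monitoring:
      team: storage   # example label; name and value are placeholders
```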
-
I just tested on rook-ceph 1.14.5 and the disk utilization issue on the Ceph dashboard is still present. You can easily see it by displaying the ceph-dashboard/host overview dashboard in Grafana: when the bug is present, the AVG Disk Utilization panel shows N/A on Rook, whereas it shows the disk utilization percentage on a bare-metal Ceph cluster managed by cephadm. When deploying a bare-metal Ceph cluster using cephadm, the Prometheus instance is configured with:
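The original snippet is not reproduced above; as a rough sketch, cephadm generates a scrape job along these lines (the target host/port are illustrative):

```yaml
# Sketch of a cephadm-style Prometheus scrape job; the key point is
# honor_labels: true, which keeps the instance label produced by the
# ceph-mgr exporter instead of overwriting it with the scrape target.
scrape_configs:
  - job_name: 'ceph'
    honor_labels: true
    static_configs:
      - targets: ['ceph-node1:9283']   # hypothetical mgr metrics endpoint
```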
When looking at some Prometheus values such as ceph_disk_occupation, the instance label is the host name of the device:
Under a cluster managed by rook-ceph, the Prometheus instance is configured through the ServiceMonitor resources:
These resources are dynamically created by rook-ceph-operator:
The effect is that the generated Prometheus scrape config does not have the "honorLabels: true" setting. When looking at the ceph_disk_occupation values, the instance label is replaced by a pod address and the original instance (the hostname) is moved to the exported_instance label:
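Illustratively, a series then looks like this (all label values below are placeholders):

```
ceph_disk_occupation{instance="10.42.0.17:9283", exported_instance="ceph-node1", ceph_daemon="osd.0", device="/dev/sda"} 1.0
```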
This is the normal Prometheus behavior when honor_labels is false (which is the default value for this field). A simple solution is to dynamically modify the ServiceMonitor resources to add "honorLabels: true" to the endpoint:
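A sketch of the ServiceMonitor with the proposed setting; the port name, path, and selector follow what Rook typically generates and may differ in a given release:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rook-ceph-mgr
  namespace: rook-ceph
spec:
  selector:
    matchLabels:
      app: rook-ceph-mgr
  endpoints:
    - port: http-metrics
      path: /metrics
      interval: 10s
      honorLabels: true   # proposed fix: keep the exporter-supplied instance label
```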
The rook-ceph-operator code probably needs to be modified to add this setting when generating the ServiceMonitor resources. Alain RICHARD
-
Hi,
I am deploying rook-ceph-cluster using an external kube-prometheus-stack.
The relevant values I use to deploy the operator are:
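The values themselves are not shown above; a minimal sketch for the rook-ceph operator chart, assuming the upstream chart's `monitoring.enabled` key:

```yaml
# rook-ceph (operator) chart values excerpt -- hypothetical, key name taken
# from the upstream chart; enables the monitoring RBAC used for ServiceMonitors.
monitoring:
  enabled: true
```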
And the relevant cluster values:
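Again not shown above; a minimal sketch for the rook-ceph-cluster chart, assuming the upstream chart's `monitoring` keys:

```yaml
# rook-ceph-cluster chart values excerpt -- hypothetical key names from the
# upstream chart; tells the operator to create the monitoring resources.
monitoring:
  enabled: true
  createPrometheusRules: true
```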
I am currently experiencing two well-identified problems with the dashboard performance graphs:
Some graphs show empty or N/A indications for some disk I/O statistics.
For example, on Cluster/Hosts/Overall Performance, the AVG Disk Utilization panel shows N/A.
Stats about latencies on Cluster/OSDs/Overall Performance are also empty.
Looking at the Prometheus/Grafana side, it appears this is because Prometheus has replaced the instance label with a host:port value, copying the original instance label into an exported_instance label.
This is because the relevant serviceMonitor/rook-ceph/rook-ceph-mgr generated by the Rook operator does not set "honorLabels: true".
Manually editing the serviceMonitor/rook-ceph/rook-ceph-mgr resource solves the problem, but this is not a long-term solution, as the resource is regenerated internally by the operator at the next operator or cluster update.
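As a stop-gap until the operator regenerates the resource, that edit can be applied as a JSON patch (the endpoint index 0 is an assumption about the generated resource):

```shell
kubectl -n rook-ceph patch servicemonitor rook-ceph-mgr --type=json \
  -p '[{"op": "add", "path": "/spec/endpoints/0/honorLabels", "value": true}]'
```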
When I deploy the object store with two instances, both instances use the same name and id:
and most of the object gateway dashboards show no data, with the error "found duplicate series for the match group".
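That error is characteristic of PromQL vector matching when the match group is not unique; the RGW dashboards join request metrics against ceph_rgw_metadata roughly like this (an illustrative shape, not the exact dashboard query), and the join fails when two gateways report the same instance_id:

```
rate(ceph_rgw_req[30s]) * on (instance_id) group_left (ceph_daemon)
  ceph_rgw_metadata
```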
Looking at a cephadm-deployed (non-Rook) cluster, the instances get different ids:
The only workaround so far is to run a single gateway instance.