## Scraping and storing metrics for the run

There are cases, such as CI and automated test runs, where the state of the cluster and its metrics during the chaos run need to be stored long term so they can be reviewed after the cluster is terminated. To help with this, Kraken supports capturing metrics for the duration of the scenarios defined in the config and indexing them into Elasticsearch. The indexed metrics can then be visualized with Grafana.

Under the hood it uses kube-burner. The metrics to capture are defined in a metrics profile, which Kraken consumes to query Prometheus (installed by default in OpenShift) with the start and end timestamps of the run. Each run has a unique identifier (uuid), and all the metrics/documents in Elasticsearch are associated with it. The uuid is generated automatically if not set in the config. This feature can be enabled in the config by setting the following:

```yaml
performance_monitoring:
    kube_burner_binary_url: "https://github.com/cloud-bulldozer/kube-burner/releases/download/v0.9.1/kube-burner-0.9.1-Linux-x86_64.tar.gz"
    capture_metrics: True
    config_path: config/kube_burner.yaml                  # Define the Elasticsearch url and index name in this config
    metrics_profile_path: config/metrics-aggregated.yaml
    prometheus_url:                                       # The Prometheus url/route is obtained automatically on OpenShift; set it when the distribution is Kubernetes.
    prometheus_bearer_token:                              # The bearer token is obtained automatically on OpenShift; set it when the distribution is Kubernetes. It is needed to authenticate with Prometheus.
    uuid:                                                 # uuid for the run; generated automatically if not set
```
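For context, here is a minimal Python sketch (not Kraken's actual code) of what this flow amounts to: generate a uuid when none is set, record timestamps around the scenarios, and hand everything to kube-burner's `index` subcommand. The flag names are assumptions based on the pinned kube-burner release; verify them with `kube-burner index --help`.

```python
import subprocess
import time
import uuid

# Illustration only: Kraken wires this up internally from the config above.
run_uuid = str(uuid.uuid4())   # generated automatically when uuid is not set
start = int(time.time())
# ... the chaos scenarios defined in the config run here ...
end = int(time.time())

# Flag names are assumptions based on kube-burner's `index` subcommand.
subprocess.run(
    [
        "kube-burner", "index",
        "--uuid", run_uuid,
        "-c", "config/kube_burner.yaml",
        "-m", "config/metrics-aggregated.yaml",
        "-u", "https://prometheus-k8s.example.com",   # placeholder Prometheus route
        "-t", "<prometheus-bearer-token>",            # placeholder token
        "--start", str(start),
        "--end", str(end),
    ],
    check=True,
)
```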

### Metrics profile

A couple of metrics profiles (`metrics.yaml` and `metrics-aggregated.yaml`) are shipped by default, and they can be tweaked to capture additional metrics during the run. For example, here are the API server metrics:

```yaml
metrics:
# API server
  - query: histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{apiserver="kube-apiserver", verb!~"WATCH", subresource!="log"}[2m])) by (verb,resource,subresource,instance,le)) > 0
    metricName: API99thLatency

  - query: sum(irate(apiserver_request_total{apiserver="kube-apiserver",verb!="WATCH",subresource!="log"}[2m])) by (verb,instance,resource,code) > 0
    metricName: APIRequestRate

  - query: sum(apiserver_current_inflight_requests{}) by (request_kind) > 0
    metricName: APIInflightRequests
```
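To see what a profile entry returns before adding it, the query can be run directly against Prometheus over the run window. Below is a minimal sketch using the standard Prometheus HTTP API (`/api/v1/query_range`); the URL, token, and timestamps are placeholders to substitute for your cluster:

```python
import requests

PROM_URL = "https://prometheus-k8s.example.com"   # placeholder route
TOKEN = "<prometheus-bearer-token>"               # placeholder bearer token
START, END = 1700000000, 1700003600               # run window as epoch seconds

query = (
    'sum(irate(apiserver_request_total{apiserver="kube-apiserver",'
    'verb!="WATCH",subresource!="log"}[2m])) by (verb,instance,resource,code) > 0'
)

resp = requests.get(
    f"{PROM_URL}/api/v1/query_range",
    params={"query": query, "start": START, "end": END, "step": "30s"},
    headers={"Authorization": f"Bearer {TOKEN}"},
    verify=False,  # cluster monitoring routes often use self-signed certs
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"], len(series["values"]), "samples")
```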

### Indexing

Define the Elasticsearch server and the index in which to store the metrics/documents in the kube-burner config:

```yaml
global:
  writeToFile: true
  metricsDirectory: collected-metrics
  measurements:
    - name: podLatency
      esIndex: kube-burner

  indexerConfig:
    enabled: true
    esServers: ["https://elastic.example.com:9200"]
    insecureSkipVerify: true
    defaultIndex: kraken
    type: elastic
```
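Once a run has been indexed, its documents can be pulled back out of Elasticsearch by uuid. A minimal sketch, assuming the `kraken` index from the config above and that each indexed document carries the run's `uuid` field:

```python
import requests

ES_URL = "https://elastic.example.com:9200"   # placeholder from the config above
RUN_UUID = "<uuid-of-the-run>"                # printed/configured for the run

resp = requests.post(
    f"{ES_URL}/kraken/_search",
    json={"query": {"match": {"uuid": RUN_UUID}}, "size": 10},
    verify=False,  # matches insecureSkipVerify: true above
)
resp.raise_for_status()
hits = resp.json()["hits"]
print(hits["total"], "documents for run", RUN_UUID)
for doc in hits["hits"]:
    src = doc["_source"]
    print(src.get("metricName"), src.get("timestamp"))
```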