xogoodnow/Monitoring_Stack

Production Ready Monitoring Stack

Monitoring stack for production use

Components Used

| Name | Version | Documentation | Purpose | Alternatives | Advantages |
| --- | --- | --- | --- | --- | --- |
| Terraform | 1.5.4 | Docs | Hardware provisioner, initial setup | Salt, Ansible | 1. Easy syntax<br>2. Sufficient community and documentation<br>3. Much better suited for hardware provisioning |
| Hetzner Provider | 1.42.1 | Docs | Deploying servers | Vultr, DigitalOcean | 1. Cheaper :)<br>2. Good community overseeing the provider |
| Ansible | 2.15.2 | Docs | Automating tasks | Salt | 1. No footprint on target hosts |
| Ubuntu | 22.04 | Docs | Operating system | Debian, CentOS | 1. Bigger community<br>2. Faster releases than Debian<br>3. Bigger community than any other OS<br>4. Not cash grabbing like CentOS (yet :)) |
| VictoriaMetrics | latest | Docs | Time-series database | InfluxDB, Prometheus | 1. High performance<br>2. Cost-effective<br>3. Scalable<br>4. Handles massive volumes of data<br>5. Good community and documentation |
| vmalert | latest | Docs | Evaluating alerting rules | Prometheus Alertmanager | 1. Works well with VictoriaMetrics<br>2. Supports different datasource types |
| vmagent | latest | Docs | Collecting time-series data | Prometheus | 1. Works well with VictoriaMetrics<br>2. Supports different datasource types |
| Alertmanager | latest | Docs | Handling alerts | ElastAlert, Grafana Alerts | 1. Handles alerts from multiple client applications<br>2. Deduplicates, groups, and routes alerts<br>3. Can be plugged into multiple endpoints (Slack, Email, Telegram, Squadcast, ...) |
| Grafana | latest | Docs | Monitoring and observability | Prometheus, Datadog, New Relic | 1. Create, explore, and share dashboards with ease<br>2. Huge community and documentation<br>3. Easy to set up and manage<br>4. Many out-of-the-box solutions for visualization |
| Node exporter | latest | Docs | Hardware and OS metrics | cAdvisor, Collectd | 1. Measures various machine resources<br>2. Pluggable metric collectors<br>3. Basic standard for node monitoring |
| Docker | latest | Docs | Application deployment and management | containerd, podman | 1. Many more bells and whistles included out of the box compared to the alternatives<br>2. Awesome community and documentation<br>3. Easy to work with |

Before you begin

Note: Each Ansible role has a general and a specific README file. Reading them before running anything is encouraged.

P.S. Start with the README file of the main setup playbook.

  • Create an API token on Hetzner
  • Create a server to act as the Terraform and Ansible provisioner (Ansible and Terraform must be installed on it)
  • Clone the project
  • In the modular_terraform folder, create a terraform.tfvars file
    • The file must contain the following variables
      • hcloud_token = "APIKEY"
      • image_name = "ubuntu-22.04"
      • server_type = "cpx31"
      • location = "hel1"
  • Run terraform init to create the required lock file
  • Before firing off, run terraform plan to check that everything is all right
  • Run terraform apply
  • Go drink a cup of coffee and come back in 10 minutes or so (hopefully everything will be up and running by then (: )
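
The terraform.tfvars step above can be sketched as a short shell snippet. It assumes it is run from the repository root and that the API token is available in an environment variable named HCLOUD_TOKEN; that variable name and TFVARS_DIR are hypothetical names chosen for this example, and the "APIKEY" fallback is only a placeholder.

```shell
set -eu

# Assumption: run from the repository root; override TFVARS_DIR if the layout differs.
TFVARS_DIR="${TFVARS_DIR:-modular_terraform}"
# Assumption: the token is exported as HCLOUD_TOKEN; "APIKEY" is only a placeholder.
TOKEN="${HCLOUD_TOKEN:-APIKEY}"

# Harmless if the folder already exists after cloning.
mkdir -p "$TFVARS_DIR"

# Write the four variables listed above.
cat > "$TFVARS_DIR/terraform.tfvars" <<EOF
hcloud_token = "$TOKEN"
image_name   = "ubuntu-22.04"
server_type  = "cpx31"
location     = "hel1"
EOF

# The file holds a secret, so restrict it to the current user.
chmod 600 "$TFVARS_DIR/terraform.tfvars"
```

Keeping the token in an environment variable rather than typing it into the file by hand makes it less likely to end up in shell history or a commit.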

Known issues

  • No automation for scaling or maintenance (after the initial setup)
  • Terraform is limited to Hetzner
  • The Grafana datasource must be set manually to http://IP_ADDRESS_:8428/prometheus
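
As a possible workaround for the manual datasource step, the datasource can be registered through Grafana's HTTP API instead of the UI. The sketch below assumes Grafana listens on port 3000 and accepts basic auth with the credentials in GRAFANA_USER and GRAFANA_PASS (defaulting to admin/admin); none of this is confirmed by this repository. The function names and the datasource name VictoriaMetrics are hypothetical.

```shell
# Build the JSON body for a Prometheus-type datasource pointing at VictoriaMetrics.
# $1: IP address of the monitoring node.
datasource_payload() {
  printf '{"name":"VictoriaMetrics","type":"prometheus","access":"proxy","url":"http://%s:8428/prometheus","isDefault":true}\n' "$1"
}

# POST the payload to Grafana's datasource API.
# Assumptions: Grafana on port 3000, basic auth enabled, admin/admin unless overridden.
register_datasource() {
  curl --fail --silent --show-error \
    -u "${GRAFANA_USER:-admin}:${GRAFANA_PASS:-admin}" \
    -H 'Content-Type: application/json' \
    -X POST "http://$1:3000/api/datasources" \
    -d "$(datasource_payload "$1")"
}
```

Calling register_datasource with the node's IP address as its only argument would then create the datasource; the snippet itself only defines the two functions.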

Workflow

  • Run the following command so that Terraform installs its dependencies and creates the lock file
terraform init

  • Apply the Terraform modules to get started
terraform apply
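
A minimal sketch of the init, plan, apply sequence as one shell function, assuming it is run inside the modular_terraform folder. The function name provision and the TF variable (which exists only so the binary can be swapped out) are hypothetical, and saving the plan to a file named tfplan is a choice made for this example.

```shell
# Assumption: terraform is on PATH; TF can be overridden to point at another binary.
TF="${TF:-terraform}"

provision() {
  # Download the provider and create the lock file.
  "$TF" init || return 1
  # Save the plan so that apply runs exactly what was reviewed.
  "$TF" plan -out=tfplan || return 1
  # Applying a saved plan file does not prompt for confirmation.
  "$TF" apply tfplan
}
```

Stopping at the first failing step keeps a broken plan from ever reaching apply.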

  • Check the status of services on the node

Note: Keep in mind that you must configure Alertmanager yourself.
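
One way to check the services from the provisioner is to probe each component's HTTP health endpoint. The ports below are the upstream defaults (VictoriaMetrics 8428, vmagent 8429, vmalert 8880, Alertmanager 9093, Grafana 3000, node exporter 9100) and are assumptions here, since the roles in this repository may map them differently. The function names are hypothetical.

```shell
# Print "name url" pairs for the health endpoints of a node.
# $1: IP address of the monitoring node. Ports are upstream defaults (assumed).
health_endpoints() {
  cat <<EOF
victoriametrics http://$1:8428/health
vmagent http://$1:8429/health
vmalert http://$1:8880/health
alertmanager http://$1:9093/-/healthy
grafana http://$1:3000/api/health
node_exporter http://$1:9100/metrics
EOF
}

# Probe every endpoint and report UP or DOWN per service.
check_stack() {
  health_endpoints "$1" | while read -r name url; do
    if curl --fail --silent --max-time 5 --output /dev/null "$url"; then
      echo "$name UP"
    else
      echo "$name DOWN"
    fi
  done
}
```

Running check_stack with the node's IP address prints one line per service; a DOWN line may simply mean the port differs from the assumed default or is not reachable from the provisioner.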

  • Check the monitoring stack

Note: All dashboards are provisioned. Just keep in mind that you also have to copy the dashboards to the remote destination using Ansible.

  • To clean up everything (including the nodes themselves)
terraform destroy
