
This GitHub repository contains a comprehensive tutorial on Site Reliability Engineering (SRE), covering topics such as SLAs, SLOs, SLIs, Chaos Engineering, monitoring, alerting, and much more. It also includes bonus content on SRE best practices. Follow along with the #100daysofSRE challenge and improve your reliability engineering skills.


shantoroy/site-reliability-engineering-101


#100daysofSRE - Site Reliability Engineering Notes (SRE-101)

I will be joining a Fortune 500 company as a Site Reliability Engineer (SRE) intern in the summer of 2023. Before then, I plan to take on the #100dayschallenge to learn SRE and share my journey through SRE resources.

I have planned the content for the next 100 days, and I will publish one blog post every day under the hashtag #100daysofSRE. ✌️

  1. #100daysofSRE (Day 01): Introduction to Site Reliability Engineering
  2. #100daysofSRE (Day 02): History of SRE and its Evolution
  3. #100daysofSRE (Day 03): SLAs, SLOs, and SLIs — understanding the metrics of reliability
  4. #100daysofSRE (Day 04): Chaos Engineering and SRE - Techniques and Tools to Break Things on Purpose
  5. #100daysofSRE (Day 05): Automation Benefits, Techniques, and Tools in SRE
  6. #100daysofSRE (Day 06): Incident Management and Response for Site Reliability Engineers
  7. #100daysofSRE (Day 07): Effective Communication during Incidents for Better Incident Response
  8. #100daysofSRE (Day 08): Root Cause Analysis and Post-Incident Reviews for SRE
  9. #100daysofSRE (Day 09): Monitoring and Observability in SRE
  10. #100daysofSRE (Day 10): Grafana vs Splunk for Monitoring System and Applications
  11. #100daysofSRE (Day 11): Logging and Log Analysis in Site Reliability Engineering - Techniques, Tools, and Best Practices
  12. #100daysofSRE (Day 12): Alerting and Notification Strategies and Best Practices in SRE
  13. #100daysofSRE (Day 13): Capacity Planning and Management in Site Reliability Engineering
  14. #100daysofSRE (Day 14): Load Testing and Stress Testing in Site Reliability Engineering
  15. #100daysofSRE (Day 15): Disaster Recovery Planning and Testing in SRE
  16. #100daysofSRE (Day 16): High Availability and Redundancy Strategies for Data
