The Site Reliability Workbook

A workbook on SRE published by Google.

  • Table of Contents
  • Foreword I
  • Foreword II
  • Preface
  • Chapter 1 - How SRE Relates to DevOps
  • Part I - Foundations
  • Chapter 2 - Implementing SLOs
  • Chapter 3 - SLO Engineering Case Studies
  • Chapter 4 - Monitoring
  • Chapter 5 - Alerting on SLOs
  • Chapter 6 - Eliminating Toil
  • Chapter 7 - Simplicity
  • Part II - Practices
  • Chapter 8 - On-Call
  • Chapter 9 - Incident Response
  • Chapter 10 - Postmortem Culture: Learning from Failure
  • Chapter 11 - Managing Load
  • Chapter 12 - Introducing Non-Abstract Large System Design
  • Chapter 13 - Data Processing Pipelines
  • Chapter 14 - Configuration Design and Best Practices
  • Chapter 15 - Configuration Specifics
  • Chapter 16 - Canarying Releases
  • Part III - Processes
  • Chapter 17 - Identifying and Recovering from Overload
  • Chapter 18 - SRE Engagement Model
  • Chapter 19 - SRE: Reaching Beyond Your Walls
  • Chapter 20 - SRE Team Lifecycles
  • Chapter 21 - Organizational Change Management in SRE
  • Conclusion
  • Appendix A - Example SLO Document
  • Appendix B - Example Error Budget Policy
  • Appendix C - Results of Postmortem Analysis
  • Index
  • About the Editors
  • Colophon