Site Reliability

3 books

Becoming SRE

First Steps Toward Reliability for You and Your Organization

by David N. Blank-Edelman

Do you wish the existing books on site reliability engineering started at the beginning? Do you wish someone would walk you through how to become an SRE, how to think like an SRE, or how to build and grow a successful SRE function in your organization?

Becoming SRE addresses all of these needs and more with three interconnected sections: the essential groundwork for understanding SRE and SRE culture, advice for individuals on becoming an SRE, and guidance for organizations on creating and developing a thriving SRE practice.

Acting as your personal and personable guide, author David Blank-Edelman takes you through subjects like:

SRE mindset, SRE culture, and SRE advocacy
What you need to get started and hired in SRE and what the job will be like when you get there
What you need to bring SRE into an organization and what is required for a good organizational fit so it can thrive there
How to work with your business folks and management around SRE
How SRE can grow and mature in an organization over time

Ready to become an SRE or introduce SRE into your organization? This book is here to help.

About the book

0/5 on Goodreads

ISBN 9781492090557

Published in 2024

266 pages

O'Reilly Media

Chaos Engineering

Site reliability through controlled disruption

by Mikolaj Pawlikowski

Auto engineers test the safety of a car by intentionally crashing it and carefully observing the results. Chaos engineering applies the same principles to software systems. In Chaos Engineering: Site reliability through controlled disruption, you'll learn to run your applications and infrastructure through a series of tests that simulate real-life failures. You'll maximize the benefits of chaos engineering by learning to think like a chaos engineer, and how to design the proper experiments to ensure the reliability of your software. With examples that cover a whole spectrum of software, you'll be ready to run an intensive testing regime on anything from a simple WordPress site to a massive distributed system running on Kubernetes.

About the book

4.33/5 on Goodreads

ISBN 9781617297755

Published in 2021

424 pages

Manning Publications

Site Reliability Engineering

How Google Runs Production Systems

by Betsy Beyer, Chris Jones, Niall Richard Murphy and Jennifer Petoff

The overwhelming majority of a software system's lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems?

In this collection of essays and articles, key members of Google's Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You'll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient—lessons directly applicable to your organization.

This book is divided into four sections:

Introduction—Learn what site reliability engineering is and why it differs from conventional IT industry practices
Principles—Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
Practices—Understand the theory and practice of an SRE's day-to-day work: building and operating large distributed computing systems
Management—Explore Google's best practices for training, communication, and meetings that your organization can use

About the book

4.23/5 on Goodreads

ISBN 9781491929124

Published in 2016

550 pages

O'Reilly Media