Category

Observability

7 books
Cloud Observability in Action
by Michael Hausenblas

Don’t fly blind. Observability gives you actionable insights into your cloud native systems—from pinpointing errors, to increasing developer productivity, to tracking compliance.

Observability is the difference between an error message and an error explanation that comes with a recipe for resolving the error. You know exactly which service is affected, who is responsible for repairing it, and even how it can be optimized in the future. Cloud Observability in Action teaches you how to set up an observability system that learns from a cloud application’s signals, logging, and monitoring, all using free and open source tools.

In Cloud Observability in Action you will learn how to:

  • Apply observability in cloud native systems
  • Understand observability signals, including their costs and benefits
  • Apply good practices around instrumentation and signal collection
  • Deliver dashboarding, alerting, and SLOs/SLIs at scale
  • Choose the correct signal types for given roles or tasks
  • Pick the right observability tool for any given function
  • Communicate the benefits of observability to management

A well-designed observability system provides insight into bugs and performance issues in cloud native applications. It helps your development team understand the impact of code changes, measure optimizations, and track user experience. Best of all, observability can even automate your error handling so that machine users apply their own fixes. That means no more 3 a.m. calls for emergency outages.

Implement Trustworthy End-to-End Data Solutions
by Andy Petrella

Quickly detect, troubleshoot, and prevent a wide range of data issues through data observability, a set of best practices that enables data teams to gain greater visibility into data and its usage. If you're a data engineer, data architect, or machine learning engineer who depends on the quality of your data, this book shows you how to focus on the practical aspects of introducing data observability in your everyday work.

Author Andy Petrella helps you build the right habits to identify and solve data issues, such as data drift and poor data quality, so you can stop their propagation in data applications, pipelines, and analytics. You'll learn ways to introduce data observability, including setting up a framework for generating and collecting all the information you need.

  • Learn the core principles and benefits of data observability
  • Use data observability to detect, troubleshoot, and prevent data issues
  • Follow the book's recipes to implement observability in your data projects
  • Use data observability to create a trustworthy communication framework with data consumers
  • Learn how to educate your peers about the benefits of data observability

Istio in Action
by Christian E. Posta and Rinor Maloku

Solve difficult service-to-service communication challenges around security, observability, routing, and resilience with an Istio-based service mesh. Istio allows you to define these traffic policies as configuration and enforce them consistently without needing any service-code changes.

In Istio in Action you will learn:

  • Why and when to use a service mesh
  • Envoy’s role in Istio’s service mesh
  • How to allow “North-South” traffic into a mesh
  • How to implement fine-grained traffic routing
  • How to make your services robust to network failures
  • How to gain observability over your system with telemetry “golden signals”
  • How Istio makes your services secure by default
  • How to integrate cloud-native applications with legacy workloads, such as those running in VMs

Reduce the operational complexity of your microservices with an Istio-powered service mesh! Istio in Action shows you how to implement this powerful new architecture and move your application-networking concerns to a dedicated infrastructure layer. Non-functional concerns stay separate from your application, so your code is easier to understand, maintain, and adapt regardless of programming language.

In this practical guide, you’ll go hands-on with the full-featured Istio service mesh to manage microservices communication. Helpful diagrams and example configurations make it easy to understand how to control routing, secure container applications, and monitor network traffic.

Setting Up and Operating a Modern Observability System
by Ted Young and Austin Parker

OpenTelemetry is a revolution in observability data. Instead of running multiple uncoordinated pipelines, OpenTelemetry provides users with a single integrated stream of data that combines multiple sources of high-quality telemetry: tracing, metrics, logs, RUM, eBPF, and more. This practical guide shows you how to set up, operate, and troubleshoot the OpenTelemetry observability system.

Authors Austin Parker, head of developer relations at Lightstep and OpenTelemetry Community Maintainer, and Ted Young, cofounder of the OpenTelemetry project, cover every OpenTelemetry component, as well as observability best practices for many popular cloud, platform, and data services such as Kubernetes and AWS Lambda. You'll learn how OpenTelemetry enables OSS libraries and services to provide their own native instrumentation—a first in the industry.

Ideal for application developers, OSS maintainers, operators and infrastructure teams, and managers and team leaders, this book guides you through:

  • The principles of modern observability
  • All OpenTelemetry components—and how they fit together
  • A practical approach to instrumenting platforms and applications
  • Methods for installing, operating, and troubleshooting an OpenTelemetry-based observability solution
  • Ways to roll out and maintain end-to-end observability across a large organization
  • How to write and maintain consistent, high-quality instrumentation without a lot of work

Achieving Production Excellence
by Charity Majors, Liz Fong-Jones, and George Miranda

Observability is critical for building, changing, and understanding the software that powers complex modern systems. Teams that adopt observability are much better equipped to ship code swiftly and confidently, identify outliers and aberrant behaviors, and understand the experience of each and every user. This practical book explains the value of observable systems and shows you how to practice observability-driven development.

Authors Charity Majors, Liz Fong-Jones, and George Miranda from Honeycomb explain what constitutes good observability, show you how to improve upon what you're doing today, and provide practical dos and don'ts for migrating from legacy tooling, such as metrics, monitoring, and log management. You'll also learn the impact observability has on organizational culture (and vice versa).

You'll explore:

  • How the concept of observability applies to managing software at scale
  • The value of practicing observability when delivering complex cloud native applications and systems
  • The impact observability has across the entire software development lifecycle
  • How and why different functional teams use observability with service-level objectives
  • How to instrument your code to help future engineers understand the code you wrote today
  • How to produce quality code for context-aware system debugging and maintenance
  • How data-rich analytics can help you debug elusive issues

Monitor, control, and visualize your Kubernetes and cloud platforms using the LGTM stack
by Rob Chapman and Peter Holmes

To overcome application monitoring and observability challenges, Grafana Labs offers a modern, highly scalable, cost-effective Loki, Grafana, Tempo, and Mimir (LGTM) stack along with Prometheus for the collection, visualization, and storage of telemetry data.

Beginning with an overview of observability concepts, this book teaches you how to instrument code and monitor systems in practice using standard protocols and Grafana libraries. As you progress, you’ll create a free Grafana Cloud instance and deploy a demo application to a Kubernetes cluster to delve into the implementation of the LGTM stack. You’ll learn how to connect Grafana Cloud to AWS, GCP, and Azure to collect infrastructure data, build interactive dashboards, use service level indicators and objectives to produce high-quality alerts, and leverage Grafana’s AI and ML capabilities to keep your systems healthy. You’ll also explore real user monitoring with Faro and performance monitoring with Pyroscope and k6. The book also covers advanced topics such as architecting a Grafana installation, using automation and infrastructure as code tools for DevOps processes, troubleshooting strategies, and best practices for avoiding common pitfalls.

After reading this book, you’ll be able to use the Grafana stack to deliver amazing operational results for the systems your organization uses.

What you will learn

  • Understand the fundamentals of observability, logs, metrics, and distributed traces
  • Find out how to instrument an application using Grafana and OpenTelemetry
  • Collect data and monitor cloud, Linux, and Kubernetes platforms
  • Build queries and visualizations using LogQL, PromQL, and TraceQL
  • Manage incidents and alerts using AI-powered incident management
  • Deploy and monitor CI/CD pipelines to automatically validate the desired results
  • Take control of observability costs with powerful built-in features
  • Architect and manage an observability platform using Grafana

Who this book is for

If you’re an application developer, a DevOps engineer, an SRE, a platform engineer, or a cloud engineer concerned with Day 2+ systems operations, then this book is for you. Product owners and technical leaders who want to gain visibility into their products in a standardized, easy-to-implement way will also benefit from this book. A basic understanding of computer systems, cloud computing, cloud platforms, DevOps processes, Docker or Podman, Kubernetes, cloud native, and similar concepts will be useful.

Infrastructure and Application Performance Monitoring
by Julien Pivotto and Brian Brazil

Get up to speed with Prometheus, the metrics-based monitoring system used in production by tens of thousands of organizations. This updated second edition provides site reliability engineers, Kubernetes administrators, and software developers with a hands-on introduction to the most important aspects of Prometheus, including dashboarding and alerting, direct code instrumentation, and metric collection from third-party systems with exporters.

Prometheus server maintainer Julien Pivotto and core developer Brian Brazil demonstrate how you can use Prometheus for application and infrastructure monitoring. The book guides you through setting up Prometheus, the Node Exporter, and the Alertmanager, and then shows you how to put these tools to work in practice. You'll also understand why this open source system has continued to gain popularity in recent years.

You will:

  • Know where and how much instrumentation to apply to your application code
  • Monitor your infrastructure with Node Exporter and use new collectors for network system pressure metrics
  • Get an introduction to Grafana, a popular tool for building dashboards
  • Use service discovery and the new HTTP SD monitoring system to provide different views of your machines and services
  • Use Prometheus with Kubernetes and examine exporters you can use with containers
  • Discover Prometheus's new improvements and features, including trigonometric functions
  • Learn how Prometheus supports important security features including TLS and basic authentication