In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient—lessons directly applicable to your organization.
This book is divided into four sections:
Niall Murphy leads the Ads Site Reliability Engineering team at Google Ireland. He has been involved in the Internet industry for about 20 years, and is currently chairperson of INEX, Ireland’s peering hub. He is the author or coauthor of a number of technical papers and/or books, including "IPv6 Network Administration" for O’Reilly, and a number of RFCs. He is currently cowriting a history of the Internet in Ireland, and is the holder of degrees in Computer Science, Mathematics, and Poetry Studies, which is surely some kind of mistake. He lives in Dublin with his wife and two sons.
Betsy Beyer is a Technical Writer for Google Site Reliability Engineering in NYC. She has previously written documentation for Google Datacenters and Hardware Operations teams. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University.
Chris Jones is a Site Reliability Engineer for Google App Engine, a cloud platform-as-a-service product serving over 28 billion requests per day. Based in San Francisco, he has previously been responsible for the care and feeding of Google’s advertising statistics, data warehousing, and customer support systems. In other lives, Chris has worked in academic IT, analyzed data for political campaigns, and engaged in some light BSD kernel hacking, picking up degrees in Computer Engineering, Economics, and Technology Policy along the way. He’s also a licensed professional engineer.
Jennifer Petoff is a Program Manager for Google’s Site Reliability Engineering team and based in Dublin, Ireland. She has managed large global projects across wide-ranging domains including scientific research, engineering, human resources, and advertising operations. Jennifer joined Google after spending eight years in the chemical industry. She holds a PhD in Chemistry from Stanford University and a BS in Chemistry and a BA in Psychology from the University of Rochester.
This new workbook not only combines practical examples from Google’s experiences, but also provides case studies from Google’s Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t.
Dive into this workbook and learn how to flesh out your own SRE practice, no matter what size your company is.
You’ll learn:How to run reliable services in environments you don’t completely control—like cloudPractical applications of how to create, monitor, and run your services via Service Level ObjectivesHow to convert existing ops teams to SRE—including how to dig out of operational overloadMethods for starting SRE from either greenfield or brownfield
Docker is quickly changing the way that organizations are deploying software at scale. But understanding how Linux containers fit into your workflow—and getting the integration details right—are not trivial tasks. With this practical guide, you’ll learn how to use Docker to package your applications with all of their dependencies, and then test, ship, scale, and support your containers in production.
Two Lead Site Reliability Engineers at New Relic share much of what they have learned from using Docker in production since shortly after its initial release. Their goal is to help you reap the benefits of this technology while avoiding the many setbacks they experienced.Learn how Docker simplifies dependency management and deployment workflow for your applicationsStart working with Docker images, containers, and command line toolsUse practical techniques to deploy and test Docker-based Linux containers in productionDebug containers by understanding their composition and internal processesDeploy production containers at scale inside your data center or cloud environmentExplore advanced Docker topics, including deployment tools, networking, orchestration, security, and configuration
In three parts, this book explains how these services work and what it means to build an application the Microservices Way. You’ll explore a design-based approach to microservice architecture with guidance for implementing various elements. And you’ll get a set of recipes and practices for meeting practical, organizational, and cultural challenges to microservice adoption.Learn how microservices can help you drive business objectivesExamine the principles, practices, and culture that define microservice architecturesExplore a model for creating complex systems and a design process for building a microservice architectureLearn the fundamental design concepts for individual microservicesDelve into the operational elements of a microservices architecture, including containers and service discoveryDiscover how to handle the challenges of introducing microservice architecture in your organization