In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient—lessons directly applicable to your organization.
This book is divided into four sections:
Niall Murphy leads the Ads Site Reliability Engineering team at Google Ireland. He has been involved in the Internet industry for about 20 years, and is currently chairperson of INEX, Ireland’s peering hub. He is the author or coauthor of a number of technical papers and/or books, including "IPv6 Network Administration" for O’Reilly, and a number of RFCs. He is currently cowriting a history of the Internet in Ireland, and is the holder of degrees in Computer Science, Mathematics, and Poetry Studies, which is surely some kind of mistake. He lives in Dublin with his wife and two sons.
Betsy Beyer is a Technical Writer for Google Site Reliability Engineering in NYC. She has previously written documentation for Google Datacenters and Hardware Operations teams. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University.
Chris Jones is a Site Reliability Engineer for Google App Engine, a cloud platform-as-a-service product serving over 28 billion requests per day. Based in San Francisco, he has previously been responsible for the care and feeding of Google’s advertising statistics, data warehousing, and customer support systems. In other lives, Chris has worked in academic IT, analyzed data for political campaigns, and engaged in some light BSD kernel hacking, picking up degrees in Computer Engineering, Economics, and Technology Policy along the way. He’s also a licensed professional engineer.
Jennifer Petoff is a Program Manager for Google’s Site Reliability Engineering team and based in Dublin, Ireland. She has managed large global projects across wide-ranging domains including scientific research, engineering, human resources, and advertising operations. Jennifer joined Google after spending eight years in the chemical industry. She holds a PhD in Chemistry from Stanford University and a BS in Chemistry and a BA in Psychology from the University of Rochester.
Effective design is at the heart of everything from software development to engineering to architecture. But what do we really know about the design process? What leads to effective, elegant designs? The Design of Design addresses these questions.
These new essays by Fred Brooks contain extraordinary insights for designers in every discipline. Brooks pinpoints constants inherent in all design projects and uncovers processes and patterns likely to lead to excellence. Drawing on conversations with dozens of exceptional designers, as well as his own experiences in several design domains, Brooks observes that bold design decisions lead to better outcomes.
The author tracks the evolution of the design process, treats collaborative and distributed design, and illuminates what makes a truly great designer. He examines the nuts and bolts of design processes, including budget constraints of many kinds, aesthetics, design empiricism, and tools, and grounds this discussion in his own real-world examples—case studies ranging from home construction to IBM’s Operating System/360. Throughout, Brooks reveals keys to success that every designer, design project manager, and design researcher should know.
The added chapters contain (1) a crisp condensation of all the propositions asserted in the original book, including Brooks' central argument in The Mythical Man-Month: that large programming projects suffer management problems different from small ones due to the division of labor; that the conceptual integrity of the product is therefore critical; and that it is difficult but possible to achieve this unity; (2) Brooks' view of these propositions a generation later; (3) a reprint of his classic 1986 paper "No Silver Bullet"; and (4) today's thoughts on the 1986 assertion, "There will be no silver bullet within ten years."
New chapters are included on specifying data requirements, writing high-quality functional requirements, and requirements reuse. Considerable depth has been added on business requirements, elicitation techniques, and nonfunctional requirements. In addition, new chapters recommend effective requirements practices for various special project situations, including enhancement and replacement, packaged solutions, outsourced, business process automation, analytics and reporting, and embedded and other real-time systems projects.
In this collection of over 80 columns, Eric Brechner's alter ego pulls no punches with his candid commentary and best practice solutions to the issues that irk him the most. He dissects the development process, examines tough team issues, and critiques how the software business is run, with the added touch of clever humor and sardonic wit. His ideas aren't always popular (not that he cares), but they do stimulate discussion and imagination needed to drive software excellence.
Get the unvarnished truth on how to:Improve software quality and value—from design to security Realistically manage project schedules, risks, and specs Trim the fat from common development inefficiencies Apply process improvement methods—without being an inflexible fanatic Drive your own successful, satisfying career Don't be a dictator—develop and manage a thriving team!
Companion Web site includes:Agile process documents Checklists, templates, and other resources
This new workbook not only combines practical examples from Google’s experiences, but also provides case studies from Google’s Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t.
Dive into this workbook and learn how to flesh out your own SRE practice, no matter what size your company is.
You’ll learn:How to run reliable services in environments you don’t completely control—like cloudPractical applications of how to create, monitor, and run your services via Service Level ObjectivesHow to convert existing ops teams to SRE—including how to dig out of operational overloadMethods for starting SRE from either greenfield or brownfield
Scaling isn’t just about handling more users; it’s also about managing risk and ensuring availability. Author Lee Atchison provides basic techniques for building applications that can handle huge quantities of traffic, data, and demand without affecting the quality your customers expect.
In five parts, this book explores:Availability: learn techniques for building highly available applications, and for tracking and improving availability going forwardRisk management: identify, mitigate, and manage risks in your application, test your recovery/disaster plans, and build out systems that contain fewer risksServices and microservices: understand the value of services for building complicated applications that need to operate at higher scaleScaling applications: assign services to specific teams, label the criticalness of each service, and devise failure scenarios and recovery plansCloud services: understand the structure of cloud-based services, resource allocation, and service distribution
Gruntwork co-founder Yevgeniy (Jim) Brikman walks you through dozens of code examples that demonstrate how to use Terraform’s simple, declarative programming language to deploy and manage infrastructure with just a few commands. Whether you’re a novice developer, aspiring DevOps engineer, or veteran sysadmin, this book will take you from Terraform basics to running a full tech stack capable of supporting a massive amount of traffic and a large team of developers.Compare Terraform to other IAC tools, such as Chef, Puppet, Ansible, and Salt StackUse Terraform to deploy server clusters, load balancers, and databasesLearn how Terraform manages the state of your infrastructure and how it impacts file layout, isolation, and lockingCreate reusable infrastructure with Terraform modulesTry out advanced Terraform syntax to implement loops, if-statements, and zero-downtime deploymentUse Terraform as a team, including best practices for writing, testing, and versioning Terraform code