ARC338: How AWS Minimizes the Blast Radius of Failures - a podcast by AWS

from 2021-01-31T22:10:42.023393


At AWS, we obsess over operational excellence. We have a deep understanding of system availability, informed by over a decade of experience operating the cloud and by our roots in operating Amazon.com for nearly a quarter-century. One thing we've learned is that failures come in many forms, some expected and some unexpected. It's vital to build systems from the ground up to embrace failure. A core consideration is how to minimize the "blast radius" of any failure. In this talk, we discuss a range of blast radius reduction design techniques that we employ, including cell-based architecture, shuffle sharding, Availability Zone independence, and Region isolation. We also discuss how blast radius reduction infuses our operational practices.
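Of the techniques named above, shuffle sharding is the easiest to illustrate in a few lines. The sketch below is a minimal illustration of the general idea, not AWS's implementation: each customer is deterministically assigned a small pseudo-random subset of workers, so a failure triggered by one customer affects only that subset, and few other customers share exactly the same subset. The names `shuffle_shard`, `workers`, and the customer IDs are hypothetical, chosen only for this example.

```python
import hashlib
import math
import random


def shuffle_shard(customer_id: str, workers: list[str], shard_size: int) -> frozenset[str]:
    """Return a stable pseudo-random subset of workers for one customer (illustrative only)."""
    # Derive a seed from a hash of the customer ID so the assignment
    # is the same on every call and in every process.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = random.Random(seed)
    return frozenset(rng.sample(workers, shard_size))


# Hypothetical fleet of 8 workers, 2 workers per customer.
workers = [f"worker-{i}" for i in range(8)]
shards = {c: shuffle_shard(c, workers, 2) for c in ("alice", "bob", "carol")}

# Number of distinct 2-worker shards that can be drawn from 8 workers.
possible_shards = math.comb(len(workers), 2)

for customer, shard in shards.items():
    print(customer, sorted(shard))
print("possible shards:", possible_shards)
```

With 8 workers and a shard size of 2 there are 28 possible shards, so two customers picked at random are unlikely to land on exactly the same pair. Larger fleets and shard sizes make full overlap rarer still.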

Further episodes of AWS re:Invent 2018
