Error Budget
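Error Budget is a concept in Site Reliability Engineering (SRE) that defines the acceptable amount of unreliability for a service over a specific period. It is derived from the service level objective (SLO) as 100% minus the SLO target. For example, a 99.9% availability SLO over 30 days leaves an error budget of 0.1%, or about 43 minutes of downtime. For a request-based SLO, the budget is instead 0.1% of requests failing. Because the budget quantifies how much unreliability is acceptable, it makes the trade-off between reliability and the pace of innovation explicit. Teams can take calculated risks by spending the budget on changes that might cause failures, and typically slow down or pause risky releases once the budget is exhausted. This helps organizations decide when to prioritize stability and when to deploy new features or improvements.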
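The arithmetic behind a time-based error budget can be sketched in a few lines of Python. This sketch assumes an availability SLO given as a fraction (for example 0.999) and measured downtime as a `timedelta`. The function names and the release-gating rule in `can_release` are hypothetical illustrations, not a standard API or a prescribed policy.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Allowed downtime in `window` for an availability SLO given as a fraction."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    # The budget is the share of the window the SLO does not promise: (1 - SLO).
    return window * (1.0 - slo)


def remaining_budget(slo: float, window: timedelta, downtime: timedelta) -> timedelta:
    """Budget left after subtracting downtime already observed in the window."""
    return error_budget(slo, window) - downtime


def can_release(slo: float, window: timedelta, downtime: timedelta) -> bool:
    """Hypothetical gating rule: allow risky changes only while budget remains."""
    return remaining_budget(slo, window, downtime) > timedelta(0)


if __name__ == "__main__":
    month = timedelta(days=30)
    print(error_budget(0.999, month))                               # 0:43:12
    print(remaining_budget(0.999, month, timedelta(minutes=30)))    # 0:13:12
    print(can_release(0.999, month, timedelta(hours=1)))            # False
```

With a 99.9% SLO over 30 days, the budget is 43 minutes 12 seconds. After 30 minutes of downtime, 13 minutes 12 seconds remain. One hour of downtime overspends the budget, so the example gate would block further risky releases until the window rolls forward. Real error budget policies vary between organizations and may use request counts rather than time.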
Developers and SREs use error budgets to manage service reliability in a data-driven way, especially in cloud-native or microservices architectures where frequent deployments are common. Error budgets are particularly valuable for teams that must balance rapid innovation with user expectations for uptime, such as those running e-commerce, streaming, or SaaS platforms. They provide a clear framework for making trade-offs and discourage over-engineering for perfect reliability.