Credit: Andre Boukreev
If IT workers fear they will be punished for outages, they will adopt behavior that leads to even larger outages. Instead, we should celebrate our outages: Document them blamelessly, discuss what we've learned from them openly, and spread that knowledge generously. An outage is not an expense. It is an investment in the people who have learned from it. We can maximize that investment through management practices that deepen learning for those involved and that spread that knowledge across the organization. Managed correctly, every outage makes the organization smarter. In short, the goal should be to create a learning culture, one that seeks to make only new mistakes.
I worked at Bell Labs in New Jersey from 1994 to 2000. I was a systems administrator on a team of people charged with maintaining thousands of computers and the network that connected them. It was intimidating to be surrounded by so many brilliant scientists and engineers, many of whom had written the textbooks I used in college.