I believe strongly that if an individual is to blame for a production, security, or operational incident, then that individual should be fired.

That's because I believe just as strongly that the only time it's acceptable to blame an individual for an incident is when they acted with malicious intent.

For any other incident, the root cause can be traced to something systemic that could be improved. For example:

  • Bad hiring practices that put people in positions where they operate tools outside their skill set.
  • Insufficient testing or testing environments.
  • Poor documentation that leaves engineers guessing.
  • Insufficient rollback mechanisms.
  • Inadequate monitoring and alerting.
  • Missing access controls.
  • Bad onboarding and training programs.

I’m sure you can think of dozens more…

If you blame an employee for an incident then your culture sucks, and you should feel bad. Worse than said employee does.