4.

When things are back to normal

Phew, take a breath. Everything has calmed down and is back to normal. The crisis is over. However, there’s still a list of things to consider before you wrap up this chapter. In particular, it’s time to reflect: see what went wrong and what you did well, and learn from the experience. You’ll also need to circle back to customers to close the loop, communicate any final updates and offer compensation if you need to.

Running a blameless postmortem

After the crisis has been wrapped up and everything has returned to normal, it’s important to look back at what happened and identify learning opportunities so we can do better going forward. This isn’t about pointing fingers or assigning blame, but rather about discovering the issues in our everyday workflows and improving them. The Etsy engineering team popularized this process under the name Blameless Postmortem. The meeting involves asking the right questions so that people don’t simply accept blame and point to the simplest root cause: “I messed up.” Human error is rarely the only cause of an issue. The point of the postmortem is to uncover where better checks can be put in place, where confusion arises and how future errors can be prevented.

Debriefing a crisis starts by looking at what happened before things went wrong, and asking deep questions about the failures that led to the issue. Etsy offers a great example in its blog post on postmortems, where a poor choice in dashboard setup led the deploying engineers to read an eight as a zero. Without walking through the exact workflow they used before the crash, the team wouldn’t have identified the issue.

Evaluate your response

Besides identifying the root cause of the issue, crisis postmortems should also review the response to the crisis. This helps improve future crisis management. The following questions will need to be answered:

  • How long did it take us to identify and respond to the issue? How can we shorten this time? Remember, the faster you’re able to identify the crisis, the more effective your response will be.
  • How many customers were affected? How can we reduce this number?
  • How many customers contacted us about the issue? How could we reduce this number through clearer, more proactive communication?
  • Of the customers that contacted us, how many gave us a positive customer satisfaction rating? How could we improve this? What were the negative comments about?
  • How smoothly did our team communicate internally? Did anything get lost in communication or dropped during the crisis?
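
The questions above can be turned into a handful of numbers that are easy to compare from one postmortem to the next. The following is a minimal sketch, not a prescribed method: the `IncidentReview` record, its field names and the sample figures are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentReview:
    """Hypothetical record of one incident; every field name is illustrative."""
    started_at: datetime        # when the problem actually began
    detected_at: datetime       # when the team noticed it
    first_update_at: datetime   # first message sent to customers
    resolved_at: datetime       # when service was back to normal
    customers_affected: int
    customers_contacted_us: int
    positive_ratings: int
    total_ratings: int

    def time_to_detect(self) -> timedelta:
        return self.detected_at - self.started_at

    def time_to_first_update(self) -> timedelta:
        return self.first_update_at - self.detected_at

    def time_to_resolve(self) -> timedelta:
        return self.resolved_at - self.started_at

    def contact_rate(self) -> float:
        """Share of affected customers who had to reach out to us."""
        if self.customers_affected == 0:
            return 0.0
        return self.customers_contacted_us / self.customers_affected

    def satisfaction_rate(self) -> float:
        """Share of satisfaction ratings that were positive."""
        if self.total_ratings == 0:
            return 0.0
        return self.positive_ratings / self.total_ratings


# Made-up sample data for a single incident.
review = IncidentReview(
    started_at=datetime(2024, 3, 1, 9, 0),
    detected_at=datetime(2024, 3, 1, 9, 20),
    first_update_at=datetime(2024, 3, 1, 9, 35),
    resolved_at=datetime(2024, 3, 1, 11, 0),
    customers_affected=400,
    customers_contacted_us=60,
    positive_ratings=45,
    total_ratings=50,
)

print(review.time_to_detect())              # 0:20:00
print(review.time_to_first_update())        # 0:15:00
print(review.time_to_resolve())             # 2:00:00
print(f"{review.contact_rate():.0%}")       # 15%
print(f"{review.satisfaction_rate():.0%}")  # 90%
```

Keeping one such record per incident makes it easier to see whether detection time and contact rate are actually shrinking over time.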

Consider every crisis an opportunity to improve your crisis management strategies and processes. You’ll learn something new with each unique incident. And while you hopefully don’t need to face a ton of stressful, urgent problems every day, more practice will make your team more skilled at handling them, just like a well-oiled machine.

Communicate a final update to customers

Even if you’ve been giving updates to customers throughout the crisis, it’s important to wrap everything up after you’ve done the postmortem. This is where you can share some of the things you’ve learned and what you’re going to do better in the future. Final updates are important for rebuilding any trust your customers may have lost in you. This is especially true if the same crisis has happened more than once: what makes this time any different? How are you going to ensure this doesn’t happen again?

Some of the things to cover in your final update include:

  • An overview of what happened, or what the problem was.
  • How customers were affected.
  • What you did to resolve the issue.
  • Any remaining impact on customers (e.g. lost data, a required password reset).
  • How you’re going to prevent the issue from recurring, or what changes you’ve made.
  • Any compensation affected customers will receive (this can also be communicated privately).
  • How customers can contact your team with any further questions.

Balance the technical and the helpful

When you’re deciding how much detail to put into your wrap-up communications, it’s important to consider your audience. While they might want a full explanation of why the issues happened, they likely don’t need all the nitty-gritty details of exactly what your DevOps team did to bring everything back online. However, they might need to know about any security policies you’ve put into place and any changes to the API that they might notice, and they need enough detail to be sure your technical team is in control of the situation.

As SorryApp puts it in their guide to crisis communication:

People probably aren’t too fussed as to whether your database server in Asia is playing up, or that the switch in your west European cluster has gone down; they simply want to know you’re on the case and what you’re doing to fix it.

Should you offer compensation?

After every crisis, the question arises as to whether it’s necessary, or even helpful, to offer compensation to anyone affected. There are a few considerations that go into making this decision:

  • Impact. How serious was the issue? How did it impact customers? If it was an inconvenience, the compensation doesn’t need to be as big as if it was a complete loss of business function.
  • Available compensation. What options do you have to provide compensation? For example, on a delayed shipping issue, can you offer something small like free expedited shipping next time? Any credit or discount options that encourage customers to come back for another purchase to get the benefit are better than straight refunds.
  • Precedent. What have you done in the past? If you have a history of not providing refunds, it might make sense to stick to it, just so that you don’t set the wrong expectations for future issues.

SLAs and contractual compensation requirements

If you have enterprise customers with service level agreements (SLAs) built into their contracts, you are contractually required to maintain a certain level of uptime. If you don’t, there are likely credits that you’re required to provide. Calculate and communicate these proactively, rather than dragging your feet on them. It will be seen as a sign of goodwill (and you’ll have to do it anyway). There are tools that will monitor SLAs for specific enterprise customers automatically for you (Pingdom, for example), and it’s worth setting these up as soon as you start signing contracts with uptime requirements.
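
Calculating an SLA credit is usually simple arithmetic: work out the uptime percentage for the billing period, then look up the credit that percentage triggers. The sketch below assumes a 30-day month and an invented credit schedule. `CREDIT_TIERS`, `FALLBACK_CREDIT` and the fee are hypothetical, and the real values come from each customer’s contract.

```python
def uptime_percent(total_minutes: int, downtime_minutes: float) -> float:
    """Uptime for the billing period, as a percentage."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# Hypothetical schedule: (minimum uptime %, credit as % of the monthly fee),
# ordered from the highest uptime threshold to the lowest.
CREDIT_TIERS = [(99.9, 0), (99.0, 10), (95.0, 25)]
FALLBACK_CREDIT = 50  # hypothetical credit when uptime is below every tier


def sla_credit_percent(uptime: float) -> int:
    """Return the credit owed for a given uptime percentage."""
    for min_uptime, credit in CREDIT_TIERS:
        if uptime >= min_uptime:
            return credit
    return FALLBACK_CREDIT


minutes_in_month = 30 * 24 * 60  # 43,200 minutes in an assumed 30-day month
uptime = uptime_percent(minutes_in_month, downtime_minutes=180)
credit = sla_credit_percent(uptime)
monthly_fee = 2000.00  # made-up fee

print(f"uptime {uptime:.2f}%, credit {credit}% = ${monthly_fee * credit / 100:.2f}")
# uptime 99.58%, credit 10% = $200.00
```

Running the numbers as soon as the incident is resolved means the credit can go into the final update instead of waiting for the customer to ask.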

Providing compensation

It’s inevitable that some customers will ask for refunds or discounts. However, if the situation does call for reimbursement, you shouldn’t wait to be asked. Be proactive in your generosity.

Compensation will be better received after the issue is resolved. Offering a discount on future service while the issue is still ongoing doesn’t solve the problem, and it doesn’t give customers what they want: a solution. If anything, they will be more upset that things are still broken and you’re already trying to get them to buy again.

Rather than calculating the “fair” amount of compensation down to the prorated minute, be generous in your credits or discounts. As SorryApp says: “It’s about going above and beyond, doing everything and anything possible to set things right.” You’ve lost the trust of your customers during the crisis — this is the time to make it up to them and show how much their business matters to you.

Get creative when it comes to deciding how you can win back customers. When Equifax went through its 2017 data breach (discovered in July and disclosed that September), it offered all affected customers a free subscription to a credit protection program. This helped protect customers from the side effects of a bad breach, and also showed that Equifax was looking out for them. While Equifax obviously didn’t handle that crisis exactly right (hiding security breaches from your customers is never ideal), its use of creative compensation is a great strategy to remember.

When calculating the cost-benefit of compensation, consider this: it’s between five and 25 times more expensive to acquire a new customer than to retain an existing one. If you’re stingy and your existing customers walk out the door, that’s an expensive problem to solve. Even if you offer very generous compensation, discounts and recovery incentives to your affected customers, it’s still likely more affordable than trying to replace all of them. When you’re considering whether you can afford to compensate customers, it’s more likely that you can’t afford not to.
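
To make that comparison concrete, here is a back-of-the-envelope sketch. Only the 5x to 25x range comes from the text above. The customer count, retention cost, credit size and both churn rates are invented for illustration, so plug in your own figures.

```python
def cost_of_churn(customers: int, churn_rate: float, acquisition_cost: float) -> float:
    """Cost of acquiring new customers to replace the ones who leave."""
    return customers * churn_rate * acquisition_cost


# All of the figures below are hypothetical.
affected = 1000             # customers hit by the incident
retention_cost = 50.0       # what it costs to keep one existing customer
credit_per_customer = 20.0  # compensation offered to each affected customer
churn_without = 0.15        # assumed share who leave if you offer nothing
churn_with = 0.02           # assumed share who leave even after compensation

# Acquisition is assumed to cost 5x to 25x as much as retention.
for multiplier in (5, 25):
    acquisition_cost = retention_cost * multiplier
    stingy = cost_of_churn(affected, churn_without, acquisition_cost)
    generous = (affected * credit_per_customer
                + cost_of_churn(affected, churn_with, acquisition_cost))
    print(f"{multiplier}x: no compensation ${stingy:,.0f} vs. compensation ${generous:,.0f}")

# 5x: no compensation $37,500 vs. compensation $25,000
# 25x: no compensation $187,500 vs. compensation $45,000
```

With these assumed numbers, compensating everyone is cheaper even at the low end of the range. The result depends heavily on the churn rates you assume, so treat it as a way to frame the decision rather than a forecast.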

Service recovery paradox

There’s a phenomenon that can happen when a company manages a very bad crisis very well: it comes away with happier customers and more fans than it had before the crisis. This is called the service recovery paradox.

It’s a paradox because we automatically assume that if something bad happens, especially if it’s our fault, customers will be upset and leave. But that’s not always the case. When companies do a great job of responding to an issue, it actually builds trust that they will do the right thing, no matter what. As we mentioned above, Buffer’s excellent response to their crisis gained them really positive publicity.

Customer Thermometer explains it well:

Companies with the best customer service understand the paradox: customers are often more loyal after a service failure (so long as the recovery has been swift and good) than customers who have not [had] a service failure at all.

In other words, the silver lining of your crisis is that you can show what your company is really made of.

If the crisis is handled well, customers will become even more loyal than they were before the downtime. Hmm… should we try unplugging the server, just to see if we can gain more fans? (Just kidding, we’d never advocate for creating a crisis on purpose, no matter how thoroughly you’ve planned for it!)