Risk Assessment as Part of Business Continuity Planning is Overrated
July 12, 2012
Business Continuity Planning (BCP) “by the book” means starting with a Risk Assessment to identify the types of incidents and risks you need to mitigate. Makes sense, right? How do you guard against something you haven’t identified?
There are two problems with that approach:
- Unless you are a fortune teller, odds are you won’t think of every incident that might occur. If you think of 20 risks, it will be the 21st that gets you.
- If you take risk assessment to an extreme level to try to guard against that unforeseen 21st risk, you can very quickly get into unrealistic and cartoonish scenarios – meteors, swarms of locusts, and maybe even an alien invasion.
A much more efficient and practical approach is to focus on what your organization requires to be resilient and recover from service interruptions, regardless of the specific type of incident. Continuity requirements can be boiled down to the following:
- Alternative locations (DR hot-site, command center, alternate office location for business workers, etc.).
- Redundancy in both technology and people. On the people side, this can be accomplished through cross-training, mentoring, and so on. It doesn’t mean having two people doing the same job.
- Documented and accessible knowledge base, including standard operating procedures (SOPs).
To develop and document a BCP, you will, of course, need more detail to spell out the whos and hows. To help you identify those details, define categories of service interruptions rather than specific incidents, and use those categories as the basis for documenting recovery procedures. Service interruptions can be grouped into the following categories:
- Your building is not accessible. This could be due to a swarm of locusts, a chemical spill, or a fire in the building next door. It doesn’t matter what is causing the incident. The net effect is that staff can’t get into the building.
- Your building is gone or severely damaged (e.g., from a natural disaster, fire, roof collapse, or even a meteor).
- Hardware or software failure.
- Power outage.
- Network failure.
Let’s examine the “Building is not accessible” scenario in more detail. In this scenario, your equipment is operational. Your recovery procedure is really about people and the ability to remotely access your infrastructure. For example, customer service staff might require an alternate office facility while knowledge-based workers might be able to work from home. Whether the incident is a chemical spill or a swarm of locusts really doesn’t matter.
Where risk assessment can be useful is in exploring the risk of equipment failure and the impact of that failure, and then planning technology enhancements accordingly. For example, in an online catalog application, components such as the Message Queuing servers would be critical due to the risk of data loss, which makes them prime candidates for added redundancy. However, the goal here is to improve availability and resiliency (again, regardless of the cause of failure).
Now if your data center is next door to a nuclear reactor, you don’t need a risk assessment to understand that having an alternate facility in a geographically distant location should be high on your list of priorities. And if your building does get hit by a meteor, you’ll be covered for that too. However, if there’s an alien invasion, all bets are off.