About Frank Trovato

Frank Trovato is a Consulting Analyst specializing in disaster recovery, business continuity, and mission critical systems for the Info-Tech Research Group.

What if your business continuity planning (BCP) could actually reduce ongoing costs rather than be just an expensive insurance policy? The key to realizing ongoing value from your BCP, not to mention making a strong case for BCP investment, is to treat it as a means to achieve day-to-day resiliency regardless of whether you ever experience a disaster.

Traditionally, business continuity planning (BCP) has been viewed as a pure cost to the company – an insurance policy that only pays out if there is a disaster. In this context, it’s often a challenge to get funding and resources for BCP beyond the bare minimum necessary to satisfy regulations. More immediate concerns take priority.

But it doesn’t have to be this way. Here are three examples of potential ROI from taking a day-to-day resiliency approach:

1. Prevent Lost Productivity

A common BCP strategy is to have staff work remotely if there is a major event at the primary location, such as a fire, chemical leak, or police incident. However, organizations often fail to apply this BCP approach to “lesser” events such as commute delays due to road closures, bad weather, or transit disruptions. In the last few years, even “snow days” have become more common again in northern regions.

The loss of productivity that stems from such minor disruptions is often accepted as unavoidable, but this doesn’t have to be the case. Instead, use your notification systems to alert staff of possible weather impact or traffic delays, and enable staff to work remotely and be productive rather than waste hours trying to commute in to your primary workplace.

A bit of caution: If part of your BCP strategy is for staff to work from home, the organization needs to be fully committed to this option – e.g., deploying laptops rather than desktop computers, and supporting secure remote access to key systems to enable a mobile workforce – or staff might as well just take the day off.

Rely on your BCP team to ensure there are no gaps in your support for a mobile workforce, and then leverage that capability any time events might impact productivity, not just when there is a major disaster.

2. Reduce Facility Costs

Organizations have an opportunity to reduce facility costs while improving business resilience by moving away from the static workplace model to a more dynamic approach. For example, reduce fixed office space to the minimum required (e.g., for staff who must be in a fixed location due to their role and responsibilities) and leverage a combination of vendor-managed business centers and work-from-home policies to accommodate overflow as needed.

If a major incident does occur at your primary facility, the business can more easily adjust to working from different locations since this is already ingrained in day-to-day business practices. Similarly, if the incident also affects one of your usual vendor-managed business centers, you can leverage your existing vendor relationship to identify an alternative location. In the meantime, your organization saves money on facility costs.

3. Minimize Resourcing Risks

It’s important that your BCP identify backup staff who can take over business continuity tasks during an event if the primary is not available. Similar to the work-from-home strategy, the organization needs to invest time in practices such as cross-training to ensure backup staff are capable of taking over in a crisis.

That investment in training backup staff as part of your BCP strategy also enables the organization to better manage day-to-day resourcing challenges from sick days, resignations, and vacations. This can also reduce the risk of being held hostage by the threat of resignation, and improve overall resourcing flexibility.

It’s much easier to get support from senior management for BCP practices such as identifying and training secondary staff when you can demonstrate a return on that investment outside of the disaster scenario.

From enabling a mobile workforce to training backup staff, incorporating business continuity practices in day-to-day business decisions greatly increases your chances of a successful recovery from a major event. These same practices can turn business continuity from a cost drain to a cost saver if you change your approach from a purely disaster recovery practice to ensuring day-to-day business resilience.

Looking to right-size your disaster recovery/business continuity plans? Close the gap between your DR capabilities and service continuity requirements with Info-Tech’s World Class Operations Disaster Recovery Planning Workshop.


When do you hear about the left tackle? When he gives up a sack. When do you hear about IT security? When there’s a major breach. No wonder Chief Security Officers (CSOs) often take a conservative approach. However, security practices must minimize risks AND enable the business.

Let’s take this analogy a step further. How does the conservative football coach prevent sacks? He keeps the tight end in to help the left tackle, and both running backs to pick up any blitzes, leaving only two receivers to run pass patterns. The coach has drastically reduced the odds of a sack, as well as the odds of scoring any points. The overall goal can’t just be avoiding sacks but enabling the team to score points while protecting the quarterback.

Similarly, the conservative CSO takes a lock-everything-down approach – extremely limited remote or mobile access, overly restrictive access rights, no client-facing websites, and a whole lot of roadblocks for business users (e.g., no BYOD). However, at an organizational level, the overall goal must be business enablement, not risk avoidance.

The trap is thinking you have to be conservative to ensure security. Instead, take a proactive approach that ensures the appropriate security practices are in place to support the demand for remote or mobile access, expedited changes to access rights, and client-facing websites rather than throwing up roadblocks. For example:

  • Implement advanced network segregation — which separates critical apps/data from the rest of the network — as part of your standard network provisioning procedures. This enables new initiatives such as a partner portal to be rolled out in a timely manner without undue risks. Your critical apps/data remain off-limits. Without this level of segregation, it’s much riskier to allow external access to your network, prompting overly conservative security practices that limit the business.
  • Define role-based access control (RBAC) to ensure consistent and timely assignment of user access rights. This streamlines user provisioning, enabling new staff to become productive faster. It also facilitates seamless transfers to other departments or changes in responsibilities. In a large company without RBAC, it can take one to two weeks to approve and implement changes to access rights due to the number of applications and system owners that might be involved. Security becomes a hindrance to the business.
  • Establish application development security standards that must be followed regardless of the application’s purpose. You never know when the business will decide to web-enable an internal application – for example, to support mobile staff. Applications are typically the weak points even in large enterprises with dedicated security teams because of the inconsistency at the developer level. Again, this prompts a more conservative approach after the fact to limit exposure.
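To make the RBAC point concrete, here is a minimal Python sketch of how defining access rights per role turns a department transfer into a simple lookup rather than a round of approvals with each system owner. The role names, permission strings, and the `transfer` helper are all hypothetical and for illustration only; a real deployment would pull these from your identity management system.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "claims_adjuster": {"claims_app:read", "claims_app:write"},
    "claims_manager": {"claims_app:read", "claims_app:write", "claims_app:approve"},
    "finance_analyst": {"ledger:read", "reporting:read"},
}


def permissions_for(roles):
    """Return the union of permissions granted by the given roles."""
    granted = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS[role]
    return granted


def transfer(old_roles, new_roles):
    """Work out which access rights to grant and revoke when roles change."""
    before = permissions_for(old_roles)
    after = permissions_for(new_roles)
    return {"grant": after - before, "revoke": before - after}


if __name__ == "__main__":
    # A claims adjuster moves to finance: the change set falls out of the role definitions.
    print(transfer(["claims_adjuster"], ["finance_analyst"]))
```

Because the rights are attached to the role and not to the individual, the same definitions serve new hires, transfers, and departures.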

The conservative avoid-all-potential-risks approach doesn’t work because staff will subvert processes to “get the job done,” just like the quarterback who changes the play at the line of scrimmage.

Instead, implement appropriate security practices that support business initiatives – and go on the offensive. Design plays that give you a chance to score points while keeping the quarterback upright. Keep the running back in to block and help your offensive line, but let the tight end run a skinny post to the end zone. If you don’t score points (or make money), why even play the game?

For more advice on optimizing security, see the project blueprint Optimize Security Operations without Overspending (http://www.infotech.com/workshops/optimize-security-operations-without-overspending).

 


Modernization is a hot topic these days for legacy applications, but is being green really so bad? If you are going to modernize, make sure it’s driven by sound business reasons, not just perception. Otherwise, you’re just putting lipstick on a pig, as the saying goes.

Case study: Green screen wins out over web-based interface

An insurance company that has asked to remain anonymous was well on its way to converting green screen interfaces for a data entry application to a web-based interface when the business leaders put on the brakes. Yes, a web-based app would cut down on training time for new staff, but it would hurt their productivity: green screen was much faster for data entry.

In short, if all IT was going to do was replace the existing green screen interface with the exact same screens in a browser-based interface — i.e., no process improvements, shortcuts, or truly enhanced functionality — they wanted no part of that.

Start modernization projects with a business objective

There are several good business reasons to consider modernization. For example:

  • Can you increase productivity by replacing 5 green screen pages with 1 web page that includes time-saving features such as drop-down menus and so on (which can make up for slower response times for web pages)?
  • Similarly, can you improve accuracy by including options in drop-down menus?
  • Is there an opportunity to more seamlessly integrate with other applications, thereby streamlining your overall business process?

Providing a web-based interface to reduce training time for new staff is absolutely a solid business reason, but as the case study above illustrates, that may not be a good enough reason on its own. The insurance company in this example settled on providing a web-based interface for customers but stuck with green screens for their employees to maintain existing productivity until they could put the time in to make meaningful improvements via modernization.

Legacy is not a dirty word. Legacy applications equal longevity, staying power, and value. Consider the ROI you’re getting on that green screen application that was written 20 or 25 years ago. Oh, and remember, even if you think they’re ugly, pigs have their good points too, like bacon.


Full-scale testing, which means completely shutting down your primary site and failing over to a recovery site, is impractical for most organizations. Based on a recent survey, it is also the least effective form of testing.

The challenges with simulation, parallel and full-scale testing

Simulation testing involves bringing recovery facilities and systems online to validate startup procedures and standby systems. Parallel testing takes this a step further by including restoring from backups and validating production-level functionality. Both methodologies can be executed without impacting your production environment, but still require a commitment of time, money, and resources.

Full-scale testing adds the risk of service interruption if the recovery site cannot be brought online. Unless you are running parallel production data centers, it is too risky and impractical for most organizations.

However, the biggest issue with the above methodologies is the focus on technology. Where companies usually struggle with DR is with people and processes, and those factors are inherently overlooked in technology-based testing. Processes for tasks such as assessing the impact, recalling backups, and coordinating recovery activities are not validated.

Why tabletop testing is so much more effective

Tabletop testing gets the technology out of the room — and out of your focus — so you can concentrate on people and processes, and for the entire event, not just your failover procedures. Specifically, tabletop testing is a paper-based exercise where your Emergency Response Team (ERT) maps out the tasks that should happen at each stage in a disaster, from discovery to notifying staff to the technical steps to execute the recovery.

It’s during these walkthroughs that you discover half of your ERT doesn’t know where your DR command center is located, or that critical recovery information is kept in a locked cabinet in the CIO’s office, or key staff would be required for so many separate tasks that they would need to be in 10 places at once.

Tabletop testing also makes it easier to play out a wider range of scenarios compared to technology-based testing. Walk through relatively minor events, such as an individual key server failing, or major disasters that take down your entire primary site. Similarly, play out what-if scenarios, such as what happens if key staff members are not available or disk backups have been corrupted.

With parallel testing, you can be sure that the technician restoring backups is not dealing with data corruption, and any necessary documentation is readily available (not locked in an office that you can no longer access); the focus is on “does the technology work” and not the hundred other things that can go wrong during a recovery. Tabletop testing reveals those people and process gaps that are otherwise so difficult to identify until you are actually in a DR scenario.

Focus on unit testing to validate standby systems

Unit testing was second only to tabletop testing in overall importance to DRP success. In this context, unit testing means validating standby systems as your environment changes, ideally as part of your change management procedures. The recovery site goes through the same release procedure as the primary site, including unit testing affected systems, to ensure that standby systems stay in sync with your primary systems.

Unlike simulation, parallel or full-scale testing, there is no pretense that unit testing is validating your DRP. It is validating the technology, and that’s all, so it provides a good complement to tabletop testing.
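One way to picture the "stay in sync" check is a simple comparison of component versions between the primary and recovery sites after each release. The Python sketch below is only an illustration under stated assumptions: the component names, version strings, and the `find_drift` helper are hypothetical, and in practice the inventories would come from your change management or configuration tooling.

```python
def find_drift(primary, standby):
    """Compare component versions at two sites and return any mismatches.

    Each argument maps a component name to its version string. The result
    maps each mismatched component to a (primary_version, standby_version)
    pair, with None standing in for a component missing at that site.
    """
    drift = {}
    for component in sorted(set(primary) | set(standby)):
        primary_version = primary.get(component)
        standby_version = standby.get(component)
        if primary_version != standby_version:
            drift[component] = (primary_version, standby_version)
    return drift


if __name__ == "__main__":
    # Hypothetical inventories, for illustration only.
    primary_site = {"app": "2.4.1", "db": "13.2", "os_patch": "2024-05"}
    recovery_site = {"app": "2.4.0", "db": "13.2"}
    for name, versions in find_drift(primary_site, recovery_site).items():
        print(name, versions)
```

An empty result means the standby systems match the primary for every component listed; it says nothing about people or process gaps, which is why this kind of check complements tabletop testing rather than replacing it.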

Conclusion

Is it important to validate standby equipment? Yes, but if that’s the focus of your DR testing, you aren’t truly validating your DRP. Use simulation or parallel testing to validate your recovery site and standby systems, and unit testing as your environment changes for ongoing validation — but make annual tabletop testing your primary methodology for practicing and verifying end-to-end DR procedures.


Business Continuity Planning (BCP) “by the book” means starting with a Risk Assessment to identify the types of incidents and risks you need to mitigate. Makes sense, right? How do you guard against something you haven’t identified?

There are two problems with that approach:

  1. Unless you are a fortune teller, odds are you won’t think of every incident that might occur. If you think of 20 risks, it will be the 21st that gets you.
  2. If you take risk assessment to an extreme level to try to guard against that unforeseen 21st risk, you can very quickly get into unrealistic and cartoonish scenarios – meteors, swarms of locusts, and maybe even an alien invasion.

A much more efficient and practical approach is to focus on what your organization requires to be resilient and recover from service interruptions, regardless of the specific type of incident. Continuity requirements can be boiled down to the following:

  • Alternative locations (DR hot-site, command center, alternate office location for business workers, etc.).
  • Redundancy in both technology and people. On the people side, this can be accomplished through cross-training, mentoring, and so on. It doesn’t mean having two people doing the same job.
  • Documented and accessible knowledge base, including standard operating procedures (SOPs).

To develop and document a BCP, you will of course need more detail to spell out the who’s and how’s. To help you identify those details, define categories of service interruptions, rather than specific incidents, and use that as a basis for documenting recovery procedures. Service interruptions can be grouped into the following categories:

  • Your building is not accessible. Could be due to a swarm of locusts, a chemical spill, or a fire in the building next door. Doesn’t matter what is causing the incident. The net effect is that staff can’t get into the building.
  • Your building is gone or severely damaged (e.g., from a natural disaster, fire, roof collapse, or even a meteor).
  • Hardware or software failure.
  • Power outage.
  • Network failure.

Let’s examine the “Building is not accessible” scenario in more detail. In this scenario, your equipment is operational. Your recovery procedure is really about people and the ability to remotely access your infrastructure. For example, customer service staff might require an alternate office facility while knowledge-based workers might be able to work from home. Whether the incident is a chemical spill or a swarm of locusts really doesn’t matter.
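The category-first idea can be sketched in a few lines of Python: incidents map to a category, and the recovery procedure hangs off the category, not the incident. This is only an illustrative sketch. The enum, the dictionary names, and the `recovery_location` helper are hypothetical, and only the "building is not accessible" procedure is filled in because that is the only one walked through here.

```python
from enum import Enum


class Interruption(Enum):
    """The five service interruption categories listed above."""
    BUILDING_NOT_ACCESSIBLE = "building is not accessible"
    BUILDING_GONE = "building is gone or severely damaged"
    HARDWARE_SOFTWARE_FAILURE = "hardware or software failure"
    POWER_OUTAGE = "power outage"
    NETWORK_FAILURE = "network failure"


# Example incidents from the text; the specific cause only selects a category.
INCIDENT_CATEGORY = {
    "swarm of locusts": Interruption.BUILDING_NOT_ACCESSIBLE,
    "chemical spill": Interruption.BUILDING_NOT_ACCESSIBLE,
    "fire next door": Interruption.BUILDING_NOT_ACCESSIBLE,
    "roof collapse": Interruption.BUILDING_GONE,
    "meteor": Interruption.BUILDING_GONE,
}

# Illustrative staff groupings for the "building is not accessible" case.
WORK_LOCATION = {
    "customer_service": "alternate office facility",
    "knowledge_worker": "work from home",
}


def recovery_location(incident, staff_type):
    """Look up where staff work, based on the category and not the incident."""
    category = INCIDENT_CATEGORY[incident]
    if category is not Interruption.BUILDING_NOT_ACCESSIBLE:
        raise NotImplementedError(f"No procedure sketched for: {category.value}")
    return WORK_LOCATION[staff_type]


if __name__ == "__main__":
    print(recovery_location("chemical spill", "knowledge_worker"))
    print(recovery_location("swarm of locusts", "knowledge_worker"))
```

Both calls return the same answer, which is the point: the procedure is written once per category, and a new kind of incident only needs to be assigned to a category.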

The type of risk assessment that can be useful is exploring the risk of equipment failure and the impact of that failure, and then planning technology enhancements accordingly. For example, in an online catalog application, components such as the Message Queuing servers would be critical due to the risk of data loss. That would be a prime candidate for adding redundancy. However, the goal here is to improve availability and resiliency (again, regardless of the cause of failure).

Now if your data center is next door to a nuclear reactor, you don’t need a risk assessment to understand that having an alternate facility in a geographically distant location should be high on your list of priorities. And if your building does get hit by a meteor, you’ll be covered for that too. However, if there’s an alien invasion, all bets are off.
