High Availability Depends on More Than Technology
May 17, 2012
By Frank Trovato, a Research Analyst specializing in mainframe technology and mission critical systems for Info-Tech Research Group
Downtime is more likely to be caused by human error or process issues, yet organizations often focus primarily on technology redundancy. An Info-Tech survey found that adding more layers of redundancy (e.g., going to N+2) has nowhere near the impact of addressing people and process issues. Organizations thinking about investing tens or hundreds of thousands of dollars in increased redundancy should first take a look at their people and processes.
For example, the same survey found that having secondary resources in place for mission critical systems was a strong indicator of success in meeting availability objectives. Having secondary resources does not mean paying two people to do the same job, but rather sharing knowledge through a mentorship program or cross-training so the organization is not overly dependent on specific individuals. You need backup people just as much as you need redundant servers.
As far as processes are concerned, don’t assume staff are already following good processes, or that normal processes for production systems are rigorous enough for mission critical systems. Mission critical systems carry a higher level of investment and risk, and that demands a higher level of attention. For example, a U.S. bank recently discovered its development team was not consistently using source control for mission critical code. Processes must be documented and managed.
On the technology side, when end-to-end redundancy is not possible due to budget limitations, prioritize investments based on risk and impact analysis. That means doing your homework: clearly identify which systems are mission critical, what their dependencies are (these are therefore also mission critical), what the impact to the business would be if they failed, and where the single points of failure are.
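The dependency analysis described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the system names, the dependency map, and the instance counts are all hypothetical, and "fewer than two independent instances" is used here as a simplistic stand-in for a single point of failure.

```python
from collections import deque

# Hypothetical inventory: each system maps to the components it depends on.
DEPENDS_ON = {
    "online-banking": ["app-server", "auth-service"],
    "app-server": ["core-database", "san-storage"],
    "auth-service": ["core-database"],
    "core-database": ["san-storage"],
    "intranet-wiki": ["wiki-database"],
}

# Hypothetical redundancy data: independent instances of each component.
INSTANCES = {
    "online-banking": 2,
    "app-server": 2,
    "auth-service": 1,
    "core-database": 2,
    "san-storage": 1,
    "intranet-wiki": 1,
    "wiki-database": 1,
}


def mission_critical_closure(roots, depends_on):
    """Return the roots plus everything they depend on, directly or indirectly."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for dep in depends_on.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def single_points_of_failure(critical, instances):
    """List critical components that have fewer than two instances."""
    return sorted(name for name in critical if instances.get(name, 1) < 2)


# Start from the systems the business has declared mission critical.
critical = mission_critical_closure(["online-banking"], DEPENDS_ON)
spofs = single_points_of_failure(critical, INSTANCES)

print("Mission critical (including dependencies):", sorted(critical))
print("Single points of failure:", spofs)
```

In this made-up inventory, the storage and authentication components turn out to be mission critical only because a mission critical system depends on them, and both lack a second instance. Those are the places where a limited redundancy budget would be considered first.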
In the meantime, while technology investments may need to be delayed, there is no reason to delay addressing the equally (if not more) important people and process aspects of high availability. Simply purchasing and installing more advanced hardware and software will not deliver four or five nines (99.99% or 99.999%) availability. For more on aligning people, processes, and technology to deliver high availability, see Info-Tech’s solution set, Maximize Availability for Mission Critical Systems.
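To put the four- and five-nines targets mentioned above in concrete terms, the short calculation below converts an availability level into the downtime it allows per year. It is a rough sketch that assumes a 365-day year and counts all downtime, planned or unplanned, against the target; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(nines):
    """Yearly downtime budget, in minutes, for an availability of N nines."""
    unavailability = 10 ** -nines  # e.g. 4 nines -> 0.0001 (99.99% available)
    return MINUTES_PER_YEAR * unavailability


for nines in (4, 5):
    minutes = allowed_downtime_minutes(nines)
    print(f"{nines} nines: about {minutes:.1f} minutes of downtime per year")
```

Four nines leaves roughly 52.6 minutes of downtime per year and five nines roughly 5.3 minutes. A single mishandled change or an absent specialist can consume that budget, which is why people and process matter as much as hardware.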