By Gil Hecht, CEO, Continuity Software
While financial-services organizations are obligated to establish and report stringent service-availability objectives for mission-critical systems, they have traditionally been among the worst performers when it comes to meeting those objectives.
Well aware of the high cost of downtime and its significant impact on customers, financial-services organizations are not shy about investing in service availability. Many of those that invest in service-availability software budget more than $10 million a year for business continuity and disaster recovery, and some spend more than $50 million a year to keep their systems up and running (in addition to ongoing investment in production hardware, software, facilities, services and personnel).
Yet despite these investments, and although each IT (information technology) team does its best to keep everything running smoothly, keeping up with the pace of change is proving far from trivial. When a change is made (a daily occurrence in today’s IT environment), most financial organizations have no consistent process in place to ensure that things will continue to function as intended.
With this in mind, I have outlined below a five-step approach that financial-services organizations can use to combat these challenges:
- Detect: Dispersing the fog
With limited visibility across infrastructure layers, IT organizations are having a hard time detecting problems ahead of time and preventing business disruptions. On average, financial-services organizations are able to detect and address only 59 percent of the critical IT issues before they adversely impact the business.
Manual detection of downtime and data-loss risks is practically unmanageable in any sizeable IT environment. Many organizations rely on periodic testing, but that still leaves large portions of the environment exposed to risk between tests.
Frequent automated detection is the only method that can provide timely visibility into risks across the entire IT infrastructure and prevent them from accumulating over time.
- Anticipate: An ounce of prevention is worth a pound of cure
Waiting for bad things to happen is never a good idea. Once a system is down, the damage to the business cannot be undone. Highly visible outages tend to send the entire organization into a frenzy, and troubleshooting and recovery efforts become extremely costly and disruptive.
Fortunately, most downtime and data-loss incidents follow known patterns that can be documented and identified in advance. With automated daily verification of the IT environment against these known vulnerabilities, the relevant IT teams have an up-to-date view of the organization’s state of readiness. They can identify areas of risk and focus their attention and resources on fixing these issues before they impair business operations and become costly to resolve.
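To make the idea of verifying an environment against known risk patterns more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the inventory fields (`critical`, `replica_site`, `replication_lag_minutes`, `rpo_minutes`), the two rules and the system names are hypothetical, and a real environment would draw its inventory from configuration and monitoring tools and carry a far larger rule set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    system: str
    rule: str
    detail: str


# Each rule inspects one system record and returns a description of the
# risk it found, or None. Both rules are hypothetical examples.
def replica_missing(system: dict) -> Optional[str]:
    if system.get("critical") and not system.get("replica_site"):
        return "critical system has no replica site"
    return None


def replica_lagging(system: dict) -> Optional[str]:
    lag = system.get("replication_lag_minutes")
    rpo = system.get("rpo_minutes")  # recovery point objective
    if lag is not None and rpo is not None and lag > rpo:
        return f"replication lag {lag} min exceeds RPO {rpo} min"
    return None


RULES = {
    "replica_missing": replica_missing,
    "replica_lagging": replica_lagging,
}


def scan(inventory: list[dict]) -> list[Finding]:
    """Run every known-risk rule against every system in the inventory."""
    findings = []
    for system in inventory:
        for rule_name, rule in RULES.items():
            detail = rule(system)
            if detail:
                findings.append(Finding(system["name"], rule_name, detail))
    return findings


# Made-up inventory, for illustration only.
SAMPLE_INVENTORY = [
    {"name": "core-banking", "critical": True, "replica_site": "dr-east",
     "replication_lag_minutes": 45, "rpo_minutes": 15},
    {"name": "payments", "critical": True, "replica_site": None},
    {"name": "intranet-wiki", "critical": False},
]

if __name__ == "__main__":
    for finding in scan(SAMPLE_INVENTORY):
        print(f"{finding.system}: {finding.rule}: {finding.detail}")
```

Scheduling a scan like this to run daily, rather than only before a periodic test, is what keeps the view of readiness current.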
- Collaborate: Breaking down the silos
In an interconnected environment, problems tend to spill from one area to another. Consistent compliance with service-availability goals requires tight collaboration and coordination among various IT teams.
It’s not surprising that the customers we speak with cite cross-domain and cross-team collaboration as one of the top challenges for IT organizations trying to meet service-availability goals. A unified collaboration platform that gives all IT teams actionable information about risks across all IT domains allows everyone involved to share information. Integration with existing enterprise systems (email, portals and incident-management systems) further streamlines visibility and collaboration.
- Validate: Trust, but verify
Given the high stakes involved in the continuity of business operations and the dire consequences of system breakdown, a comprehensive structure of checks and balances is critical to the overall robustness of service-availability practices.
While there is no doubt that each IT team is doing its absolute best to correct any uncovered risks, a closed-loop process must be put in place to independently verify the resolution and ensure that nothing falls between the cracks.
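One way to picture such a closed loop: a risk that a team reports as fixed is treated as closed only once an independent re-scan no longer detects it. The sketch below assumes each risk is identified by a hypothetical (system, rule) pair; the function name and the sample data are illustrative and not taken from any particular product.

```python
def verify_closures(claimed_fixed, rescan_findings):
    """Compare risks that teams report as fixed with an independent re-scan.

    Both arguments are iterables of (system, rule) pairs. A claimed fix
    counts as verified only if the re-scan no longer reports that risk;
    otherwise the risk is reopened.
    """
    claimed = set(claimed_fixed)
    still_present = claimed & set(rescan_findings)
    return {
        "verified": sorted(claimed - still_present),
        "reopened": sorted(still_present),
    }


if __name__ == "__main__":
    # Made-up data, for illustration only.
    claimed = [("payments", "replica_missing"),
               ("core-banking", "replica_lagging")]
    rescan = [("core-banking", "replica_lagging"),
              ("trading", "backup_stale")]
    print(verify_closures(claimed, rescan))
```

The point of the design is that the verification uses a fresh scan rather than the fixing team's own report, so a fix that did not take effect is caught automatically.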
- Enumerate: There is no management without measurement
Measuring a global set of key performance indicators (KPIs) is the only way to track the bigger picture. Knowing which business services and IT systems are at risk allows IT leaders to analyze risk trends and take corrective actions. With this information in hand, attention can be focused on areas of emerging risks, and activities can be coordinated among the relevant IT teams to achieve availability goals.
Measurement is also important for tracking the performance of the various IT teams. In addition, KPIs can highlight the systems that create the greatest availability risks, so relationships with these system vendors can be proactively managed to address problems in advance.
Last but not least, to support continuous process improvement, KPIs should show which best practices are most frequently compromised and whether processes are improving or deteriorating over time.
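As a rough illustration of these KPIs, the sketch below derives three of them from dated snapshots of open risks: open risks per business service, the most frequently compromised practices, and a simple trend. The snapshot layout, the field names (`service`, `practice`) and the trend rule (comparing the first and last snapshot) are assumptions chosen for brevity, not a prescribed method.

```python
from collections import Counter


def kpi_summary(snapshots):
    """Summarize open risks from dated snapshots.

    snapshots maps an ISO date string to the list of risks open on that
    day; each risk is a dict with 'service' and 'practice' keys.
    """
    if not snapshots:
        raise ValueError("at least one snapshot is required")

    dates = sorted(snapshots)  # ISO date strings sort chronologically
    open_counts = [len(snapshots[d]) for d in dates]
    latest = snapshots[dates[-1]]

    # Count how often each practice was violated across all snapshots.
    violated = Counter(
        risk["practice"] for d in dates for risk in snapshots[d]
    )

    # Simplistic trend: compare the last snapshot with the first.
    change = open_counts[-1] - open_counts[0]
    if change < 0:
        trend = "improving"
    elif change > 0:
        trend = "deteriorating"
    else:
        trend = "flat"

    return {
        "open_risks_by_service": dict(Counter(r["service"] for r in latest)),
        "most_violated_practices": violated.most_common(3),
        "trend": trend,
    }


if __name__ == "__main__":
    # Made-up data, for illustration only.
    sample = {
        "2024-01-01": [
            {"service": "payments", "practice": "replication"},
            {"service": "payments", "practice": "backup"},
            {"service": "trading", "practice": "replication"},
        ],
        "2024-01-02": [
            {"service": "payments", "practice": "replication"},
        ],
    }
    print(kpi_summary(sample))
```

In practice the trend would be computed over a longer window and broken down by team or vendor, but even this minimal form shows where risk is concentrated and in which direction it is moving.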
To change the current state of resiliency (or lack thereof) in the financial-services sector, a new model of operations is required: one that shifts the emphasis from firefighting to prevention and from siloed functions to collaboration and coordination.
Leading financial organizations are already taking such steps to become more resilient. These steps cannot rely on technology alone. To be effective, they must take into account the people involved, the processes in place, and the organization’s structure and culture. Those who are late to the game risk falling behind, with potentially detrimental consequences for their reputations and business performance.