By Yaniv Valik, VP Product Management & Customer Success, Continuity Software
Institutions from government to universities no longer command the public trust they once did, but banks have fared worse in the court of public opinion than most, according to a Gallup poll. The 2016 poll showed that over the preceding decade, while confidence in the military, the media, and even organized religion had fallen, confidence in banks had dropped 22%, more than double the decline for any other type of institution.
Already struggling to maintain their stature in the public eye, banks cannot afford any additional hits to their good names and to the confidence their clients have in them – so one would think that they would do everything in their power to ensure that customers have 24/7 access to their funds and avoid service interruptions. And indeed, banks do put a lot of effort into ensuring access – but somehow, service outages do occur. And when they do, dismay, disbelief, and eventually anger ensue.
Just ask the folks at HSBC UK. Earlier this year, the bank’s online and mobile services went down due to “technical problems,” with the services offline for days. Angry customers took to social media with pleas, puns, and protest:
“Our rent’s overdue. But I can’t check into it because the HSBC website is down.”
“HSBC (or ‘How Simple Became Complicated’) still has its online banking platform down.”
“How a biz as big and wealthy as @HSBC_UK can have internet banking down for 2 days is beyond belief.”
There are no public statistics on whether irate customers closed accounts and moved to a competitor due to this event. But no bank should take comfort if that did not indeed happen, because it most likely means that customers have thrown up their hands in frustration, and don’t expect any better at another bank.
One would imagine that banks do everything they can to avoid such outages. No doubt they buy top-of-the-line systems to ensure service continuity, yet outages still happen. Why?
Blame it in part on the pace of technological change. To keep up, banks need to continue to push the envelope with new services and implement new systems. But are those systems compatible with the existing ones? Will a glitch in one affect the others? Can these glitches be worked out so that services stay up rather than go down?
The answer to that latter question is often “we hope so” – and that’s an answer that even the IT department will give. It’s actually the only honest answer; there are likely thousands of files on bank servers that control the configuration of services. Obviously, examining these configurations manually is next to impossible for an IT team – even one with hundreds of members. And running a test environment is usually not enough; the test environment will likely not include all the legacy systems and infrastructure layers that the new services must integrate with, leaving these interdependencies untested until the new system goes live.
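To see why automation is the only realistic option at that scale, consider a minimal sketch (file formats and keys invented for illustration, not drawn from any real bank's environment) of diffing a live configuration against a known-good baseline:

```python
# Sketch: diff live configuration files against a known-good baseline.
# The key=value format and all keys are illustrative, not from any real system.
from pathlib import Path


def load_config(path):
    """Parse simple key=value lines from a config file into a dict."""
    config = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip()
    return config


def diff_configs(baseline, live):
    """Return settings whose live value has drifted from the baseline."""
    drift = {}
    for key, expected in baseline.items():
        actual = live.get(key)
        if actual != expected:
            drift[key] = (expected, actual)
    return drift
```

Run across thousands of files, a check like this turns an audit that no IT team could perform by hand into a routine scan — though, as noted above, it only catches drift from a baseline, not interdependencies the baseline never captured.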
These configuration issues are not necessarily even due to “errors” or bugs – they are often outgrowths of the normal function of systems. While IT infrastructures have been growing massively in complexity and scale, the tools at the disposal of IT teams have not kept pace. And as the rate of change in these environments continues to escalate with newer technologies and rapid release cycles, it’s no wonder IT teams are struggling to ensure all facets of the infrastructure are risk-free and configured according to industry best practices.
In fact, a recent study by the University of Chicago, which examined the roots of service outages at online companies, determined that the majority – nearly 300 of the 500-some instances examined – were due to “unknown” factors. “Unknown” could mean anything – misconfiguration, malware, incompatible software, etc. The point is that it is unknown, and that is dangerous for any organization that needs to ensure continuity of services – especially organizations like banks, which, subject to regulation and public ire, are in a more sensitive position than most other service businesses.
Detecting these unknowns at the scale and complexity common in today’s environments requires a different approach – a quality assurance mindset and processes designed to proactively identify misconfigurations and risks. It requires system-wide visibility and the ability to proactively examine and analyze the thousands of things that could go wrong across all layers of the infrastructure, alerting IT personnel to potential risks so they can be addressed before adversely impacting service availability and performance.
An automated big data system that constantly crawls a company’s IT infrastructure – checking out the dependencies, determining the way resources are allocated, and proactively detecting problems – could help IT teams ensure full availability. When a change occurs that has the potential of causing a service disruption, the system alerts IT personnel, pointing out where the problem is and what needs to be done. The issues are highly visible, and remediation procedures are available to allow for a quick resolution before the business is affected.
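One way such a system might work can be sketched as a set of rules run against a snapshot of the environment. The rules, field names, and data below are invented for illustration; a real product would ship a large, curated rule set built from industry best practices:

```python
# Sketch of a rule-based risk scanner: each rule inspects a snapshot of
# the infrastructure and reports findings. Rules and field names are
# hypothetical, chosen only to illustrate the approach.

def check_replication(snapshot):
    """Flag databases whose standby copy is missing or out of sync."""
    findings = []
    for db in snapshot.get("databases", []):
        if not db.get("standby_in_sync", False):
            findings.append(f"RISK: database '{db['name']}' has no in-sync standby")
    return findings


def check_capacity(snapshot, threshold=0.85):
    """Flag storage volumes nearing full capacity."""
    findings = []
    for vol in snapshot.get("volumes", []):
        if vol["used"] / vol["size"] > threshold:
            findings.append(f"RISK: volume '{vol['name']}' is over {threshold:.0%} full")
    return findings


def scan(snapshot, rules):
    """Run every rule against the snapshot and collect alerts for IT staff."""
    alerts = []
    for rule in rules:
        alerts.extend(rule(snapshot))
    return alerts
```

The design point is that each alert names the specific resource at risk and why, so remediation can begin before the condition ever affects a customer-facing service.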
Of course, there will still be issues for IT teams to contend with, ranging from hacking and security threats to regulatory and compliance requirements. At least, though, banks will know what they are dealing with, as opposed to searching in the dark – vulnerable to the vagaries of unknown IT issues, as well as to the ire of customers.