By Peter Guagenti, Chief Marketing Officer, MemSQL
Financial markets are among the fastest-moving markets around. People and organizations need to know where their money is, what it’s doing for them, and whether it’s at risk, on a moment-by-moment basis.
Yet banks and other financial services organizations are often well-established, even venerable, with their names and reputations a vital tool in their ability to prosper. As such, they often have outmoded internal systems that make it hard for data to move quickly across the organization.
This lack of fast data flow makes it hard to provide cutting-edge tools for customers and to equip managers with the information they need to make and implement the best decisions quickly. It also makes it extremely challenging, and in some cases even impossible, to meet customer needs, competitive challenges, and regulatory requirements.
The rapid adoption of modern, even cutting-edge, data infrastructure may be more important in financial services than in any other discipline. In this article, we’ll examine the growing importance of real-time data flows and their impact on business and on people’s lives, especially in financial services.
We’ll then show how financial services organizations can move with all due speed toward real-time data flows, needed to power enhanced customer experiences and to improve business decision-making.
The Growth of Real-Time Customer Service
The fastest growth in business value for the last few decades has been among “digital native” companies that move fast and offer users experiences based on real-time and near-real-time data:
- Google rose to fame and fortune by indexing the World Wide Web throughout the day, crunching their findings at night, and offering an up-to-date index the following morning. Today, Google updates their database in real time.
- Recommendation engines allow Amazon to offer companion products during the purchasing process, TV networks to sell advertising slots at optimized pricing, and Netflix to offer – and develop – viewing options that customers find compelling.
- Services like Airbnb, Lyft, and Uber would be rendered useless without real-time data for ride hailing and trip completion.
Offering enhanced experiences to customers is emerging as a crucial best practice in business growth. According to a report from Kantar, The Experience Advantage, leaders in customer experience achieve a level of business growth, and associated indicators, roughly double the level achieved by non-leaders.
Machine learning and AI are crucial elements in improving customer experience, largely through their role in enhancing analytics. The PwC report 2019 AI Predictions talks about the need to integrate AI into existing analytics systems so as to deliver the greatest scale and impact. AI tends to work better the more data and compute resources that you can assign to it, requiring a new approach to data infrastructure.
The Gartner report, Top 10 Data and Analytics Technology Trends That Will Change Your Business (requires registration), states the following as a strategic planning assumption: “By 2022, every personalized interaction between users and applications or devices will be adaptive.” Delivering on adaptive customer service will require the full use of the current state of the art in both analytics and AI, including optimized data infrastructures, and may require additional advances that have not been invented yet.
Current Data Infrastructure vs Real-Time Data Flows
There are two kinds of data infrastructure that have become the norm: traditional relational databases – which support SQL – and NoSQL databases. Both of them present nearly insuperable obstacles to the real-time data flows needed in financial services.
The pipelines built around traditional relational databases are, in effect, designed to prevent real-time data flows, in order to work around restrictions that made sense decades ago but are no longer necessary today. The original online transaction processing (OLTP) systems were limited by the fact that transactions only worked reliably as a single process, with an entire database controlled by a single central computer that had sole responsibility for both transactions and queries.
The emergence of distributed systems – systems in which multiple servers work together to deliver a service, in a unified fashion – only reached stateful services, such as databases, slowly. None of the leading OLTP systems is fully distributed, and the workarounds for their limited scalability – for instance, sharded systems and elaborate caching architectures – are complex, expensive, and prone to sudden slowdowns in service.
Because analytics requirements are quite different from transaction requirements, and because single-process OLTP systems were already maxed out, online analytical processing (OLAP) developed as a separate discipline. Today’s data warehouse software is an example of a traditional OLAP system. OLAP systems also tend to be single-process, even though they don’t face the same intrinsic complexity and difficulty that OLTP systems do in reliably completing an update. OLTP and OLAP systems all support structured query language (SQL), though the language has somewhat splintered into dialects due to competitive pressures among providers.
In order to get data from OLTP systems, and other sources, into OLAP systems, a whole other discipline called extract, transform, and load (ETL) developed. Entire public companies today do little except sell and maintain systems and services for ETL. Data flows are almost deliberately slowed down in the move from OLTP, through ETL, to OLAP. Performance of each of these pillars can only be optimized up to a point, and moving data seamlessly from generation events to analytic input is not even considered.
Financial services companies are particularly likely to be constructed in the image of the long-time split between OLTP, ETL, and OLAP. There isn’t a business on Earth in which reliable completion of transactions is more important than in finance, so it’s been convenient to treat the rest of the data value chain as somewhat of an afterthought. Modern requirements for real-time data flows collide with regulatory requirements and decades of “best practice.”
NoSQL evolved as a way to bypass this complexity, but NoSQL introduced complexities of its own. NoSQL was not developed by financial services companies; the charge into NoSQL was led by digital native companies that wanted to master huge data flows and could afford imprecision and complexity in order to manage very specific use cases, such as relatively fast and accurate – but far from perfect – online search.
As the name implies, NoSQL databases generally don’t support SQL, the indispensable lingua franca of analytics. Even worse, from a financial services point of view, they typically aren’t relational, they often lack full support for transactions, and every new type of query is a new adventure in understanding novel underlying data structures, filtering out what may be bad data, and averaging and smoothing your way to a useful – but not necessarily precise – conclusion.
Many financial services companies are now reversing large and expensive NoSQL commitments. The data lake is kept as a suitable place for massive quantities of lower-value input data, but without much usefulness against the day-to-day operational needs of the business.
Enabling Real-Time Data Flows
Enabling real-time data flows is an area where startups have turned the size of incumbents against them. Incumbent companies are often saddled with decades-old technology infrastructures, exacerbated by the results of mergers and acquisitions among parent companies with differing IT architectures and approaches.
This existing infrastructure imposes a cumbersome, once-unquestioned data processing architecture, as described above: data is batched, then processed as transactions. The transactional data is then remixed, via ETL, and placed in an analytics database or data warehouse. Only then – one or more days after the data-generating events take place – are important events available for analysis and response.
Fraud detection is an example. Many financial services institutions do fraud checking overnight, on data up to a day old – or, if a weekend or holiday has intervened, two or three days old. With that kind of head start, fraudsters can rather thoroughly “wear off the numbers” on fraudulently obtained credit card data.
Incumbent companies must look to the startups that are innovating in many areas, including their data processing architecture, for solutions. Here, a basic formula is emerging: start by streaming data (no ETL allowed) to a fast, scalable operational data store. Then run the incoming data through machine learning models, score it, and assess it for fraud potential, all in real time. Simultaneously, loop back and update the model with the latest results.
A newly emerging architecture combines Kafka as the messaging system, driving streaming data from its origin to the operational data store; Spark as the model processor, drawing on the leading set of machine learning code libraries; then back to the operational data store, which drives analytics and application support.
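To make the stream, score, and update loop concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the architecture itself: an in-memory list stands in for the Kafka stream, a dictionary stands in for the operational data store, and a simple running-statistics anomaly score stands in for a machine learning model that would run in Spark. All names (CardStats, score, process_stream) and the threshold values are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class CardStats:
    """Running statistics for one card (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, amount: float) -> None:
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (amount - self.mean)

    def stddev(self) -> float:
        return sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


def score(stats: CardStats, amount: float, min_history: int = 5) -> float:
    """Return a z-score-style anomaly score; 0.0 until enough history exists."""
    sd = stats.stddev()
    if stats.count < min_history or sd == 0.0:
        return 0.0
    return abs(amount - stats.mean) / sd


def process_stream(events, threshold: float = 3.0):
    """Score each event as it arrives, then fold it back into the model."""
    store = {}                         # assumption: stand-in for the operational data store
    flagged = []
    for card_id, amount in events:     # assumption: stand-in for a Kafka consumer loop
        stats = store.setdefault(card_id, CardStats())
        if score(stats, amount) > threshold:
            flagged.append((card_id, amount))
        else:
            stats.update(amount)       # update the model only with unflagged events
    return flagged


# Hypothetical sample data: six ordinary purchases, then one out-of-pattern purchase.
events = [("card-1", a) for a in (20.0, 25.0, 22.0, 19.0, 24.0, 21.0)]
events.append(("card-1", 950.0))
flagged = process_stream(events)
```

The point of the sketch is the shape of the loop: each event is scored the moment it arrives, and the same event immediately updates the model, so there is no overnight batch step between a transaction and the decision about it.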
Latencies disappear. In one recent case, a major US bank has moved to doing fraud detection “on the swipe,” collapsing fraud detection from an overnight, batch process into a one-second window, including data transmission time over the network. Machine learning models are deployed, run, and updated in real time. Credit card customers, merchants, and the system as a whole benefit from greatly reduced chargebacks and lost sales.
Wealth management, regulatory compliance, and portfolio optimization are additional financial services disciplines that work differently – and far better – when services can be delivered in real time.
In a seemingly short time, the use of real-time data to make operational decisions and deliver enhanced customer services has moved from an occasionally used, value-added option to a competitive necessity.
The extensive data infrastructure possessed by incumbent financial services institutions, so long a source of competitive advantage, can now be an impediment in meeting customer expectations, including the deployment of machine learning and AI. Fortunately, a new, proven model is emerging that supports a move to real-time data processing, including the use of machine learning models.
Financial services institutions now have the opportunity to combine their institutional heft with the latest technologies, delivering the latest and greatest services cost-effectively. This also opens the door for these institutions to move to the offensive, pioneering a new wave of services that can’t be matched by competitors.