Simply mentioning artificial intelligence in finance usually elicits a mix of excitement (“AI is amazing and can solve every problem”) and fear (“Will we all lose our jobs? Will robots cause the next market crash?”). In reality, most current fintech applications of artificial intelligence are still miles away from the machine-dominated dystopia of sci-fi novels: AI-powered chatbots provide bank customers with better phone support, and big data analysis helps direct-lending sites and insurers make more intelligent pricing decisions.
This study applies deep learning techniques to one of the most computationally intensive problems of asset management: portfolio diversification. Mean-variance optimization is a perfect real-world test for artificial intelligence because the problem has a known solution. We fed the input data (total return indices of various equity regions) and the expected output (the weights of the optimal, maximum-Sharpe portfolio on the efficient frontier) to a deep neural network for training. Then, we used out-of-sample data to test whether the model could replicate the optimization process on its own.
Much to our surprise, the out-of-sample results were generally close to the correct mathematical solution. Despite having no prior knowledge of the problem and only a relatively small dataset (2,200 data points), the neural network learned the complex optimization process and was able to replicate its results most of the time.
Interestingly, even the model’s failures generated valuable insights. For example, the model could not find the Markowitz optimal weights during most of 2016. That is because its key input – the covariance matrix – was literally all over the map after the Chinese devaluation and the Fed’s botched rate hike. Deep learning only works if the future resembles the past. As N. Taleb might say, a neural network that has only seen white swans will never be able to fathom the existence of a black swan. This is something to keep in mind when evaluating quant strategies after 35 years of falling bond yields.
The Diversification Problem and Its Traditional Solution
Diversification may be the only free lunch in the world of portfolio management, but doing it properly requires quite a bit of work. The problem of portfolio diversification was solved mathematically by H. Markowitz in his 1952 paper, “Portfolio Selection.” Markowitz’s work won him a Nobel Prize, but the logic of modern portfolio theory is quite intuitive, and the mathematics is not very complicated.
In Markowitz’s world, investors care only about risk and return: specifically, how to maximize the latter while minimizing the former. Every asset offers a unique combination of expected return and standard deviation, and every asset is defined in relation to all the others by the covariance matrix. Given these three inputs (expected returns, variances, and the covariance matrix), we solve for the asset weights that generate the portfolio with the highest risk-adjusted return. With $w$ the vector of portfolio weights, $\mu$ the vector of expected returns, $\Sigma$ the covariance matrix and $r_f$ the risk-free rate, the goal is to find the set of weights that maximizes the following function:

$$\max_{w} \; \frac{w^\top \mu - r_f}{\sqrt{w^\top \Sigma \, w}}$$
In other words, the goal is to find the combination of weights that will produce the portfolio with the highest Sharpe ratio. Graphically, this would be the tangency portfolio in the efficient frontier chart below.
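The objective function can be made concrete in a few lines of Python. This is a minimal illustration, not the author’s code: it assumes NumPy, uses made-up three-asset inputs, and the helper names `portfolio_sharpe` and `tangency_unconstrained` are hypothetical. When short sales are allowed, the tangency portfolio has a closed form (proportional to the inverse covariance matrix times the vector of expected excess returns), which is included only as a reference point.

```python
import numpy as np

def portfolio_sharpe(w, mu, cov, rf=0.0):
    """Sharpe ratio of a portfolio: expected excess return divided by volatility.

    w: portfolio weights, mu: expected asset returns, cov: covariance matrix,
    rf: risk-free rate. All inputs must refer to the same horizon (e.g. annual).
    """
    w = np.asarray(w, dtype=float)
    excess = w @ mu - rf            # numerator: w'mu - rf
    vol = np.sqrt(w @ cov @ w)      # denominator: sqrt(w' Sigma w)
    return excess / vol

def tangency_unconstrained(mu, cov, rf=0.0):
    """Max-Sharpe weights when short sales are allowed (closed form).

    The maximizer is proportional to inv(Sigma) @ (mu - rf), rescaled to sum to one.
    Assumes sum(inv(Sigma) @ (mu - rf)) > 0, as in the example below.
    """
    raw = np.linalg.solve(cov, np.asarray(mu, dtype=float) - rf)
    return raw / raw.sum()

# Hypothetical three-asset example (illustrative numbers, not MSCI data)
mu = np.array([0.08, 0.06, 0.10])
vols = np.array([0.15, 0.12, 0.20])
corr = np.full((3, 3), 0.3) + 0.7 * np.eye(3)   # 1 on the diagonal, 0.3 elsewhere
cov = np.outer(vols, vols) * corr

w_tan = tangency_unconstrained(mu, cov)
w_eq = np.full(3, 1 / 3)
print(w_tan.round(3), portfolio_sharpe(w_tan, mu, cov), portfolio_sharpe(w_eq, mu, cov))
```

The closed form no longer applies once weights are forced to stay between zero and one and that bound binds, which is why the constrained problem needs a numerical search.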
In practice, the problem is solved numerically, essentially by trial and error: an algorithm tests combinations of weights, computes their Sharpe ratios, and keeps iterating until it finds the portfolio with the highest risk-adjusted return. MS Excel Solver offers a very user-friendly way to do this without any coding. However, the computational burden grows quickly with the number of assets: the covariance matrix grows with the square of the number of assets, so that of a 10-asset portfolio has 100 cells while that of a 100-asset portfolio has 10,000. Historical backtests need to repeat this process at every period, which requires statistical software.
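The trial-and-error search can be sketched as a brute-force random search. This is an assumption-laden illustration, not the author’s script and not how Excel Solver works (Solver’s GRG Nonlinear engine, for instance, is gradient-based): it draws random long-only, fully invested weight vectors from a flat Dirichlet distribution with NumPy, keeps the one with the best Sharpe ratio, and uses hypothetical three-asset inputs. `random_search_max_sharpe` is a hypothetical name.

```python
import numpy as np

def random_search_max_sharpe(mu, cov, rf=0.0, n_trials=50_000, seed=0):
    """Trial-and-error search for the long-only, fully invested max-Sharpe portfolio.

    Each trial is a random weight vector drawn from a flat Dirichlet distribution,
    so every candidate is non-negative and sums to one. All candidates are scored
    by their Sharpe ratio and the best one is kept.
    """
    mu = np.asarray(mu, dtype=float)
    rng = np.random.default_rng(seed)
    W = rng.dirichlet(np.ones(len(mu)), size=n_trials)     # (n_trials, n_assets)
    excess = W @ mu - rf                                   # expected excess return per candidate
    vol = np.sqrt(np.einsum("ij,jk,ik->i", W, cov, W))     # volatility per candidate
    sharpe = excess / vol
    best = np.argmax(sharpe)
    return W[best], sharpe[best]

# Hypothetical three-asset inputs (illustrative numbers)
mu = np.array([0.08, 0.06, 0.10])
vols = np.array([0.15, 0.12, 0.20])
cov = np.outer(vols, vols) * (np.full((3, 3), 0.3) + 0.7 * np.eye(3))
w_best, s_best = random_search_max_sharpe(mu, cov)
print(w_best.round(3), round(s_best, 4))

# Size of the covariance matrix: n * n cells, n * (n + 1) / 2 distinct entries
for n in (10, 100):
    print(n, "assets:", n * n, "cells,", n * (n + 1) // 2, "distinct entries")
```

The last loop prints 100 cells (55 distinct entries) for 10 assets and 10,000 cells (5,050 distinct entries) for 100 assets. Random search itself scales far worse than the matrix, because the number of draws needed to cover the space of weights grows rapidly with the number of assets.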
In this case study, we used the weekly total USD returns of the six MSCI regional indices since 1995 as the input data. We searched for the combination of Japanese, U.S., European, British, Canadian, Emerging Markets and Pacific-Asia equities that offered the best risk-adjusted returns at any given point. The trailing one-year return is used as the measure of expected return, the trailing one-year standard deviation as the measure of risk, and the trailing one-year covariance matrix to capture the co-movements between the six regions. We constrained the experiment to mimic a simple, long-only portfolio: individual portfolio weights had to be between zero and one, and the portfolio had to be fully invested at all times. The optimizer is re-run every week, and the portfolio is rebalanced at Friday’s close.
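The estimation step can be sketched as follows. Assumptions not stated in the original: simple weekly returns held in a NumPy array, a one-year lookback of exactly 52 weeks, covariance annualized by multiplying by 52, and random data in place of the MSCI series. `trailing_inputs` is a hypothetical helper, and the optimizer itself is left as a placeholder comment.

```python
import numpy as np

WEEKS_PER_YEAR = 52  # assumption: one year of history = 52 weekly observations

def trailing_inputs(weekly_returns, t, window=WEEKS_PER_YEAR):
    """Optimizer inputs estimated from the trailing window ending at week t (inclusive).

    weekly_returns: array (n_weeks, n_assets) of simple weekly total returns.
    Returns annualized (expected_return, std_dev, covariance).
    """
    if t + 1 < window:
        raise ValueError("not enough history for a full trailing window")
    hist = weekly_returns[t + 1 - window : t + 1]               # weeks t-window+1 .. t
    trailing = np.prod(1.0 + hist, axis=0) ** (WEEKS_PER_YEAR / window) - 1.0
    cov = np.cov(hist, rowvar=False) * WEEKS_PER_YEAR           # annualized sample covariance
    std = np.sqrt(np.diag(cov))                                 # annualized volatility
    return trailing, std, cov

# Weekly rebalancing loop on hypothetical random data (six regions, five years)
rng = np.random.default_rng(1)
rets = rng.normal(0.002, 0.02, size=(260, 6))
schedule = []
for t in range(WEEKS_PER_YEAR - 1, len(rets) - 1):
    mu, sigma, cov = trailing_inputs(rets, t)
    # run the optimizer on (mu, cov) here; its weights are held during week t + 1
    schedule.append(t + 1)
```

The key design choice is timing: inputs computed at week t use only data up to t, and the resulting weights apply to week t + 1, which mirrors rebalancing at Friday’s close without look-ahead.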
Somewhat counter-intuitively, the Markowitz portfolio is very poorly diversified. It spent most of the 2009-2010 rebound in Emerging Markets, and most of the past four years exclusively in the U.S. stock market. This lack of diversification stems from our use of the indices’ trailing one-year returns as their expected returns: if one market greatly outperforms, as the U.S. has over the past four years, the model can allocate 100% of the portfolio to that best-performing index.
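A two-asset toy example shows why trailing returns push the long-only optimizer into a corner. The numbers are hypothetical (asset A returned 30% over the trailing year, asset B 5%, both with 20% volatility and a correlation of 0.5), the risk-free rate is assumed to be zero, and a simple grid search over long-only, fully invested weights is used for transparency.

```python
import numpy as np

# Hypothetical trailing inputs: A has strongly outperformed, B has not
mu = np.array([0.30, 0.05])            # trailing one-year returns used as expected returns
vol = np.array([0.20, 0.20])
rho = 0.5
cov = np.array([[vol[0] ** 2, rho * vol[0] * vol[1]],
                [rho * vol[0] * vol[1], vol[1] ** 2]])

# Long-only, fully invested: weight w in A and 1 - w in B, on a 1% grid
w_a = np.linspace(0.0, 1.0, 101)
W = np.column_stack([w_a, 1.0 - w_a])
sharpe = (W @ mu) / np.sqrt(np.einsum("ij,jk,ik->i", W, cov, W))   # risk-free rate assumed 0
best_w_a = w_a[np.argmax(sharpe)]
print(f"optimal weight in A: {best_w_a:.2f}")   # prints 1.00: the upper bound binds
```

Without the long-only constraint, the tangency portfolio here would hold more than 100% of A financed by a short position in B; the zero-to-one bound is what turns that into the all-in allocations described above.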
Markowitz Optimal Weights – Regional Equity Allocations
The strategy’s lack of diversification has been a blessing: the Markowitz regional portfolio greatly outperformed both the equal-weighted and the cap-weighted MSCI All Country indices over the simulation period. The portfolio’s big bets (emerging markets in 2004-2007, the U.S. since 2014) contributed most of the gains. This alpha is more likely the result of luck (momentum has worked at the regional level over the past 30 years) than of any inherent virtue of the mean-variance optimization process. But our point is neither to criticize nor to praise the Markowitz process; it is to replicate it with deep learning techniques, to which we now turn.
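The performance comparison rests on a simple bookkeeping rule: weights decided at one Friday’s close can only earn the following week’s returns. The minimal sketch below assumes NumPy arrays of weekly simple returns and weights, ignores transaction costs, and uses a tiny hypothetical dataset; `backtest_growth` is a hypothetical name, and this is not the author’s backtest code.

```python
import numpy as np

def backtest_growth(weights, returns):
    """Growth of $1 for a portfolio rebalanced at each week's close.

    weights[t] is set at the close of week t, so it earns returns[t + 1].
    Inputs: arrays of shape (n_weeks, n_assets). Transaction costs are ignored.
    """
    weights = np.asarray(weights, dtype=float)
    returns = np.asarray(returns, dtype=float)
    port_ret = np.sum(weights[:-1] * returns[1:], axis=1)   # portfolio returns for weeks 1..T-1
    return np.cumprod(1.0 + port_ret)

# Tiny hypothetical example: two indices, three weeks
rets = np.array([[0.00, 0.00],
                 [0.10, -0.10],
                 [0.20, 0.00]])
all_in_first = np.tile([1.0, 0.0], (3, 1))    # 100% in the first index every week
equal = np.full_like(rets, 0.5)              # equal-weighted benchmark
print(backtest_growth(all_in_first, rets))   # growth of $1: 1.10, then 1.32
print(backtest_growth(equal, rets))          # growth of $1: 1.00, then 1.10
```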
The Deep Learning Approach
Deep learning brings about a (reverse) Copernican revolution in the scientific process. Modern science works by induction and deduction: the scientist posits a theory and designs experiments to confirm or refute it. Theories that survive repeated attempts at refutation are provisionally held to be true.
By contrast, neural networks do not need theories. They make no assumptions about the nature of the relationships between variables or the causality between them. They simply observe many examples of inputs and outputs and learn from repetition. In this respect, neural networks work like the biological brain: dogs have no concept of the laws of gravity and do not understand their own muscular systems. Yet most dogs can catch a frisbee in the air because their brains have observed so many falling objects that they can predict where the frisbee will be next.
In a deep neural network, each neuron learns to respond to a specific attribute: the edge of an eye, the shape of a mouth, etc. Higher-level neurons synthesize this raw information into more complex objects: a human face, the face of a 35-year-old Caucasian woman, and finally the face of the friend who was tagged in your 1988 fifth-grade class picture. Smartphone applications can recognize faces because deep neural networks have been trained on millions of tagged pictures and have learned to replicate the process.
We applied the same approach to portfolio optimization. The dataset paired 25 years of input (the weekly returns of the six MSCI regions) with the corresponding output (the optimal portfolio weights as calculated by a mean-variance optimization script).
The in-sample fit (prior to 2013) is remarkably close. This is already quite impressive: the dataset is small (about 2,200 data points), the output is complex (the model is asked to predict six weights), the target weights are extremely volatile, and the solution is not a direct function of the input but had to be found by trial and error. A quant researcher given only this data would be unlikely ever to find the Rosetta stone that translates the input dataset into the target weights, unless told that they had been generated by a mean-variance optimization process.
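The original does not describe the network’s architecture, so the following is only a hypothetical sketch of the idea: a one-hidden-layer network, written in plain NumPy, that maps input features to six portfolio weights. Two design choices are assumptions: a softmax output layer, which guarantees predictions that are non-negative and sum to one (matching the long-only, fully invested constraint), and a cross-entropy loss measured against the target weights. The data are synthetic, and all function names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax: turns raw scores into non-negative weights summing to one."""
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_weight_network(X, Y, hidden=32, lr=0.5, epochs=1000, seed=0):
    """Fit a one-hidden-layer network mapping inputs X to target weights Y.

    X: (n_samples, n_features); Y: (n_samples, n_assets), rows non-negative and
    summing to one. Full-batch gradient descent on mean cross-entropy.
    Returns the parameters and the loss recorded before each update.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = Y.shape[1]
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, k))
    b2 = np.zeros(k)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                      # hidden-layer activations
        P = softmax(H @ W2 + b2)                      # predicted weights
        losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))
        dZ = (P - Y) / n                              # d(loss)/d(logits) for softmax + cross-entropy
        dW2, db2 = H.T @ dZ, dZ.sum(axis=0)
        dA = (dZ @ W2.T) * (1.0 - H ** 2)             # backpropagate through tanh
        dW1, db1 = X.T @ dA, dA.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def predict_weights(params, X):
    """Forward pass only: long-only, fully invested weight predictions."""
    W1, b1, W2, b2 = params
    return softmax(np.tanh(X @ W1 + b1) @ W2 + b2)

# Hypothetical synthetic data: 400 samples, 6 input features, 6 target weights
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
Y = softmax(2.0 * X @ rng.normal(size=(6, 6)))
params, losses = train_weight_network(X, Y)
W_pred = predict_weights(params, X)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

A softmax output can approach, but never exactly reach, a 100% allocation to one region. Moving to a deep-learning framework with more hidden layers would not change the structure of the problem: features in, weights out, loss measured against the optimizer’s weights.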
The post-2013 period was kept as the validation dataset: the model never saw the output and had to guess the optimal weights based on the 18 years of history it observed. Much to our surprise, the model’s predictions were quite close.
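Holding out the post-2013 period amounts to a chronological split rather than a random one, so the validation set lies strictly after the training data. A minimal sketch, assuming weekly dates stored as NumPy `datetime64` values and a cutoff of January 1, 2013; `chronological_split` and the toy arrays are hypothetical.

```python
import numpy as np

def chronological_split(dates, X, Y, cutoff="2013-01-01"):
    """Train on samples dated before `cutoff`, validate on the rest (no shuffling)."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    is_train = dates < np.datetime64(cutoff)
    return (X[is_train], Y[is_train]), (X[~is_train], Y[~is_train])

# Hypothetical weekly (Friday) samples from 2011 to 2014
dates = np.arange(np.datetime64("2011-01-07"), np.datetime64("2015-01-01"), np.timedelta64(7, "D"))
X = np.arange(len(dates)).reshape(-1, 1)      # stand-in features: the sample's position in time
Y = np.zeros((len(dates), 6))                 # stand-in target weights
(X_train, Y_train), (X_val, Y_val) = chronological_split(dates, X, Y)
print(len(X_train), "training weeks,", len(X_val), "validation weeks")
```

Shuffling before splitting would leak information from the validation period into training, because neighboring weeks share most of their trailing one-year window.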
Optimal Regional Equity Allocation Weights
Markowitz (top panel) versus Deep Neural Network (bottom panel)
Let’s zoom in on the first out-of-sample year, 2013. The Markowitz optimization recommended an allocation of about 80% in the U.S. and 20% in Japan. The deep neural network correctly identified these two markets: its only deviation was a small allocation to Europe and Asia ex-Pacific.
Out-of-Sample Optimal Regional Equity Allocation Weights
Now let’s look at a period where the model had more difficulty predicting the correct outcome. In 2016, the deep-learning model invested almost equally in all the main regions, instead of going 100% in U.S. equities as recommended by the (correct) mean-variance optimizer.
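One way to quantify how far the network’s 2016 allocation was from the optimizer’s is half the sum of absolute weight differences, which equals the share of the portfolio that would have to be traded to move from one allocation to the other. The metric and the stylized numbers (exactly 100% U.S. versus exactly equal weights across six regions) are assumptions for illustration, not the article’s measurements; `weight_distance` is a hypothetical name.

```python
import numpy as np

def weight_distance(w_pred, w_target):
    """Half the L1 distance between two fully invested weight vectors.

    0 means identical allocations; 1 means no overlap at all.
    """
    return 0.5 * np.abs(np.asarray(w_pred, float) - np.asarray(w_target, float)).sum()

# Stylized 2016 case (hypothetical): optimizer 100% U.S., network equal-weighted
target = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
equal = np.full(6, 1 / 6)
print(round(weight_distance(equal, target), 4))   # 0.8333, i.e. 5/6 of the portfolio
```

In this stylized case, five-sixths of the portfolio sits in the wrong regions.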
Out-of-Sample Optimal Regional Equity Allocation Weights
Why did the deep-learning model fail? Auditing deep neural networks is a lot harder than debugging ordinary code, but the context helps. On August 11, 2015, Chinese authorities devalued the yuan by about 2% against the U.S. dollar overnight, breaking with a decade-long policy of exchange rate targeting. On December 16, 2015, the Federal Reserve raised its target rate after having kept it near zero for seven years. To use a Talebian analogy, these two events were “black swans” for our deep neural network: the model had only seen the white swans of a pegged CNY/USD and a zero-interest-rate policy.
It was not just the model that was confused. Correlations were literally all over the map in 2016. For a brief period, the one-month correlation between the returns of the MSCI Switzerland Index and the price of Brent oil jumped to 80%. I remember telling Swiss clients that they should look for oil fields under the Matterhorn.
In this context, the model’s confusion is quite understandable. Faced with never-before-seen market conditions, the neural network had no “nearest neighbor” to learn from. It fell back to an agnostic position, weighting the portfolio equally across the six regional constituents. That is a rather rational response to “unknown unknowns.”
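A one-month correlation like this is a rolling statistic. A minimal NumPy sketch, assuming daily returns and a 21-trading-day window as a proxy for one month; `rolling_correlation` and the simulated series are hypothetical, not the MSCI Switzerland or Brent data.

```python
import numpy as np

def rolling_correlation(x, y, window=21):
    """Rolling Pearson correlation; entry i uses observations i - window + 1 .. i.

    The first window - 1 entries are NaN because the window is not yet full.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.full(len(x), np.nan)
    for i in range(window - 1, len(x)):
        out[i] = np.corrcoef(x[i - window + 1 : i + 1], y[i - window + 1 : i + 1])[0, 1]
    return out

# Hypothetical daily returns for two assets with a moderate true correlation
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.01, 250)
y = 0.5 * x + rng.normal(0.0, 0.01, 250)
corr = rolling_correlation(x, y)
print(np.nanmin(corr), np.nanmax(corr))   # the 21-day estimate swings around its long-run value
```

With pandas, `pd.Series(x).rolling(21).corr(pd.Series(y))` computes the same quantity. Windows this short are noisy, which is part of why correlation estimates can swing so widely in a turbulent year.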
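The “nearest neighbor” intuition can be turned into a rough diagnostic: measure how far a new input lies from the closest input seen in training, relative to how far training inputs typically lie from one another. This is a hypothetical check, not something the article ran; it assumes inputs are stored as rows of a NumPy array, and `novelty_score` is a hypothetical name.

```python
import numpy as np

def novelty_score(train_X, x):
    """Distance from x to its nearest training input, relative to typical in-sample spacing.

    The typical spacing is the median, over training points, of the distance to the
    nearest other training point. Scores far above 1 flag unfamiliar inputs.
    Builds a full pairwise distance matrix, so it is only meant for small samples.
    """
    train_X = np.asarray(train_X, dtype=float)
    pairwise = np.linalg.norm(train_X[:, None, :] - train_X[None, :, :], axis=2)
    np.fill_diagonal(pairwise, np.inf)            # ignore each point's zero distance to itself
    typical = np.median(pairwise.min(axis=1))
    nearest = np.min(np.linalg.norm(train_X - x, axis=1))
    return nearest / typical

rng = np.random.default_rng(0)
train_X = rng.normal(size=(300, 6))               # hypothetical past inputs (six features)
familiar = rng.normal(size=6)                     # drawn from the same distribution
unfamiliar = np.full(6, 5.0)                      # far outside anything seen in training
print(novelty_score(train_X, familiar), novelty_score(train_X, unfamiliar))
```

A score well above 1 signals that the model is extrapolating, which is exactly when a fallback such as equal weighting starts to look sensible.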
We learned three important lessons from the exercise:
Deep neural networks are extraordinarily powerful. The hype is deserved, and artificial intelligence will keep getting smarter. Translation software was awful just a few years ago; now, the voice-operated version of Google Translate can switch between all the major languages with relatively few errors. By design, AI-based technologies keep improving as models are refined with more training data.
As far as finance goes, artificial intelligence should progressively expand beyond the relatively simple tasks of fraud detection, compliance, and customer support. Bond ratings seem like a perfect field for deep learning techniques: the data is plentiful and easily available, the outcome of interest is binary (did the issuer default?), and the relation between the input variables and the output should be fairly simple.
Predicting asset returns is inherently harder because relations between variables are unstable, feedback loops appear, and “black swans” are common. We still think that a well-trained deep neural network could provide valuable asset allocation or stock selection insights, provided it is fed good input data.
Machine-learning techniques, like any backtested strategy, implicitly assume that the future will resemble the past. This can create biases when regimes change. For example, most historical simulations are heavily skewed by the 35-year-long bull market for bonds. If, as I believe, yields are about to go higher for longer, much higher for much longer, the predictive power of machine-learning algorithms may disappoint, even as the underlying technology continues to improve.