“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay.” – Sherlock Holmes.
Arthur Conan Doyle’s famed detective may have been tackling a crime mystery in “The Adventure of the Copper Beeches” when he proclaimed the above, but today his drive to amass reams of data would be as at home in Canary Wharf as it once was in 221B Baker Street.
While the concept of ‘big’ data (datasets so large or complex that traditional methods cannot process them) has become a familiar one, the investment world’s efforts to identify innovative and exotic sources of returns have given rise to a new theme – ‘alternative’ data.
Broadly speaking, anything that falls outside typical sources of investment information, such as financial reports and economic indices, can be classified as alternative data. This can range from now well-known indicators that attempt to glean market sentiment from Twitter babble or Google searches, to sophisticated algorithms that automatically listen in on and analyse earnings calls, or even scrutinise high-resolution satellite imagery of retailers’ parking lots.
As the number of alternative data sources grows, so does the number of companies promising to provide new and profitable information. Start-ups like RavenPack, which provides a variety of alternative data services, have grown in prominence, while established data providers, including Bloomberg and Thomson Reuters, have developed their own proprietary sentiment data offerings.
Even companies from outside the industry are getting in on the act, with some discovering that they are sitting on a trove of user data that can be shared, at a price of course. The social networking app Foursquare, for example, now enables investors to analyse where users have been ‘checking in’, potentially uncovering trends in business activity. Paired with the growth of the so-called ‘Internet-of-Things’, through which Silicon Valley envisages the complete inter-networking of our lives, investors willing to pay for the privilege could soon have formidable access to information relating to every aspect of our lives.
Bringing together these myriad sources of alternative data are new data aggregators, which are attempting to disrupt an industry dominated by big players such as Bloomberg. One such company is Quandl, a data provider which has amassed over 200,000 users since it was founded in 2011 and has now received over $20 million in venture capital funding.
Although the growth in alternative vendors is providing investors with cheaper access to important data, the choice presents a headache. Much like the media-streaming services Netflix and Amazon Prime, each data provider promises exclusive access to certain content, leaving subscribers to choose between missing out on potentially valuable information and paying for a growing set of data providers.
“Big data is not about the data” – Gary King, Harvard University
Sourcing alternative data is one thing; turning it into effective investment signals is quite another. In some cases, the sheer scale of the information can be too much for traditional desktop software. Step forward innovative data storage solutions from the likes of Apache Hadoop and Microsoft’s Azure cloud computing offering.
The need to decipher the cacophony of modern data also brings in the other major tech theme of the day: artificial intelligence (AI). Specifically, machine learning, the subfield of AI that specialises in autonomously learning complex relationships in data, can be required to analyse alternative data in order to produce investment signals. This is an area of active research within Aberdeen.
For several years we have partnered with the Scottish Financial Risk Academy to provide industrial placements to Quantitative Finance MSc students undertaking their dissertations. Last year we worked with students from the University of Edinburgh and Heriot-Watt University to investigate whether using alternative data sources could enhance investment decisions.
"The study demonstrated that using this alternative data enhanced the performance of various investment strategies"
For this project we used two different data sources, which automatically read through thousands of news articles from around the world (‘data-scraping’ in technical jargon), applying the machine learning field of natural language processing to discern whether a particular article has favourable or unfavourable sentiment towards certain asset classes. The study demonstrated that using this alternative data, which gives investors a real-time insight into the ebb and flow of public sentiment, enhanced the performance of various investment strategies forecasting regional equity markets and trading crude oil contracts.
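To make the idea of turning news sentiment into a signal more concrete, here is a deliberately simplified sketch in Python. It is not the method used in the study or by any data vendor: the word lists, the function names `article_sentiment` and `daily_signal`, the threshold and the sample headlines are all invented for illustration. A production system would rely on trained natural language processing models, not on word counting.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy word lists, chosen only for this illustration.
POSITIVE = {"growth", "beat", "strong", "surge", "upgrade"}
NEGATIVE = {"loss", "miss", "weak", "slump", "downgrade"}


def article_sentiment(text: str) -> float:
    """Score one article in [-1, 1] from counts of positive and negative words."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def daily_signal(articles, threshold=0.2):
    """Average article scores per (day, asset) and map them to +1, 0 or -1."""
    scores = defaultdict(list)
    for day, asset, text in articles:
        scores[(day, asset)].append(article_sentiment(text))
    signals = {}
    for key, values in scores.items():
        avg = mean(values)
        # Assumed rule: act only when average sentiment clears the threshold.
        signals[key] = 1 if avg > threshold else -1 if avg < -threshold else 0
    return signals


# Made-up headlines, not real market data.
articles = [
    ("day-1", "crude_oil", "Oil prices surge on strong demand."),
    ("day-1", "crude_oil", "Analysts upgrade outlook despite weak inventories."),
    ("day-1", "regional_equity", "Banks slump after earnings miss."),
]
signals = daily_signal(articles)
print(signals)
```

In this toy example a signal of +1 means sentiment is favourable enough to lean long, -1 means the opposite, and 0 means the evidence is too mixed to act on. Whether such a signal adds value over traditional analysis is exactly the question an investor would need to test.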
While providers scramble to sell alternative data, some investors may remain sceptical; the bar is high to prove that expensive new data sources provide sufficient extra information and alpha over traditional analyses to justify their price. The proliferation of alternative data sources also raises questions around market efficiency, as well as whether, as the information becomes more easily accessed, any market advantage will be eroded away.
The growing debate around data privacy could also pose a challenge to those wishing to monetise their customers’ activity. Some consumers may be more wary of smartphone apps’ terms and conditions when they realise their location data is being sold on to Wall Street. Nevertheless, financial analysts now have unprecedented access to an abundance of sophisticated information. The key question is whether we will drown in a sea of bytes or manage to find the signal in the noise.
Image credit: Hazel Stevenson / Alamy Stock Photo