It's crazy how poor the financial data provider offerings out there are. Most financial data is riddled with inconsistencies, wildly overpriced, and in esoteric formats. Simply ingesting financial data in a reliable manner requires significant engineering.
For something so important to the economy, it's amazing that there isn't a better solution, or that an open standard hasn't been mandated.
I feel this; I email Quandl regularly to fix data errors that the simplest of automated checks should catch ("why is this price 1200% higher than the previous one?").
But, they do have a mostly-decent API (tables; timeseries is pretty bad).
Something that always bugs me is properly adjusting prices when backtesting. The "right" way seems to be how Quantopian now handles it [1], in a just-in-time fashion, but that code isn't in their public libraries, and over email they declined to tell me where they get the data.
[1] https://www.quantopian.com/quantopian2/adjustments
Keep an updated corp action table with date, corp action type, and adjustment factor.
Corp action type is important because divs adjust prices but not volumes, for example. Splits adjust both.
When you're ready to use an adjusted time-series: select the corp actions you care about and calculate a running product of 1+the adjustment factor. As-of join the adjustment factors, multiply and you're done.
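Since the recipe above is terse, here is a minimal pandas sketch of it. The corp-action table, column names, and factor values are all invented, and the convention assumed is that (1 + factor) is the multiplier applied to prices dated before the action (e.g. -0.5 for a 2:1 split, -dividend/close for a cash dividend).

```python
import numpy as np
import pandas as pd

# Invented corp-action table; real vendors will have their own schemas and sign conventions.
actions = pd.DataFrame({
    "date":   pd.to_datetime(["2017-06-15", "2018-03-01"]),
    "type":   ["dividend", "split"],
    "factor": [-0.01, -0.5],   # 1% cash dividend, then a 2:1 split
}).sort_values("date")

prices = pd.DataFrame({"date": pd.bdate_range("2017-01-02", "2018-06-29")})
prices["close"] = 100.0    # placeholder unadjusted closes
prices["volume"] = 1_000   # placeholder unadjusted volumes

# Running product of (1 + factor), accumulated from the latest action backwards,
# so each action row carries the combined effect of itself and everything after it.
actions["cum_factor"] = np.cumprod((1.0 + actions["factor"]).to_numpy()[::-1])[::-1]

# As-of join: each price date picks up the cumulative factor of the next action
# strictly after it, then multiply. Dates after the last action need no adjustment.
adj = pd.merge_asof(prices, actions[["date", "type", "cum_factor"]],
                    on="date", direction="forward", allow_exact_matches=False)
adj["adj_close"] = adj["close"] * adj["cum_factor"].fillna(1.0)

# Volumes: only splits apply (dividends leave share counts alone), so build the
# running product over the split rows alone and divide instead of multiply.
splits = actions[actions["type"] == "split"].copy()
splits["vol_factor"] = np.cumprod((1.0 + splits["factor"]).to_numpy()[::-1])[::-1]
adj = pd.merge_asof(adj, splits[["date", "vol_factor"]],
                    on="date", direction="forward", allow_exact_matches=False)
adj["adj_volume"] = adj["volume"] / adj["vol_factor"].fillna(1.0)
```

The direction="forward" plus allow_exact_matches=False pair is what makes it an as-of join against the next action strictly after each bar; whether the ex-date itself should be included depends on your price source's convention.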
I stand corrected; the code for on-the-fly adjusting _is_ in Zipline, but you have to know that stock splits are treated like dividends, which wasn't obvious to me.
At least some of the data they use comes from a vendor they aren't able to name publicly.
For my current job, we wanted to get a mapping of stock tickers and exchanges to CUSIPs. Every provider we looked at — and this is fundamental trade data — was full of errors and missing values. Couple that with the extortion that is CUSIP (you can't use CUSIP values without a license from them, and licenses start at $xx,xxx+). It's criminally inept. And when you do fix it up, you don't want to publish it, because you spent all your time and resources fixing it… and it becomes a trade secret.
This is why finance is lucrative, similar to esoteric codes in various types of law. Nothing to do with math models or superior prediction, just paying for someone else to fight through identifier hell, exchange protocol hell, etc., and be able to do some mickey mouse math at the end of it.
Honestly, this stuff is so bad that the headache of it alone might justify the huge compensation in finance. I've had colleagues turn down huge bonuses and raises in order to leave finance companies, solely to avoid this kind of thing and pursue a career where the headaches bother them less, even though they're paid less.
Did you look at Factset's datafeed? I've found its reference data and symbology to be pretty reliable. CUSIPs will cost a lot with redistribution charges though. You're better off avoiding them if possible.
I agree; CUSIP is also a problem for the privateer (meaning all data needs to be free to use). While I have found a way to get a mapping online, I have no idea of its accuracy and have to trust that the (unaware) provider QAs the data.
I really would like to see something like Bloomberg's OpenFIGI take the place of CUSIPs, but it's not nearly as widely used. https://www.openfigi.com/ The API does allow you to convert from CUSIP, though.
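That conversion is a single POST of mapping jobs; roughly the following, though check the OpenFIGI docs for the current request/response shape (the CUSIP below is just a placeholder):

```python
import requests

# One mapping job per identifier; ID_CUSIP is one of the supported idTypes.
jobs = [{"idType": "ID_CUSIP", "idValue": "037833100"}]  # placeholder CUSIP

headers = {"Content-Type": "application/json"}
# headers["X-OPENFIGI-APIKEY"] = "..."  # optional API key for higher rate limits

resp = requests.post("https://api.openfigi.com/v3/mapping", json=jobs, headers=headers)
resp.raise_for_status()

# Each job comes back with either a "data" list of matches or an "error" string.
for job, result in zip(jobs, resp.json()):
    figis = [d.get("figi") for d in result.get("data", [])]
    print(job["idValue"], "->", figis or result.get("error"))
```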
For anyone curious about esoteric formats, check out some of the documentation for financial data providers.
CRSP[1] is pretty much regarded as the highest quality pricing data in the US, with stock prices going back to 1925. The database API is written for C and FORTRAN-95.
Data providers also have a habit of providing their own proprietary security IDs, or just mapping to tickers. So if you're trying to build a database with several providers, you have to wrangle together 15 different security identifiers, taking care of mergers/acquisitions, delistings, ticker recycling, etc. It is a fun exercise.
[1] http://www.crsp.com/files/programmers-guide.pdf
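One way people cope with that wrangling is a point-in-time symbology table keyed by an internal security ID, with validity windows so recycled tickers and vendor IDs resolve differently depending on the as-of date. A sketch, with every ID, date, and column name invented:

```python
import pandas as pd

# Invented point-in-time symbology table: every external identifier maps to an
# internal security ID only within a validity window.
symbology = pd.DataFrame(
    [
        (1, "ticker",      "ABC",      "2001-01-02", "2010-05-01"),
        (2, "ticker",      "ABC",      "2012-09-10", None),        # recycled ticker
        (1, "vendor_a_id", "A-998877", "2001-01-02", None),
        (2, "vendor_b_id", "B-123456", "2012-09-10", None),
    ],
    columns=["internal_id", "id_type", "id_value", "valid_from", "valid_to"],
)
symbology["valid_from"] = pd.to_datetime(symbology["valid_from"])
symbology["valid_to"] = pd.to_datetime(symbology["valid_to"])

def resolve(id_type: str, id_value: str, as_of: str) -> int:
    """Map an external identifier to the internal security ID as of a given date."""
    ts = pd.Timestamp(as_of)
    hits = symbology[
        (symbology["id_type"] == id_type)
        & (symbology["id_value"] == id_value)
        & (symbology["valid_from"] <= ts)
        & (symbology["valid_to"].isna() | (symbology["valid_to"] >= ts))
    ]
    if len(hits) != 1:
        raise LookupError(f"{id_type}={id_value} unknown or ambiguous as of {as_of}")
    return int(hits["internal_id"].iloc[0])

print(resolve("ticker", "ABC", "2005-06-30"))  # -> 1
print(resolve("ticker", "ABC", "2015-06-30"))  # -> 2 (same ticker, different company)
```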
Any advice on where an individual could purchase (even limited) access to CRSP data?
I'm working on a data-driven financial analysis blog and can't seem to find decent time-series fundamentals data now that Yahoo and Google have taken down their APIs. Everything I find seems to be a $1000+ yearly subscription.
Spoke to these guys a while back. Asked for examples of real alternative data they had... one interesting one was flight data for private jets, labeled by which company owned them. The theory being that if the CEO of company X keeps visiting a place near company Y, there may be an acquisition or merger in play.
I wonder if anyone did/could use it to buy real estate before HQ2 was announced. I don't know if the person in charge of finding the real estate was high enough to fly private.
Also I remember hearing about some Amazon employees buying real estate once the terms were being finalized...
Speaking from experience: it's not illegal insider trading unless you violate a confidentiality agreement or fiduciary duty.
I specifically say "illegal insider trading" because insider trading is not intrinsically illegal. The SEC distinguishes between insider trading and illegal insider trading (and by extension, so does the compliance department of every investment firm, bank and hedge fund). If, through your own research, you discover information which is both material and nonpublic, and you proceed to trade on that information, you are insider trading. However it is not illegal unless you have thereby broken an agreement or duty (namely with the company itself, its affiliates or your own clients) at any part of the process.
In practice this usually means the information is tainted if any of the following is true:
1) you have a fiduciary duty to the shareholders of the company in question,
2) you are employed by, or contract for, the company in question,
3) you are employed by, or contract for, an affiliate of the company (such as a vendor),
4) you violated the terms and conditions of service of the product through which you found the information.
Obviously the standard disclaimers apply: I am not a lawyer, don't take potential legal advice from a random HN commenter, etc.
Source: I used to work in financial forecasting using alternative data.
In America, this kind of research is explicitly encouraged, and very much NOT insider trading according to the SEC. Insider trading has to involve _theft_, not just insider knowledge.
If you overhear someone talking about an impending acquisition in a coffee shop, and you trade on that information, you're quite safe in the US. European countries can and do consider that insider trading, though.
Is Spying on Corporate Jets Insider Trading? https://www.cnbc.com/id/100272132
We're in the midst of a data gold rush. People who have data are struggling to monetize it. If you're a data buyer, you're probably swamped with the quantity and breadth of data providers out there. AI/ML techniques to make sense of this data are still only scratching the surface. I think this is where there is a lot of low-hanging fruit: creating services or tools that allow non-CS/non-Quant people to extract insights from TBs of data...
On the exchange side: these guys are always on the prowl for hot new properties to scoop up. The traditional business model of simply earning fees on exchange trading has been slowly eroding away for the last 10 years. So they need to branch out into services and other data plays...
Alternative take: there isn't that much low hanging fruit there.
Hear me out.
"To the person who only has a hammer, everything looks like a nail."
The data in front of you is the data you want to analyze, but it doesn't follow that that is the data you ought to analyze. I predict that most of the data you look at will result in nothing. The null hypothesis will not be rejected in the vast majority of cases.
I think we -- machine learning learners -- have a fantasy that the signal is lurking and if we just employ that one very clever technique it will emerge. Sure random forests failed, and neural nets failed and the SVR failed but if I reduce the step size, plug the output of the SVR into the net and change the kernel...
Let me give an example: suppose you want to analyze the movement of the stock market using the movement of the stars. Adding more information on the stars, and more techniques, may feel like you're making progress, but you aren't.
Conversely, even a simple piece of information that requires minimal analysis (this company's sales are way up and no one but you knows it) would be very useful in making that prediction.
The first data set is rich, but simply doesn't have the required signal. The second is simple, but has the required signal. The data that is widely available is unlikely to have unextracted signal left in it.
I've been selling good data in a particular industry for three years. In this industry at least, the so-called "low-hanging fruit" only seems low-hanging until you realize that the people who could benefit most from the data are the ones who are mentally lazy and least likely to adopt it. Data has the same problems as any other product, and may even be harder, because you need to 1) acquire the data and 2) build tools that reliably solve difficult problems using huge amounts of noisy information...
Isn't there utility in accepting the null hypothesis? It's almost as valuable to know that there is no signal in the data as to know that there is, i.e., knowing where not to look for information.
I think your example is really justifying a "machine learner" that has some domain expertise and doesn't blindly apply algorithms to some array of numbers.
Bingo. You nailed it. I work in finance. Developed markets have efficient stock markets. They are highly liquid. The reality is that there are a lot of people competing for the same profits. When there are that many players, if there's a profit to be had from a dataset you can buy from a vendor, chances are one of your many competitors already bought it and found it. This is why we now say don't try to beat the market; you likely can't, and mostly just need to get lucky by having the right holding when an unforeseen event occurs. Too many variables at play that we just don't understand. Most firms are buying these datasets to stay relevant, but they really make no difference in their actual investing strategies.
This is where you might use a genetic algorithm (or similar) to learn which data to use for a particular prediction. Good AI won't use all the data, just trim it down to the signal.
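As a toy illustration of that idea, here is a minimal sketch of genetic-algorithm feature selection. Everything in it is made up (the synthetic data, population size, mutation rate); the fitness function is just out-of-sample squared error of a least-squares fit on the selected columns:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 20
X = rng.normal(size=(500, n_features))
y = X[:, 0] - 2 * X[:, 3] + rng.normal(scale=0.5, size=500)  # only features 0 and 3 matter

def score(mask: np.ndarray) -> float:
    """Negative validation error of a least-squares fit on the selected columns."""
    if not mask.any():
        return -np.inf
    cols = X[:, mask]
    train, val = slice(0, 400), slice(400, 500)
    coef, *_ = np.linalg.lstsq(cols[train], y[train], rcond=None)
    resid = y[val] - cols[val] @ coef
    return -float(np.mean(resid ** 2))

pop = rng.random((30, n_features)) < 0.5            # population of random feature subsets
for generation in range(40):
    fitness = np.array([score(m) for m in pop])
    parents = pop[np.argsort(fitness)[-10:]]         # keep the 10 best masks
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, n_features)            # single-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child ^= rng.random(n_features) < 0.02       # occasional bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([score(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```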
For finance in particular, I'd say we're drowning in a massive volume of shitty data.
A client of mine purchases several fundamental feeds from Quandl, and I email them regularly to point out errors. Not weird, hard, tricky errors, but stuff like "why are all these volumes missing" or "there's a 1-day closing price increase of 1200%" or "you divided when you should have multiplied". This tells me neither Quandl nor the original provider (e.g. Zacks) do any serious data validation, despite claiming to.
If the companies people have been paying for decades for this data get it wrong this often, how can I trust any weirder data they're trying to sell me? I thought the point of buying these feeds was to let the seller worry about quality assurance.
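To illustrate how cheap the kind of checks being described are, here is a rough pandas sketch; the column names and thresholds are arbitrary, not anything a particular vendor uses:

```python
import pandas as pd

def sanity_report(df: pd.DataFrame) -> pd.DataFrame:
    """Flag obviously suspect rows in a long-format price frame (ticker, date, close, volume)."""
    issues = []
    for ticker, g in df.sort_values("date").groupby("ticker"):
        ret = g["close"].pct_change()
        issues.append(pd.DataFrame({
            "ticker": ticker,
            "date": g["date"],
            "missing_volume": g["volume"].isna(),
            "nonpositive_price": g["close"] <= 0,
            "extreme_move": ret.abs() > 5.0,   # e.g. a 500%+ one-day change
        }))
    report = pd.concat(issues, ignore_index=True)
    flags = ["missing_volume", "nonpositive_price", "extreme_move"]
    return report[report[flags].any(axis=1)]
```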
This doesn't matter; any sophisticated user will have their own software to clean the data anyway. Their concern is getting the data; they know how to clean it once they have it.
You are right, extracting insights from data is low-hanging fruit. From what I observe, there is a huge lack of proper services and tools that can automatically produce insights. There are of course automated machine learning solutions, but they focus more on machine learning model tuning (in the Kaggle style) rather than giving users understanding and awareness of their data.
I think data scientists need to produce more actionable insights as opposed to living in their own world. I suspect there will be a rising group of people who can understand data science techniques and communicate them effectively to drive business decisions. These people will be the ones who clinch the top posts.
Nasdaq already makes more money on data licensing than on trading fees or IPOs. Each time a professional in the financial services industry wants real-time display data, for example, they have to pay Nasdaq a monthly fee. Nasdaq and NYSE compete for listings not so much for the trading fees now, but because listings make their data licensing packages more valuable.
How do these services make it easier to evaluate data? The Medium article starts with a disclaimer about DLT... Talking with investors buying data, one shouldn't be surprised to hear them request uploads to their FTP. Their data teams are overcommitted when it comes to the evaluation side of consuming data. They aren't (yet) resourced like a tech startup.
How should they prioritize learning about ingesting data from a DLT? They have data brokers (like Quandl) coming to them with assurances of frictionless integration, with data they can understand and use, today!
There is ChainLink (https://chain.link/), which lets you sell your data via an API service through decentralized oracle nodes.
https://blog.goodaudience.com/the-four-biggest-use-cases-for...
Monetization is coming soon... in a big way.
Shameless plug: https://KloudTrader.com/narwhal
I'm calling it here: the most useful data is private, or can't be sold due to confidentiality. The fact that data is confidential is great evidence that we know it is useful, but also that we hope others won't use that signal.
I've been researching this topic, alternative data, for some time now, and I'm not surprised, since Nasdaq is a large provider of software (e.g. market-making software, amongst dozens of other products):
QUANDL SPECIFIC:
-Quandl has a pretty decent blog that I would check out; you never know what newly enacted corporate policy might get rid of it:
https://blog.quandl.com/
GENERAL NOTES:
-More and more asset managers are using it, and there is some worry that everyone is drawing the same conclusions from the same data set, and thus there is no money to be made. Though most practitioners say this is a non-issue: there are more and more alt. data sets out there to choose from, cleaning the data is tricky, and testing the veracity of the data provided and knowing how to combine it with other sets is a key competitive advantage that not every asset manager is good at.
-The ROI is something that is top of mind but not always easily attributable throughout the year, e.g. one large insight very late in the financial year can bring +100x returns on what was paid for a data provider's software.
-Hugely successful funds like Renaissance's Medallion have likely been doing this for a long, long time, coupled with top PhDs looking for a lot of statistical correlation with traditional data as well.
-More and more data sets are being created and thrown into self-learning financial models (aka AI), which has a lot of people excited, and certainly there are a lot of small funds being created, though it seems to be mostly by young people or not-so-great hedge fund managers. Getting large investors to lay down significant capital has a huge trust component to it, aka they want to bet only on successful, grey-haired, largely male folks.
-A lot of alternative data can be found directly from the Bloomberg terminal, e.g. the MAPS <Go> function. However, my understanding is that it's not that deep, quality is an issue, and everyone has access to it (no real competitive advantage).
That's probably fair. Quandl started by offering the "everyday" investor API access. I know the typical VC approach is to first get users and then scale, but often in investing/financial data products, it seems better to price high and then move down market. If you study the companies with the most success in the past (Bloomberg, CapIQ, MSCI, Eze, Advent, Factset, Morningstar, etc.), none of them started by trying to cater to the DIY investor.
> 'The company offers a global database of alternative, financial and public data, including information on capital markets, energy, shipping, healthcare, education, demography, economics and society.'
which doesn't really answer the question.
The way I think of alternative data is data about a business or industry that is obtained, collated, and analyzed through non-traditional communication channels, and helps to provide a better picture of how a company or industry is doing than just relying on trade data and financial statements. The best example of this I can think of is companies scraping AliBaba at certain frequencies, trying to ascertain the movement of certain products or raw materials. This data is then sold to investment firms and hedge funds, because they feel it gives them an edge.
One company that operates in this space is YipitData. From what I've been told, they started as something similar to GroupOn but then pivoted to this space after scraping for their own competitive intelligence reasons.
One example could be combining credit card data and location data to try and infer if bad weather affected same-store sales. Another use could be determining if a company was emailing loss-leading discount promos at the end of the quarter to juice its sales growth. Another could be collecting Tesla VINs to see if it is hitting its production targets. In the last case, Bloomberg has made this available for free:
https://www.bloomberg.com/graphics/2018-tesla-tracker/
Alternative data is non-financial data which can be tied to various securities.
Financial data, for example, would be EUR USD spot prices. Non-financial data (i.e. "alternative data") could be healthcare reports which you could theoretically couple to e.g. pharma stocks.
- Real-time weather data from major ports and across the main shipping lines
- Telemetry from crop and soil report systems
- Up-to-date satellite imagery of basically anything large under construction (solar farms, factories, ...)
Provide information like that in a machine-readable, consistent format and you have a business.
Btw... Using satellite images to track car manufacturers' inventory levels is an old idea, used for more than a decade.
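On the "machine-readable, consistent format" point above, here is a sketch of what one record of such a feed might look like; the schema, field names, units, and values are entirely invented, the point is just having a fixed schema at all:

```python
from dataclasses import dataclass, asdict
from datetime import datetime
import json

@dataclass
class PortWeatherObservation:
    port_code: str          # e.g. a UN/LOCODE
    observed_at: datetime   # always UTC
    wind_speed_ms: float    # metres per second, never km/h in some rows
    wave_height_m: float
    visibility_km: float
    source: str             # which sensor or provider produced the row

# Illustrative record; serialize datetimes explicitly so every row looks the same.
obs = PortWeatherObservation("NLRTM", datetime(2019, 1, 7, 6, 0), 12.4, 2.1, 8.0, "demo")
print(json.dumps({**asdict(obs), "observed_at": obs.observed_at.isoformat() + "Z"}))
```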