nnechm · 2 years ago
I have been working on getting ChatGPT to answer the kinds of questions that equity research analysts and investors would like to get from SEC filings. The application uses a combination of hybrid text search and LLMs for completion, and does not rely much on embedding-based distance searches.

A core assumption underlying this is that LLMs are already pretty good and will continue to get better at reading texts. If provided with the right thing to read, they will do very well on 'reading comprehension'.

Open-ended writing is more susceptible to errors, especially for questions related to finance. For example, Google's revenues are just as likely to be 280.2 billion as 279 billion to a probabilistic model that guesses the next part of the sentence "Google's revenues for FY 2022 are ...".

So this leaves us with the main problem to solve: serving the right texts to the LLM, a.k.a. text search.

Once the right text is served, we can generate pretty much anything in it on the fly: income statements, CEO comments, accounts payable. For example, try `can you get me Nvidia and AMD's income statement from March 2020?`, as here: https://imgur.com/gallery/H8Vfd5X A few more examples (Apple's sales in China, Google's revenue by quarter): https://imgur.com/a/oCCay3o

Currently, the application supports ~8k companies that are registered with the SEC. PDFs are still a work in progress, so Tesla etc. don't work as well.

The stack is Next.js on Supabase, so Postgres's built-in text search does a lot of the heavy lifting.
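As a rough sketch of how Postgres full-text search might serve filing text here, a parameterized ranking query could be built like this (the `filings` table and `body_tsv` tsvector column are my assumptions for illustration, not the app's actual schema):

```python
# Hypothetical sketch: turn a user question into a parameterized Postgres
# full-text search query. Table/column names are assumptions, not the
# app's actual schema.

def build_fts_query(question: str, limit: int = 5) -> tuple[str, tuple]:
    """Rank filing chunks against the question with ts_rank."""
    sql = (
        "SELECT id, ts_rank(body_tsv, q) AS rank "
        "FROM filings, websearch_to_tsquery('english', %s) AS q "
        "WHERE body_tsv @@ q "
        "ORDER BY rank DESC LIMIT %s"
    )
    return sql, (question, limit)

sql, params = build_fts_query("AMD revenue March 2022")
```

`websearch_to_tsquery` (Postgres 11+) accepts free-form input, which suits user-typed questions better than `to_tsquery`'s strict syntax.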

If one thinks of the bigger picture, we can extend/improve this to PDFs, the entire universe of stocks, and more. In other words, a big component of what CapitalIQ, FactSet, Bloomberg and Reuters do can now be generated on the fly, accurately, for a fraction of the cost.

Generating graphs of gross margin increasing etc. is just one step further, and stuff like EV/EBITDA yet another step further, as one can call a stock pricing API for each date of the report.

I would guess a number of LLM applications follow a similar process: ask a question --> LLM converts it to a query --> datalakes/databases --> searching and serving texts --> answer. It goes without saying that I would appreciate any feedback, especially from those who are building stuff that looks architecturally similar :) !
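That process can be sketched end to end; everything below is a stand-in (the real system would call an LLM and a text-search backend at the marked steps):

```python
# Minimal stand-in for the pipeline: question -> query -> search -> texts
# -> answer. Every step here is a stub for illustration only.

def question_to_query(question: str) -> dict:
    # Real system: an LLM extracts structured fields from the question.
    return {"company": "AMD", "item": "revenue", "period": "March 2022"}

def search_texts(query: dict) -> list[str]:
    # Real system: hybrid text search over the filings database.
    return [f"[{query['company']} 10-Q] {query['item']} for {query['period']}: ..."]

def answer(question: str) -> str:
    texts = search_texts(question_to_query(question))
    # Real system: retrieved texts go back to the LLM for
    # 'reading comprehension'; here we just return the top passage.
    return texts[0]

print(answer("get me AMD's revenue for March 2022"))
```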

anotherhue · 2 years ago
I've been trying something similar with parliamentary debates. They're long winded, often full of empty speech, and a chore to read.

The LLMs are able to home in on the details and provide interesting responses to prompts like "What questions were asked of the minister that they failed to address?" and "What should the opposition leader have mentioned in their response that the minister would have found difficult to answer?"

Crucially, they're well transcribed: https://api.oireachtas.ie/

nnechm · 2 years ago
I think one thing you can try is to figure out what lies where; chunking arbitrarily will not work as well as chunking by headings, for example.
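As an illustration, heading-aware chunking might look like this (the ALL-CAPS heading heuristic is just an assumption for the sketch; real transcripts would need their own rules):

```python
# Split text at ALL-CAPS heading lines and prefix each chunk with its
# heading, so a retrieved piece still tells the LLM what it is reading.

def chunk_by_headings(text: str) -> list[str]:
    chunks: list[str] = []
    heading, buf = "PREAMBLE", []

    def flush():
        if buf:
            chunks.append(f"[{heading}] " + " ".join(buf))
        buf.clear()

    for line in text.splitlines():
        stripped = line.strip()
        # Heuristic: a non-empty all-caps line with letters is a heading.
        if stripped and stripped == stripped.upper() and any(c.isalpha() for c in stripped):
            flush()
            heading = stripped
        elif stripped:
            buf.append(stripped)
    flush()
    return chunks

sections = chunk_by_headings("RISK FACTORS\nCompetition may hurt margins.")
print(sections)  # ['[RISK FACTORS] Competition may hurt margins.']
```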

For example, if the question is: "What should the opposition leader have mentioned in their response that the minister would have found difficult to answer?"

An embedding-based search will find it fairly difficult to match this against a text. Based on my experience, you have to figure out what a non-answer is first (I don't think that's easy, but GPT-4 is very good at a lot of language-based stuff).

You can try: given Q1 (the question) and A1 (the answer), prompt GPT with "do you think A1 answers Q1?" and save the result. Then, for follow-ups: given Q1, A1 and Q2 (a follow-up), ask "based on the following, do you think A1 answers Q1?" and save that to a DB as well.

You can then augment it in code with your own knowledge of how politicians lie, using certain words etc. :) to improve on what GPT-4 might miss...

dcsan · 2 years ago
Could you ask an LLM if it was persuaded by a speaker on one side of a debate, as a method of evaluation? I.e. the bot's before-and-after opinion based on fine-tuning with the pro and con arguments?

I was also thinking about a society of bots type application where you could have autonomous bot researchers, debaters, judges and audience. Would be interesting to feed in the topics and grab some popcorn

hef19898 · 2 years ago
I once had a colleague who used ChatGPT to summarize our then employer's SEC filings. Results were, well, putting it mildly, mixed. Best case was a slightly less biased version of the "shareholder letters" (read: propaganda pieces) published around the same time.

What ChatGPT completely missed was stuff like omissions (I'll kind of give it a pass here, as how can software analyse the absence of something without access to supplemental documents; it still shows how dangerous it is to rely only on an LLM for such analysis) and, more importantly, the connection between certain tidbits. And, here it became outright dangerous, ChatGPT didn't provide anything meaningful on risks and financials.

The tidbit it missed, one of the most important ones at the time, was a huge multi-year contract given to a large investor in said company. To find it, including the honestly hilarious amount, one had to connect the disclosure of an unspecified contract to a named investor, the specifics of said contract (which did not mention the investor by name), the amount stated in some financial statement in the document and, here obviously ChatGPT failed completely, knowledge of what said investor (a pretty (in)famous company) specialized in. ChatGPT didn't even mention a single one of those data points. Fun fact: said contract covered a significant chunk of the amount the investing company had invested to begin with. And all that during a time in which the financial stability of the reporting company was at least questionable. Oh, and ChatGPT didn't even flag that risk (cash and equivalents on hand divided by burn rate per year is simple maths), or repeat the exact passage in which the SEC filing said that the survival of the reporting company was in doubt.

In short, without some serious prompt work, and without including additional data sources, I think ChatGPT is utterly useless for analyzing SEC filings; even worse, it can be outright misleading. Not that SEC filings are incredibly hard to read: some basic financial knowledge and someone pointing out the highlights, based on a basic understanding of how those filings actually work or are supposed to work, and you are there.

nnechm · 2 years ago
I do not really imagine this as something that does all the investment work and makes a decision.

Instead, the mental model is that you have an army of people who can read texts really well, as in 'reading comprehension' as they call it in English tests... This army can get you information on the fly.

Investment research involves a lot of back and forth reading and fetching tables and making conclusions, which in turn might not have much to do with stock price performance, but there's a whole industry of financial information and news for that :)

So currently, the scope is to make widely available information beyond what FactSet and CapIQ offer and even that's a long way away :)

thisisit · 2 years ago
I think the challenge with using ChatGPT to summarize or read factual data lies in the probabilistic nature of LLM outputs, so your experience is what you should expect from LLMs. Though my understanding of the OP's answer is that instead of using OpenAI to read documents directly, they use OpenAI to generate queries with which to read the document instead.
pbhjpbhj · 2 years ago
It seems like a system incorporating an LLM would be good at parsing a document to match all investors mentioned with the amounts invested and spot any major differences. That a generalized tool can't do that out of the gate doesn't seem surprising.
blawson · 2 years ago
I tried this as a little hobby weekend project but found that after a while it would start hallucinating answers even if it had previously gotten them right. It didn't even take that long sometimes: I'd ask a question about revenue, then liabilities, then ask it to sum some revenue numbers, and they would just start to be wrong.

I wouldn't yet feel comfortable with this without some automated reconciliation, which to my mind defeated the point of my hobby project, but I'm curious if you've seen different? No doubt you'd expect this to improve over time though.

nnechm · 2 years ago
You can try it out for yourself... :) Here's an example that asks for AMD's cash and makes an arbitrary calculation on total liabilities; the AI is smart enough to sum up everything up to equity and gets the numbers right, without any hallucination.

https://imgur.com/a/oAUZiIB

This is the source: https://www.sec.gov/Archives/edgar/data/2488/000000248823000...

The sum of all of these is 12831.

Total current liabilities: 7,572
Long-term debt, net of current portion: 1,714
Long-term operating lease liabilities: 393
Deferred tax liabilities: 1,365
Other long-term liabilities: 1,787
Total: 12,831
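The line items can be sanity-checked with trivial arithmetic (figures as stated, presumably in $ millions):

```python
# Sum the liability line items quoted above; the total should match the
# 12831 figure the model produced.
liabilities = {
    "Total current liabilities": 7572,
    "Long-term debt, net of current portion": 1714,
    "Long-term operating lease liabilities": 393,
    "Deferred tax liabilities": 1365,
    "Other long-term liabilities": 1787,
}
total = sum(liabilities.values())
print(total)  # 12831
```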

drited · 2 years ago
One of the advantages of data from CapIQ/Refinitiv is that you're not just pulling data from a single report; rather, data has been curated across time from multiple historical reports, so that historical income statements, balance sheets, footnote data etc. spanning many years can be generated.

When you say that generating graphs of gross margin and EV/EBITDA is just one step further, are you talking about generating those based on just a single report's information, or are you combining information from multiple years to, for example, show gross margin trends over 10 years and EV/TTM EBITDA?

nnechm · 2 years ago
I am talking about comparing multiple reports, i.e. gross margin trends over 10 years, EV/TTM EBITDA etc. across several reports. Currently only financials are possible, but the ratios depend on stock prices, so we are working on that!

You can think of it like this, you now have an army of readers that can go through tables really quickly.

FactSet, CapIQ etc. use a combination of automation and manual entry to fit these tables into a homogenized schema so that they can be saved, compared etc. So if you want to get Apple's Greater China sales from 2020, you would be lucky if they decided to create an item for that. https://imgur.com/a/bp2hb7n

Here are two examples, AMD's revenues and AMD's revenue outlook, using beatandraise.com: https://imgur.com/a/61jqiUk I doubt you can get AMD's own outlook on CapIQ, for example.

Context sizes mean getting 100s of reports in one call is not possible, but multiple iterations will still do the trick. So in effect, you can actually create a dataset like FactSet's at a much lower cost, more comprehensive, and customizable to what the user wants, if you see my point... :)

ZephyrBlu · 2 years ago
Looks cool and seems like a valuable tool! I really like the idea of LLMs that give you rich answers like this.

I'm curious how you're accurately extracting the data though. Are you prompting to respond in a JSON format, using OpenAI's functions or something else? How do you ensure you have the correct label, dates, values, etc?

nnechm · 2 years ago
I am using OpenAI's functions, as that is a more reliable way of extracting JSON, but even that can fail sometimes when the response misses a quote or a `}`.
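A defensive parse for that failure mode might look like this (the repair heuristic is illustrative, not the app's actual code):

```python
import json

# Function-call arguments sometimes come back as JSON missing its closing
# brace. Try a straight parse first, then a simple repair.

def parse_arguments(raw: str) -> dict:
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        repaired = raw.strip()
        if not repaired.endswith("}"):
            repaired += "}"  # common truncation failure
        return json.loads(repaired)  # may still raise; caller should retry

print(parse_arguments('{"company": "AMD", "period": "2022-03"'))
# {'company': 'AMD', 'period': '2022-03'}
```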
lifeisstillgood · 2 years ago
So (if I get it right) you are using the LLM to convert the human-language question into (presumably) code that is the basis for running "normal" searches and returning text.

Avoiding any fears of hallucinations.

But earlier you say "LLMs will get better at comprehension". So are you using an LLM to mark up the original text in some way?

nnechm · 2 years ago
Yes, that's right, I avoid hallucinations that way. In addition, I mark up existing texts so that the LLM knows what it is reading even if it is a piece of a larger text. So, for example, if you need to get Apple's Greater China sales for each quarter in 2020... https://imgur.com/a/oCCay3o

I am not making 4 calls with the entire text; I instead get the pieces of each report that best match the question. There are additional challenges. For instance, GPT-4 struggles with the difference between the words 'guidance' and 'outlook', which mean the same thing, but somehow they don't for GPT-4.

When I say they become better readers, I meant it in a general sense, as in better in the case above. You basically have someone who can read through tables really well, and that can change investment research fundamentally, as it involves a lot of reading tables and graphs :)

nnechm · 2 years ago
A few things on prompting: 1. "Get me google cloud revenues" fails, because somehow GPT-4 thinks I am talking about an entity called Google Cloud and not Google :) 2. To fix it, you can instead ask for "Get me google's cloud revenues" or "get me google cloud revenues from google's results"...

As you can see, in spite of all the training on real-world data, GPT-4 thinks Google Cloud is more of an entity than Google is, based on that question :)

Terretta · 2 years ago
This product would be a fantastic honeypot for front running researcher interest in particular investments.

I don't see anything in your privacy policy that gives me comfort my (even anonymized) interest in a particular firm isn't feeding your own signals.

Even if your policy promised it, I'd want to see technical controls implemented, since incentive to leverage information on searching would be so high.

nnechm · 2 years ago
Hi, thanks for bringing that concern up. I shall keep it in mind and change things based on feedback from customers if it is an issue. Typically, people search for a company after a price move rather than before, and these are searches on publicly available data, i.e. data that is not proprietary and is already filed with the SEC. Needless to say, none of the chat traffic is used or will ever be used to feed any trading signals for anyone.
paulkon · 2 years ago
Could you share more about the prompt you use to get GPT-4 to return a query from the user's question? This is fed into postgres full text search?
nnechm · 2 years ago
I am using OpenAI's latest functions API, so you can get it to return arguments that ensure you get JSON; it works pretty well most of the time. The JSON is then used to fetch a report from a database.
cerved · 2 years ago
Neat idea!

How do you parse the PDFs?

petarb · 2 years ago
Apache Tika has worked well for me in the past; I ended up running it on an AWS Lambda.

https://tika.apache.org/

nnechm · 2 years ago
PDF parsing was more tedious than I would have liked at this stage, so I stuck to the SEC, which requires that companies file in a text format :) so that helped.

I used poppler on a DigitalOcean droplet, but the sheer variety of company PDFs, especially from European companies (some of which have to be OCRed), meant results were not really uniform. GPT still does very well, but not as well as on text documents directly. So in short, this is next on the list...
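For what it's worth, a minimal way to drive poppler from Python (assuming `pdftotext` from poppler-utils is on the PATH; the actual run is commented out):

```python
import subprocess

# Build a pdftotext invocation; -layout preserves column/table structure,
# which matters for financial statements.

def pdftotext_cmd(pdf_path: str, txt_path: str) -> list[str]:
    return ["pdftotext", "-layout", pdf_path, txt_path]

# subprocess.run(pdftotext_cmd("report.pdf", "report.txt"), check=True)
```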

yawnxyz · 2 years ago
I have a tool that applies LLMs to abstracts and research papers. @opendocsg/pdf2md on Node/SvelteKit has been really good for me.
jatinshah · 2 years ago
The main use case highlighted here seems to be retrieving quantitative data from financial reports and then doing sophisticated analysis (trendlines, forecasting etc) using ChatGPT (basically what’s currently done in Excel).

I doubt any finance professional would want to move from Excel to ChatGPT for these use cases.

A secondary issue is that numbers in financial reports need to be standardized before they can be used for analysis - say comparing ROE of a stock with industry average. That’s not possible with numbers from raw reports.

nnechm · 2 years ago
I don't think Excel can be defeated :) I certainly do not expect finance professionals to switch to a chat interface.

When you log into CapitalIQ or FactSet or look at a Bloomberg screen, you access data; you can do the same here. Excel plugins go on top, which can then get this data into Excel to build models. The API that powers this app can also send data into Excel, for instance.

Data can already be copied directly from the chat box into Excel, maintaining the table format, for example.

Regarding standardization, I think data was standardized not to enable comparison but simply to fit into the same schema for every industry. Most companies within the same industry report the same way; REITs do not report like SaaS, but the existing datasets put it all into one set. Raw data from the source is always better, as you can convert it to standardized form, but you can't go back...

jatinshah · 2 years ago
If this product is not meant for finance professionals then who is the target customer?

Retrieving numbers via ChatGPT and then feeding it into Excel via APIs seems like a very odd thing to do. Especially when most of these numbers are already available on Yahoo finance for individual investors and CapitalIQ for professionals.

alxdrlitreev · 2 years ago
We've recently made COFIN AI, an AI based on ChatGPT that reads hundreds of pages of SEC filings such as 10-Ks and 10-Qs and checks investor calls to assist you with investment research and strategy evaluation. Pretty much similar (almost the same) to what you did.

Check this out: https://cofinapp.com Let me know what you think!

nnechm · 2 years ago
Thanks for letting me know. I guess you use a different strategy of querying by company and by document; I see this when I try to get Apple's rest-of-Asia-Pacific revenues, for example.

I guess it really depends on how your targeted user would use it... :)

https://imgur.com/a/uEgLUpz

i_like_pie1 · 2 years ago
Asked 3 questions:

1st question: got the ticker wrong; asked about company A, gave info on B.
2nd question: please give info on A; it says sure.
3rd question: please give info on A; pops a modal asking to pay.

andreev_io · 2 years ago
Shoot me a message at ilya@andlabs.co.uk. Happy to give you a free subscription.

We also just dropped our price to $1.99 per month from the previous $49.99.

Enjoy.

nnechm · 2 years ago
I am assuming this is on Cofin and not on beatandraise.com; if so, please let me know and I shall fix it. You have 20 chats etc.
hbcondo714 · 2 years ago
Great to see this and the other SEC Filing LLMs in the comments here. I run a freemium SEC Filings website at https://Last10K.com to help investors read 10K/Qs more efficiently. Earlier this year, we partnered with https://edgar-gpt.ai for summarizing 10K/Qs. The summaries[1] when clicked on, would then scroll to and display the actual verbiage in the SEC Filing.

This feature stayed on the site for a couple months but ultimately we mutually decided to remove it since the traction didn't meet the costs. I'm open to revisiting this topic all in the name of making verbose financial disclosures easier to understand for all of us.

[1] https://last10k.com/chatgpt.png

epups · 2 years ago
What about hallucinations though? If the LLM missed just one number, which at this point is very likely, you would get totally distorted information.

Sorry if I missed something, but skimming through the site I didn't find any information about this specific issue.

nnechm · 2 years ago
As the answers come from the text, we can avoid hallucinations for the most part. I have not experienced made-up numbers; instead, errors are typically misplaced numbers, where it gives you revenues for the last 6 months instead of the last 3 months when they are both next to each other in a table, or the same numbers for different quarters, and so on...

From my experience, GPT-4 has been very good at following instructions and doesn't make up numbers, which Bard is much more susceptible to. Bard relies heavily on snippets from the web search and completes the rest...

I am pretty sure if Google wanted to train it, it would get all the answers right, but the way it's designed, it gets one or two numbers and makes the rest up rather hilariously :) and even adds a breakdown which is also made up...

You can take a look here : https://imgur.com/a/vDxOV9D

refurb · 2 years ago
I'm pretty sure a primitive version of this has been done for a long time by hedge funds.

You basically monitor the SEC website for when filings are made public, quickly parse the document in some automated way and make trades based on the output.

It wouldn't surprise me if some hedge fund has been trying to build something with LLMs for the past few years (and possibly already deployed it).

xcxcx · 2 years ago
This was done by hand by people like Footnoted. They funneled SEC filings through a key red-flag word filter into a shared Gmail account, with special attention to filings made on Friday night or ahead of holidays.

I think using LLMs to ask a company's filings about its figures is missing the point. After all, XBRL already exists, and that data is widely accessible anyway. Where LLMs would add more value is analyzing changes in different sections of a company's filings, such as Risk Factors, MD&A, or governance filings.

nnechm · 2 years ago
Yes, I think the broader point is you have these readers available on call via api, so you can do all that you suggested and more...
nnechm · 2 years ago
You are probably right, but there has been nothing like GPT-4 before. Having seen some earlier versions, I would wager that their entire application can probably be redone for a fraction of the cost, and it would work better... :)
infecto · 2 years ago
These already exist and are not primitive. Groups have been applying NLP and analysis against text for longer than you would think.
nnechm · 2 years ago
Yes, they are definitely not primitive, but consider how the breakthrough in LLMs happened: MSFT and GOOGL were working on it, yet even as late as 2021 Google's BERT was not really there (at least the version they showed the public in the Google Research blog). The sudden jump from OK to great thanks to OpenAI would probably mean a lot of existing systems also need to make that jump.
sroussey · 2 years ago
I am working on something similar. I have all the 10-k and 8-k docs.

I’ve pulled out the structured data separately, and now looking at breaking up the text into paragraphs to get embeddings.

Why are you using inverted index style text search instead of embeddings? I can see doing both perhaps…

nnechm · 2 years ago
Embeddings did not really work well enough for me. Say you are looking for numerical text, e.g. "get me AMD's revenue from March 2022". The embedding representation needs to understand that "March 2022" together is far more important than "March" alone; I often ended up with March 2021 or March 2018 ranked closer because the text contained multiple occurrences of "revenue" or multiple occurrences of "March"... Perhaps I could have improved it, but that did not seem like the right path to go down for accuracy. This was far worse when, for example, I was looking for ECB statements: they can refer to an older date in the current report, and it caused all sorts of trouble :). An initial fix was to basically mention "March" several times so that the search returns it... :)
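One illustrative fix for the date problem is to boost exact month-year matches on top of whatever base relevance score the search returns (the base scores and the boost weight below are made up):

```python
import re

# Match full "Month YYYY" pairs; the group is non-capturing so findall
# returns the whole match, e.g. "March 2022".
DATE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{4}\b"
)

def rerank(query: str, candidates: list[tuple[str, float]]) -> list[str]:
    """candidates are (text, base_score); exact date matches earn a bonus."""
    wanted = set(DATE.findall(query))

    def score(item: tuple[str, float]) -> float:
        text, base = item
        return base + sum(1.0 for d in wanted if d in text)

    return [text for text, _ in sorted(candidates, key=score, reverse=True)]

ranked = rerank(
    "get me AMD's revenue from March 2022",
    [("Revenue for March 2021 was ...", 0.9),
     ("Revenue for March 2022 was ...", 0.8)],
)
print(ranked[0])  # the March 2022 passage wins despite the lower base score
```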
manuelabeledo · 2 years ago
I may be missing something, but looking at the examples on the website, isn't this just a fancy worded search?

What kind of data can be extracted from these filings, that is not available using "traditional" methods?

nnechm · 2 years ago
That's a good question. For example, you can get Microsoft's revenues by product category or Apple's revenues in Greater China: https://imgur.com/a/LdQkt7j

You can also do this across time with constraints on context limits ...

What is a traditional method? Would you search the string, or would you find the named entity (using NLP) and look for the entity? Or do you mean a Ctrl+F in the document?

I guess the premise of LLMs is that we have an intelligence that can read and write, so it can do this in an automated way in different applications. In this case, I am trying to automate and speed up the investment research process; along the way, we can create our own dataset of financial data that can be generated on the fly as necessary.

Also, how would you extract an income statement from the text (also in the same image) using a traditional method? I personally find it magical that it can do that; it knows where the statement ends and so on...

halflings · 2 years ago
Yes, this looks like a (hopefully) better information retrieval system than running CTRL+F over a PDF.

Nothing wrong with that :) information retrieval is hard. And Ctrl+F won't render a table for queries like "what is the % increase of revenue over the last 3 quarters".

nnechm · 2 years ago
Yes, information retrieval is hard :) A lot of people ask for 'can you get Apple's revenues by product category for all of 2020'. How do you get the smallest piece of text that has the most information about product category and revenues? I can get a lot of text, but that would mean I probably run over the context limit, and so on :)