nnechm · 2 years ago
I have been working on getting ChatGPT to answer the kinds of questions that equity research analysts and investors would like to get from SEC filings. The application uses a combination of hybrid text search and LLMs for completion, and does not rely much on embedding-based distance searches.

A core assumption underlying this is that LLMs are already pretty good and will continue to get better at reading texts. If provided with the right thing to read, they will do very well on 'reading comprehension'.

Open-ended writing is more susceptible to errors, especially for questions related to finance. For example, Google's revenues are just as likely to be 280.2 billion as 279 billion to a probabilistic model that guesses the next part of the sentence "Google's revenues for FY 2022 are ...".

So this leaves us with the main problem to solve: serving the right texts to the LLM, a.k.a. text search.

Once the right text is served, we can generate pretty much anything in it on the fly: income statements, CEO comments, accounts payable. For example, try `can you get me Nvidia and AMD's income statement from March 2020?`, as here: https://imgur.com/gallery/H8Vfd5X A few more examples (Apple's sales in China, Google's revenue by quarter): https://imgur.com/a/oCCay3o

Currently, the application supports ~8k companies that are registered with the SEC. PDFs are still a work in progress, so Tesla etc. don't work as well.

The stack is Next.js on Supabase, so Postgres's built-in text search does a lot of the heavy lifting.
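As a rough sketch of how Postgres full-text search might serve filing text here, a parameterized ranking query could be built like this (the `filings` table and `body_tsv` tsvector column are my assumptions for illustration, not the app's actual schema):

```python
# Hypothetical sketch: turn a user question into a parameterized Postgres
# full-text search query. Table/column names are assumptions, not the
# app's actual schema.

def build_fts_query(question: str, limit: int = 5) -> tuple[str, tuple]:
    """Rank filing chunks against the question with ts_rank."""
    sql = (
        "SELECT id, ts_rank(body_tsv, q) AS rank "
        "FROM filings, websearch_to_tsquery('english', %s) AS q "
        "WHERE body_tsv @@ q "
        "ORDER BY rank DESC LIMIT %s"
    )
    return sql, (question, limit)

sql, params = build_fts_query("AMD revenue March 2022")
```

`websearch_to_tsquery` (Postgres 11+) accepts free-form input, which suits user-typed questions better than `to_tsquery`'s strict syntax.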

If one thinks of the bigger picture, we can extend/improve this to PDFs, the entire universe of stocks, and more. In other words, a big component of what CapitalIQ, FactSet, Bloomberg and Reuters do can now be generated on the fly, accurately, for a fraction of the cost.

Generating graphs of gross margin increasing etc. is just one step further, and stuff like EV/EBITDA yet another step further, as one can call a stock pricing API for each date of the report.

I would guess a number of LLM applications follow a similar process: ask a question --> LLM converts it to a query --> datalakes/databases --> searching and serving texts --> answer. It goes without saying that I would appreciate any feedback, especially from those who are building stuff that looks architecturally similar :) !
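That process can be sketched end to end; everything below is a stand-in (the real system would call an LLM and a text-search backend at the marked steps):

```python
# Minimal stand-in for the pipeline: question -> query -> search -> texts
# -> answer. Every step here is a stub for illustration only.

def question_to_query(question: str) -> dict:
    # Real system: an LLM extracts structured fields from the question.
    return {"company": "AMD", "item": "revenue", "period": "March 2022"}

def search_texts(query: dict) -> list[str]:
    # Real system: hybrid text search over the filings database.
    return [f"[{query['company']} 10-Q] {query['item']} for {query['period']}: ..."]

def answer(question: str) -> str:
    texts = search_texts(question_to_query(question))
    # Real system: retrieved texts go back to the LLM for
    # 'reading comprehension'; here we just return the top passage.
    return texts[0]

print(answer("get me AMD's revenue for March 2022"))
```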

anotherhue · 2 years ago
I've been trying something similar with parliamentary debates. They're long winded, often full of empty speech, and a chore to read.

The LLMs are able to home in on the details and provide interesting responses to prompts like "What questions were asked of the minister that they failed to address?" and "What should the opposition leader have mentioned in their response that the minister would have found difficult to answer?"

Crucially, they're well transcribed: https://api.oireachtas.ie/

nnechm · 2 years ago
I think one thing you can try is to figure out what lies where; chunking arbitrarily will not work as well as chunking by headings, for example.
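As an illustration, heading-aware chunking might look like this (the ALL-CAPS heading heuristic is just an assumption for the sketch; real transcripts would need their own rules):

```python
# Split text at ALL-CAPS heading lines and prefix each chunk with its
# heading, so a retrieved piece still tells the LLM what it is reading.

def chunk_by_headings(text: str) -> list[str]:
    chunks: list[str] = []
    heading, buf = "PREAMBLE", []

    def flush():
        if buf:
            chunks.append(f"[{heading}] " + " ".join(buf))
        buf.clear()

    for line in text.splitlines():
        stripped = line.strip()
        # Heuristic: a non-empty all-caps line with letters is a heading.
        if stripped and stripped == stripped.upper() and any(c.isalpha() for c in stripped):
            flush()
            heading = stripped
        elif stripped:
            buf.append(stripped)
    flush()
    return chunks

sections = chunk_by_headings("RISK FACTORS\nCompetition may hurt margins.")
print(sections)  # ['[RISK FACTORS] Competition may hurt margins.']
```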

For example, if the question is: "What should the opposition leader have mentioned in their response that the minister would have found difficult to answer?"

An embedding-based search will find it fairly difficult to match this against a text. Based on my experience, you have to figure out what a non-answer is first (I don't think that's easy, but GPT-4 is very good at a lot of language-based stuff).

You can try: given Q1 (the question) and A1 (the answer), prompt GPT with "do you think A1 answers Q1?" and save the result. Then, for follow-ups: given Q1, A1 and Q2 (a follow-up), ask "based on the following, do you think A1 answers Q1?" and save that to a DB as well.

You can then augment it in code with your own knowledge of how politicians lie, using certain words etc. :) to improve on what GPT-4 might miss...

dcsan · 2 years ago
Could you ask an LLM if it was persuaded by a speaker on one side of a debate, as a method of evaluation? I.e. the bot's before-and-after opinion based on fine-tuning with the pro and con arguments?

I was also thinking about a society of bots type application where you could have autonomous bot researchers, debaters, judges and audience. Would be interesting to feed in the topics and grab some popcorn

hef19898 · 2 years ago
I once had a colleague who used ChatGPT to summarize our then employer's SEC filings. Results were, well, putting it mildly, mixed. Best case was a slightly less biased version of the "shareholder letters" (read: propaganda pieces) published around the same time.

What ChatGPT completely missed was stuff like omissions (I'll kind of give it a pass here, as how can software analyse the absence of something without access to supplemental documents; it still shows how dangerous it is to rely only on an LLM for such analysis) and, more importantly, the connection between certain tidbits. And, here it became outright dangerous, ChatGPT didn't provide anything meaningful on risks and financials.

The tidbit it missed, one of the most important ones at the time, was a huge multi-year contract given to a large investor in said company. To find it, including the honestly hilarious amount, one had to connect the disclosure of an unspecified contract to a named investor, the specifics of said contract (which did not mention the investor by name), the amount stated in some financial statement in the document and, here obviously ChatGPT failed completely, knowledge of what said investor (a pretty (in)famous company) specialized in. ChatGPT didn't even mention a single one of those data points. Fun fact: said contract covered a significant chunk of the amount the investing company had invested to begin with. And all that during a time in which the financial stability of the reporting company was at least questionable. Oh, and ChatGPT didn't even flag that risk (cash and equivalents on hand divided by burn rate per year is simple maths), or repeat the exact passage in which the SEC filing said that the survival of the reporting company was in doubt.

In short, without some serious prompt work, and without including additional data sources, I think ChatGPT is utterly useless for analyzing SEC filings; even worse, it can be outright misleading. Not that SEC filings are incredibly hard to read: some basic financial knowledge and someone pointing out the highlights, based on a basic understanding of how those filings actually work or are supposed to work, and you are there.

nnechm · 2 years ago
I do not really imagine this as something that does all the investment work and makes a decision.

Instead, the mental model is that you have an army of people who can read texts really well, as in 'reading comprehension' as they call it in English tests... This army can get you information on the fly.

Investment research involves a lot of back and forth reading and fetching tables and making conclusions, which in turn might not have much to do with stock price performance, but there's a whole industry of financial information and news for that :)

So currently, the scope is to make widely available information beyond what FactSet and CapIQ offer and even that's a long way away :)

thisisit · 2 years ago
I think the challenge with using ChatGPT to summarize or read factual data lies in the probabilistic nature of LLM outputs, so your experience is what you should expect from LLMs. Though my understanding of the OP's answer is that instead of using OpenAI to read documents directly, they use OpenAI to generate queries with which to read the document instead.
pbhjpbhj · 2 years ago
It seems like a system incorporating an LLM would be good at parsing a document to match all investors mentioned with the amounts invested and spot any major differences. That a generalized tool can't do that out of the gate doesn't seem surprising.
blawson · 2 years ago
I tried this as a little hobby weekend project but found that after a while it would start hallucinating answers even if it had previously gotten them right. It didn't even take that long sometimes: I'd ask a question about revenue, then liabilities, then ask it to sum some revenue numbers, and they would just start to be wrong.

I wouldn't yet feel comfortable with this without some automated reconciliation, which to my mind defeated the point of my hobby project, but I'm curious if you've seen different? No doubt you'd expect this to improve over time though.

nnechm · 2 years ago
You can try it out for yourself... :) Here's an example that asks for AMD's cash and makes an arbitrary calculation on total liabilities; the AI is smart enough to sum up everything up to equity and gets the numbers right, without any hallucination.

https://imgur.com/a/oAUZiIB

This is the source: https://www.sec.gov/Archives/edgar/data/2488/000000248823000...

The sum of all of these is 12831.

Total current liabilities: 7,572
Long-term debt, net of current portion: 1,714
Long-term operating lease liabilities: 393
Deferred tax liabilities: 1,365
Other long-term liabilities: 1,787
Total: 12,831
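The line items can be sanity-checked with trivial arithmetic (figures as stated, presumably in $ millions):

```python
# Sum the liability line items quoted above; the total should match the
# 12831 figure the model produced.
liabilities = {
    "Total current liabilities": 7572,
    "Long-term debt, net of current portion": 1714,
    "Long-term operating lease liabilities": 393,
    "Deferred tax liabilities": 1365,
    "Other long-term liabilities": 1787,
}
total = sum(liabilities.values())
print(total)  # 12831
```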

drited · 2 years ago
One of the advantages of data from CapIQ/Refinitiv is that you're not just pulling data from a single report; rather, data has been curated across time from multiple historical reports, so that historical income statements, balance sheets, footnote data etc. spanning many years can be generated.

When you say that generating graphs of gross margin and EV/EBITDA is just one step further, are you talking about generating those based on just a single report's information, or are you combining information from multiple years to, for example, show gross margin trends over 10 years and EV/TTM EBITDA?

nnechm · 2 years ago
I am talking about comparing multiple reports, i.e. gross margin trends over 10 years, EV/TTM EBITDA etc. across several reports. Currently only financials are possible, but the ratios depend on stock prices, so we are working on that!

You can think of it like this, you now have an army of readers that can go through tables really quickly.

FactSet, CapIQ etc. use a combination of automation and manual entry to fit these tables into a homogenized schema so that they can be saved, compared etc. So if you want to get Apple's Greater China sales from 2020, you would be lucky if they decided to create an item for that. https://imgur.com/a/bp2hb7n

Here are two examples, AMD's revenues and AMD's revenue outlook, using beatandraise.com: https://imgur.com/a/61jqiUk I doubt you can get AMD's own outlook on CapIQ, for example.

Context sizes mean getting 100s of reports in one call is not possible, but multiple iterations will still do the trick. So in effect, you can actually create a dataset like FactSet's at a much lower cost, more comprehensive, and customizable to what the user wants, if you see my point... :)

ZephyrBlu · 2 years ago
Looks cool and seems like a valuable tool! I really like the idea of LLMs that give you rich answers like this.

I'm curious how you're accurately extracting the data though. Are you prompting to respond in a JSON format, using OpenAI's functions or something else? How do you ensure you have the correct label, dates, values, etc?

nnechm · 2 years ago
I am using OpenAI's functions, as that is a more reliable way of extracting JSON, but even that can fail sometimes when the response misses a quote or a `}`.
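A defensive parse for that failure mode might look like this (the repair heuristic is illustrative, not the app's actual code):

```python
import json

# Function-call arguments sometimes come back as JSON missing its closing
# brace. Try a straight parse first, then a simple repair.

def parse_arguments(raw: str) -> dict:
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        repaired = raw.strip()
        if not repaired.endswith("}"):
            repaired += "}"  # common truncation failure
        return json.loads(repaired)  # may still raise; caller should retry

print(parse_arguments('{"company": "AMD", "period": "2022-03"'))
# {'company': 'AMD', 'period': '2022-03'}
```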
lifeisstillgood · 2 years ago
So (if I get it right) you are using the LLM to convert the human-language question into (presumably) code that is the basis for running "normal" searches and returning text.

Avoiding any fears of hallucinations.

But earlier you say "LLMs will get better at comprehension". So are you using an LLM to mark up the original text in some way?

nnechm · 2 years ago
Yes, that's right, I avoid hallucinations that way. In addition, I mark up existing texts so that the LLM knows what it is reading even if it is a piece of a larger text. So, for example, if you need to get Apple's Greater China sales for each quarter in 2020... https://imgur.com/a/oCCay3o

I am not making 4 calls with the entire text; I instead get the pieces of each report that best match the question. There are additional challenges. For instance, GPT-4 struggles with the difference between the words 'guidance' and 'outlook', which mean the same thing, but somehow they don't for GPT-4.

When I say they become better readers, I meant it in a general sense, as in better in the case above. You basically have someone who can read through tables really well, and that can change investment research fundamentally, as it involves a lot of reading tables and graphs :)

nnechm · 2 years ago
A few things on prompting: 1. "Get me google cloud revenues" fails, because somehow GPT-4 thinks I am talking about an entity called Google Cloud and not Google :) 2. To fix it, you can instead ask for "Get me google's cloud revenues" or "get me google cloud revenues from google's results"...

As you can see, in spite of all the training on real-world data, GPT-4 thinks Google Cloud is more of an entity than Google is, based on that question :)

Terretta · 2 years ago
This product would be a fantastic honeypot for front running researcher interest in particular investments.

I don't see anything in your privacy policy that gives me comfort my (even anonymized) interest in a particular firm isn't feeding your own signals.

Even if your policy promised it, I'd want to see technical controls implemented, since incentive to leverage information on searching would be so high.

nnechm · 2 years ago
Hi, thanks for bringing that concern up. I shall keep it in mind and change things based on feedback from customers if it is an issue. Typically, people search for a company after a price move rather than before, and these are searches on publicly available data, i.e. data that is not proprietary and is already filed with the SEC. Needless to say, none of the chat traffic is used or will ever be used to feed any trading signals for anyone.
paulkon · 2 years ago
Could you share more about the prompt you use to get GPT-4 to return a query from the user's question? This is fed into postgres full text search?
nnechm · 2 years ago
I am using OpenAI's latest functions API, so you can get it to return arguments that ensure you get JSON; it works pretty well most of the time. The JSON is then used to fetch a report from a database.
cerved · 2 years ago
Neat idea!

How do you parse the PDFs?

petarb · 2 years ago
Apache Tika has worked well for me in the past; I ended up running it on an AWS Lambda.

https://tika.apache.org/

nnechm · 2 years ago
PDF parsing was more tedious than I would have liked at this stage, so I stuck to the SEC, which requires that companies file in a text format :) so that helped.

I used poppler on a DigitalOcean droplet, but the sheer variety of company PDFs, especially from European companies (some of which have to be OCRed), meant results were not really uniform. GPT still does very well, but not as well as on text documents directly. So in short, this is next on the list...
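For what it's worth, a minimal way to drive poppler from Python (assuming `pdftotext` from poppler-utils is on the PATH; the actual run is commented out):

```python
import subprocess

# Build a pdftotext invocation; -layout preserves column/table structure,
# which matters for financial statements.

def pdftotext_cmd(pdf_path: str, txt_path: str) -> list[str]:
    return ["pdftotext", "-layout", pdf_path, txt_path]

# subprocess.run(pdftotext_cmd("report.pdf", "report.txt"), check=True)
```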

yawnxyz · 2 years ago
I have a tool that applies LLMs to abstracts and research papers. @opendocsg/pdf2md on Node/SvelteKit has been really good for me.
jatinshah · 2 years ago
The main use case highlighted here seems to be retrieving quantitative data from financial reports and then doing sophisticated analysis (trendlines, forecasting etc) using ChatGPT (basically what’s currently done in Excel).

I doubt any finance professional would want to move from Excel to ChatGPT for these use cases.

A secondary issue is that numbers in financial reports need to be standardized before they can be used for analysis - say comparing ROE of a stock with industry average. That’s not possible with numbers from raw reports.

nnechm · 2 years ago
I don't think Excel can be defeated :) I certainly do not expect finance professionals to switch to a chat interface.

When you log into CapitalIQ or FactSet or look at a Bloomberg screen, you access data; you can do the same here. Excel plugins go on top, which can then get this data into Excel to build models. The API that powers this app can also send data into Excel, for instance.

Data can already be copied directly from the chat box into Excel, maintaining the table format, for example.

Regarding standardization, I think data was standardized not to enable comparison but simply to fit into the same schema for every industry. Most companies within the same industry report the same way; REITs do not report like SaaS, but the existing datasets put it all into one set. Raw data from the source is always better, as you can convert it to standardized form, but you can't go back...

jatinshah · 2 years ago
If this product is not meant for finance professionals then who is the target customer?

Retrieving numbers via ChatGPT and then feeding it into Excel via APIs seems like a very odd thing to do. Especially when most of these numbers are already available on Yahoo finance for individual investors and CapitalIQ for professionals.

alxdrlitreev · 2 years ago
We've recently made COFIN AI, an AI based on ChatGPT that reads hundreds of pages of SEC filings such as 10-Ks and 10-Qs and checks investor calls to assist you with investment research and strategy evaluation. Pretty much similar (almost the same) to what you did.

Check this out: https://cofinapp.com Let me know what you think!

nnechm · 2 years ago
Thanks for letting me know. I guess you use a different strategy of querying by company and by document; I see this when I try to get Apple's rest-of-Asia-Pacific revenues, for example.

I guess it really depends on how your targeted user would use it... :)

https://imgur.com/a/uEgLUpz

i_like_pie1 · 2 years ago
Asked 3 questions:

1st question: got the ticker wrong; asked about company A, gave info on B.
2nd question: please give info on A; it says sure.
3rd question: please give info on A; pops a modal asking to pay.

andreev_io · 2 years ago
Shoot me a message at ilya@andlabs.co.uk. Happy to give you a free subscription.

We also just dropped our price to $1.99 per month from the previous $49.99.

Enjoy.

nnechm · 2 years ago
I am assuming this is on Cofin and not on beatandraise.com; if so, please let me know and I shall fix it. You have 20 chats etc.
hbcondo714 · 2 years ago
Great to see this and the other SEC Filing LLMs in the comments here. I run a freemium SEC Filings website at https://Last10K.com to help investors read 10K/Qs more efficiently. Earlier this year, we partnered with https://edgar-gpt.ai for summarizing 10K/Qs. The summaries[1] when clicked on, would then scroll to and display the actual verbiage in the SEC Filing.

This feature stayed on the site for a couple months but ultimately we mutually decided to remove it since the traction didn't meet the costs. I'm open to revisiting this topic all in the name of making verbose financial disclosures easier to understand for all of us.

[1] https://last10k.com/chatgpt.png

epups · 2 years ago
What about hallucinations though? If the LLM missed just one number, which at this point is very likely, you would get totally distorted information.

Sorry if I missed something, but skimming through the site I didn't find any information about this specific issue.

nnechm · 2 years ago
As the answers come from the text, we can avoid hallucinations for the most part. I have not experienced made-up numbers; instead, errors are typically misplaced numbers, where it gives you revenues for the last 6 months instead of the last 3 months when they are both next to each other in a table, or the same numbers for different quarters, and so on...

From my experience, GPT-4 has been very good at following instructions and doesn't make up numbers, which Bard is much more susceptible to. Bard relies heavily on snippets from the web search and completes the rest...

I am pretty sure if Google wanted to train it, it would get all the answers right, but the way it's designed, it gets one or two numbers and makes the rest up rather hilariously :) and even adds a breakdown which is also made up...

You can take a look here : https://imgur.com/a/vDxOV9D

refurb · 2 years ago
I'm pretty sure a primitive version of this has been done for a long time by hedge funds.

You basically monitor the SEC website for when filings are made public, quickly parse the document in some automated way and make trades based on the output.

It wouldn't surprise me if some hedge fund has been trying to build something with LLMs for the past few years (and possibly already deployed it).

xcxcx · 2 years ago
This was done by hand by people like Footnoted. They funneled SEC filings through a key red-flag word filter into a shared Gmail account, with special attention to filings made on Friday night or ahead of holidays.

I think using LLMs to ask a company's filings about its figures is missing the point. After all, XBRL already exists, and that data is widely accessible anyway. Where LLMs would add more value is analyzing changes in different sections of a company's filings, such as Risk Factors, MD&A, or governance filings.

nnechm · 2 years ago
Yes, I think the broader point is you have these readers available on call via api, so you can do all that you suggested and more...
nnechm · 2 years ago
You are probably right, but there has been nothing like GPT-4 before. Having seen some earlier versions, I would wager that their entire application can probably be redone for a fraction of the cost, and it would work better... :)
infecto · 2 years ago
These already exist and are not primitive. Groups have been applying NLP and analysis against text for longer than you would think.
nnechm · 2 years ago
Yes, they are definitely not primitive, but consider how the breakthrough in LLMs happened: MSFT and GOOGL were working on it, yet even as late as 2021 Google's BERT was not really there (at least the version they showed the public in the Google Research blog). The sudden jump from OK to great thanks to OpenAI would probably mean a lot of existing systems also need to make that jump.
sroussey · 2 years ago
I am working on something similar. I have all the 10-k and 8-k docs.

I’ve pulled out the structured data separately, and now looking at breaking up the text into paragraphs to get embeddings.

Why are you using inverted index style text search instead of embeddings? I can see doing both perhaps…

nnechm · 2 years ago
Embeddings did not really work well enough for me. Say you are looking for numerical text, e.g. "get me AMD's revenue from March 2022". The embedding representation needs to understand that "March 2022" together is far more important than "March" alone; I often ended up with March 2021 or March 2018 ranked closer because the text contained multiple occurrences of "revenue" or multiple occurrences of "March"... Perhaps I could have improved it, but that did not seem like the right path to go down for accuracy. This was far worse when, for example, I was looking for ECB statements: they can refer to an older date in the current report, and it caused all sorts of trouble :). An initial fix was to basically mention "March" several times so that the search returns it... :)
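One illustrative fix for the date problem is to boost exact month-year matches on top of whatever base relevance score the search returns (the base scores and the boost weight below are made up):

```python
import re

# Match full "Month YYYY" pairs; the group is non-capturing so findall
# returns the whole match, e.g. "March 2022".
DATE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{4}\b"
)

def rerank(query: str, candidates: list[tuple[str, float]]) -> list[str]:
    """candidates are (text, base_score); exact date matches earn a bonus."""
    wanted = set(DATE.findall(query))

    def score(item: tuple[str, float]) -> float:
        text, base = item
        return base + sum(1.0 for d in wanted if d in text)

    return [text for text, _ in sorted(candidates, key=score, reverse=True)]

ranked = rerank(
    "get me AMD's revenue from March 2022",
    [("Revenue for March 2021 was ...", 0.9),
     ("Revenue for March 2022 was ...", 0.8)],
)
print(ranked[0])  # the March 2022 passage wins despite the lower base score
```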
manuelabeledo · 2 years ago
I may be missing something, but looking at the examples on the website, isn't this just a fancy worded search?

What kind of data can be extracted from these filings, that is not available using "traditional" methods?

nnechm · 2 years ago
That's a good question. For example, you can get Microsoft's revenues by product category or Apple's revenues in Greater China: https://imgur.com/a/LdQkt7j

You can also do this across time with constraints on context limits ...

What is a traditional method? Would you search the string, or would you find the named entity (using NLP) and look for the entity? Or do you mean a Ctrl+F in the document?

I guess the premise of LLMs is that we have an intelligence that can read and write, so it can do this in an automated way in different applications. In this case, I am trying to automate and speed up the investment research process; along the way, we can create our own dataset of financial data that can be generated on the fly as necessary.

Also, how would you extract an income statement from the text (also in the same image) using a traditional method? I personally find it magical that it can do that; it knows where the statement ends and so on...

halflings · 2 years ago
Yes, this looks like a (hopefully) better information retrieval system than running CTRL+F over a PDF.

Nothing wrong with that :) information retrieval is hard. And Ctrl+F won't render a table for queries like "what is the % increase of revenue over the last 3 quarters".

nnechm · 2 years ago
Yes, information retrieval is hard :) A lot of people ask for 'can you get Apple's revenues by product category for all of 2020'. How do you get the smallest piece of text that has the most information about product category and revenues? I can get a lot of text, but that would mean I probably run over the context limit, and so on :)