Posted by u/timshell 2 years ago
Launch HN: Roundtable (YC S23) – Using AI to Simulate Surveys
Hi HN, we’re Mayank and Matt of Roundtable (https://roundtable.ai/). We use LLMs to produce cheap, yet surprisingly useful, simulations of surveys. Specifically, we train LLMs on standard, curated survey datasets. This approach allows us to essentially build general-purpose models of human behavior and opinion. We combine this with a nice UI that lets users easily visualize and interpret the results.

Surveys are incredibly important for user and market research, but are expensive and take months to design, run, and analyze. By simulating responses, our users can get results in seconds and make decisions faster. See https://roundtable.ai/showcase for a bunch of examples, and https://www.loom.com/share/eb6fb27acebe48839dd561cf1546f131 for a demo video.

Our product lets you add questions (e.g. “how old are you”) and conditions (e.g. “is a Hacker News user”) and then see how these affect the survey results. For example, the survey “Are you interested in buying an e-bike?” shows ‘yes’ 28% [1]. But if you narrow it down to people who own a Tesla, ‘yes’ jumps to 52% [2]. Another example: if you survey “where did you learn to code”, the question “how old are you?” makes a dramatic difference—for “45 or older” the answer is 55% “books” [3], but for “younger than 45” it’s 76% “online” [4]. One more: 5% of people answer “legroom” to the question “Which of the following factors is most important for choosing which airline to fly?” [5], and this jumps to 20% when you condition on people over six feet tall [6].

You wouldn’t think (well, we didn’t think) that such simulated surveys would work very well, but empirically they work a lot better than expected—we have run many surveys in the wild to validate Roundtable's results (e.g. comparing age demographics to U.S. Census data). We’re still trying to figure out why. We believe that LLMs that are pre-trained on the public Internet have internalized a lot of information/correlations about communities (e.g. Tesla drivers, Hacker News, etc.) and can reasonably approximate their behavior. In any case, researchers are seeing the same things that we are. A nice paper by a BYU group [7] discusses extracting sub-population information from GPT/LLMs. A related paper from Microsoft [8] shows how GPT can simulate different human behaviors. It’s an active research topic, and we hope we can get a sense of the theoretical basis relatively soon.

Because these models are primarily trained on Internet data, they start out skewed towards the demographics of heavy Internet users (e.g., high-income, male). We addressed this by fine-tuning GPT on the GSS (General Social Survey [9] - the gold standard of demographic surveys in the US) so our models emulate a more representative U.S. population.

We’ve built a transparency feature that shows how similar your survey question is to the training data and thus gives a confidence metric for our accuracy. If you click ‘Investigate Results’, we report the most similar (in terms of cosine distance between LLM embeddings) GSS questions as a way of estimating how much extrapolation / interpolation is going on. This doesn’t quite address the accuracy of the subpopulations / conditioning questions (we are working on this), but we thought we were at a sufficiently advanced point to share what we’ve built with you all.
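As a rough illustration of the nearest-question lookup described above, here is a minimal Python sketch. It is not Roundtable's implementation: the function names, the toy three-dimensional vectors, and the example questions are all hypothetical stand-ins (real LLM embeddings have hundreds or thousands of dimensions, and the questions below are not actual GSS items).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)

def most_similar_questions(query_vec, reference, k=3):
    """Return the k (score, question) pairs closest to query_vec, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for text, vec in reference.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hypothetical reference questions with made-up toy embeddings.
reference = {
    "How often do you use public transport?": [0.9, 0.1, 0.0],
    "Do you own a car?": [0.7, 0.3, 0.1],
    "How often do you attend religious services?": [0.0, 0.2, 0.9],
}

# Made-up embedding for a new survey question.
query = [0.8, 0.2, 0.0]
top = most_similar_questions(query, reference, k=2)
for score, text in top:
    print(f"{score:.3f}  {text}")
```

A low best score would suggest the new question is far from anything in the reference set, i.e. the model is extrapolating rather than interpolating.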

We're graduating PhD students from Princeton University in cognitive science and AI. We ran a ton of surveys and behavioral experiments and were often frustrated with the pipeline. We were looking to leave academia, and saw an opportunity in making the survey pipeline better. User and market research is a big market, and many of the tools and methods the industry uses are clunky and slow. Mayank’s PhD work used large datasets and ML for developing interpretable scientific theories, and Matt’s developed complex experimental software to study coordinated group decision-making. We see Roundtable as operating at the intersection of our interests.

We charge per survey. We are targeting small and mid-market businesses who have market research teams, and ask for a minimum subscription amount. Pricing is at the bottom of our home page.

We are still in the early stages of building this product, and we’d love for you all to play around with the demo and provide us feedback. Let us know whatever you see - this is our first major endeavor into the private sector from academia, and we’re eager to hear whatever you have to say!

[1] https://roundtable.ai/sandbox/e02e92a9ad20fdd517182788f4ae7e...

[2] https://roundtable.ai/sandbox/6b4bf8740ad1945b08c0bf584c84c1...

[3] https://roundtable.ai/sandbox/d701556248385d05ce5d26ce7fc776...

[4] https://roundtable.ai/sandbox/8bd80babad042cf60d500ca28c40f7...

[5] https://roundtable.ai/sandbox/0450d499048c089894c34fba514db4...

[6] https://roundtable.ai/sandbox/eeafc6de644632af303896ec19feb6...

[7] https://arxiv.org/abs/2209.06899

[8] https://openreview.net/pdf?id=eYlLlvzngu

[9] https://www.norc.org/research/projects/gss.html

SemioticStandrd · 2 years ago
I see the logic here, but I’m highly skeptical about how valid such a tool would be.

If a researcher comes out and says, “Surveys show that people want X, and they do not like Y,” and then others ask the researcher if they surveyed people, the answer would be “no.”

Fundamentally, people wanting feedback from humans will not get that by using your product.

The best you can say is this: “Our product is guessing people will say X.”

famouswaffles · 2 years ago
Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? (https://arxiv.org/abs/2301.07543)

Out of One, Many: Using Language Models to Simulate Human Samples (https://arxiv.org/abs/2209.06899)

There's been some research in this vein. To answer your question, seemingly very valid.

puppy_nap · 2 years ago
These papers suggest that LLMs do something a lot more specific (when asked to simulate a certain political background, they're able to give responses to questions in a way that's consistent with that political background). That's not particularly surprising to me, as I would expect a human to also be able to simulate this kind of thing pretty accurately. I don't think it implies that LLMs would be good at answering typical business survey questions.
timshell · 2 years ago
We're trying to figure out the optimal use case for this, i.e. whether it's internal or client-facing (your example).

Internal purposes include stuff like optimally rewording questions and getting priors.

A hybrid approach would be something like - hey let's not ask someone 100 questions because we can accurately predict 80%. Let's just ask them the hard-to-estimate 20 questions

tcgv · 2 years ago
I think it's less about "prediction" and more about mapped cohort behaviors and opinions, especially those that change slowly over time. The LLM will likely be a picture of how the population and each demographic group behaved and what they believed in a specific time window (i.e. when the data set was collected), and will produce answers that reflect that. It will most likely lag behind new trends and how they shape population behaviors and beliefs over time. In any case I think even the most experienced market research professionals would agree that discovering new trends before they become mainstream is really challenging.
quadrature · 2 years ago
> optimally rewording questions

This kind of concerns me because you could use this to bias surveys in different directions. This obviously already happens, so maybe it's just part of the status quo.

tchock23 · 2 years ago
I’ve worked in this industry for a while and in the ‘faster, cheaper, better - pick two’ trade off, many will select faster and cheaper. That’s only speaking for corporate market research though, can’t say the same for academic researchers.

I suspect people would use this product as a quick gut check to decide if it is warranted to spend the time and money on a full scale quant study.

DriverDaily · 2 years ago
You want a 90/10: 90% of the benefit, 10% of the effort.

This is like a 10/10.

Shrezzing · 2 years ago
The tool would be useful as a QA step to test for leading questions in survey design. See Yes Minister's[1] explanation for how they can work. A simulation to see if the questions get the same response irrespective of the order they were asked in could improve survey quality. Obviously, the tool could be used in the opposite way too, to help design surveys that say exactly what the company/govt/charity wants them to.

[1] https://www.youtube.com/watch?v=G0ZZJXw4MTA

helsinkiandrew · 2 years ago
> I see the logic here, but I’m highly skeptical about how valid such a tool would be.

The problem as I see it is that although you can create lots of examples that are correct/follow real-world opinions, you can never prove that a particular question is correct/follows real-world opinion. I'm not sure who would trust the output enough to rely on it for decision making.

digitcatphd · 2 years ago
I have been running AI-generated surveys in the playground and have found them quite effective in simulating responses. In fact they are incredibly similar to my experience asking the same questions IRL. The challenge is that people don’t trust them and AI still has this negative association. So yes, I mean to say it’s yet another human error.
egonschiele · 2 years ago
Some people worry that biased AI models will deepen inequality. Your product seems particularly primed for this scenario. I might even say that a product like yours would exacerbate this problem. What is your plan to ameliorate AI bias?

On a more personal note, while all of the AI advances have been very interesting, I worry that AI will reduce human connection, and a product like this sure seems to do that. You are telling users that they don't need to talk to real people, and can just get feedback from a model instead.

Edit: for example, here's your dataset by race: https://imgur.com/a/134epoN

I asked, "Which race is most likely to commit a crime?": https://imgur.com/a/4QJZo2O

timshell · 2 years ago
1. GPT out of the box was pretty biased (e.g. gender distribution). We fine-tuned on representative survey data to ameliorate this bias so we get Census-level estimates for conditions such as gender [a] and work status [b].

2. We added transparency features (click on 'Investigate Results') that show how in- vs. out-of-distribution the target question is. For out-of-distribution questions, we suggest people run traditional surveys.

More broadly, I think your point is really interesting when it comes to qualitative data. That is one reason we haven't generated qualitative survey data, but a lot of potential customers have already started to ask for it.

----

[a] https://roundtable.ai/sandbox/baa3d5f25236b91f1608c9f606b315...

[b] https://roundtable.ai/sandbox/7a9ee27872eb29087be2386ccd19f7...

timshell · 2 years ago
To respond to your edit: that's a great example, thank you. One of the limitations of surveys more broadly is that you're asking for people's opinions, which of course do not necessarily correspond to reality. So, what we're simulating is how we estimate a representative U.S. population would answer the question "Which race is most likely to commit a crime?" as opposed to what the actual answer is.

We definitely need to think how to handle your question so that it's clear where survey data converges/diverges with reality.

Gerardo1 · 2 years ago
How can you be reasonably sure that that work sufficiently addresses the bias?

What metric(s) are you using to measure bias in general, and what do those metric(s) look like before and after your tuning?

DtNZNkLN · 2 years ago
This is the most thought-provoking company I’ve encountered in a long time. Congrats on doing genuinely interesting innovation.

Speaking as a potential user, my biggest hang up is trust. How can I trust that Roundtable’s results are accurate and not the result of hallucination?

One of the powerful things about data is that they surprise you. This is why data integrity is so important (“crap in, crap out” as the old adage goes). But if I get a surprising result from Roundtable, how can I verify it? I think you two are already thinking about this and building features to address it.

I’m also wondering if trying to verify a surprising result from Roundtable is the wrong response…Why would a LLM give me that answer? There may be something useful to understand about why the LLM is “hallucinating.” In terms of features, it may be interesting to see whether Roundtable’s LLM could explain its answer.

The UX could be like having a brilliant but inscrutable research assistant…

timshell · 2 years ago
Thank you, this is exactly where our headspace is too
tempusalaria · 2 years ago
Hi congrats.

LLMs model a static distribution, whereas consumer preferences change over time to the point that companies regularly run the same survey at different points in time. At my old fund we would run the same surveys every month to track changes on various companies. How do you counteract this time effect? Presumably a lot of your training data is from the past.

To give one example from your summary - the demographics of Tesla owners have changed significantly over time, from a pure luxury, avant-garde market to a much more mass market. So info about Tesla from 5 years ago is not that useful.

timshell · 2 years ago
Pasting below answer to niko001

The data we trained on has year, so we can specify the year you ask the question (the default is 2023). You can also see how answers change over time. [1] shows how the distribution for "Do you support the President" changes from 2000 to 2023 (see the 9/11 spike, end of Bush era, Obama era, Trump era, etc.) [1] https://roundtable.ai/sandbox/2dd4e9d32c24e9abff01810695e948...

tempusalaria · 2 years ago
Which is logical and kind of what I expected. But raises the obvious question of where does your data come from going forward? The internet is getting more and more polluted with machine generated data, previous big ongoing data sources like Twitter, Reddit, etc. are all full of GPT spam and are trying to monetise their data.

I’d also be interested in how much you think your platform is just capturing say reported surveys/data. President polling is something that must be all over LLM datasets- isn’t that just replicating the training data?

I think you could do a better job of showing on your website the following - here are some unusual survey results we generated from the model - I.e. stuff definitely not in the training data - and here’s the data we actually got when we did that survey for real

samsee · 2 years ago
This is a really cool idea and beautiful UX, congrats on the launch!

One related pain point I have seen many times with surveys is that the people writing them don't know what they're doing and get bad data as a result of biased questions.

Could be cool to add functionality down the line to help people craft better questions. For example, your app could provide alternate ways of phrasing questions and then simulate how results would differ based on the wording.

Excited to see where this goes! Going to share with my partner who works for a survey software company and see what she thinks.

timshell · 2 years ago
Exactly where we're headed :)

Thank you for the kind words / reference

golergka · 2 years ago
Just played with the sandbox, and it seems like 16% of Apple users wouldn't consider buying an Apple VR headset even for $3.50. I don't think even the Loch Ness Monster would be so stingy.
coderintherye · 2 years ago
In a world where everyone could buy the headset for $3.50 (thus there is no profit value to buying it and then re-selling it) then that percentage actually makes sense.
sebzim4500 · 2 years ago
Presumably the question is being interpreted as "would you buy an Apple VR headset if the standard price was $3.50?", rather than "would you buy a headset for $3.50 that you could immediately resell for 1000x that?".

The answer seems plausible with that interpretation.

golergka · 2 years ago
Don't you think that even playing with it for an hour and then never touching it again would still be worth a three fiddy?
ahzhou · 2 years ago
FWIW, I think the discussion here highlights the biggest problem with the approach. Either it's confirmatory and wasn't a question worth asking in the first place, or it's surprising and people need real data to evaluate the response.
timshell · 2 years ago
One of our major weaknesses right now is sensitivity to price
wrftaylor · 2 years ago
My company recently ran a survey of UK-based Creatives on the topic of their working preferences (n=250, July 2023) - so I compared our data to responses provided by Roundtable.ai

https://docs.google.com/spreadsheets/d/1YtvcLkC-xaTw3q6LOxCq...

The average delta between the actual selected response % and the simulated %, across 11 questions, was 7%. Seems like a good start - it would make it useful for certain low-impact, high-speed business decisions.
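For anyone wanting to run a similar comparison, here is a minimal Python sketch of one way such an average delta could be computed. The comment above does not say exactly how the 7% was derived, so this assumes a mean absolute difference in percentage points; the function name and all the numbers are made up for illustration and are not taken from the linked spreadsheet.

```python
def mean_absolute_delta(actual, simulated):
    """Mean absolute difference between paired percentages, in percentage points."""
    if not actual or len(actual) != len(simulated):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(a - s) for a, s in zip(actual, simulated)) / len(actual)

# Made-up example: share (%) choosing the top answer on three questions.
actual_pct = [40.0, 25.0, 60.0]      # hypothetical real survey results
simulated_pct = [45.0, 20.0, 49.0]   # hypothetical simulated results

delta = mean_absolute_delta(actual_pct, simulated_pct)
print(f"average delta: {delta:.1f} percentage points")
```

Using absolute differences keeps over- and under-estimates from cancelling each other out, which a plain signed average would allow.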

timshell · 2 years ago
Thank you for sharing these results!
DavidFerris · 2 years ago
Interesting idea! One of the problems with any primary research (surveys included) is the delay in collecting responses, which can take hours to weeks depending on sample, IR, incentives, etc. This would solve that!

It's not surprising that LLMs can predict the answers to survey questions, but really good primary research generates surprising insights that are outside of existing distributions. Have you found that businesses trust your results? I have found that most businesses don't trust survey research much at all, and this seems like it might be even less reliable.

-----

Context: I co-founded & sold a survey software company (YC W20).

timshell · 2 years ago
Thank you!

Trust is one of the biggest issues we're trying to solve. This motivated the t-SNE plots and similarity scores under 'Investigate Results', but we definitely have a long way to go. Generally speaking, survey practitioners trust us more than their clients do (perhaps not surprising).

famouswaffles · 2 years ago
You might want to take a look at the papers I've linked here that go into this kind of research:

https://news.ycombinator.com/item?id=36868552