I agree with you on open source in the original, home tinkerer sense.
If your focus is narrow enough, vanilla GPT can still provide good enough results. We narrow the scope for GPT and ask it to answer binary questions, and with that we get good results.
Your approach is better for supporting broader questions. We support those as well, and there the results aren't as good.
Specific failure modes can be something as simple as extracting beneficiary information from a trust document. Sometimes it works, but a lot of the time it doesn't, even with startups whose AI products are built specifically for extracting information from documents. For example, the model will produce an incomplete list of beneficiaries, or, if there are contingent beneficiaries, it won't know what to do. And that's not even a hard question about the contingency, just a simple list with percentages: if no one dies, what is the distribution?
Furthermore, trying to get an AI to describe the contingency is a crapshoot.
While I expect these options to get better and better, I have fun trying them out and seeing what basic thing will break. :)
If the example is representative, I see two problems: a simple extraction of information that is laid out in plain sight (the list of beneficiaries), and reasoning to interpret the section on contingent beneficiaries and connect it to facts from other parts of the document. Is that correct?
If that's the case, then Hotseat is miles ahead when it comes to analyzing regulations (from the civil law tradition, which is different from the US), and dealing with the categories of problems you mentioned.
In terms of contract review, what I've found is that GPT is better at analyzing a document than generating one, which is what this paper supports. However, I have used several startups' AI document review tools, and they all fall apart under any sort of prodding for specific answers. This paper looks like the model just had to locate the relevant section, not sustain the back-and-forth conversation about the contract that a lawyer and client would have.
There is also no legal liability for GPT giving the wrong answer. So it works well for someone smart who is doing their own research, just as a smart person could use Google before to do their own research.
My feeling on contract generation is that for the majority of cases, people would be better served if better boilerplate contracts were simply available. Lawyers hoard their contracts, and on our journey it was very difficult to find lawyers willing to write contracts we would turn into templates, because they would essentially be putting themselves and their professional community out of future income streams. But people don't need a unique contract generated on the fly by GPT every time, when a template of a well-written, well-reviewed contract does just fine. It cost hundreds of millions to train GPT-4. If $10m were spent building a repository of well-reviewed contracts instead, it would be more useful than spending the equivalent money training a GPT to generate them.
People ask a pretty wide range of questions about what they want to do with their documents, and GPT didn't do a great job with them, so for the near future it looks like lawyers still have a job.
I'm working on Hotseat - a legal Q&A service where we put regulations in a hot seat and let people ask sophisticated questions. My experience aligns with your comment that vanilla GPT often performs poorly when answering questions about documents. However, if you combine focused effort on squeezing out GPT's performance with careful product design, you can go pretty far.
I wonder if you have written about the specific failure modes you've seen in answering questions from documents? I'd love to check whether Hotseat handles them well.
If you're curious, I've written about some of the design choices we've made on our way to creating a compelling product experience: https://gkk.dev/posts/the-anatomy-of-hotseats-ai/
(on a larger point of the AI Act leaving much to be desired, I agree)
I asked during the Product Hunt launch and am still waiting for an answer. There should be an option to provide an e-mail address and get notified.
Re email: the submission form has a second step where you can opt-in to leave your email address to get notified. Did it not show for you?
One of the most non-obvious discoveries we made was that for such long documents, turning the document into Markdown (with marked headings), as opposed to plain text, made a night-and-day difference in the LLM's reasoning performance. I have my guesses as to why this could be the case, but I'm curious to hear your hypothesis and whether you've seen similar effects in the wild?
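For concreteness, a minimal sketch of the kind of plain-text-to-Markdown preprocessing described above. The heuristic (short all-caps lines are headings) is an assumption for illustration only; real pipelines need format-aware heading detection.

```python
# Hypothetical sketch: promote likely section headings to Markdown
# ATX headings before sending the document to an LLM.
# The all-caps heuristic is an illustrative assumption, not Hotseat's method.

def to_markdown(plain_text: str) -> str:
    out = []
    for line in plain_text.splitlines():
        stripped = line.strip()
        # Treat short, all-caps lines with at least one letter as headings.
        if (stripped
                and stripped == stripped.upper()
                and len(stripped.split()) <= 8
                and any(c.isalpha() for c in stripped)):
            out.append(f"## {stripped}")
        else:
            out.append(line)
    return "\n".join(out)
```

One plausible reason this helps: explicit `##` markers give the model unambiguous section boundaries to anchor its reasoning to, instead of having to infer structure from whitespace.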
The product analyzes customer interview transcripts to catch when founders slip into "pitch mode" instead of learning. It's based on principles from The Mom Test book - essentially a digital coach that flags your mistakes and gives you personalized advice on how to do better.
Why this project for our trial:
- Real problem we'd witnessed (founders talking too much in user interviews)
- Tight scope but production-grade requirement
- Chance to push AI-accelerated development to its limits
Tech: Next.js 15, Supabase, Trigger.dev, GPT-4.1 via Vercel AI SDK. We used Cursor, Claude Code, V0, and (briefly) Grok for development.
Key learning: AI development requires adopting new working patterns. You can think of AI as a chaotic software engineering intern. You need to be highly intentional in guiding the AI to do the right thing. Just like with human teams, bad managers get bad output from their people, and the same applies to managing AI.
If you're an experienced software engineer, you have a lot of implicit assumptions about how to build software, how to rate the importance of tasks, etc. You need to transfer these to the AI, and we think we found early patterns for how to do this well.
For example, we used the "walking skeleton" and "tracer bullet" concepts to structure the project planning we did with AI. We found that the basic pattern of think, research, brainstorm, and plan before writing any code dramatically improves the quality of AI coding as the project gets more complex. E.g., we'd plan error handling with the AI first, save it as a doc, then use that doc as context for implementation; this kept the AI consistent across the codebase.
We shared details of this approach at Warsaw AI Tinkerers (over 200 attendees) a couple of weeks ago.
The co-founder trial worked: we built a working mini-product in 5 weeks, figured out how we each approach this alien technology that is modern AI, and uncovered many interesting personal quirks in each other (everyone has them).
You can check out Unpitched at https://unpitched.app. Sadly, we require sign-up, as the underlying LLM calls are a little expensive.
We wrote more about how we approached the co-founder trial process at https://unpitched.app/about. Let us know if you have any questions about our trial, share your own stories of looking for co-founders, or send us any feedback on the app!
PS. Shoutout to the Circleback team (YC W24), the only note-taking app we found with working webhooks that we could integrate with Unpitched.
-- gkk & ykka