Readit News
mitko commented on Gemini 2.5   blog.google/technology/go... · Posted by u/meetpateltech
malisper · 5 months ago
I've been using a math puzzle as a way to benchmark the different models. The math puzzle took me ~3 days to solve with a computer. A math major I know took about a day to solve it by hand.

Gemini 2.5 is the first model I tested that was able to solve it and it one-shotted it. I think it's not an exaggeration to say LLMs are now better than 95+% of the population at mathematical reasoning.

For those curious, the riddle is: There are three people in a circle. Each person has a positive integer floating above his head, such that each person can see the other two numbers but not his own. The sum of two of the numbers is equal to the third. The first person is asked for his number, and he says that he doesn't know. The second person is asked for his number, and he says that he doesn't know. The third person is asked for his number, and he says that he doesn't know. Then, the first person is asked for his number again, and he says: 65. What is the product of the three numbers?
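
For anyone who wants to check the riddle by machine, here is a minimal Python sketch of one way to model it. It assumes everyone reasons perfectly and that each "I don't know" is heard by all three. The function names and the search bound `limit` are illustrative choices, not something from the thread. Each person sees two numbers, so his own number is either their sum or their difference; he "knows" once only one of those two possibilities is consistent with every earlier "I don't know".

```python
from functools import lru_cache


def candidates(world, person):
    """Worlds the given person considers possible from the two numbers he sees."""
    x, y = (world[j] for j in range(3) if j != person)
    worlds = []
    # His own number is the sum or the difference of the other two,
    # and it must be a positive integer (so a difference of 0 is dropped).
    for own in sorted({x + y, abs(x - y)} - {0}):
        w = list(world)
        w[person] = own
        worlds.append(tuple(w))
    return worlds


@lru_cache(maxsize=None)
def consistent(world, turn):
    """True if, in this world, every speaker before `turn` would say "I don't know"."""
    return all(not knows(world, t) for t in range(turn))


@lru_cache(maxsize=None)
def knows(world, turn):
    """True if the speaker at `turn` (person turn % 3) can deduce his own number."""
    person = turn % 3
    alive = [w for w in candidates(world, person) if consistent(w, turn)]
    return len(alive) == 1


def solve(first_answer, limit=200):
    """Triples (a, b, c) with a == first_answer that match the whole dialogue.

    `limit` is an assumed search bound on b and c, not part of the riddle.
    """
    a = first_answer
    solutions = []
    for b in range(1, limit + 1):
        for c in range(1, limit + 1):
            if not (a == b + c or b == a + c or c == a + b):
                continue
            world = (a, b, c)
            # Turns 0, 1, 2 are the three "I don't know"s; turn 3 is the answer.
            if consistent(world, 3) and knows(world, 3):
                solutions.append(world)
    return solutions


if __name__ == "__main__":
    for a, b, c in solve(65):
        print((a, b, c), "product:", a * b * c)
```

The recursion terminates because checking a world at a given turn only ever asks about strictly earlier turns. Only the outer enumeration in `solve` is bounded, so raising `limit` widens the search without changing the logic.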

mitko · 5 months ago
Loved that puzzle, thanks for sharing it. I’ve solved a lot of math problems in the past but this one had a unique flavor of interleaving logical reasoning, partial information and a little bit of arithmetic.
mitko commented on MOATs Aren't Useful   rohan.ga/blog/moats/... · Posted by u/ocean_moist
mitko · 10 months ago
Defensibility re: companies is well studied by Hamilton Helmer, who with his grad student did case studies of several hundred "defensible" companies and identified 7 "powers" that help companies defend against specific competitors. Note that each power is only applicable towards a specific competitor, and not as a blanket statement. The powers are: 1. economies of scale, 2. network effects, 3. counter-positioning, 4. switching costs, 5. branding, 6. cornered resource, and 7. process power.

I highly recommend Cedric's writing on the topic (behind a paywall) https://commoncog.com/7-powers-summary/

mitko commented on Ask HN: Who is hiring? (November 2024)    · Posted by u/whoishiring
mitko · 10 months ago
Pioneer | Senior Software Engineer | Climate Tech and LLMs | Seed stage | Remote within USA-48

USA only | $150K+, equity

Apply here: https://pioneerclimate.com/careers

About: Pioneer's mission is to coordinate the funding for rapid decarbonization. We take the pain out of the government grants application process by using dozens of LLM workflows to reduce the effort required to apply and win government awards, and we’ve helped companies win $160M to date. We have more demand than we can serve, and we’re growing revenue.

Culture: Our core values are kindness, impact, intentionality, initiative, feedback and efficiency in that order, and we implement these throughout our processes.

You: Enjoy the 0-to-1, hungry for growth, actively collaborate and pair with users, can handle the nondeterminism of LLMs, start sentences with “What if…“, enjoy building across the full stack.

Stack: We use TypeScript, React, Next.JS, LangSmith and other modern tools.

mitko commented on Chain-of-thought can hurt performance on tasks where thinking makes humans worse   arxiv.org/abs/2410.21333... · Posted by u/benocodes
lolinder · 10 months ago
This is a regression in the model's accuracy at certain tasks when using COT, not its speed:

> In extensive experiments across all three settings, we find that a diverse collection of state-of-the-art models exhibit significant drop-offs in performance (e.g., up to 36.3% absolute accuracy for OpenAI o1-preview compared to GPT-4o) when using inference-time reasoning compared to zero-shot counterparts.

In other words, the issue they're identifying is that COT is a less effective model for some tasks compared to unmodified chat completion, not just that it slows everything down.

mitko · 10 months ago
Yeah! That's the danger with any kind of "model" whether it is CoT, CrewAI, or other ways to outsmart it. It is betting that a programmer/operator can break a large task up in a better way than an LLM can keep attention (assuming it can fit the info in the context window).

ChatGPT's o1 model could make a lot of those programming techniques less effective, but they may still be around as they are more manageable, and constrained.

mitko commented on Chain-of-thought can hurt performance on tasks where thinking makes humans worse   arxiv.org/abs/2410.21333... · Posted by u/benocodes
mitko · 10 months ago
This is so uncannily close to the problems we're encountering at Pioneer, trying to make human+LLM workflows in high stakes / high complexity situations.

Humans are so smart and make so many decisions and calculations on the subconscious/implicit level, and take so many mental shortcuts, that as we try to automate this by following exactly what the process is, we bring a lot of the implicit thinking to the surface, and that slows everything down. So we've had to be creative about how we build LLM workflows.

mitko commented on Founder Mode   paulgraham.com/foundermod... · Posted by u/bifftastic
picafrost · a year ago
I struggle to see "founder mode" as something that scales. Is there not some self-selection bias occurring here, given the audience "included a lot of the most successful founders we've funded"?

If a founder is exceptional and all of the other stars necessary for a startup to succeed have aligned, this may be a good approach. But then we are just back to the question YC has always tried to answer: what makes a founder exceptional?

What about the founders who failed _because_ they were in "founder mode"?

I am not sure this article represents the beginning of a paradigm shift like it seems to think it does.

mitko · a year ago
Maybe one way to think of it is fractal management where a manager would have deep read-write interactions with few skip levels. HBR style says read everywhere, write only direct reports. And it makes for a good software design except that humans are not computers, but there is a shared global context - the company vision and mission.

Through fractal management, a visionary leader can have a better chance to ensure that the vision is translated into practice at the various levels of detail.

Fractal management is only part of it, though: it is a technique, and it doesn’t cover the enormous skin in the game founders have in the success of the company. For many founders, the company is their baby (I am projecting here) and they want to make it succeed. In contrast, many of the professional fakers see it as just a job, and a step on the ladder. Principal/agent. Without genuine care and a cohesive vision, fractal management can quickly devolve into chaos. It is high reward and also higher risk!!! Maybe that’s why only the founders do it but not their VPs. I wonder if any VPs at Airbnb are doing anything remotely similar to what Brian Chesky is doing as a management style? (Honestly I have no idea)

I am sure that many founders also failed because of it, as they might have been missing the charisma, clarity and conviction to pull this off.

(PS. Take my ideas with a big serving of salt, I am a founder but not at a large organization, and the article mainly focuses on large orgs)

mitko commented on Founder Mode   paulgraham.com/foundermod... · Posted by u/bifftastic
mitko · a year ago
Tony Fadell’s idea of a Parent CEO vs Babysitter CEO in Build comes close to this founder mode concept, but I am so glad PG memed the founder mode concept into existence and talked about the gaslighting. And I bet he is right that there are many founders operating somewhat that way already, but without a shared mental model and a name for it.

It is really hard to put in practice because of all the gravitational pull towards mediocrity.

mitko commented on Impacts of lack of sleep   belkarx.github.io/posts/f... · Posted by u/belkarx
mitko · 2 years ago
As someone who currently runs on decreased sleep, I can relate to a bunch of these.

But in my case, the root cause seems to be stress from increased load, and the higher cortisol levels it creates.

mitko commented on Ask HN: Who is hiring? (August 2023)    · Posted by u/whoishiring
cadr · 2 years ago
You seem to have a "Founding Operations Lead" already under "Meet the team" - does this role report to that person?
mitko · 2 years ago
We are hiring more!
mitko commented on Ask HN: Who is hiring? (August 2023)    · Posted by u/whoishiring
mitko · 2 years ago
Pioneer | Founding Senior Software Engineer | Climate Tech and LLMs | Seed stage | SF Bay Area, DC or remote in USA only | $150K+, equity

Apply here: https://usepioneer.com/careers

About: Pioneer's mission is to coordinate the funding for rapid decarbonization. We take the pain out of the government application process by using LLMs to gradually reduce the effort required to identify, qualify, apply, and comply with government awards. We are passionate about climate impact and creating a supportive growth environment based on the fundamentals of Radical Candor. We also have a proven business model and rapidly growing revenue.

Culture: Our core values are kindness, feedback, intentionality, ownership, and impact. We implement these in various creative ways: toms, laps instead of sprints, feedback instead of 1:1, and decision journals! Our newest team member said “Every interaction is “kind”, coming from a good place of respecting each other.”

You: Enjoy the 0-to-1, pre-PMF phase, have an owner mindset, collaborative, have curiosity for technology like LLMs, start sentences with “What if…“, enjoy building across the full stack.

Stack: We use TypeScript, React, Next.JS, and other modern tools.

P.S. We are also hiring for a Founding Operations Lead - apply through the same link as above.

u/mitko

Karma: 863
Cake day: August 9, 2008
About
Essays at d13v.com

Founder at pioneerclimate.com
