Once GPT is tuned more heavily on Lean (proof assistant) -- the way it is on Python -- I expect its usefulness for research level math to increase.
I work in a field related to operations research (OR), and ChatGPT 4o has ingested enough of the OR literature that it's able to spit out very useful Mixed Integer Programming (MIP) formulations for many "problem shapes". For instance, I can give it a logic problem like "I need to put n items in k buckets based on a score, but I want to fill each bucket sequentially" and it actually spits out a very usable math formulation. I usually just need to tweak it a bit. It also warns against weak formulations where the logic might fail, which is tremendously useful for avoiding pitfalls. Compare this to the old way, which is to rack my brain over a weekend to figure out a water-tight formulation of a MIP optimization problem (which is often not straightforward for non-intuitive problems). GPT has saved me so much time in this corner of my world.
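To give a flavor of the kind of formulation involved (a hedged sketch with hypothetical symbols, not GPT's exact output): binary x_ij assigns item i to bucket j, binary y_j marks bucket j as open, and a linking constraint forces bucket j+1 to open only once bucket j is full:

```latex
\begin{aligned}
& \textstyle\sum_j x_{ij} = 1 && \forall i && \text{(each item goes in exactly one bucket)} \\
& \textstyle\sum_i x_{ij} \le C_j \, y_j && \forall j && \text{(respect capacity; closed buckets stay empty)} \\
& \textstyle\sum_i x_{ij} \ge C_j \, y_{j+1} && \forall j && \text{(bucket } j{+}1 \text{ opens only when bucket } j \text{ is full)} \\
& x_{ij}, \, y_j \in \{0, 1\}
\end{aligned}
```

The score-based objective sits on top of this; the point is that the sequential-fill logic reduces to one linking constraint.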
Yes, you probably wouldn't be able to use ChatGPT well for this purpose unless you understood MIP optimization in the first place -- and you do need to break down the problem into smaller chunks so GPT can reason in steps -- but for someone who can and does, the $20/month I pay for ChatGPT more than pays for itself.
side: a lot of people who complain on HN that (paid/good -- only Sonnet 3.5 and GPT4o are in this category) LLMs are useless to them probably (1) do not know how to use LLMs in a way that maximizes their strengths; (2) have expectations that are too high based on the hype, expecting one-shot magic bullets; or (3) work in a domain where LLMs are really not good. But many of the low-effort comments seem to mostly fall into (1) and (2) -- cynicism rather than cautious optimism.
Many of us who have discovered how to exploit LLMs in their areas of strength -- and know how to check for their mistakes -- often find them providing significant leverage in our work.
HN, and the internet in general, have become just an ocean of reactionary sandbagging and blather about how "useless" LLMs are.
Meanwhile, in the real world, I've found that I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
It's entirely a learned skill; the models (and, very importantly, the tooling around them) have arrived at the baseline they needed.
Much, much more productive world by just knuckling down and learning how to do the work.
> Much, much more productive world by just knuckling down and learning how to do the work.
The funny thing is that everyone who says they've become more productive with LLMs won't say exactly how. I can talk about how Vim has made it more enjoyable to edit code (keybindings and motions), how Emacs is a good environment for text tooling (a lisp machine), and how I use technical books to further my learning (so many great books out there). But no one really shows how they're actually solving problems with LLMs and why the alternatives were worse for them. It's all claims that it's great, with no further elaboration on the workflows.
> I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
Code is intent described in terms of machine actions. Those actions can be masked by abstracting them into more understandable units, so we don't have to write opcodes but can use Python instead. Programming is basically making the intent clear enough that we know which units to use. Software engineering is mostly selecting the units so as to do minimal work once the intent changes or the foundational actions do.
Chatting with an LLM looks to me like your intent is either vague or you don't know which units to use. If it's the former, then I guess you're assuming it is the expert and will guide you to the solution you seek, which means you believe it understands the problem better than you do. The latter is stranger, as it looks like playing around with car parts while ignoring the manuals they come with.
What about boilerplate and common scenarios? I agree that LLMs help a great deal with that, but the fact is that there were already perfectly good tools for it, like snippets, templates, and code generators.
In my view these models produce above-average code which is good enough for most jobs. But the Hacker News sampling could be biased towards the top tier of coders -- so their personal accounts of it not being good enough can also be true. For me the quality isn't anywhere close to good enough for my purposes; all of my easy code is already done, so I'm only left working on gnarly niche stuff which the LLMs are not yet helpful with.
As for the effect on the industry, I generally make the point that even if AI only replaces the below-average coder, it will put downward pressure on above-average coders' compensation expectations.
Personally, I find that humans appear to be getting dumber at the same time that AI is getting smarter, and while, for now, the crossover point is at a low threshold, that threshold will of course increase over time. I used to try to teach ontologies, stats, and SMT solvers to humans before giving up and switching to AI technologies, where success is not predicated on human understanding. I used to think that the inability of most humans to understand these topics was a matter of motivation, but I have rather recently come to understand that these limitations are generally innate.
What sort of problems do you solve? I tried to use it. I really did. I've been working on a tree edit distance implementation based on a paper from '95. Not novel stuff. I just can't get it to output anything coherent. The code rarely runs, it's written in absolutely terrible style, and it doesn't follow any good practices for performant code. I've struggled to get it to even implement the algorithm correctly, even though it's in the literature I'm sure it was trained on.
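For what it's worth, the core recursion for ordered-tree edit distance (the rightmost-root decomposition that the classic algorithms build on) is compact enough to sketch directly. This is a naive memoized version assuming unit costs and trees encoded as (label, children) tuples -- a correct reference implementation for small inputs, not the cleverly indexed dynamic program from the literature:

```python
from functools import lru_cache

def ted(f, g):
    """Edit distance between ordered forests f, g (tuples of (label, children) trees)."""
    def size(F):
        return sum(1 + size(kids) for _, kids in F)

    @lru_cache(maxsize=None)
    def d(F, G):
        if not F and not G:
            return 0
        if not F:
            return size(G)          # insert every node of G
        if not G:
            return size(F)          # delete every node of F
        v_label, v_kids = F[-1]     # rightmost root of F
        w_label, w_kids = G[-1]     # rightmost root of G
        return min(
            d(F[:-1] + v_kids, G) + 1,   # delete v: its children become roots
            d(F, G[:-1] + w_kids) + 1,   # insert w
            # match v with w: recurse on their child forests and on the rest
            d(v_kids, w_kids) + d(F[:-1], G[:-1]) + (v_label != w_label),
        )

    return d(f, g)
```

With forests as tuples everything is hashable, so `lru_cache` handles the memoization; it's only suitable for small trees, but it gives something exact to test a real implementation (or an LLM's output) against.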
Even test cases have brought me no luck. The code was poorly written, being too complicated and dynamic for test code in the best case and just wrong on average. It constantly generated test cases that would be fine for other definitions of "tree edit distance" but were nonsense for my version of a "tree edit distance".
What are you doing where any of this actually works? I'm not some jaded angry internet person, but I'm honestly so flabbergasted about why I just can't get anything good out of this machine.
That’s fine until your code makes its way to production, an unconsidered side effect occurs and then you have to face me.
You are still responsible for what you do regardless of the means you used to do it. And a lot of people use this not because it’s more productive but because it requires less effort and less thought because those are the hard bits.
I’m collecting stats at the moment, but the general trend in quality (measured as functional defects produced) is declining when an LLM is involved in the process.
So far it’s not a magic bullet but a push for mediocrity in an industry with a rather bad reputation. Never a good story.
> I've found that I haven't written a line of code in weeks
Which is great until your next job interview. Really, it's tempting in the short run but I made a conscious decision to do certain tasks manually only so that I don't lose my basic skills.
But "lines of code written" is a hollow metric to prove utility. Code literacy is more effective than code illiteracy.
Lines of natural language vs. discrete code is a kind of preference. Code is exact, which makes it harder to recall and master, but it provides information density.
> by just knuckling down and learning how to do the work?
This is the key for me. What work? If it's the years of learning and practice toward the proficiency to "know it when you see it", then I agree.
> I've found that I haven't written a line of code in weeks
How are people doing this? I never use any of the code that gpt4o/copilot/sonnet spits out because it never meets my standards. How are other people accepting the shit it spits out?
For someone who didn't study a STEM subject or CS in school, I've gone from 0 to publishing a production modern looking app in a matter of a few weeks (link to it on my profile).
Sure, it's not the best (most maintainable, non-redundant styling) code that's powering the app but it's more than enough to put an MVP out to the world and see if there's value/interest in the product.
I use Sonnet 3.5, and while it's actually usable for codegen (compared to gpt/copilot), it's still really not that great. It does well at tasks like "here's a stinky collection of tests that accrued over time -- clean this up in the style of x", but actually writing code still shows a fundamental lack of understanding of the underlying API and problem (the most banal example being constantly generating the `x || Array.isArray(x)` test).
> HN, and the internet in general, have become just an ocean of reactionary sandbagging and blather about how "useless" LLMs are.
Now imagine how profoundly depressing it is to visit a HN post like this one, and be immediately met with blatant tribalism like this at the very top.
Do you genuinely think that going on a performative tirade like this is what's going to spark a more nuanced conversation? Or would you rather just the common sentiment be the same as yours? How many rounds of intellectual dishonesty do we need to figure this out?
> Meanwhile, in the real world, I've found that I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
could it be that you are mostly engaged in "boilerplate coding", where LLMs are indeed good?
People in general don't like change and naturally defend against it. And the older people get, the greater the percentage of people fighting against it. A very useful and powerful skill is to be flexible and adaptable. You've positioned yourself among the happy few.
> Meanwhile, in the real world, I've found that I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
Comment on first principles:
Following the dictum that you can't prove the absence of bugs, only their presence, the idea of what constitutes "working code" deserves much more respect.
From an engineering perspective, either you understand the implementation or you don't. There's no meaning to an iterative loop of producing "working" code.
Stepwise refinement is a design process under the assumption that each step is understood, within a process of exploring how a solution matches a problem. The steps refine the definition of the problem, to which is applied an understanding of how to compute a solution. The meaning of working code lies in the appropriateness of the solution to the definition of the problem. Adjust either or both to unify and make sense of the matter.
The discipline of programming is rotting when the definition of "working" is copying code from an oracle and running it to see if it goes wrong.
The measure of works must be an engineering claim of understanding the chosen problem domain and solution. Understanding belongs to the engineer.
LLMs do not understand and cannot be relied upon to produce correct code.
If use of an LLM puts the engineer in contact with proven principles, materials and methods which he adapts to the job at hand, while the engineer maintains understanding of correctness, maybe that's a gain.
But if the engineer relies on the LLM transformer as an oracle, how does the engineer locate the needed understanding? He can't get it from the transformer: he's responsible for checking the output of the transformer!
OTOH if the engineer draws on understanding from elsewhere, what is the value of the transformer but as a catalog? As such, who has accountability for the contents of the catalog? It can't be the transformer because it can't understand. It can't be the developer of the transformer because he can't explain why the LLM produces any particular result! It has to be the user of the transformer.
So a system of production is being created whereby the engineer's going-in position is that he lacks the understanding needed to code a solution and he sees his work as integrating the output of an oracle that can't be relied upon.
The oracle is a peculiar kind of calculator with an unknown probability of generating relevant output, working at superhuman speeds, while the engineer is reduced to an operator verifying that output at human speeds.
This looks like a feedback system for risky results and a slippery slope towards heretofore unknown degrees of incorrectness and margins for error.
At the same time, the only common vernacular for tracking oracle veracity is in arcane version numbers, which are believed, based on rough experimentation, to broadly categorize the hallucinatory tendencies of the oracle.
The broad trend of adoption of this sketchy tech is in the context of industry which brags about seeking disruption and distortion, regards its engineers as cost centers to be exploited as "human resources", and is managed by a specialized class of idiot savants called MBAs.
Get this incredible technology into infrastructure and in control of life sustaining systems immediately!
I also do OR-adjacent work, but I've had much less luck using 4o for formulating MIPs. It tends to deliver correct-looking answers with handwavy explanations of the math, but the equations don't work and the reasoning doesn't add up.
It's a strange experience, like taking a math class where the proofs are weird and none of the lessons click for you, and you start feeling stupid, only to learn your professor is an escaped dementia patient and it was gobbledygook to begin with.
I had a similar experience yesterday using o1 to see if a simple path exists from s to t through v, using max flow. It gave me a very convincing-looking algorithm that was fundamentally broken. My working solution used some techniques from its failed attempt, but even after repeated hints it failed to figure out a working answer (it stubbornly kept finding s->t flows, rather than realizing v->{s,t} was the key).
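(For anyone curious about the trick being hinted at: a simple s-t path through v exists iff there are two internally vertex-disjoint paths from v to s and from v to t, which node-splitting turns into a single max-flow check. A rough sketch with networkx, assuming an undirected input graph:)

```python
import networkx as nx

def simple_path_through(G, s, t, v):
    """True iff some simple s-t path in undirected graph G passes through v.

    Equivalent to two internally vertex-disjoint paths v->s and v->t;
    enforce vertex capacity 1 by splitting each node into in/out halves."""
    H = nx.DiGraph()
    for u in G.nodes:
        H.add_edge((u, "in"), (u, "out"), capacity=1)
    for a, b in G.edges:
        H.add_edge((a, "out"), (b, "in"), capacity=1)
        H.add_edge((b, "out"), (a, "in"), capacity=1)
    H[(v, "in")][(v, "out")]["capacity"] = 2    # v is shared by both paths
    H.add_edge((s, "out"), "sink", capacity=1)  # one unit must reach s...
    H.add_edge((t, "out"), "sink", capacity=1)  # ...and one unit must reach t
    value, _ = nx.maximum_flow(H, (v, "in"), "sink")
    return value == 2
```

The vertex capacities are what o1 kept missing: without node splitting, two flow paths may reuse an intermediate vertex and the combined walk isn't simple.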
It's also extremely mentally fatiguing to check its reasoning. I almost suspect that RLHF has selected for obfuscating its reasoning, since obviously-wrong answers are easier to detect and penalize than subtly-wrong answers.
Yep. We need research into how long it takes experts to repair faulty answers vs. generate them on their own.
Benchmarking 10,000 attempts on an IQ test is irrelevant if, on most of those attempts, the time taken to repair an answer is longer than the time to complete the test yourself.
I find it's useful for generating exemplars in areas you're roughly familiar with but want some elaboration on, or a refresher. You can stitch it all together to get further, but when it comes time to actually build something -- you need to start from scratch.
The time taken to reproduce what it's provided, now that you understand it, is trivial compared to the time needed to repair its flaws.
I'm currently teaching a course on MIP, and out of interest I tried asking 4o some questions I ask students. It could give the 'basic building blocks' (how to do x!=y, how to do a knapsack), but as soon as I asked it a vaguely interesting question that wasn't "bookwork", I don't think any of its answers were right.
I'm interested in how you seem to be getting better answers than me (or maybe I just discard the answer and write it myself once I can see it's wrong?)
In fact, I just asked it to do (and explain) x!=y for x,y integer variables in the range {1..9}, and while the constraints are right, the explanation isn't.
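(For reference, the standard big-M encoding I'd expect here, written out so the correct explanation is on record: one binary b chooses which variable is larger, and M = 9 suffices because |x - y| <= 8 for x, y in {1..9}:)

```latex
x - y \ge 1 - 9\,b \qquad \text{(if } b = 0 \text{, forces } x \ge y + 1\text{)} \\
y - x \ge 1 - 9\,(1 - b) \qquad \text{(if } b = 1 \text{, forces } y \ge x + 1\text{)} \\
b \in \{0, 1\}
```

Whichever side is "switched off" relaxes to a bound of -8, which is vacuous over the given ranges.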
I also work in OR, and I have had the complete opposite experience with respect to MILP optimization (and the research actually agrees; there was a big survey paper published earlier this year showing LLMs were mostly correct on textbook problems but got more and more useless as complexity and novelty increased).
The results are boilerplate at best, but misleading and insidious at worst, especially when you get into detailed tasks. Ever try to ask an LLM what a specific constraint does, or worse, ask it to explain the mathematical model behind some proprietary CPLEX syntactic sugar? It hallucinates the math, the syntax, the explanation, everything.
Can you point me to that paper? What version of the model were they using?
Have you tried again with the latest LLMs? ChatGPT4 actually (correctly) explains what each constraint does in English -- it doesn't just provide the constraint when you ask it for the formulation. Also, not sure if CPLEX should be involved at all -- I usually just ask it for mathematical formulations, not CPLEX calling code (I don't use CPLEX). The OR literature primarily contains math formulations and that's where LLMs can best do pattern matching to problem shape.
I had the same experience with computational geometry.
Very good at giving a textbook answer ("give a Python/ Numpy function that returns the Voronoi diagram of set of 2d points").
Now, I ask for the Laguerre diagram, a variation that is not mentioned in textbooks but is very useful in practice. I can spend a lot of time spoon-feeding it the answer, and I still just get bullshitting-student answers.
I tried other problems like numerical approximation, physics simulation, same experience.
I don't get the hype. Maybe it's good at giving variations of glue code, i.e. Stack Overflow meets autocomplete? As a search tool it's bad because it's so confidently incorrect that you may be fooled by bad answers.
> But many of the low-effort comments seem to mostly fall into (1) and (2) -- cynicism rather than cautious optimism.
One good riposte to reflexive LLM-bashing is, "Isn't that just what a stochastic parrot would say?" Some HN'ers would dismiss a talking dog because the C code it wrote has a buffer overflow error.
It's understandable that people whose careers and lifelong skill sets are seemingly on the precipice of obsolescence are going to be extremely hostile to that threat.
How many more years is senior SWE work going to be a $175k/yr gig instead of a $75k check-what-the-robot-does gig?
It also doesn't help that Lean has had so many breaking changes in so little time. When I tried using GPT-4 for it, it mostly rendered old code that would fail to run unless you already knew the answer and how to fix it, which basically made it entirely unhelpful.
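For a concrete sense of the churn: even a one-line lemma reads differently across versions. Lean 3 would write it with lowercase `nat` and a `begin ... end` tactic block, while in Lean 4 the same thing is (a minimal sketch):

```lean
-- Lean 4: term-mode proof via the core library lemma
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A model trained mostly on Lean 3 era code tends to produce output that won't even parse under Lean 4.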
> people who complain on HN that (paid/good - only Sonnet 3.5 and GPT4o are in this category)
Correction: I complain that the only decent model in "Open"AI's arsenal, that is GPT-4, has been replaced by the cheaper GPT-4o, which gives subpar answers to most of my questions (I don't care that it does so faster). As they've moved it to the "old, legacy" models, I expect they will phase it out, at which point I'll cancel my OpenAI subscription and Sonnet 3.5 will become the clear leader for my daily tasks.
Kudos to Anthropic for their great work, you guys are going in the right direction.
I'm not sure the Lean coverage of pure math research is that extensive (maybe 1% is represented in mathlib). But I think a system like AlphaProof could even today be useful for mathematicians -- I mostly dislike systems like o1, which confidently say nonsense with such high frequency. But I think the value is already there.
I take cynicism over unbridled optimism. People speak as if we were on the cusp of technological singularity, but I've seen nothing to indicate we're not already past the inflection point of the logistic curve, and well into diminishing returns territory.
_can_ GPT be tuned more heavily on Lean?
It looks like the amount of Python code in the corpus outnumbers Lean something like 1000:1. Although I guess OpenAI could generate more Lean and train on that.
Or (4) LLMs simply do not work properly for many use cases, in particular where large volumes of training data don't exist in the corpus.
And in these scenarios, rather than saying "I don't know", they will over and over again gaslight you with incoherent answers.
But sure, condescendingly blame the user for their ignorance and inability to understand or use the tool properly. Or call their criticism low-effort.
Yeah I have been using them to help with learning graduate maths as a grad student. Claude Sonnet 3.5 was unparalleled and the first quite useful one. GPT4o preview seems about equal (based on cutting and pasting the past six months of prompts into it).
The first profession AI seems on track to decimate is programming -- in particular, the brilliant but remote individual contributor. There is an obvious conflict of interest in this forum.
I see this theory a lot, but mostly from people who haven’t tried pair coding with a quality LLM. In fact, these LLMs give experienced developers superpowers; you can be crazy productive with them.
If you think we are close to the maximum useful software in the world already, then maybe. I do not believe that. Seeing software production time and costs drop one to two orders of magnitude means we will have very different viable software production processes. I don’t believe for a second that it disenfranchises quality thinkers; it empowers them.
Before it can replace the brilliant programmer, it needs to be able to replace the mediocre programmer. There is so much programming and other tech/it related work that businesses or people want, but can't justify paying even low tech salaries in America for.
So far, there is little chance of a non-technical person developing a technical solution to their problems using AI.
The programmers who will find LLMs most useful are those who, prior to LLMs, were copying and pasting from Stack Overflow and asking questions online about everything they were doing -- tasks that LLMs have precisely replaced (they have now memorized all that boilerplate code, those consensus answers, and the API usage examples).
The developers who will find LLMs the least useful are the "brilliant" ones who never found any utility in any of that stuff, partly because they are not reinventing the wheel for the 1000th time, but instead addressing more challenging and novel problems.
The conflict of interest might have something to do with the fact that OpenAI's CEO/founder was once a major figure in Y Combinator. But I think you wanted to insinuate that the conflict of interest ran in the other direction.
Once ChatGPT can even come close to replacing a junior engineer, you can retry your claim. The progression of the tech underlying ChatGPT will be sub-linear.
I doubt it. It can do some impressive stuff for sure, but I very rarely get a perfectly working answer out of ChatGPT. Don't get me wrong, it's often extremely useful as a starting point and time saver, but it clearly isn't close to replacing anyone vaguely competent.
The important point, I feel, is that most people are not even at the level of intelligence of "a mediocre, but not completely incompetent, graduate student." A mediocre graduate science student, especially of the sort who graduates and doesn't quit, is a very impressive individual compared to the rest of us.
For "us", having such a level of intelligence available as an assistant throughout the day is a massive life upgrade, if we can just afford more tokens.
My sheer productivity boost from these models is miraculous. It's like upgrading from a text editor to a powerful IDE. I've saved a mountain of hours just by removing tedious time sinks -- one-off language syntax, remembering patterns for some framework, migrating code, etc. And this boost applies to nearly all of my knowledge work.
Then I see contrarians claiming that LLMs are literally never useful for anyone, and I get "don't believe your lying eyes" vibes. At this point, such sentiments feel either willfully ignorant, or said in bad faith. It's wild.
Anyone intelligent enough to make a living programming likely has more than enough IQ to become a mediocre, somewhat competent graduate student in math.
They just don't have the background, and probably lack the interest to dedicate a few years of study to get to that level.
>A mediocre graduate science student, especially of the sort who graduates and doesn't quit, is a very impressive individual compared to the rest of us.
Incorrect. Graduating from university shows a good work ethic, a certain character, and an ability to manage time. It's not a measure of being better than the rest of humanity, and it's not a good measure of intelligence either. If you insist on viewing the world through credentials: academics don't consider your intelligence until you have a Ph.D. and X years of work in your field, and industry only uses a degree as an entry requirement for junior roles, then favors and cares only about your years of experience.
Given that statement, I can only assume you haven't been to university. You are mistaken to think, especially in the time we are in now, that the elite class is any more knowledgeable than you are.
Which is why I think the AI era isn't hype but very much real. Jensen said AI has reached its iPhone era.
We won't have AGI or ASI (whatever definitions people attach to those terms) in the next 5-10 years. But I often like to read AI as Assisted or Augmented Intelligence. And it will provide enough value to drive computer and smartphone sales for at least another 5-10 years, or 3-4 cycles.
Terry is a genius that can get that value out of an LLM.
Average Joe can't do anything like that yet, both because he won't be as good at prompting the model, and because his problems in life aren't text-based anyway.
To be honest, I have gotten 100x more useful answers out of Siri's WolframAlpha integration than I ever have out of ChatGPT. People don't want a "not completely incompetent graduate student" responding to their prompts; they want NLP that reliably processes information. Last-generation voice assistants could at least do their job consistently; ChatGPT couldn't be trusted to flick a light switch on a regular basis.
I use both for different things. WolframAlpha is great for well-defined questions with well-defined answers. LLMs are often great for anything that doesn't fall into that.
I use Home Assistant with the Extended OpenAI integration from HACS. Let me tell you, it’s orders of magnitude better than generic voice assistants. It can understand my requests fairly flexibly without me having a literal memory of every device in the house. I can ask for complex tasks like turning every light in the basement on -- without there being a "basement" zone -- because it infers from the names. I have air quality sensors throughout, and I can ask it to turn on the fan in areas with low air quality, and it literally does it without my programming an automation.
Usually Alexa will order 10,000 rolls of toilet paper and ship them to my boss when I ask it to turn on the bathroom fan.
Personally, though, the utility of this level of skill (beginner grad in many areas) for me is in areas where I have undergraduate-level questions. While I literally never ask it questions in my own field, I do for many other fields I don’t know well, to help me learn. Over the summer my family traveled and I was home alone, so I fixed and renovated tons of stuff I didn’t know how to do. I wore a headset and had the voice mode of ChatGPT on. I just asked it questions as I went, and it answered. This enabled me to complete dozens of projects I didn’t know how to even start otherwise. If I had had to stop and search the web, sift through forums and SEO hellscapes, read loosely related instructions, and try to synthesize my answers, I would have gotten two rather than thirty projects done.
How does this square up with literally what Terence Tao (TFA) writes about O1? Is this meant to say there's a class of problems that O1 is still really bad at (or worse than intuition says it should be, at least)? Or is this "he says, she says" time for hot topics again on HN?
Then you have a skill issue. 10 million people pay for GPT monthly because a large share of them are getting useful value out of it. WolframAlpha has been out for a while and didn't take off, for a reason. "GPT couldn't be trusted to flick a light switch on a regular basis" pretty much implies you are not serious, or that your knowledge of the capabilities of LLMs is dated or derived from things you have read.
Even more amazing, there are plenty - PLENTY - of posters here who routinely either completely shit on LLMs or casually dismiss them as "hype", "useless", and what have you.
I've been saying this for quite some time now, but some people are in for a very rude awakening when the SOTA models 5-10 years from now are able to completely replace senior devs and engineers.
Better buckle up, and start diversifying your skills.
The way I see it, these models -- especially o1 -- are an intelligence booster. If you start with zero, it gives you back zero. Especially if you’re genuinely trying to use it and not just trying to do some gotcha stuff.
Diversifying into what? When AI can fully replace senior developers, the world as we know it is over. Best case, capitalism enters terminal decline: buy rifles. Worst case, hope that whatever comes out the other side is either benevolent or implodes quickly.
I mean, pay several hundred to thousands of grad students to do RLHF for several years and you get a corpus of grad-student text. I'm not surprised at all. AI companies hire grad students to do RLHF in every subject (chemistry, physics, math, etc).
The grad students write the prompts and correct the model, and all of that is fed into a "more advanced" model. It's corpora of text. Repeat this for every grade level and subject.
Ask a model that's been trained on chemistry grad-level work a simple math question and it will probably get it wrong. They aren't "smart". It's aggregations of text and ways to sample and then predict.
Except you’re talking about a general-purpose foundation model that’s doing all these subjects at once. It’s not like you choose a subject-specific model with Claude or gpt-o1.
The key isn’t whether these things are smart or not. The key is that they put something that can answer basic grad-level questions on almost any subject within everyone’s reach. For people who don’t have a graduate-level education in any subject, this is a remarkable tool.
I don’t know why the statement “wow, this is useful and a remarkable step forward” is always met with “yeah, but it’s not actually smart.” So? Half of all humans have an IQ below 100. They’re not smart either. Is that their value? For a machine, being able to produce accurate answers to most basic graduate-level questions is -science fiction- regardless of whether it’s “smart.”
The NLP feat alone is stunning, and going from basically one step above gibberish to “basic grad school” in two years is a jaw-dropping rate of change. I suspect folks who quibble over whether it’s “real intelligence” or simply a stochastic parrot have lost the ability to dream.
The o1 model is really remarkable. I was able to get very significant speedups to my already highly optimized Rust code in my fast vector similarity project, all verified with careful benchmarking and validation of correctness.
Not only that, it also helped me reimagine and conceptualize a new measure of statistical dependency based on Jensen-Shannon divergence that works very well. And it came up with a super fast implementation of normalized mutual information, something I originally tried to include in the library but for which I struggled to find anything fast enough when dealing with large vectors (say, 15,000 dimensions and up).
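(For anyone unfamiliar: the Jensen-Shannon divergence itself is easy to state -- with M = (P + Q)/2, it's the average of KL(P||M) and KL(Q||M). A minimal NumPy sketch of the base quantity, emphatically not the optimized Rust from the project:)

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (in bits) between two discrete distributions."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):  # Kullback-Leibler divergence; eps guards log(0)
        return float(np.sum(a * np.log2((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With base-2 logs it's bounded in [0, 1], which is part of what makes it attractive as a normalized dependency score.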
While it wasn’t able to give perfect Rust code that compiled on the very first try, it was able to fix all the bugs in one more try after I pasted in all the compiler problems from VS Code. In contrast, gpt-4o would usually take dozens of tries to fix the many Rust type errors, lifetime/borrowing errors, and so on that it would inevitably introduce. And Claude 3.5 Sonnet is just plain stupid when it comes to Rust for some reason.
I really have to say, this feels like a true game changer, especially when you have really challenging tasks that you would be hard pressed to find many humans capable of helping with (at least without shelling out $500k+/year in compensation).
And it’s not just the performance optimization and relatively bug free code— it’s the creative problem solving and synthesis of huge amounts of core mathematical and algorithmic knowledge plus contemporary research results, combined with a strong ability to understand what you’re trying to accomplish and making it happen.
Here is the diff to the code file showing the changes:
But a lot of what you pay humans $500k a year for is to work with enormous existing systems that an LLM cannot understand just yet. Optimizing small libraries and implementing fast functions though is a huge improvement in any programmer's toolbox.
Yes, that’s certainly true, and that’s why I selected that library in particular to try with it. The fact that it’s mathematical -- not many lines of code, but each line packs a lot of punch and requires careful thought to optimize -- makes it a perfect test bed for this model in particular. For larger projects that are simpler, you’re probably better off with Claude 3.5 Sonnet, since it has double the context window.
And sometimes it just bugs out and doesn't give any response? Faced that twice now, it "thought" for like 10-30s then no answer and I had to click regenerate and wait for it again.
Thinking about training LLMs on geometry. A lot of information in the sources would be contained in the diagrams accompanying the text. This model is not multi-modal, so maybe it wasn't trained on the accompanying diagrams at all.
I would really like it if people tried a set of geometry questions and a set of analysis questions and compared the difference.
It will be trash. I'll have to dig up a chat I had the weekend GPT4 was released, I was musing about dodecahedron packing problems and GPT4 started with an assertion that a line through a sphere intersects the surface 3 times.
Maybe if you fine-tuned it on Euclid's Elements and then allowed it to run experiments with Mathematica snippets, it could check its assumptions before spouting nonsense.
The novelty to me is that “the experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student” in so many subject areas! I have found great value in using LLMs to sort things out. In areas where I am very experienced, they can be really helpful with tons of small chores. Like Terence was pointing out in his third experiment -- if you break the problem down, it does solid work filling in the smaller blanks. You need the conceptual understanding. Part of this is prompting skill. If you go into an area you don’t know, you have to build the prompts up: when the answer is known, dive into something small and specific and work outward; when you’re starting from the outside in, start specific and focused. I have used this to cut through conceptual layers of very complex topics I have zero knowledge in, and then verify my concepts via experts on YT, research papers, and trusted sources. It is an amazing tool.
This has been my experience as well. I treat LLMs like an intern or junior who can do the legwork that I have no bandwidth to do myself. I have to supervise it and help it along, checking for mistakes, but I do get useful results in the end.
Attitudinally, I suspect the people who are able to get value out of LLMs (paid ones - free ones are no good) are those who have had experience supervising interns or mentoring juniors, rather than grizzled lone individual contributors who never learned how to coax value out of people -- a camp I was in myself for most of my early career.
One of the most interesting aspects of this thread is how it brings us back to the fundamentals of attention in machine learning [1]. This is a key point: while humans have intelligence, our attention is inherently limited. This is why the concept behind Attention Is All You Need [2] is so relevant to what we're discussing.
My 2 cents: our human intelligence is the glue that binds everything together.
‘Able to make the same creative mathematical leaps as Terence Tao’ seems like a pretty high bar to be setting for AI.
This is like when you’re being interviewed for a programming job and the interviewer explains some problem to you that it took their team months to figure out, and then they’re disappointed you can’t whiteboard out the solution they came up with in 40 minutes without access to google.
My experience of working with people like Terence Tao, and being nowhere near their standard, is that they are looking for any kind of creativity. Everything is accepted, and it doesn't have to be "at their level".
Having read what he's saying there, and with my experience, I think your characterisation is inaccurate.
And having been at the talk he gave for the IMO earlier this year he is impressed with some of the interactions, it's just that he feels that any kind of "creative spark" is still missing.
Right, Terence was hoping it would offer something new to think about, some new perspective, right or wrong. GPTs have the ability to process insane amounts of information across all branches of math, science, and art -- an ability that eclipses that of even the most motivated intellectuals, Terence included. It is thus a little disappointing that it was unable to find anything in its vast knowledge base to apply a new lens to the problem.
I wonder what the creative spark even is in the context of an autoregressive transformer.
Perhaps it’s an ability to confabulate facts into the context window which are not present in the training data but which are, in the context of maths, viable hypotheses? Every LLM can generate bullshit, but maybe we just need the right bullshit?
> ‘Able to make the same creative mathematical leaps as Terence Tao’ seems like a pretty high bar to be setting for AI.
There's no need to try to infer this kind of high bar, because what he says is actually very specific and concrete: "Here the result was mildly disappointing ... Essentially the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." Crucially the blog post in question was part of his input to ChatGPT.
Otherwise, he's been clear that while he anticipates a future where it is more useful, at present he only uses AI/ChatGPT for bibliography formatting and for writing out simple "Hello World" style code. (He is, after all, a mathematician and not a coder.) I've seen various claims online that he's using ChatGPT all the time to help with his research and, beyond the coding usage, that just seems to not be true.
(However, it's fair to say that "able to help Terence Tao with research" is actually a high bar.)
This has been observed by more people than just Terence Tao. Try using ChatGPT to program something more complex than tutorial code, or to write a basic blog post: it lacks creativity and the code is poorly designed.
Even for basic Rust programs it ties itself into countless borrow-checker issues and cannot get out of them -- both the OpenAI models and Sonnet (Anthropic).
It doesn't really get logic still, but it does small edits well when the code is very clear.
I think this will always remain a problem. Because it can never shut up, it keeps making stuff up and "hallucinating" (working normally, just incorrectly), digging itself further and further into a hole.
Autocomplete on steroids is what peak AI will look like till the time we can crack consciousness and AGI (which the modern versions are nothing even close to).
If arguably the person with the highest IQ currently living is impressed, but still not fully satisfied because a computer doesn’t produce Nobel-prize-winning mathematical reasoning, I think that’s a massive measure of progress in itself.
So what then should the first year maths PhD think? I believe Tao obliquely addresses this with his previous post with effectively “o1 is almost as good as a grad student”
> If arguably the person with the highest IQ currently living, is impressed but still not fully satisfied that a computer doesn’t give Nobel prize winning mathematical reasoning
No offense, but every part of this characterization is really unserious! He says "the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." That's very different from what you're suggesting.
The way you're talking, it sounds like it'd be actually impossible for him to meaningfully say anything negative about AI. Presumably, if he was directly critical of it, it would only be because his standards as the world's smartest genius must simply be too high!
In reality, he's very optimistic about it in the future but doesn't find it useful now except for basic coding and bibliography formatting. It's fascinating to see how this very concrete and easily understood sentiment is routinely warped by the Tao-as-IQ-genius mythos.
It's interesting that humans would also benefit from "chain of thought" type reasoning. In fact, I would argue all students studying math would greatly increase their competence if they were required to recall all relevant definitions and information before using them. We don't do this in practice (including teachers and mathematicians!) because recall is effortful, and we don't like to spend more effort than necessary to solve a problem. If recall fails, then we have to look up information, which takes even more effort. This is why, in practice, there is a tremendous incentive to just "wing it".
AI has no emotional barrier to wasted effort, which makes it a better reasoner than its innate ability would suggest.
Showing your work in tests is kind of like “chain of thought” reasoning, but there’s a slight difference. Both force you to break down your process step by step, making sure the logic holds and you aren’t skipping crucial steps. But while showing your work is more about demonstrating the correct procedure, “chain of thought” reasoning pushes you to recall relevant definitions and concepts as you go, ensuring a deeper understanding. In both cases, the goal is to avoid just “winging it,” but “chain of thought” really digs into the recall aspect, which humans tend to avoid because it’s effortful.
Wow! I love this take. Somehow, with all this evidence of CoT helping LLMs, I never thought about using it more myself. Sure, we kind of do it already, but definitely not to the degree LLMs do, at least not usually. Maybe that's why writing is so often admired as a way to do great thinking - it enables longer chains of thought with less effort.
I assumed that everybody did this when trying to solve a maths problem they are stuck on (thinking university type level maths rather than school maths) and when I was teaching I would always get people to go back to the definitions.
I wasn't amazing at maths research (did a PhD and post-doc and then gave up) but my experience was that it was partly thinking hard about things and grappling with what was going on and trying to break it down somehow, but also scanning everything you know related to the problem, trying to find other problems that resemble it in some way that you can steal ideas from etc.
I'm so excited in anticipation of my near-term return to studying math as an independent curiosity hobby. It's going to be epically fun this time around with LLMs to lean on. Coincidentally, like Terence Tao, I've also been asking complex analysis queries* of LLMs, things I was trying to understand better while working through textbooks. Their ability to interpret open-ended math questions, and quickly find distant conceptual links that are helpful and relevant, astonishes me. Fields laureate Professor Tao (naturally) looks down on the current crop of mathematics LLMs—"not completely incompetent graduate student..."—but at my current ability level that just means looking up.
*(I remember a specific impressive example from 6 months ago: I asked if certain definitions could be relaxed to allow complex analysis on a non-orientable manifold, like a Klein bottle, something I spent a lot of time puzzling over, and an LLM instantly figured out it would make the Cauchy-Riemann equations globally inconsistent. (In a sense the arbitrary sign convention in CR defines an orientation on a manifold: reversing manifold orientation is the same as swapping i with -i. I understand this now, solely because an LLM suggested looking at it). Of course, I'm sure this isn't original LLM thinking—the math's certainly written down somewhere in its training material, in some highly specific postgraduate textbook I have no knowledge of. That's not relevant to me. For me, it's absolutely impossible to answer this type of question, where I have very little idea where to start, without either an LLM or a PhD-level domain specialist. There is no other tool that can make this kind of semantic-level search accessible to me. I'm very carefully thinking how best to make use of such an, incredibly powerful but alien, tool...)
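For anyone who wants the concrete statement (standard textbook material, my own summary rather than the LLM's output): for f = u + iv the Cauchy-Riemann equations read

```latex
% Cauchy-Riemann equations for f = u + iv:
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y},
\qquad
\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}
% Swapping i with -i (i.e. passing to \bar{f} = u - iv) sends v to -v
% and flips both signs, so a globally consistent sign choice amounts
% to a choice of orientation -- impossible on a non-orientable manifold.
```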
I agree. Having access to a kind of semantic full search engine on basically all textbooks on Earth feels like a superpower. Even better would be if it could pinpoint the exact textbook references it found the answer in.
How will we even measure this? Benchmarks are gamed/trained on and there is no way that there is much signal in the chatbot arena for these types of queries?
I think in just a few months the average user will not be able to tell the difference in performance between the major models.
Meanwhile, in the real world, I've found that I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
It's entirely a learned skill, the models (and very importantly the tooling around them) have arrived at the base line they needed.
A much, much more productive world, just by knuckling down and learning how to do the work.
edit: https://aider.chat/ + paid 3.5 sonnet
What gets me is that everyone who says they've become more productive with LLMs won't say how, exactly. I can talk about how Vim has made it more enjoyable to edit code (keybindings and motions), how Emacs is a good environment for text tooling (a Lisp machine), how I use technical books to further my learning (so many great books out there). But no one really shows how they're actually solving problems with LLMs and why the alternatives were worse for them. It's all claims that it's great, with no further elaboration on the workflows.
> I haven't written a line of code in weeks. Just paragraphs of text that specify what I want and then guidance through and around pitfalls in a simple iterative loop of useful working code.
Code is intent described in terms of machine actions. Those actions can be masked by abstracting them into more understandable units, so we don't have to write opcodes; we can use Python instead. Programming is basically making the intent clear enough that we know which units we can use. Software engineering is mostly selecting the units so as to do minimal work when the intent changes or the foundational actions do.
Chatting with an LLM looks to me like your intent is either vague or you don't know which units to use. If it's the former, then I guess you're assuming it is the expert and will guide you to the solution you seek, which means you believe it understands the problem better than you do. The latter is stranger, as it looks like playing around with car parts while ignoring the manuals they come with.
What about boilerplate and common scenarios? I agree that LLMs help a great deal with those, but the fact is that there were already perfectly good tools for that, like snippets, templates, and code generators.
For the effect on the industry, I generally make the point that even if AI only replaces the below-average coder, it will put downward pressure on above-average coders' compensation expectations.
Personally, I think humans appear to be getting dumber at the same time that AI is getting smarter, and while, for now, the crossover point is at a low threshold, that threshold will of course increase over time. I used to try to teach ontologies, stats, and SMT solvers to humans before giving up and switching to AI technologies, where success is not predicated on human understanding. I used to think that the inability of most humans to understand these topics was a matter of motivation, but I have recently come to understand that these limitations are generally innate.
Even test cases have brought me no luck. The code was poorly written, being too complicated and dynamic for test code in the best case and just wrong on average. It constantly generated test cases that would be fine for other definitions of "tree edit distance" but were nonsense for my version of a "tree edit distance".
What are you doing where any of this actually works? I'm not some jaded angry internet person, but I'm honestly so flabbergasted about why I just can't get anything good out of this machine.
You are still responsible for what you do regardless of the means you used to do it. And a lot of people use this not because it’s more productive but because it requires less effort and less thought because those are the hard bits.
I’m collecting stats at the moment, but the general trend in quality -- as in functional defects produced -- is declining when an LLM is involved in the process.
So far it’s not a magic bullet but a push for mediocrity in an industry with a rather bad reputation. Never a good story.
Which is great until your next job interview. Really, it's tempting in the short run but I made a conscious decision to do certain tasks manually only so that I don't lose my basic skills.
But "lines of code written" is a hollow metric to prove utility. Code literacy is more effective than code illiteracy.
Lines of natural language vs discrete code is a kind of preference. Code is exact which makes it harder to recall and master. But it provides information density.
> by just knuckling down and learning how to do the work?
This is the key for me. What work? If it's the years of learning and practice toward proficiency to "know it when you see it" then I agree.
How are people doing this? I never end up using any of the code that GPT-4o/Copilot/Sonnet spit out, because it never meets my standards. How are other people accepting the shit it spits out?
Sure, it's not the best (most maintainable, non-redundant styling) code that's powering the app but it's more than enough to put an MVP out to the world and see if there's value/interest in the product.
This is cult like behaviour that reminds me so much of the crypto space.
I don't understand why people are not allowed to be critical of a technology or not find it useful.
And if they are they are somehow ignorant, over-reacting or deficient in some way.
Please post a video of your workflow.
It’s incredibly valuable for people to see this in action, otherwise they, quite legitimately, will simply think this is not true.
Now imagine how profoundly depressing it is to visit a HN post like this one, and be immediately met with blatant tribalism like this at the very top.
Do you genuinely think that going on a performative tirade like this is what's going to spark a more nuanced conversation? Or would you rather just the common sentiment be the same as yours? How many rounds of intellectual dishonesty do we need to figure this out?
could it be that you are mostly engaged in "boilerplate coding", where LLMs are indeed good?
Comment on first principles:
Following the dictum that you can't prove the absence of bugs, only their presence, the idea of what constitutes "working code" deserves much more respect.
From an engineering perspective, either you understand the implementation or you don't. There's no meaning to an iterative loop of producing "working" code.
Stepwise refinement is a design process under the assumption that each step is understood in a process of exploration of the matching of a solution to a problem. The steps are the refinement of definition of a problem, to which is applied an understanding of how to compute a solution. The meaning of working code is in the appropriateness of the solution to the definition of the problem. Adjust either or both to unify and make sense of the matter.
The discipline of programming is rotting when the definition of "working" is copying code from an oracle and running it to see if it goes wrong.
The measure of works must be an engineering claim of understanding the chosen problem domain and solution. Understanding belongs to the engineer.
LLMs do not understand and cannot be relied upon to produce correct code.
If use of an LLM puts the engineer in contact with proven principles, materials and methods which he adapts to the job at hand, while the engineer maintains understanding of correctness, maybe that's a gain.
But if the engineer relies on the LLM transformer as an oracle, how does the engineer locate the needed understanding? He can't get it from the transformer: he's responsible for checking the output of the transformer!
OTOH if the engineer draws on understanding from elsewhere, what is the value of the transformer but as a catalog? As such, who has accountability for the contents of the catalog? It can't be the transformer because it can't understand. It can't be the developer of the transformer because he can't explain why the LLM produces any particular result! It has to be the user of the transformer.
So a system of production is being created whereby the engineer's going-in position is that he lacks the understanding needed to code a solution and he sees his work as integrating the output of an oracle that can't be relied upon.
The oracle is a peculiar kind of calculator with an unknown probability of generating relevant output that works at superhuman speeds, while the engineer is reduced to an operator in the position of verifying that output at human speeds.
This looks like a feedback system for risky results and a slippery slope towards heretofore unknown degrees of incorrectness and margins for error.
At the same time, the only common vernacular for tracking oracle veracity is in arcane version numbers, which are believed, based on rough experimentation, to broadly categorize the hallucinatory tendencies of the oracle.
The broad trend of adoption of this sketchy tech is in the context of industry which brags about seeking disruption and distortion, regards its engineers as cost centers to be exploited as "human resources", and is managed by a specialized class of idiot savants called MBAs.
Get this incredible technology into infrastructure and in control of life sustaining systems immediately!
It's a strange experience, like taking a math class where the proofs are weird and none of the lessons click for you, and you start feeling stupid, only to learn your professor is an escaped dementia patient and it was gobbledygook to begin with.
I had a similar experience yesterday using o1 to see if a simple path exists from s to t through v, using max flow. It gave me a very convincing-looking algorithm that was fundamentally broken. My working solution used some techniques from its failed attempt, but even after repeated hints it failed to figure out a working answer (it stubbornly kept finding s->t flows, rather than realizing v->{s,t} was the key).
It's also extremely mentally fatiguing to check its reasoning. I almost suspect that RLHF has selected for obfuscating its reasoning, since obviously-wrong answers are easier to detect and penalize than subtly-wrong answers.
Benchmarking 10,000 attempts on an IQ test is irrelevant if, on most of those attempts, the time taken to repair an answer is longer than the time to complete the test yourself.
I find it's useful for generating exemplars in areas you're roughly familiar with but want some elaboration on, or a refresher. You can stitch it all together to get further, but when it comes time to actually build something, you need to start from scratch.
The time taken to reproduce what it's provided, now that you understand it, is trivial compared to the time needed to repair its flaws.
I'm interested in how you seem to be getting better answers than me (or maybe I just discard the answer and write it myself once I can see it's wrong?).
In fact, I just asked it to do (and explain) x!=y for x,y integer variables in the range {1..9}, and while the constraints are right, the explanation isn't.
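For reference, the usual big-M encoding of x != y introduces one binary variable b (this is the standard textbook construction, not necessarily the chat's exact output). A quick brute-force check over the stated range confirms the constraints do capture x != y:

```python
from itertools import product

# Big-M linearization of x != y for integers x, y in {1..9}.
# One binary b selects a branch:
#   x >= y + 1 - M * (1 - b)   (b = 1 forces x > y)
#   y >= x + 1 - M * b         (b = 0 forces y > x)
# M = 9 is large enough here, since |x - y| never exceeds 8.
M = 9

def encodes_neq(x, y):
    # The pair (x, y) is feasible iff some value of b satisfies both rows.
    return any(
        x >= y + 1 - M * (1 - b) and y >= x + 1 - M * b
        for b in (0, 1)
    )

# Feasible exactly when x != y -- the constraints are right.
assert all(
    encodes_neq(x, y) == (x != y)
    for x, y in product(range(1, 10), repeat=2)
)
```

The explanation to check for is exactly what the comments above say: b picks which strict inequality is active, and M deactivates the other one.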
https://chatgpt.com/share/66e652e1-8e2c-800c-abaa-92e29e0550...
The results are boilerplate at best, but misleading and insidious at worst, especially when you get into detailed tasks. Ever tried asking an LLM what a specific constraint does, or worse, asking it to explain the mathematical model behind some proprietary CPLEX syntactic sugar? It hallucinates the math, the syntax, the explanation, everything.
Have you tried again with the latest LLMs? ChatGPT4 actually (correctly) explains what each constraint does in English -- it doesn't just provide the constraint when you ask it for the formulation. Also, not sure if CPLEX should be involved at all -- I usually just ask it for mathematical formulations, not CPLEX calling code (I don't use CPLEX). The OR literature primarily contains math formulations and that's where LLMs can best do pattern matching to problem shape.
Many of the standard formulations are in here:
https://msi-jp.com/xpress/learning/square/10-mipformref.pdf
All the LLM is doing is fitting the problem description to a combination of these formulations (and others).
Very good at giving a textbook answer ("give a Python/ Numpy function that returns the Voronoi diagram of set of 2d points").
Now I ask for the Laguerre diagram, a variation that is not mentioned in textbooks but is very useful in practice. I can spend a lot of time spoon-feeding it, and I still just get the bullshitting student's answers.
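For readers unfamiliar with it: the Laguerre (power) diagram just replaces squared Euclidean distance with the power distance |x - p|^2 - w for a weighted site (p, w). A minimal sketch of cell membership (my own illustration of the definition, not LLM output, and not a full diagram construction):

```python
def power_cell_owner(x, sites):
    # sites: list of ((px, py), weight) pairs.
    # Site i owns the query point x when its power distance
    # |x - p_i|^2 - w_i is minimal.
    def power_dist(site):
        (px, py), w = site
        return (x[0] - px) ** 2 + (x[1] - py) ** 2 - w
    return min(range(len(sites)), key=lambda i: power_dist(sites[i]))

# With equal weights this reduces to the ordinary Voronoi assignment.
sites = [((0.0, 0.0), 0.0), ((2.0, 0.0), 0.0)]
print(power_cell_owner((0.4, 0.0), sites))  # 0: closer to the first site
```

A large enough weight can pull a point into a farther site's cell, which is exactly what makes the variation useful in practice.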
I tried other problems like numerical approximation, physics simulation, same experience.
I don't get the hype. Maybe it's good at giving variations of glue code, i.e. Stack Overflow meets autocomplete? As a search tool it's bad because it's so confidently incorrect that you may be fooled by bad answers.
One good riposte to reflexive LLM-bashing is, "Isn't that just what a stochastic parrot would say?" Some HN'ers would dismiss a talking dog because the C code it wrote has a buffer overflow error.
How many more years is senior SWE work going to be a $175k/yr gig instead of a $75k check-what-the-robot-does gig?
Correction: I complain that the only decent model in "Open"AI's arsenal, GPT-4, has been replaced by the cheaper GPT-4o, which gives subpar answers to most of my questions (I don't care that it does so faster). Since they've moved it to the "old, legacy" models, I expect they will phase it out, at which point I'll cancel my OpenAI subscription and Sonnet 3.5 will become the clear leader for my daily tasks.
Kudos to Anthropic for their great work, you guys are going in the right direction.
I tried to use 4/4o for a MIP several months ago. Frequently, it would iterate through three or four bad implementations over and over.
Claude 3.5 has been a significant improvement. I don’t really use chatgpt for anything at this point.
Nothing is static in the way things are moving.
Would you be willing to pay even more, if it meant you were getting proportionally more valuable answers?
E.g. $200/month or $2,000/month (assuming the $2,000/month gets into employee/intern/contractor level of results.)
This might drive a positive feedback loop.
Or (4): LLMs simply do not work properly for many use cases, in particular where large volumes of training data don't exist in the corpus.
And in these scenarios rather than say "I don't know" it will over and over again gaslight you with incoherent answers.
But sure condescendingly blame on the user for their ignorance and inability to understand or use the tool properly. Or call their criticism low-effort.
“The experience seemed roughly on par with trying to advise a mediocre, but not completely incompetent, graduate student.”
Compare that with interacting with the equivalent of Alexa. That's a remarkable difference over five years.
If you think we are close to the maximum useful software in the world already, then maybe. I do not believe that. Seeing software production and time costs drop one to two orders of magnitude means we will have very different viable software production processes. I don’t believe for a second that it disenfranchises quality thinkers; it empowers them.
So far, there is little chance of a non-technical person developing a technical solution to their problems using AI.
The developers who will find LLMs the least useful are the "brilliant" ones who never found any utility in any of that stuff, partly because they are not reinventing the wheel for the 1000th time, but instead addressing more challenging and novel problems.
AI just destroyed shutterstock.
What it may do is change the job requirements. Web/JS decimated (reduced by 90% or more) MFC C++ jobs, after all.
The programmer doesn't just write Python. That is the how... not the what.
Once ChatGPT can even come close to replacing a junior engineer, you can retry your claim. The progression of the tech underlying ChatGPT will be sub-linear.
For "us", having such a level of intelligence available as an assistant throughout the day is a massive life upgrade, if we can just afford more tokens.
Then I see contrarians claiming that LLMs are literally never useful for anyone, and I get "don't believe your lying eyes" vibes. At this point, such sentiments feel either willfully ignorant, or said in bad faith. It's wild.
They just don't have the background, and probably lack the interest to dedicate studying for a few years to get to that level.
Intelligence is probably a distant third.
Incorrect. A university degree shows a good work ethic, a certain character, and an ability to manage time. It's not a measure of being better than the rest of humanity, and it's not a good measure of intelligence either. If you only want to view the world through credentials: academics don't consider your intelligence until you have a Ph.D. and X years of work in your field, and industry only uses a degree as an entry requirement for junior roles, caring only about your years of experience after that. Given that statement, I can only assume you haven't been to university. You are mistaken to think, especially in the time we are in now, that the elite class is any more knowledgeable than you are.
We won't have AGI or ASI, whatever definition people attach to those terms, in the next 5-10 years. But I would often like to refer to AI as Assisted or Augmented Intelligence. And it will provide enough value to drive current computer and smartphone sales for at least another 5-10 years, or 3-4 cycles.
Average Joe can't do anything like that yet, both because he won't be as good at prompting the model, and because his problems in life aren't text-based anyway.
Usually Alexa will order 10,000 rolls of toilet paper and ship them to my boss when I ask it to turn on the bathroom fan.
Personally, though, the utility of this level of skill (beginning grad student in many areas) is, for me, in areas where I have undergraduate-level questions. While I literally never ask it questions in my own field, I do for many other fields I don't know well, to help me learn. Over the summer my family traveled and I was home alone, so I fixed and renovated tons of stuff I didn't know how to do. I wore a headset with ChatGPT's voice mode on, asked it questions as I went, and it answered. This enabled me to complete dozens of projects I wouldn't have known how to even start otherwise. If I had had to stop and search the web, sift through forums and SEO hellscapes, and read loosely related instructions to synthesize my own answers, I would have gotten two rather than thirty projects done.
I've been saying this for quite some time now, but some people are in for a very rude awakening when the SOTA models 5-10 years from now are able to completely replace senior devs and engineers.
Better buckle up, and start diversifying your skills.
The grad students write the prompts, correct the model, and all of that is fed into a "more advanced" model. It's corpora of text. Repeat this for every grade level and subject.
Ask the model that's being trained on grad-level chemistry work a simple math question and it will probably get it wrong. They aren't "smart." They're aggregations of text plus ways to sample and predict.
The key isn’t whether these things are smart or not. The key is that they put something that can answer basic grad-level questions on almost any subject within everyone’s reach. For people who don’t have a graduate-level education in any subject, this is a remarkable tool.
I don’t know why the statement that “wow this is useful and a remarkable step forward” is always met with “yeah but it’s not actually smart.” So? Half of all humans have an IQ less than 100. They’re not smart either. Is this their value? For a machine, being able to produce accurate answers to most basic graduate level questions is -science fiction- regardless of whether it’s “smart.”
The NLP feat alone is stunning, and going from basically one step above gibberish to “basic grad school” in two years is a mouth dropping rate of change. I suspect folks who quibble over whether it’s “real intelligence” or simply a stochastic parrot have lost the ability to dream.
Not only that, it also helped me reimagine and conceptualize a new measure of statistical dependency based on Jensen-Shannon divergence that works very well. And it came up with a super fast implementation of normalized mutual information, something I tried to include in the library originally but struggled to find something fast enough when dealing with large vectors (say, 15,000 dimensions and up).
While it wasn’t able to give perfect Rust code that compiled on the very first try, it was able to fix all the bugs in one more try after I pasted in all the compiler warnings from VS Code. In contrast, gpt-4o would usually take dozens of tries to fix all the many Rust type errors, lifetime/borrowing errors, and so on that it would inevitably introduce. And Claude 3.5 Sonnet is just plain stupid when it comes to Rust for some reason.
I really have to say, this feels like a true game changer, especially when you have really challenging tasks that you would be hard pressed to find many humans capable of helping with (at least without shelling out $500k+/year in compensation for).
And it’s not just the performance optimization and relatively bug free code— it’s the creative problem solving and synthesis of huge amounts of core mathematical and algorithmic knowledge plus contemporary research results, combined with a strong ability to understand what you’re trying to accomplish and making it happen.
Here is the diff to the code file showing the changes:
https://github.com/Dicklesworthstone/fast_vector_similarity/...
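For readers unfamiliar with the idea mentioned above: a Jensen-Shannon-divergence-based dependency measure can be framed as the JSD between the empirical joint distribution of two variables and the product of their marginals. Here's a minimal histogram-based sketch in Python; the function name, binning choice, and structure are my own illustration, not the library's actual API or algorithm:

```python
import numpy as np

def js_dependency(x, y, bins=16):
    # Empirical joint distribution from a 2D histogram
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()

    # Product of marginals: what the joint would look like under independence
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    q = p_x * p_y

    def kl(a, b):
        # KL divergence in bits, skipping zero-probability cells
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    m = 0.5 * (p_xy + q)
    # JSD is symmetric and, with log base 2, bounded in [0, 1]:
    # near 0 means (empirically) independent, larger means more dependent
    return 0.5 * kl(p_xy, m) + 0.5 * kl(q, m)
```

With log base 2 the score lies in [0, 1]: close to 0 for independent samples (up to finite-sample bias from the histogram), and large for a deterministic relationship like y = x.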
cough
And now we have a $number we can relate, and refer, to.
For example, I asked a pretty simple question here and it got completely confused:
https://moorier.com/math-chat-1.png
https://moorier.com/math-chat-2.png
https://moorier.com/math-chat-3.png
(Full chat should be here: https://chatgpt.com/share/66e5d2dd-0b08-8011-89c8-f6895f3217...)
I would really like it if people tested it on a set of geometry questions and a set of analysis questions and compared the difference.
Maybe if you fine-tuned it on Euclid's Elements and then allowed it to run experiments with Mathematica snippets, it could check its assumptions before spouting nonsense.
Attitudinally, I suspect the people able to get value out of LLMs (paid ones - free ones are no good) are those who have experience supervising interns or mentoring juniors, rather than grizzled lone individual contributors who don't know how to coax value out of people -- I was in that latter camp myself for most of my early career.
One of the most interesting aspects of this thread is how it brings us back to the fundamentals of attention in machine learning [1]. This is a key point: while humans have intelligence, our attention is inherently limited. This is why the concept behind Attention Is All You Need [2] is so relevant to what we're discussing.
My 2 cents: our human intelligence is the glue that binds everything together.
[1] https://en.wikipedia.org/wiki/Attention_(machine_learning)
[2] https://en.wikipedia.org/wiki/Attention_Is_All_You_Need
This is like when you’re being interviewed for a programming job and the interviewer explains some problem to you that it took their team months to figure out, and then they’re disappointed you can’t whiteboard out the solution they came up with in 40 minutes without access to google.
Having read what he's saying there, and with my experience, I think your characterisation is inaccurate.
And having been at the talk he gave for the IMO earlier this year, I can say he is impressed with some of the interactions; it's just that he feels any kind of "creative spark" is still missing.
Perhaps it’s an ability to confabulate facts into the context window which are not present in the training data but which are, in the context of maths, viable hypotheses? Every LLM can generate bullshit, but maybe we just need the right bullshit?
There's no need to try to infer this kind of high bar, because what he says is actually very specific and concrete: "Here the result was mildly disappointing ... Essentially the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." Crucially the blog post in question was part of his input to ChatGPT.
Otherwise, he's been clear that while he anticipates a future where it is more useful, at present he only uses AI/ChatGPT for bibliography formatting and for writing out simple "Hello World" style code. (He is, after all, a mathematician and not a coder.) I've seen various claims online that he's using ChatGPT all the time to help with his research and, beyond the coding usage, that just seems to not be true.
(However, it's fair to say that "able to help Terence Tao with research" is actually a high bar.)
It doesn't really get logic still, but it does small edits well when the code is very clear.
I think this will always remain a problem. Because it can never shut up, it keeps making stuff up and "hallucinating" (working normally, just incorrectly), digging itself further and further into a hole.
Autocomplete on steroids is what peak AI will look like till the time we can crack consciousness and AGI (which the modern versions are nothing even close to).
If arguably the person with the highest IQ currently living is impressed, but still not fully satisfied that a computer doesn't give Nobel-prize-winning mathematical reasoning, I think that's a massive metric in itself.
So what then should the first-year maths PhD think? I believe Tao obliquely addresses this in his previous post with, effectively, "o1 is almost as good as a grad student."
No offense, but every part of this characterization is really unserious! He says "the model proposed the same strategy that was already identified in the most recent work on the problem (and which I restated in the blog post), but did not offer any creative variants of that strategy." That's very different from what you're suggesting.
The way you're talking, it sounds like it'd be actually impossible for him to meaningfully say anything negative about AI. Presumably, if he was directly critical of it, it would only be because his standards as the world's smartest genius must simply be too high!
In reality, he's very optimistic about it in the future but doesn't find it useful now except for basic coding and bibliography formatting. It's fascinating to see how this very concrete and easily understood sentiment is routinely warped by the Tao-as-IQ-genius mythos.
AI has no emotional barrier to wasted effort, which makes them better reasoners than their innate ability would suggest.
I wasn't amazing at maths research (I did a PhD and post-doc and then gave up), but my experience was that it was partly thinking hard about things, grappling with what was going on, and trying to break it down somehow, but also scanning everything you know related to the problem, trying to find other problems that resemble it in some way you can steal ideas from, etc.
(I remember a specific impressive example from 6 months ago: I asked if certain definitions could be relaxed to allow complex analysis on a non-orientable manifold, like a Klein bottle, something I had spent a lot of time puzzling over, and an LLM instantly figured out it would make the Cauchy-Riemann equations globally inconsistent. (In a sense the arbitrary sign convention in CR defines an orientation on a manifold: reversing manifold orientation is the same as swapping i with -i. I understand this now, solely because an LLM suggested looking at it.) Of course, I'm sure this isn't original LLM thinking -- the math is certainly written down somewhere in its training material, in some highly specific postgraduate textbook I have no knowledge of. That's not relevant to me. For me, it's absolutely impossible to answer this type of question, where I have very little idea where to start, without either an LLM or a PhD-level domain specialist. There is no other tool that can make this kind of semantic-level search accessible to me. I'm thinking very carefully about how best to make use of such an incredibly powerful but alien tool...)
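For readers curious about the sign convention mentioned above (my own paraphrase of the standard argument, not the LLM's output): writing f = u + iv, the Cauchy-Riemann equations are

```latex
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y},
\qquad
\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.
```

Under an orientation-reversing coordinate change such as (x, y) → (x, −y), the minus sign migrates to the other equation, which is exactly the condition for the conjugate u − iv (i.e. i swapped with −i) to be holomorphic. So a globally consistent choice of CR sign amounts to a choice of orientation, and no such choice exists on a non-orientable surface like the Klein bottle.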
I think in just a few months the average user will not be able to tell the difference in performance between the major models.