machinelearning (u/machinelearning)

machinelearning commented on Show HN: Automate robot data quality improvement github.com/RoboticsData/s... · Posted by u/machinelearning

marshavoidance · 2 months ago

this adds a tool to “grade” robot demo episodes by analyzing blur, collisions, and movement smoothness, then filters the bad ones out of the dataset. Seems like a pragmatic way to tackle the data quality problem in robotics would be great to see how much it moves the needle in real training runs.

machinelearning · a month ago

Training is expensive so I wouldn't necessarily call it "pragmatic".

- the tool's goal is actually to provide a lightweight, practical way to avoid wasting training cycles on bad data.

Evals for robotics are also expensive.

- validation loss is a poor proxy of robot performance because success is underconstrained by imitation learning data

- most robot evals today are either done in sim (which at best serves as a proxy) or by a human scoring success in the real world (which is expensive).

It's great if you have evals and want to backtrack (we're building tools for that too) but you definitely don't want to discover you have bad data after all that effort (learned that the hard way, multiple times).

The metrics the tool scores vary from tedious to impossible for a human to sanity check so there's some non-obvious practical value in automating some of it.

machinelearning commented on Fei-Fei Li: Spatial intelligence is the next frontier in AI [video] youtube.com/watch?v=_PioN... · Posted by u/sandslash

jandrewrogers · 6 months ago

I appreciate the video and generally agree with Fei-Fei but I think it almost understates how different the problem of reasoning about the physical world actually is.

Most dynamics of the physical world are sparse, non-linear systems at every level of resolution. Most ways of constructing accurate models mathematically don’t actually work. LLMs, for better or worse, are pretty classic (in an algorithmic information theory sense) sequential induction problems. We’ve known for well over a decade that you cannot cram real-world spatial dynamics into those models. It is a clear impedance mismatch.

There are a bunch of fundamental computer science problems that stand in the way, which I was schooled on in 2006 from the brightest minds in the field. For example, how do you represent arbitrary spatial relationships on computers in a general and scalable way? There are no solutions in the public data structures and algorithms literature. We know that universal solutions can’t exist and that all practical solutions require exotic high-dimensionality computational constructs that human brains will struggle to reason about. This has been the status quo since the 1980s. This particular set of problems is hard for a reason.

I vigorously agree that the ability to reason about spatiotemporal dynamics is critical to general AI. But the computer science required is so different from classical AI research that I don’t expect any pure AI researcher to bridge that gap. The other aspect is that this area of research became highly developed over two decades but is not in the public literature.

One of the big questions I have had since they announced the company, is who on their team is an expert in the dark state-of-the-art computer science with respect to working around these particular problems? They risk running straight into the same deep, layered theory walls that almost everyone else has run into. I can’t identify anyone on the team that is an expert in a relevant area of computer science theory, which makes me skeptical to some extent. It is a nice idea but I don’t get the sense they understand the true nature of the problem.

Nonetheless, I agree that it is important!

machinelearning · 6 months ago

"Most ways of constructing accurate models mathematically don’t actually work" > This is true for almost anything at the limit, we are already able to model spatiotemporal dynamics to some useful degree (see: progress in VLAs, video diffusion, 4D Gaussians)

"We’ve known for well over a decade that you cannot cram real-world spatial dynamics into those models. It is a clear impedance mismatch" > What's the source that this is a physically impossible problem? Not sure what you mean by impedance mismatch but do you mean that it is unsolvable even with better techniques?

Your whole third paragraph could have been said about LLMs and isn't specific enough, so we'll skip that.

I don't really understand the other 2 paragraphs, what's this "dark state-of-the-art computer science" you speak of and what is this "area of research became highly developed over two decades but is not in the public literature" how is "the computer science required is so different from classical AI research"?

machinelearning commented on Good Writing paulgraham.com/goodwritin... · Posted by u/oli5679

pdonis · 7 months ago

> Are the standards for whether something “sounds bad” based on the average person’s reading or the intended audience.

As pg describes it in the article, it's neither; it's based on the writer's judgment. The writer of course is writing for some intended audience, and their judgment of what sounds good or sounds bad should be influenced by that. But pg is describing the writer's process of judging what they write.

machinelearning · 7 months ago

> The reason is that it makes the essay easier to read. It's less work to read writing that flows well. How does that help the writer? Because the writer is the first reader

Note that the writer's judgement only serves as an initial proxy for how well the essay reads. This implies that the reader, whoever that is, is the true judge of how well it reads. My point is that that group is ill defined.

If it were sufficient for the writer to be the only judge of how well something reads, surely PG wouldn't feel the need to have other proofread his essays. And surely it is not sufficient for someone who lacks taste to judge their own writing as good.

The way I read that statement is the same as the startup advice of "build what you would yourself want". However you still have to validate that the market exists and is big.

There is really nothing profound in that paragraph anyway, all it is saying is that a writer should edit and proofread their work. That whole paragraph could be deleted honestly. It is obvious table stakes for one to edit their work. What differentiates good from bad is a matter of taste + who is judging it.

machinelearning commented on Good Writing paulgraham.com/goodwritin... · Posted by u/oli5679

rossdavidh · 7 months ago

So, this would seem to be fairly easy to test empirically. Get a reasonably object measurement of the quality of writing, and use it on something where you know if it's true:

1) court testimony which we know (from outside evidence) is either true or not true 2) scientific papers which we know to have been reproducible, or not 3) stock pundits predictions about the future of some company or other, which we know with hindsight to have been accurate or not

Much more convincing to me than any amount of good writing about writing, would be to have some empirical evidence.

machinelearning · 7 months ago

So one thing to note is that the essay mentions that it refers specifically to "writing that is used to develop ideas" vs. "writing meant to describe others ideas".

The way I interpret this is that it refers to claims that build on each other to come to a conclusion. So the way to test for truth is to somehow test each claim and the conclusion, which could vary in difficulty based on the kind of claims being made.

As this essay exemplifies, it is difficult to test for truth if you make broad claims that are so imprecise that they can't be verified or don't tell you anything interesting when verified using reasonable assumptions.

machinelearning commented on Good Writing paulgraham.com/goodwritin... · Posted by u/oli5679

pdonis · 7 months ago

That writing that sounds bad is likely to be wrong.

machinelearning · 7 months ago

The issue with this article is that it is very imprecise.

Are the standards for whether something “sounds bad” based on the average person’s reading or the intended audience.

In its most general form (how the median article sounds to the median person), the argument is pretty vacuous.

Most writing discusses simple ideas and they should sound good (familiar, easy, pleasurable) to the median person.

But the most valuable kind of writing could sound tedious and filled with incomprehensible terminology to the median person but concise and interesting to the intended audience.

The current way the idea is stated doesn’t sound correct because you can convincingly defend all 4 quadrants of the truth table.

machinelearning commented on Good Writing paulgraham.com/goodwritin... · Posted by u/oli5679

machinelearning · 7 months ago

> “So here we have the most exciting kind of idea: one that seems both preposterous and true.”

Am I missing something or is the “seems true” part taking too many liberties here?

If anything, as described in the previous few sentences, the premise seems false, not true.

Kind of ironic since the line sounds right but isn’t rigorously right, so it undercuts the main argument.