Readit News
ege_erdil commented on How to Automate Software Engineering   mechanize.work/blog/how-t... · Posted by u/Tamaybes
AnimalMuppet · 10 months ago
At the risk of coming across as a smart aleck, 40 years of experience building software.

What experienced software engineers have is a sense of taste - this looks like good code and/or design, that doesn't. But they don't have data; they have, at best, a couple of anecdotes. It's more a sense of "that was harder to work with than it should have been; that approach seems to have drawbacks". But you only get a few examples of that in a career.

And there are very few outfits compiling usable data that could shape the approaches that software engineers use.

So I don't think how humans got there was primarily data.

ege_erdil · 10 months ago
that's not the relevant data i'm talking about

how much real-world data do you think went into the evolution of the human brain and all its learning algorithms?

having 40 years of experience building software gives you no more insight into that than having 40 years of experience using language gives you insight into where your language skills come from

ege_erdil commented on How to Automate Software Engineering   mechanize.work/blog/how-t... · Posted by u/Tamaybes
mattgreenrocks · 10 months ago
OpenAI spent several years prior to ChatGPT getting AI to play Dota 2 well. They got some good results out of that, but it was a subset of the game: only a handful or two of characters. I’m not sure why they stopped; maybe that was when they pivoted to something more general?

Regardless, I consider software dev far more dimensional (i.e. more nuanced) than Dota 2, even if a lot of patterns recur on a smaller scale in the code itself. If they weren’t able to crack Dota 2, why should I believe that software eng is just around the corner?

ege_erdil · 10 months ago
we don't think it's just around the corner
ege_erdil commented on How to Automate Software Engineering   mechanize.work/blog/how-t... · Posted by u/Tamaybes
jackb4040 · 10 months ago
> The key question now is: what data do we need, exactly?

What if I told you that's not the key question, and the "more data" approach has obviously and publicly hit a wall that requires causal reasoning to move past?

ege_erdil · 10 months ago
then i would disagree
ege_erdil commented on How to Automate Software Engineering   mechanize.work/blog/how-t... · Posted by u/Tamaybes
AnimalMuppet · 10 months ago
Pretty sure that however humans crossed it, it wasn't just "a data problem".
ege_erdil · 10 months ago
what makes you sure about that?
ege_erdil commented on How to Automate Software Engineering   mechanize.work/blog/how-t... · Posted by u/Tamaybes
dasil003 · 10 months ago
> The roadmap to success will most likely start with training or fine-tuning on data from human professionals performing the task, and proceed with reinforcement learning in custom environments designed to capture more of the complexity of what people do in their jobs.

> [...]

> We think this is essentially a data problem, not an algorithms problem.

This is extremely hand-wavy. How are you going to instrument the various thought processes and non-verbal communication that go into building successful software? A huge part of it is intuition about what makes sense to other humans. It's related to the idea of common sense, but in the software world there's this layer of unforgiving determinism and rigidity that most humans don't want to deal with. I just don't see how AI crosses that chasm.

ege_erdil · 10 months ago
how do you think humans cross that chasm?
ege_erdil commented on Chinchilla Scaling: A replication attempt   arxiv.org/abs/2404.10102... · Posted by u/tosh
saurabh20n · 2 years ago
Looks like you’re one of the authors.

It would be nice if you could post whether the actual data matches your reconstruction, now that you have it in hand. That would help us stop worrying about the data provenance and focus on the result you found.

ege_erdil · 2 years ago
we're not sure if the actual data exactly matches our reconstruction, but one of the authors pointed out to us that we can exactly reproduce their scaling law if we make the mistake they made when fitting it to the data

what they did was to take the mean of the loss values across datapoints instead of summing them and used L-BFGS-B with the default tolerance settings, so the optimizer terminated early, and we can reproduce their results with this same mistake

so our reconstruction appears to be good enough
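
The mistake described above can be illustrated with a toy fit. This is a hedged sketch with made-up data, not the actual Chinchilla fitting code: plain gradient descent with a gradient-norm stopping rule stands in for L-BFGS-B's default tolerance check. The point is that averaging a loss instead of summing it shrinks the gradient by a factor of n, so a fixed tolerance fires far earlier.

```python
# Toy illustration (invented data, not the Chinchilla fit): averaging a loss
# instead of summing it makes the gradient n times smaller, so a fixed
# gradient-norm tolerance (analogous to L-BFGS-B's default gtol) triggers
# early termination, leaving the fit farther from the optimum.

def fit_slope(xs, ys, reduce_by_mean, gtol=1e-2, lr=0.01, max_steps=100_000):
    """Fit y = a*x by gradient descent on squared error; stop when |grad| < gtol."""
    a = 0.0
    n = len(xs)
    for _ in range(max_steps):
        grad = sum(2.0 * (a * x - y) * x for x, y in zip(xs, ys))
        if reduce_by_mean:
            grad /= n          # mean loss: gradient shrinks by a factor of n
        if abs(grad) < gtol:   # fixed tolerance -> fires much earlier for the mean
            break
        a -= lr * grad
    return a

xs = [k / 100 for k in range(1, 101)]
ys = [3.0 * x for x in xs]  # true slope is 3

a_sum = fit_slope(xs, ys, reduce_by_mean=False)
a_mean = fit_slope(xs, ys, reduce_by_mean=True)
# a_sum lands much closer to the true slope than a_mean does
```

The issue is not the optimizer itself but the interaction between loss scaling and a fixed stopping tolerance; with `scipy.optimize.minimize(..., method="L-BFGS-B")`, the analogous knobs are `gtol` and `ftol` in `options`.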

ege_erdil commented on Chinchilla Scaling: A replication attempt   arxiv.org/abs/2404.10102... · Posted by u/tosh
cgearhart · 2 years ago
TL;DR—couldn’t exactly replicate their results, but broadly confirmed their findings. They agree that the optimal range is 5–40 tokens per parameter, and close to 20 for the “chinchilla” model from the original paper.

Very unusual choice to reconstruct the dataset by eyeballing the graph in the source paper (why not just ask for it…?) and it’s not really clear why the result is dressed up behind the salacious-seeming abstract.
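
For reference, the 5–40 tokens-per-parameter range (with ~20 near-optimal) can be sanity-checked against the published Chinchilla configuration of 70B parameters trained on 1.4T tokens. The arithmetic below is illustrative, not taken from the paper under discussion:

```python
# Sanity check of the ~20 tokens-per-parameter rule of thumb using the
# published Chinchilla configuration (70B parameters, 1.4T tokens).
params = 70e9
tokens = 1.4e12
ratio = tokens / params           # ~20, inside the 5-40 range quoted above

# Going the other way: with the standard training-FLOP estimate C ~ 6*N*D
# and the constraint D = 20*N, the compute budget C = 120*N**2 recovers
# the same parameter count.
C = 6 * params * tokens
N_opt = (C / 120) ** 0.5          # ~70e9
```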

ege_erdil · 2 years ago
we didn't eyeball the graph, there are more accurate ways of extracting the data from a pdf file than that

we did ask for the data but got no response until we published on arxiv

what is supposed to be "salacious" about the abstract?

ege_erdil commented on Chinchilla Scaling: A replication attempt   arxiv.org/abs/2404.10102... · Posted by u/tosh
warbaker · 2 years ago
Calling this a "replication attempt" implied to me that they tried to replicate the Chinchilla Scaling paper and found that it did not replicate, which would be a very big deal!

Instead, they just redid the analysis based on a figure in the paper and found that the old model with slightly different parameters gave a better fit to the data. This is a valuable contribution, but one a bit overstated by the paper's title, and the confrontational, "gotcha" tone of the paper is unwarranted.

A better framing would have been something like "Chinchilla Scaling: Reanalyzed".

ege_erdil · 2 years ago
one of their three approaches does not replicate and it's because of a software bug in the optimizer they used, i don't know what else we were supposed to say
ege_erdil commented on Chinchilla Scaling: A replication attempt   arxiv.org/abs/2404.10102... · Posted by u/tosh
magnio · 2 years ago
> To extract the data from the figure, we first downloaded the PDF from Hoffmann et al.’s arXiv submission and saved it in SVG format. We then parsed the SVG content to navigate and search the SVG structure. Within the SVG, we identified the group of points representing the scatter plot data and iterated over each point to extract its fill color and position (x and y coordinates) using the attributes of the corresponding SVG elements.

> To map the SVG coordinates to the model size and training FLOP values, we used the location of the labels or ticks on the respective axes. This allowed us to establish a correspondence between the SVG coordinates and the actual data values represented in the plot.

They ... reconstructed the data ... from a plot ... using a ruler and their eyes? Why not just email the original authors for the raw data? I can't help but feel like I'm watching @yuvaltheterrible debunk papers.

ege_erdil · 2 years ago
we did and gave them a two-week grace period to respond, but they only responded to us after we published on arxiv

also, we didn't reconstruct the data using a ruler, you can automate that entire process so that it's much more reliable than that
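
The automated extraction described in the quoted passage can be sketched roughly as follows. The inline SVG and the axis-tick anchor values here are invented for illustration; the real pipeline parsed the figure exported from Hoffmann et al.'s PDF, and the actual axes would typically be mapped in log space (e.g. for FLOP):

```python
# Hedged sketch of SVG scatter-plot extraction: parse the SVG tree, collect
# each point's pixel position and fill color, then map pixels to data values
# using the positions of two known axis ticks. The SVG below is a made-up
# stand-in for a real exported figure.
import xml.etree.ElementTree as ET

SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <g id="scatter">
    <circle cx="100" cy="300" r="2" fill="#1f77b4"/>
    <circle cx="250" cy="150" r="2" fill="#ff7f0e"/>
  </g>
</svg>"""

NS = {"svg": "http://www.w3.org/2000/svg"}

def make_axis_map(px0, val0, px1, val1):
    """Linear pixel-to-value map anchored on two axis ticks.
    For a log-scaled axis, pass log(values) and exponentiate the result."""
    def to_value(px):
        return val0 + (px - px0) * (val1 - val0) / (px1 - px0)
    return to_value

root = ET.fromstring(SVG)
# Hypothetical tick anchors; note the y axis is inverted in SVG coordinates
# (pixel y grows downward).
x_map = make_axis_map(100, 0.0, 300, 10.0)
y_map = make_axis_map(300, 0.0, 100, 10.0)

points = [
    (x_map(float(c.get("cx"))), y_map(float(c.get("cy"))), c.get("fill"))
    for c in root.findall(".//svg:g[@id='scatter']/svg:circle", NS)
]
# -> [(0.0, 0.0, '#1f77b4'), (7.5, 7.5, '#ff7f0e')]
```

The fill color is kept because, in a figure like the one quoted, color typically encodes a third variable (e.g. model size), which is then decoded against the colorbar ticks the same way.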

u/ege_erdil

Karma: 46 · Cake day: April 18, 2024