Half the dataset being synthetic is interesting. I wonder what that actually means. They say that Datology needed 2048 H100s to generate the synthetic data. Does that mean they were generating data using other open-weight LLMs? That seems like it would undermine the integrity of a "US based" dataset.
Personally, I don’t bother with Next.js at all.
LLMs reduce the effort to create a plausible PR down to virtually zero. Requiring a human to write the code is a good indicator that A. the PR has at least some technical merit and B. the human cares enough about the code to bother writing a PR in the first place.
Microsoft is doing more with GitHub than I can say for most of their products. I won't go to bat for the Xbox or Windows teams, but GitHub is... fine. Almost offensively usable.
> intermittent outages
Those seem like conflicting statements to me. The last outage was only 13 days ago: https://news.ycombinator.com/item?id=45915731.
Also, there have been increasing reports of open source maintainers dealing with LLM-generated PRs: https://news.ycombinator.com/item?id=46039274. GitHub seems perfectly positioned to help manage that issue, but in all likelihood will do nothing about it: '"Either you have to embrace the AI, or you get out of your career," Dohmke wrote, citing one of the developers who GitHub interviewed.'
I used to help maintain a popular open source library and I do not envy what open source maintainers are now up against.
https://aws.amazon.com/blogs/opensource/using-strands-agents...
1: https://www.decodingdiscontinuity.com/p/open-source-inflecti...
Also, didn't said company piss people off in some way that led to OpenTofu being created?