yxhuvud (u/yxhuvud) - Readit News

yxhuvud commented on I'm too dumb for Zig's new IO interface openmymind.net/Im-Too-Dum... · Posted by u/begoon

What's the benchmark for how long something can be pre-1.0? Seems like a nonsense argument.

yxhuvud · 2 days ago

Something can be pre-1.0 as long as there are no stability guarantees.

yxhuvud commented on How we exploited CodeRabbit: From simple PR to RCE and write access on 1M repos research.kudelskisecurity... · Posted by u/spiridow

KingOfCoders · 5 days ago

Did I misread the article, or did they take the tool config from the PR not the repo?

yxhuvud · 5 days ago

Unfortunately, that mostly has to be the case, or else the developer experience of configuring these tools would be too poor.

yxhuvud commented on MCP doesn't need tools, it needs code lucumr.pocoo.org/2025/8/1... · Posted by u/the_mitsuhiko

jahsome · 7 days ago

Are you referring to MCP? If so, it's fully spelled out in the first sentence of the first paragraph, and links to a more thorough post on the subject. That meets 2 of the 3 criteria you've dictated.

yxhuvud · 7 days ago

That was not the case when I commented. It has obviously been updated since then.

yxhuvud commented on MCP doesn't need tools, it needs code lucumr.pocoo.org/2025/8/1... · Posted by u/the_mitsuhiko

diggan · 7 days ago

> or at least a link to some other page that explains what is going on

There is a link to a previous post by the same author (within the first ten words even!), which contains the context you're looking for.

yxhuvud · 7 days ago

A link to a previous post is not enough, though of course appreciated. But it would be something I click on after I decide if I should spend time on the article or not. I'm not going on goose chases to figure out what the topic is.

yxhuvud commented on MCP doesn't need tools, it needs code lucumr.pocoo.org/2025/8/1... · Posted by u/the_mitsuhiko

yxhuvud · 7 days ago

First rule of writing about something that can be abbreviated: first give some explanation so people have an idea of what you are talking about. Either type out what the abbreviation stands for, include an explanation, or at least link to some other page that explains what is going on.

EDIT: This has since been fixed in the linked article, so this comment is outdated.

yxhuvud commented on PYX: The next step in Python packaging astral.sh/blog/introducin... · Posted by u/the_mitsuhiko

x3n0ph3n3 · 11 days ago

You assume the OS package manager I happen to be using even has packages for some of the libraries I want to use.

yxhuvud · 11 days ago

Or for that matter, that the ones they do have are compatible with packages that come from other places. I've seen language libraries be restructured when OS packagers got hold of them. That wasn't pretty.

yxhuvud commented on Leonardo Chiariglione – Co-founder of MPEG leonardo.chiariglione.org... · Posted by u/eggspurt

mike_hearn · 18 days ago

Universities love patent licensing. I don't think academia is the solution you're looking for.

yxhuvud · 18 days ago

The solution to that is to remove the ability to patent codecs.

yxhuvud commented on So you want to parse a PDF? eliot-jones.com/2025/8/pd... · Posted by u/UglyToad

yxhuvud · 21 days ago

Well, perhaps you are exposed only to special snowflakes of PDFs that come from a single source and are somewhat well formed and easy to extract from. Others, like me, work at companies that also have lots of PDFs, from many, many different sources, and there is no easy way to extract structured data or even text that always works.

yxhuvud commented on So you want to parse a PDF? eliot-jones.com/2025/8/pd... · Posted by u/UglyToad

throwaway4496 · 21 days ago

Yes, and don't for a second think this approach of rastering and OCR'ing is sane, let alone a reasonable choice. It is outright absurd.

yxhuvud · 21 days ago

No one has claimed that getting structured data out of PDFs is sane. What you seem to be missing is that there are no sane ways to get decent output. The reasonable choice would be to not even try, but business needs invalidate that choice. So what remains are the absurd ways to solve the problem.

yxhuvud commented on So you want to parse a PDF? eliot-jones.com/2025/8/pd... · Posted by u/UglyToad

throwaway4496 · 21 days ago

So you parse PDFs, but also OCR images, to somehow get better results?

Do you know you could just use the parsing engine that renders the PDF to get the output? I mean, why raster it, OCR it, and then use AI? Sounds like creating a problem to use AI to solve it.

yxhuvud · 21 days ago

Well, you clearly haven't parsed a wide variety of PDFs. Because if you had, you would have been exposed to PDFs that contain only images, or those that contain embedded text, but where that embedded text is utter nonsense and doesn't match what is shown on the page when rendered.

And that is before we even get into text structure, because as everyone knows, reading text is easier if things like paragraphs, columns and tables are preserved in the output. And guess what: if you just use the parsing engine for that, what you get out is a garbled mess.

u/yxhuvud

Karma: 3687 · Cake day: October 2, 2009