jawiggins (u/jawiggins)

jawiggins commented on Show HN: Xmloxide – an agent-made Rust replacement for libxml2 github.com/jonwiggins/xml... · Posted by u/jawiggins

sarchertech · 15 days ago

Only if it doesn’t use any unsafe code, which I don’t think is the case here.

jawiggins · 15 days ago

Is that true? I thought that if you compiled a Rust crate with `#![deny(unsafe_code)]`, there would not be any issues. xmloxide uses unsafe only in the C FFI layer, so the rest of the system should be fine.

jawiggins · 15 days ago

Yes, in testing I did add four fuzzing targets to the repo:

1. fuzz_xml_parse: throws arbitrary bytes at the XML parser in both strict and recovery mode

2. fuzz_html_parse: throws arbitrary bytes at the HTML parser

3. fuzz_xpath: throws arbitrary XPath expressions at the evaluator

4. fuzz_roundtrip: parse → serialize → re-parse, checking that the pipeline never panics
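As a language-neutral illustration of the round-trip property in item 4, here is a minimal Python sketch. It uses the standard library's `xml.etree.ElementTree` rather than xmloxide, and the function name `roundtrip_status` is hypothetical; it is not the project's actual fuzz target.

```python
import xml.etree.ElementTree as ET


def roundtrip_status(data: bytes) -> str:
    """Classify one input as 'rejected', 'stable', or 'unstable'."""
    try:
        first = ET.fromstring(data)  # parse
    except ET.ParseError:
        # Malformed input is fine as long as it is rejected cleanly.
        return "rejected"
    serialized = ET.tostring(first)  # serialize
    second = ET.fromstring(serialized)  # re-parse; must not fail
    # A second serialization should reproduce the first one exactly.
    return "stable" if ET.tostring(second) == serialized else "unstable"


print(roundtrip_status(b"<a><b x='1'>t</b></a>"))
print(roundtrip_status(b"<a><b></a>"))
```

A fuzzer would call a function like this with arbitrary bytes and treat any uncaught exception or an "unstable" result as a finding.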

Because this project uses memory-safe Rust, there isn't much need to hunt for the memory bugs that made up the majority of libxml2's CVEs.

There is a valid point about logic bugs or infinite loops, which could be present in any software package, and I'm not sure of a way to rule them out entirely here.

fourthark · 15 days ago

Does it fix the security flaws that caused the original project to be shut down?

jawiggins · 15 days ago

Because it was written in C, libxml2's CVE history has been dominated by use-after-free, buffer overflows, double frees, and type confusion. xmloxide is written in pure Rust, so outside the unsafe C FFI layer these entire vulnerability classes are eliminated at compile time.

nicoburns · 15 days ago

How does it compare to the original in terms of source code size (number of lines of code?)

jawiggins · 15 days ago

It's significantly smaller. Because Rust doesn't require header files or manual memory management, xmloxide is ~40k lines while libxml2 is ~150k lines.

kburman · 15 days ago

Amazing work! I'd love to hear more details about your workflow with Claude Code.

As a side note, and this isn't a knock on your project specifically, I think the community needs to normalize disclaimers for "vibe-coded" packages. Consumers really need to understand the potential risks of relying on agent-generated code upfront.

jawiggins · 15 days ago

Yeah, it's a fair point. I wondered if it might be irresponsible to publish the package because it was made this way, but I suspect I'm not the first person to try to develop a package with Claude Code, so I think the best I can do is be honest about it.

As for the workflow, I think the best advice I can give is to set up as many guardrails and tools as possible, so Claude can do as many iterations as possible before needing any intervention. So in this case I set up pre-commit hooks for linting and formatting, gave it access to the full testing suite, and let it rip. The majority of the work was done in a single thinking loop that lasted ~3 hours where Claude was able to run the tests, see what failed, and iterate until they all passed. From there, there were still lots of iterations to add features, clean up, test, and improve performance, but allowing Claude to iterate quickly on its own without my involvement was crucial.

wooptoo · 15 days ago

A comment on libxml, not on your work: Funny how so many companies use this library in production and not one steps in to maintain this project and patch the issues. What a sad state of affairs we are in.

jawiggins · 15 days ago

Yeah, I agree, maintaining open-source projects has been a weird thing for a long time.

I know a few companies have programs where engineers can designate specific projects as important and give them funds. But it doesn't happen enough to support all the projects that currently need work. Maybe AI coding tools will lower the cost of maintenance enough to improve this.

I do think there are two possible approaches that policy makers could consider.

1) There could probably be tax credits or deductions for SWEs who 'volunteer' their time to work on these projects.

2) Many governments have tried to create cyber reserve corps. I bet they could designate people as maintainers of key projects that they rely on, to maintain both the projects and a pool of people skilled with the tools that they deem important.

jawiggins commented on Structured outputs on the Claude Developer Platform claude.com/blog/structure... · Posted by u/adocomplete

gradys · 4 months ago

You might be using JSON mode, which doesn’t guarantee a schema will be followed, or structured outputs not in strict mode. It is possible to get the property that the response is either a valid instance of the schema or an error (e.g. for refusal).

jawiggins · 4 months ago

How do you activate strict mode when using pydantic schemas? It doesn't look like that is a valid parameter to me.

No, I don't get refusals, I see literally invalid JSON, like: `{"field": ["value...}`
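To make the failure concrete, here is a minimal sketch using only Python's standard `json` module. It shows that the truncated payload quoted above is rejected by a plain JSON parser before any schema check could run. The helper name `parse_or_error` and the `complete` string are hypothetical, added for illustration.

```python
import json


def parse_or_error(raw: str):
    """Return (obj, None) on success, or (None, message) if raw is not valid JSON."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        return None, f"{exc.msg} at char {exc.pos}"


truncated = '{"field": ["value...}'  # the shape of output quoted above
complete = '{"field": ["value"]}'    # hypothetical well-formed counterpart

print(parse_or_error(truncated))
print(parse_or_error(complete))
```

A wrapper like this at least turns a malformed response into an explicit error that can be retried or logged, rather than an exception deep inside a parsing helper.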

mmoskal · 4 months ago

OpenAI is using [0] LLGuidance [1]. You need to set strict:true in your request for schema validation to kick in though.

[0] https://platform.openai.com/docs/guides/function-calling#lar... [1] https://github.com/guidance-ai/llguidance

jawiggins · 4 months ago

I don't think that parameter is an option when using pydantic schemas.

    from openai import OpenAI
    from pydantic import BaseModel

    openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

    class FooBar(BaseModel):
        foo: list[str]
        bar: list[int]

    prompt = """#Task
    Your job is to reply with Foo Bar, a json object with foo, a list of strings, and bar, a list of ints
    """

    response = openai_client.chat.completions.parse(
        model="gpt-5-nano-2025-08-07",
        messages=[{"role": "system", "content": prompt}],
        max_completion_tokens=4096,
        seed=123,
        response_format=FooBar,
        strict=True,  # the keyword that parse() rejects
    )

TypeError: Completions.parse() got an unexpected keyword argument 'strict'
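For comparison, as far as I understand the OpenAI structured outputs documentation, `strict` is a field nested inside the `json_schema` object of `response_format` rather than a top-level keyword argument, which would be consistent with the TypeError above. A minimal sketch of that payload, with the schema hand-written to mirror `FooBar`. Whether `parse()` sets this flag automatically for pydantic models is an assumption not confirmed here.

```python
import json

# Hand-written JSON Schema mirroring the FooBar model (illustrative only).
foobar_schema = {
    "type": "object",
    "properties": {
        "foo": {"type": "array", "items": {"type": "string"}},
        "bar": {"type": "array", "items": {"type": "integer"}},
    },
    "required": ["foo", "bar"],
    "additionalProperties": False,
}

# `strict` sits inside `json_schema`, not next to `model` or `messages`.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "FooBar",
        "strict": True,
        "schema": foobar_schema,
    },
}

print(json.dumps(response_format, indent=2))
```

A dict of this shape would be passed as `response_format=` to a chat completions call in place of the pydantic class.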

simonw · 4 months ago

You have to explicitly opt into it by passing strict=True https://platform.openai.com/docs/guides/structured-outputs/s...

jawiggins · 4 months ago

Are you able to use `strict=True` when using pydantic models? It doesn't seem to be valid for me. I think that only works for JSON schemas.

    from openai import OpenAI
    from pydantic import BaseModel

    openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

    class FooBar(BaseModel):
        foo: list[str]
        bar: list[int]

    prompt = """#Task
    Your job is to reply with Foo Bar, a json object with foo, a list of strings, and bar, a list of ints
    """

    response = openai_client.chat.completions.parse(
        model="gpt-5-nano-2025-08-07",
        messages=[{"role": "system", "content": prompt}],
        max_completion_tokens=4096,
        seed=123,
        response_format=FooBar,
        strict=True,  # the keyword that parse() rejects
    )

> TypeError: Completions.parse() got an unexpected keyword argument 'strict'