Apple's ToS pretty explicitly forbid the kind of automation required to download everything. But even if someone did that, it'd only be a snapshot in time. And a lot can change between OS releases.
As for the hosted web app, I wanted to provide this as a public service. I plan to open source it, so anyone can self-host instead, if they're inclined.
I think this one would be slightly better if it rendered that Markdown as simple HTML if accessed through a real browser, but I can imagine even this version being pretty useful.
I think it could make the "Small web" crowd pretty happy too.
For the Swift standard library and other open-source frameworks, you could probably extract this from documentation comments using DocC, Jazzy, or the like.
How hard would it be to build an MCP that's basically a proxy for web search except it always tries to build the markdown version of the web pages instead of passing HTML?
Basically Sosumi.ai, but instead of working only for Apple docs, it works for any web page (including every doc on the internet).
In many cases, a Markdown distillation of HTML can improve the signal-to-noise ratio — especially for sites mired in <div> tag soup (intentionally or not). But that's an optimization for token efficiency; LLMs can usually figure things out.
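To make the distillation idea concrete, here's a minimal sketch using only Python's standard `html.parser`. The names `MarkdownDistiller` and `html_to_markdown` are made up for illustration, the set of kept tags is an arbitrary assumption, and this is not how Sosumi does it; a real converter would need to handle tables, nested lists, code blocks, and much more.

```python
from html.parser import HTMLParser


class MarkdownDistiller(HTMLParser):
    """Keep headings, paragraphs, list items, links, and inline code; drop the rest."""

    SKIPPED = {"script", "style", "nav", "footer"}  # assumed to be noise
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []     # finished Markdown blocks
        self.buffer = []     # text fragments of the block being built
        self.prefix = ""     # Markdown prefix for the current block
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.href = None     # target of the link being read, if any

    def flush(self):
        # Collapse whitespace and emit the current block if it has any text.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.buffer, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        if self.skip_depth:
            return
        if tag in self.HEADINGS or tag in ("p", "li"):
            self.flush()
            self.prefix = self.HEADINGS.get(tag, "- " if tag == "li" else "")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.buffer.append("[")
        elif tag == "code":
            self.buffer.append("`")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag in self.HEADINGS or tag in ("p", "li"):
            self.flush()
        elif tag == "a":
            if self.href:
                self.buffer.append(f"]({self.href})")
            self.href = None
        elif tag == "code":
            self.buffer.append("`")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownDistiller()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text that was never closed by a block tag
    return "\n\n".join(parser.blocks)
```

Wrapper `<div>`s contribute nothing to the output, which is where the token savings come from on tag-soup pages.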
The motivation behind Sosumi is better understood as a matter of accessibility. The way AI assistants typically fetch content from the web precluded them from getting any useful information from developer.apple.com.
You could start to solve the generalized problem with an MCP that 1) used a headless browser to access content for sites that require JS, and 2) used sampling (i.e. having a tool use the host LLM) to summarize / distill HTML.
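Roughly, the control flow for those two steps could look like the sketch below. Everything here is hypothetical: `fetch_static`, `fetch_rendered`, and `distill` are placeholders you'd wire up to a plain HTTP client, a headless browser, and a sampling call to the host LLM respectively, and the "too little visible text means it needs JS" heuristic (with its 200-character threshold) is just an assumption for illustration.

```python
import re
from typing import Callable

_SCRIPT_OR_STYLE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.S | re.I)
_TAG = re.compile(r"<[^>]+>")


def visible_text_length(html: str) -> int:
    """Rough count of the characters left once scripts, styles, and tags are removed."""
    without_code = _SCRIPT_OR_STYLE.sub(" ", html)
    return len(" ".join(_TAG.sub(" ", without_code).split()))


def fetch_as_markdown(
    url: str,
    fetch_static: Callable[[str], str],    # hypothetical: plain HTTP GET
    fetch_rendered: Callable[[str], str],  # hypothetical: headless browser render
    distill: Callable[[str], str],         # hypothetical: LLM sampling or a converter
    min_visible_chars: int = 200,          # arbitrary threshold, tune as needed
) -> str:
    html = fetch_static(url)
    # If the static response is an empty app shell, pay for a full render instead.
    if visible_text_length(html) < min_visible_chars:
        html = fetch_rendered(url)
    return distill(html)
```

Trying the cheap fetch first keeps the headless browser off the hot path for sites that already serve readable HTML.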
For example, AFAIK, https://github.com/swiftlang/swift/blob/main/stdlib/public/c... is used to generate https://developer.apple.com/documentation/swift/array.
Even for those open source projects, there is still some value added in the generated documentation that isn’t directly available from documentation comments, such as type members and protocol conformances (though an LLM could certainly suss that out with the right context).
I took a quick look at the examples you posted. For example:
'''init?(exactly: Float80)'''
the tool converted it to
'''- [initexactly-63925](/documentation/Swift/Double/init(exactly:)-63925)'''
My worry is that, in achieving its goal, it dropped the verbatim function signature. Claude still figured it out, but for more obscure stuff that could be an issue.
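For what it's worth, keeping the signature as the link text seems cheap. Here's a sketch of what I mean; `member_link` is a hypothetical helper, not anything from the tool itself, and I'm assuming the converter has both the declaration and the doc path on hand when it emits the list item.

```python
def member_link(signature: str, path: str) -> str:
    """Format one member as a Markdown list item whose link text is the signature itself."""
    # Escape backslashes and square brackets so they can't end the link text early.
    text = signature.replace("\\", "\\\\").replace("[", "\\[").replace("]", "\\]")
    return f"- [{text}]({path})"


print(member_link("init?(exactly: Float80)",
                  "/documentation/Swift/Double/init(exactly:)-63925"))
# prints: - [init?(exactly: Float80)](/documentation/Swift/Double/init(exactly:)-63925)
```

That way the model sees the exact parameter types without having to follow the link or guess from the slug.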
If GitHub could support .docc files, that would be great. Otherwise, I still use Jazzy Docs.
Long live Jazzy.