The problem? Apple's docs are JavaScript-rendered, so when you paste URLs into AI tools, they just see a blank page. Copy-pasting works but... c'mon.
So I built something that converts Apple Developer docs to clean markdown. Just swap developer.apple.com with sosumi.ai in any Apple docs URL and you get AI-readable content.
For example:
- Before: https://developer.apple.com/documentation/swift/double
- After: https://sosumi.ai/documentation/swift/double
The site itself is a small Hono app running on Cloudflare Workers. Apple's docs are actually available as structured data, but Apple doesn't make it obvious how to get it. So what this does is map the URL, fetch the original JSON, and render it as Markdown.
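In case it's useful, here's roughly what that shape looks like as a Hono worker. This is a minimal sketch, not sosumi's actual source: the `/tutorials/data/<path>.json` mapping reflects my understanding of where Apple's DocC JSON lives, and `renderMarkdown` is a made-up helper that only handles the title and abstract.

```
import { Hono } from "hono";

const app = new Hono();

// Hypothetical, minimal renderer: real DocC JSON has far more structure
// (declarations, topic sections, relationships) worth emitting.
function renderMarkdown(doc: any): string {
  const title = doc?.metadata?.title ?? "Untitled";
  const abstract = (doc?.abstract ?? [])
    .map((part: any) => part.text ?? "")
    .join("");
  return `# ${title}\n\n${abstract}\n`;
}

app.get("/documentation/*", async (c) => {
  // Map e.g. /documentation/swift/double onto the underlying DocC JSON.
  // (Assumption: Apple serves it under /tutorials/data/ with a .json suffix.)
  const path = new URL(c.req.url).pathname;
  const res = await fetch(`https://developer.apple.com/tutorials/data${path}.json`);
  if (!res.ok) return c.text("Upstream fetch failed", 502);
  return c.text(renderMarkdown(await res.json()), 200, {
    "Content-Type": "text/markdown; charset=utf-8",
  });
});

export default app;
```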
It also provides an MCP interface that includes a tool to search the Apple developer website, which is helpful.
Anyway, please give this a try and let me know what you think!
Because that's the author's actual goal? To take a web page that looks fine to human eyes but, unintuitively, is not accessible to AI, and make it accessible. That's genuinely useful and valuable.
Sure, it's no different from converting it to markdown for human eyes. But it's important to be clear about not just the WHAT but also the WHY.
C’mon now. This isn’t controversial or even bad.
I took a quick glance at the examples you posted. For example, the original

```
init?(exactly: Float80)
```

was converted by the tool to

```
- [initexactly-63925](/documentation/Swift/Double/init(exactly:)-63925)
```
Given its goal, I'd be worried that it dropped the verbatim function signature. Claude still figured it out, but for more obscure stuff that could be an issue.
How hard would it be to build an MCP that's basically a proxy for web search except it always tries to build the markdown version of the web pages instead of passing HTML?
Basically Sosumi.ai, but instead of working only for Apple docs, it works for any web page (including every doc on the internet).
In many cases, a Markdown distillation of HTML can improve the signal-to-noise ratio, especially for sites mired in `<div>` tag soup (intentionally or not). But that's an optimization for token efficiency; LLMs can usually figure things out.
The motivation behind Sosumi is better understood as a matter of accessibility. The way AI assistants typically fetch content from the web precludes them from getting any useful information out of developer.apple.com.
You could start to solve the generalized problem with an MCP that 1) used a headless browser to access content for sites that require JS, and 2) used sampling (i.e. having a tool use the host LLM) to summarize / distill HTML.
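A sketch of what (1) could look like, using the TypeScript MCP SDK and Puppeteer. The server and tool names here are made up, and the distillation step for (2), sampling the host LLM, is left as a comment:

```
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import puppeteer from "puppeteer";

const server = new McpServer({ name: "web-to-markdown", version: "0.1.0" });

server.tool(
  "fetch_page",
  { url: z.string().url() },
  async ({ url }) => {
    // 1) Headless browser, so JS-rendered pages (like developer.apple.com)
    //    produce real content instead of an empty shell.
    const browser = await puppeteer.launch();
    try {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: "networkidle0" });
      const html = await page.content();
      // 2) would go here: hand `html` back to the host LLM via sampling
      //    and ask it to distill the page down to Markdown.
      return { content: [{ type: "text", text: html }] };
    } finally {
      await browser.close();
    }
  }
);

await server.connect(new StdioServerTransport());
```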
But stripping complex formats like HTML and PDF down to simple Markdown is a hard problem. It's nearly impossible to infer what the rendered page looks like from the raw HTML/PDF source alone. https://github.com/mozilla/readability helps, but it often breaks down on unconventional div structures. I've heard the state-of-the-art solution is to use a multimodal LLM as OCR: actually look at the rendered page and rewrite the thing in Markdown.
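For the pages where readability does cope, the distillation half is short. A sketch, assuming jsdom (which doesn't execute page JS, so it has exactly the blind spot discussed above) and turndown for the HTML-to-Markdown step:

```
import { Readability } from "@mozilla/readability";
import { JSDOM } from "jsdom";
import TurndownService from "turndown";

// Extract the main article from raw HTML, then convert it to Markdown.
function htmlToMarkdown(html: string, url: string): string | null {
  const dom = new JSDOM(html, { url });
  const article = new Readability(dom.window.document).parse();
  if (!article?.content) return null; // readability gave up on the div soup
  return new TurndownService({ headingStyle: "atx" }).turndown(article.content);
}
```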
Which makes me wonder: how did OpenAI make their model read pdf, docx and images at all?
https://jina.ai/reader
For example, AFAIK, https://github.com/swiftlang/swift/blob/main/stdlib/public/c... is used to generate https://developer.apple.com/documentation/swift/array.
Even for those open source projects, there is still some value added in the generated documentation that isn't directly available from the documentation comments, such as type members and protocol conformances (though an LLM could certainly suss those out with the right context).
I made a small clone of the tutorials section (https://clone-swiftui-tutorial.vercel.app/) where the content is already Markdown (and used codehike to turn the markdown into a rich UI). This made me realize that codehike is AI-friendly, in the sense that even for non-linear UIs the original content is still AI-readable Markdown.