Readit News
Posted by u/oshams 2 years ago
Show HN: Auto Wiki – Turn your codebase into a Wiki (wiki.mutable.ai)
Hi HN! I’m Omar from Mutable.ai. We want to introduce Auto Wiki (https://wiki.mutable.ai/), which lets you generate a Wiki-style website to document your codebase. Citations link to code, with clickable references to each line of code being discussed. Here are some examples of popular projects:

React: https://wiki.mutable.ai/facebook/react

Ollama: https://wiki.mutable.ai/jmorganca/ollama

D3: https://wiki.mutable.ai/d3/d3

Terraform: https://wiki.mutable.ai/hashicorp/terraform

Bitcoin: https://wiki.mutable.ai/bitcoin/bitcoin

Mastodon: https://wiki.mutable.ai/mastodon/mastodon

Auto Wiki makes it easy to see at a high level what a codebase is doing and how the work is divided. In some cases we’ve identified entire obsolete sections of codebases because the wiki surfaced code that was no longer important. Auto Wiki relies on our citations system, which cuts back on hallucinations: each citation links to a precise reference or definition, so the wiki generation is grounded in the code being cited rather than in free-form generation.

We’ve run Auto Wiki on the most popular 1,000 repos on GitHub. If you want us to generate a wiki of a public repo for you, just comment in this thread! The wikis take time to generate as we are still ramping up our capacity, but I’ll reply that we’ve launched the process and then come back with a link to your wiki when it’s ready.

For private repos, you can use our app (https://app.mutable.ai) to generate wikis. We also offer private deployments with our own model for enterprise customers; you can ping us at info@mutable.ai. Anyone who already has access to a repo through GitHub will be able to view its wiki; only the person generating the wikis needs to pay to create them. Pricing starts at $4 and increases in $2 increments depending on how large your repo is.

In an upcoming version of Auto Wiki, we’ll include other sources of information relevant to your code and generate architectural diagrams.

Please check out Auto Wiki and let us know your thoughts! Thank you!

teraflop · 2 years ago
Cool concept. Right off the bat I see some big issues with the generated CPython documentation:

> This provides a register-based virtual machine that executes the bytecode through simple opcodes.

Python's VM is stack-based, not register-based.
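(A quick way to see this for yourself: Python's own `dis` module disassembles a function into bytecode that pushes and pops an operand stack, with no register operands in sight. A minimal sketch; the exact opcode names vary across CPython versions.)

```python
import dis

def add(a, b):
    return a + b

# The disassembly pushes both arguments onto the operand stack,
# pops them for the binary-add opcode, and pops the result to return it.
dis.dis(add)
```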

> The tiered interpreter in …/ceval.c can compile bytecode sequences into "traces" of optimized microoperations.

No such functionality exists in CPython, as far as I know.

> The dispatch loop switches on opcodes, calling functions to manipulate the operand stack. It implements stack manipulation with macros.

No it doesn't. If you look at the bytecode interpreter, it's full of plain old statements like `stack_pointer += 1;`.
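(To illustrate the point, here is a toy stack-machine dispatch loop sketched in Python. It is not CPython's actual C code, and the opcode names are made up, but it shows the same shape: a switch on opcodes plus direct pushes and pops on an operand stack, no macros required.)

```python
# Toy stack-machine dispatch loop -- illustrative only, not CPython's ceval.c.
def execute(code, consts):
    stack = []
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == "LOAD_CONST":       # push a constant onto the operand stack
            stack.append(consts[arg])
        elif op == "BINARY_ADD":     # pop two operands, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "RETURN_VALUE":   # pop and return the top of the stack
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")

# (1 + 2) compiled by hand: push 1, push 2, add, return.
print(execute([("LOAD_CONST", 0), ("LOAD_CONST", 1),
               ("BINARY_ADD", None), ("RETURN_VALUE", None)], [1, 2]))  # 3
```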

> The tiered interpreter is entered from a label. It compiles the bytecode sequence into a trace of "micro-operations" stored in the code object. These micro-ops are then executed in a tight loop in the trace for faster interpretation.

As mentioned above, this seems to be a complete hallucination.

> During initialization, …/pylifecycle.c performs several important steps: [...] It creates the main interpreter object and thread

No, the code in this file creates an internal thread state object, corresponding to the already-running thread that calls it.

> References: Python/clinic/import.c.h The module implements finding and loading modules from the file system and cached bytecode.

This is kinda sorta technically correct, but the description never mentions the crucial fact that most of this C code only exists to bootstrap and support the real import machinery, which is written in Python, not C. (Also, the listed source file is the wrong one: it just contains auto-generated function wrappers, not the actual implementations.)
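(This is easy to check from a REPL. The snippet below relies on CPython's private `importlib._bootstrap` module and its `_find_and_load` helper, both implementation details that could change: the core import machinery turns out to be ordinary Python functions rather than C builtins.)

```python
import types
import importlib._bootstrap

# _find_and_load sits at the heart of the import system; it is a
# plain Python function object, not a C builtin_function_or_method.
fn = importlib._bootstrap._find_and_load
print(isinstance(fn, types.FunctionType))   # True
print(isinstance(len, types.FunctionType))  # False -- len is a C builtin
```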

> Core data structure modules like …/arraymodule.c provide efficient implementations of homogeneous multidimensional arrays

Python's built-in array module provides only one-dimensional arrays.
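(Straightforward to confirm: `array.array` takes a typecode and a flat sequence, and it has no shape, `ndim`, or any other multidimensional machinery.)

```python
import array

# Homogeneous, yes -- but strictly one-dimensional.
a = array.array('i', [1, 2, 3])
print(a[1])                # 2
print(hasattr(a, 'ndim'))  # False: no shape/dimension attribute
```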

And so on.

nerdponx · 2 years ago
Great example of plausible but completely incorrect outputs from an AI model that would go largely undetected by a non-expert human.
oshams · 2 years ago
Thank you for this feedback. We actually have an Auto Wiki v2 in the works which is even higher quality; it would be interesting to see how these results change when it comes out.
faizshah · 2 years ago
Can you talk a little bit about the crawler, or about what information you are feeding the agent about the repository? My main concern is that this is just hallucinating the documentation, and that for the more well-known repos like React it can pull the data from training data like blogs, etc.

I think the concept is really great; I would just like to understand how it works, especially for enterprise use cases.

GrinningFool · 2 years ago
I would expect something like this to output only factually correct documentation, since it would be used as a reference; but it sounds like that's not the case even in the upcoming v2?
hk__2 · 2 years ago
That’s nice but the name is confusing: it’s not generating a wiki at all, but a documentation website with a Wikipedia-like theme. Wikis are collaborative websites; Wikipedia is only one of them.
userbinator · 2 years ago
...and I thought it was a wiki about cars.
oshams · 2 years ago
Apologies for the confusion, we are thinking of adding the ability to edit the wikis. What do you think?
e12e · 2 years ago
A wiki is a wiki wiki (quick, fast) web. Its defining characteristic is being quickly and easily user-editable.

Your current product isn't a wiki generator, it's a website generator.

https://en.m.wikipedia.org/wiki/Wiki

velox_neb · 2 years ago
Reading these wikis makes me feel we need to invent some visual convention to indicate AI-generated text. Like a particular color or font. This would make it so people don't feel cheated after they realize they just spent several minutes trying to make sense of something churned out by an LLM. (I mean this as a voluntary design enhancement for sites that want to be nice, of course people can always cheat.)
charcircuit · 2 years ago
I think this would be better as a more general bot authored distinction.
lucasban · 2 years ago
Color or font would not necessarily be accessible; a consistent icon or tag around it would likely be easier for screen readers or other low-vision situations.
oshams · 2 years ago
Hi! Appreciate your comment. I personally think AI-generated content is the future. The reactions people are having to AI-generated content are very similar to the reactions to the printing press, whereby anyone could write anything and mass-distribute it. I think people also had similar reactions to Google indexing the web. (Note: I'm not discounting existential risk; that is real, but another topic for another day.)
kuhewa · 2 years ago
You may lose some potential audience with this kind of comment; saying AI content is the future is a non sequitur in response to an idea about offsetting the very real time-wasting quality issues it yields for the near future. The printing press analogy might be apt if the Gutenberg Bible were full of verses that had been modified by the press itself and looked coherent at a glance but had totally different (and sometimes nonsensical) meanings from the original. The Gutenberg press would still have had incredible potential, but it would be more than useful to be able to identify book copies that might be affected.
wintogreen74 · 2 years ago
Pretty big stretch to compare a massive disruption to the medium with a massive disruption to the content. The press is far closer to the web than to generative content. The output from AI is closer to the Unabomber's manifesto, and the only entities calling for burning detractors at the stake are vested-interest individuals like you, and AI itself.
elicksaur · 2 years ago
With how the typical search engine experience is going these days, is comparing yourself to Google really a good thing?
8organicbits · 2 years ago
I think this falls into a common mistake people make about documentation. Good documentation doesn't explain what the code does; it explains why the code is written the way it is, the constraints that led to the decision, and even the alternatives that were not chosen. You can't really guess those things by looking at code. I'm a fan of ADRs for that reason.

Honestly, this looks overly verbose to me, a common LLM problem. The mistakes others cite are also pretty concerning.

https://adr.github.io/

_a_a_a_ · 2 years ago
Good point, completely agree and interesting link, thanks
paxys · 2 years ago
The entire point of a Wiki is that it can be collaboratively edited. This is static documentation, just with a Wikipedia-like UI.
oshams · 2 years ago
Would this be your top request? We're thinking of adding that functionality.
loktarogar · 2 years ago
Regardless of whether it's a wiki or not right now, documentation that is wrong and cannot be fixed is worthless.
hobo_mark · 2 years ago
Not the OP, but I don't think it's a request; it's semantics. You are calling something a wiki that is really not a wiki, because wikis are editable.
bbor · 2 years ago
Consider leaning into “wiki” features fully so you don’t have to lose your name, and adding an edit button isn’t gonna be enough for most people IMO. As you can probably tell from these unusually hostile comments for HN (I think someone above even swore!), people are understandably protective of the wiki movement. It’s a bright, persistent star in a darkening internet, so seeing people use it to make money can be tough.

For example: what if instead of one document creator bot, you had an ensemble of personas that act out the motions of wiki editing - many diverse sources submitting small edits, experts reviewing edits before inclusion, etc. Basically just turning the simple 1-step citation verification you mentioned in another comment into a complex stage play of sorts, both for the sake of the bots (would probably cut down on hallucinations to follow Wikipedia procedures) and the humans (what’s the point of a verification tool that you can’t follow along with and meta-verify?). This also solves the edit problem, since humans can follow the same procedure and get the same automated reviews.

It wouldn't hurt to go open source, either! Either way, cool project. Godspeed.

paxys · 2 years ago
I haven't used it enough to know if the two models can feasibly coexist. It may be better to just name it more accurately. There will be immense demand for an "automatic documentation generator". A wiki, not quite as much.
dormento · 2 years ago
And it's wrong! It's not difficult to find whole paragraphs that were entirely made up. LLMs are not fit for this sort of thing.
CGamesPlay · 2 years ago
I'd love to see the wiki generated for a less already-documented example. These high-profile projects are good demos and the results look compelling (I checked out AutoGPT's and NeoVim's), but these projects already have a ton of documentation that helps the model substantially. What are the smaller projects where it has to generate documentation from code (and not necessarily well-commented code) rather than existing documentation?
oshams · 2 years ago
Great point! Here's an example of a more obscure repo with a good wiki: https://wiki.mutable.ai/dadongshangu/async_FIFO
CGamesPlay · 2 years ago
Impressive. I would be interested in this once it hits general availability. I would also love to see it operate on a local repository, because I may not be hosting my source code on GitHub.
TheEzEzz · 2 years ago
Super cool. When I think about accelerating teams while maintaining quality/culture, I think about the adage "if you want someone to do something, make it easy."

Maintaining great READMEs, documentation, onboarding docs, etc, is a lot of work. If Auto Wiki can make this substantially easier, then I think it could flip the calculus and make it much more common for teams to invest in these artifacts. Especially for the millions of internal, unloved repos that actually hold an org together.

oshams · 2 years ago
Thank you! We like the analogy of dehydrating knowledge so it can be rehydrated later. Beyond unloved repos, we'd argue that broader organizational knowledge that seems to have been lost to history, like Roman concrete or how to precisely build the Saturn V, could potentially be "stored" using AI.
Amigo5862 · 2 years ago
The only thing I see that this adds over existing docs-to-HTML tooling is that it uses a Wikipedia-inspired theme.

Meanwhile on the negative side, it adds hallucinations. You say you "cut back" on them but as teraflop's comment shows, it still has plenty.

BTW: even the Mastodon link from your OP says "wiki not found" for me.