Readit News
nine_k · a month ago
All these stories about vibe coding going well or wrong remind me of an old joke.

A man visits his friend's house. There is a dog in the house. The friend says that the dog can play poker. The man is incredulous, but they sit at a table and have a game of poker; the dog actually can play!

The man says: "Wow! Your dog is incredibly, fantastically smart!"

The friend answers: "Oh, well, no, he's a naïve fool. Every time he gets a good hand, he starts wagging his tail."

Whether you see LLMs as impressively smart or annoyingly foolish depends on your expectations. Currently they are very smart talking dogs.

tliltocatl · a month ago
Nobody says LLMs aren't impressive. But there is a subtle difference between an impressive trick and something worth throwing 1% of GDP at. Proponents say that if we keep throwing more money at it, it will improve, but this is far from certain.
glimshe · a month ago
We've thrown fantastical amounts at some very uncertain things, like the moon landing. I think the willingness to bet big on a potentially transformative technology such as AI is a good thing, and a bit of a return to the old days when humanity still engaged in big infrastructure bets. Yes, it may fail, but that's intrinsic to any ambitious project.
mhh__ · a month ago
Some people definitely think they aren't impressive.

> 1% of GDP

LLMs are basically the only genuinely new thing in decades, one that has someone excited in basically every department in the entire world. Why is it so bad that we spend money on them? The alternative is going back to shovelling web3 crap.

There's definitely a new generation of bullshit merchants to go with LLMs, but I think they (the models) target a very different part of the brain than normal tech does, so in some ways they're much more resilient to the usual fad archetypes. (This is also why some people who are a bit socially jittery hate them.)

b33j0r · a month ago
I get the sense that people mean that the transformer paradigm might not scale. But I do not understand the argument that AI in general is hype, and that investing in it is cult-like.

It’s just a technology; one that will improve, sometimes stagnate, sometimes accelerate. Like anything else, right? I don’t see a time when we’ll just stop using AI because it “feels so trite.”

mrits · a month ago
The alternative is to not throw money at AI. The amount we spend could be justified under a national security budget alone, without even counting increased GDP.

kqr · a month ago
Somehow this also reminds me of http://raisingtalentthebook.com/wp-content/uploads/2014/04/t...

"I taught my dog to whistle!"

"Really? I don't hear him whistling."

"I said I taught him, not that he learnt it."

pyman · a month ago
A person is flying a hot air balloon and realises he’s lost. He lowers the balloon and spots a man down below. He shouts:

“Excuse me! Can you help me? I promised a friend I’d meet him, but I have no idea where I am.”

The man replies, “You’re in a hot air balloon, hovering 30 feet above the ground, somewhere between 40 and 41 degrees north latitude and between 59 and 60 degrees west longitude.”

“You must be a Prompt Engineer,” says the balloonist.

“I am,” replies the man. “How did you know?”

“Well,” says the balloonist, “everything you told me is technically correct, but it’s of no use to me and I still have no idea where I am.”

The man below replies, “You must be a Vibe Coder.”

“I am,” says the balloonist. “How did you know?”

"Because you don’t know where you are or where you’re going. You made a promise you can’t keep, and now you expect me to solve your problem. The fact is, you’re in the same position you were in before we met, but now it’s somehow my fault!"

markus_zhang · a month ago
From my experience (we have to vibe code, as it is becoming the norm in the company), vibe coding is most effective when the developer feeds detailed context to the agent beforehand and gives it very specific commands for each task. It still speeds up development quite a bit once everything is going in the right direction.
6LLvveMx2koXfwn · a month ago
I vibe code in domains I am unfamiliar with; getting Claude to configure the right choice of AWS service for a specific use case can take a very long time. But whether that is still quicker than me alone with the docs is hard to tell.
Marazan · a month ago
The variation I've seen on this applied to AIs is:

Fred insists to his friend that he has a hyper intelligent dog that can talk. Sceptical, the friend enquires of the dog "What's 2+2?"

"Five" says the dog

"Holy shit a talking dog!" says the friend "This is the most incredible thing that I've ever seen in my life".

"What's 3+3?"

"Eight" says the dog.

"What is this bullshit you're trying to sell me Fred?"

codeflo · a month ago
This joke is a closer analogy to reality with a small addition. After the friend is suitably impressed:

> "Holy shit a talking dog!" says the friend "This is the most incredible thing that I've ever seen in my life".

this happens:

"Yes," says Fred. "As you can see, it's already at PhD level now, constantly improving, and is on track to replace 50% of the economy in twelve months or sooner."

Confused, the friend asks:

> "What's 3+3?"

> "Eight" says the dog.

> "What is this bullshit you're trying to sell me Fred?"

totetsu · a month ago
Try playing an adversarial word game with ChatGPT. The rules: one player asks questions, and the other is not allowed to say "yes" or "no", not allowed to reuse the same wording, and not allowed to evade the question. You'll see its tail wagging pretty quickly.
ragequittah · a month ago
You could very likely train an AI (Llama?) to do this easily, but trying to get a general LLM to play a game such as this doesn't make sense. Best way to get around it? Have it create a Python program that will play the game correctly instead.
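For instance, a minimal sketch of such a rule-checking referee (in TypeScript here; the checks are deliberately simplified):

```ts
// Tracks answers already given so repeated wording can be flagged.
const seen = new Set<string>();

// Returns the rule that was broken, or null if the answer is legal.
function violatesRules(answer: string): string | null {
  const normalized = answer.trim().toLowerCase();
  if (normalized.length === 0) return "evaded the question";
  if (/\b(yes|no)\b/.test(normalized)) return "said yes or no";
  if (seen.has(normalized)) return "reused the same wording";
  seen.add(normalized);
  return null;
}

console.log(violatesRules("Certainly, that is the case.")); // null (legal)
console.log(violatesRules("Yes, of course."));              // "said yes or no"
console.log(violatesRules("Certainly, that is the case.")); // "reused the same wording"
```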
ak_111 · a month ago
I asked ChatGPT to write a short fable about the phenomenon of vibe coding in the style of Aesop:

The Owl and the Fireflies

One twilight, deep in the woods where logic rarely reached, an Owl began building a nest from strands of moonlight and whispers of wind.

"Why measure twigs," she mused, "when I can feel which ones belong?"

She called it vibe nesting, and declared it the future.

Soon, Fireflies gathered, drawn to her radiant nonsense. They, too, began to build — nests of smoke and echoes, stitched with instinct and pulse. "Structure is a cage," they chirped. "Flow is freedom."

But when the storm came, as storms do, their nests dissolved like riddles in the rain.

Only the Ants, who had stacked leaves with reason and braced walls with pattern, slept dry that night. They watched the Owl flutter in soaked confusion, a nestless prophet in a world that demanded substance.

Moral: A good feeling may guide your flight, but only structure will hold your sky.

suddenlybananas · a month ago
This is incoherent
recipe19 · a month ago
I work on niche platforms where the amount of example code on Github is minimal, and this definitely aligns with my observations. The error rate is way too high to make "vibe coding" possible.

I think it's a good reality check for the claims of impending AGI. The models still depend heavily on being able to transform other people's work.

winrid · a month ago
Even with typescript Claude will happily break basic business logic to make tests pass.
motorest · a month ago
> Even with typescript Claude will happily break basic business logic to make tests pass.

It's my understanding that LLMs change the code to meet a goal, and if you prompt them with vague instructions such as "make tests pass" or "fix tests", LLMs in general apply the minimum changes necessary and sufficient for their goal to be met. If you don't explicitly instruct them, they can't and won't tell project code apart from test code, so they will change your project code to make the tests work.

This is not a bug. Changing project code to make tests pass is a fundamental approach to refactoring projects, and the whole basis of TDD. If that's not what you want, you need to prompt them accordingly.
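A hypothetical illustration of that failure mode (the function and the broken test are invented for this sketch):

```ts
// Business rule: orders get a 10% discount.
export function applyDiscount(price: number): number {
  return price * 0.9;
}

// A buggy test that encodes the wrong expectation:
//   expect(applyDiscount(100)).toBe(80);
//
// Prompted only with "make the tests pass", an agent's minimal edit is
// `return price * 0.8;` (goal met, business rule silently broken).
// Prompted with "the test is wrong; fix the test", it edits the test instead.
```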

bapak · a month ago
Speaking of TypeScript, every time I feed a hard type problem to LLMs, they just can't do it. Sometimes I find out it's a TS limitation or just not implemented yet, but that won't stop us from wasting 40 minutes together.
rs186 · a month ago
When I vibe coded with GitHub Copilot in TypeScript, it kept using "any" even though those variables had clear interfaces already defined somewhere in the code. This drove me crazy, as I had to go in and manually fix all those things. The only thing that helped a bit was me screaming "DO NOT EVER USE 'any' TYPE". I can't understand why it would do this.
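For illustration, the pattern described looks roughly like this (hypothetical names; the interface is assumed to already exist in the codebase):

```ts
interface User {
  id: number;
  name: string;
}

// What the assistant kept emitting, despite the interface being defined:
function greet(user: any): string {
  return `Hi ${user.name}`; // compiles even if `user` has no `name` field
}

// What the existing types already support:
function greetTyped(user: User): string {
  return `Hi ${user.name}`; // checked against the User interface
}
```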
CalRobert · a month ago
That seems like the tests don’t work?
pygy_ · a month ago
I've had a similar problem with WebGPU and WGSL. LLMs create buffers with the wrong flags (and other API usage errors; see the sketch below), don't clean up resources, mix up GLSL and WGSL, and write semi-less WGSL (in template strings) if you ask them to write semi-less [0] JS...

It's a big mess.

0. https://github.com/isaacs/semicolons/blob/main/semicolons.js
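As a concrete example of the buffer-flag issue: a sketch of a correct WebGPU buffer setup, assuming an initialized `device: GPUDevice` and some `vertexData: Float32Array` (both hypothetical):

```ts
// COPY_DST is required for queue.writeBuffer; VERTEX for use in a render pass.
// Getting this usage bitmask wrong is exactly the kind of detail LLMs fumble.
const vertexBuffer = device.createBuffer({
  size: vertexData.byteLength,
  usage: GPUBufferUsage.VERTEX | GPUBufferUsage.COPY_DST,
});
device.queue.writeBuffer(vertexBuffer, 0, vertexData);

// ...later, once the buffer is no longer needed:
vertexBuffer.destroy(); // the explicit cleanup step that tends to get skipped
```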

poniko · a month ago
Yes, and if you work with a platform that has been around for a long time, like .NET, you will most definitely get a mix of really outdated, deprecated code and the latest features.
remich · a month ago
I recommend the context7 MCP tool for this exact purpose. I've been trying to really push agents lately at work to see where they fall down and whether better context can fix it.

As a test recently I instructed an agent using Claude to create a new MCP server in Elixir based on some code I provided that was written in Python. I know that, relatively speaking, Python is over-represented in training data and Elixir is under-represented. So, when I asked the agent to begin by creating its plan, I told it to reference current Elixir/Phoenix/etc documentation using context7 and to search the web using Kagi Search MCP for best practices on implementing MCP servers in Elixir.

It was very interesting to watch how the initially generated plan evolved after using these tools and how after using the tools the model identified an SDK I wasn't even aware of that perfectly fit the purpose (Hermes-mcp).

ragequittah · a month ago
This is easily solved by feeding the LLM the correct documentation. I was having problems with Tailwind because of this, right up until I had ChatGPT deep research come up with a spec sheet on how to use the latest version of it. I fed it into the various AIs I've been using (it worked for ChatGPT, Claude, and Cursor) and have had no problems since.
gompertz · a month ago
Yep, I program in some niche languages like Pike, SNOBOL4, and Unicon. Vibe coding is out of the question for these languages. Forced to use my brain!
johnisgood · a month ago
You could always feed it some documentation and example programs. I did it with a niche language and it worked out really well, with Claude. Around 8 months ago.
empressplay · a month ago
I don't know if you're working with modern models. Grok 4 doesn't really know much about assembly language on the Apple II but I gave it all of the architectural information it needed in the first prompt of a conversation and it built compilable and executable code. Most of the issues I encountered were due to me asking for too much in a prompt. But it built a complete, albeit simple, assembly language game in a few hours of back and forth with it. Obviously I know enough about the Apple II to steer it when it goes awry, but it's definitely able to write 'original' code in a language / platform it doesn't inherently comprehend.
timschmidt · a month ago
This matches my experience as well. Poor performance usually means I haven't provided enough context or have asked for too much in a single prompt. Modifying the prompt accordingly and iterating usually results in satisfactory output within the next few tries.
vineyardmike · a month ago
Completely agree. I'm a professional engineer, but I like to get some ~vibe~ help on personal projects after work, when I'm tired and just want my personal project to go faster. I've had a ton of success with Go, JavaScript, Python, etc. I had mixed success with writing idiomatic Elixir roughly a year ago, but I'd largely assumed that would be resolved by today, since every model maker has started aggressively filling training data with code now that we've found the PMF of LLM code assistance.

Last night I tried to build a super basic “barely above hello world” project in Zig (a language where IDK the syntax), and it took me trying a few different LLMs to find one that could actually write anything that would compile (Gemini w/ search enabled). I really wasn’t expecting it considering how good my experience has been on mainstream languages.

Also, I think OP did rather well considering BASIC is hardly used anymore.

andsoitis · a month ago
> The models

The models don’t have a model of the world. Hence they cannot reason about the world.

pygy_ · a month ago
I tried vibe coding WebGPU/WGSL, which is thoroughly documented but has little actual code around, and LLMs are pretty bad at it right now.

They don't need a formal model; they need examples from which they can pilfer.

bawana · a month ago
The theory is that language is an abstraction built on top of the world and therefore encompasses all human experience of the world. The problem arises, however, when the world (aka nature) acts in an unexpected way, outside human experience.
hammyhavoc · a month ago
"reason" is doing some heavy-lifting in the context of LLMs.
jjmarr · a month ago
I've noticed the error rate doesn't matter if you have good tooling feeding into the context. The AI hallucinates, sees the bug, and fixes it for you.
cmrdporcupine · a month ago
I find for these kinds of systems, if I pre-seed Claude Code with a read of the language manual (even the BNF etc) and a TLDR of what it is, results are far better. Just part of the initial prompt: read this summary page, read this grammar, and look at this example code.

I have had it writing LambdaMOO code, with my own custom extensions (https://github.com/rdaum/moor) and it's ... not bad considering.

manca · a month ago
I literally had the same experience when I asked the top code LLMs (Claude Code, GPT-4o) to rewrite code from an Erlang/Elixir codebase in Java. It got some things right, but most things wrong, and it required a lot of debugging to figure out what went wrong.

It's absolute proof that they are still dumb prediction machines, fully reliant on the type of content they've been trained on. They can't generalize (yet), and if you want to use them for novel things, they'll fail miserably.

h4ck_th3_pl4n3t · a month ago
I just wish the LLM providers would realize this and instead provide specialized LLMs for each programming language. The results would likely be better.
chuckadams · a month ago
The local models JetBrains IDEs use for completion are specialized per language. For more general problems, I'm not sure over-fitting to a single language is any better for an LLM than it is for a human.
abrookewood · a month ago
Clearly the issue is that you are going from Erlang/Elixir to Java, rather than the other way around :)

Jokes aside, they are pretty different languages. I imagine you'd have much better luck going from .NET to Java.

tsimionescu · a month ago
Sure, it's easier to solve an easier problem, news at eleven. In particular, translating from C# to Java could probably be automated with some 90% accuracy using a decent sized bash script.
nine_k · a month ago
This mostly means that LLMs are good at simpler forms of pattern matching and have a much harder time actually reasoning at any significant depth. (It's not easy even for the human intellect, the finest we currently have.)
nerdsniper · a month ago
Claude Code / 4o struggle with this for me, but I had Claude Opus 4 rewrite a 2,500-line PowerShell script for embedded automation into Python, and it did a pretty solid job. A few bugs, but cheaper models were able to clean those up. I still haven't found a great solution for general refactoring; for example, I'd love to split the result into multiple Python modules, but I rarely like how it decides to do that without me telling it specifically how to structure the modules.
conception · a month ago
I'm curious what your process was. If you just said "rewrite this in Java", I'd expect that to fail. If you treated the LLM like a junior developer on an official project (worked with it to document the codebase, come up with a plan and tasks for each part of the codebase, and follow a solid workflow prompt), I would expect it to succeed.
4hg4ufxhy · a month ago
There is a reason to go the extra mile for juniors. They eventually learn and become seniors. With AI I'd rather just do it myself and be done with it.
Marazan · a month ago
Yes, if you do all the difficult, time-consuming bits, I bet it would work.
credit_guy · a month ago
If you try to ride a bicycle, do you expect to succeed at the first try? Getting AI code assistants to help you write high quality code takes time. Little by little you start having a feel for what prompts work, what don't, what type of tasks the LLMs are likely to perform well, which ones are likely to result in hallucinations. It's a learning curve. A lot of people try once or twice, get bad results, and conclude that LLMs are useless. But few people conclude that bicycles are useless if they can't ride them after trying once or twice.
hammyhavoc · a month ago
They'll never be fit for purpose. They're a technological dead-end for anything like what people are usually throwing them at, IMO.
zer00eyz · a month ago
I will give you an example of where you are dead wrong, and one where the article is spot on (without diving into historical artifacts).

I run HomeAssistant, and I don't get to play with or use it every day. Here, LLMs excel at filling in the legion of blanks in both the manual and end-user devices. There is a large body of work for them to summarize and work against.

I also play with SBCs. Many of these are "fringe" at best. There, LLMs are, as you say, "not fit for purpose".

What kind of development you use LLMs for will determine your experience with them. The tool may or may not live up to the hype depending on how common, well documented, and frequent your issue is. Once you start hitting these walls, you realize that real reasoning, leaps of inference, and intelligence are still far away.

motorest · a month ago
> They'll never be fit for purpose. They're a technological dead-end for anything like what people are usually throwing them at, IMO.

This comment is detached from reality. LLMs in general have proven effective at creating complete, fully working, fully featured projects from scratch. You need to provide the necessary context and use popular technologies with a large enough corpus for the LLM to know what to do. If one-shot approaches fail, a few iterations are all it takes to bridge the gap. I know that to be a fact because I do it on a daily basis.

edent · a month ago
Vibe Coding seems to work best when you are already an experienced programmer.

For example "Prompt: Write me an Atari BASIC program that draws a blue circle in graphics mode 7."

You need to know that there are various graphics modes and that mode 7 is the best for your use-case. Without that preexisting knowledge, you get stuck very quickly.

baxtr · a month ago
This is a description of a “tool”. Anyone can use a hammer and chisel to carve out wood, but only an artist with extensive experience will create something truly remarkable.

I believe many in this debate are confusing tools with magic wands.

tonyhart7 · a month ago
The marketing and social media buzz claiming that AI (artificial intelligence) would replace human intelligence and everyone's jobs didn't help either.

Sure, maybe it will someday, but not today. Though there are jobs that have already been replaced, for example in the writing industry.

Sharlin · a month ago
> I believe many in this debate are confusing tools with magic wands.

Unfortunately, it's usually the ones who control the money.

JdeBP · a month ago
Previous generations would have simply read something like the circle-drawing writeup by Jeffrey S. McArthur in chapter 4 of COMPUTE!'s Third Book of Atari, which, as a matter of fact, is available in scrapable text. (-:

* https://archive.org/details/ataribooks-computes-third-book-o...

* https://atariarchives.org/c3ba/page153.php

Fun fact: Orson Scott Card can be found in chapter 1.

ack_complete · a month ago
Even then, I've seen LLMs generate code with subtle bugs that even experienced programmers would trip on. For the Atari specifically, I've seen:

- Attempting to use BBC BASIC features in Atari BASIC, in ways that parsed but didn't work
- Corrupting OS memory due to using addresses only valid on an Apple II
- Using the ORG address for the C64, such that it corrupts memory if loaded from Atari DOS
- Assembly that subtly doesn't work because it uses 65C02 instructions that execute as a NOP on a 6502
- Interrupt handlers that occasionally corrupt registers
- Hardcoding internal OS addresses only valid for the OS ROM on one particular computer model

The POKE 77,0 in the article is another good example. ChatGPT labeled it as hiding the cursor, but that's wrong: location 77 is the attract timer counter in the Atari OS. Clearing it to 0 periodically resets the timer that controls the OS's primitive screensaver. But for this to work, it has to be done periodically; doing it once at the start will just reset the timer once, after which attract mode will start in 9 minutes. So effectively, this is an easter egg that snuck into the program, and even if the unrequested behavior were desirable, it doesn't work.

throwawaylaptop · a month ago
Exactly this. I'm a self-taught PHP/jQuery guy who learned it well enough to make an entire SaaS that enough companies pay for that it's a decent little lifestyle business.

I started another project recently, basically vibe coding in PHP. Instead of a single-page app like I made before, it's just page-by-page loading, which means the AI only needs to keep a few functions and the database in its head, not constantly work on some crazy UI management framework (or whatever that's called).

It's made in a few days what would have taken me weeks as an amateur. Yet I know enough to catch a few 'mistakes' and remind it to do it better.

I'm happy enough.

j4coh · a month ago
In this case I asked ChatGPT without the part specifying mode 7 and it replied with a working program using mode 7, with a comment at the top that mode 7 would be the best choice.
johnisgood · a month ago
Exactly. I would not like to be called a vibe coder for using an LLM for tedious tasks, though; is it not a pejorative term? I used LLMs for a few projects and they did well, because I knew what I wanted and how I wanted it. So yes, you do have to be an experienced programmer to excel with LLMs.

That said, you can learn a lot using LLMs, which is nice. I have a friend who wants to learn Python, and I have given him actual resources, but I have also told him to use LLMs.

motorest · a month ago
> Vibe Coding seems to work best when you are already an experienced programmer.

I think that is a very vague and ambiguous way of putting it.

I would frame it a tad more specifically: vibe coding seems to work best when users know what they want and are able to set requirements and plan ahead.

Vibe coding doesn't work at all, or produces an unmaintainable, god-awful mess, if users don't do software engineering and instead hack stuff together hoping it works.

Garbage in, garbage out.

kqr · a month ago
Not only is it a useful constraint to ask for mode 7, but making sure the context contains domain-expert terminology puts the LLM in a better spot in the sampling space.
cfn · a month ago
Just for fun I asked ChatGPT "How would you ask an LLM to write a drawing program for the ATARI?" and it asked back a bunch of details to which I answered "I have no idea, just go with the simplest option". It chose the correct graphics mode and BASIC and created the program (which I didn't test).

I still agree with you for large applications but for these simple examples anyone with a basic understanding of vibe coding could wing it.

forinti · a month ago
Exactly! If you can't properly assess the output of the AI, you are really only shooting in the dark.
Earw0rm · a month ago
Has anyone tried it on x87 assembly language?

For those that don't know. x87 was the FPU for 32-bit x86 architectures. It's not terribly complicated, but it uses stack-based register addressing with a fixed size (eight entry) stack.

All operations work on the top-of-stack register and one other register operand, and push the result onto the top of the stack (optionally popping the previous top of stack before the push).

It's hard, but not horribly so, for humans to write... more a case of it being annoyingly slow and methodical, because you have to reason about the state of the stack at every step.

I'd be very curious as to whether a token-prediction machine can get anywhere with this kind of task, as it requires a strong mental model of what's actually happening, or at least the ability to consistently simulate one as intermediate tokens/words.
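To make that concrete, here is a toy model of the moving stack names (a TypeScript sketch, not an emulator; the real x87 has eight slots, tag words, and exceptions, all omitted here):

```ts
class X87Stack {
  private regs: number[] = [];
  // FLD pushes a value: every existing st(n) becomes st(n+1).
  fld(v: number) { this.regs.unshift(v); }
  // FADDP: st(1) = st(0) + st(1), then pop the old top of stack.
  faddp() { const top = this.regs.shift()!; this.regs[0] += top; }
  // Register names are positions relative to the current top of stack.
  st(i: number): number { return this.regs[i]; }
}

const s = new X87Stack();
s.fld(2);             // the 2 is st(0)
s.fld(3);             // now the 3 is st(0), and the same 2 is suddenly st(1)
s.faddp();            // the result 5 is st(0); the names have shifted again
console.log(s.st(0)); // 5
```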

userbinator · a month ago
If you are familiar with HP's calculators, x87 Asm isn't that difficult. Also noteworthy is that its density makes it a common choice for tiny demoscene productions.
Earw0rm · a month ago
Not too bad to write, kind of horrible to read.
silisili · a month ago
I'm going to doubt that. I was pushing GPT a couple of weeks ago to test its limits. It's 100% unable to write compilable Go ASM syntax. In fairness, the syntax is slightly oddball, but enough of it exists that it's not esoteric.

In the error-feedback cycle, it kept blaming Go, not itself. A bit eye-opening.

messe · a month ago
I'm comfortable writing asm for quite a few architectures, but Go's assembler...

When I struggle to write Go ASM, I also blame Go and not myself.

Earw0rm · a month ago
The thing with x87 is that it's easy to write compilable, correct-looking code, and much harder to write correct compilable code, even for trivial sequences of a dozen or so operations.

Whereas in most asm dialects, register AX is always register AX (word length aliasing aside), that's not the case for x87: the object/value at ST3 in one operation may be ST1 or ST5 in a couple of instructions' time.

FeepingCreature · a month ago
Prediction: it can do it, so long as you tell it to explicitly keep track of the FPU stack in comments on every FPU instr.
ofrzeta · a month ago
It didn't go well? I think it went quite well. It even produced an almost working drawing program.
abrookewood · a month ago
Yep, thought the same thing. I guess people have very different expectations.
ofrzeta · a month ago
Even though I was impressed by the original article (so, kind of contrary to the author), I have in the meantime tried the same thing with Claude Sonnet 4 (because some people here criticized the approach for not using a proper "coding model") and got no better results. I tried about a dozen iterations, but it did not manage to create a "BASIC program for the Atari 800XL that makes use of display list interrupts to draw rainbow-like colored horizontal stripes", although this is like a "hello world" for that technique and there should be plenty of samples on the Internet. I am curious to see if anyone can make that work with an LLM.
JKCalhoun · a month ago
Are there really plenty of examples on the internet?

My first thought reading the article was that Atari BASIC is a little specialized. If BASIC in general is an under-represented language on the internet (compared to JavaScript, for example), then Atari BASIC has to be a white whale.

ofrzeta · a month ago
Yeah, when you google for "atari display list interrupts" a lot of similar code comes up. Here's a complete example https://makezine.com/article/technology/generate-the-atari-r...

Also, sure, compared to Javascript it's a small amount.

ilaksh · a month ago
I think it's a fair article.

However, I will just mention a few things. When you write an article like this, please note the particular language model used and acknowledge that they aren't all the same.

Also, realize that the context window is pretty large, and you can help the model by giving it information from manuals etc., so you don't need to rely entirely on its intrinsic knowledge.

If the author had used o3 or o3 Pro and given it a few sections of the manual, it might have gotten farther. Also, if someone finds a way to connect an agent to a retro computer, like an Atari BASIC MCP that can enter text and take screenshots, "vibe coding" could work better, as an agent can see errors and self-correct.