MoreQARespect (u/MoreQARespect)

MoreQARespect commented on Delete tests andre.arko.net/2025/06/30... · Posted by u/mooreds

Came here to say this. Most flaky tests are badly written, but some warn you about bugs you don't yet understand.

A couple of years ago I helped bring a project back on track. They had a notoriously flaky part of the test suite, which turned out to be caused by a race condition. They also had a very puzzling case of occasional data corruption which, it turns out, was caused by the same race condition.

MoreQARespect · 3 days ago

I tend to find that those bugs are in the extreme minority.

Most flakiness ends up being either a bug in the test or nondeterminism in the code that users don't actually care about.

MoreQARespect commented on Delete tests andre.arko.net/2025/06/30... · Posted by u/mooreds

MathMonkeyMan · 4 days ago

Integration tests at $DAY_JOB are often slow (sleeps, retries, inadequate synchronization, starting up and shutting down 8 processes that are slow to start and stop), flaky (the metrics for this rate limiter should be within 5%, this should be true within 3 seconds, the output of this shell command is the same on all platforms), undocumented, and sometimes cannot be run locally or with locally available configurations. When I run a set of integration tests associated with some code I'm modifying, I have no idea what they are, why they were written, what they do, how long they will take to run, or whether I should take failures seriously.

Integration tests are closer to what you want to know, but they're also more. If I want to make sure that my state machine returns an error when it receives a message for which no state transition is defined, I could spin up a process and set up log collection and orchestrate with python and... or I could write a unit test that instantiates a state machine, gives it a message, and checks the result.

My point is that we need both. Write a unit test to ensure that your component behaves to its spec, especially with respect to edge cases. Write an integration test to make sure that the feature of which your component is a part behaves as expected.
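The unit test described above might look something like the following sketch. The `StateMachine`, `State`, and `UndefinedTransition` names and the transition table are assumptions made for illustration, and the error is raised as an exception here instead of being returned.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RUNNING = auto()


class UndefinedTransition(Exception):
    """Raised when no transition exists for (current state, message)."""


class StateMachine:
    # Hypothetical transition table: (state, message) -> next state.
    _TRANSITIONS = {
        (State.IDLE, "start"): State.RUNNING,
        (State.RUNNING, "stop"): State.IDLE,
    }

    def __init__(self):
        self.state = State.IDLE

    def handle(self, message):
        key = (self.state, message)
        if key not in self._TRANSITIONS:
            raise UndefinedTransition(
                f"no transition from {self.state.name} on {message!r}"
            )
        self.state = self._TRANSITIONS[key]
        return self.state


def test_undefined_transition_is_an_error():
    machine = StateMachine()
    try:
        machine.handle("stop")  # "stop" is not defined for IDLE
    except UndefinedTransition:
        pass
    else:
        raise AssertionError("expected UndefinedTransition")
    # A rejected message must leave the state unchanged.
    assert machine.state is State.IDLE
```

No processes, log collection, or orchestration are involved: the test instantiates the machine, gives it a message, and checks the result.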

MoreQARespect · 3 days ago

The way some programmers treat test flakiness is weird.

With other types of bug, programmers want to fix them. With flakiness they either want to rerun the test until it passes or tear it down and write an entirely different type of test, as if it is in fact not a bug but some immutable fact of life.

MoreQARespect commented on Delete tests andre.arko.net/2025/06/30... · Posted by u/mooreds

creesch · 3 days ago

> I find testing terminology very confusing and inconsistent.

That's because it is both confusing and inconsistent. In my experience, every company uses slightly different names for different types of tests. Unit tests are generally fairly well understood as testing the single unit (a method/function) but after that things get murky fast.

For example, the term "integration tests", as reflected by the confused conversation in this thread, already has wildly different definitions depending on who you ask.

For example, someone might interpret them as "unit integration tests" where it reflects a test that tests a class, builder, etc. Basically something where a few units are combined. But, in some companies I have seen these being called "component tests".

Then there is the term "functional tests", which in some companies means the same as "manual tests done by QA" but for others simply means automated front-end tests. But in yet other companies those automated tests are called end-to-end tests.

What's interesting to me when viewing these online discussions is the complete lack of awareness people display about this.

You will see people very confidently say that "test X should be done in such and such way" in response to someone when it is very clear the two are actually talking about different types of tests.

MoreQARespect · 3 days ago

Unit tests don't have a coherent, agreed-upon definition either.

In fact, when I first saw Kent Beck's definition I did a double take because it covered what I would have called hermetic end-to-end tests.

The industry badly needs new words because it's barely possible to have a coherent conversation within the confines of the current terminology.

MoreQARespect commented on Delete tests andre.arko.net/2025/06/30... · Posted by u/mooreds

integralid · 3 days ago

> So if your code is ascertained to work at the high level, you also know that it must be working at the lower level too

In the ideal world, maybe. But it's very hard to test edge cases of a sorting algorithm with an integration test. In general my experience is that algorithms and some complex but pure functions are worth writing unit tests for. CRUD app boilerplate is not.
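As a rough sketch of the kind of unit test meant here, the following checks a pure sorting function against a handful of edge cases. The `insertion_sort` implementation and the case list are illustrative assumptions, not code from the thread.

```python
def insertion_sort(items):
    """Return a new sorted list; the input is left untouched."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one slot right to make room for `current`.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


# Edge cases that are awkward to reach through an integration test.
EDGE_CASES = [
    [],               # empty input
    [1],              # single element
    [2, 2, 2],        # all duplicates
    [1, 2, 3],        # already sorted
    [3, 2, 1],        # reverse sorted
    [0, -1, 5, -1],   # negatives and repeated values
]


def test_edge_cases():
    for case in EDGE_CASES:
        original = list(case)
        # Use the built-in sorted() as the reference result.
        assert insertion_sort(case) == sorted(case), case
        # The function must not mutate its argument.
        assert case == original
```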

MoreQARespect · 3 days ago

I've never in my life written a test for a sorting algorithm nor, I'm sure, will I ever need to.

The bias most developers have towards integration tests reflects the fact that, even though we're often interviewed on them, most developers rarely have to write complex algorithms.

It's one of the ironies of the profession.

MoreQARespect commented on Turning Claude Code into my best design partner betweentheprompts.com/des... · Posted by u/scastiel

SkyPuncher · 9 days ago

No, TDD failed because it assumed you could design a perfect system before implementation.

It’s a total waste of time to do TDD only to find out you made a bad design choice or discovered a conflicting problem.

MoreQARespect · 9 days ago

This is precisely the problem I alluded to, which is solved by writing higher-level tests with TDD that make fewer assumptions about your design.

TDD ought to let you make a bad design decision and then refactor it while keeping the test as is.

MoreQARespect commented on Turning Claude Code into my best design partner betweentheprompts.com/des... · Posted by u/scastiel

mattmanser · 9 days ago

I feel TDD ended up fizzling out quite a bit in the industry, with some evangelists later admitting they'd taken to often writing the code first, then the tests.

To me it's always felt like waterfall in disguise and just didn't fit how I make programs. I feel it's just not a good way to build a complex system with unknown unknowns.

That the AI design process seems to rely on this same pattern feels off to me, and shows a weakness of developing this way.

It might not matter, admittedly. It could be that the flexibility of having the AI rearchitect a significant chunk of code on the fly works as a replacement to the flexibility of designing as you go.

MoreQARespect · 9 days ago

TDD fizzled because not enough emphasis was put on writing high-level tests that match user stories, and too much emphasis was put on it as a tool of design.

MoreQARespect commented on Vibe Debugging: Enterprises' Up and Coming Nightmare marketsaintefficient.subs... · Posted by u/someoneloser

AstroBen · 11 days ago

TDD is really commonly misunderstood to be a testing strategy that helps reliability. It's not. It's supposed to guide your software design.

MoreQARespect · 10 days ago

It's pretty bad at this. It's much better used as a testing methodology than a design methodology.

It can provide high-level guardrails confirming implementation correctness that are as indifferent to software design as possible (giving freedom to refactor).

MoreQARespect commented on Vibe Debugging: Enterprises' Up and Coming Nightmare marketsaintefficient.subs... · Posted by u/someoneloser

rootnod3 · 11 days ago

So, there you have it. TDD is good if applied correctly, and only if you apply it 100% correctly. And so it seems for LLM usage. If it doesn't work for you, then you are obviously doing it wrong according to many folks here. TDD is nice to catch refactoring mistakes, LLMs are nice to maybe do some initial refactoring on a small enough code base. And it doesn't mean that one precludes the other. But I haven't seen TDD put engineers out of work and neither should LLMs. Trust either model fully and you are in for a world of hurt.

MoreQARespect · 11 days ago

I would usually measure "TDD correctness" in terms of how closely the test matches a user story vs how closely it mirrors code implementation.

The former is desirable, not common. The latter is common, not desirable.
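A small sketch of the contrast, under invented assumptions: the `Cart` class, its private `_discount_percent` helper, and the user story ("an order of 10 or more items gets 10% off") are all hypothetical and serve only to show the two styles side by side.

```python
from unittest import mock


class Cart:
    """Hypothetical cart: orders of 10 or more items get 10% off."""

    def __init__(self):
        self._lines = []  # list of (quantity, unit_price_cents)

    def add(self, quantity, unit_price_cents):
        self._lines.append((quantity, unit_price_cents))

    def _discount_percent(self):
        items = sum(q for q, _ in self._lines)
        return 10 if items >= 10 else 0

    def total_cents(self):
        subtotal = sum(q * p for q, p in self._lines)
        return subtotal * (100 - self._discount_percent()) // 100


def test_story_bulk_order_gets_ten_percent_off():
    # Matches the user story: only public inputs and outputs are involved.
    cart = Cart()
    cart.add(10, 200)
    assert cart.total_cents() == 1800


def test_mirrors_implementation():
    # Mirrors the code: asserts that a private helper is called, so it
    # breaks if the helper is renamed or inlined even when behavior is
    # unchanged.
    cart = Cart()
    cart.add(10, 200)
    with mock.patch.object(Cart, "_discount_percent", return_value=10) as helper:
        cart.total_cents()
    helper.assert_called_once()
```

The first test would survive a rewrite of how the discount is computed; the second would need to change along with the code.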

MoreQARespect commented on Why we dont like TDD oneuptime.com/blog/post/2... · Posted by u/ndhandala

MoreQARespect · 15 days ago

>first the developer writes a failing automated test case that defines a desired improvement or new function

If you TDD at the highest level that makes sense, with a test that mirrors the requirements in the form of a user story rather than the implementation (e.g. of a new function), most of the comments in this article don't really make sense.

Yes, if you're not sure about requirements it's too early to write the test, but it's also far too early to be writing any production code.

There aren't any real benefits to test-after. The risk of writing a test that mirrors an implementation is higher. The risk of undertesting is higher.

I've also noticed that people who are very thorough with test-after tend to over-test.

MoreQARespect commented on AI doesn't lighten the burden of mastery playtechnique.io/blog/ai-... · Posted by u/gwynforthewyn

dimal · 16 days ago

There’s always been this draw in software engineering to find the silver bullet that will allow you to turn off your brain and just vibe your way to a solution. It might be OOP or TDD or pair programming or BDD or any number of other “best practices”. This is just an unusual situation where someone really can turn off their brain and get a solution that compiles and solves the problem, and so for the type of person that doesn’t want to think, it feels like they found what they’re looking for. But there’s still no silver bullet for complexity. I guess there’s nothing to do but reject the PR and say “Explain this code to me, then I’ll review it.”

MoreQARespect · 15 days ago

Most juniors I watch program very quickly get overwhelmed by complexity because they don't know how to follow strategies like BDD or TDD, which isolate parcels of complexity (e.g. how the program is supposed to behave) from other parcels (e.g. how the code actually works).

Even worse though, they all seem to think that the solution to becoming overwhelmed with complexity isn't to parcel it up with strategies like BDD and TDD but to just get better at stuffing more complexity into their brains.

To be honest, I see a similar attitude with LLMs where loads of people think you just need to stuff more into the context window and tweak the prompt and then it'll be reliable.