I have a lot of bitter things to say about automated testing, having spent 14 years of my life trying to knead it into a legitimate profession, but here's the most significant:
Your test case is more useless than a turd in the middle of the dining room table unless you put a comment in front of it that explains what it assumes, what it attempts, and what you expect to happen as a result.
Because if you just throw in some code, you're only giving the poor bastard investigating it two puzzles to debug instead of one.
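(For instance, even a tiny made-up pytest example like this removes one of the two puzzles; the code under test here is just Python's built-in round:)
def test_round_half_to_even():
    # Assumes: Python's built-in round(), which uses banker's rounding.
    # Attempts: round the midpoint values 0.5 and 1.5.
    # Expects: each rounds to the nearest even integer (0 and 2), not "up".
    assert round(0.5) == 0
    assert round(1.5) == 2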
At an old job, one manager would put in his employees' annual reports stuff like "Developer X wrote N automated tests, fixed M bugs, and filed P new bugs this quarter..."
The obvious result of Goodhart's Law ensued, leading to test cases like you mention.
Lesson to leaders: Please stop your bad managers from pulling stupid crap like this. It wastes a lot more time in the long run.
Which is funny, as the purpose of testing is to explain to other developers what the code under test assumes and what should be expected of it under various conditions. It is documentation.
If you have to document your documentation, you might be missing something fundamental in how you are writing your first order documentation. Not to mention that in doing so you defeat the reason for writing your documentation in an executable form (to be able to automatically validate that the documentation is true).
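(As a minimal sketch of what "documentation in an executable form" can look like, here is a made-up doctest example; the mean function is only an illustration:)
def mean(xs):
    """Return the arithmetic mean of a non-empty list of numbers.

    >>> mean([1.0, 2.0, 3.0])
    2.0
    """
    return sum(xs) / len(xs)

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # checks every example in the docstrings and reports failures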
So if I understand correctly, your position is "code is the documentation"?
Over time I'm inclined to value human-written documentation, especially when things involve integrations of multiple systems. I had real cases where two parties point at their code and say it is correct, and in isolation the code does look correct. But when the time comes to integrate these systems, it breaks. And if you have a human-readable document where intentions and expectations are specified, it's much easier to come to a common (working) solution.
Not all languages have the capability to express complex intentions, so code-as-documentation does not work most of the time.
Disregarding the "code is doc" position, it's still common to have an overview or index for documentation, which points readers in the right direction instead of dumping pages of detailed docs on them.
Now, you could also have a well organized test suite that goes from most obvious to most detailed, split into sections for each use-case, but this sounds a lot more tedious than "write a one-line comment describing the unit test".
>the purpose of testing is to explain to other developers what the code under test assumes and what should be expected of it under various conditions
No, the point of automated testing is to verify that what is under test behaves correctly and to be able to scale this verification more cheaply than having humans do it. Documenting what it verifies and under what conditions is just a side effect.
A test must be reproducible. If it is not, it is not a test.
>Your test case is more useless than a turd in the middle of the dining room table unless you put a comment in front of it that explains what it assumes, what it attempts, and what you expect to happen as a result.
This is why I found Gherkin/Cucumber (and BDD in general) to be a total revelation when I first encountered it. No one should be writing tests any other way IMO.
https://cucumber.io/docs/gherkin/reference/
Gherkin/Cucumber reintroduce the very problem TDD/BDD was intended to solve: Documentation falling out of sync with the implementation.
The revelation of TDD, which was later rebranded as BDD to deal with the confusion that arose with other types of testing, was that if your documentation was also executable, the machine could be used to prove that the documentation is true. The Gherkin/Cucumber specs themselves are not executable and require you to re-document the function in another language, with no facilities to ensure that the two are consistent with each other.
If you are attentive enough to ensure that the documentation and the implementation are aligned, you may as well write it in plain English. It will give you all of the same benefits without the annoying syntax.
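(To make the duplication concrete, here is a minimal made-up pair: the Gherkin text and the separate step definitions it maps to, behave-style; all names and values are illustrative. Nothing but discipline keeps the prose and the bindings in sync:)
Feature: List average
  Scenario: Average lies between the minimum and the maximum
    Given the list 1.0, 2.0, 3.0
    When I compute its average
    Then the result is between 1.0 and 3.0

# steps/average_steps.py
from behave import given, when, then

@given("the list {values}")
def step_list(context, values):
    context.xs = [float(v) for v in values.split(",")]

@when("I compute its average")
def step_average(context):
    context.avg = sum(context.xs) / len(context.xs)

@then("the result is between {low:g} and {high:g}")
def step_check(context, low, high):
    assert low <= context.avg <= high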
This sounds like a good theory but the practice of it is really hard. Pretty quickly you end up with tests that "say" one thing but have nuanced different behavior in the underlying implementation.
Then try to debug a "document"...
I like the idea. But having tried it at scale, it becomes a mess. Code I can understand. I can read English comments. I can't debug English.
@Test
public void myTestMethod_Scenario_ShouldReturnThis() {....
It("throws when the object belongs to another user")
It("does a business thing when thing is in state BLAH")
I don't think it quite does it right, but it is of note.
(I would buy a Copilot subscription for this)
I agree. One nice feature of property-driven testing is that assumptions often end up causing test failures. For example (in ScalaTest):
"Average of list" should "be within range" in {
forAll() {
(l: List[Float]) => {
val avg = l.average
assert(avg >= l.min && avg <= l.max)
}
}
This test will fail, since it doesn't hold for e.g. empty lists. Requiring non-empty lists will still fail, if we have awkward values like NaNs, etc. The following version has a better chance of passing:
"Average of list" should "be within range" in {
forAll() {
(raw: List[Float]) => {
val l = raw.filter(n => !n.isNaN && !n.isInfinite)
whenever (l.nonEmpty) {
val avg = l.average
assert(avg >= l.min && avg <= l.max)
}
}
}
Getting this test to pass required us to make those assumptions explicit. Of course, it doesn't spot everything; here's an article which explores this example in more depth (in Python) https://hypothesis.works/articles/calculating-the-mean
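(For comparison, roughly the same property in Hypothesis, the library behind the linked article; the naive sum-based average below is only a stand-in for the code under test, and as the article discusses, even finite inputs can push it out of range via overflow:)
from hypothesis import given, strategies as st

finite_floats = st.floats(allow_nan=False, allow_infinity=False)

@given(st.lists(finite_floats, min_size=1))
def test_average_is_within_range(xs):
    avg = sum(xs) / len(xs)  # naive mean, standing in for the code under test
    assert min(xs) <= avg <= max(xs)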
We have a policy of making each test a spec. That is, a test requires a plain text spec to be attached to it in its doc string. It's kind of like BDD but without all the weird DSLs.
Which would replace all those humans producing perfectly valid-sounding explanations that, if you invest some research effort, turn out to have no basis in (the usually far more complex, but also far more fascinating and infinitely deep) reality. So yes, I think AI can indeed replace lots of human-produced thoughts :-)
I admit to having been guilty of this myself. I have a go-to anecdote: I had a very well-paid contractor job and explained something about how my then department's software worked to someone from another department. I think I must have sounded very convincing; the person went off to change something in how they used our stuff. A few minutes later, after accidentally meeting and casually chatting with my boss for that job, I realized that everything I had said was total garbage. I quickly excused myself from my boss and hurried after the person to tell them to forget and ignore everything I had just explained to them, because it was all wrong. I think that last step is the part that usually doesn't happen in these cases, because we don't normally realize that such a thing just occurred.
The brain, or parts of it, are great at producing "explanations". I think that it was part of the more established and reproducible results of psychology that our brain first decides and acts, and only then produces some (often bullshit) "reason" when/if our conscious self asks for one? Does anybody remember if this is true and has a link?
> During day-to-day development, the important bit isn't that there are no failures. The important bit is that there are no regressions.
And that's why we test and why tests shouldn't be allowed to fail.
Just because the scenarios described make testing hard does not change the reality of what makes tests valuable.
If pre-existing failures are halting the production pipeline and you don't like it, switch off trunk based development and see if you like the waits and constant rebasing in large projects/teams. But don't eff with the bloody tests!
When the codebase gets large enough, you need to allow some tests to "fail". What I really mean is that you need a way to quickly mark a failing test as flakey, so the author can fix it while everyone else gets on with their day and merges code.
At $dayjob this works well: if your CI comes up red with some unrelated test failing, you can mark the test as flakey in the UI, CI will allow your code to merge, and a Jira ticket will be created for the test owner to fix their test (and it will be disabled for future test runs).
I think for small to medium projects, you can have all tests succeed but once the repo is large enough / has frequent enough changes, flakey tests are bound to slip in.
Our setup just reruns the tests a few times which sorts out flaky tests. The page then shows the most frequently failing tests so they can be properly fixed.
I think GitHub does something similar - public website tests must always pass but if you break GitHub Enterprise you get three days to fix it (or something like that - I think they had a blog post on it).
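(If the suite happens to run under pytest, the rerun-on-failure approach can lean on the pytest-rerunfailures plugin; the flags and marker below belong to that plugin, everything else is illustrative:)
# Run the whole suite with automatic reruns of failing tests:
#   pytest --reruns 3 --reruns-delay 1
# Or mark an individual test that is known to flake:
import pytest

@pytest.mark.flaky(reruns=3, reruns_delay=1)
def test_eventually_consistent_read():
    ...  # placeholder body for the flaky check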
If testing that way is painful (and it is), then work with people to remove the pain. Tests are supposed to help developers, not constrain or punish them.
Put tests in the same repo as the SUT. Do more testing closer to the code (more service and component tests) and do less end-to-end testing. Ban "flakey" tests - they burn engineering time for questionable payoff.
Test failures can be thought of as "things developers should investigate." Make sure the tests are focused on telling you about those things as fast as possible.
Also, take the human out of the "wait for green, then submit PR" steps. Open a PR but don't alert everyone else about it until you run green, maybe?
It would work for most "classical" software development. In this case, the author talks about conformance tests (a HUGE collection) from an external vendor. Most of them will fail at first, then you make them pass slowly but steadily.
The problem becomes: I want to know if there are significant regressions in the vendor tests, i.e. tests that were green for a long time and suddenly changed. You could flag any test that became green at some point as "required" to pass the CI, but then you have tests that randomly succeed or fail depending on code you have not yet written (e.g. locking around concurrent structures). Marking these tests manually is impractical and could definitely be replaced by tooling that supports some statistical modeling of success/failure.
You may have the best testing strategy for internal code but as long as you have to test against these conformance tests it's simply unfeasible to say "sorry, only green allowed".
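(A rough sketch of that tooling idea: track each conformance test's recent history and only gate CI on tests with a long passing streak. The file name and threshold are made up:)
import json
from pathlib import Path

HISTORY_FILE = Path("conformance_history.json")  # {test_name: [true, false, ...]}
REQUIRED_STREAK = 20  # consecutive passes before a test becomes gating

def record_run(results):
    """results: dict mapping test name -> passed (bool) for this run."""
    history = json.loads(HISTORY_FILE.read_text()) if HISTORY_FILE.exists() else {}
    for name, passed in results.items():
        history.setdefault(name, []).append(passed)
    HISTORY_FILE.write_text(json.dumps(history))

def gating_tests():
    """Tests that have passed the last REQUIRED_STREAK runs."""
    history = json.loads(HISTORY_FILE.read_text()) if HISTORY_FILE.exists() else {}
    return {name for name, runs in history.items()
            if len(runs) >= REQUIRED_STREAK and all(runs[-REQUIRED_STREAK:])}

def check_for_regressions(results):
    """Fail the pipeline only if a long-green test regressed."""
    regressions = sorted(n for n in gating_tests() if results.get(n) is False)
    if regressions:
        raise SystemExit(f"Regressions in previously-green tests: {regressions}")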
> take the human out of the "wait for green, then submit PR"
It'd be great if GitHub could open a PR for reviews (aka un-draft) automatically after CI succeeds. (If not in the core product, is there a bot that does that?)
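(Not aware of a built-in option, but a small script run as the final CI job can do it with the GitHub CLI; `gh pr ready` is the real command, while how the PR number reaches the script is an assumption about your CI setup:)
import os
import subprocess
import sys

# Runs as the last CI job: if we got this far, everything before it passed.
pr_number = os.environ.get("PR_NUMBER")  # however your CI exposes it
if not pr_number:
    sys.exit("PR_NUMBER not set; nothing to un-draft")

# Mark the draft PR as ready for review.
subprocess.run(["gh", "pr", "ready", pr_number], check=True)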
My company uses a workflow where we don't use PRs for code reviews. Instead we each have our own git repo that's a fork of the tech lead's, with some git rules in place to impose a branch namespace. To open a review request you push a branch into the reviewer's repository. Our CI system detects the new branch and starts running it. Once CI passes that updates the bug tracker which triggers a notification to the reviewer.
The reviewer then does a git fetch, and then checks out the newly created rr/ branch. They make any small changes that aren't worth a roundtrip and push them to the rr branch. They add FIXME comments for bigger changes. They then either assign the ticket back to the developer, or go ahead and merge straight into their own dev branch. Once an rr branch is merged it's simply deleted. The dev branch is then pushed and CI will merge it to that user's master when it's green.
IntelliJ will show branches in each origin organized by "folder" if you use slashes in branch names, and gitolite (which is what we use to run our repos) can impose ACLs by branch name too. So for example only user alice can push to a branch named rr/alice/whatever in each person's repo. That ensures it's always clear where a PR/RR is coming from.
Because each user gets their own git repo and cloned set of individual CI builds, you can push experimental or WIP branches to your personal area and iterate there without bothering other people.
This workflow gets rid of things like draft PRs (which are a contradiction), it ensures each reviewer has a personal review queue, it means work and progress is tracked via the bug tracker (which understands commands in commit messages so you can mark bugs as fixed when they clear CI automatically) and it eliminates the practice of requesting dozens of tiny changes that'd be faster for the reviewer to apply themselves, because reviewer and task owner can trade commits on the rr branch using git's features to keep it all organized and mergeable.
Seems to me like you're underinvesting in tooling. It's a mistake a lot of development shops make - you focus on your product, so you can't spend time building something completely orthogonal, and in the process you waste man-years on a broken PR process instead of spending a month early on building some tooling that would have removed the pain in the first place.
The continuous testing is something I've thought about and it's a tricky one. We use property tests[1] so here's a quick stab at how I'd like it to look:
Test starts failing, immediately send a report with the failing input, then continue with the test case minimisation and send another report when that finishes.
Concurrently, start up another long-running process to look for other failures, skipping the input that caused the previous failure. We do want new inputs for the same failure though. This is the tricky one. We could probably make it work by having the prop test framework not reuse previously-failing inputs, but that's one of the big strategies it uses to catch regressions.
[1] Specifically, Hypothesis in Python.
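(A framework-agnostic sketch of that "keep searching, but don't re-trip over the same input" behaviour; the generator and the property are made-up stand-ins:)
import random

def generate_input(rng):  # stand-in for the framework's generator
    return [rng.uniform(-1e6, 1e6) for _ in range(rng.randrange(0, 10))]

def property_holds(xs):  # stand-in for the property under test
    return not xs or min(xs) <= sum(xs) / len(xs) <= max(xs)

def continuous_search(seed=0, budget=100_000):
    rng = random.Random(seed)
    reported = set()
    for _ in range(budget):
        xs = generate_input(rng)
        if tuple(xs) in reported:
            continue  # skip inputs that have already been reported
        if not property_holds(xs):
            reported.add(tuple(xs))
            print("new failing input (report, then minimise):", xs)
    return reported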
> The above development practice works well when the SUT and TB are both defined by the same code repository and are developed together.
I once witnessed a team creating an app, specs and tests in three respective repositories. For no other reason than "each project should be in its own repository".
The added work/maintenance around that is crazy, for absolutely no gain in that case.
Phase 1. Code and test basic functions concerning any kind of arithmetic, mathematical distribution, state machines, file operations and datetimes. This documents any assumptions and makes a solid foundation.
Phase 2. Write a simulation for generating randomized inputs to test the whole system. Run it for hours. If I can't generate the inputs, find as big a variety of inputs as possible. Collect any bugs, fix, repeat. This reduces the chances of finding real time bugs by three orders of magnitude.
This has worked really well in the past, whether I'm working on games, parsers or financial software. I don't conform to corporate whatever-driven testing patterns, because they are usually missing the crucial phase 2 and time phase 1 incorrectly.
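(A bare-bones version of that phase 2 loop; the system_factory, apply and check_invariants interface is invented for the sketch, and the important detail is logging the seed so any failure can be replayed:)
import random
import time

def run_simulation(system_factory, make_random_input, hours=4.0):
    deadline = time.time() + hours * 3600
    failures = []
    while time.time() < deadline:
        seed = random.randrange(2**32)
        rng = random.Random(seed)
        system = system_factory()
        try:
            for _ in range(1000):
                system.apply(make_random_input(rng))
                system.check_invariants()
        except Exception as exc:
            # The seed is enough to replay the exact failing sequence.
            failures.append((seed, exc))
            print(f"invariant violation with seed={seed}: {exc}")
    return failures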
The author's problem is pretty simple: the test repo is required for pre-merge tests to pass, but it can be updated independently, without having pre-merge tests pass.
And the answer is pretty simple: pin the specific test repo version! Use lockfiles, or git submodules, or put "cd tests && git checkout 3e524575cc61" in your CI config file _and keep it in the same repo as source code_ (that part is very important!).
This solves all of the author's problems:
> new test case is added to the conformance test suite, but that test happens to fail. Suddenly nobody can submit any changes anymore.
The conformance test suite is pinned, so the new test is not used. A separate PR has to update the conformance test suite version/revision, and it must go through the regular driver PR process and therefore must pass. Practically, this is a PR with two changes: update the pin and disable the new test.
> are you going to remember to update that exclusion list?
That's why you use "expect fail" list (not exclusion) and keep it in driver's dir. Ad you submit your PR you might see a failure saying: "congrats, test X which was expect-fail is now passing! Please remove it from the list". You'll need to make one more PR revision but then you get working tests.
> allowing tests to be marked as "expected to fail". But they typically also assume that the TB can be changed in lockstep with the SUT and fall on their face when that isn't the case.
And if your TB cannot be changed in lockstep with the SUT, you are going to have a truly miserable time. You cannot even reproduce the problems of the past!
So make sure your kernel is known or at least recorded, repos are pinned. Ideally the whole machine image, with packages and all is archived somehow -- maybe via docker or raw disk image or some sort of ostree system.
> Problem #2 is that good test coverage means that tests take a very long time to run.
The described system sounds very nice, and I would love to have something like this. I suspect it will be non-trivial to get working, however. But meanwhile, there is a manual solution: have more than one test suite. "Pre-merge" tests run before each merge and contain a small subset of the tests. A bigger "continuous" test suite (if you use physical machines) or "every X hours" suite (if you use some sort of auto-scaling cloud) will run a bigger set of tests, and can be triggered manually on PRs if a developer suspects the PR is especially risky.
You can even have multiple levels (pre-merge, once per hour, 4 times per day) but this is often more trouble than it's worth.
And of course it is absolutely critical to have reproducible tests first -- if you come up to work and find a bunch of continuous failures, you want to be able to re-run with extra debugging or bisect what happened.
> And the answer is pretty simple: pin the specific test repo version! Use lockfiles, or git submodules, or put "cd tests && git checkout 3e524575cc61" in your CI config file _and keep it in the same repo as source code_ (that part is very important!).
Indeed. Where I work we have a bunch of repos, but they always reference each other via pinned commits. We happen to use Nix, with its built in 'fetchGit' function; it's also easy to override any of these dependencies with a different revision. For example:
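(A minimal sketch of such a function, with placeholder URLs and pinned revisions; the real values would be project-specific:)
{ helpers ? import (fetchGit {
    url = "git://url-of-helpers.git";  # placeholder
    rev = "0000000000000000000000000000000000000000";  # pinned commit
  }) {}
, some-library ? import (fetchGit {
    url = "git://url-of-some-library.git";  # placeholder
    rev = "1111111111111111111111111111111111111111";  # pinned commit
  }) {}
}:
{
  # ... build the project using helpers and some-library ...
}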
This is a function taking two arguments ('helpers' and 'some-library'), with default arguments that fetch particular git commits. This gives us the option of calling the function with different values, to e.g. build against different commits.
We run our CI on GitHub Actions, which allows some jobs to be marked as 'required' for PRs (using branch protection rules). The normal build/test jobs use the default arguments, and are marked as required: everything is pinned, so there should be no unexpected breakages.
Some of our libraries also define extra CI jobs, which are not marked as required. Those fetch the latest revision of various downstream projects which are known to use that library, and override the relevant argument with themselves. For example, the 'some-library' repo might have a test like this:
import (fetchGit {
  # Fetch the downstream project that consumes some-library.
  url = "git://url-of-downstream-project.git";
  ref = "master";
  # No 'rev' given, so it will fetch 'HEAD'
}) {
  # Build with this checkout of some-library, instead of the pinned version
  some-library = import ./. {};
}
This lets us know if our PR would break downstream projects, if they were to subsequently update their pinned dependencies (either because we've broken the library, or the downstream project is buggy). It's useful for spotting problems early, regardless of whether the root cause is upstream or downstream.
Yeah - developers need to control their own tests. In the weird case where they don't control their tests (conformance tests), you need to control when those tests are added.