wildmanx · 7 years ago
> Bugs are correlated with lines of code and TDD forces writing more code so how can it reduce bug counts? If the test code has no bugs then just write the rest of the code in the same style

Nobody has pointed out the fault in this reasoning yet, so I will.

The linear relationship between lines of code and total bug count rests on the independence of bug introduction in different parts of the code. That is, introducing a bug at line 10000 is more or less independent of adding line 20000. For product code that is arguably the case, but once test code enters the mix, this basic assumption doesn't just fail to hold; it's turned upside down.

Test code is not customer-visible, only dev-visible, with the sole purpose of finding bugs. Thus, adding test code to a code base decreases the average probability of a bug per line of product code. More formally, if you have N lines of product code and thus c * N bugs for some fraction c, then adding M lines of test code does not increase the number of customer-visible bugs to c * (N + M). Instead, it reduces that number to c' * N with c' < c, the difference being determined by your test coverage. (100% test coverage, i.e., an exhaustive test without bugs or, equivalently, a formal verification, would bring c' to 0.) Sure, the M lines of test code may well have bugs of their own, but those only increase c' slightly while keeping it below c, and more importantly, those test bugs are not customer-visible. They only annoy developers.
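To put toy numbers on it (illustrative only, not from any study):

```python
import random

random.seed(0)

N = 10_000         # lines of product code
c = 0.01           # chance any given product line carries a bug
coverage = 0.8     # fraction of product bugs the test suite catches

# Seed bugs independently per line -- the linearity assumption.
product_bugs = sum(random.random() < c for _ in range(N))

# Each bug is caught (made dev-visible) with probability `coverage`.
caught = sum(random.random() < coverage for _ in range(product_bugs))
customer_visible = product_bugs - caught  # this is roughly c' * N

print(product_bugs, customer_visible)
```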

I agree with the rest of the post though.

ericathegreat · 7 years ago
Agree. Consider the situation of data entry professionals, those people who transcribe audio recordings or enter data into databases.

If you have one person entering data into a computer, then the odds of them introducing an error and failing to spot it are fairly high.

If you have twice as many people entering twice as much data, then the odds of an error getting introduced are roughly doubled.

However, if you have those two people entering the same data, then their mistakes cancel each other out. If person A and person B both entered the same thing, it's extremely unlikely that it's incorrect. If they differ, though, a problem has been identified and can now be fixed.

The odds of both of those people entering the same piece of data incorrectly is tiny. Likewise, accidentally introducing a bug into both the production code and the test is pretty unlikely.

That said, if those two theoretical data entry people above are given the wrong data to enter, then the system cannot protect them. They will both correctly enter incorrect data. "Garbage in, garbage out".

Likewise, if the requirements of a piece of software are poorly understood, then it is quite likely that both the test and the production code will implement the same "bug". Writing tests won't fix a failure to understand the problem you're trying to solve. And they're not supposed to.
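A sketch of the double-entry idea (hypothetical field values):

```python
# Double-entry sketch: accept a field only when two independent operators
# agree; a mismatch flags the record for review instead of silently passing.
def reconcile(entry_a, entry_b):
    return entry_a if entry_a == entry_b else None

# If each operator independently errs with probability p, both making the
# very same error is bounded by p * p -- 0.01 becomes at most 0.0001.
assert reconcile("42017", "42017") == "42017"  # agreement: accept
assert reconcile("42017", "42071") is None     # disagreement: flag for review
```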

gdfasfklshg4 · 7 years ago
> The odds of both of those people entering the same piece of data incorrectly is tiny.

This is just not correct. There may be a systematic reason why they are making a mistake (e.g. a mispronounced word), in which case the added redundancy improves precision but not accuracy. Check out the concepts of accuracy and precision from the physical sciences.

sombremesa · 7 years ago
Tests don't exist for solving bugs as much as they exist to prevent regression. Writing exhaustive tests is not sufficient to bring c' to zero, because some of those bugs exist because of the developer's own presumptions about how code should behave.
chii · 7 years ago
But testing exhaustively means the full range of inputs is used, and therefore, if a bug arises from inputs the dev didn't initially consider, it would show up as a failure.

What you're actually saying is that reaching full exhaustive testing is near impossible.

mikekchar · 7 years ago
Bugs in tests exist (and are just as prevalent as bugs in production code in my experience). However, writing production code introduces a kind of constraint on the system. If you run a function with certain program state, then it returns a certain result. No matter what you do, this will happen because that's how you wrote the code.

A test introduces the same constraint. Instead of implementing the function, it runs the function. No matter what you do, the test will run certain code and get a certain result. Tests pass when the constraints introduced by the production code match the constraints introduced by the tests.

Because of this, if there is a software error in the production code, usually you need to have a corresponding error in the tests in order for the test to pass. This definitely happens from time to time, but the odds of it happening are much lower. Similarly, if you have a software error in the tests, you need to have a corresponding software error in the production code in order for the test to pass -- assuming that the test is actually exercising the code (something you can't always assume, unfortunately).
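A minimal sketch of that matching-constraints idea, with made-up function names:

```python
# Production code states one constraint on the system...
def apply_discount(price: float, percent: float) -> float:
    """Return `price` reduced by `percent` percent."""
    return price * (1 - percent / 100)

# ...and the test states the same constraint independently. For a wrong
# implementation to pass, the test would need a *matching* wrong
# expectation, e.g. both sides treating `percent` as a fraction.
def test_apply_discount():
    assert apply_discount(200.0, 10.0) == 180.0

test_apply_discount()
```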

p1necone · 7 years ago
The other fault in that reasoning is that usually test code is much much simpler than product code. It's a lot harder to write bugs when the structure of the code is just 1. call single function with some input data 2. verify that output data/side effects meet expectations 3. repeat for more sets of inputs.
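For example, a typical test body is little more than a table of cases (hypothetical function, illustrative data):

```python
# 1. call a single function with some input data
# 2. verify the output meets expectations
# 3. repeat for more sets of inputs
def slugify(title: str) -> str:
    return title.strip().lower().replace(" ", "-")

cases = [
    ("Hello World", "hello-world"),
    ("  Trim Me  ", "trim-me"),
    ("already-slugged", "already-slugged"),
]

for given, expected in cases:
    assert slugify(given) == expected, (given, expected)
```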
neo2006 · 7 years ago
One of the mistakes I see a lot around this is when people start to write intelligent mocks. Usually that is where test bugs reside.
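A made-up sketch of what I mean, where the mock duplicates the production rule:

```python
# Anti-pattern sketch: the "intelligent" mock re-implements the pricing
# rule, so the test just compares two copies of the same logic. A bug in
# the shared rule passes silently.
TAX_RATE = 0.20

def total_with_tax(net: float) -> float:
    """Production code under test."""
    return net * (1 + TAX_RATE)

class SmartPricingMock:
    """Mock that 'helpfully' duplicates the production rule."""
    def total(self, net: float) -> float:
        return net * 1.20  # same rule, same potential bug

def test_total():
    # Passes whether or not the rule itself is right.
    assert total_with_tax(100.0) == SmartPricingMock().total(100.0)

test_total()
# A dumb mock would return a hard-coded expected value from the spec instead.
```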
foobarchu · 7 years ago
I think this holds for well-written tests, but if one is not talented in writing them then more tests will simply bolster the bugs in the main code. For example, if a developer generates test data by running the program and copying its output (something I see far more often than I'm comfortable with), they can tell themselves that section was tested thoroughly and must be bug free. Thus, when someone notices an end result someplace is wrong, the first place checked may not be that "well tested" function. Tests are only as good as the domain knowledge of the person writing them.
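A contrived illustration of that trap (hypothetical function):

```python
# Hypothetical off-by-one: the sum is supposed to be inclusive of `hi`.
def sum_inclusive(lo: int, hi: int) -> int:
    return sum(range(lo, hi))  # bug: should be range(lo, hi + 1)

# "Test" written by running the code and pasting what it printed,
# rather than working out the answer from the spec:
def test_sum_inclusive():
    assert sum_inclusive(1, 5) == 10  # copied from the buggy output

test_sum_inclusive()  # passes, so the bug now looks "well tested"
# A spec-derived test would have demanded 1+2+3+4+5 == 15 and failed.
```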
gilbetron · 7 years ago
Do you have an actual source that corroborates that with actual data? It seems true, but I haven't found anything that actually tests the premise. I've found studies showing that code with lots of unit tests tends to be more bug free, but nothing about whether unit tests increase (or decrease) development time or whether test code actually has fewer bugs. I think it would be interesting if there were also a measure of code complexity, and whether less complex test code correlates with fewer bugs in the test code.
Aeolun · 7 years ago
Arguing from the point of logic and formalisms the author is talking about, this would seem to be true.
paulddraper · 7 years ago
Yeah, it's like saying "CAP says partitions decrease availability/consistency, so putting your app servers in a scale group will make things less available/consistent."

That's really superficial analysis. Yes, there are some fundamental tradeoffs. But it is possible to change the properties of a system or combine them in intelligent ways and move the curve.

Or it's like saying that speedometers make your GPS less accurate due to the Heisenberg principle.

jasode · 7 years ago
>Bugs are correlated with lines of code and TDD forces writing more code so how can it reduce bug counts? If the test code has no bugs then just write the rest of the code in the same style

I'm not advocating for TDD (the programmer methodology in the IDE) but the author's explanation about "test code" isn't correct. Code written for explicit purposes of a test to exercise other code has been shown to increase correctness. E.g. SQLite database has 711x more test code than the core engine code.[1] (I made a previous comment why this is possible: https://news.ycombinator.com/item?id=15593121)

Low-level infrastructure code like database engines, string manipulation libraries, crypto libraries, math libraries, network protocol routines, etc can benefit from a suite of regression tests.

It's the high-level stuff like GUIs in webpages being tested with Selenium or LoadRunner that has conflicting business value because altering one pixel can have a cascading effect of breaking a bunch of fragile UI test scripts.

[1] https://www.sqlite.org/testing.html

userbinator · 7 years ago
The question that always comes to mind whenever testing is discussed is "how do you know the test code is itself free of bugs?"

I distinctly remember once posing that question in a meeting about testing, and a manager replying --- seriously --- with "then perhaps the test code should itself have tests." Someone else must've come up with that before too, because (at a different job) I've also worked on a codebase where a surprising number of tests were basically testing the function of another test.

gjm11 · 7 years ago
You don't know that the test code is free of bugs. You don't need to.

Case 0: No bugs in the test code. All is well.

Case 1: Bug in the test code that causes some bugs in the real code not to get caught. That's bad, but you're no worse off than if you didn't have the test at all.

Case 2: Bug in the test code that causes correct real code to look buggy. Result: the test fails, you look for problems, most likely you find that the problem is in the test code and fix it. Going forward, you have a working test.

Case 3: Bug in the test code that makes something else break. This can happen and is genuinely bad, but (1) it only affects testing, not your actual product, and (2) most bugs don't behave that way.

The test code is a net win if the bugs it catches in your real code are worth the effort of writing and debugging the test code. That's no less true on account of the possibility of bugs in the test code. It just means that when you estimate the benefit you have to be aware that sometimes the tests might be less effective because of bugs, and when you estimate the cost you have to be aware that you have to debug the code, not just write it the first time. And then you just ... decide what's the best tradeoff, just like everything else in engineering.

(And no, you don't need tests for your test code. The test code is a test for itself as well as for the code it's testing.)

zerogvt · 7 years ago
Add to this the countless hours that go into engineering tests that have to bend over backwards and mock the universe itself to test a tiny bit of logic that doesn't make sense to test in the first place. Then multiply by the hours lost trying to debug frail tests that break time-expensive continuous integration pipelines.

Good unit and integration tests are a rarity. Instead, tests like the above, which are in effect developed as a side project to the real project, are the norm, and they bog down the whole project. But you cannot deliver any code that is not "covered", because that would be against the current 100%-TDD bible/best practice/call it what you want.

So the next dev/maintainer has to work out a badly written test (probably bearing untrue assumptions about the program's intended business logic), and after some fun hair-pulling he does the reasonable thing, which is working around the test or pampering it to get it to pass. And that is how you end up with test code that is itself buggy and problematic, does not really test much, but keeps the holy test-coverage counter climbing.

danmaz74 · 7 years ago
What the author didn't consider is that test code adds redundancy to the code base: a bug introduced only in the test code, while the tested code is correct, will make the test fail, and you will fix it.

Where instead the amount of test code can become a problem, in my experience, is with maintenance. Efficiency is very important.

greenyouse · 7 years ago
You can tune the pixel-difference threshold in Selenium testing so it doesn't fail when under an acceptable pixel-change limit. If it's off by one or two, no biggie, but off by 30 or 50, then maybe there was a structural change to the page, so it should fail. Something like "misMatchTolerance" from wdio-visual-regression-service[0] would allow for tweaking this. I'm sure there are similar tools for other languages.

If the UI tests catch bugs during development or help the team during a data migration, they're probably still worth having.

[0] https://github.com/zinserjan/wdio-visual-regression-service
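A tool-agnostic sketch of what such a tolerance does (toy "images" as pixel grids; real tools compute this for you):

```python
# Compare two same-sized "screenshots" (toy grayscale pixel grids) and
# fail only above a mismatch threshold, like the tolerance settings in
# visual-regression tools.
def mismatch_ratio(img_a, img_b):
    """Fraction of pixels that differ between two same-sized images."""
    total = diffs = 0
    for row_a, row_b in zip(img_a, img_b):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            diffs += px_a != px_b
    return diffs / total

baseline = [[0, 0, 0], [0, 0, 0]]
rendered = [[0, 0, 1], [0, 0, 0]]  # one pixel changed

TOLERANCE = 0.25  # accept small rendering jitter, fail on big shifts
assert mismatch_ratio(baseline, rendered) <= TOLERANCE  # 1/6 of pixels: passes
```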

herge · 7 years ago
Maybe the spry takeaway is that you should write tests for things you want to make sure they will not break.

Your UI being off by a pixel won't break your application, so if a test hangs on that, then it is not a good test.

However, your business logic, or network protocol routine, those should not break even if you heavily refactor or add new features (especially business logic where a broken behaviour might seem correct), so those need to be heavily tested.

If it is hard to test the juicy parts like business logic without also dragging in the UI, different OS/platform/db parts, etc, then you should look at how your application is structured and if it is really optimized for writing good tests.

gav · 7 years ago
> you should write tests for things you want to make sure they will not break

You should also write tests for things that are already broken before you work on the fix so that you can be sure it's actually fixed. Basically the red/green/refactor cycle[1] from TDD.

[1] http://www.brendanconnolly.net/test-driven-testing-red-green...
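A minimal sketch of that cycle, with a made-up bug:

```python
# Red: first capture the reported bug as a test. With the old, buggy
# implementation (a bare float(text), which chokes on "42%") this fails.
def test_parse_percent_strips_suffix():
    assert parse_percent("42%") == 42.0

# Green: the fix. The very test that reproduced the bug now proves the fix.
def parse_percent(text: str) -> float:
    return float(text.rstrip("%"))

test_parse_percent_strips_suffix()
```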

jasode · 7 years ago
>Your UI being off by a pixel won't break your application, so if a test hangs on that, then it is not a good test.

For brevity in the previous comment, I didn't fully flesh out the background on why fragile UI tests get created. It happens accidentally.

What sometimes happens is that the UI tester uses a "macro recorder" to record mouse movements and clicks. But then a programmer shifts the position of a zipcode field by one pixel, which throws the script off because it expected UI elements in a different spot. Fixing the broken UI tests is time consuming and can leave a bad impression that tests create a lot of effort for very little payback.

The return-on-investment of UI tests depends on the business circumstances. I'm guessing Boeing and Airbus have automated UI tests that sometimes break when programmers change things which causes rework. However, the pain of fixing the UI tests and keeping it sync'd with the UI code is worth it for avionics software.

james_s_tayler · 7 years ago
>So here’s the punchline: if you want to be a good programmer then learn a technology and language agnostic formalism. Logic, statistics, and game theory are probably good bets.

I think in an abstract sense control theory is a reasonably good bet.

https://en.wikipedia.org/wiki/Control_theory

I can't say I know it deeply, but a lot of the ideas resonate when I think about software engineering. If you think of every system as basically an n-dimensional vehicle, with a control interface used to set all the parameters relevant to the system, then a few things follow:

  every system has a safe operating envelope
  parameters are usually linked to each other such that turning one up turns another down
I find there is a lot of mileage in thinking about which parameters are linked to each other, and, when you get excited about turning one of them up (i.e. bringing in TDD or a Kubernetes-type solution), about what effect you are having on the other parameters. That's where a new source of pain is going to come from. The biggest mistake I see in reasoning about these kinds of things is being blind to the negative side of the trade-off due to the overwhelming excitement of finally being able to jump on the bandwagon and join the cargo cult. You have to train yourself to hunt for the parameter that is being affected indirectly, as that's the most important side of the trade-off. You need to reason about whether the indirectly affected parameter's new value would take you outside the safe operating envelope.

With every decision we make about systems we build and run we are essentially trying to steer them, albeit clumsily, in this manner.

keithnz · 7 years ago
good recommendation; if the author understood control theory he'd understand that code that feeds back on code is quite different from code with no feedback loop.

while I find it hard to find much I'd recommend about the author's article, as most of the reasoning seems a little off, I can understand the sentiment of the article.

Agile and TDD are really a recognition of control theory's idea that short feedback loops keep things in better control and adapt faster, whereas long feedback loops go out of control far more easily. This is more targeted at the human side of creating software. That's not to say there aren't better strategies than TDD and Agile techniques; however, I think the principle of feedback loops giving confidence will stay in some form. I think there is a LOT more to be said about engineering/designing correct, robust, and secure software.

james_s_tayler · 7 years ago
>This is more targeted at the human side of creating software

This in my experience is the most important factor.

oldgradstudent · 7 years ago
Spotted the control theorist.

We all think our pet subjects are the right lens to view the world with.

james_s_tayler · 7 years ago
I've never formally studied it. My pet subject is CS tbh. It's just a lens I found one day, picked it up and started inspecting things through it, found it useful, pocketed it and moved on. Comes in handy a lot along with several others I've collected over the years.
leetrout · 7 years ago
Really buried the lede here. I get that it was acknowledged as a rant / opinion piece but there's not a lot of really actionable advice for the general population of programmers, IMO. The article has good points, for sure, but the ending has the best part, IMO.

> So here’s the punchline: if you want to be a good programmer then learn a technology and language agnostic formalism. Logic, statistics, and game theory are probably good bets. As things stand that kind of skill is probably the only thing that is going to survive the coming automation apocalypse because so far no one has figured out a way around rigorous and heuristic thinking.

I think there's a lot of support there.

I don't think using Kubernetes as an example of "sequestering the complexity behind a distributed control system" was a good follow up to TDD generating more lines of code. Containers are a step in the right direction and Kubernetes _is not_ the best option for using containers in production but it _is_ the most popular option and so if you want community mindshare and support then it probably is the best choice if you can manage it or use a managed service.

"Serverless" is real, it's here, and containers / k8s are just a step along the way.

eikenberry · 7 years ago
I agree that Kubernetes is not the best technology out there, but I'm not 100% on which I would call the best. Did you have a winner in mind (it sounded like it)? If so why that one?

Also, it may not really matter; as you said, Kubernetes is the most popular and is only getting more so at this point, and the network effect is so strong in tech like this that the "technically best" will most likely become a moot point.

leetrout · 7 years ago
I am a big fan of Nomad's scheduler and the code is easy to pull in to an existing Go project. It doesn't have everything Kubernetes has but I don't think it needs it, either.

Kubernetes has for sure won the popularity contest but the overhead involved in running it The Right Way™ on your own is a lot. Given what I've seen I would advocate for OpenShift if you like RedHat products / projects or sticking with Kubernetes from one of the well-known cloud providers.

avip · 7 years ago
That's one high-quality rant. Lines like "Pretending that the system and the runtime traces of the system are the specification is why there is always a shortage of programmers" are really insightful. And I think the implied actionables are obvious.
asimpletune · 7 years ago
Off topic, but I’m curious what you think the best container solution is?

As an aside, I’ve never been able to get anyone to explain to me why k8 and not something like mesosphere’s DCOS, other than “google”.

Anything you can share there?

eikenberry · 7 years ago
I think Mesosphere/DCOS suffers from the same basic problem that k8s does... instead of going for the 80% solution, it goes for the 100% solution. That is, both systems try to do too much and are overly complex and hard to manage.
myth2018 · 7 years ago
The only part I tend to disagree with is regarding TDD. There is a study from Microsoft Research showing that TDD results in greater quality -- although you pay a price for it [0]. I believe this applies as a general rule, although I recognize I'm not aware of further studies.

However I agree with the article's general idea. In the aviation industry there already are languages abstracting computers' internals and allowing programmers to reason about safety-critical programs using more high level constructs.

Due to its nature, I think there won't be such a technology for general-purpose languages: to be general enough, you can't have too many things abstracted away. Maybe we couldn't go much farther than what languages like Basic allow us.

On the other hand, I wish we had such languages for more specific tasks like ERP-like software, business web applications and so on. It's worth noticing that many of the biggest ERP companies in the world have their proprietary domain specific languages.

[0] https://www.microsoft.com/en-us/research/blog/exploding-soft...

AdieuToLogic · 7 years ago
From the article:

  So here’s the punchline: if you want to be
  a good programmer then learn a technology
  and language agnostic formalism.
No.

If you want to be "a good programmer", then learn how to define the problem you are tasked to solve. The technology is irrelevant. The "language agnostic formalism" is irrelevant.

Unless a person/team knows what must be done, then the rest really doesn't matter. Techniques which help to elicit repeatable delivery certainly are worthy to learn, even to advocate for. But without understanding what is needed, what use are they?

sinuhe69 · 7 years ago
Basically, the author suggests a "back to basics" approach, which is of course not totally wrong. But ignoring the very real challenges of the software industry is not going to help either. We cannot sit down to plan and wait for the so-called "global optimal" design and only then proceed to implement, because achieving a "global optimal" design requires considering every detail and aspect of the problem, which is simply impossible in the software industry because so many are unknown. The Waterfall model is long dead. History has proven it does more harm than good. Don't dig it up.

PS: I don’t disagree that in some cases, academics can use the formal methods to find a (near) global optimal solution. But I don’t think it’s practical in a daily context, nor necessary. Our evolution is the best proof that local optima can lead over time to fantastic solutions.

scandox · 7 years ago
> The bottleneck has always been in understanding all the implicit assumptions baked into the system that for one reason or another are essential to its correctness.

> It takes a particular kind of masochist to enjoy reverse engineering a black box from just poking at it with simple linear impulses.

These are great observations and brilliantly put. In particular the second one I think rightly explains why some very smart people definitely do not take to programming as a profession.