Vladimir Sedach

Have Emacs - Will Hack

March 7, 2018

TDD does not work the way you think it does

Edsger Dijkstra pointed out: testing may convincingly demonstrate the presence of bugs, but can never demonstrate their absence.

This is why the core tenet of TDD is to first write a failing unit test. The only way to be sure that a unit test is not a tautology is to verify that it points out the presence of a bug.
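A minimal sketch of that red/green cycle, with an invented `slugify` function and spec, might look like this (the test is written first, so running it before the implementation exists fails with a `NameError`, demonstrating that it can detect a missing or broken implementation):

```python
import re

# Hypothetical red/green cycle: slugify and its spec are invented
# for illustration. The test is written first; before slugify is
# defined, running it raises NameError -- the "red" step that shows
# the test is not a tautology.

def test_slugify():
    assert slugify("Hello, World!") == "hello-world"

# "Green" step: write just enough code to make the test pass.
def slugify(title):
    # Lowercase, keep alphanumeric runs, join them with hyphens.
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

test_slugify()  # passes now that slugify is defined
```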

This is also why unit testing is most valuable for regression tests - the regression unit test reproduces the presence of a known bug for a specific version of the code. The intention is that the unit test will signal that the bug is reoccurring if future versions of the code reintroduce it. This is usually the case, but there are two other possibilities that I have encountered:

  1. The same (from the point of view of the application user) or similar-looking bug will crop up in another part of the code.
  2. Over a long enough time span, the code changes so much that the test will always pass under any condition.
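To make the intended case concrete, here is a hedged sketch of a regression test; the `parse_price` function and the bug scenario are invented for illustration:

```python
# Hypothetical regression test. Suppose an earlier version of this
# (invented) parse_price function crashed on inputs with a trailing
# currency code, e.g. "100 EUR". The test reproduces that exact
# input, so it should fail again if a future refactoring
# reintroduces the bug.

def parse_price(text):
    """Parse a price string like '100' or '100 EUR' into an int."""
    # The fix: keep only the digits before converting.
    return int("".join(ch for ch in text if ch.isdigit()))

def test_parse_price_regression_trailing_currency():
    # Used to raise ValueError before the fix.
    assert parse_price("100 EUR") == 100

test_parse_price_regression_trailing_currency()
```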

The second kind of test is what I refer to as a tautology, and it represents the biggest potential pitfall of unit testing. A unit test might begin its life as a tautology; it might be inadvertently changed into a tautology as part of updating the test suite to reflect a change in the program; or the code being tested might change so that the unit test turns into a tautology without any changes to the test (this could happen, for example, if the code changes so that the initial conditions under which the unit test fails can no longer occur).
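One common way this happens is a mock-heavy test that, after a refactoring, ends up stubbing out the very function it claims to verify. A minimal invented example (the function names are hypothetical):

```python
from unittest import mock

# Hypothetical tautological test: apply_discount and the scenario
# are invented. After a refactoring, the test was "updated" by
# replacing the function under test with a stub.

def apply_discount(price, rate):
    return price * (1 - rate)

def test_apply_discount():
    # The assertion only checks that the stub returns what the stub
    # was configured to return -- it passes under any implementation
    # (or no implementation) of apply_discount.
    stubbed = mock.Mock(return_value=90.0)
    assert stubbed(100, 0.1) == 90.0

test_apply_discount()
```

The test runs, passes, and contributes coverage numbers, while exercising none of the real logic.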

The tautological unit test is there, it runs, it passes, it has good coverage, and it is worse than useless because it is misleading.

In theory it would be possible to check whether a unit test is a tautology; in practice this is unworkable because it entails running the unit test over all of the possible program states.

The other obvious problem with TDD is that it puts the cart before the horse: the factoring of the code into procedures, the argument lists, and the procedure names are all decided on before any of the code is written. This makes the proselytism of TDD among programmers of dynamic languages with REPLs puzzling - why would you discard your ability to design programs interactively? As Stuart Halloway pointed out in his talk about the Clojure REPL: REPL development is faster than test-driven development.

From my experience, many TDD advocates are simply unfamiliar with interactive development. Integrated Development Environments like Eclipse provide only a façade of interactive development environments like Smalltalk, SLIME, or 1980s Lisp Machines.

If TDD has these downsides, why does it work so well?

The key thing to realize about TDD is that it is not a technology - it is a Taylorist management technique for reducing software maintenance costs.

Forcing unit tests to be written before the code ensures that unit tests are actually written. Unit tests describe the intended behavior of code in a fairly unambiguous way (up to some threshold of bad unit test code). This ensures that no matter how poorly the program code is structured, how confusing and ambiguous the procedure names are, there is some record indicating what the code is supposed to do. Code changes become easier because unit tests work properly almost all of the time (tautologies and duplicate tests are not a big problem in practice).

TDD is a management technique that enables organizations employing programmers with poor skills to write software that is more maintainable than they would otherwise be capable of producing.

Documentation and code comments are supposed to aid in software maintenance by expressing the intention behind the code - that is, documentation and code comments are a sort of redundancy. However, documentation and code comments cannot be machine-checked for consistency or completeness. TDD forces programmers to duplicate effort in writing the intention of the code in both the code itself and the unit tests. The unit tests can be checked against the code for consistency (the unit test passes) and completeness (the unit test invokes some percentage of the code statements; code coverage).

The best case scenario is that the code and the unit tests are correct with respect to the requirements, and complement each other in explaining the intention behind the code. When this happens, TDD results in a program with a test suite whose behavior is more understandable, and easier to change later on, than the code alone or the code with documentation.

The worst case scenario is that the code is incorrect and the unit test is a misleading tautology. When this happens, TDD is a sort of software engineering conjunction fallacy - having both the unit test and the code actually leads to less maintainable software than having only the code.

Seen this way, it becomes more apparent why TDD is a software maintenance technique, not a way of writing software with fewer bugs. One of the criticisms of TDD for dynamic languages is that a lot of the tests are essentially type checking, and would be unnecessary in a statically typed programming language. A type system is a technology for provably eliminating certain kinds of bugs from a program. TDD is an organizational technique that only incidentally detects the occurrence of regression bugs on changes made to the code.
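A sketch of the kind of test that criticism has in mind (the `total` function and the test are invented for illustration):

```python
# Hypothetical example of a unit test that is essentially a type
# check, of the kind common in dynamically typed codebases.

def total(prices):
    return sum(prices)

def test_total_returns_a_number():
    # In a dynamic language this guards against total() returning,
    # say, None or a string. With a static signature such as
    # total(prices: list[float]) -> float, a type checker proves
    # this property for all inputs, and the test adds nothing.
    assert isinstance(total([1.0, 2.5]), float)

test_total_returns_a_number()
```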