testing – Preston Lee's Blog

When I was finishing up my B.S. I took a class in embedded software testing. The big assignment was to write the software that controls a single elevator, test the software to our satisfaction and deliver the whole shebang at the end of the semester. The critical lesson I learned from the course was not that the elevator software was difficult to write, but that there are an infinite number of odd and unfortunate events that could happen to any component involved, at any time, and there is no way to declare with 100% confidence that you have accounted for all possible defects.

So most software is not about perfection, but sufficiency. Everyones wants ultra-high quality, defect free wares, but at some point you must put down the keyboard and declare the product “sufficient” for release. Key problems: “How do you know when you’ve done enough testing?” And just as important, “When is the right time to test?”

This topic has been a open talking point at OpenRain. Marc is a strong proponent of many TDD/BDD principles and goes knife-throwing-freak-show when stuff isn’t well covered. (Ed. note: possible slight exaggeration… maybe.) I am also highly concerned with sufficient tests, but prefer a incremental approach and am wary to invest too much effort in automated tests up front for several key reasons.

While development is underway, you incur unnecessary overhead to maintain tests developed before design stabilization. This overhead is inevitable during long-term maintenance, but the last thing I want to do on the project I started yesterday is refactor all my tests because I dropped a single column from the “users” table.
When inexperienced developers write tests too early, they oft end up testing the dummy data and underlying framework, not your design. It is not our job as application-level developers to write test cases for all underlying dependencies, but since that’s all you have at the beginning of a project, it’s easy to waste time here.
The benefits of writing tests first to flush out design details is diminished in dynamic languages. In Java, writing a quick block of pseudo-code to use your interface is a great way to explore your design from an “external” perspective. Once you’ve achieved design clarity, you can easily use your compiler errors to create correct interfaces. Dynamic languages such as Ruby, however, do not offer this compile-time help, lowering the benefit of the technique.
There’s no freaking way we’re checking in code that doesn’t compile. Sorry, but if I’m writing a Java unit test, there’s no way I’m putting up with 800 compiler errors (and no autocomplete) over the next day while I generate all my stubs. I don’t care if TDD says otherwise; it’s a stupid practice for statically typed languages.

Granted, if any of our systems crash, we probably aren’t going to irreparably harm anything except for my phone that goes flying across the room for ringing at 5AM, but we still have the issue of “sufficiency”. For OpenRain‘s Rails-based applications, I’ve been using the following philosophies on a personal level.

Models tests should be implemented first and as soon as possible. Validation logic and other constraints should be verified up front, as key bugs here will likely effect other code. Add sample data as necessary.
Only functional/integration tests for core use cases should be done early. Adding too many upfront tests to the yet-to-stabilize design tends to add maintenance liability before it’s able to pay itself off.
Tests for non-core features should be tested shortly after a brief “breathing” period, wherein others can comment on the design/code before you’re fully committed to it. Don’t waste your time with a massive test suite until people stop telling you it sucks.
Avoid complex methods of testing. Multi-threaded and singleton-based designs have inherent testing complexities, and should be designed out if possible.
Aim for 100% coverage in dynamic languages. Otherwise you won’t catch retarded bugs like syntax errors.
Have all known, likely and anticipated issues resulting in a significantly negative state covered by an automated case. This is, perhaps, the crux of my “sufficiency” perspective. You must have some mental benchmark that determines when you are “done”. This does not imply that all issues are resolved, only that they are tracked and, hopefully, all the significant ones are fixed.

I’d love to hear your thoughts on practical testing philosophy. Please let me know what you think!

No, it doesn’t. Unit tests that execute a large amount of code but fail to make assertions along the way give you a false sense of confidence in the code. They pass when they should fail. These problems, formally known as type 2 errors, are a huge liability for a development team because the tests are believed to be verifying the intended behavior of the software, but are really doing nothing in a really lengthy way.

For a new person maintaining the code under test, the problems worsen. The new maintainer will not understand what the code is supposed to be doing: what it’s currently written to do or what is implied by the possibly out-dated and incomplete unit tests. Good luck finding API documentation if the unit tests suck, and have fun with those future API changes when you must attend to the â€œunit testsâ€ that need to be updated to successfully add no value to the project, just as they were originally written. No thanks.

Units test exist to prove that software is behaving as intended, not simply “mock” user actions. This means being particular about states of things during a process, and doing mean negative testing by passing nil into that function that clearly requires a non-nil value. The rule of thumb is this: if, for whatever reason, you cannot write, fix, or otherwise finish work for a correct and complete unit test, assert false. You have not proven the software works correctly, so it doesn’t work. Period.