Sufficiency In Software Testing

 

When I was finishing up my B.S. I took a class in embedded software testing. The big assignment was to write the software that controls a single elevator, test the software to our satisfaction and deliver the whole shebang at the end of the semester. The critical lesson I learned from the course was not that the elevator software was difficult to write, but that there are an infinite number of odd and unfortunate events that could happen to any component involved, at any time, and there is no way to declare with 100% confidence that you have accounted for all possible defects.

So most software is not about perfection, but sufficiency. Everyone wants ultra-high quality, defect-free wares, but at some point you must put down the keyboard and declare the product “sufficient” for release. Key problems: “How do you know when you’ve done enough testing?” And just as important, “When is the right time to test?”

This topic has been an open talking point at OpenRain. Marc is a strong proponent of many TDD/BDD principles and goes knife-throwing-freak-show when stuff isn’t well covered. (Ed. note: possible slight exaggeration… maybe.) I am also highly concerned with sufficient tests, but prefer an incremental approach and am wary of investing too much effort in automated tests up front for several key reasons.

  1. While development is underway, you incur unnecessary overhead to maintain tests developed before design stabilization. This overhead is inevitable during long-term maintenance, but the last thing I want to do on the project I started yesterday is refactor all my tests because I dropped a single column from the “users” table.
  2. When inexperienced developers write tests too early, they oft end up testing the dummy data and underlying framework, not your design. It is not our job as application-level developers to write test cases for all underlying dependencies, but since that’s all you have at the beginning of a project, it’s easy to waste time here.
  3. The benefit of writing tests first to flesh out design details is diminished in dynamic languages. In Java, writing a quick block of pseudo-code to use your interface is a great way to explore your design from an “external” perspective. Once you’ve achieved design clarity, you can easily use your compiler errors to create correct interfaces. Dynamic languages such as Ruby, however, do not offer this compile-time help, lowering the benefit of the technique.
  4. There’s no freaking way we’re checking in code that doesn’t compile. Sorry, but if I’m writing a Java unit test, there’s no way I’m putting up with 800 compiler errors (and no autocomplete) over the next day while I generate all my stubs. I don’t care if TDD says otherwise; it’s a stupid practice for statically typed languages.

Granted, if any of our systems crash, we probably aren’t going to irreparably harm anything except for my phone that goes flying across the room for ringing at 5AM, but we still have the issue of “sufficiency”. For OpenRain’s Rails-based applications, I’ve been using the following philosophies on a personal level.

  • Model tests should be implemented first and as soon as possible. Validation logic and other constraints should be verified up front, as key bugs here will likely affect other code. Add sample data as necessary.
  • Only functional/integration tests for core use cases should be done early. Adding too many upfront tests to the yet-to-stabilize design tends to add maintenance liability before it’s able to pay itself off.
  • Tests for non-core features should be written after a brief “breathing” period, wherein others can comment on the design/code before you’re fully committed to it. Don’t waste your time with a massive test suite until people stop telling you it sucks.
  • Avoid complex methods of testing. Multi-threaded and singleton-based designs have inherent testing complexities, and should be designed out if possible.
  • Aim for 100% coverage in dynamic languages. Otherwise you won’t catch silly bugs like syntax errors.
  • Have all known, likely and anticipated issues resulting in a significantly negative state covered by an automated case. This is, perhaps, the crux of my “sufficiency” perspective. You must have some mental benchmark that determines when you are “done”. This does not imply that all issues are resolved, only that they are tracked and, hopefully, all the significant ones are fixed.
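
To make the coverage point concrete, here is a minimal sketch. The method and its typo are made up for illustration. In Ruby, a misspelled method name is not reported when the file loads; it only blows up once that exact line runs, so a branch no test touches can hide it indefinitely.

```ruby
# Hypothetical method, invented for this example.
# The else branch contains a deliberate typo (`upcaes` instead of `upcase`).
def shipping_label(name, express)
  if express
    "EXPRESS: #{name.upcase}"
  else
    "standard: #{name.upcaes}"
  end
end

# The express branch works, so a suite that only exercises it stays green.
puts shipping_label("widget", true)

# The typo only surfaces when the untested branch finally executes.
begin
  shipping_label("widget", false)
rescue NoMethodError => e
  puts "only found at runtime: #{e.class}"
end
```

A compiler would have rejected the equivalent Java before it ever ran. Here, only a test that walks the else branch will.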

I’d love to hear your thoughts on practical testing philosophy. Please let me know what you think!

 

Writing Good Error Messages

I received this little note from my Mac today.

[Image: mac_low_battery_warning.png]

This made me feel all warm and fuzzy inside despite the interruption of my work because it satisfies my general criteria for displaying error messages to users.

  1. A graphical severity indicator is given so I know whether or not to care.
  2. It provides a succinct, human-readable description of the issue. (No “ERROR CODE: 23DD8” crap which is meaningless to the user.)
  3. An immediate, resolvable course of action is given to the user. Providing this makes the user feel empowered and accomplished for acting. Neglecting this makes the user concerned and irritated.
  4. A description of future symptoms is given for when/if the user does not take the suggested course of action. This gives the user reason to do what you’re asking.
  5. It shut up about the issue when I clicked OK and let the failure happen like it told me it would. When I noticed my mouse wasn’t responding I immediately remembered why.
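
One way to keep the first four criteria from being forgotten is to make them fields that every message has to fill in. This is only a sketch: the `UserMessage` struct, its field names, and the sample wording are all invented here, and the text is not what the Mac dialog actually said.

```ruby
# Hypothetical structure; the names are illustrative and not from any real API.
UserMessage = Struct.new(:severity, :summary, :action, :consequence, keyword_init: true) do
  # Render severity first, then what happened, what to do, and what happens otherwise.
  def to_s
    "[#{severity.to_s.upcase}] #{summary}\n#{action}\n#{consequence}"
  end
end

# Made-up wording for a low-battery warning.
warning = UserMessage.new(
  severity: :warning,
  summary: "Your mouse battery is low.",
  action: "Replace the batteries soon.",
  consequence: "If you don't, the mouse will stop responding."
)

puts warning.to_s
```

The fifth criterion (shutting up after OK) is behavior rather than content, so it stays out of the struct.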

The dialog is in stark contrast to this nifty gem constantly pooping out of my Solaris kernel.

[Image: 08-21-07_1354.jpg]

“Pin widgit 27 is EAPD capable.”

WTF??? What the heck is a “pin widgit” and why do I care if it’s “EAPD capable”? Is this even a bad thing? Do I need to do something here? What happens if I ignore this, which I most definitely will since I have no clue what it’s talking about? Why does it tell me this every time I start the machine?

Criteria failure on all counts. Bad computer!

Software Engineering Curse Words

Here lie terms frequently used in software development which I don’t particularly care for.

Programmer

Commercial software is as much about programming as building bridges is about installing steel I-beams. Writing actual code is only part of the engineering effort. When I see a job posting entitled “Java Programmer” I usually suspect that this is either (1) a low-level monkey position and/or (2) the person behind the post doesn’t really understand the scope of typical developer work.

Developers are required–much unlike the mechanical nature of an assembly line worker–to make decisions and assumptions about the external purpose and internal nature of their work, often part of a seemingly ingrokable ecosystem. Unless you have a micromanaging boss or a heavyweight process itemizing every last byte of work, you must personally exercise critical thinking, time management and interpersonal skills to balance your never-ending stream of unclear and incompletely stated priorities. Being a successful programmer thus requires much more than programming knowledge.

The Point: The term “programmer” is an inaccurate trivialization of the real job. I prefer “Software Engineer” formally and “developer” in colloquial usage.

Senior

For HR purposes, “Senior” is a nice way of labeling someone as having a bit more knowledge, responsibility, general weight, and more income than a non-senior person. The problem is that both senior and non-senior developers tend to have very similar job duties; so aside from income, the criteria of senior personnel are inherently qualitative, subjective, relative to a particular domain (read: not necessarily guaranteed to transfer between projects), and/or effectively indistinguishable from non-senior status.

The effect is that, in a matrix organization, a new project may end up with n00bs who are senior, experts who are junior, and a pay structure which reflects an old project now completely irrelevant to the current situation. Senior and non-senior developers often work together as peers, and everybody quickly figures out who the real leaders are. And that’s frequently very different from the formal structure and correlating pay grade.

The Point: “Senior” tells me that you’re expecting to make more and are probably good at something, which may or may not be relevant to me. It’s not a global implication of elevated wisdom.

Architect

Most “software architects” I’ve met do far more operational and project management than architectural design work. This isn’t to say that they don’t or aren’t capable of making significant design contributions to the project, but that all the overhead of email and meetings between business/team/customer/whomever members sucks up so much time that lower level engineers have to either make the design decisions for the architect or block indefinitely as the architect plays Inbox-fu.

The Point: If you’re an “architect” who doesn’t have time to sit down with the engineers and talk about design, you’re really a technical manager who needs to officially delegate the design work to avoid becoming a bottleneck for the team.

Resource

I shudder whenever I hear or use this word, usually in a managerial, Mythical Man Monthian context trying to quantize everyone into tiny cube-shaped units. I find it so important to account for individual character when planning and estimating that I consciously use the word “people” instead of “resources”; it’s a simple trick to force yourself into remembering the undeniable human individuality of the worker bee.

The Point: People aren’t Legos, so let’s not pretend they are.

5 Roadblocks To Enterprise Rails Acceptance

I love Rails for its pragmatic design and agile culture: two qualities not usually associated with the large, enterprisey systems of Fortune 500 companies. In my last formal position I was part of a small internal movement to drive the Rails train upward through the IT ranks, but the effort was met with limited success. The unfortunate reality is that Rails currently lacks several key qualities to which enterprise project leaders have become accustomed. Here are five reasons of varying significance to start us off.

Insane Query Support

Most documentation you read about ActiveRecord will take you through tidy, minimalistic examples which are squeaky clean and really fast. Complex queries, however, will be easier to do using Model.find_by_sql, which accepts a raw SQL query. Ordinary dynamic finds with deep loading behavior may require you to hard-code names in the query to avoid issues with the generated SQL. ActiveRecord is way easier to use, but it is far from matching Hibernate in query power. I’d say that over 95% of the queries issued by a larger application are of trivial or medium complexity, but a lot of time and your best developers go into that last 5%, and this is where the heavier OR/M frameworks start looking better than ActiveRecord.

Distributed Transactions

The rise in SOA interest over the last couple years has led to more applications using multiple data sources. While it is possible to nest transactions, “Rails doesn’t support distributed two-phase commits (which is the jargon term for the protocol that lets databases synchronize with each other).” (From Agile Web Development with Rails, 2nd Edition.) In many situations, simply nesting transactions will suffice; others, however, should really have the safety and reliability of two-phase semantics, and this factor alone could be a deal breaker.
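
For anyone who hasn’t bumped into the term, here is a toy, in-memory sketch of the two-phase idea. Everything in it (`Participant`, `two_phase_commit`, the `can_commit` flag) is hypothetical and has nothing to do with any Rails or database API. A real implementation also needs durable logs, timeouts, and crash recovery, none of which is shown.

```ruby
# Toy participant standing in for a database. Entirely hypothetical.
class Participant
  attr_reader :name, :state

  def initialize(name, can_commit: true)
    @name = name
    @can_commit = can_commit
    @state = :idle
  end

  # Phase 1: vote on whether the pending work could be committed.
  def prepare
    @state = @can_commit ? :prepared : :aborted
    @can_commit
  end

  # Phase 2, happy path: make the work permanent.
  def commit
    @state = :committed
  end

  # Phase 2, unhappy path: undo the pending work.
  def rollback
    @state = :rolled_back
  end
end

# Coordinator: commit everywhere only if every participant voted yes.
def two_phase_commit(participants)
  votes = participants.map(&:prepare) # collect a vote from everyone
  if votes.all?
    participants.each(&:commit)
    :committed
  else
    participants.each(&:rollback)
    :rolled_back
  end
end

orders  = Participant.new("orders")
billing = Participant.new("billing", can_commit: false)
puts two_phase_commit([orders, billing]) # billing votes no, so both roll back
```

Nested transactions against two separate connections give you no such vote: the first database can commit before the second one fails, and that is exactly the gap two-phase semantics closes.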

Data Integrity

Database Designers (DBDs) like FOREIGN KEY constraints, CHECKs, and high levels of normalization, and are the natural enemy of null fields. In other words, DBDs don’t like Rails. While I’m certainly no Pedantic Data Nazi (PDN?), there should at least be a basic set of built-in mechanisms for generating such simple self-defenses against naughty applications. Frankly I’m surprised that the community isn’t pushing harder for solid constraint support within migrations.

IDEs

This isn’t technically an issue with Rails itself, but a roadblock to its adoption nonetheless. Most Rails developers (including myself) appear to be using TextMate. A smaller population uses RDT, Emacs, or numerous other packages. But there isn’t yet an application which comes close to the basic core features of the popular Java and .Net IDEs. The currently broken breakpointer is another swift kick in the pants. What I can do with Eclipse on a remote application server isn’t in the same universe of functionality as the Rails breakpointer, even when it worked.

Top-Down Push

For whatever reason, CTOs and CIOs haven’t yet become seriously interested in Rails, and without this air of implicit exploratory approval, managers seem reluctant to give in to antsy developers. I would love to see Rails become a flagship of agile enterprise projects, but that’s not going to happen until management sees the real ROI of a project done by experienced Rails developers.

None of these things are insurmountable, but there are many more challenges to overcome if Rails is ever to sit on the same application servers as Java and .Net. What challenges have you faced with Rails at your organization?

If A Unit Test Fails In The Woods, Does It Make A Sound?

No, it doesn’t. Unit tests that execute a large amount of code but fail to make assertions along the way give you a false sense of confidence in the code. They pass when they should fail. These problems, formally known as type 2 errors, are a huge liability for a development team because the tests are believed to be verifying the intended behavior of the software, but are really doing nothing in a really lengthy way.
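
Here is a minimal sketch of the problem using Minitest. The `discounted_price` method and its bug are invented for illustration. The test runs the code, asserts nothing, and therefore passes.

```ruby
require "minitest/autorun"

# Hypothetical code under test. The bug is deliberate: the discount is
# subtracted as a flat amount instead of being applied as a percentage.
def discounted_price(price, percent)
  price - percent
end

class DiscountTest < Minitest::Test
  # Executes the code but makes no assertion, so it can only fail if the
  # method raises. It passes even though 10% off 200 should be 180.
  def test_discount_runs
    discounted_price(200, 10)
  end
end
```

Adding `assert_equal 180, discounted_price(200, 10)` to that test makes it fail, as it should.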

For a new person maintaining the code under test, the problems worsen. The new maintainer will not understand what the code is supposed to be doing: what it’s currently written to do or what is implied by the possibly out-dated and incomplete unit tests. Good luck finding API documentation if the unit tests suck, and have fun with those future API changes when you must attend to the “unit tests” that need to be updated to successfully add no value to the project, just as they were originally written. No thanks.

Unit tests exist to prove that software is behaving as intended, not simply to “mock” user actions. This means being particular about states of things during a process, and doing mean negative testing by passing nil into that function that clearly requires a non-nil value. The rule of thumb is this: if, for whatever reason, you cannot write, fix, or otherwise finish work for a correct and complete unit test, assert false. You have not proven the software works correctly, so it doesn’t work. Period.
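
As a rough sketch of what that looks like in Minitest, assuming a made-up `Account` class (not from any real project): the first test checks state after every step of the process, and the second does the mean negative test with nil.

```ruby
require "minitest/autorun"

# Hypothetical class under test, invented for this example.
class Account
  attr_reader :balance

  def initialize
    @balance = 0
  end

  # Requires a non-nil, positive amount.
  def deposit(amount)
    raise ArgumentError, "amount is required" if amount.nil?
    raise ArgumentError, "amount must be positive" unless amount.positive?
    @balance += amount
  end
end

class AccountTest < Minitest::Test
  # Be particular about state at each step, not just at the end.
  def test_balance_after_each_deposit
    account = Account.new
    assert_equal 0, account.balance
    account.deposit(50)
    assert_equal 50, account.balance
    account.deposit(25)
    assert_equal 75, account.balance
  end

  # Negative test: nil must be rejected and must leave the state untouched.
  def test_deposit_rejects_nil
    account = Account.new
    assert_raises(ArgumentError) { account.deposit(nil) }
    assert_equal 0, account.balance
  end
end
```

For a test you cannot finish yet, Minitest’s `flunk "not written yet"` is the equivalent of asserting false: the suite stays red until the work is actually done.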