Defensive Programming and proving the absence of bugs

When I’m teaching a programming class, my students often hear me ask the question, “How do you know your program works?” I think many of them learn to dread that question, like when the dentist asks if you’ve been flossing. There are a number of ways to answer this question, but most of the responses I get when I ask it are more or less equivalent to, “Well, you know, you try it out, and you see what happens—and if it seems to do the right thing for all the cases you can think of, then you figure it is probably all right.” In other words, you test it; a nice, simple, practical solution.

No matter what kind of problem you are trying to solve, testing is an excellent idea. However, Jon Bentley quotes Edsger Dijkstra as having said:

“Testing can show the presence of bugs, but not their absence.”

Edsger Dijkstra

A moment’s consideration should convince you that Professor Dijkstra was correct—if you test your solution and it fails, surely there is a problem. A test that passes, however, says at most, “the problem is not here.” To truly verify that a computer program will always give the correct answer is a much more complicated process than simply testing it. Nevertheless, it seems to me that many programmers equate “testing” with “verification” when speaking about the correctness of their programs. This equation is false, however, and the assumption of its truth is behind a large proportion of the bugs found in software in the wild.

A couple of months ago, I was talking with David Aucsmith from Microsoft’s Security unit, who made an interesting argument: He says that if we want to be able to write software that is secure and bug-free, then throughout the development process, we need to keep in mind that we have an adversary. Not just one adversary, in fact, but many. An adversary is someone or something that acts contrary to your best interest — at best, an unsympathetic user; at worst, a malicious programmer with mischief on his mind. As I see it, a programmer’s adversaries fall into three general categories: The problem, the users, and anybody else who wants something from the users. 

The first adversary is the problem itself. You want the computer to do some work for you, and you need to instruct it. This is what we usually think of as programming—identifying problems, choosing algorithms, and implementing them. This process can be challenging, but that’s the meat and drink of software development. But now, let’s take the adversarial view, by personifying the problem: Let’s say that the reason development is challenging is that the problem does not want to be solved. This sounds like a silly idea, but stick with me for a moment: A problem that doesn’t want to be solved is swift and clever, but not very strong—it won’t attack you, but it will twist and hide. It will resist easy definition, and throw up straw-men that expose only its simplest, easiest cases, to lead you down the garden path. It will lure you toward definitions that are clear, simple, and wrong, and tantalize you with premature optimizations.

When knowingly faced with such a problem, you have to be prepared to ask yourself, “How can I be sure I really solved this problem?” After all, the problem is trying to avoid you—did you really get the hard cases right? Were your assumptions all sound? Did you leave anything out? If you can’t answer those questions to your own satisfaction, and that of your employer and your customer, you’d better go back to the drawing board.

The second adversary is the user of your program. The user wants the computer to process some data, and your program purports to do that for her. She doesn’t want to think about the details of algorithms and implementation; she simply wants to say, “Here are my data, here’s my question…go!”

She doesn’t hate you, per se, but she has no desire to spend hours learning about XML templates and query languages; ideally, it would be enough to put your program and her data in the same location with a couple of drinks and a bag of pretzels, and let them figure out the details on their own. She knows that can’t happen, but she wants maximal results for minimal complexity.

She also expects the program to be forgiving about the format of its inputs, and to give her the chance to back up and correct mistakes before doing anything irrevocable. She’ll give your program bad data, abuse the user interface in ways you never intended, and practically never read more than about a page of the documentation you so carefully wrote.

You do write user documentation, don’t you? Well, you’d better start. More importantly, however, she’ll use your program to solve problems whose implications you never considered, and even if you wrote a careful disclaimer to protect yourself from legal liability, it’s possible you could be morally liable if something really bad happens.

Defending against users has to begin at the earliest stages of the design process. You can’t simply solve the underlying problem, then go back and slap together an interface for it (although we’re all guilty of doing so); you need to give the user a mental model for how your program works, so that she will be able to figure out how to use it without breaking it.*

The third adversary is anybody who wants something from the user of your program—and I do mean anybody. If your software gets popular enough, even advertisers might want something from it. But what I’m really talking about here is the malicious species of hacker, who wants your user to give him control of her computer so he can work his wicked will.

He is looking for the flaws in your assumptions, the errors in your design, and the bugs in your code. He’s going to read your source code, or, if the source isn’t available, he’ll disassemble your object code and reconstruct your algorithms that way.

For this adversary, it is no longer enough simply to state your assumptions; you will also have to check them. If you don’t, he will find a chink in your program’s armor, and use it to get inside somebody else’s computer—or maybe even your own. At a minimum, he might want to embarrass you; at worst, your software might turn out to be the back door into World War III.
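To make this concrete, here is a minimal sketch in C of the difference between stating an assumption and checking it. The function names and the NAME_MAX limit are invented for the illustration; the point is only that the checked version refuses input that violates the assumption instead of trusting the caller to honour it.

    #include <stddef.h>
    #include <string.h>

    #define NAME_MAX 32   /* hypothetical limit on a user-supplied name */

    /* Stated assumption only: "name is shorter than NAME_MAX".
       An attacker who controls name can overflow dst. */
    void copy_name_trusting(char dst[NAME_MAX], const char *name)
    {
        strcpy(dst, name);   /* undefined behaviour if the assumption is false */
    }

    /* Checked assumption: input that violates it is rejected and the
       caller is told so, instead of the program corrupting memory. */
    int copy_name_checked(char dst[NAME_MAX], const char *name)
    {
        if (name == NULL || strlen(name) >= NAME_MAX)
            return -1;       /* fail safely */
        memcpy(dst, name, strlen(name) + 1);
        return 0;
    }

Returning an error code, rather than silently truncating, also forces the caller to decide what a failed copy should mean, which is exactly the kind of question the attacker hopes you never asked.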

So, in short, you have at least three adversaries: A problem that doesn’t want to be solved, a user who will use the solution in unexpected ways, and an attacker who is going to pick apart your solution looking for ways to break it.

This doesn’t sound like programming, it sounds like politics! And this doesn’t even begin to cover attorneys, who are professional adversaries for just about everybody (including each other), and who will happily chew your clothes off in court if you give them a reasonable excuse.

The point I’m trying to make here is that we can’t afford to write programs as if everybody out there loves us and wants us to be happy, because not everybody does. This is not a viewpoint we customarily teach our students, although you could argue that “adversarial programming” is basically just a stronger take on “defensive programming,” which we do supposedly teach. 

At the universities, we do talk a bit about defensive programming, but we rarely require our students to apply it in their classwork. In part, this is because education is supposed to be a cooperative effort, not a competitive one.

One unfortunate result of this neglect, however, is that university graduates in computer science leave with plenty of knowledge about data structures, algorithms, architecture, networks, and theory, but very little (if any) practice at programming defensively.

Typical example programs in a computer science course make unstated assumptions about input values, use I/O without checking for errors, make algorithmic simplifications that destroy the generality of the solution, and almost never come with a clear and precise description of what it means for the solution to be correct.
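As a small illustration, here is a C sketch of reading one integer from a file with every step checked; the comments mark the checks that classroom examples typically leave out. The file name and the messages are invented for the example.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        FILE *fp = fopen("input.txt", "r");
        if (fp == NULL) {                      /* classroom code usually skips this check */
            perror("fopen input.txt");
            return EXIT_FAILURE;
        }

        int value;
        if (fscanf(fp, "%d", &value) != 1) {   /* ...and this one */
            fprintf(stderr, "input.txt: expected an integer\n");
            fclose(fp);
            return EXIT_FAILURE;
        }

        printf("read %d\n", value);
        fclose(fp);
        return EXIT_SUCCESS;
    }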

So what makes a program “correct?” There are two important features: It must do all the things it is supposed to do, and it must do none of the things it is not supposed to do. Proving that a complicated piece of software is truly correct is not at all easy.

So, how do you know your program is right? In practice, you test it. But testing is not the same as verifying its correctness. You test by creating test cases: You run your program with carefully-chosen inputs, and examine the program’s outputs. If the program produces the expected results (or behaviour), you would say the program passes the test; if not, the program fails the test.
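In its simplest form, a test case is nothing more than a chosen input, a result predicted by hand, and a comparison. Here is a minimal C sketch, with strlen standing in for the code under test:

    #include <assert.h>
    #include <string.h>

    int main(void)
    {
        const char *input = "hello";        /* carefully-chosen input   */
        size_t expected = 5;                /* result predicted by hand */
        assert(strlen(input) == expected);  /* pass: silence; fail: abort with a message */
        return 0;
    }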

Generating good test cases is a dark art. Simple cases can often be generated by hand—simple inputs or workflows whose results can be easily predicted. But the point of using a computer is so you can solve big, hard problems, whose answers are not easily obtained by manual labour—so we often build simple test cases by hand, and simply hope that the more complicated cases will be okay.

Another common approach is differential testing, which basically means you run a fixed set of test cases with several different programs purporting to solve the same problem, and compare their answers. If they all agree, you consider it a good result; otherwise, you take a majority consensus (if any) as the “correct” result. But all of this sidesteps the basic problem, which is how to know you’ve chosen good test cases in the first place.
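Here is a rough C sketch of the idea, using two independent integer square root routines as stand-ins for “several programs purporting to solve the same problem.” With only two implementations, a disagreement tells you that at least one of them is wrong, though not which; the routines and the fixed random seed are purely illustrative.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    static unsigned isqrt_float(unsigned n)       /* implementation A: floating point */
    {
        return (unsigned)sqrt((double)n);
    }

    static unsigned isqrt_binsearch(unsigned n)   /* implementation B: binary search */
    {
        unsigned lo = 0, hi = 65535;              /* floor(sqrt(UINT_MAX)) on 32-bit unsigned */
        while (lo < hi) {
            unsigned mid = lo + (hi - lo + 1) / 2;
            if (mid * mid <= n)
                lo = mid;
            else
                hi = mid - 1;
        }
        return lo;
    }

    int main(void)
    {
        srand(1);                                 /* fixed seed: reproducible test set */
        for (int i = 0; i < 1000000; i++) {
            unsigned n = (unsigned)rand();
            unsigned a = isqrt_float(n), b = isqrt_binsearch(n);
            if (a != b)
                printf("disagree on %u: %u vs %u\n", n, a, b);
        }
        return 0;
    }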

There are a few ground rules that should apply to any testing scenario:

Vary all the input parameters. If possible, choose tests that vary each input parameter independently of the others. This can easily make the combinatorics of testing grow out of hand, but you should isolate as much as possible.

Execute every path. Every possible path through your code should be tested; if a conditional has n branches, that means you need n tests targeting that conditional.

Generate every error. Every possible error condition your program generates should be tested. This may require a test framework in which “error” is the correct answer.

Cover interesting segments of the input range. There are often a variety of different “classes” of input for a problem; you should be sure to get some test cases for each important class.

Generate out-of-range inputs. Whether or not the program is designed to handle bad input, you should generate some test cases with bad input. In such cases, a “correct” program should neither crash nor give a misleading result.
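To tie these rules to something concrete, here is a hedged C sketch. The function under test, days_in_month, is invented for the example; the table of cases tries to cover the interesting input classes (February, leap years, the century rule) and includes out-of-range months, for which “error” (here, a return value of 0) is the correct answer.

    #include <stddef.h>
    #include <stdio.h>

    static int is_leap(int year)
    {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    static int days_in_month(int year, int month)
    {
        static const int days[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
        if (month < 1 || month > 12)
            return 0;                     /* out of range: "error" is the right answer */
        if (month == 2 && is_leap(year))
            return 29;
        return days[month - 1];
    }

    struct test { int year, month, expected; };

    int main(void)
    {
        static const struct test tests[] = {
            { 2001,  1, 31 },   /* ordinary month                  */
            { 2001,  2, 28 },   /* interesting class: February     */
            { 2000,  2, 29 },   /* interesting class: leap year    */
            { 1900,  2, 28 },   /* interesting class: century rule */
            { 2001,  0,  0 },   /* out-of-range month              */
            { 2001, 13,  0 },   /* out-of-range month, other side  */
        };
        int failures = 0;
        for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++) {
            int got = days_in_month(tests[i].year, tests[i].month);
            if (got != tests[i].expected) {
                printf("FAIL: days_in_month(%d, %d) = %d, expected %d\n",
                       tests[i].year, tests[i].month, got, tests[i].expected);
                failures++;
            }
        }
        printf("%d failure(s)\n", failures);
        return failures != 0;
    }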

This adds up to a lot of tests to write. However, if you’re serious about making sure your programs work, and you aren’t willing to invest the effort of proving your programs correct formally, this is your only real option. Plus, good test hygiene will make it easier to fix bugs you find later. Axiom: Every bug hides behind the word “should”—either the program should do something but doesn’t, or the program should not do something, yet does.

Even if you design your tests very carefully, you can still get into trouble: I once wrote a library to perform arbitrary-precision integer arithmetic in C. To test it, I generated a large set of test cases by choosing random values and computing the “expected” outputs with GNU bc.

As a primitive form of differential testing, I cross-checked them by running the same tests in DrScheme. From these, I chose tests that exercised every line of code in the library. Several months later, I received a bug report that turned out to be caused by a mistake in one of the assumptions I’d made in implementing integer division; although the tests had all passed, I hadn’t found this bug simply because it only affected numbers with a certain structure.

Perhaps I ought to have recognized that this was one of the “interesting” input ranges, but I had not, and so this bug escaped all my careful tests.
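The lesson generalizes: alongside random operands, deliberately generate structurally special values. The sketch below does this for division, with native integers standing in for the bignum library and the division identity a == (a/b)*b + a%b standing in for a comparison against a reference such as bc; the chosen values sit at and around powers of two, where digit boundaries in a real bignum representation tend to live.

    #include <stdio.h>

    int main(void)
    {
        int failures = 0;
        for (int shift = 1; shift < 31; shift++) {
            unsigned long base = 1UL << shift;
            unsigned long specials[] = { base - 1, base, base + 1 };
            for (int i = 0; i < 3; i++) {
                for (int j = 0; j < 3; j++) {
                    unsigned long a = specials[i], b = specials[j];
                    if (a != (a / b) * b + a % b) {   /* stand-in for checking against bc */
                        printf("FAIL: a=%lu b=%lu\n", a, b);
                        failures++;
                    }
                }
            }
        }
        printf("%d failure(s)\n", failures);
        return failures != 0;
    }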

Some developers advocate a somewhat more extreme approach to development, known as test-driven development (TDD). The idea behind TDD is that you design your tests before writing any code, and develop your code to meet the tests.

Somewhat surprisingly, this strategy seems to work quite well in practice; it imposes a discipline of writing code that can pass interesting tests, and forces us to build programs that can be regression-tested. I think TDD is basically a good idea, so long as you program “in good faith.” As a limiting case, you could write a program that simply compares its input to the test conditions, and gives the correct response; such a program would “pass all its tests,” but would violate the spirit of TDD.
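A small C sketch of the rhythm: the assertions in main were (notionally) written first, against a stub of is_palindrome that simply returned false, and the body was then filled in until they all passed. The function itself is invented for the illustration.

    #include <assert.h>
    #include <stdbool.h>

    static bool is_palindrome(const char *s)
    {
        const char *e = s;
        while (*e)                   /* find the end of the string */
            e++;
        while (s < e && *s == *--e)  /* compare characters from both ends */
            s++;
        return s >= e;
    }

    int main(void)
    {
        /* these tests existed before the implementation did */
        assert(is_palindrome(""));
        assert(is_palindrome("a"));
        assert(is_palindrome("racecar"));
        assert(!is_palindrome("defensive"));
        return 0;
    }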

The point I’m trying to make here is that writing correct programs is difficult, and while testing might seem to be an easier way to get there than logic and proof, “there ain’t no such thing as a free lunch.” If you shortcut the process with weak testing, you’re not just getting off easy, you’re abdicating your ethical responsibilities as a programmer. In a world where software controls more and more of the critical infrastructure we depend upon, from power and banking to medical treatment, we cannot afford to ignore those responsibilities.