Session 26: Testing and debugging

Testing
White-box testing
Black-box testing
Large-scale testing (Section 10.10)
Unit testing
Integration testing
Regression testing
Debugging
Hand traces
Print statements
Debuggers

Testing

Solid testing requires the following four steps.

Generate a test case.
Determine what the results should be.
Run the program to observe its results.
Observe how the program's results differ from expectations.

This closely parallels the classic scientific method (hypothesis, procedure, results, conclusion).

Testing falls into two categories: white-box testing and black-box testing.

White-box testing

White-box testing involves generating test cases while looking at the code. Generally, you're looking for a set of test cases large enough to exercise all the cases in the code, in one of two senses.

Hit all execution paths. Ideally, you'd have enough test cases to exercise every possible route through the program. This, however, is impractical for all but the smallest programs: the number of possible execution paths doubles with each if statement executed in sequence, and loops are even worse, since every iteration multiplies the count again. It's just not reasonable.
Hit all statements. Hitting all the statements of a program is a more realistic scenario. You'd write enough test cases so that every single statement of the program is executed at least once.

For example, consider the following piece of code to find the maximum of 5 numbers typed by the user.

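int max = IO.readInt();          // first number typed by the user

for(int i = 0; i < 4; i++) {     // read the remaining four numbers
    int q = IO.readInt();
    if(q > max) {
        max = q;                 // q is the largest value seen so far
    }
}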

If we were trying to hit all execution paths, we'd have to cover all 2⁴ = 16 combinations of taking or skipping the if across the loop's four iterations. The following inputs do so, one per combination.

1 1 1 1 1     1 1 1 1 5
1 1 1 4 1     1 1 1 4 5
1 1 3 1 1     1 1 3 1 5
1 1 3 4 1     1 1 3 4 5
1 2 1 1 1     1 2 1 1 5
1 2 1 4 1     1 2 1 4 5
1 2 3 1 1     1 2 3 1 5
1 2 3 4 1     1 2 3 4 5

If we modified the program to find the maximum of 20 numbers, the loop would run 19 times, giving 2¹⁹ = 524,288 different paths. Testing all of them is just not reasonable.
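To see where these counts come from, here is a minimal sketch that rewrites the loop as a method taking an array (an assumption made so it can run without the IO.readInt() input class) and records which way the if goes on each iteration. It builds one input per hit/miss pattern using the same scheme as the table above: the first number is 1, and the number in position k is k when the if should be taken and 1 otherwise. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: enumerate every execution path of the "maximum of n numbers" loop.
// maxPath is a hypothetical array-based version of the IO.readInt() code.
class PathEnumerator {
    // Runs the max loop and returns the path taken:
    // 'H' when the if condition is true (hit), 'M' when it is false (miss).
    static String maxPath(int[] input) {
        int max = input[0];
        StringBuilder path = new StringBuilder();
        for (int i = 1; i < input.length; i++) {
            int q = input[i];
            if (q > max) {
                max = q;
                path.append('H');
            } else {
                path.append('M');
            }
        }
        return path.toString();
    }

    // Builds one input per hit/miss pattern for n numbers and counts distinct paths.
    static int countPaths(int n) {
        Set<String> paths = new HashSet<>();
        int iterations = n - 1;                    // the loop runs n - 1 times
        for (int mask = 0; mask < (1 << iterations); mask++) {
            int[] input = new int[n];
            input[0] = 1;
            for (int k = 1; k < n; k++) {
                boolean hit = (mask & (1 << (k - 1))) != 0;
                // k + 1 exceeds every earlier value (all are <= k); 1 never does.
                input[k] = hit ? k + 1 : 1;
            }
            paths.add(maxPath(input));
        }
        return paths.size();
    }

    public static void main(String[] args) {
        System.out.println(countPaths(5));   // 16
        System.out.println(countPaths(20));  // 524288
    }
}
```

Each extra number doubles the count, which is why exhaustive path testing breaks down so quickly.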

If you settle for hitting all statements, then one test case is enough, as long as it makes the if condition true at least once, such as:

1 2 1 1 1

Of course, this isn't as thorough, but it at least checks the fundamentals: that no statement is a complete catastrophe.

Black-box testing

In black-box testing, the tester generates test cases without reference to the source code; that is, the tester treats the program as a black box that cannot be looked into.

Beta testing always involves black-box testing. But even the original software developer does this. In fact, it's probably the primary kind of testing you've been doing in your lab assignments: once you have the program coded, you run it as a regular user would.

Good black-box testing will include tests falling into three categories.

Hit the common cases. This is the easiest one to do: you just use the program like a normal user, looking for anything that doesn't work right. This is important, since users usually (but not always) behave normally, and bugs are particularly embarrassing when the user is doing nothing unusual.
Hit any boundary cases. Many problems have extremes. Generally you should try tests that exercise these extremes. For example, the video store could hold from 0 to 100 videos. I would certainly want to test whether the selvideo command works when there are 0 videos in the store. If I wanted to be really complete, I'd try filling up the store and adding a 101st video. (Actually, I'd change the constant to 5, recompile, and try adding a 6th video.)
Try things in odd sequences. Often programmers will assume a user will try to do things in a particular sequence, without thinking about what happens in other cases.

For example, in the video store program, you might assume that the user always selects a customer before trying to assign a video to the currently selected customer. You ought to test what happens when the user tries to assign a video before any customer has been selected. Chances are good that this would crash the program.
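Here is a sketch of boundary and odd-sequence tests written against a tiny hypothetical store class. The names VideoStore, MAX_VIDEOS, addVideo, hasVideo, selectCustomer, and checkOutToSelected are invented for illustration; the lab's actual classes will differ. Following the advice above, the capacity is a small constant so the "store is full" case is easy to reach.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified store used only to illustrate boundary and
// odd-sequence tests; it is not the lab's actual class.
class VideoStore {
    static final int MAX_VIDEOS = 5;          // small capacity so "full" is easy to test
    private final List<String> videos = new ArrayList<>();
    private String selectedCustomer = null;   // no customer selected yet

    void addVideo(String title) {
        if (videos.size() >= MAX_VIDEOS) {
            throw new IllegalStateException("store is full");
        }
        videos.add(title);
    }

    // Must not crash on an empty store.
    boolean hasVideo(String title) {
        return videos.contains(title);
    }

    void selectCustomer(String name) {
        selectedCustomer = name;
    }

    // Odd-sequence hazard: may be called before any customer is selected.
    void checkOutToSelected(String title) {
        if (selectedCustomer == null) {
            throw new IllegalStateException("no customer selected");
        }
        // real checkout logic would go here
    }
}

class StoreBoundaryTest {
    public static void main(String[] args) {
        // Boundary: an empty store.
        if (new VideoStore().hasVideo("Anything")) {
            System.err.println("Empty store claims to have a video");
        }

        // Boundary: fill the store exactly, then try one more.
        VideoStore full = new VideoStore();
        for (int i = 0; i < VideoStore.MAX_VIDEOS; i++) {
            full.addVideo("Video " + i);
        }
        try {
            full.addVideo("One too many");
            System.err.println("No exception when adding beyond capacity");
        } catch (IllegalStateException e) {
            // expected: capacity was enforced
        }

        // Odd sequence: check out before selecting a customer.
        try {
            new VideoStore().checkOutToSelected("Video 0");
            System.err.println("No exception when no customer is selected");
        } catch (IllegalStateException e) {
            // expected: the missing selection was detected
        }
        System.out.println("boundary tests finished");
    }
}
```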

After finding a bug in black-box testing, it's often a good idea to prune the test case down to the smallest input that still triggers the bug. Do this before you even begin to debug, because a simpler test case generally illustrates the actual problem better.
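Pruning can even be automated. The sketch below uses a simple greedy reduction (a much simplified cousin of techniques such as delta debugging): it repeatedly drops any single input value whose removal keeps the test failing. The buggy method buggyMax is hypothetical, with a deliberate bug of starting the maximum at 0; Collections.max serves as the "what the result should be" oracle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch: greedily shrink a failing test case, one element at a time.
// buggyMax is a hypothetical method with a deliberate bug.
class TestCaseShrinker {
    // Deliberate bug: starting at 0 gives the wrong answer when every number is negative.
    static int buggyMax(List<Integer> xs) {
        int max = 0;
        for (int x : xs) {
            if (x > max) max = x;
        }
        return max;
    }

    // The test "fails" when the program's result differs from the expected result.
    static boolean fails(List<Integer> xs) {
        return buggyMax(xs) != Collections.max(xs);
    }

    // Repeatedly drop any single element whose removal keeps the test failing.
    // Assumes the starting input already fails.
    static List<Integer> shrink(List<Integer> failing) {
        List<Integer> current = new ArrayList<>(failing);
        boolean shrunk = true;
        while (shrunk) {
            shrunk = false;
            for (int i = 0; i < current.size(); i++) {
                List<Integer> candidate = new ArrayList<>(current);
                candidate.remove(i);               // removes by index, since i is an int
                if (!candidate.isEmpty() && fails(candidate)) {
                    current = candidate;
                    shrunk = true;
                    break;
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<Integer> original = Arrays.asList(-4, -9, -1, -7, -3);
        System.out.println(shrink(original));  // [-3]
    }
}
```

A single negative number is far more suggestive than the original five-number input: it points straight at how the maximum is initialized.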

Large-scale testing

Textbook: Section 10.10

With large-scale programs (of more than 100,000 lines, built by teams of programmers), it's not appropriate to wait until the program is entirely complete to begin testing.

Unit testing

In unit testing, each piece of the program is thoroughly tested before it is accepted. In Java, the most convenient way to break a program into pieces is by class. For example, for the video store program, you would write individual tests of the various classes (Customer, Store, Video, and Main) before putting them together.

This necessitates writing new classes whose sole purpose is to test others. For example, you might write the following program to test the checkOut method of the Customer class.

public class CustomerTest {
    public static void main(String[] args) {
        Video[] vids = { new Video("A"), new Video("B"), new Video("C"),
            new Video("D"), new Video("E"), new Video("F") };
        Customer test = new Customer("Me");
        // A customer should be able to check out five videos without error.
        for(int i = 0; i < 5; i++) {
            try {
                test.checkOut(vids[i]);
            } catch(Exception e) {
                System.err.println("Unexpected exception on " + i + ": " + e);
            }
        }
        // A sixth checkout exceeds the limit, so checkOut should throw.
        try {
            test.checkOut(vids[5]);
            System.err.println("Exception not thrown when limit reached");
        } catch(Exception e) {
            // expected: the limit was enforced
        }
        // A second customer should not get a video that is already checked out.
        Customer other = new Customer("You");
        try {
            other.checkOut(vids[0]);
            System.err.println("Exception not thrown when video already checked out");
        } catch(Exception e) {
            // expected: vids[0] is already checked out by "Me"
        }
    }
}

It's not uncommon for the unit-testing code to be longer than the code it is meant to test!

Unit testing is problematic when there are dependencies between pieces. For example, there may be different people in charge of the Customer and Video classes. This causes a problem for the person writing the Customer class, as it cannot even be compiled until the Video class is complete.

To get around this problem, the Customer author would write a short stub class: a temporary version of Video whose methods have the right names and parameters but do little or nothing. Then at least the Customer class can be compiled. But this isn't adequate for testing purposes, since the stub's methods don't actually behave like the real Video's.
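A minimal sketch of such a stub, assuming Customer needs Video to provide a constructor taking a title plus getTitle, isCheckedOut, and setCheckedOut methods. These method names are hypothetical; the real stub would mirror whatever interface the team agreed on.

```java
// Stub version of Video: just enough for Customer to compile.
// The method names here are assumptions, not the lab's actual interface.
class Video {
    private final String title;

    Video(String title) {
        this.title = title;
    }

    String getTitle() {
        return title;
    }

    // Placeholder: always reports "not checked out", so any test that
    // depends on real checkout state will not behave correctly.
    boolean isCheckedOut() {
        return false;
    }

    // Placeholder: does nothing until the Video author supplies the real version.
    void setCheckedOut(boolean checkedOut) {
    }
}
```

If Customer relied on isCheckedOut to detect a video that is already out, the last check in CustomerTest could not pass against this stub, which is exactly why a stub is enough for compiling but not for testing.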

Integration testing

This dependency problem is resolved by integration testing. Integration testing requires that you draw a picture of which classes use which other classes, called a dependency graph. For example, for the first Drawer lab, you might draw the following picture.

From this, you could work out an order in which individual classes can be tested. Here, we would have to start out with Rectangle, then move to Drawing, then Canvas, and finally Drawer.
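Working out that order amounts to a topological sort of the dependency graph. The sketch below uses Kahn's algorithm: repeatedly pick a class whose dependencies have all been tested. The edges are an assumption chosen to be consistent with the order above (Drawer uses Canvas and Drawing, Canvas uses Drawing, Drawing uses Rectangle); the lab's actual graph may have more edges. Every class must appear as a key, even if it uses nothing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: compute an integration-testing order with Kahn's topological sort.
// Each key "uses" (depends on) the classes in its list.
class TestOrder {
    static List<String> order(Map<String, List<String>> uses) {
        Map<String, Integer> remaining = new LinkedHashMap<>();   // untested dependencies
        Map<String, List<String>> usedBy = new LinkedHashMap<>(); // reverse edges
        for (String cls : uses.keySet()) {
            remaining.put(cls, uses.get(cls).size());
            usedBy.putIfAbsent(cls, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> e : uses.entrySet()) {
            for (String dep : e.getValue()) {
                usedBy.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Classes that depend on nothing can be tested first.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String cls = ready.poll();
            result.add(cls);                            // test this class next
            for (String user : usedBy.get(cls)) {
                int left = remaining.merge(user, -1, Integer::sum);
                if (left == 0) ready.add(user);         // all its dependencies are tested
            }
        }
        if (result.size() != uses.size()) {
            throw new IllegalStateException("cycle: no valid testing order");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> uses = new LinkedHashMap<>();
        uses.put("Drawer", List.of("Canvas", "Drawing"));   // assumed edges
        uses.put("Canvas", List.of("Drawing"));
        uses.put("Drawing", List.of("Rectangle"));
        uses.put("Rectangle", List.of());
        System.out.println(order(uses));  // [Rectangle, Drawing, Canvas, Drawer]
    }
}
```

If the graph contains a cycle, no class in the cycle can ever be tested "after" the others, so the sketch reports it rather than returning a partial order.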

The dependency graph quickly gets much more complex as you add more classes to a program. Here's a dependency graph for the second part of the drawing lab.

In fact, that laboratory assignment is constructed with integration testing in mind: you will successively build in new pieces that are relatively independent of each other.

Regression testing

When a software system is relatively complete, and the designers are engaged in incrementally adding new features, they often use regression testing. In regression testing, the developers build up a large library of tests associated with the program. Preferably, these tests will be automated.

When a developer thinks a feature is complete, the developer submits the modifications. But before they are accepted, all the regression tests in the library are run to check whether the modifications break any existing functionality. You don't want to accept a modification if it introduces bugs into the system.

In very large systems, regression testing is often a nightly job, run when the developers aren't using the computers.
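Here is a minimal sketch of an automated regression suite: a library of (input, expected result) cases that is rerun in full, with a nonzero exit status if anything fails so a build script could reject the modification. The method under test, maxOf, is a hypothetical array-based version of the earlier max loop; a real project would more likely use a testing framework such as JUnit.

```java
import java.util.Arrays;

// Sketch of an automated regression suite for a hypothetical maxOf method.
class RegressionSuite {
    static int maxOf(int[] xs) {
        int max = xs[0];
        for (int i = 1; i < xs.length; i++) {
            if (xs[i] > max) max = xs[i];
        }
        return max;
    }

    // The test library: each input row is paired with its expected result.
    static final int[][] INPUTS = {
        {1, 1, 1, 1, 1}, {1, 2, 1, 1, 1}, {1, 2, 3, 4, 5},
        {5, 4, 3, 2, 1}, {-4, -9, -1, -7, -3}, {42}
    };
    static final int[] EXPECTED = {1, 2, 5, 5, -1, 42};

    // Runs every case and returns the number of failures.
    static int runAll() {
        int failures = 0;
        for (int t = 0; t < INPUTS.length; t++) {
            int actual = maxOf(INPUTS[t]);
            if (actual != EXPECTED[t]) {
                failures++;
                System.err.println("FAIL " + Arrays.toString(INPUTS[t])
                        + ": expected " + EXPECTED[t] + ", got " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        int failures = runAll();
        System.out.println(failures == 0 ? "all regression tests passed"
                                         : failures + " regression test(s) failed");
        // A nonzero exit status lets an automated job reject the modification.
        if (failures != 0) System.exit(1);
    }
}
```

Whenever a new bug is found and fixed, its test case is added to the table, so the same bug cannot quietly return later.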

Debugging

Hand traces

Tracing through the code by hand, to see how variables change, is extremely common, much more common than you might initially think. It's often much easier to trace through the code than to repeatedly recompile and run a test case.

Print statements

Adding print statements is another useful technique. You may think that it's antiquated, but it's in wide use and will continue to be. It's just so simple.

Some useful tips for deciding where to put your print statements:

Print at the top of relevant methods. This is to get an idea of which methods are actually getting called. You may often find that something is getting called unexpectedly, or that something that you expected to be called is not.
Print relevant variables after each change. If a variable seems to be getting a bad value, I would print it out after each time the variable changes. This is particularly significant with instance variables (and even more so with class variables), as the sequence in which their values change tends to be circuitous.
Print around code that you think is wrong. If you think something is going wrong in some piece of code, it's often worthwhile adding print statements before and after it to verify that the code really is going awry. Often enough, you'll be wrong about where the problem is, and all the time you would have spent searching there would be wasted.
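Here is a small sketch of these tips applied to a hypothetical buggy version of the max loop (the bug, starting max at 0, is invented for illustration). It prints on entry to the method, after each change to max, and on either side of the suspect loop. Sending the output to System.err and guarding it with a DEBUG flag are common conventions, not requirements.

```java
// Sketch: print-statement debugging on a hypothetical buggy max method.
class PrintDebugDemo {
    static final boolean DEBUG = true;   // set to false to silence the traces

    static void debug(String message) {
        if (DEBUG) System.err.println("DEBUG: " + message);
    }

    static int maxOf(int[] xs) {
        debug("entering maxOf, length=" + xs.length);        // top of the method
        int max = 0;                                          // deliberate bug: should be xs[0]
        debug("before loop, max=" + max);                     // just before the suspect code
        for (int i = 0; i < xs.length; i++) {
            if (xs[i] > max) {
                max = xs[i];
                debug("i=" + i + ", max changed to " + max);  // after each change
            }
        }
        debug("after loop, returning " + max);                // just after the suspect code
        return max;
    }

    public static void main(String[] args) {
        System.out.println(maxOf(new int[] {-4, -9, -1}));    // prints 0, not -1
    }
}
```

With this all-negative input, the trace shows max starting at 0 and never changing, which points at the initialization rather than at the loop where one might first look.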

Debuggers

There is another tool called a debugger. I don't want to overemphasize its usefulness: I use a debugger far less frequently than I use hand traces and print statements. But a debugger is still often useful.

Good debuggers have at least the following two features.

Step through code statement by statement. This helps you trace the program's behavior one step at a time. You issue a command to tell the debugger to go on to the next statement, and you issue this command over and over. The debugger also lets you look at the call stack and observe any variable's value.
Add breakpoints into the program. A breakpoint is a line of source code designated by the programmer. When you run the program using the debugger, the debugger will pause the program each time it reaches a breakpoint, allowing the programmer to take control, observe the call stack and variable values, and step through the code from there.

Forte has a debugger built into it. I'll demonstrate it in class.