Program testing is virtually the only means used in practice to assure that programs are correct. As we have noted elsewhere, the software industry demonstrates the practical inadequacy of this approach on a daily basis. This is not just due to insufficient care in testing (although that certainly adds to the problem): the approach itself simply lacks the force to guarantee correctness. We illustrate the shortcomings of testing with a sample program.
We imagine test runs of this program with the following data:

  1. A=4, B=3, C=5, D=2
  2. A=3, B=4, C=2, D=5

These two test runs both produce the correct result, M=5, and cause all the code in the program to be exercised. Hence one might conclude the program is correct -- one would be badly mistaken in doing so! There is a blatant flaw in the program's logic -- have you spotted it yet?

Perhaps the problem is that, while the two tests above exercise all the code of the program, they do not exercise all of its paths. The interaction between different parts of a program is determined by the path followed to reach each point, so thorough testing really needs to exercise every path. This is a daunting task for large programs, and it is still not sufficient. Our example program has only four paths, so we can exercise the remaining paths with just two more tests:

  1. A=4, B=2, C=3, D=5
  2. A=3, B=4, C=5, D=2

Both of these cases also produce the correct result, M=5. Now all the paths have been exercised, and still the huge error goes undetected! "Testing can reveal the presence of errors, but not their absence." -- E. Dijkstra
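
To make the point concrete, here is a minimal sketch in Python. The function flawed_max is a hypothetical stand-in, not the sample program itself: we assume a program meant to compute the largest of four values using two independent decisions (giving four paths), in which the second decision mistakenly overwrites M instead of comparing against it. The names flawed_max, tests and counterexample are our own, chosen for illustration only.

```python
def flawed_max(a, b, c, d):
    """Hypothetical stand-in: intended to return the largest of a, b, c, d."""
    # First decision: keep the larger of a and b.
    if a > b:
        m = a
    else:
        m = b
    # Second decision (assumed flaw): m is overwritten rather than compared,
    # so the result never depends on a or b.
    if c > d:
        m = c
    else:
        m = d
    return m


# The four test cases from the text, as (A, B, C, D).
tests = [(4, 3, 5, 2), (3, 4, 2, 5), (4, 2, 3, 5), (3, 4, 5, 2)]

# Each path is identified by the outcome of the two decisions.
paths = set()
for a, b, c, d in tests:
    paths.add((a > b, c > d))
    print((a, b, c, d), "->", flawed_max(a, b, c, d))

print("distinct paths exercised:", len(paths))

# An input on which the assumed flaw shows: the largest value is A.
counterexample = (5, 1, 2, 3)
print(counterexample, "->", flawed_max(*counterexample),
      "but the largest value is", max(counterexample))
```

Under these assumptions, all four tests pass and every path is taken, yet any input whose largest value is A or B gives the wrong answer. Every one of the four test cases happens to place its largest value in C or D, which is why complete path coverage still fails to expose the error.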