Testing

Based on Chapter 8 of Sommerville, “Software Engineering” (9th edition)

Verification and Validation

testing is one aspect of a larger process known as verification and validation (V & V)

Validation: “Are we building the right product?”

  • i.e. does the software meet the customer’s expectations?

Verification: “Are we building the product right?”

  • i.e. does the software meet its requirements?

the ultimate goal of V&V is to determine if the software is fit for purpose, i.e. if it is good enough for its intended use

Inspections

an inspection is when a group of people carefully examines and discusses some aspect of the system under development

almost any aspect of a system can be inspected, e.g.

  • requirements
  • design documents
  • source code

in some software groups, no source code can be checked in until the original programmer gets at least one other programmer to read through their code

code inspections can be an extremely effective way of discovering all kinds of issues in source code

since humans are doing the inspection, they might notice things like:

  • missing features
  • unneeded complexity
  • poor or missing documentation
  • poor organization
  • inconsistent coding style
  • etc.

Testing

experience and reports suggest that manual code inspections are very effective at finding many kinds of code defects, and they seem to be well worth the time

however, inspections can’t completely replace testing because

  • testing can catch cases that can be hard for humans to notice
  • testing can be done automatically and more quickly than inspections

the goal of testing is twofold: to demonstrate that a program meets its requirements, and to discover defects/bugs in it

validation testing is when you test a system to show that it meets its requirements

defect testing is when you test a system to expose defects/bugs, i.e. situations where its behavior is incorrect

Unit Testing

during development, unit testing is testing of a single unit of code in isolation from the rest of the program

a “unit” could be a function, class, module, etc.

generally, artificial test data is created for the unit

  • usually hand-crafted, or randomly generated

then the test data is run through the unit, and the results are checked to see if they are correct

  • correctness could be checked by comparing against hand-computed expected results
  • or sometimes by comparing to another unit that does the same thing, e.g. you could compare a new sorting function to an older, slower one (as in the sketch below)
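
for example, here is a minimal sketch of such a unit test in Python, where new_sort is a made-up stand-in for the unit under test, and the built-in sorted plays the role of the trusted older implementation:

    def new_sort(v):
        return sorted(v)  # made-up stand-in for a fancy new sorting algorithm

    def test_new_sort():
        # hand-crafted test data, checked against a trusted reference
        for data in ([], [1], [3, 1, 2], [5, 5, 5], [2, -7, 0, 2]):
            assert new_sort(data) == sorted(data)

    test_new_sort()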

Choosing Test Cases

generally, you want test cases that tend to find defects

  • test cases that never find any defects are candidates for removal!

blackbox testing is testing that is done by looking just at the specification of the unit, and without looking at the implementation

  • thus if the implementation changes, the blackbox test cases can be re-used

whitebox testing is testing that is done looking at the implementation of the unit

  • whitebox testing often aims for code coverage, i.e. enough test cases to ensure that every line of code (or even path through the code) is executed at least once
  • when the implementation changes, whitebox test cases may need to be changed as well
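
for example, here is a sketch of whitebox testing in Python using a made-up classify function: the test cases are chosen by reading the implementation so that both paths through the code are executed at least once (tools like coverage.py can measure this automatically):

    def classify(n):
        if n < 0:
            return "negative"
        return "non-negative"

    # chosen by looking at the implementation: one case per branch
    assert classify(-1) == "negative"      # exercises the if-branch
    assert classify(7) == "non-negative"   # exercises the fall-through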

testing principle: use extreme values, e.g.

  • extreme values for a number: 0, 1, -1, max num, min num, epsilon (smallest possible positive value), NaN (not a number), Inf (infinity)
  • extreme values for a string: “”, a single-character string, a very large random string, a string whose characters are all the same, etc.
  • extreme values for an array: empty array, single-entry array, a very large random array, an array with all entries the same, etc.
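
for example, here is a sketch of extreme-value testing in Python for a made-up clamp function; NaN is a classic troublemaker, since every comparison with it is false:

    import math, sys

    def clamp(x, lo=0.0, hi=1.0):
        # made-up unit under test: restrict x to the range [lo, hi]
        return max(lo, min(hi, x))

    # extreme values for a number, as listed above
    extremes = [0.0, 1.0, -1.0, sys.float_info.max, -sys.float_info.max,
                sys.float_info.min, math.inf, -math.inf, math.nan]

    for x in extremes:
        r = clamp(x)
        # NaN must be handled separately, since NaN comparisons are always false
        assert math.isnan(x) or 0.0 <= r <= 1.0, f"clamp mishandled {x}"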

testing principle: partition testing

  • partition the input space for the function into categories relevant to the unit being tested
  • choose one candidate from each partition

for example, you might have an input space partitioned like this, where each * represents an input chosen for testing:

+-----------------------------------+----------------------------------+
|                                   |                                  |
|                                   |          *                       |
|                                   |                                  |
|                                   |                                  |
|                                   |                                  |
|             *                     |                                  |
|                                   +----------------------------------+
|                                   |                                  |
|                                   |                                  |
|                                   |                                  |
|                                   |                                  |
+--------------+-----+--------------+                   *              |
|              |     |              |                                  |
|              |     |              |                                  |
|  *           |     |              |                                  |
|              |   * |              |                                  |
|              |     |              |                                  |
|              |     |              +----------------------------------+
|              |     |                                                 |
|              |     |                                                 |
|              |     |          *                                      |
+--------------+-----+-------------------------------------------------+

testing principle: choose test cases near partition boundaries, e.g. inputs on or just beside the dividing lines in the diagram would likely be good test cases

  • this is a generalization of extreme value testing
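
for example, here is a sketch combining both principles in Python, using a made-up letter_grade function: one test case from inside each partition, plus cases on and just beside each boundary:

    def letter_grade(score):
        # made-up unit under test; the partitions are the four grade ranges
        if score >= 80:
            return "A"
        if score >= 70:
            return "B"
        if score >= 60:
            return "C"
        return "F"

    # one representative value from inside each partition
    assert letter_grade(90) == "A"
    assert letter_grade(75) == "B"
    assert letter_grade(65) == "C"
    assert letter_grade(30) == "F"

    # values on and just beside each partition boundary
    assert letter_grade(80) == "A"
    assert letter_grade(79) == "B"
    assert letter_grade(70) == "B"
    assert letter_grade(69) == "C"
    assert letter_grade(60) == "C"
    assert letter_grade(59) == "F"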

testing principle: test important requirements of the system

  • in most systems, some use cases are more important than others, so it makes sense to explicitly test the most important ones

testing principle: test error messages and failure cases, not just successes

  • how a system handles failure is often very important, and so should be tested
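
for example, here is a sketch of failure-case testing using pytest (a popular Python testing framework); withdraw is a made-up unit under test:

    import pytest

    def withdraw(balance, amount):
        # made-up unit under test
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > balance:
            raise ValueError("insufficient funds")
        return balance - amount

    def test_withdraw_failures():
        # check that the *right* error is raised, not just that some error occurs
        with pytest.raises(ValueError, match="must be positive"):
            withdraw(100, -5)
        with pytest.raises(ValueError, match="insufficient funds"):
            withdraw(100, 200)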

testing principle: stress testing

  • stress testing is when you test a unit by running a very large number of tests on it
  • e.g. you might stress test a server to see how it responds to a large number of simultaneous users (see the sketch below)
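
here is a small-scale sketch of that idea in Python, where handle_request is a made-up stand-in for a real server call:

    from concurrent.futures import ThreadPoolExecutor

    def handle_request(i):
        # made-up stand-in for a real server call, e.g. an HTTP request
        return i * 2

    # fire a large number of simultaneous requests, and check that every
    # one of them still gets the correct answer
    with ThreadPoolExecutor(max_workers=50) as pool:
        results = list(pool.map(handle_request, range(10_000)))

    assert results == [i * 2 for i in range(10_000)]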

Automated Testing

a lot of software unit testing is done using hand-crafted test cases created by developers

this can be tedious, and even the best testers can miss defects hiding in unusual situations

also, the process of running and checking test cases is tedious and exacting work

humans can do all of this manually, but probably not for long!

  • people get bored and tired
  • management might put time constraints on development that make it impossible to spend enough time on testing

so in practice it is important to automate test cases whenever possible

  • some developers would go so far as to say that if you don’t have automated testing, you don’t really have any testing at all!

this makes it easy to run (and re-run) test cases with very little effort

plus it becomes possible to keep statistics, e.g. which test cases are good at finding defects

there are various unit testing frameworks for most programming languages that automate at least some testing — use them!
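
for example, with Python’s pytest framework, any function whose name starts with test_ in a file named test_*.py is discovered and run automatically just by typing pytest; add here is a made-up unit under test:

    # file: test_add.py -- run the whole suite with:  pytest
    def add(a, b):
        return a + b  # made-up unit under test

    def test_add():
        assert add(2, 3) == 5

    def test_add_negatives():
        assert add(-2, -3) == -5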

Property Testing

one interesting and effective way of automating testing is to test properties of a unit

for example, suppose you are writing a fancy new sorting algorithm

one of the properties of any correct sorting algorithm is that sort(sort(v)) == sort(v)

  • there are other properties that you could test, e.g. sort(v) == v if v is already in sorted order, or sort(v).size() == v.size()
  • choosing the best properties to test is a bit of an art

property testing is where you test that this property holds for sort

typically, all you need to do is state that sort(sort(v)) == sort(v) holds, and the test cases are automatically generated and run

random inputs to sort are created

  • since the testing is random, there are no human-like biases, and so this can sometimes catch unusual errors that humans never even think to look for

these are then tested to ensure that sort(sort(v)) == sort(v) holds for them
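
here is a hand-rolled sketch of that loop in Python, with my_sort as a made-up stand-in for the new algorithm; a real property-testing framework automates all of this, including the shrinking described next:

    import random

    def my_sort(v):
        return sorted(v)  # made-up stand-in for the fancy new algorithm

    # generate 100 random inputs and check the properties on each one
    for _ in range(100):
        v = [random.randint(-1000, 1000) for _ in range(random.randint(0, 50))]
        assert my_sort(my_sort(v)) == my_sort(v)
        assert len(my_sort(v)) == len(v)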

if the test case fails, then the input is shrunk to a smaller and simpler input that is easy for a human to use for tracing through the unit

  • random data is often long and messy, and hard for humans to work with
  • the shrinking is typically done by using various simplification heuristics
  • can be quite helpful (and surprising!) when you find an extremely small and simple test case that fails

the idea of property testing was popularized in the QuickCheck package for Haskell

  • many other languages have borrowed the idea (or parts of it)

  • for example, the Python Hypothesis package is a very easy-to-use property testing framework, e.g.:

    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_reversing_twice_gives_same_list(xs):
        ys = list(xs)
        ys.reverse()
        ys.reverse()
        assert xs == ys
    
    • the @given decorator at the top is used to generate the right kind of test data
    • then the assert is checked for 100 random lists
    • the developer does not need to create the test cases or even run them — it is all automatic after this testing function is created
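
to see shrinking in action, here is a sketch of a deliberately false property; when Hypothesis finds a failing random list, it shrinks it before reporting, typically down to something as small as [0, 0]:

    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_no_duplicates(xs):
        # deliberately false: random lists certainly can contain duplicates
        assert len(set(xs)) == len(xs)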

System Testing

system testing is about testing the entire (or nearly complete) system, not just isolated units (as in unit testing)

it’s often the case that systems can pass all unit tests, but then fail in unexpected ways when the units are connected

system testing need not be done by the developers

  • in practice, you may ask a subset of real users to be “alpha” testers or “beta” testers
  • they essentially become early users of a partially complete system, and help find defects, evaluate features, etc.

system tests may ask testers to

  • work through a set of use cases
  • use the system in an ordinary way that reflects their regular expected usage
  • purposefully try crazy things in order to stress the system
    • e.g. video game testers might be asked to play a racing game driving in reverse the entire time

another kind of user-oriented system testing is acceptance testing

  • the idea of acceptance testing is to ask users to try an essentially complete and finished system, with the goal of discovering whether they “accept” the system, i.e. whether they like using it
  • acceptance testing is not about finding defects, or checking if the requirements are met
    • it’s possible that requirements are met but users don’t accept it!
  • leaving acceptance testing to the very end of software development could be a disaster
    • if users don’t like the system, you want to try to figure that out as soon as possible

When Should You Test?

the traditional waterfall model puts testing at the very end of development

but that’s often too late

  • testing at the end of a project is sometimes called big bang testing
  • big bang testing is usually bad because if it fails there isn’t really time to go back and fix things

in practice, it is usually better to interleave testing and development

the sooner you get the feedback that testing gives you, the better

  • you will have more time to fix things that aren’t working

Question: Is Testing Enough?

suppose you have a function f(n) that takes a 32-bit integer n as input

suppose you test it by calling f on all \(2^{32}\) inputs, and you verify that f returns the correct answer for all those inputs

can you conclude that f is correct?

it would seem so — all possible inputs and outputs have been checked!

but even such exhaustive testing can miss bugs, e.g. what if f were this function:

#include <stdlib.h>  // for rand()

// Pre-condition:
//    none
// Post-condition:
//    returns 5
int f(int n) {
  if (rand() == 0) {
    return 6;  // wrong answer, but returned extremely rarely
  }
  return 5;
}

suppose the rand() function returns a non-negative integer less than \(2^{32}\)

that means that once in every 4.3 billion calls to f, it will return the wrong value

it’s possible that stress testing could catch this — but that would be a huge amount of testing for one very simple function

  • more practically, a code review would be the best way to find this sort of error
  • if you saw this function in a code review, you would hopefully at least ask a question about why it is the way it is

of course, this particular function f is not realistic

  • but it is representative of a nasty kind of error that can occur in real life
  • for example, race conditions can occur in concurrent systems where hard-to-reproduce bugs show up at seemingly random times
  • or some unusual set of circumstances could cause a strange bug, e.g. maybe date/time code goes wrong during daylight savings time in locations with non-standard time zones?