Exercise 3

Test Adequacy via Mutation Analysis

For this project, you shall perform mutation analysis studies of existing test suites for open-source software projects. We shall use two mutation testing frameworks: PIT for Java code and Stryker for TypeScript, JavaScript, or C#.

You shall work in groups of three to four students (no fewer than three), and the overall requirements scale with the group size. Part of the reason for having groups is that I find students often slip through the program without critical skills for building or running software from source code. These skills are critical for most exercises from this point on, but I will not provide explicit instructions for them within the exercise itself. You may find tools like nvm useful if you work within CSIL.

Getting the tools running will require reading a bit more documentation on top of that.

Selecting 2 Projects to Analyze

As a group, you will analyze one project with PIT and one project with Stryker. This means that you need to select one project for each tool.

The following requirements apply to each open-source project you select.

It should be present in the Open Hub database ([Java], [JavaScript], [TypeScript], [C#]). Exceptions to this are welcome but must be approved by the instructor. If you are interested, ask.
The code base of the analyzed project should be at least 5,000 lines of code (5 KLoC). The sizes of most projects can easily be found via Open Hub or via tools like sloccount.
A substantial test suite must exist for the project. The assignment will be easiest if the selected Java project uses JUnit to manage its tests.

In choosing which projects to analyze, you should identify and consider the following attributes:

Identification of the open-source project.
Size of the code base (lines of code).
Proposed evaluation platform (OS, language).
Build time to compile and link an executable from source code.
Test suite infrastructure (JUnit?, Jasmine?, Mocha?, …).
Number of tests, lines of code, and execution time for the test suite.

Make sure to include this information in your reported results. If you have questions about whether a particular project is a good choice, identify these attributes and ask.

A key part of this task is making sure that you can consistently compile the project (if applicable) and run its test suite. I will expect you to have or acquire competencies in building and running a project.

Running the Analyses

Overall, you should find both PIT and Stryker easy to use, but the process of mutation analysis can be time-consuming depending on the particular project, the nature of the test suite, and the way in which each mutation analysis infrastructure performs its analysis. Some tools are able to analyze many mutants in parallel or perform lightweight analysis to determine that a mutant will behave the same way on a particular test. Other tools perform the entire analysis sequentially on a single machine. These tend to incur a substantial(!) overhead.
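
To budget your time, a rough upper bound can help. The sketch below assumes the worst case, in which every mutant triggers one full run of the test suite and mutants are split evenly across a fixed number of workers. The class name, the method name, and all of the numbers are hypothetical. Real tools may do considerably better, so treat the result as a pessimistic estimate rather than a prediction.

```java
import java.time.Duration;

/** Hypothetical helper for a back-of-the-envelope cost estimate. */
final class MutationCostEstimate {

    /**
     * Worst-case wall-clock time, assuming every mutant needs one full
     * run of the test suite and mutants are split evenly across workers.
     */
    static Duration worstCase(int mutants, Duration suiteTime, int workers) {
        if (mutants < 0 || workers < 1) {
            throw new IllegalArgumentException("need mutants >= 0 and workers >= 1");
        }
        // Ceiling division: the busiest worker handles this many mutants.
        long perWorker = ((long) mutants + workers - 1) / workers;
        return suiteTime.multipliedBy(perWorker);
    }

    public static void main(String[] args) {
        Duration suite = Duration.ofSeconds(30); // made-up suite time
        // 2000 made-up mutants, analyzed sequentially and then with 8 workers.
        System.out.println(worstCase(2000, suite, 1).toMinutes() + " minutes");
        System.out.println(worstCase(2000, suite, 8).toMinutes() + " minutes");
    }
}
```

With these made-up inputs, the sequential bound is 1000 minutes and the 8-worker bound is 125 minutes, which is why measuring the execution time of the test suite early is worthwhile.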

Make sure you give yourself enough time to deal with the unknown unknowns.

You can find more detailed instructions for running PIT here. You can find more detailed instructions for running Stryker here.

Each tool provides different ways to interface with the results and see the mutants that were not killed during the analysis. You will want to experiment with both in order to get acquainted with them and make sure that you can interpret the results meaningfully.

Group Analysis and Results

The group as a whole will perform mutation analysis for both projects in their entirety.

In reporting your results from PIT and Stryker, you should include:

The reported line coverage (numerator and denominator)
The reported mutation coverage (numerator and denominator)
The output coverage reports that provide clear evidence of usage
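
Reporting the raw counts matters because a percentage alone hides how many lines or mutants it summarizes. As a minimal sketch of the bookkeeping, the class below keeps the numerator and denominator together and derives the percentage from them. The class name and the counts are hypothetical and do not come from either tool.

```java
import java.util.Locale;

/** Hypothetical value class for one reported ratio. */
final class CoverageRatio {
    private final String label;
    private final int numerator;
    private final int denominator;

    CoverageRatio(String label, int numerator, int denominator) {
        if (denominator <= 0 || numerator < 0 || numerator > denominator) {
            throw new IllegalArgumentException("need 0 <= numerator <= denominator, denominator > 0");
        }
        this.label = label;
        this.numerator = numerator;
        this.denominator = denominator;
    }

    /** The ratio expressed as a percentage. */
    double percent() {
        return 100.0 * numerator / denominator;
    }

    /** Raw counts first, percentage second, so neither is lost. */
    String summary() {
        return String.format(Locale.ROOT, "%s: %d/%d (%.1f%%)",
                label, numerator, denominator, percent());
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        System.out.println(new CoverageRatio("Line coverage", 150, 200).summary());
        System.out.println(new CoverageRatio("Mutation coverage", 90, 150).summary());
    }
}
```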

How do your results from PIT and Stryker compare? What do the results tell you about your test suites? Which system did you find easier to use (both to integrate and to interpret) and why? Discuss these issues in your write-up.

For your overall results, make sure to address the following issues.

Do the test suites exhibit weaknesses? How can they be improved?
Do the test suites exhibit strengths? How do you recognize them?
Contrast the costs and benefits of mutation analysis and testing versus what you might expect from other techniques.
What was easy and what was difficult about applying the tools to the projects you chose?
What obstacles did you face in applying mutation analysis to a real world project, and how did you overcome them?
Did you have any other interesting insights or opinions on the experience?

Note that the above points are not yes-or-no questions. Present some evidence and justify your answers.

Individual Analysis and Results

Each group member shall also individually choose one method from one of the group's open-source projects and consider the results of the mutation analysis specifically for that method. Choose a complex routine that you think is likely to have errors. If fewer than 10 mutants were generated for the method, then either select a different method or select additional methods until you have at least 10 mutants to consider.

Examine 10 of the generated mutants for your method(s). If both killed and unkilled mutants were generated, include a mix of both. For each one, document the mutation operator. What type of operator was used? How was it applied to the code (how did the code change)?
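
As an illustration of what documenting a mutant might look like, the sketch below shows a boundary mutation in which >= is replaced by >. The method and its mutant are hypothetical and are written side by side only for comparison. In practice the tool generates the mutant for you, and the operators it applies to your method may differ.

```java
/** Hypothetical example class; not taken from any real project. */
final class Shipping {

    // Original code under test: orders of 50 or more ship free.
    static boolean freeShipping(int total) {
        return total >= 50;
    }

    // Mutant: the relational operator >= has been replaced by >.
    static boolean freeShippingMutant(int total) {
        return total > 50;
    }

    public static void main(String[] args) {
        // Inputs away from the boundary cannot tell the two versions apart.
        System.out.println(freeShipping(10) == freeShippingMutant(10));   // true
        System.out.println(freeShipping(100) == freeShippingMutant(100)); // true
        // Only the boundary value exposes the change, so a test asserting
        // that freeShipping(50) is true would kill this mutant.
        System.out.println(freeShipping(50) == freeShippingMutant(50));   // false
    }
}
```

For this mutant, the documentation would name the operator type (a relational operator replacement at a boundary), the exact change (>= became >), and the input that distinguishes the two versions (a total of exactly 50).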

For your individual mutants, consider the following additional questions: How many mutants are killed? How many mutants are live? For every mutant that was not killed, try to determine either (a) that it is an equivalent mutant, which no test can kill, or (b) how to add a test to kill it. Note that (b) requires a concrete test case: inputs for the function and an oracle for the expected result. For all mutants, try to determine whether they are duplicates of each other. What are the challenges involved? Do duplicates affect the results? Calculate and report the mutation score / effectiveness for your particular mutants. What do these results say about the effectiveness of the test suite and the method(s) that you considered?
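
The sketch below illustrates two of these points under stated assumptions. First, it shows a hypothetical equivalent mutant: replacing > with >= in a max function changes which branch runs when the arguments are equal, but both branches then return the same value, so no test can observe a difference. Second, it computes a mutation score using one common definition, killed mutants divided by non-equivalent mutants. That definition is an assumption here; a tool may instead report killed mutants over all generated mutants, so state which one you use. All names and counts are made up.

```java
/** Hypothetical example class; names and counts are illustrative only. */
final class MutantTally {

    // Original: returns the larger of two ints.
    static int max(int a, int b) {
        return a > b ? a : b;
    }

    // Mutant: > replaced by >=. When a == b, either branch returns the
    // same value, so this mutant is equivalent to the original.
    static int maxMutant(int a, int b) {
        return a >= b ? a : b;
    }

    /** Assumed definition: killed / (total - equivalent). */
    static double mutationScore(int killed, int total, int equivalent) {
        int killable = total - equivalent;
        if (killed < 0 || equivalent < 0 || killable <= 0 || killed > killable) {
            throw new IllegalArgumentException("inconsistent mutant counts");
        }
        return (double) killed / killable;
    }

    public static void main(String[] args) {
        // Exhaustive check over a small range: evidence, not a proof.
        boolean same = true;
        for (int a = -5; a <= 5; a++) {
            for (int b = -5; b <= 5; b++) {
                same &= max(a, b) == maxMutant(a, b);
            }
        }
        System.out.println("behaves the same on sampled inputs: " + same);
        // Made-up tally: 10 mutants, 6 killed, 2 judged equivalent.
        System.out.println("mutation score: " + mutationScore(6, 10, 2));
    }
}
```

Checking a small input range, as the main method does, can support a claim of equivalence but does not establish it. The argument in the comment on maxMutant is what actually justifies the claim.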

Submission

Each group shall submit a final write-up including descriptions of the selected projects, the group results, and the individual results as discussed above. Include the package summaries from PIT. Include the mutation reports from Stryker.