Functional testing
Introduction
Unit testing is a great help during the development process. It lets us spot problems introduced by code changes immediately after they have been introduced, and it tests the functionality at the level of individual functions and methods.
However, unit tests are not really the right tool to test an entire application that is, e.g., run from the command line with various parameters. The literature on software development refers to this type of testing as functional testing. Functional tests are more coarse-grained, and are likely to take longer to run than you are comfortable with during an intensive edit/test/commit development cycle.
Hence it is acceptable to run functional tests less often, for instance only when a feature has been added to the software, or a bug has been fixed that may have involved a considerable number of file edits and commits. We rely on unit testing to ensure that this process didn't break low-level integrity of the code.
Ideally, functional testing is done automatically for a release with an online tool such as Travis CI or GitHub Actions for continuous integration.
However, you can also do functional testing locally using, e.g., shunit2 or CTest.
Functional testing can detect code defects that would go unnoticed by unit testing, so the two testing strategies complement one another.
Best practices
Unit testing is an invaluable help for the developer since it catches bugs introduced when the code base changes. Tests can be executed easily and are run frequently.
However, unit tests typically concentrate on the low level functionality of the software project. They test whether individual functions behave as expected. This is white box testing, since the tests are developed with access to the "innards" of the software under test.
In some circumstances, this may be all that is required, e.g., when developing a relatively small or very focused library. In many cases though, unit testing is best supplemented by functional testing.
The point of view of functional testing is opposite to that of unit testing since functional tests focus on the application as a whole. Are the results for a sophisticated use case reproduced as expected? Does the application's user interface, whether a command line interface (CLI) or a graphical user interface (GUI), behave as expected? Are options handled as expected? This is often called black box testing since only the user interface is accessed.
Functional testing can also be applied to third party applications that are part of a workflow. For instance, suppose that your application relies on the output of another application not developed by you. If the output format of that application changes from one version to the next, running a functional test will make clear whether there is an impact on your workflow, and you may fix problems by adapting your application.
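A check on third-party output can be as small as the following sketch. It is only an illustration: `third_party_tool` is a hypothetical stand-in defined as a shell function so the example runs on its own, and the three-column CSV format is an assumption, not the format of any real application.

```shell
#!/bin/sh
# Sketch: detect a change in the output format of a third-party tool.
# third_party_tool is a hypothetical stand-in; in a real workflow it would
# be the external executable that your application depends on.
third_party_tool() {
    printf 'id,name,score\n1,alice,0.93\n2,bob,0.87\n'
}

# The header our own application was written to parse (assumed format).
expected_header='id,name,score'

output=$(third_party_tool)
actual_header=$(printf '%s\n' "$output" | head -n 1)

# Count rows that do not have exactly three comma-separated fields.
bad_rows=$(printf '%s\n' "$output" | awk -F, 'NF != 3' | wc -l | tr -d ' ')

if [ "$actual_header" = "$expected_header" ] && [ "$bad_rows" -eq 0 ]; then
    format_ok=yes
    echo "output format unchanged"
else
    format_ok=no
    echo "output format changed: got header '$actual_header'," \
         "$bad_rows malformed row(s)" >&2
fi
```

Running such a check after each upgrade of the third-party application shows quickly whether your own parsing code needs to be adapted.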
The best way to do functional testing is by using a continuous integration workflow. When the functional tests are run, first a container is prepared with the required operating system and software stack. Next, your software is built within the container, so that the environment is completely controlled. If the build succeeds, tests are run. A report is generated to show failures if they occur.
Note that it is possible to set up a matrix of operating system versions and compiler versions to ensure that your code will build and execute cleanly on a wide range of software platforms.
Both GitHub and GitLab support continuous integration that can be used for functional testing.
The question remains how to code the actual tests that will be executed by the continuous integration system. A convenient way is to reuse the unit test paradigm, but now on the level of the shell. In other words, the tests will be relatively short shell scripts that invoke your application using various parameters and input data, and verify the results.
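As a minimal sketch of such a script, without any framework, the example below runs an application with two sets of parameters and verifies the output and the exit status. The `wordcount` script, the `check` helper, and the exit status 2 for a usage error are all hypothetical; the stand-in application is generated by the script itself only so that the example is self-contained.

```shell
#!/bin/sh
# Functional test sketch: invoke an application and verify the results.
workdir=$(mktemp -d)
APP="$workdir/wordcount"   # hypothetical application under test

# Create the stand-in application; normally this is your own executable.
cat > "$APP" <<'EOF'
#!/bin/sh
# Stand-in application: prints the number of words in the given file.
[ $# -eq 1 ] || { echo "usage: wordcount FILE" >&2; exit 2; }
wc -w < "$1" | tr -d ' '
EOF

failures=0

# check DESCRIPTION EXPECTED ACTUAL: report the outcome, count mismatches.
check() {
    if [ "$2" = "$3" ]; then
        echo "PASS: $1"
    else
        echo "FAIL: $1 (expected '$2', got '$3')"
        failures=$((failures + 1))
    fi
}

# Input data for the test.
printf 'one two three\nfour five\n' > "$workdir/input.txt"

# Use case 1: a normal run produces the expected result.
actual=$(sh "$APP" "$workdir/input.txt")
check "counts the words in a file" "5" "$actual"

# Use case 2: a missing argument is rejected with exit status 2.
status=0
sh "$APP" > /dev/null 2>&1 || status=$?
check "rejects a missing argument" "2" "$status"

rm -rf "$workdir"

# The last command determines the exit status seen by a CI system.
[ "$failures" -eq 0 ]
```

Because the script's exit status is non-zero when any check fails, a continuous integration system can run it directly as a test step.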
The shunit2 framework is well suited for this purpose. It provides functionality similar to that of the unit testing frameworks for specific programming languages. However, from the point of view of the software project this is black box, rather than white box testing.
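The following sketch shows roughly what a shunit2 test script could look like. It assumes that shunit2 is installed and can be found on the PATH; the `greet` application, its `-u` option, and its exit status 2 for a usage error are hypothetical stand-ins created by the script so that it does not depend on any real program.

```shell
#!/bin/sh
# shunit2 sketch: every function whose name starts with "test" is a test.

oneTimeSetUp() {
    # Runs once before all tests: create the hypothetical application.
    APP_DIR=$(mktemp -d)
    APP="$APP_DIR/greet"
    cat > "$APP" <<'EOF'
#!/bin/sh
# Stand-in application: greets NAME, in upper case when -u is given.
upper=no
if [ "$1" = "-u" ]; then upper=yes; shift; fi
[ $# -eq 1 ] || { echo "usage: greet [-u] NAME" >&2; exit 2; }
if [ "$upper" = yes ]; then
    echo "hello $1" | tr '[:lower:]' '[:upper:]'
else
    echo "hello $1"
fi
EOF
}

oneTimeTearDown() {
    # Runs once after all tests: remove the temporary files.
    rm -rf "$APP_DIR"
}

testPlainGreeting() {
    assertEquals "plain greeting" "hello world" "$(sh "$APP" world)"
}

testUpperCaseOption() {
    assertEquals "-u option" "HELLO WORLD" "$(sh "$APP" -u world)"
}

testMissingArgumentFails() {
    status=0
    sh "$APP" > /dev/null 2>&1 || status=$?
    assertEquals "exit status without arguments" 2 "$status"
}

# Sourcing shunit2 runs the tests, so it must come last.
# Assumption: shunit2 is on the PATH; otherwise only a message is printed.
if command -v shunit2 > /dev/null 2>&1; then
    . shunit2
else
    echo "shunit2 not found on PATH, tests were not run" >&2
fi
```

The structure mirrors that of a unit test suite (setup, teardown, assertions), but each test only touches the application through its command line interface.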
The same concerns as for unit testing apply. For instance, it is important that the tests cover the use cases as well as possible. Here too, code coverage can be a great help to detect which aspects of the application are tested, and where additional tests need to be implemented to improve the coverage.