Write good Python tests

Python testing practices aimed at producing maintainable code bases with test suites that are understandable, brings confidence to the correctness of the program, and that in general optimizes for smooth future interactions with your code.

This article lists rules to follow to achieve the above-mentioned properties, focusing on how to write good unit tests. There are generally plenty more aspects to take into account, such as using static type checking, and complementing unit tests with integrated testing. Those are aspects that are not within the scope of this article.

Organizing test modules

Unit test modules are structured to mirror the tested source code itself.
- A module named mypkg.some.mod has tests that live under tests.mypkg.some.test_mod.
- If a project exposes a single top-level package, this can be shortened to e.g. tests.some.test_mod.
- Complex units can warrant their own namespace, in that case add tests.mypkg.some.mod.test_complex_unit for a unit named mypkg.some.mod:complex_unit.
Each function or method under test maps to a single test component.
- If a function is simple, it can have a single test function.
- If a function requires more than one test function, either:
  - Wrap those tests in a dedicated class for the unit.
  - Introduce a dedicated module.

Test names reflect expected outcomes

Write test names that do not describe the input to the unit under test, instead, write test names that describe the expected outcome of the test.

The reason for this is that it much better communicates the original intent of the author of the test. If there is a bug in the code, being met with a name that clearly communicates what the original author expected the code to do, can immensely help in quickly understanding what the correct fix is.

As a logical consequence of following this rule, you are forced to introduce at least a certain number of distinct test functions given the behavior of a unit under test. If the unit has behavior such that it has two distinct expected outcomes, you write those as two distinct test functions and so on.

Examples of bad naming

def test_add_1_and_2():
    assert add_positive(1, 2) == 3


def test_add_big_numbers():
    assert add_positive(10, 20) == 30


def test_add_negative():
    with pytest.raises(NonPositiveArgument):
        add_positive(-1, 0)

Rewritten focusing on expected outcomes

# The first two tests had the same expected outcome, let's combine them.
@pytest.mark.parametrize(
    ("a", "b", "expected"),
    [(1, 2, 3), (10, 20, 30)],
)
def test_returns_sum_of_a_and_b(a, b, expected):
    assert add_positive(a, b) == expected


def test_raises_non_positive_argument():
    with pytest.raises(NonPositiveArgument):
        add_positive(-1, 0)

Use namespacing to avoid repetition in test names

Take the fully qualified name of a test into account when naming it. If the test lives in a namespace (class or module) that is dedicated to a single unit, do not repeat the name of the unit in each test. Rather prefer using the name of the test function to describe the expected outcome, and let the unit under test be clearly communicated by the name of the surrounding namespace.

Whether you use a class or a module for grouping the related tests together is not very important, but as applies to all kinds of software development, small components are preferred over large components.

All expected outcomes are tested

All behaviors of code under test are exercised by the test suite.

An exception to this is branches that are statically proven to not be reachable. This typically means they contain an assert_never(). Note that this is technically not an exception to the rule, as it is not really a behavior of the unit if it is proven that it cannot possibly be exercised.

Use tooling to enforce this, specifically enforce 100% test coverage with branch coverage enabled. Do not include non-unit testing in the coverage report.

Only the expected outcome is tested

For units with multiple expected outcomes, limit tests to only make assertions that prove the specific expected outcome. Do not test every aspect of the unit in every test.

Tests do not contain logic

Specifically, it is not OK to write tests that have conditional assertions. Split the test up into multiple test functions.

Put another way, this also means that any single test only exercises a single expected outcome of the unit under test.

Sometimes, the duplication of complicated test boilerplate code comes up as an argument against this rule. It is not a valid argument, which brings us to the next rule ...

Boilerplate is not duplicated

There's a plethora of available tools that can address this. Use them.

pytest fixtures.
Functions.
Context managers.

It's not OK to use this as an argument for clumping together distinct expected outcomes into monolithic tests per unit. However, it is also sound to accept a large degree of repetition in tests.

Prefer custom exceptions

Rather than using string matching on expected exceptions, prefer introducing custom exceptions.

Misc.

Do not catch AssertionError.
Use mocking.
- Always use specs.
- Always use spec_set=True.
Avoid patching, prefer dependency injection.
Use parametrization, but see the rule about logic in tests.
Use strict type checking in tests, this finds bugs.