Unit tests are code that is written to help ensure expected behavior of application code. Unit test code is entirely developer-facing and isolated from code delivered to the end user.
What is a “unit”?
The “unit” part of “unit testing” refers to the smallest bit of logic that you can apply and assert expectations around—which is usually best thought of as a function or class method.
The idea is to test lots of “units” of your code individually, making explicit assertions about what each unit is supposed to do, so that when everything operates together as a whole, the result is a more stable end product.
Each unit can have any number of individual tests. Each test should be entirely independent of the others and should describe and assert expected behavior for a single concern, usually by artificially creating conditions, running the code within those conditions, then making assertions about the results.
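For example, a single unit test for a hypothetical `slugify` function might look like the following sketch (written with Mocha and Chai, the same tools used in the larger example later in this document):

```js
// slugify.test.js
const { assert } = require('chai')

// Hypothetical unit under test: turns a title into a URL-friendly slug
const slugify = require('./slugify')

describe('slugify', function () {
  it('should lowercase and hyphenate a title', function () {
    // Create the conditions (input data)
    const title = 'Hello World'
    // Run the unit
    const result = slugify(title)
    // Assert the expected behavior
    assert.equal(result, 'hello-world')
  })
})
```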
What should be tested?
In short, you should test the custom business logic that you write. You should not be writing tests or making assertions for third party/dependency code (you should only install dependency code that you feel is sufficiently tested for your purposes).
You can prioritize what code gets tested by how pervasive it is in your codebase—for example base classes that get extended upon a lot, or frequently used functions—or by its complexity and/or how crucial it is to the functionality of your application.
Backend code tends to be pretty straightforward to unit test, whereas on the frontend it gets a little harder to draw a line between what gets tested or not once display and style code enters the picture. Leveraging testing tools that support snapshotting, like Jest does, is ideal, but if that option is unavailable, it’s usually best to ignore display/style code.
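As a rough sketch of what snapshotting looks like with Jest, assuming a hypothetical React component called `Badge` and the `react-test-renderer` package:

```js
// Badge.test.js
const React = require('react')
const renderer = require('react-test-renderer')

// Hypothetical display component
const Badge = require('./Badge')

test('renders consistently', () => {
  const tree = renderer.create(React.createElement(Badge, { label: 'New' })).toJSON()
  // The first run records the rendered output; later runs fail if the output changes
  expect(tree).toMatchSnapshot()
})
```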
Where should tests go?
This is usually determined by the programming language and/or the specific framework in use. Unit tests should live in their own files, separate from the source code they are testing, and if you have the option, keeping the tests in a matching file alongside the source with a `_test` or `.test` suffix is recommended, as it keeps things easy to find. For example, if you have a file at `src/actions/calendar.js`, make a unit test file for it at `src/actions/calendar.test.js`.
An alternative—which is particularly common in the Python world—is to use a `tests/` directory in the root of a project, in which case you should mirror your application code paths and file names in your `tests/` directory. For example, if you have `app/artists/views.py`, make `tests/app/artists/views_test.py`.
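In other words, the test tree mirrors the application tree:

```
project/
├── app/
│   └── artists/
│       └── views.py
└── tests/
    └── app/
        └── artists/
            └── views_test.py
```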
What is TDD? BDD? Should I use them?
TDD is Test-Driven Development, and BDD is Behavior-Driven Development, both of which are approaches to development and testing that are meant to steer the design of your code interface.
These are not required to write tests, and provide different value for different developers, but they can be helpful when you know what your code needs to do, but are unsure of how to compose it.
Searching these terms will turn up lots of articles on how to leverage them if you want to give them a try.
Writing Unit Tests
Here are some guidelines for writing unit tests.
Keep it Simple
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
— Brian Kernighan
This is good advice for all code-writing, and if you’re writing simple code, you should be able to write simple tests. If you’re finding your code and/or your tests getting very complicated or difficult to understand, there’s a very good chance another developer, or you in the future, will feel the same way, so it’s a good indicator that you should probably break out more “units” to make things simpler.
You should also never get to a point where your unit tests are complicated enough to require unit tests of their own—they should be relatively straightforward and aim toward human readability.
Focus on a Singular Concern
Keep each unit test focused on a very specific scenario, with a very specific expected outcome. Tests are typically paired with a descriptor that looks something like “it should do X when Y”. Isolating them this way keeps each test simple, makes points of failure much more obvious (your test runner should show a very clear message about “it should do X when Y” failing), and helps avoid false positives by isolating conditions.
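As a sketch, here are two focused tests for a hypothetical `isValidUsername` function, each with its own “it should do X when Y” descriptor and a single expected outcome:

```js
const { assert } = require('chai')

// Hypothetical unit under test
const isValidUsername = require('./isValidUsername')

describe('isValidUsername', function () {
  it('should return true when given a lowercase alphanumeric name', function () {
    assert.isTrue(isValidUsername('artist123'))
  })

  it('should return false when given an empty string', function () {
    assert.isFalse(isValidUsername(''))
  })
})
```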
Isolate Tests
Test suites should be isolated by their platform/language/application—for example backend tests should live alongside their application code, separately from frontend tests.
Individual unit tests should be runnable by themselves, or with every other test, on your machine, other developers’ machines on your team, in a continuous integration environment, in different timezones, etc., over and over with the same results.
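Time is a common source of non-repeatable tests. One way to keep a time-dependent unit deterministic, no matter when or where the test runs, is Sinon's fake timers (Sinon is introduced below). A sketch, assuming a hypothetical `isExpired` function that compares a timestamp to the current time:

```js
const { assert } = require('chai')
const sinon = require('sinon')

// Hypothetical unit under test: returns true if the given timestamp is in the past
const isExpired = require('./isExpired')

describe('isExpired', function () {
  it('should return true when the timestamp is in the past', function () {
    // Freeze "now" so the test produces the same result on every run and every machine
    const clock = sinon.useFakeTimers(new Date('2020-01-02T00:00:00Z').getTime())
    assert.isTrue(isExpired(new Date('2020-01-01T00:00:00Z').getTime()))
    // Restore the real clock so other tests aren't affected
    clock.restore()
  })
})
```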
No Network or Service Access
Your tests should not use network connections (like calling web-based APIs) or services like databases, etc. This simplifies and speeds up running your tests, supports focusing on a singular concern, minimizes duplicate test effort, and eliminates having to worry about, for example, accruing charges for an API or service that your code uses.
For example when using a database, you probably aren’t writing a database connection client or an ORM from scratch, but you are probably using a client or ORM that is already unit tested, so you should not have to make assertions around its behavior that already exist somewhere else. Instead, you can mock or stub those things and make assertions around how they are implemented in your own code.
There are some frameworks and platforms that don’t subscribe to this precisely, like Django, which creates a test database for you, wiping and resetting it between unit tests. It’s “free” and only adds the requirement of having the relevant database software (Postgres, MySQL, etc.) installed on the machine running the tests. Some other services follow a pattern of providing “dummies” for test contexts (Django Haystack being an example) so that you don’t add the requirement of Elasticsearch (and Java) or Solr for running tests.
Mocking & Stubbing
Key techniques for isolating unit tests are mocking and stubbing. There are notable differences between mocking and stubbing—intermingled with other concepts like spies and fakes—but the main purpose they serve is to provide a mechanism for swapping out and controlling dependency code from within a unit being tested. “Dependency” in this context means installed third-party dependency code, as well as other “unit” code from within your application that the unit being tested uses.
For example, say we have a function to retrieve a user from the database by username that looks like this:
```js
// users.js
const models = require('./models')

const getUserByUsername = username => models.User.findOne({
  where: {
    username: {
      $iLike: username
    },
    is_active: true
  }
})

module.exports = { getUserByUsername }
```
This example uses a promise-based ORM, Sequelize, so we just have a simple, almost unnecessary unit test to verify it’s calling the ORM correctly:
```js
// users.test.js
const { assert } = require('chai')
const rewire = require('rewire')
const sinon = require('sinon')

// Note the use of `rewire`, not `require`, which is a tool that allows us to
// write over things in the file in question for mocking and stubbing in tests
const users = rewire('./users')

describe('User Utilities', function () {
  describe('getUserByUsername', function () {
    it('should query for a user by username', function () {
      // Create some fake data
      const username = 'someuser'
      const user = { id: 123 }

      // Create a mock of `models`, with a stub of `findOne` on the User model,
      // which is the only thing being called in this scenario
      const models = {
        User: {
          findOne: sinon.stub().resolves(user)
        }
      }

      // This uses `rewire` to swap out the `models` variable for our stubbed `models`
      const revert = users.__set__({ models })

      return users
        .getUserByUsername(username)
        .then(result => {
          // Assert that the result we got is our fake user
          assert.deepEqual(result, user)

          // These types of assertions can sometimes be overkill, but are
          // great when you want to confirm specific implementation details
          sinon.assert.calledOnce(models.User.findOne)
          sinon.assert.calledWith(models.User.findOne, {
            where: {
              username: {
                $iLike: username
              },
              is_active: true
            }
          })

          // This resets `models` back to what it was originally, so that our
          // stubbing/mocking isn't in place in other tests
          revert()
        })
    })
  })
})
```
This example is in Javascript and uses the following tools:
- Mocha: A Javascript test runner that provides the `describe` and `it` functions globally
- Chai: An assertion library, which provides a lot of tools for confirming that output is what it is expected to be
- Sinon: A library for spies, stubs, and mocks
- Rewire: A package that allows you to rewire `require`-ed things with, for example, stubs and mocks
This is only the tip of the iceberg of the functionality of mocks and stubs, but the basic purpose is that you can artificially create conditions that you want to test easily by faking the functionality of dependencies, without having to—in this case—have a database, or a users table, or a user with a matching username, etc.
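For instance, a database error would be awkward to trigger against a real connection, but is trivial to fake with a stub. A sketch of an additional test building on the example above (the error message is made up):

```js
it('should reject when the query fails', function () {
  // Force the stubbed ORM call to fail, a condition that is hard to
  // produce reliably against a real database
  const models = {
    User: {
      findOne: sinon.stub().rejects(new Error('connection lost'))
    }
  }
  const revert = users.__set__({ models })

  return users
    .getUserByUsername('someuser')
    .catch(error => {
      assert.equal(error.message, 'connection lost')
      revert()
    })
})
```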
When using mocks and stubs, be mindful of changes to the composition of functions/classes that are mocked/stubbed in tests elsewhere in your codebase. If you change the code being mocked/stubbed, you also need to locate and update the corresponding mocks and stubs in your tests.
Tests As Documentation
A good way to think of tests is as example code and as another form of documentation. Striving to keep things human-readable and self-documenting means you’re probably keeping it simple, making it easier to maintain your code, and making it easier for other developers to adopt and maintain your code.
Things to Avoid
Conditional logic
As a person writing code, this may feel counter-intuitive, but conditional logic in unit tests tends to muddy the waters and make things more complicated than they need to be. If you are writing singularly focused unit tests, there is rarely a valid case for `if`/`else` statements.
Automation
The code you write for unit tests should be pretty straightforward, and far from complex enough to warrant unit tests of its own, so things like generating unit tests from within loops should be avoided.
Instead, leverage helper functions to avoid redundancy, like loading fixture data or using factory functions to generate model instances for testing.
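A sketch of what such a factory helper might look like, with hypothetical names and fields:

```js
// test/factories.js
let nextId = 1

// Build a plain user object with sensible defaults, letting each test
// override only the fields it actually cares about
const buildUser = (overrides = {}) => ({
  id: nextId++,
  username: 'someuser',
  is_active: true,
  ...overrides
})

module.exports = { buildUser }

// In a test:
//   const inactiveUser = buildUser({ is_active: false })
```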
Lots of mocks/stubs
Needing lots of mocks and stubs is a good indicator that you are not keeping things very simple, and a sign that you might want to break the unit into smaller, less complex components.
Anatomy of a Unit Test
The typical flow of a single unit test should look something like this:
- Set up any required test data
- Set up any required mocks/stubs
- Run the “unit” code
- Make assertions about the result
- Clean up and revert any mocks/stubs
To re-use and mark up the example above:
```js
it('should query for a user by username', function () {
  // 1. Set up any required test data
  const username = 'someuser'
  const user = { id: 123 }

  // 2. Set up any required mocks/stubs (these tend to use the test data above)
  const models = {
    User: {
      findOne: sinon.stub().resolves(user)
    }
  }
  const revert = users.__set__({ models })

  // 3. Run the "unit" code
  return users
    .getUserByUsername(username)
    .then(result => {
      // 4. Make assertions about the result
      assert.deepEqual(result, user)
      sinon.assert.calledOnce(models.User.findOne)
      sinon.assert.calledWith(models.User.findOne, {
        where: {
          username: {
            $iLike: username
          },
          is_active: true
        }
      })

      // 5. Clean up and revert any mocks/stubs
      revert()
    })
})
```
Test Coverage
Test coverage is a measure of how much of your codebase is run during unit testing. Measuring it requires additional tooling and configuration integrated with your test runner, and the output is a percentage indicating the amount of your code that is run during tests, usually divided into categories of code covered: statements, branches, functions, and lines.
Test coverage percentages should not necessarily be perceived as an indicator of quality, and it is generally not recommended to strive for a particular coverage percentage, as that can lead developers to “game” their test code to meet thresholds. As the first sentence of this section states, coverage indicates which code is run during unit testing, but doesn’t really provide insight into the quality of unit tests or assertions. As Martin Fowler suggests, you should aim for a “high level of coverage”, and coverage reports can be a good indicator of spots in your codebase that could use more testing.
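As an illustration, with the Mocha-based example above you could generate a coverage report using Istanbul's `nyc` command-line tool (one of several options; the script names here are just a sketch of a `package.json` scripts section):

```json
{
  "scripts": {
    "test": "mocha",
    "coverage": "nyc --reporter=text --reporter=html mocha"
  }
}
```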