Unit tests are an important scaffold for large-scale software development; they enable us to design, write, and deploy production code with confidence by validating that software will behave as expected. Even though they do not execute in live systems, their development and maintenance require the same care as production code. Developers sometimes do not realize this, which leads to test code with more code smells than the production code it covers. Engineers may also pay too little attention to test code changes during code review. However, most of the time the test code reflects the health of the production code: if the test code has code smells, this can be a sign that the production code can be improved. In this post, I’m going to cover some best practices that keep unit test code clean and maximize the benefits it provides.
The best practices for unit testing are a debated topic in the industry. In practice, however, projects and teams should align on key concepts in order to foster code consistency and ease of maintenance. I’m going to cover the meaning of unit testing in the Object-Oriented design world, the characteristics of a unit test, naming conventions for unit tests, and when we should or should not use mocking. There are many different answers and approaches for these concepts, and the relevance of the trade-offs varies with the situation.
Unit Testing
To define unit testing, we should first define the unit. Once the unit has been defined, we can define unit testing as testing the behaviors of a unit. Let’s try this for Object-Oriented software development. Classes are the main building block of software designed with the Object-Oriented paradigm, so we can treat the class as the unit of Object-Oriented software. Unit testing then means that the developer who implements the behaviors of a class tests those behaviors independently.
The relation between the behaviors and the methods of a class may not be 1:1. Sometimes a class needs more than one method to implement a behavior that is unit tested as a whole, and sometimes more than one class is used to implement a unit-tested behavior. However, needing more than one public method or class to implement a single unit-tested behavior can also be a sign of a code smell, such as temporal coupling.
Characteristics of a Unit Test
When developing unit tests, some key considerations include: How fast should a unit test be? How often should a unit test be run? Which kinds of object methods are valid targets for unit testing? How should we structure a unit test? What kind of assertions should we make in a unit test? Let’s think about the answers to these questions.
Some expected characteristics of unit tests include: fast execution, in order to provide immediate feedback about implementation correctness; readability, in order to clearly express the behavior that’s tested; consistent and predictable results through deterministic evaluation; and robustness to structural changes (i.e., refactoring) in the implementation.
Speed
Developers expect unit tests to run quickly because they are executed frequently during development. We typically run unit tests whenever we change our code in order to get immediate feedback about whether something is broken. Speed is a relative concept, but as Martin Fowler said in his article, “But the real point is that your test suites should run fast enough that you’re not discouraged from running them frequently enough”.
Keeping unit tests fast requires continuous care throughout the life cycle of our codebase, but we can also adopt some rules about what we are allowed to name or tag as a unit test, which helps us keep them fast. Michael Feathers mentioned some rules of this kind in his article:
“A test is not a unit test if:
- It talks to the database
- It communicates across the network
- It touches the file system
- It can’t run at the same time as any of your other unit tests
- You have to do special things to your environment (such as editing config files) to run it.
Tests that do these things aren’t bad. Often they are worth writing, and they can be written in a unit test harness. However, it is important to be able to separate them from true unit tests so that we can keep a set of tests that we can run fast whenever we make our changes.”
Behavioral Testing vs Structural Testing
In OOP, the structure of an object refers to the specific order and manner in which the production code implementing that object uses the methods or classes it depends on. Since the structure of an object is tied to the way the production code is written, it can generally be considered an implementation detail. Structural testing means testing these implementation details.
Let’s see an example of structural testing vs behavioral testing. We have an Order class, and orders can be cancelled. Some rules determine whether an order is cancellable, and these rules are executed by OrderSpecification.
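A minimal sketch of these two classes, written in Python for illustration. The `OrderStatus` values, the `cancel` and `is_cancellable` method names, and the rule that only pending orders can be cancelled are assumptions made for this sketch, not details of a real system.

```python
from enum import Enum


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


class OrderNotCancellableError(Exception):
    """Raised when cancel() is called on an order that may not be cancelled."""


class OrderSpecification:
    """Holds the rules that decide whether an order may be cancelled."""

    def is_cancellable(self, order):
        # Assumed rule for this sketch: only pending orders can be cancelled.
        return order.status is OrderStatus.PENDING


class Order:
    def __init__(self, status, specification):
        self.status = status
        self._specification = specification

    def cancel(self):
        # The order delegates the cancellation rules to its specification.
        if not self._specification.is_cancellable(self):
            raise OrderNotCancellableError(
                f"cannot cancel an order that is {self.status.value}"
            )
        self.status = OrderStatus.CANCELLED
```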
And we have a test class to test this cancellation scenario:
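Such a test class might look like the sketch below, which uses Python’s `unittest` and `unittest.mock`. The simplified `Order` class and all method names are hypothetical stand-ins chosen for this example.

```python
import unittest
from unittest.mock import Mock


class Order:
    # Hypothetical minimal Order that delegates the cancellation rule
    # to a specification object passed in by the caller.
    def __init__(self, specification):
        self.cancelled = False
        self._specification = specification

    def cancel(self):
        if self._specification.is_cancellable(self):
            self.cancelled = True


class OrderCancelStructuralTest(unittest.TestCase):
    def test_cancel_order(self):
        specification = Mock()
        specification.is_cancellable.return_value = True
        order = Order(specification)

        order.cancel()

        # Behavioral check: the order ends up cancelled.
        self.assertTrue(order.cancelled)
        # Structural check: pins down exactly how Order uses its collaborator.
        specification.is_cancellable.assert_called_once_with(order)
```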
In the above test class we check both that the order is cancelled and that the OrderSpecification method is called exactly once. However, the specific way in which OrderSpecification is used is an internal implementation detail of the order cancellation code, so the above test is an example of structural testing.
Let’s see the behavioral test code for this scenario:
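A behavioral version could be sketched as follows. It uses a real `OrderSpecification` and asserts only on the observable state of the order. The classes and the "shipped orders cannot be cancelled" rule are assumptions made for this sketch.

```python
import unittest


class OrderSpecification:
    def is_cancellable(self, order):
        # Assumed rule: shipped orders can no longer be cancelled.
        return not order.shipped


class Order:
    def __init__(self, specification, shipped=False):
        self.shipped = shipped
        self.cancelled = False
        self._specification = specification

    def cancel(self):
        if self._specification.is_cancellable(self):
            self.cancelled = True


class OrderCancelBehaviorTest(unittest.TestCase):
    def test_should_cancel_order_when_order_is_not_shipped(self):
        order = Order(OrderSpecification())

        order.cancel()

        # Only the outcome is verified, not how Order reached it.
        self.assertTrue(order.cancelled)

    def test_should_not_cancel_order_when_order_is_shipped(self):
        order = Order(OrderSpecification(), shipped=True)

        order.cancel()

        self.assertFalse(order.cancelled)
```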
In the above test class we only care about the behavior of order cancellation, not its internal implementation details.
We expect two things from production code: one is “doing the right thing”, the other is “doing the thing right”. Unit tests should focus on the former, i.e., the behavior produced by the production code, which is one level of abstraction above the implementation details. So, as Kent Beck says in his article, “Programmer tests should be sensitive to behavior changes and insensitive to structure changes. If the program’s behavior is stable from an observer’s perspective, no tests should change.”
Why? When we weigh the benefit, cost, and maintenance dimensions of unit testing, it’s not hard to see that structure-sensitive tests create more friction than safety. Agile development teams change the structure of their code continuously as they refactor, and fixing many brittle tests that broke without any change in behavior is a tiring and discouraging process.
For example, let’s say we change the signature of the isCancellable method in the OrderSpecification class so that it takes an OrderStatus as its argument instead of an Order.
In such a situation, the expected behavior of our code has not really changed, but the following test will start to fail because its verification depends on the method’s signature:
Unfortunately, mocking libraries make this kind of structural testing very easy to write, so we should use their verification functions with caution. Of course, there are exceptional cases where we have to rely on structural testing instead of behavioral testing to achieve some level of confidence in our system. For example, if a real implementation is too slow to use or too complex to build, we may use structural testing to verify that certain functions are invoked on a mock. Another case relates to which calls are made and in what order, such as checking cache hits and misses. A cache miss may have a financial cost, for instance when we call a paid API on a miss, and we may use structural testing to verify whether the cache methods are called. But these should be exceptions, not our default choice.
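The cache case could be sketched as below. `CachedRateProvider`, `fetch_rate`, and the exchange-rate scenario are invented for this illustration; the point is only that verifying the call count is justified when every extra call costs money.

```python
from unittest.mock import Mock


class CachedRateProvider:
    """Returns exchange rates, calling a paid API only on a cache miss."""

    def __init__(self, paid_api):
        self._paid_api = paid_api
        self._cache = {}

    def rate_for(self, currency):
        if currency not in self._cache:
            self._cache[currency] = self._paid_api.fetch_rate(currency)
        return self._cache[currency]


paid_api = Mock()
paid_api.fetch_rate.return_value = 1.1
provider = CachedRateProvider(paid_api)

first = provider.rate_for("EUR")   # cache miss: goes to the paid API
second = provider.rate_for("EUR")  # cache hit: must not call the API again

assert first == second == 1.1
# Structural check on purpose: a second call would be a billing bug.
paid_api.fetch_rate.assert_called_once_with("EUR")
```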
Should we write unit tests for all classes?
No, because classes have different kinds of behaviors. Some classes implement business logic that comes directly from the requirements of our domain, while other classes implement application- or system-level requirements such as transactions, security, and observability.
We separate classes that have different kinds of behaviors using stereotypes, i.e., different categories of responsibilities. We use Domain-Driven Design (DDD) concepts in some of our projects, and DDD tactical design defines stereotypes for classes, such as aggregate root, entity, value object, domain service, application service, and repository.
Let’s examine the application service case. Application services are like gateways from the outside world to our domain model, as we see in the diagram below:

Application services handle application-level requirements (e.g., security, transactions, etc.) while routing requests from the outside world to our domain model. The outside world can be anything that’s not directly part of our domain model, like the web layer, an RPC layer, or a storage access layer. There is no business logic in application services, and their code mostly consists of direct calls to our domain model. If we try to write unit tests for these classes, there is nothing to verify from a behavior perspective; we can only verify the interactions between them and the domain model. As we said above, this is structural testing, which we generally avoid. So we prefer not to write unit tests for DDD application services.
How, then, can we test these application services? There are testing styles other than unit testing, and we think that integration tests that exercise the application services as part of the test flow are a better alternative for DDD application services.
Structure of a Unit Test
Generally, a unit test has three parts: setting up the pre-conditions, taking an action on the object, and verifying the result. These are known as Arrange/Act/Assert (or, alternatively, Given/When/Then as used in Behavior Driven Development (BDD) testing). Applying this structure to our unit tests increases their readability.
Sometimes we can omit the `Arrange` part if we don’t need to set anything up before the `Act` part, but a unit test should always have the `Act` and `Assert` parts.
We can see an example of these Arrange/Act/Assert parts below:
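A minimal sketch of the three parts in Python’s `unittest`, using a hypothetical `Order` class that exists only to give the test something to act on:

```python
import unittest


class Order:
    # Hypothetical minimal class, just enough to have something to test.
    def __init__(self):
        self.status = "PENDING"

    def cancel(self):
        self.status = "CANCELLED"


class OrderTest(unittest.TestCase):
    def test_should_be_cancelled_when_cancel_is_called(self):
        # Arrange: put the object under test into the required state.
        order = Order()

        # Act: trigger the behavior being tested.
        order.cancel()

        # Assert: verify the observable outcome.
        self.assertEqual(order.status, "CANCELLED")
```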
Using a Naming Convention
The name of a unit test is important because it directly affects code readability. Unit tests should be readable so that we can easily understand what is broken in our system when a unit test fails. Unit tests should also let us understand the behavior of our system just by reading them, because people come and go on a project.
Some programming languages allow us to use plain language in method names. For example, Kotlin accepts a function name enclosed in backticks that contains spaces, so a test method can be named with a readable sentence such as "should cancel order when order is cancellable".
Some testing frameworks, like JUnit, provide an annotation (@DisplayName) for this purpose when we can’t express the description in the method name.
There are different conventions for naming unit tests. Teams can align on a standard naming convention that members find most readable; alternatively, a team may let its members choose the most appropriate name for each test instead of following a standard convention. In our last Kotlin project, we used the convention “Should ExpectedBehavior When StateUnderTest”.
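As an illustration of how this convention could be carried over to another language, here is a sketch in Python’s `unittest`. This adaptation is an assumption of this example, not the convention’s original form: the method name follows “should ... when ...” in snake case, and the docstring holds the plain-language description, whose first line `unittest` exposes through `shortDescription()` and prints in verbose mode.

```python
import unittest


class Order:
    # Hypothetical minimal class used only to make the test runnable.
    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


class OrderTest(unittest.TestCase):
    def test_should_be_cancelled_when_cancel_is_called(self):
        """Should be cancelled when cancel is called"""
        order = Order()

        order.cancel()

        self.assertTrue(order.cancelled)
```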
Mocking
In a test, we use mock objects to replace the real implementations that our production code depends on, with the help of libraries like Mockito, MockK, Python unittest.mock, etc. Using mock objects makes it easier to write focused, cheap test code when our production code has a non-deterministic external dependency.
For example, in the test code below we mock a repository class that finds the orders of a customer by status:
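A sketch of this idea using Python’s `unittest.mock`. `OrderService`, `count_active_orders`, and `find_by_customer_and_status` are hypothetical names chosen for this example.

```python
from unittest.mock import Mock


class OrderService:
    def __init__(self, order_repository):
        self._order_repository = order_repository

    def count_active_orders(self, customer_id):
        orders = self._order_repository.find_by_customer_and_status(
            customer_id, "ACTIVE"
        )
        return len(orders)


# Stub the repository so the test never touches a real database.
order_repository = Mock()
order_repository.find_by_customer_and_status.return_value = ["order-1", "order-2"]

service = OrderService(order_repository)

# Only the result is asserted; how the repository was called is not verified.
assert service.count_active_orders(customer_id=42) == 2
```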
Mocks are not a silver bullet, and overusing them can cause problems. For example, the stub code needed to program a mock’s behavior can expose implementation details or the structure of the system under test. As we mentioned before, this makes our tests brittle when the structure changes. Test code with mocking is also harder to understand than test code without it, because of the additional code required. Finally, mocking can produce false positives, i.e., tests that pass although the system is broken, because the behavior of the real implementation can change while our mock stays out of date.
Mocking can be an appropriate choice for dependencies that involve external systems. For example, it can make sense to mock a repository class that communicates with a database, a service class that calls another service or application over the network, or a service class that reads and writes files on disk. But if we can use a real implementation, we should use it instead of a mock.
If we can’t use a real implementation, mocking is still not our only option. We can also use fake objects: much simpler, lighter-weight, test-only implementations of the functionality that the production code provides. For example, a test scenario with complex conditions in its Given part can be simpler to implement with fake objects than with mock objects.
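A fake could be sketched as below: an in-memory repository that actually filters its data, so a Given part with several orders needs no per-call stubbing. All names here are hypothetical and chosen for this example.

```python
class InMemoryOrderRepository:
    """Test-only fake: keeps orders in a list instead of a database."""

    def __init__(self):
        self._orders = []

    def save(self, customer_id, status, amount):
        self._orders.append(
            {"customer_id": customer_id, "status": status, "amount": amount}
        )

    def find_by_customer_and_status(self, customer_id, status):
        return [
            order for order in self._orders
            if order["customer_id"] == customer_id and order["status"] == status
        ]


def total_active_amount(repository, customer_id):
    orders = repository.find_by_customer_and_status(customer_id, "ACTIVE")
    return sum(order["amount"] for order in orders)


# Given: several orders in different states and for different customers.
repository = InMemoryOrderRepository()
repository.save(customer_id=1, status="ACTIVE", amount=30)
repository.save(customer_id=1, status="CANCELLED", amount=99)
repository.save(customer_id=1, status="ACTIVE", amount=12)
repository.save(customer_id=2, status="ACTIVE", amount=5)

# When / Then: only customer 1's active orders are summed.
assert total_active_amount(repository, customer_id=1) == 42
```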
Conclusion
To maximize their benefits, unit tests should be treated as first-class citizens alongside production code. We should let our unit tests drive the design and readability of our production code by applying the best practices we mentioned:
- Align on the meaning of unit testing concepts, at least within the team/project.
- Keep your unit tests fast.
- Write behavioral tests instead of structural tests.
- Decide whether to write unit tests for a class according to the responsibilities of the class.
- Align on the structure of a unit test.
- Align on a naming convention for unit tests, or allow free-form names and rely on the code review process.
- Use mocks with caution, and avoid using them for structural testing.
Acknowledgments
Thanks to my colleagues who reviewed this post and provided invaluable feedback.
Translated from: https://medium.com/udemy-engineering/unit-testing-best-practices-f877799f6dfd