Tests
The Shopfoo.Product.Tests project demonstrates how to test domain workflows. Tests exercise workflows through the IProductApi entry point – the same boundary used by the real application – making them integration-style tests that validate the full wiring from API to data layer.
Test Stack
TUnit – Test runner
Unquote – Quotation-based assertions
NSubstitute – Mocking external HTTP clients
FsCheck – Property-based testing with custom generators
Test Organization
The project mirrors the domain structure:
Shopfoo.Product.Tests/
├── Examples.fs – Shared test data (books, authors, products)
├── Data/
│   └── OpenLibraryShould.fs
└── Workflows/
    ├── Types.fs – Test-specific types and helpers
    ├── ApiTestFixture.fs – Test harness with DI and mocks
    ├── AddProductShould.fs
    ├── DetermineStockShould.fs
    ├── GetProductShould.fs
    ├── GetPurchasePricesShould.fs
    ├── MarkAsSoldOutShould.fs
    ├── ReceiveSupplyShould.fs
    └── SavePricesShould.fs

Each workflow has its own test file – the file tree is a map of tested features, reflecting the convention used in the production code.
Test Naming
Test classes are named {Feature}Should, and each test method completes the sentence started by the class name. This produces human-readable test names when the runner displays them as {Class}.{Method}:
DetermineStockShould…
  deduct sales from initial stock
SavePricesShould…
  reject invalid RetailPrice
The main benefit is that "should" is factored out – no need to repeat it in every test name. The drawback is less flexibility in phrasing test names, since they must all grammatically follow "should".
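A minimal sketch of the convention, assuming TUnit's [<Test>] attribute (the body is elided to a no-op):

```fsharp
open TUnit.Core

// The class name carries "should"; the backtick-quoted member name completes
// the sentence, so the runner displays the test as
// "DetermineStockShould.deduct sales from initial stock".
type DetermineStockShould() =
    [<Test>]
    member _.``deduct sales from initial stock`` () =
        ()  // arrange / act / assert would go here
```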
This convention works particularly well with use case tests, where the feature name starts with a verb: AddProductShould, DetermineStockShould, SavePricesShould… In this case, the class name naturally maps to the When in BDD terms – the action being exercised. It is less literal for non-use-case classes like OpenLibraryShould, but the resulting phrases still read correctly.
Prefer "given" over "when" to describe preconditions – e.g. reject supply given quantity is zero rather than reject supply when quantity is zero. This aligns with the Given/When/Then convention from BDD (Gherkin): the use case name already captures the When, so the test method describes the expected outcome (Then) followed by the precondition (Given). It reads slightly less naturally but is more rigorous.
Here is the full test list, illustrating how the naming convention scales across the project. Parameterized tests appear as child nodes under the parent test name, with argument values in parentheses.
Test Fixture
The ApiTestFixture is the central test harness. It builds a dependency injection container that combines:
Production dependencies: the real IProductApi wiring via AddProductApi()
Test dependencies: program mocks via AddProgramMocks(), in-memory repositories, and NSubstitute mocks for external HTTP clients (IFakeStoreClient, IOpenLibraryClient)
Tests create a fixture with the initial data they need, then call fixture.Api to exercise workflows. This approach tests the complete chain – from the API surface through workflow execution to the data layer – while keeping external dependencies mocked.
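A hedged sketch of the container the fixture might build. AddProductApi and AddProgramMocks are named above; the exact composition shown here is assumed, not the actual implementation:

```fsharp
open Microsoft.Extensions.DependencyInjection
open NSubstitute

// Hypothetical composition: production wiring plus test doubles at the boundary.
let buildProvider () =
    let services = ServiceCollection()
    services.AddProductApi() |> ignore     // real production wiring for IProductApi
    services.AddProgramMocks() |> ignore   // program mocks + in-memory repositories
    services
        .AddSingleton<IFakeStoreClient>(Substitute.For<IFakeStoreClient>())
        .AddSingleton<IOpenLibraryClient>(Substitute.For<IOpenLibraryClient>())
        .BuildServiceProvider()
```

Only the two HTTP clients are substituted; every other registration resolves to a real implementation.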
TDD Style: Outside-In Diamond
This testing strategy follows the Outside-In Diamond approach (also called Outside-In Classicist), which combines two ideas:
Outside-In: tests enter through the outermost public boundary – here IProductApi – rather than testing internal components (workflows, pipelines) in isolation. This validates the full integration: instruction wiring, undo strategies, and data-layer access.
Classicist / Diamond: internal collaborators use real implementations (the actual Api class, real pipelines, in-memory repositories) rather than mocks. Only the system boundary is mocked – external HTTP clients (IFakeStoreClient, IOpenLibraryClient) that call third-party APIs.
This results in a diamond-shaped test distribution: few unit tests at the bottom, a thick layer of integration tests in the middle (through the API), and few end-to-end tests at the top. The shape contrasts with the classic test pyramid (many unit tests, fewer integration tests) and the London-school approach (mock every collaborator).
The Product test project illustrates this shape well:
Unit tests: only one – OpenLibraryShould – which tests a pure utility (key sanitizing) in isolation.
Integration tests: all the workflow tests in the Workflows/ folder, exercising the full chain through IProductApi.
End-to-end tests: none in the Shopfoo repository. E2E tests typically involve the full deployed stack (browser, server, database) and are generally better managed by a dedicated QA team – though nothing prevents the development team from writing them as well.
The main benefits are:
Confidence: tests cover the real wiring, catching integration issues that isolated unit tests would miss.
Refactoring safety: internal code can be restructured freely – only the IProductApi contract matters.
Few mocks: less test setup, less coupling to implementation details.
Assertions with Unquote
Unquote's key feature is step-by-step reduction: on failure, it evaluates the quoted expression incrementally, revealing intermediate values rather than a generic "expected X but got Y" message. It provides two assertion styles:
actual =! expected – reads as "should equal". Simpler and more concise, suitable for straightforward equality checks.
test <@ boolean-expression @> – more verbose but more flexible: it supports combining multiple assertions in a single quoted expression (with &&). The caveat is that the reduction stops at the first false sub-expression – subsequent assertions are not evaluated.
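The two styles can be sketched with a toy example (assuming Swensen.Unquote is opened; the values are illustrative only):

```fsharp
open Swensen.Unquote

let total = 2 + 3

// Style 1: =! reads as "total should equal 5".
total =! 5

// Style 2: test <@ ... @> combines several assertions with &&.
// On failure, Unquote prints the step-by-step reduction of the whole expression.
test <@ total > 0 && total % 5 = 0 @>
```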
Full-state and multiple assertions
For best practices on constructing full expected values and combining multiple assertions in a single check, see Better assertions in Tips & Tricks.
Example-Based Tests
Most tests follow a straightforward pattern: set up a fixture, call the API, and assert on the result using Unquote.
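The pattern can be sketched as follows – a hypothetical example where the fixture constructor, the Api member, and the Examples values are assumed names, not the actual code:

```fsharp
// Hypothetical sketch: arrange a fixture, act through the API, assert with Unquote.
type GetProductShould() =
    [<Test>]
    member _.``return product given it exists`` () =
        task {
            let fixture = ApiTestFixture(products = [ Examples.book ])
            let! result = fixture.Api.GetProduct Examples.book.Id
            result =! Some Examples.book
        }
```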
Parameterized Tests
TUnit offers several ways to provide data to the tests. The simplest is the [<Arguments>] attribute β equivalent to xUnit's [<InlineData>] β but it accepts only constant values. For domain types, see Parameterized tests: mirror enums with active patterns in Tips & Tricks.
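A sketch using TUnit's [<Arguments>] attribute (the determineStock function is hypothetical; attribute values must be compile-time constants):

```fsharp
// Each [<Arguments>] row becomes a child test node with the values in parentheses.
type DetermineStockShould() =
    [<Test>]
    [<Arguments(10, 3, 7)>]
    [<Arguments(5, 5, 0)>]
    member _.``deduct sales from initial stock`` (initial: int, sold: int, expected: int) =
        determineStock initial sold =! expected
```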
Property-Based Tests
FsCheck is used for testing domain invariants that must hold for all valid inputs. For example, the purchase price average calculation is tested with mathematical properties.
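One such property can be sketched like this (the production averaging function is assumed; Array.average stands in for it, and NonEmptyArray is FsCheck's built-in wrapper):

```fsharp
open FsCheck

// Property sketch: however the prices are drawn,
// min <= average <= max must hold.
let ``average lies between min and max`` (NonEmptyArray (prices: decimal[])) =
    let avg = Array.average prices
    Array.min prices <= avg && avg <= Array.max prices

// Check.Quick ``average lies between min and max`` would exercise it
// against generated inputs.
```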
Active patterns as lightweight generators
FsCheck generators can be verbose for constrained domain types. A lighter alternative is to use active patterns β see Active patterns as lightweight FsCheck generators in Tips & Tricks.
Compare this with generating a valid Book, which requires a custom generator composing multiple fields – a valid ISBN (with checksum), a list of authors (each with a valid OLID), tags, etc.
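One building block of such a composed generator can be sketched as follows (the structure is assumed, not the repository's actual generator): a valid ISBN-13 is 12 random digits plus a computed check digit.

```fsharp
open FsCheck

// Sketch: ISBN-13 checksum uses alternating weights 1 and 3;
// the check digit completes the weighted sum to a multiple of 10.
let isbn13Gen : Gen<string> =
    gen {
        let! digits = Gen.listOfLength 12 (Gen.choose (0, 9))
        let weighted =
            digits |> List.mapi (fun i d -> if i % 2 = 0 then d else 3 * d) |> List.sum
        let check = (10 - weighted % 10) % 10
        return digits @ [ check ] |> List.map string |> String.concat ""
    }
```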
Validation Testing
Generating invalid domain values is even harder than generating valid ones. The AddProductShould test class tackles this with a FieldIssueType discriminated union that models each possible validation error, and a FieldIssue record that pairs the expected error with a function to inject the issue into a valid product.
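The shapes can be sketched like this – the names come from the prose, but the fields and the placeholder Product type are assumptions for illustration:

```fsharp
// Placeholder Product so the sketch compiles; the real type lives in the domain.
type Product = { Title: string; Description: string }

// One DU case per possible validation error (hypothetical shape).
type FieldIssueType =
    | NullOrWhitespace of field: string
    | TooLong of field: string

// Pairs the expected error with a function that corrupts one field.
type FieldIssue =
    { ExpectedError : string
      Inject        : Product -> Product }

// Example: injecting a whitespace Title into an otherwise valid product.
let titleIssue =
    { ExpectedError = "Title is null or whitespace"
      Inject = fun p -> { p with Title = "  " } }
```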
FsCheck generates a NonEmptySet<FieldIssueType> – an arbitrary combination of issues – and the test folds them onto a valid product to produce an invalid one, then verifies that all expected errors are reported.
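The fold-and-verify step can be sketched as follows (NonEmptySet comes from FsCheck; Extensions.toFieldIssue, Examples.validProduct, and the fixture API are assumed names):

```fsharp
// Hypothetical property: every injected issue must surface in the reported errors.
let ``report all injected issues`` (issues: NonEmptySet<FieldIssueType>) =
    let fieldIssues = issues.Get |> Seq.map Extensions.toFieldIssue |> Seq.toList
    // Fold all issues onto a valid product to produce an invalid one.
    let invalidProduct =
        (Examples.validProduct, fieldIssues)
        ||> List.fold (fun product issue -> issue.Inject product)
    match fixture.Api.AddProduct invalidProduct with
    | Error reported ->
        fieldIssues
        |> List.forall (fun issue -> reported |> List.contains issue.ExpectedError)
    | Ok _ -> false
```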
This approach requires substantial scaffolding – the NullOrWhitespace and TooLong wrapper types, the Extensions module converting each issue type to a FieldIssue with the right field name, max length, and product updater – but it ensures that the validation { } applicative CE correctly collects all errors rather than stopping at the first one.
Saga Tests
Saga and cancellation scenarios are tested in the Shopfoo.Program.Tests project using a dedicated Order domain. These tests are documented in the Workflows page.
Going Further: Mutation Testing
Property-based testing assesses test quality – do the tests verify meaningful properties? Another complementary technique is mutation testing: it introduces small changes (mutations) into the production code and checks whether the test suite detects them. While code coverage is a quantitative metric (how much code is exercised), mutation testing is qualitative (how well do the tests actually catch regressions).
Stryker is the most popular mutation testing framework for .NET. In practice, mutation testing is more commonly used in C# codebases, whereas property-based testing is more prevalent in F#. The two techniques are not mutually exclusive β combining them provides stronger confidence in test quality when the domain warrants it.
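A minimal way to try it, assuming the .NET SDK is installed:

```shell
# Install Stryker.NET as a global tool, then run it from the test project folder;
# it mutates the production code and reports which mutants survived the test suite.
dotnet tool install --global dotnet-stryker
dotnet stryker
```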
Key Takeaways
Test through the API boundary: workflows are tested via IProductApi, not by calling programs directly. This validates the full wiring including instruction preparation and undo strategies.
In-memory repositories: data-layer implementations are replaced with simple in-memory stores, keeping tests fast and deterministic.
External clients are mocked: only HTTP-based external dependencies (IFakeStoreClient, IOpenLibraryClient) use NSubstitute – everything else is a real implementation with in-memory storage.
Property-based testing for domain rules: FsCheck validates mathematical properties and validation completeness, complementing example-based tests.