[AoNW] The Test Layer: Keeping a 4X Game Honest

Testing a 4X game is strange.

A lot of the game is visible: the map, the units, the HUD, the fog of war, the city panels, the turn button. But the most important parts are not visual at all. They are rules, constraints, transitions, contracts, and boundaries.

Can this unit enter that tile?
Does fog remember discovered territory?
Does ending a turn process city production before research?
Can the server safely reject a stale command?
Can the presentation layer accidentally depend on infrastructure?
Can a save snapshot survive serialization and migration?

Those questions are where the game can quietly break.

So in Age of New Worlds, I do not think about tests as a layer added after the game. I think about them as a layer that keeps the architecture honest.

The goal is not to test every pixel. The goal is to make sure the important decisions of the game are explicit, repeatable, and protected from accidental drift.

The Shape of the Test Suite

The tests are split across the same boundaries as the project itself.

test/
├── architecture/
├── game/
├── map/
├── api/
├── editor/
├── shared/
└── l10n/

packages/aonw_core/test/
├── game/
├── map/
├── ai/
├── protocol/
└── domain/

server/test/
├── architecture/
├── auth/
├── domain/
├── http/
├── integration/
├── matchmaking/
├── persistence/
└── websocket/

That structure matters.

The Flutter app has tests for UI, presentation, local infrastructure, map rendering, editor workflows, command transport, and game-specific view models.

The shared core package has tests for rules, commands, events, AI, protocol objects, persistent game state, combat, research, city logic, and map validation.

The server has tests for authorization, projections, event filtering, persistence, routes, WebSockets, matchmaking, and full match flows.

This mirrors the architecture:

flowchart TB
    Core["aonw_core tests. shared rules and protocol"]
    App["Flutter app tests. presentation, local infra, renderer"]
    Server["server tests. authority, projection, persistence"]
    Architecture["architecture tests. boundaries and constraints"]

    Core --> App
    Core --> Server
    Architecture --> Core
    Architecture --> App
    Architecture --> Server

The point is not only coverage. The point is that every layer is tested at the level where it is supposed to make decisions.

Testing Domain Rules Without Rendering

The most valuable tests are usually the least visual ones.

Movement is a good example. A player experiences movement through the map: clicking a unit, seeing reachable tiles, choosing a target, watching animation. But the rule itself is not a Flutter rule and not a Flame rule.

The movement cost rules can be tested directly:

test('uses base terrain cost for simple land tiles', () {
  expect(
    UnitMovementCostRules.costToEnter(
      const TileTerrainProfile(base: TerrainType.grassland),
    ),
    const MovementCost.passable(1),
  );

  expect(
    UnitMovementCostRules.costToEnter(
      const TileTerrainProfile(base: TerrainType.snow),
    ),
    const MovementCost.passable(3),
  );
});

That test does not need a widget tree. It does not need a renderer. It does not need a running game screen.

It only needs the rule.

This is the kind of testing I want as much as possible: small, direct, and close to the decision being made.

A 4X game has many rules that fit this style:

movement costs,
fog reveal,
city growth,
city founding,
production,
research,
combat,
territory expansion,
worker improvements,
technology unlocks,
turn processing.

If these rules are buried inside UI callbacks, they become hard to test. If they live in domain services and reducers, they become ordinary Dart code.

That is one of the reasons I care so much about architecture.
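To make "ordinary Dart code" concrete, here is a minimal sketch of what a movement cost rule could look like once it lives in the domain. The grassland and snow costs come from the test above; the ocean terrain, the impassable case, and the shape of MovementCost are assumptions for illustration, not the project's actual implementation.

```dart
// Hypothetical sketch: names mirror the test above, but the ocean value and
// the impassable constructor are assumptions.
enum TerrainType { grassland, snow, ocean }

class MovementCost {
  const MovementCost.passable(int cost) : points = cost;
  const MovementCost.impassable() : points = null;

  /// Movement points needed to enter; null means the tile cannot be entered.
  final int? points;

  bool get isPassable => points != null;

  @override
  bool operator ==(Object other) =>
      other is MovementCost && other.points == points;

  @override
  int get hashCode => points.hashCode;
}

/// Pure rule: terrain in, cost out. No widgets, no renderer, no game screen.
MovementCost costToEnter(TerrainType base) {
  switch (base) {
    case TerrainType.grassland:
      return const MovementCost.passable(1);
    case TerrainType.snow:
      return const MovementCost.passable(3);
    case TerrainType.ocean:
      return const MovementCost.impassable();
  }
}
```

Because the function takes a value and returns a value, a test only has to call it and compare the result.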

Fog of War Tests

Fog of war is a perfect example of a system that must be tested carefully because visual inspection is not enough.

The renderer can show a dark overlay, but that does not prove the rules are correct. The important questions are deeper:

Is visibility tracked separately per player?
Does discovered memory remain after a unit moves away?
Do cities reveal their center and controlled territory?
Does height affect vision range?
Are off-map units handled safely?
Are hidden tiles still hidden from the right player?

A simplified test looks like this:

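test('preserves discovered memory after a unit moves away', () {
  // `map` is a map fixture defined elsewhere in the test file.
  // First pass: start from empty fog with the commander at its default tile.
  final initial = const FogOfWarService().recompute(
    current: FogOfWarState.empty,
    mapData: map,
    playerIds: const ['player_1'],
    units: [GameUnit.startingCommander(ownerPlayerId: 'player_1')],
    cities: const [],
  );

  // Second pass: feed the previous fog back in, with the commander at (4, 4).
  final moved = const FogOfWarService().recompute(
    current: initial,
    mapData: map,
    playerIds: const ['player_1'],
    units: [
      GameUnit.startingCommander(
        ownerPlayerId: 'player_1',
        col: 4,
        row: 4,
      ),
    ],
    cities: const [],
  );

  // (0, 0) was seen in the first pass and is out of view in the second,
  // so it should be remembered as discovered rather than reset to hidden.
  expect(
    moved.visibilityFor('player_1', const HexCoordinate(col: 0, row: 0)),
    FogVisibility.discovered,
  );
});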

That test protects an important design decision: fog is not just visible or hidden. It has memory.

A player can remember discovered terrain even when it is no longer currently visible. That distinction affects rendering, inspection, multiplayer projection, enemy visibility, and AI knowledge.
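The memory rule itself can be sketched in a few lines. This is a simplified illustration, assuming three visibility states and string tile keys; only the FogVisibility name and its discovered value appear in the test above, and mergeVisibility and recomputeFog are hypothetical helpers, not the real FogOfWarService.

```dart
// Hypothetical sketch: hidden and visible are assumed states, and tiles are
// plain string keys to keep the example short.
enum FogVisibility { hidden, discovered, visible }

/// Merges a tile's previous fog state with whether it is in view right now.
FogVisibility mergeVisibility(
  FogVisibility previous, {
  required bool inViewNow,
}) {
  if (inViewNow) return FogVisibility.visible;
  // Never seen before: stays hidden.
  if (previous == FogVisibility.hidden) return FogVisibility.hidden;
  // Seen before but not in view now: keep the memory.
  return FogVisibility.discovered;
}

/// Recomputes fog for one player from the previous fog and the current view.
Map<String, FogVisibility> recomputeFog(
  Map<String, FogVisibility> previous,
  Set<String> tilesInView,
) {
  final allTiles = {...previous.keys, ...tilesInView};
  return {
    for (final tile in allTiles)
      tile: mergeVisibility(
        previous[tile] ?? FogVisibility.hidden,
        inViewNow: tilesInView.contains(tile),
      ),
  };
}
```

Calling recomputeFog twice, first with the starting tiles in view and then with a distant tile in view, reproduces the situation in the test: the starting tiles end up discovered, not hidden.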

This is exactly the kind of rule I do not want to validate by clicking around manually.

Testing Commands and Transitions

The game is command-driven, so command handling is one of the most important areas to test.

A command should be processed through the same path whether it comes from the HUD, the renderer, a local save, an AI player, or eventually the server.

That is why I test not only reducers, but also command transport contracts.

A local command dispatch has several responsibilities:

apply the command,
produce the next state,
append a logged command,
save the updated snapshot,
assign an event log offset,
use the clock port instead of wall-clock time.

In a test, that can be expressed with in-memory adapters:

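// `transport`, `save`, `commander`, `eventLog`, and `repository` are set up
// earlier in the test and backed by in-memory adapters.
final result = await transport.dispatch(
  saveId: save.id,
  currentState: GameState(
    units: [commander],
    activePlayerId: 'player_1',
    activePlayerCanAct: true,
  ),
  command: MoveUnitCommand(commander.id, 1, 0),
  context: const GameCommandContext(actorPlayerId: 'player_1'),
);

// The dispatch was assigned an event log offset.
expect(result.offset, 1);
// The returned state reflects the move.
expect(result.state.units.single.col, 1);
// Exactly one command was logged, attributed to the acting player.
expect(eventLog.commands.single.actorPlayerId, 'player_1');
// The saved snapshot matches the new state, not the old one.
expect(repository.snapshot.units.single.col, 1);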

This test is not about movement alone. It is about the application contract around movement.

That is a different level of confidence.

The reducer can be correct, but the transport can still forget to append to the event log. The state can update, but the snapshot can be stale. The save can work, but the timestamp can be nondeterministic.

Testing the contract catches failures at those seams between layers.
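The in-memory adapters behind a test like this can be very small. The following is a sketch under assumptions: Clock, FixedClock, LoggedCommand, and InMemoryEventLog are hypothetical names, and the real ports in the project may look different. What it shows is the idea of a clock port with a fixed test implementation and an event log that assigns offsets in append order.

```dart
// Hypothetical port and adapters; names are illustrative only.
abstract class Clock {
  DateTime now();
}

/// Test adapter: always returns the same instant, so timestamps are stable.
class FixedClock implements Clock {
  FixedClock(this.instant);

  final DateTime instant;

  @override
  DateTime now() => instant;
}

class LoggedCommand {
  LoggedCommand(this.offset, this.actorPlayerId, this.name, this.at);

  final int offset;
  final String actorPlayerId;
  final String name;
  final DateTime at;
}

/// In-memory event log that assigns 1-based offsets in append order and
/// stamps each entry using the injected clock instead of DateTime.now().
class InMemoryEventLog {
  InMemoryEventLog(this._clock);

  final Clock _clock;
  final List<LoggedCommand> commands = [];

  LoggedCommand append(String actorPlayerId, String name) {
    final entry = LoggedCommand(
      commands.length + 1,
      actorPlayerId,
      name,
      _clock.now(),
    );
    commands.add(entry);
    return entry;
  }
}
```

With a fixed clock, two runs of the same test produce the same logged timestamps, which is what makes assertions on the log repeatable.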

Architecture Tests as Safety Rails

Some tests in the project do not test gameplay at all. They test architecture.

For example, the domain layer should not depend on Flutter, Flame, Riverpod, path_provider, shared_preferences, infrastructure, or presentation.

That is enforced with tests that scan imports:

test('game domain does not depend on outer game layers or UI frameworks', () {
  expect(
    _violations(
      roots: const ['lib/game/domain'],
      disallowed: const [
        _ImportRule.frameworks,
        _ImportRule.gameApplication,
        _ImportRule.gameInfrastructure,
        _ImportRule.gamePresentation,
      ],
    ),
    isEmpty,
  );
});

This kind of test may look unusual at first, but it is very useful.
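The _violations and _ImportRule helpers are not shown in this post. As a rough idea of how an import scan can work, here is a minimal sketch with hypothetical names (disallowedImports, scanDirectory) that matches import lines with a regular expression and compares each URI against a list of disallowed prefixes.

```dart
import 'dart:io';

// Hypothetical helpers; one possible way to scan imports, not the project's.
final _importPattern = RegExp(
  r'''^\s*import\s+['"]([^'"]+)['"]''',
  multiLine: true,
);

/// Returns the import URIs in [source] that start with a disallowed prefix.
List<String> disallowedImports(
  String source,
  List<String> disallowedPrefixes,
) {
  final found = <String>[];
  for (final match in _importPattern.allMatches(source)) {
    final uri = match.group(1)!;
    if (disallowedPrefixes.any((prefix) => uri.startsWith(prefix))) {
      found.add(uri);
    }
  }
  return found;
}

/// Scans every .dart file under [root] and reports "path: uri" violations.
List<String> scanDirectory(String root, List<String> disallowedPrefixes) {
  final violations = <String>[];
  final files = Directory(root)
      .listSync(recursive: true)
      .whereType<File>()
      .where((file) => file.path.endsWith('.dart'));
  for (final file in files) {
    final source = file.readAsStringSync();
    for (final uri in disallowedImports(source, disallowedPrefixes)) {
      violations.add('${file.path}: $uri');
    }
  }
  return violations;
}
```

A test can then assert that scanDirectory('lib/game/domain', ['package:flutter/', 'package:flame/']) is empty. This sketch only looks at import directives with absolute URIs; exports and relative imports would need extra handling.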

Architecture usually decays slowly. Nobody wakes up and decides to ruin the boundaries. It happens through small shortcuts:

one UI import in the domain,
one infrastructure call from presentation,
one direct DateTime.now() in a reducer,
one debugPrint in game logic,
one raw modal bypassing the shared UI system.

The test suite catches those shortcuts before they become normal.

There are also tests for UI architecture rules, such as avoiding raw modals, raw box decorations, excessive callback drilling, duplicate selection widgets, missing maxLines, or raw paint usage in action palette code.

That sounds strict, but it protects a real goal: the codebase should stay coherent as it grows.

Testing the UI System

The UI layer has its own tests because the interface has its own architecture.

The shared UI system includes things like:

HudPalette,
GameUiTheme,
SurfaceElevation,
SurfaceShape,
BorderEmphasis,
ChipTone,
EpicButton,
EpicCardSurface,
GameModalScaffold.

These are not game rules, but they still matter. They keep the interface consistent.

A palette test is not about gameplay. It is about preventing accidental visual drift. A modal scaffold test is not about empire building. It is about ensuring every modal uses the same structure and behavior.

This is especially useful because Flutter makes one-off UI very easy. Without tests and shared components, the project can slowly accumulate several almost-identical visual systems.

The test layer helps prevent that.

Rendering Tests and Smoke Tests

I do not want every renderer behavior to be tested through screenshots. That would make the suite slow and brittle.

But rendering still needs coverage.

The project has tests for map rendering pieces, tile painters, marker layers, fog overlays, movement preview layers, city territory overlays, unit markers, particle layers, floating text, camera behavior, and renderer smoke tests.

These tests usually focus on specific contracts:

a layer accepts a view model,
a marker can be created,
a painter uses the expected colors,
a preview style maps reachable and unreachable states correctly,
a renderer can build without crashing,
a component updates when state changes.

The goal is not to assert every pixel. The goal is to catch broken assumptions.

Rendering is where many systems meet: state, assets, geometry, camera, visibility, and UI feedback. A thin layer of targeted tests keeps that integration from becoming mysterious.

Server Tests

The server has its own test layer because multiplayer changes the meaning of correctness.

In hotseat mode, a local command mostly needs to be valid and saved. In multiplayer, the server also has to answer questions like:

is this player allowed to act?
is this command stale?
has this tick already been processed?
should this event be visible to this player?
what should be removed from a projected snapshot?
can the command be replayed deterministically?
does the HTTP route return the right response?
does the WebSocket broadcaster send the right events?

That is why server tests cover command authorization, command reduction, snapshot projection, event visibility filtering, match routes, persistence, matchmaking, and full match flows.
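Two of those questions, "is this player allowed to act?" and "is this command stale?", can be sketched as a pure check. Everything here is hypothetical: MatchHead, IncomingCommand, CommandRejection, and the idea of comparing offsets are assumptions used for illustration, not the server's actual authorization code.

```dart
// Hypothetical types; the real server's authorization code is not shown here.
enum CommandRejection { notYourTurn, staleCommand }

class MatchHead {
  const MatchHead({required this.activePlayerId, required this.lastOffset});

  final String activePlayerId;

  /// Offset of the last command the server accepted.
  final int lastOffset;
}

class IncomingCommand {
  const IncomingCommand({
    required this.actorPlayerId,
    required this.basedOnOffset,
  });

  final String actorPlayerId;

  /// The last offset the client had seen when it sent this command.
  final int basedOnOffset;
}

/// Returns null when the command may be applied, otherwise the reason it
/// must be rejected.
CommandRejection? authorize(MatchHead head, IncomingCommand command) {
  if (command.actorPlayerId != head.activePlayerId) {
    return CommandRejection.notYourTurn;
  }
  if (command.basedOnOffset != head.lastOffset) {
    return CommandRejection.staleCommand;
  }
  return null;
}
```

Because the check is a plain function, the server tests can cover each rejection reason without starting an HTTP route or a WebSocket.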

The server is not just a transport detail. It is the authority for multiplayer.

So it gets tests at that level.

Shared Core Tests

The shared aonw_core package is tested separately because it is used by both the Flutter client and the Dart server.

That package contains common language: commands, events, protocol objects, persistent state, rules, AI helpers, map definitions, and game primitives.

Testing it independently gives me confidence that the client and server are not secretly speaking different dialects.

This is especially important for serialization:

command -> JSON -> command
event   -> JSON -> event
state   -> JSON -> state

If commands and events are part of the multiplayer protocol, their tests are not optional. They are compatibility tests.
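A round-trip check of that kind can be sketched with dart:convert alone. The MoveUnit class below is a hypothetical stand-in, loosely modelled on the MoveUnitCommand used earlier; its field names and the 'move_unit' type tag are assumptions.

```dart
import 'dart:convert';

// Hypothetical command shape; field names and the type tag are assumptions.
class MoveUnit {
  const MoveUnit({required this.unitId, required this.col, required this.row});

  factory MoveUnit.fromJson(Map<String, dynamic> json) => MoveUnit(
        unitId: json['unitId'] as String,
        col: json['col'] as int,
        row: json['row'] as int,
      );

  final String unitId;
  final int col;
  final int row;

  Map<String, dynamic> toJson() =>
      {'type': 'move_unit', 'unitId': unitId, 'col': col, 'row': row};

  // Value equality, so a decoded copy can be compared with the original.
  @override
  bool operator ==(Object other) =>
      other is MoveUnit &&
      other.unitId == unitId &&
      other.col == col &&
      other.row == row;

  @override
  int get hashCode => Object.hash(unitId, col, row);
}

/// Encodes to a JSON string and decodes again, as the wire would.
MoveUnit roundTrip(MoveUnit command) {
  final wire = jsonEncode(command.toJson());
  return MoveUnit.fromJson(jsonDecode(wire) as Map<String, dynamic>);
}
```

Going through an actual JSON string, rather than only through the map, is deliberate: it catches values that fit in a Dart map but do not survive encoding.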

Why I Avoid Testing Everything Through the UI

A common trap in application testing is to test too much through the top.

For a strategy game, that would mean opening the game screen, tapping tiles, pressing buttons, waiting for animations, and checking the visible result.

Some tests like that are useful. But if every rule is tested through the UI, the suite becomes slow, fragile, and hard to diagnose.

If a UI test fails after pressing “End Turn”, what broke?

city production?
research?
fog recomputation?
turn advancement?
selected unit refresh?
Riverpod state?
widget layout?
animation timing?

It is much better to test the turn pipeline directly, then have a smaller number of UI tests that prove the button is wired correctly.
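Testing the pipeline directly can look like the following sketch. The phase names follow the list above, but the order, the TurnState shape, and the trace field are assumptions made for the example; the game's real turn processing is not shown here.

```dart
// Hypothetical pipeline; the phase order shown is an assumption.
typedef TurnPhase = TurnState Function(TurnState state);

class TurnState {
  const TurnState({this.turn = 1, this.trace = const []});

  final int turn;

  /// Names of the phases applied so far, kept only so tests can assert order.
  final List<String> trace;

  TurnState ran(String phase) =>
      TurnState(turn: turn, trace: [...trace, phase]);
}

TurnState cityProduction(TurnState s) => s.ran('cityProduction');
TurnState research(TurnState s) => s.ran('research');
TurnState fogRecompute(TurnState s) => s.ran('fogRecompute');
TurnState advanceTurn(TurnState s) =>
    TurnState(turn: s.turn + 1, trace: [...s.trace, 'advanceTurn']);

const List<TurnPhase> endTurnPipeline = [
  cityProduction,
  research,
  fogRecompute,
  advanceTurn,
];

/// Runs every phase in order, feeding each result into the next phase.
TurnState endTurn(TurnState state) =>
    endTurnPipeline.fold(state, (current, phase) => phase(current));
```

A test on endTurn can assert both the final state and the order of phases, so a failure points at a specific step instead of at the whole screen.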

That gives me a layered testing strategy:

flowchart TB
    Few["Few broad UI / integration tests"]
    Some["Some presentation and transport tests"]
    Many["Many domain and rule tests"]
    Guards["Architecture guard tests"]

    Few --> Some
    Some --> Many
    Guards --> Many
    Guards --> Some

The deeper the rule, the closer I want the test to be to that rule.

Tests as Design Pressure

Tests are useful for more than catching regressions. They also push the design in the right direction.

If a rule is hard to test, maybe it is in the wrong place.

If a reducer needs Flutter to run, something is wrong.
If fog of war needs a renderer to be tested, something is wrong.
If movement needs a widget tree, something is wrong.
If server projection needs the client UI, something is wrong.

Good tests make bad boundaries uncomfortable.

That is one of the reasons I like writing tests for a project like this. The tests do not just verify the architecture. They shape it.

What I Still Want to Improve

The test layer is already broad, but there are areas I want to keep improving.

I want more scenario-style tests for complete turn flows: city growth, research, production, movement, fog, and events processed together.

I want more multiplayer replay tests, especially around command ordering and projections.

I want better asset validation tests so missing icons, sprites, or map files are caught early.

I want more AI simulation tests that run longer games and report balance problems, not just correctness failures.

I also want to keep UI tests focused. Flutter widget tests are useful, but I do not want the project to become dependent on fragile visual assertions for rules that belong in the domain.

The Lesson

For Age of New Worlds, the test layer is not one thing.

It is a set of protections around different kinds of truth:

domain tests protect rules,
reducer tests protect state transitions,
contract tests protect application ports,
infrastructure tests protect persistence and transport,
UI tests protect presentation behavior,
rendering tests protect visual integration,
architecture tests protect boundaries,
server tests protect multiplayer authority,
core tests protect shared language.

A 4X game has too many interconnected systems to rely on manual testing alone. Every new mechanic touches something else: movement affects fog, fog affects projection, projection affects multiplayer, multiplayer affects command validation, command validation affects UI feedback.

The test layer is how I keep that complexity from becoming guesswork.

It lets me change the game with more confidence. It lets me refactor without losing the rules. It lets the architecture stay visible.

And for a project like this, that matters as much as any single feature.

