[AoNW] Refactoring Multiplayer to Serverpod

The first multiplayer implementation in Age of New Worlds was not a mistake.

It was a proof of concept.

I needed to learn where the hard parts really were: command dispatch, reconnects, snapshots, event offsets, fog of war, match lobbies, AI turns, and the strange little edge cases that only appear when a turn-based game is running across two devices instead of one.

That version used a custom Dart server, HTTP routes, WebSockets, JWT handling, PostgreSQL repositories, and a lot of glue code.

It worked well enough to answer the most important question:

Can this game have multiplayer at all?

The answer was yes.

But once the concept was proven, the shape of the problem changed.

Now I do not need “a server that works”. I need a server architecture that is boring in the right places, strict in the important places, and easy to evolve without leaving a trail of networking code across the Flutter app.

That is why I started moving multiplayer to Serverpod.

The New Shape

```mermaid
flowchart TB
    subgraph Flutter["Flutter / Flame Client"]
        UI["Lobby + HUD"]
        Riverpod["Riverpod providers"]
        Session["Session store"]
        Live["Live match subscription"]
        Renderer["Flame renderer"]
    end

    subgraph Core["aonw_core"]
        Commands["GameCommand"]
        Events["GameEvent"]
        Rules["Rules + reducers"]
        Protocol["Wire DTOs"]
        Save["Save + snapshot models"]
    end

    subgraph Serverpod["Serverpod Server"]
        Endpoints["Generated endpoints"]
        Streams["Two-way match streams"]
        Auth["Auth Core + custom account"]
        ORM["ORM models + migrations"]
        Reducer["Authoritative reducer"]
    end

    subgraph Data["PostgreSQL"]
        Matches["game_match"]
        Players["game_player"]
        EventsTable["game_event"]
        Snapshots["game_snapshot"]
        Accounts["account + auth user"]
    end

    UI --> Riverpod
    Riverpod --> Session
    Riverpod --> Live
    Live --> Streams
    Endpoints --> Auth
    Streams --> Auth
    Streams --> Reducer
    Reducer --> Core
    Endpoints --> ORM
    Streams --> ORM
    ORM --> Data
    Streams -->|"snapshots, events, ACKs"| Live
    Live --> Renderer
```

The important split is simple:

  • Flutter owns presentation, input, local UX state, and rendering.
  • aonw_core owns deterministic game rules and protocol DTOs.
  • Serverpod owns transport, sessions, auth, ORM, migrations, health, and realtime streams.
  • PostgreSQL owns the authoritative match timeline.

This gives the project a cleaner boundary than the old implementation had. The old server was full of useful ideas, but it also owned too much, all of it by hand: HTTP routing, JWT refresh, WebSocket lifecycle, SQL repositories, persistence shape, and operational glue.

Serverpod lets me delete a lot of that.
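To make the aonw_core side of the split above concrete, here is a minimal sketch of what a deterministic reducer can look like in Dart 3. Every name in it (MatchState, GameCommand, SubmitTurn, ReduceResult, reduce) is hypothetical and not taken from the real package. What it illustrates is the property that matters: the same state plus the same command always gives the same result, so the server can run it authoritatively and the client can run the same code for previews.

```dart
// Hypothetical sketch: none of these names come from the real aonw_core package.

/// Immutable match state; a real game would also carry map, units, fog, etc.
class MatchState {
  const MatchState({
    required this.turn,
    required this.activePlayerId,
    required this.playerIds,
  });

  final int turn;
  final String activePlayerId;
  final List<String> playerIds;
}

/// Commands are plain data. The sealed hierarchy lets `switch` prove that
/// every command type is handled.
sealed class GameCommand {
  const GameCommand(this.actorPlayerId);
  final String actorPlayerId;
}

class SubmitTurn extends GameCommand {
  const SubmitTurn(super.actorPlayerId);
}

/// Either a new state (accepted) or a reason (rejected), never both.
class ReduceResult {
  const ReduceResult.accepted(MatchState this.state) : reason = null;
  const ReduceResult.rejected(String this.reason) : state = null;

  final MatchState? state;
  final String? reason;

  bool get accepted => state != null;
}

/// Pure and deterministic: no clock, no I/O, no unseeded randomness.
ReduceResult reduce(MatchState state, GameCommand command) => switch (command) {
      SubmitTurn() => _submitTurn(state, command.actorPlayerId),
    };

ReduceResult _submitTurn(MatchState state, String actorPlayerId) {
  if (actorPlayerId != state.activePlayerId) {
    return const ReduceResult.rejected('not your turn');
  }
  final index = state.playerIds.indexOf(actorPlayerId);
  final next = state.playerIds[(index + 1) % state.playerIds.length];
  // A new round starts when play wraps back to the first player.
  final turn = next == state.playerIds.first ? state.turn + 1 : state.turn;
  return ReduceResult.accepted(
    MatchState(turn: turn, activePlayerId: next, playerIds: state.playerIds),
  );
}
```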

Why Serverpod Fits This Game

A 4X multiplayer game is not a chat app with a map.

A player can switch tabs, lock a phone, lose network, come back later, and still expect to see the same war, the same fog of war, the same turn, and the same pending decisions as everyone else.

That means realtime is not enough.

The architecture needs a durable answer to this question:

“What is the current authoritative match state, and which committed events led here?”

Serverpod streams are a good fit because they give me typed two-way communication without hand-rolling the WebSocket lifecycle. The client can send commands through the same live channel that receives snapshots, events, and ACKs.

A simplified endpoint shape looks like this:

```dart
Stream<MultiplayerServerMessage> connect(
  Session session,
  String matchId,
  int afterOffset,
  Stream<MultiplayerClientMessage> input,
) {
  final user = _requireUser(session);
  return _hub.connect(
    store: _store(session),
    userIdentifier: user.userIdentifier,
    matchId: matchId,
    afterOffset: afterOffset,
    input: input,
  );
}
```

That one method carries the actual game loop:

  • client connects with the last known offset,
  • server authenticates the session,
  • server sends the latest snapshot,
  • client sends commands,
  • server returns command ACKs,
  • other clients receive committed updates,
  • reconnect becomes a normal path, not an emergency patch.
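The ordering in that loop can be imitated in plain Dart with an `async*` generator: a stream of client messages goes in, and a stream of server messages comes out, which matches the shape of the endpoint above. This is only a sketch with hypothetical names (ClientMessage, ServerMessage, connectSketch). Authentication, validation, persistence and fan-out to other clients are deliberately left out, and every command is simply treated as accepted.

```dart
/// Hypothetical message kinds; not the generated Serverpod protocol classes.
enum ServerMessageKind { snapshot, ack }

class ClientMessage {
  const ClientMessage(this.clientMessageId);
  final String clientMessageId;
}

class ServerMessage {
  const ServerMessage(this.kind, this.offset, [this.clientMessageId]);
  final ServerMessageKind kind;
  final int offset;
  final String? clientMessageId;
}

/// In-memory stand-in for the connect() loop: snapshot first, then one ACK
/// per incoming command, all on the same stream.
Stream<ServerMessage> connectSketch(
  int latestOffset,
  Stream<ClientMessage> input,
) async* {
  var offset = latestOffset;
  yield ServerMessage(ServerMessageKind.snapshot, offset);
  await for (final message in input) {
    offset += 1; // pretend every command is accepted and committed
    yield ServerMessage(ServerMessageKind.ack, offset, message.clientMessageId);
  }
}
```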

The Reconnect Contract

```mermaid
sequenceDiagram
    participant A as Player A
    participant B as Player B
    participant S as Serverpod stream
    participant DB as PostgreSQL

    B->>S: connect(afterOffset: 0)
    S->>DB: load latest snapshot
    S-->>B: snapshot offset 0
    B--xS: app background / tab sleep

    A->>S: SubmitTurn command
    S->>DB: transaction: event + snapshot offset 1
    S-->>A: ACK offset 1 + snapshot
    S-->>A: live update offset 1

    B->>S: reconnect(afterOffset: 0)
    S->>DB: load latest snapshot
    S-->>B: snapshot offset 1
    Note over S,B: no duplicate replay for events already inside snapshot
```

This is one of the core rules of the refactor.

A reconnecting client should not need to reconstruct the world by hoping it saw every live event. It receives the latest snapshot first. Then it only needs events newer than that snapshot.

In code, that means the server does this:

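```dart
// Always start with the latest snapshot (offset N).
emit(
  _message(
    matchId: matchId,
    offset: state.offset,
    match: state.match,
    snapshot: state.snapshot,
  ),
);

// Replay only events strictly newer than both the snapshot and the
// client's last seen offset: event N is already inside the snapshot.
final backlogAfterOffset = afterOffset > state.offset
    ? afterOffset
    : state.offset;

final backlog = await store.listEvents(matchId, backlogAfterOffset);
for (final event in backlog) {
  emit(_message(matchId: matchId, offset: event.offset, event: event));
}
```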

The small detail matters: after sending snapshot offset N, the server must not replay event N again as if the client still needed it.

That is the kind of bug that is easy to miss in manual testing and painful in a real game.

So now it has an integration smoke test.
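The rule that test protects can also be written as a small pure function, which makes it testable without a database. This sketch uses hypothetical names (CommittedEvent, ReconnectPlan, planReconnect) and mirrors the max(afterOffset, snapshot offset) cutoff in the server code above. The assumption is that a non-empty backlog only happens when events are committed between loading the snapshot and reading the event log.

```dart
/// A committed event in the match timeline (hypothetical shape).
class CommittedEvent {
  const CommittedEvent(this.offset, this.name);
  final int offset;
  final String name;
}

/// What the server sends on connect: one snapshot, then a backlog.
class ReconnectPlan {
  const ReconnectPlan(this.snapshotOffset, this.events);
  final int snapshotOffset;
  final List<CommittedEvent> events;
}

/// The snapshot already contains every event up to [snapshotOffset], so only
/// strictly newer events are replayed, and never anything the client says it
/// has already seen.
ReconnectPlan planReconnect({
  required int afterOffset,
  required int snapshotOffset,
  required List<CommittedEvent> eventLog,
}) {
  final cutoff = afterOffset > snapshotOffset ? afterOffset : snapshotOffset;
  final backlog = eventLog.where((e) => e.offset > cutoff).toList()
    ..sort((a, b) => a.offset.compareTo(b.offset));
  return ReconnectPlan(snapshotOffset, backlog);
}
```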

Riverpod Became More Interesting Too

Serverpod changed the server side.

Riverpod changed how cleanly the Flutter side can react to the new server model.

The multiplayer UI is not just “logged in” or “not logged in”. It has several overlapping facts:

  • Is there a stored refresh token?
  • Is the current session still valid?
  • Does the session belong to the same display name?
  • Is there an active match?
  • Is the stream connected, reconnecting, or closed?
  • Did the latest server snapshot replace local state?
  • Should the UI resume a match or ask for login?

Riverpod gives me a place to model that as application state instead of spreading it across widgets.
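In a Riverpod app each of those facts would usually be its own provider, and a derived provider would combine them with `ref.watch`. The combining logic itself can stay a plain function that is easy to test. A minimal sketch without the Riverpod wiring; MultiplayerFacts, StreamStatus, LobbyAction and decideLobbyAction are hypothetical names, and the priorities are one possible choice, not the game's actual rules.

```dart
/// State of the live match stream (hypothetical).
enum StreamStatus { none, connected, reconnecting, closed }

/// What the lobby should do next (hypothetical).
enum LobbyAction { askForLogin, refreshSession, showLobby, resumeMatch, showReconnectOverlay }

/// The overlapping facts from the list above, gathered into one value.
class MultiplayerFacts {
  const MultiplayerFacts({
    required this.hasStoredRefreshToken,
    required this.sessionValid,
    required this.sessionMatchesDisplayName,
    this.activeMatchId,
    this.streamStatus = StreamStatus.none,
  });

  final bool hasStoredRefreshToken;
  final bool sessionValid;
  final bool sessionMatchesDisplayName;
  final String? activeMatchId;
  final StreamStatus streamStatus;
}

/// Pure decision function that a derived provider could call.
LobbyAction decideLobbyAction(MultiplayerFacts facts) {
  final hasUsableSession = facts.sessionValid && facts.sessionMatchesDisplayName;
  if (!hasUsableSession) {
    return facts.hasStoredRefreshToken
        ? LobbyAction.refreshSession
        : LobbyAction.askForLogin;
  }
  if (facts.activeMatchId == null) return LobbyAction.showLobby;
  return switch (facts.streamStatus) {
    StreamStatus.reconnecting => LobbyAction.showReconnectOverlay,
    // Connected, closed or never opened: resuming (re)connects the stream.
    StreamStatus.none || StreamStatus.connected || StreamStatus.closed =>
      LobbyAction.resumeMatch,
  };
}
```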

The lobby can ask for a session:

```dart
Future<NetworkSession> ensureSession({required String displayName}) async {
  final current = currentSession();
  final stored = await loadStoredSession();

  // 1. Reuse the in-memory session if it is still valid for this display name.
  if (_canReuseCurrentSession(
    current: current,
    stored: stored,
    displayName: displayName,
    now: now(),
  )) {
    return current!;
  }

  // 2. Otherwise try to refresh using the stored refresh token.
  if (stored != null) {
    final refreshed = await _tryRefreshStoredSession(stored, now());
    if (refreshed != null) return refreshed;
  }

  // 3. Nothing usable: let the UI ask the player to sign in.
  throw const NetworkSignInRequiredException();
}
```

That exception is not a crash. It is a UI decision.

The lobby catches it and opens the account modal. After login or account creation, the original action resumes.

That is the part I like: authentication becomes part of the flow, not a separate screen that the rest of the game has to remember manually.
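The catch-and-continue flow can be sketched as a small generic helper. runWithSession is a hypothetical name, and the exception class below is only a stand-in with the same name as the app's type. The helper guards session acquisition only, so a failure inside the action itself is never mistaken for a missing sign-in.

```dart
/// Stand-in for the app's exception type of the same name.
class NetworkSignInRequiredException implements Exception {
  const NetworkSignInRequiredException();
}

/// Hypothetical helper: get a session; if none is usable, open the account
/// modal via [promptSignIn] and then continue with the same [action].
Future<T> runWithSession<S, T>({
  required Future<S> Function() ensureSession,
  required Future<bool> Function() promptSignIn,
  required Future<T> Function(S session) action,
}) async {
  S session;
  try {
    session = await ensureSession();
  } on NetworkSignInRequiredException {
    // Not a crash: ask the player to log in or create an account.
    final signedIn = await promptSignIn();
    if (!signedIn) rethrow; // player cancelled; the action does not run
    session = await ensureSession();
  }
  // Only session acquisition is guarded above.
  return action(session);
}
```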

Command ACKs Through The Live Channel

The old implementation had a split personality:

  • HTTP command route for writes,
  • WebSocket for notifications.

That was a perfectly reasonable first design. It made retries and ACKs explicit.

But Serverpod streams give me a better option: keep the command and the resulting ACK on the same typed channel.

On the client, the live subscription keeps a queue of pending ACKs:

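```dart
Future<WireCommandAck> sendCommand({
  required int afterOffset,
  required WireCommand wire,
  Duration timeout = const Duration(seconds: 10),
}) {
  final input = _input;
  if (_closed || input == null || input.isClosed) {
    // Report the failure through the returned Future instead of throwing
    // synchronously, so callers handle every failure the same way.
    return Future.error(
      TimeoutException('Live event stream is not ready for commands.'),
    );
  }

  // ACKs are matched to commands in FIFO order: this assumes the server
  // answers every command exactly once, in the order it was sent.
  final ack = Completer<WireCommandAck>();
  _pendingAcks.add(ack);

  input.add(
    sp.MultiplayerClientMessage(
      clientMessageId: _nextCommandClientMessageId(wire),
      lastSeenOffset: afterOffset,
      requestSnapshot: false,
      command: wire,
    ),
  );

  return ack.future.timeout(timeout);
}
```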

This is calmer than opening a second transport path for the same action.

A command goes into the stream. The server validates it, reduces it, persists it, broadcasts it, and replies with an ACK.
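The FIFO queue in sendCommand relies on ordering: if one command times out and its ACK arrives late, that ACK would complete the next waiting command. If the ACK message echoes the clientMessageId (an assumption; the snippet above does not show the ACK's fields), keying pending ACKs by that id removes the ordering dependency. A sketch with hypothetical names (CommandAck, PendingAcks):

```dart
import 'dart:async';

/// Hypothetical ACK shape; assumes the server echoes clientMessageId.
class CommandAck {
  const CommandAck({required this.clientMessageId, required this.accepted, required this.offset});
  final String clientMessageId;
  final bool accepted;
  final int offset;
}

class PendingAcks {
  final _pending = <String, Completer<CommandAck>>{};

  /// Registers a command before it is sent and returns its ACK future.
  Future<CommandAck> register(String clientMessageId, Duration timeout) {
    final completer = Completer<CommandAck>();
    _pending[clientMessageId] = completer;
    return completer.future.timeout(timeout, onTimeout: () {
      _pending.remove(clientMessageId); // a late ACK is then ignored
      throw TimeoutException('No ACK for $clientMessageId', timeout);
    });
  }

  /// Called for every ACK received on the live stream, in any order.
  void complete(CommandAck ack) {
    _pending.remove(ack.clientMessageId)?.complete(ack);
  }

  /// Fails every outstanding command, e.g. when the stream closes.
  void failAll(Object error) {
    for (final completer in _pending.values) {
      completer.completeError(error);
    }
    _pending.clear();
  }
}
```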

The Server Still Owns The Truth

```mermaid
flowchart TD
    Command["WireCommand"] --> Auth["Authenticate session"]
    Auth --> Member["Check match membership"]
    Member --> Actor["Check actorPlayerId"]
    Actor --> Lock["Per-match queue"]
    Lock --> Snapshot["Load latest snapshot"]
    Snapshot --> Reduce["Run authoritative reducer"]
    Reduce --> Accepted{"accepted?"}
    Accepted -- no --> Reject["ACK rejected + current snapshot"]
    Accepted -- yes --> Persist["Transaction: event + snapshot"]
    Persist --> Broadcast["Broadcast committed update"]
    Persist --> Ack["ACK accepted + snapshot"]
```
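The "Per-match queue" step keeps two commands for the same match from being reduced against the same snapshot at the same time. One way to express it in plain Dart is to chain futures per match id. PerMatchQueue is a hypothetical name, and the sketch assumes a single server process. With several server instances the guarantee would have to come from the database instead (for example a row lock on the match), which this sketch does not show.

```dart
/// Serializes work per match id: commands for the same match run one at a
/// time, in arrival order, while different matches proceed independently.
class PerMatchQueue {
  final _tails = <String, Future<void>>{};

  Future<T> run<T>(String matchId, Future<T> Function() task) {
    final previous = _tails[matchId] ?? Future<void>.value();
    final result = previous.then<T>((_) => task());
    // The chain must survive a failing task, so the tail swallows errors;
    // the caller still sees the error through [result].
    final tail = result.then<void>((_) {}, onError: (_) {});
    _tails[matchId] = tail;
    tail.whenComplete(() {
      // Drop idle matches so the map does not grow forever.
      if (identical(_tails[matchId], tail)) _tails.remove(matchId);
    });
    return result;
  }
}
```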

The client can be responsive, but it does not get to invent reality.

If a command changes game state, the server must accept it first. That means:

  • authenticated account,
  • participant in the match,
  • correct player actor,
  • valid turn,
  • valid command preconditions,
  • deterministic reducer result,
  • persisted event,
  • persisted snapshot offset.

Only then is the update real.

That is the most important multiplayer rule in the project.
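The order of that checklist can be sketched as a synchronous, in-memory pipeline. Everything here is hypothetical (MatchStore, Ack, CommandPipeline), the turn logic is a stand-in for the real reducer, and commit() only imitates what a PostgreSQL transaction would guarantee: the event and the new snapshot offset are written together, and nothing is broadcast or ACKed as accepted before that write.

```dart
import 'dart:async';

/// In-memory stand-in for the match tables. commit() writes the event and
/// advances the snapshot offset in one step, imitating a DB transaction.
class MatchStore {
  MatchStore(this.playerIdByUser, this.turnOrder, this.activePlayerId);

  final Map<String, String> playerIdByUser; // account -> player in this match
  final List<String> turnOrder;
  String activePlayerId;
  int offset = 0;
  final List<String> events = [];

  void commit(String event, String nextActivePlayerId) {
    events.add(event);
    activePlayerId = nextActivePlayerId;
    offset += 1;
  }
}

class Ack {
  const Ack.accepted(this.offset) : accepted = true, reason = null;
  const Ack.rejected(this.offset, String this.reason) : accepted = false;

  final bool accepted;
  final int offset;
  final String? reason;
}

class CommandPipeline {
  CommandPipeline(this.store);

  final MatchStore store;
  final _committed = StreamController<int>.broadcast();

  /// Offsets of committed updates, for every other connected client.
  Stream<int> get committedOffsets => _committed.stream;

  Ack handle({required String userId, required String actorPlayerId, required String command}) {
    // 1. Membership and actor checks run before any game rule.
    final playerId = store.playerIdByUser[userId];
    if (playerId == null) return Ack.rejected(store.offset, 'not a participant');
    if (actorPlayerId != playerId) return Ack.rejected(store.offset, 'wrong actor');

    // 2. Turn and command preconditions (stand-in for the real reducer).
    if (actorPlayerId != store.activePlayerId) {
      return Ack.rejected(store.offset, 'not your turn');
    }
    if (command != 'SubmitTurn') return Ack.rejected(store.offset, 'unknown command');
    final index = store.turnOrder.indexOf(actorPlayerId);
    final next = store.turnOrder[(index + 1) % store.turnOrder.length];

    // 3. Persist first: the update only becomes real once it is committed.
    store.commit('$actorPlayerId:$command', next);

    // 4. Then fan out to other clients and ACK the sender with the new offset.
    _committed.add(store.offset);
    return Ack.accepted(store.offset);
  }
}
```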

What Disappeared

This refactor is also a cleanup.

The new branch removes the old custom pieces that were useful during exploration but no longer belong in the final architecture:

  • custom HTTP client paths for multiplayer,
  • custom WebSocket client,
  • anonymous session store,
  • handwritten JWT service,
  • password hashing service,
  • REST match routes,
  • custom SQL repositories,
  • old WebSocket broadcaster,
  • tests for contracts that no longer exist.

That is not just deleting code for the sake of deleting code.

It removes alternate truths.

There should not be a REST version of match state, a WebSocket version of match state, and a Serverpod version of match state. There should be one multiplayer model.

The New Repository Map

```text
aonw/
├── lib/
│   ├── api/
│   │   ├── session/        # Serverpod auth/session adapters
│   │   ├── transport/      # live stream and repository adapters
│   │   └── protocol/       # app-side codecs only
│   └── game/
├── packages/
│   ├── aonw_core/          # deterministic game rules + shared DTOs
│   └── aonw_server_client/ # generated Serverpod client
└── server/
    ├── bin/main.dart
    ├── lib/src/
    │   ├── auth/
    │   ├── generated/
    │   └── multiplayer/
    ├── migrations/
    └── test/
```

I especially like that lib/api/protocol is smaller now.

The canonical wire DTOs live in aonw_core. The Flutter app only keeps codecs that adapt app save/domain objects into those shared wire envelopes.

That is the kind of boundary that feels small, but prevents future mess.
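Here is a sketch of that split, using hypothetical names throughout (WireCommand, MoveUnitIntent, MoveUnitCodec). The envelope type would live in aonw_core and be shared by client and server. The codec that knows about an app-side object would stay in lib/api/protocol, so neither the server nor the core package ever sees that object.

```dart
import 'dart:convert';

/// Shared wire envelope, as it might live in aonw_core (hypothetical).
class WireCommand {
  const WireCommand({required this.type, required this.actorPlayerId, required this.payload});

  final String type;
  final String actorPlayerId;
  final Map<String, Object?> payload;

  String encode() => jsonEncode({'type': type, 'actorPlayerId': actorPlayerId, 'payload': payload});

  static WireCommand decode(String source) {
    final json = jsonDecode(source) as Map<String, Object?>;
    return WireCommand(
      type: json['type'] as String,
      actorPlayerId: json['actorPlayerId'] as String,
      payload: (json['payload'] as Map).cast<String, Object?>(),
    );
  }
}

/// App-side domain object that only the Flutter app knows about (hypothetical).
class MoveUnitIntent {
  const MoveUnitIntent(this.playerId, this.unitId, this.toX, this.toY);
  final String playerId;
  final String unitId;
  final int toX;
  final int toY;
}

/// App-side codec: the only place that maps MoveUnitIntent onto the envelope.
abstract final class MoveUnitCodec {
  static WireCommand toWire(MoveUnitIntent intent) => WireCommand(
        type: 'moveUnit',
        actorPlayerId: intent.playerId,
        payload: {'unitId': intent.unitId, 'to': [intent.toX, intent.toY]},
      );

  static MoveUnitIntent fromWire(WireCommand wire) {
    if (wire.type != 'moveUnit') {
      throw FormatException('Unexpected command type: ${wire.type}');
    }
    final to = (wire.payload['to'] as List).cast<int>();
    return MoveUnitIntent(wire.actorPlayerId, wire.payload['unitId'] as String, to[0], to[1]);
  }
}
```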

Testing The Important Failure Mode

The integration test now checks the reconnect behavior directly:

```dart
test(
  'reconnects clients to the latest snapshot without duplicate replay',
  () async {
    final started = await _startTwoPlayerMatch(
      endpoints,
      ownerSession: ownerSession,
      guestSession: guestSession,
    );

    // Guest connects, receives snapshot offset 0, then disappears.
    await _connectUntilInitialSnapshot(
      endpoints.multiplayer.connect(
        guestSession,
        started.id,
        0,
        guestBeforeInput.stream,
      ),
      guestBeforeInput,
    );

    // Owner submits a command.
    final ack = await _submitTurnThroughStream(...);
    expect(ack.offset, 1);

    // Guest reconnects from its old offset.
    // The server emits backlog events after the snapshot, so this only
    // catches a duplicate replay if _connectUntilInitialSnapshot keeps
    // collecting for a short window after the snapshot arrives.
    final messages = await _connectUntilInitialSnapshot(
      endpoints.multiplayer.connect(
        guestSession,
        started.id,
        0,
        guestReconnectInput.stream,
      ),
      guestReconnectInput,
    );

    expect(messages.single.snapshot?.offset, 1);
    expect(messages.where((m) => m.event != null), isEmpty);
  },
);
```

This test is not about a button.

It is about the promise multiplayer makes to the player:

You can leave and come back. The game will still know where everyone is.

Operations Matter Earlier Than I Expected

With Serverpod, operational concerns become part of the architecture instead of a separate pile of scripts.

The plan now includes:

  • Serverpod health endpoints,
  • generated migrations,
  • Docker image build,
  • local Postgres integration smoke,
  • staging verification,
  • Serverpod Insights for logs and health metrics.

Serverpod Insights is useful here because multiplayer bugs are rarely just one stack trace. They are usually sequences:

  • account created,
  • login refreshed,
  • match joined,
  • stream opened,
  • command sent,
  • ACK returned,
  • client backgrounded,
  • stream closed,
  • reconnect happened,
  • snapshot offset compared.

Seeing that sequence in logs and health metrics is much better than guessing from the UI.
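One way to keep such sequences easy to follow is to log each step as a single JSON line that always carries the match id (plus the user id and offset when known), then filter by match. TraceStep and sequenceForMatch are hypothetical helpers. In the real server the lines would presumably go through Serverpod's own logging so that Insights can show them, which this sketch does not model.

```dart
import 'dart:convert';

/// One step of a multiplayer story, e.g. 'stream opened' or 'ACK returned'.
/// Every step carries the same correlation keys.
class TraceStep {
  const TraceStep(this.step, {required this.matchId, this.userId, this.offset});

  final String step;
  final String matchId;
  final String? userId;
  final int? offset;

  /// One JSON object per line keeps the logs machine-filterable.
  String toLogLine() => jsonEncode({
        'step': step,
        'matchId': matchId,
        if (userId != null) 'userId': userId,
        if (offset != null) 'offset': offset,
      });
}

/// Rebuilds the ordered sequence of steps for one match from raw log lines.
List<String> sequenceForMatch(Iterable<String> lines, String matchId) {
  final steps = <String>[];
  for (final line in lines) {
    final entry = jsonDecode(line) as Map<String, Object?>;
    if (entry['matchId'] == matchId) steps.add(entry['step'] as String);
  }
  return steps;
}
```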

What I Learned

The first implementation was about proving the game could work online.

The Serverpod refactor is about making it a system I can trust.

That difference matters.

The prototype asked:

Can I push commands between players?

The refactor asks:

Can every player converge to the same authoritative state after real-world interruptions?

Phones sleep. Browsers pause tabs. Networks disappear. Players come back later. A turn-based 4X game should treat that as normal.

Serverpod gives me typed endpoints, generated clients, Auth Core, ORM models, migrations, health checks, and two-way streams.

Riverpod gives the Flutter side a clean way to coordinate account state, stream state, active matches, reconnect overlays, and snapshot application.

And aonw_core keeps the most important part honest:

The rules stay deterministic.

The server does not guess.

The client does not invent.

The match timeline is the source of truth.

That is the direction I want Age of New Worlds multiplayer to move in: less clever glue, more explicit contracts, fewer legacy paths, and a calmer architecture that can survive the boring chaos of real players.
