RTS pathfinding: flow fields

I always wanted to build a real-time strategy game, and the part that kept pulling me back is unit movement. This post works through one idea from that world, flow field pathfinding, down to a simulation you can poke at. The simulation is Rust compiled to WebAssembly, the rendering is instanced WebGL2, and everything below runs live in the page. Claude Fable 5 did most of the work in this post. This is a second post using it. Previously I used Opus and Sonnet, like in the Gemini game art post.

loading wasm…

Drag anywhere to move the goal and the crowd reroutes mid-stride. Draw walls in their way, paint mud, switch the field views, and push the sliders around. On a phone all of it works with touch.

One field instead of ten thousand paths

The classic approach gives every unit its own path. Run A*, store the waypoints, follow them. That works until you have hundreds of units and a player who keeps clicking, because then every order is hundreds of searches and every map change is hundreds of path repairs. Elijah Emerson’s Game AI Pro chapter describes the fix used in Supreme Commander 2. Instead of giving each unit a path, the destination gets one field. Every cell of the map stores the direction to walk from there, and a unit standing anywhere just reads its cell and goes.

The cost of an order stops depending on how many units received it. The +1000 agents button above adds a thousand followers for free, because they share a field that already exists. The rest of this post is the machinery that makes the field worth following.

Three fields

What the demo calls a flow field is really three fields computed in sequence, with the exact layouts from Emerson’s chapter:

cost 8 bits · terrain cost, 255 = wall
integration 16 bits · integrated cost to goal flags · LOS, wavefront
flow 4 bits · direction LUT index flags · pathable, LOS
one entry per grid cell, per field
pub const WALL: u8 = 255;

pub const INT_COST_MASK: u32 = 0xFFFF;
pub const INT_LOS: u32 = 1 << 16;
pub const UNREACHABLE: u32 = 0xFFFF;

pub const FLOW_DIR_MASK: u8 = 0x0F;
pub const FLOW_PATHABLE: u8 = 1 << 4;
pub const FLOW_LOS: u8 = 1 << 5;
/// Direction index meaning "no direction" (goal cell, or nothing reachable).
pub const DIR_NONE: u8 = 0x0F;

The grid holding them is three flat vectors indexed by y * width + x, plus the goal:

pub struct FlowGrid {
    pub width: u32,
    pub height: u32,
    pub cost: Vec<u8>,
    pub integration: Vec<u32>,
    pub flow: Vec<u8>,
    pub goal: (u32, u32),
    /// Scratch heap, kept to avoid reallocating per compute.
    heap: std::collections::BinaryHeap<std::cmp::Reverse<(u32, u32)>>,
}

The toolbar’s Cost, Integration, and Flow views show each field directly. Cost is what you edit when you draw walls or mud. Integration is the orange gradient, the distance-to-goal landscape that everything else derives from. Flow is the arrows, and an arrow is all a unit ever sees.

Dijkstra wearing an Eikonal name tag

The integration field is computed by a wavefront expanding from the goal. Each cell’s integrated cost is the cheapest neighbor’s integrated cost plus the cell’s own cost field value:

while let Some(std::cmp::Reverse((dist, idx))) = self.heap.pop() {
    let idx = idx as usize;
    if dist > self.integration[idx] & INT_COST_MASK {
        continue;
    }
    let x = idx as i32 % w;
    let y = idx as i32 / w;
    for (dx, dy) in [(1, 0), (-1, 0), (0, 1), (0, -1)] {
        let (nx, ny) = (x + dx, y + dy);
        if nx < 0 || ny < 0 || nx >= w || ny >= h {
            continue;
        }
        let nidx = (ny * w + nx) as usize;
        let step = self.cost[nidx];
        if step == WALL {
            continue;
        }
        let nd = (dist + step as u32).min(UNREACHABLE - 1);
        if nd < self.integration[nidx] & INT_COST_MASK {
            self.integration[nidx] = nd;
            self.heap.push(std::cmp::Reverse((nd, nidx as u32)));
        }
    }
}

Here is the same loop watched from above. The wave starts at the goal, pays the cost field as it goes, and leaves the integrated cost behind:

the wavefront expanding from the goal · note the slow crawl across the mud and the diamond-ish shape of the front

Emerson’s chapter calls this an Eikonal equation method, citing Jeong and Whitaker’s fast iterative solver. The update rule it describes is the one above, which is a Dijkstra relaxation, not an Eikonal solve. The difference matters if you go looking for smoother fields. A true Eikonal solver treats the wave as a continuous front, while the discrete 4-neighbor minimum bakes the grid into the result. Worth knowing before you cite the chapter, which is otherwise the best production account of this architecture I found.

Once integration exists, the flow direction is a local decision. Each cell points at whichever 8-way neighbor has the lowest integrated cost, with one rule that prevents diagonal shortcuts through wall corners:

for (d, (dx, dy)) in DIR_OFFSETS.iter().enumerate() {
    let (nx, ny) = (x + dx, y + dy);
    if nx < 0 || ny < 0 || nx >= w || ny >= h {
        continue;
    }
    if self.is_wall(nx, ny) {
        continue;
    }
    if *dx != 0 && *dy != 0 && (self.is_wall(x + dx, y) || self.is_wall(x, y + dy))
    {
        continue;
    }
    let c = self.integration[(ny * w + nx) as usize] & INT_COST_MASK;
    if c < best {
        best = c;
        best_dir = d as u8;
    }
}
left: the center cell points at its cheapest neighbor · right: the cheapest neighbor is diagonally past a wall corner, so the pick falls to the next best

The diamond and the line of sight

A field built only from the pieces above has a visible defect near the goal. The 4-neighbor wavefront measures distance like a taxicab, so the contours come out diamond shaped and agents approach the goal in stair-step zigzags instead of straight lines. Emerson’s fix is a line-of-sight pass. Every cell with a clear Bresenham line to the goal gets a flag, and agents in flagged cells ignore the quantized arrows and steer exactly at the target.

/// Mark every reachable cell that has an unobstructed Bresenham line to
/// the goal. Agents in LOS cells steer straight at the goal, which kills
/// the diamond-shaped flow artifact near it. The line must cross only
/// cheapest-cost ground: a straight shot through mud may be longer in
/// integrated cost than the detour the flow field found, so expensive
/// terrain blocks LOS the same way a wall does.
fn line_of_sight_pass(&mut self, gx: u32, gy: u32) {
    let w = self.width;
    for y in 0..self.height {
        for x in 0..w {
            let idx = (y * w + x) as usize;
            if self.cost[idx] == WALL
                || self.integration[idx] & INT_COST_MASK == UNREACHABLE
            {
                continue;
            }
            if self.line_clear(x as i32, y as i32, gx as i32, gy as i32) {
                self.integration[idx] |= INT_LOS;
            }
        }
    }
}
the brightened cells hold the LOS flag · the wall blocks one sightline and the mud blocks another

In the Integration view of the live demo the flagged region is the same brightened area around the goal. Watch it reshape as you drag the goal past walls.

The doc comment above mentions terrain. That line comes from a bug. My first LOS pass checked only for walls, and agents inside the flagged region marched straight through mud that the planner was routing everyone else around. The straight line is only a shortcut when it crosses ground as cheap as the detour, so expensive terrain has to block line of sight just like a wall.

Mud

Walls are just the degenerate case of the cost field, the cells where cost is infinite. The Mud brush paints the in-between, cells with cost 8 that units can cross but would rather not. The wavefront passes through mud at eight times the price, the integration gradient bulges around it, and the flow arrows bend away. Paint a mud patch in front of a marching column and they part around it like a stream around a rock.

Cost changes where units walk. It should also change how fast they walk, otherwise mud is just an invisible fence. Each step samples the cost under the agent and scales its speed:

/// Movement slowdown from the terrain under the agent: expensive ground is
/// waded through at 1/sqrt(cost) pace (cost 8 mud ~ 35% speed), and leaving
/// it restores full speed since this is sampled per step from the cell.
#[inline]
fn terrain_mult(cost: u8) -> f32 {
    if cost <= 1 {
        1.0
    } else {
        1.0 / (cost as f32).sqrt()
    }
}
why the exponent is a square root: 1/cost would drop mud to 12% and look like a freeze

Wall the mud off so it is the only way through and the crowd wades in, crawls at a third of its pace, and accelerates back to full speed on the far side.

From particles to an army

Reading an arrow per cell moves dots around. Three more mechanisms make the dots read as units, and each one exists because something looked wrong without it.

The first is steering. An agent blends its velocity toward the field direction instead of snapping to it, with a separation force that keeps neighbors from stacking preemptively:

let flow = grid.flow[cell];
let (dx, dy) = if flow & FLOW_LOS != 0 && goal_dist > 1e-3 {
    (to_goal.0 / goal_dist, to_goal.1 / goal_dist)
} else {
    let dir = flow & FLOW_DIR_MASK;
    if dir == DIR_NONE {
        (0.0, 0.0)
    } else {
        DIR_VECTORS[dir as usize]
    }
};
the agent does not snap to the field, it leans toward it · press play to watch the velocity swing onto the field direction while the overlapped pair separates

The second is body collision. Steering forces alone let bodies interpenetrate the moment a crowd compresses, so after integration a position-based pass pushes every overlapping pair apart by half the overlap, sliding along walls. Without it, a column squeezing through a gap looks like ghosts passing through each other.

The third is knowing when to stop, and it is the one that surprised me. If arriving units simply brake near the goal, the units behind keep pushing at full speed and crush the crowd into the center. What the code actually implements is a three-state machine per agent. A unit is moving, becomes stalled after 0.35 seconds of getting nowhere, and becomes parked either by reaching the goal or by being stalled against a unit that is already parked. A new goal resets everyone to moving. My first version skipped the stalled state and let parking spread on contact alone. That froze entire marching columns, because units in a column touch nose to tail, so the parked state propagated backwards through units that were moving perfectly well:

// Arrival contagion: touching a parked agent that is closer to the
// goal (by integrated cost, so the chain can't jump across a wall)
// parks us too, but only once we are genuinely blocked (stalled),
// so a flowing column doesn't freeze tail-first. This is what makes
// a crowd pile up zergling-style instead of grinding into the center.
const STALL_TO_PARK: f32 = 0.35;
let ci = grid.integration
    [grid.idx(self.pos[2 * i] as u32, self.pos[2 * i + 1] as u32)]
    & INT_COST_MASK;
let cj = grid.integration
    [grid.idx(self.pos[2 * j] as u32, self.pos[2 * j + 1] as u32)]
    & INT_COST_MASK;
if self.arrived[j] == 1
    && self.arrived[i] == 0
    && self.stall[i] >= STALL_TO_PARK
    && ci >= cj
    && ci - cj <= 4
{
    self.arrived[i] = 1;
} else if self.arrived[i] == 1
    && self.arrived[j] == 0
    && self.stall[j] >= STALL_TO_PARK
    && cj >= ci
    && cj - ci <= 4
{
    self.arrived[j] = 1;
}
the arrival state machine, live · press play and watch the queue stall and park · every flash on the right is one agent taking that transition

Jdx ran into the group-movement wall from the other side. His flow field units rounding a corner “tended to cluster quickly and end up forming into a single/double file line, like a line of ants,” and he eventually shelved flow fields for waypoint paths with formation offsets. The collision and parking layers above are what keep this demo’s columns from collapsing into that ant line. StarCraft 2 layers much more on top of its fields, boids-style local avoidance and unit pushing among it, and that is a deep topic with its own literature.

The three unit types are mostly for the eye, but sizes and speeds are coupled, smaller means faster, and the variety slider widens both spreads together. Crank it and watch a marching column sort itself, small units threading ahead while the heavies fall back. The sorting is not coded anywhere, it falls out of the speed difference plus body collision.

The seam between Rust and the page

The simulation is a 65 KB wasm binary with a plain C ABI. There is no wasm-bindgen and no generated glue. The interface is small enough that hand-rolling it is less machinery than keeping a binding generator and its CLI in sync across updates. All state lives behind one opaque handle:

pub struct State {
    grid: FlowGrid,
    agents: Agents,
}

/// # Safety
/// Handles returned by `ff_new` must only be used with these functions and
/// freed exactly once with `ff_free`.
#[no_mangle]
pub extern "C" fn ff_new(width: u32, height: u32) -> *mut State {
    let state = State {
        grid: FlowGrid::new(width, height),
        agents: Agents::new(),
    };
    Box::into_raw(Box::new(state))
}

#[no_mangle]
pub unsafe extern "C" fn ff_free(s: *mut State) {
    drop(Box::from_raw(s));
}

Everything else the page needs is a pointer into wasm linear memory. The Rust side hands out the address of a Vec’s buffer:

/// Interleaved x,y per agent. Fetch after every call that may grow the agent
/// list, and re-view after any wasm memory growth.
#[no_mangle]
pub unsafe extern "C" fn ff_positions_ptr(s: *mut State) -> *const f32 {
    (*s).agents.pos.as_ptr()
}

and the JavaScript side wraps it in a typed array view over the wasm memory, without copying:

/** Writable u8 view of the cost field (255 = wall). Call compute() after edits. */
get cost() {
  return this.#view(Uint8Array, this.exports.ff_cost_ptr(this.handle), this.cells);
}

Painting a wall in the demo is JavaScript writing a byte into the cost field through that view. The doc comments carry the two caveats this approach costs. A Vec reallocates when it grows, and memory.grow detaches every existing ArrayBuffer, so views are fetched fresh instead of cached.

Rendering rides the same pointers in the other direction. Each frame, the agent positions, headings, radii, and kinds go straight from simulation memory into GPU buffers, and the whole crowd draws as one instanced call, three vertices times however many thousand agents:

const count = positions.length / 2;
if (count > 0) {
  gl.useProgram(agentProg);
  gl.bindVertexArray(agentVao);
  gl.bindBuffer(gl.ARRAY_BUFFER, posBuf);
  gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STREAM_DRAW);
  gl.bindBuffer(gl.ARRAY_BUFFER, velBuf);
  gl.bufferData(gl.ARRAY_BUFFER, headings, gl.STREAM_DRAW);
  gl.bindBuffer(gl.ARRAY_BUFFER, radiusBuf);
  gl.bufferData(gl.ARRAY_BUFFER, radii, gl.STREAM_DRAW);
  gl.bindBuffer(gl.ARRAY_BUFFER, kindBuf);
  gl.bufferData(gl.ARRAY_BUFFER, kinds, gl.STREAM_DRAW);
  gl.drawArraysInstanced(gl.TRIANGLES, 0, 3, count);
}

The build is one cargo invocation with the artifact committed, so deploys never need a Rust toolchain:

#!/bin/bash
# Builds the flow field sim to wasm and copies the artifact into dist/, which
# is committed (the deploy image builds from git archive and must not need a
# Rust toolchain). Run after any change under src/.
set -euo pipefail
DIR="$(cd "$(dirname "$0")" && pwd)"

cargo build --manifest-path "$DIR/Cargo.toml" --release --target wasm32-unknown-unknown
mkdir -p "$DIR/dist"
cp "$DIR/target/wasm32-unknown-unknown/release/flowfield.wasm" "$DIR/dist/flowfield.wasm"
ls -la "$DIR/dist/flowfield.wasm"

What a click costs, and where this stops scaling

The stats line under the demo reports the field rebuild time, about 3 ms for this 128x80 grid on my desktop, most of it in the per-cell Bresenham of the LOS pass. Every goal drag and every painted wall pays it for the whole map.

That limit defines where the single-grid approach is the right tool. One field per goal, maps up to a few hundred cells across, changes that arrive at human speed. Tower defense, swarm and crowd toys, roguelike mob movement, anything where one horde chases one target. Within that envelope the approach is hard to beat for the effort. The whole implementation here is about 1,200 lines of Rust and 450 of JavaScript glue and rendering, sitting in scripts/flowfield as a self-contained MIT-licensed crate you can lift.

Past that envelope the costs compound. Bigger maps make every rebuild slower, and every simultaneous goal wants its own field. Emerson’s chapter spends most of its pages there, and the production answer is hierarchy. Split the map into sectors connected by portals, search the portal graph first, build fields only for the sectors a path crosses, and cache finished fields keyed by portal so different orders share work. If you outgrow the single grid, start with the chapter.