Text rendering without glyph bitmaps

2026-06-10 · text-rendering, harfbuzz, webgl, wasm, emscripten, fonts

Text rendering is the most complex part of many applications I have worked on. I spent years around DirectWrite and Skia, and at one point I cached the laid-out result of 256 lines at a time just to keep scrolling smooth, paying for it in memory. It is a niche topic with very high expectations, because everyone stares at text all day and notices everything.

This spring the field changed, so I built a small text editor to see how far the new approach carries. It runs below, in this page, and every glyph you see is rasterized from its outline curves by a fragment shader. There is no glyph bitmap anywhere, no atlas, no cached mask.

You can finally use this

The short version for developers: rendering glyph curves directly on the GPU was covered by patents for twenty years, and since March 2026 it is free to use, including commercially. Eric Lengyel, whose Slug library set the quality bar for text in games, dedicated his patent to the public domain and published reference shaders under the MIT license. The older Microsoft patent on the Loop-Blinn technique expired the same month.

Two weeks later, Behdad Esfahbod of HarfBuzz fame shipped HarfBuzz 14.0 with a new library called hb-gpu. It encodes glyph outlines into a compact texture and ships fragment shaders in GLSL, WGSL, MSL, and HLSL that compute the coverage per pixel. Everything in this post stands on those two releases.

An editor with no glyph bitmaps

Click into the canvas and type. Arrows, selection, clipboard, scrolling, and the font picker all work.

[Interactive demo: the editor canvas with its font picker]

The pipeline is short. HarfBuzz shapes the text, the same shaping every browser does. The hb-gpu encoder walks each glyph outline once, packs the quadratic curves into a blob of RGBA16I texels, and that blob goes into one texture shared by all glyphs. Drawing is one quad per glyph, and the fragment shader casts rays through each pixel to compute exact coverage at any size or transform. The whole CPU side, per glyph, is this:

shim.c › encode-glyph

EMSCRIPTEN_KEEPALIVE
int hbe_encode_glyph(hbe_ctx_t *ctx, unsigned gid) {
  hb_gpu_draw_glyph(ctx->draw, ctx->font, gid);

  hb_glyph_extents_t ext = {0, 0, 0, 0};
  hb_blob_t *blob = hb_gpu_draw_encode(ctx->draw, &ext);
  if (!blob) return -1;

  if (ctx->last_blob) hb_gpu_draw_recycle_blob(ctx->draw, ctx->last_blob);
  ctx->last_blob = blob;
  ctx->extents[0] = ext.x_bearing;
  ctx->extents[1] = ext.y_bearing;
  ctx->extents[2] = ext.width;
  ctx->extents[3] = ext.height;

  unsigned blen = 0;
  hb_blob_get_data(blob, &blen);
  return (int)blen;
}
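
The "one quad per glyph" step can be sketched on the JavaScript side. This is a minimal illustration, not the post's renderer: the name buildGlyphQuads, the vertex layout (position x, y followed by texcoord u, v), and the choice to pass the glyph box in font units as texcoords are all assumptions. It relies only on the extents convention visible in the shim, where y_bearing is the top of the box and height is normally negative.

```javascript
// Sketch only: two triangles per glyph, built from HarfBuzz-style extents
// [x_bearing, y_bearing, width, height] in font units (y up, height < 0).
// buildGlyphQuads and its vertex layout are hypothetical.
function buildGlyphQuads(glyphs, scale) {
  const FLOATS_PER_VERTEX = 4; // x, y, u, v
  const out = new Float32Array(glyphs.length * 6 * FLOATS_PER_VERTEX);
  let o = 0;
  for (const g of glyphs) {
    const [xBearing, yBearing, width, height] = g.extents;
    const u0 = xBearing;
    const u1 = xBearing + width;
    const vTop = yBearing;
    const vBottom = yBearing + height;
    const corners = [
      [u0, vTop], [u1, vTop], [u0, vBottom],
      [u0, vBottom], [u1, vTop], [u1, vBottom],
    ];
    for (const [u, v] of corners) {
      out[o++] = g.penX + u * scale;      // screen x in pixels
      out[o++] = g.baselineY - v * scale; // screen y grows downward
      out[o++] = u;                       // texcoord in font units (assumed)
      out[o++] = v;
    }
  }
  return out;
}
```

Because the quad carries font-unit coordinates rather than pixels, the same vertex data stays valid at any zoom; only the scale applied to the positions changes.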

I compile HarfBuzz with that shim using Emscripten, one emcc command over the release tarball. The result is about 620 KB of wasm that shapes, encodes, and hands me the shader source as strings. My fragment shader is just a main appended to what hb-gpu provides:

renderer.mjs › fragment-main

const FRAG_MAIN = `
uniform vec4 u_foreground;     // straight alpha
uniform float u_stem_darkening; // 0 or 1
uniform float u_gamma;
uniform float u_debug;          // 0 or 1: per-pixel curve counts

in vec2 v_texcoord;
flat in uint v_glyphLoc;

out vec4 fragColor;

void main () {
  float cov = hb_gpu_draw (v_texcoord, v_glyphLoc);
  vec4 c = vec4 (u_foreground.rgb * u_foreground.a, u_foreground.a) * cov;

  /* Adjust edge coverage only, like the reference demo. */
  if (cov > 0.0 && cov < 1.0) {
    float adj = cov;
    if (u_stem_darkening > 0.0) {
      float brightness = c.a > 0.0 ? dot (c.rgb, vec3 (1.0 / 3.0)) / c.a : 0.0;
      adj = hb_gpu_stem_darken (adj, brightness, hb_gpu_ppem (v_texcoord, v_glyphLoc));
    }
    if (u_gamma != 1.0)
      adj = pow (adj, u_gamma);
    c *= adj / cov;
  }

  if (u_debug > 0.0) {
    ivec2 counts = _hb_gpu_curve_counts (v_texcoord, v_glyphLoc);
    float r = clamp (float (counts.x) / 8.0, 0.0, 1.0);
    float g = clamp (float (counts.y) / 8.0, 0.0, 1.0);
    fragColor = vec4 (r, g, c.a, max (max (r, g), c.a));
    return;
  }

  fragColor = c;
}
`;

hb_gpu_draw is the coverage evaluation. hb_gpu_stem_darken and the gamma line matter more than they look, and they get their own section below.

The editor is an invisible textarea

Rendering is only half of an editor. The other half is input, and if you come from the native side you may expect pain: keyboard layouts, dead keys, IME composition, clipboard. The browser already solved all of that for one element, so the trick is to park an invisible 1 px textarea on top of the WebGL2 canvas and keep focus in it. This is the actual markup and CSS of the editor above:

HbEditor.astro › input-proxy-dom

<div class="editor-wrap">
  <canvas id="canvas"></canvas>
  <textarea
    id="input-proxy"
    autocomplete="off"
    autocorrect="off"
    autocapitalize="off"
    spellcheck="false"></textarea>
  <p class="hb-loading">loading wasm + font…</p>
</div>

HbEditor.astro › input-proxy-css

/* Invisible but focusable: receives keyboard, IME, and clipboard events. */
.editor-wrap { position: relative; }
#input-proxy {
  position: absolute; left: 0; top: 0; width: 1px; height: 1px;
  opacity: 0; border: none; padding: 0; resize: none; outline: none;
  overflow: hidden; pointer-events: none;
}

Clicking the canvas calls textarea.focus(), and the caret only blinks while the textarea holds focus. From then on typing, dead-key and IME composition, and all three clipboard operations (copy, cut, paste) arrive as plain DOM events and get drained into the document model:

editor.mjs › hidden-textarea

// Regular typing (including dead keys resolving outside composition).
textarea.addEventListener('input', () => {
  if (composing) return;
  if (textarea.value) {
    doc.insert(textarea.value);
    textarea.value = '';
    changed();
  }
});

textarea.addEventListener('compositionstart', () => {
  composing = true;
});
textarea.addEventListener('compositionend', (ev) => {
  composing = false;
  textarea.value = '';
  if (ev.data) {
    doc.insert(ev.data);
    changed();
  }
});

// --- Clipboard (events fire on the focused textarea) ---

function selectedText() {
  const range = doc.selectionRange();
  return range ? doc.text.slice(range[0], range[1]) : '';
}

textarea.addEventListener('copy', (ev) => {
  ev.preventDefault();
  ev.clipboardData.setData('text/plain', selectedText());
});
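textarea.addEventListener('cut', (ev) => {
  ev.preventDefault();
  const text = selectedText();
  if (!text) return; // nothing selected: leave clipboard and document untouched
  ev.clipboardData.setData('text/plain', text);
  doc.deleteBackward(); // with a selection active, this deletes the selection
  changed();
});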
textarea.addEventListener('paste', (ev) => {
  ev.preventDefault();
  const t = ev.clipboardData.getData('text/plain');
  if (t) {
    doc.insert(t.replace(/\r\n?/g, '\n'));
    changed();
  }
});
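
The handlers above call doc.insert, doc.deleteBackward, and doc.selectionRange. The post does not show that model, so the following is only a hypothetical sketch of the smallest object that would satisfy those calls: a string plus caret and anchor offsets. The factory name createDoc and the replaceRange helper are invented for illustration.

```javascript
// Hypothetical minimal document model, not the post's doc.mjs.
function createDoc(text = '') {
  return {
    text,
    caret: text.length,  // where typing lands
    anchor: text.length, // other end of the selection; equals caret when empty
    selectionRange() {
      if (this.caret === this.anchor) return null;
      return [Math.min(this.caret, this.anchor), Math.max(this.caret, this.anchor)];
    },
    replaceRange(from, to, str) {
      this.text = this.text.slice(0, from) + str + this.text.slice(to);
      this.caret = this.anchor = from + str.length;
    },
    insert(str) {
      // Typing over a selection replaces it.
      const [from, to] = this.selectionRange() ?? [this.caret, this.caret];
      this.replaceRange(from, to, str);
    },
    deleteBackward() {
      const range = this.selectionRange();
      if (range) return this.replaceRange(range[0], range[1], '');
      // Simplification: removes one UTF-16 code unit. A real editor would
      // step back one grapheme boundary instead.
      if (this.caret > 0) this.replaceRange(this.caret - 1, this.caret, '');
    },
  };
}
```

Keeping the model free of DOM and GL references is what makes it testable in isolation, which matches how the post describes its own document layer.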

This is not my invention. It is how xterm.js and the Monaco editor in VS Code capture input too, and it is the part I would never have guessed coming from desktop text stacks.

Layout is one array per line

Between input and rendering sits a small document model, plain logic with no DOM and no GL in it, which is also why it runs under node:test. Layout splits the text on newlines, shapes each line once, and groups the shaped glyphs into cluster runs, spans of text that produced one indivisible piece of output. Advances accumulate into pen positions, and the font’s ascender and descender place each baseline:

doc.mjs › cluster-runs

// Group glyphs into cluster runs (LTR: clusters non-decreasing).
// Each: line-local [index, nextIndex) span, pen x, total advance, and
// the first gid (the ligature glyph, for GDEF caret lookup).
const clusters = [];
let x = 0;
for (const g of shaped) {
  const last = clusters[clusters.length - 1];
  if (last && last.index === g.cluster) {
    last.advance += g.xAdvance * s;
  } else {
    clusters.push({ index: g.cluster, x, advance: g.xAdvance * s, gid: g.gid });
  }
  x += g.xAdvance * s;
}
const width = x;
for (let i = 0; i < clusters.length; i++) {
  clusters[i].nextIndex = i + 1 < clusters.length ? clusters[i + 1].index : lineText.length;
}
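
The listing covers pen positions; the baseline placement mentioned just before it can be sketched the same way. This assumes metrics in font units with a negative descender, as in HarfBuzz font extents, and a fixed line height. The function name baselineY and the metrics object shape are illustrative.

```javascript
// Sketch: baseline y (pixels, growing downward) for a given line index.
// Assumes ascender > 0 and descender < 0 in font units; lineGap is optional.
function baselineY(lineIndex, metrics, scale) {
  const lineHeight =
    (metrics.ascender - metrics.descender + (metrics.lineGap ?? 0)) * scale;
  return lineIndex * lineHeight + metrics.ascender * scale;
}
```

With an ascender of 800, a descender of -200, and a scale of 0.02, each line is 20 px tall and the first baseline sits 16 px below the top.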

From the cluster runs, layout produces one array per line called boundaries: every position the caret may occupy, paired with its x coordinate. All the editing behavior is a lookup in that array. Arrows take the neighboring entry, a click snaps to the nearest x, selection rectangles span two entries, Home and End are its first and last. The only interesting question left is what happens when a boundary falls inside a single glyph.
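
The lookups described above are small enough to sketch in full. This assumes a boundaries array of { index, x } entries for one LTR line, sorted by x; the function names are hypothetical and the linear scan is chosen for clarity over speed.

```javascript
// Click: snap to the boundary whose x is nearest the pointer.
function nearestBoundary(boundaries, x) {
  let best = 0;
  for (let i = 1; i < boundaries.length; i++) {
    if (Math.abs(boundaries[i].x - x) < Math.abs(boundaries[best].x - x)) best = i;
  }
  return best;
}

// Keys: move to the neighboring entry, or to the first or last one.
function moveCaret(boundaries, pos, key) {
  switch (key) {
    case 'ArrowLeft': return Math.max(0, pos - 1);
    case 'ArrowRight': return Math.min(boundaries.length - 1, pos + 1);
    case 'Home': return 0;
    case 'End': return boundaries.length - 1;
    default: return pos;
  }
}

// Selection: a rectangle spanning two entries, in either order.
function selectionRect(boundaries, a, b, top, height) {
  const x0 = Math.min(boundaries[a].x, boundaries[b].x);
  const x1 = Math.max(boundaries[a].x, boundaries[b].x);
  return { x: x0, y: top, width: x1 - x0, height };
}
```

All three work on positions in the array rather than on text offsets, so the caller reads boundaries[pos].index when it needs the document offset.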

The caret has to stop inside a ligature

The editor starts in EB Garamond on purpose. Type “ffi” and it becomes a single glyph, then press ←: the caret stops inside the ligature, twice. Select half of it and one glyph renders in two colors. Here is that situation, rendered live with the caret sitting between the f and the i of one ffi glyph:

This is a known sore spot. DirectWrite’s hit testing snaps to whole clusters, so editors built on it struggle to put a caret inside a ligature, and two-color rendering of one glyph is close to impossible with its layout API. The fix is small. Caret stops come from grapheme boundaries via Intl.Segmenter, not from shaping clusters, and the x positions inside a ligature come from the font’s GDEF caret table when it has one, or from dividing the advance evenly, which is what Chromium does:

doc.mjs › ligature-carets

const boundaries = [{ index: start, x: 0 }];
let ci = 0;
for (const b of graphemeEnds) {
  while (ci < clusters.length && clusters[ci].nextIndex <= b) ci++;
  if (ci >= clusters.length || b <= clusters[ci].index) {
    // Boundary at (or before) a cluster start: x is the cluster pen x,
    // or full width when past the last cluster.
    const c = clusters[ci];
    boundaries.push({ index: start + b, x: c && b <= c.index ? c.x : width });
    continue;
  }
  // Boundary strictly inside cluster ci: j-th of n graphemes.
  const c = clusters[ci];
  const inside = graphemeEnds.filter((e) => e > c.index && e < c.nextIndex);
  const n = inside.length + 1;
  const j = inside.indexOf(b) + 1;
  const carets = font.ligCarets(c.gid);
  const offset = carets.length >= n - 1 ? carets[j - 1] * s : (c.advance * j) / n;
  boundaries.push({ index: start + b, x: c.x + offset });
}
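
The listing takes graphemeEnds as given. A minimal sketch of how such an array could be produced with Intl.Segmenter follows, along with the even-division fallback as a standalone function. It assumes offsets are UTF-16 code units, matching how the listing indexes into the line; the function names are illustrative.

```javascript
// Sketch: end offset of every grapheme cluster in one line of text.
function graphemeEndsOf(lineText) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  const ends = [];
  for (const { segment, index } of segmenter.segment(lineText)) {
    ends.push(index + segment.length);
  }
  return ends;
}

// Fallback when the font supplies no GDEF carets: n graphemes share the
// advance evenly, giving n - 1 interior caret offsets.
function evenCaretOffsets(advance, n) {
  const offsets = [];
  for (let j = 1; j < n; j++) offsets.push((advance * j) / n);
  return offsets;
}
```

For "ffi" the segmenter yields ends at 1, 2, and 3, and a 900-unit ligature advance split three ways puts the interior carets at 300 and 600. A base letter followed by a combining mark stays one grapheme, so the caret never lands between them.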

The two-color selection costs fifteen lines: draw the text once in the normal color, then draw the same glyph buffer again in white with a GL scissor clipped to the selection rect. Shaping never reruns and the ligature never breaks apart.
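
A sketch of that two-pass draw, under stated assumptions: drawGlyphs(color) is a hypothetical callback that issues the glyph draw call with a given foreground color, selRect is in top-left canvas pixels, and the highlight is filled with a scissored clear. Only the scissor technique comes from the post; how the highlight itself is painted there is not shown.

```javascript
// Sketch only. gl is a WebGL2RenderingContext (or any object with the same
// enable/scissor/clearColor/clear/disable methods and constants).
function drawTextWithSelection(gl, canvasHeight, selRect, drawGlyphs, colors) {
  drawGlyphs(colors.text); // pass 1: everything in the normal color

  if (!selRect || selRect.width <= 0) return;

  // gl.scissor uses a bottom-left origin, so flip y.
  const x = Math.round(selRect.x);
  const y = Math.round(canvasHeight - (selRect.y + selRect.height));
  const w = Math.round(selRect.width);
  const h = Math.round(selRect.height);

  gl.enable(gl.SCISSOR_TEST);
  gl.scissor(x, y, w, h);

  // With the scissor test on, clear only touches the selection box.
  gl.clearColor(...colors.highlight);
  gl.clear(gl.COLOR_BUFFER_BIT);

  drawGlyphs(colors.selectedText); // pass 2: same buffer, clipped, new color
  gl.disable(gl.SCISSOR_TEST);
}
```

The glyph buffer is identical in both passes, which is why a ligature can show two colors without being reshaped or split.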

The font picker holds two more data points. Inter has no f-ligatures at all, by design. Fira Code builds its programming ligatures from per-character glyphs, so clusters never merge and a caret cannot get trapped there in the first place.

Why the correct rendering looks wrong

The first time I rendered text with mathematically correct coverage it looked wrong. Thin, gray, weaker than the same text in the DOM next to it. This is a known effect rather than a bug: browsers deliberately darken text, because physically correct blending of dark-on-light text reads as anemic. Skia documents its tricks in a design doc honestly titled The Raster Tragedy, and FreeType tells the same story from the Linux side.

hb-gpu ships stem darkening for the same reason, and I added one more knob, a gamma applied to the edge coverage only. Judge it yourself: the same text and the same font file rendered by hb-gpu, canvas 2D fillText, and the DOM, at the small sizes where quality is actually contested.

[Interactive comparison. Controls: font, stem darkening, gamma (default 0.75). Panes: hb-gpu (WebGL2, no atlas); canvas 2D fillText (opaque); DOM text.]

On my monitor, gamma between 0.7 and 0.8 makes the hb-gpu pane nearly indistinguishable from the DOM pane, and 0.75 is now the default in this post. That is calibrated against one screen and one pair of eyes, so use the sliders.
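
To see what a gamma of 0.75 does numerically, the edge-only branch of the fragment shader can be mirrored on the CPU. This sketch covers the gamma step alone and leaves stem darkening out; the function name is illustrative.

```javascript
// Mirrors the shader's rule: only partially covered pixels are adjusted.
function adjustEdgeCoverage(cov, gamma) {
  if (cov <= 0 || cov >= 1) return cov; // fully outside or fully inside
  return Math.pow(cov, gamma);
}

console.log(adjustEdgeCoverage(0.5, 0.75).toFixed(3));  // 0.595
console.log(adjustEdgeCoverage(0.25, 0.75).toFixed(3)); // 0.354
console.log(adjustEdgeCoverage(1, 0.75));               // 1
```

A gamma below 1 raises every partial coverage value, so edge pixels get heavier while glyph interiors and the background stay exactly as computed. That is why the text reads darker without stems growing by whole pixels.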

A benchmark where a cache decides everything

Speed claims about text rendering are usually wrong by omission. A browser caches rasterized glyphs by font and size, so a benchmark where sizes repeat measures the cache, and one where they never repeat measures the miss path. I know this trap from the inside, because my 256-lines-at-once layout cache was the same bet at a different layer, and it worked until the memory bill arrived.

So this benchmark animates text size every frame and lets you pick the world. Continuous sizes never repeat and defeat the glyph cache by construction. Quantized sizes snap to whole pixels and recur every cycle, so the cache works. Pipelines run one at a time and the metric is delivered frame intervals, which is what you actually see.
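
One way the two worlds and the metric could be written down, as a sketch rather than the benchmark's actual code. The sweep shape, the size range, the step constant, and the definition of a missed frame (an interval spanning k refresh periods counts as k - 1 misses) are all assumptions.

```javascript
// Triangle sweep between min and max. The step is not a simple fraction of
// the sweep length, so float sizes practically never recur; rounding folds
// them back onto a small set of whole-pixel sizes.
function sizeAtFrame(frame, quantized, min = 10, max = 40) {
  const phase = (frame * 0.006180339887) % 2; // sawtooth in [0, 2)
  const tri = phase < 1 ? phase : 2 - phase;  // fold into [0, 1]
  const size = min + (max - min) * tri;
  return quantized ? Math.round(size) : size;
}

// Scores a run from delivered frame intervals in milliseconds.
function summarizeFrames(intervalsMs, refreshHz) {
  const budget = 1000 / refreshHz;
  const total = intervalsMs.reduce((a, b) => a + b, 0);
  let missed = 0;
  for (const dt of intervalsMs) {
    missed += Math.max(0, Math.round(dt / budget) - 1);
  }
  const slots = intervalsMs.length + missed;
  return {
    fps: (1000 * intervalsMs.length) / total,
    missedPct: (100 * missed) / slots,
  };
}
```

In a page, the intervals would come from the differences between successive requestAnimationFrame timestamps, which is what makes the metric reflect frames actually delivered rather than time spent in draw calls.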

My numbers below come from this setup:

Samsung Odyssey OLED G8 G80SD at 4K and 240 Hz
NVIDIA GeForce RTX 5070 Ti
AMD Ryzen 9 3900X

At 14,352 glyphs per frame:

scenario	hb-gpu	canvas 2D	DOM
quantized zoom, one size per frame	240 fps, 0% missed	240 fps, 0% missed	240 fps, 0% missed
continuous zoom, one size per frame	240 fps, 0% missed	163 fps, 45% missed	145 fps, 69% missed
quantized, 156 distinct sizes per frame	240 fps, 0% missed	42 fps, 89% missed	67 fps, 70% missed
continuous, 156 distinct sizes per frame	240 fps, 0% missed	5 fps, 100% missed	7 fps, 100% missed

The third row is the interesting one. Quantization restored the cache hits and canvas still fell to 42 fps, because 156 sizes alive at once need more glyph masks than the browser’s cache budget holds, so it keeps evicting entries it will need again a moment later. That is the memory limit I kept hitting with my own caches years ago, measured from the outside. The hb-gpu column is flat because nothing in it is keyed by size, though my first version still dropped 2% of frames to per-frame allocations in my own JavaScript before I removed them.
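
The eviction pattern in the third row can be reproduced with a toy cache. This sketch assumes least-recently-used eviction and an arbitrary capacity of 100 entries; the post does not state the browser's actual policy or budget, so both are stand-ins chosen only to show the thrashing effect.

```javascript
// Cycles through distinctKeys keys in order, cycles times, against an LRU
// cache of the given capacity, and returns the fraction of lookups that hit.
function lruHitRate(distinctKeys, capacity, cycles) {
  const cache = new Map(); // Map keeps insertion order: first key is oldest
  let hits = 0;
  let lookups = 0;
  for (let c = 0; c < cycles; c++) {
    for (let key = 0; key < distinctKeys; key++) {
      lookups++;
      if (cache.has(key)) {
        hits++;
        cache.delete(key); // re-inserted below as most recently used
      } else if (cache.size >= capacity) {
        cache.delete(cache.keys().next().value); // evict least recently used
      }
      cache.set(key, true);
    }
  }
  return hits / lookups;
}

console.log(lruHitRate(156, 100, 10)); // 0
console.log(lruHitRate(156, 156, 10)); // 0.9
```

With 156 keys cycling through 100 slots, each key is evicted before its next use, so the hit rate is zero even though every key recurs. Once the capacity reaches 156, only the first cycle misses.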

Two caveats. First, the 156-size case is adversarial, though it has real relatives in map labels and zoomable canvas UIs. Second, the table above comes from a desktop GPU with fragment shading to spare. On my phone the crossover moves: hb-gpu stays ahead only in the cache-hostile modes, and on SwiftShader, Chrome’s software GL fallback, the table inverts outright, because a ray-casting fragment shader on a CPU is this design’s worst case. Run the benchmark on your own device.

Where this leaves text

At normal UI scale my Perfetto traces show all three pipelines under 0.2 ms per frame, and Chromium’s CPU-rasterized glyph masks remain the quality benchmark for small static text. What changed in March is that the other architecture is simply available now, one wasm build and two shader strings away from any browser. hb-gpu ships the same shaders in WGSL, so a WebGPU follow-up is mostly a port of the buffer bindings, and I will probably not resist it for long.