Native Node.js addons: wrapping native code four ways
You found a native library you want in Node, or you have CPU work that would freeze the event loop. Either way you are writing a native addon. This post wraps the miniz deflate compressor as a clean async API.
The compression is incidental; the binding is where the real work happens. It turns a Buffer into native bytes, runs off the event loop and returns a Promise, frees a native handle without leaking or double-freeing it, and ships to users who have no compiler installed. Those jobs are the same whatever you wrap, so they are what this post compares across four approaches: NaN, the ABI-stable Node-API in C++, napi.rs in Rust, and WebAssembly. Every snippet is lifted from the scripts/addon-example workspace, which builds and tests all four.
The API we want
The compressor underneath is miniz in C for the C++ and wasm builds, and flate2 in Rust, whose default backend is miniz_oxide, miniz’s Rust port.
import { compressAsync, Deflater } from "fast-deflate";
// one-shot, runs off the event loop
const packed = await compressAsync(input, { level: 6 });
// a streaming handle that owns native state
const d = new Deflater({ level: 9 });
const out = Buffer.concat([d.push(chunkA), d.push(chunkB), d.finish()]);
The compression sits in a plain function per language. Each binding is a thin wrapper around it.
Type marshalling
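Getting bytes across the boundary: Node-API and napi.rs hide the copy behind a typed Buffer. NaN does the same through raw V8 and node:: APIs, which change between Node major versions, so a NaN addon must be rebuilt for each one; that is why it is legacy. WebAssembly has no Buffer, so you copy through linear memory by hand.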
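The wasm binding is the only one where that copy is spelled out, and the difference between copying and aliasing is what makes it safe. A minimal standalone sketch, using a freshly created WebAssembly.Memory in place of a module's exported memory; writeBytes, readCopy and readAlias are illustrative names, not part of the workspace:

```typescript
// Sketch: copying vs aliasing bytes in WebAssembly linear memory.
// A standalone WebAssembly.Memory stands in for a module's exported `memory`;
// in a real binding the offsets would come from the module's malloc().

// Copy `input` into linear memory starting at byte offset `ptr`.
export function writeBytes(mem: WebAssembly.Memory, ptr: number, input: Uint8Array): void {
  new Uint8Array(mem.buffer, ptr, input.length).set(input);
}

// Buffer.from(typedArray) copies: the result survives free() and memory growth.
export function readCopy(mem: WebAssembly.Memory, ptr: number, len: number): Buffer {
  return Buffer.from(new Uint8Array(mem.buffer, ptr, len));
}

// Buffer.from(arrayBuffer, offset, length) aliases: no copy, but it sees later
// writes to that region, and memory.grow() detaches it (its length drops to 0).
export function readAlias(mem: WebAssembly.Memory, ptr: number, len: number): Buffer {
  return Buffer.from(mem.buffer, ptr, len);
}
```

The wasm compress() wrapper in this section takes the readCopy route for its result, which is the safe choice when the pointer is freed immediately afterwards.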
// `info` carries the JS call: info[0] is the first argument, info.Env() the
// current environment. Pull an optional { level } from the second argument.
static int32_t read_level(const CallbackInfo &info, size_t idx) {
int32_t level = 6;
if (info.Length() > idx && info[idx].IsObject()) {
Object o = info[idx].As<Object>();
if (o.Has("level")) level = o.Get("level").As<Number>().Int32Value();
}
return level;
}
// compress(input: Buffer, opts?: { level?: number }): Buffer
Value Compress(const CallbackInfo &info) {
Env env = info.Env();
if (!info[0].IsBuffer()) throw TypeError::New(env, "input must be a Buffer");
auto input = info[0].As<Buffer<uint8_t>>();
try {
auto out = fastdeflate::deflate(input.Data(), input.Length(), read_level(info, 1));
return Buffer<uint8_t>::Copy(env, out.data(), out.size());
} catch (const std::exception &e) {
throw Error::New(env, e.what());
}
}
#[napi]
pub fn compress(input: Buffer, opts: Option<CompressOptions>) -> Buffer {
deflate(&input, level_of(opts)).into()
}
#[napi]
pub fn decompress(input: Buffer) -> Result<Buffer> {
inflate(&input)
.map(Into::into)
.map_err(|e| Error::from_reason(e.to_string()))
}
NAN_METHOD(Compress) {
// node::Buffer::Data hits a fatal assertion on anything that is not a Buffer, so check first.
if (!node::Buffer::HasInstance(info[0])) return Nan::ThrowTypeError("input must be a Buffer");
uint8_t *data = (uint8_t *) node::Buffer::Data(info[0]);
size_t len = node::Buffer::Length(info[0]);
int32_t level = 6;
if (info.Length() > 1 && info[1]->IsObject()) {
auto opts = Nan::To<v8::Object>(info[1]).ToLocalChecked();
auto key = Nan::New("level").ToLocalChecked();
if (Nan::Has(opts, key).FromJust())
level = Nan::To<int32_t>(Nan::Get(opts, key).ToLocalChecked()).FromJust();
}
try {
auto out = fastdeflate::deflate(data, len, level);
info.GetReturnValue().Set(Nan::CopyBuffer((char *) out.data(), out.size()).ToLocalChecked());
} catch (const std::exception &e) {
Nan::ThrowError(e.what());
}
}
// wasm has no Buffer type: copy bytes into linear memory, pass offsets, copy out.
export function compress(input: Uint8Array, level = 6): Buffer {
const inPtr = wasm.malloc(input.length);
const lenPtr = wasm.malloc(4);
try {
new Uint8Array(wasm.memory.buffer, inPtr, input.length).set(input);
const outPtr = wasm.compress(inPtr, input.length, level, lenPtr);
try {
// fresh views after every call: a grown memory detaches the old ArrayBuffer
const outLen = new Int32Array(wasm.memory.buffer, lenPtr, 1)[0];
return Buffer.from(new Uint8Array(wasm.memory.buffer, outPtr, outLen)); // copies out
} finally {
wasm.free(outPtr);
}
} finally {
wasm.free(inPtr); wasm.free(lenPtr); // freed even if compress or the copy throws
}
}
Async
Compressing a large buffer synchronously stalls the loop. The C++ binding copies the input out of the V8 heap, because the worker thread may not touch JS values, and compresses on a libuv worker thread inside a Napi::AsyncWorker. In napi.rs it is just an async fn with spawn_blocking. WebAssembly has no thread pool of its own, so getting off the main thread means a worker_thread and message passing.
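The WebAssembly route gets no snippet of its own below. A minimal sketch of the worker_thread approach, under two stated assumptions: the worker body is inline source run with eval: true, and node:zlib stands in for the wasm module so the sketch runs on its own. DeflateWorker is a hypothetical name, not part of the workspace.

```typescript
import { Worker } from "node:worker_threads";

// Worker body, evaluated as CommonJS source (eval: true). node:zlib stands in for
// the wasm module here; a real build would instantiate fast_deflate.wasm instead.
const WORKER_SRC = `
const { parentPort } = require("node:worker_threads");
const zlib = require("node:zlib");
parentPort.on("message", ({ id, input, level }) => {
  try {
    parentPort.postMessage({ id, out: zlib.deflateSync(input, { level }) });
  } catch (err) {
    parentPort.postMessage({ id, error: String(err) });
  }
});
`;

type Reply = { id: number; out?: Uint8Array; error?: string };
type Pending = { resolve: (b: Buffer) => void; reject: (e: Error) => void };

export class DeflateWorker {
  private readonly worker = new Worker(WORKER_SRC, { eval: true });
  private readonly pending = new Map<number, Pending>();
  private nextId = 0;

  constructor() {
    this.worker.on("message", ({ id, out, error }: Reply) => {
      const p = this.pending.get(id);
      if (!p) return;
      this.pending.delete(id);
      if (out) {
        // the reply arrives as a structured-cloned Uint8Array; wrap it, no extra copy
        p.resolve(Buffer.from(out.buffer, out.byteOffset, out.byteLength));
      } else {
        p.reject(new Error(error ?? "worker returned no output"));
      }
    });
    // if the worker dies, fail every request still in flight
    this.worker.on("error", (err) => {
      for (const p of this.pending.values()) p.reject(err);
      this.pending.clear();
    });
  }

  // postMessage() structured-clones `input` synchronously, so the caller may reuse
  // its buffer immediately: the same reason the native bindings copy first.
  compressAsync(input: Uint8Array, level = 6): Promise<Buffer> {
    const id = this.nextId++;
    return new Promise<Buffer>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.worker.postMessage({ id, input, level });
    });
  }

  terminate(): Promise<number> {
    return this.worker.terminate();
  }
}
```

One long-lived worker keeps the startup cost out of each call; the id map lets several requests be in flight at once.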
// Copy the bytes out of V8, compress on a libuv worker thread, resolve on the JS thread.
class CompressWorker : public AsyncWorker {
std::vector<uint8_t> input_, output_;
int32_t level_;
Promise::Deferred deferred_;
public:
CompressWorker(Napi::Env env, std::vector<uint8_t> in, int32_t level)
: AsyncWorker(env), input_(std::move(in)), level_(level),
deferred_(Promise::Deferred::New(env)) {}
Promise GetPromise() { return deferred_.Promise(); }
void Execute() override { // worker thread: no V8 calls allowed here
output_ = fastdeflate::deflate(input_.data(), input_.size(), level_);
}
void OnOK() override { // back on the JS thread
deferred_.Resolve(Buffer<uint8_t>::Copy(Env(), output_.data(), output_.size()));
}
void OnError(const Error &e) override { deferred_.Reject(e.Value()); }
};
// compressAsync(input: Buffer, opts?): Promise<Buffer>
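Value CompressAsync(const CallbackInfo &info) {
if (!info[0].IsBuffer()) throw TypeError::New(info.Env(), "input must be a Buffer");
auto input = info[0].As<Buffer<uint8_t>>();
// copy now: the JS Buffer may be mutated or collected while the worker runs
std::vector<uint8_t> copy(input.Data(), input.Data() + input.Length());
auto *w = new CompressWorker(info.Env(), std::move(copy), read_level(info, 1));
Promise promise = w->GetPromise();
w->Queue(); // AsyncWorker deletes itself after OnOK/OnError
return promise;
}
// An async fn becomes a Promise on the JS side; spawn_blocking moves the CPU
// work off the async runtime onto tokio's blocking thread pool.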
#[napi]
pub async fn compress_async(input: Buffer, opts: Option<CompressOptions>) -> Result<Buffer> {
let level = level_of(opts);
let data = input.to_vec();
let out = tokio::task::spawn_blocking(move || deflate(&data, level))
.await
.map_err(|e| Error::from_reason(e.to_string()))?;
Ok(out.into())
}
Object lifecycle
The streaming Deflater owns native state tied to a garbage-collected object. Node-API frees it in a destructor that runs as a finalizer. napi.rs lets the Drop trait do it, with the boundary lifetime managed by Node-API references, not Rust lifetimes. Both fire on GC, so the timing is non-deterministic.
// A streaming handle that owns an mz_stream. ObjectWrap frees it in the
// destructor, which the runtime runs as a finalizer when the JS object is GC'd.
static std::vector<uint8_t> drain(Napi::Env env, mz_stream &s,
const uint8_t *in, size_t len, int32_t flush) {
std::vector<uint8_t> out;
unsigned char buf[16384];
s.next_in = in;
s.avail_in = static_cast<unsigned int>(len);
int32_t rc;
do {
s.next_out = buf;
s.avail_out = sizeof(buf);
rc = mz_deflate(&s, flush);
if (rc != MZ_OK && rc != MZ_STREAM_END && rc != MZ_BUF_ERROR)
throw Error::New(env, std::string("deflate failed: ") + mz_error(rc));
out.insert(out.end(), buf, buf + (sizeof(buf) - s.avail_out));
} while (s.avail_out == 0 || (flush == MZ_FINISH && rc != MZ_STREAM_END));
return out;
}
class Deflater : public ObjectWrap<Deflater> {
mz_stream strm_{};
bool active_ = false;
public:
static Object Init(Napi::Env env, Object exports);
Deflater(const CallbackInfo &info) : ObjectWrap<Deflater>(info) {
int32_t level = 6;
if (info.Length() > 0 && info[0].IsObject()) {
Object o = info[0].As<Object>();
if (o.Has("level")) level = o.Get("level").As<Number>().Int32Value();
}
if (mz_deflateInit(&strm_, level) != MZ_OK)
throw Error::New(info.Env(), "deflateInit failed");
active_ = true;
}
~Deflater() { // finalizer: runs when the JS object is collected
if (active_) mz_deflateEnd(&strm_);
}
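Napi::Value Push(const CallbackInfo &info) {
if (!active_) throw Error::New(info.Env(), "Deflater already finished");
if (!info[0].IsBuffer()) throw TypeError::New(info.Env(), "chunk must be a Buffer");
auto chunk = info[0].As<Buffer<uint8_t>>();
auto out = drain(info.Env(), strm_, chunk.Data(), chunk.Length(), MZ_NO_FLUSH);
return Buffer<uint8_t>::Copy(info.Env(), out.data(), out.size());
}
Napi::Value Finish(const CallbackInfo &info) {
if (!active_) throw Error::New(info.Env(), "Deflater already finished");
auto out = drain(info.Env(), strm_, nullptr, 0, MZ_FINISH);
mz_deflateEnd(&strm_);
active_ = false;
return Buffer<uint8_t>::Copy(info.Env(), out.data(), out.size());
}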
// [Symbol.dispose]: deterministic cleanup for `using`. Idempotent, so it is
// safe whether or not finish() already ran.
Napi::Value Dispose(const CallbackInfo &info) {
if (active_) { mz_deflateEnd(&strm_); active_ = false; }
return info.Env().Undefined();
}
};
Object Deflater::Init(Napi::Env env, Object exports) {
Function f = DefineClass(env, "Deflater", {
InstanceMethod("push", &Deflater::Push),
InstanceMethod("finish", &Deflater::Finish),
InstanceMethod(Symbol::WellKnown(env, "dispose"), &Deflater::Dispose),
});
exports.Set("Deflater", f);
return exports;
}
#[napi]
pub struct Deflater {
enc: Option<ZlibEncoder<Vec<u8>>>,
}
#[napi]
impl Deflater {
#[napi(constructor)]
pub fn new(opts: Option<CompressOptions>) -> Self {
Deflater { enc: Some(ZlibEncoder::new(Vec::new(), Compression::new(level_of(opts)))) }
}
#[napi]
pub fn push(&mut self, chunk: Buffer) -> Result<Buffer> {
let enc = self.enc.as_mut().ok_or_else(|| Error::from_reason("finished"))?;
enc.write_all(&chunk).map_err(|e| Error::from_reason(e.to_string()))?;
let produced: Vec<u8> = enc.get_mut().drain(..).collect();
Ok(produced.into())
}
#[napi]
pub fn finish(&mut self) -> Result<Buffer> {
let enc = self.enc.take().ok_or_else(|| Error::from_reason("already finished"))?;
Ok(enc.finish().map_err(|e| Error::from_reason(e.to_string()))?.into())
}
// Explicit, idempotent cleanup: drop the encoder now instead of waiting for GC.
#[napi]
pub fn dispose(&mut self) {
drop(self.enc.take());
}
// napi.rs has no native Symbol.dispose, so bind it to dispose() from Rust.
// Call once after construction; then `using d = ...` frees deterministically.
#[napi]
pub fn register_disposer(&self, env: Env, mut this: This) -> Result<()> {
// Symbol is a function, so read it unchecked, then grab the well-known
// Symbol.dispose value and assign this[Symbol.dispose] = this.dispose.
let symbol: Object = env.get_global()?.get_named_property_unchecked("Symbol")?;
let dispose_key: Unknown = symbol.get_named_property("dispose")?;
let dispose_fn: Function = this.get_named_property("dispose")?;
this.set_property(dispose_key, dispose_fn)?;
Ok(())
}
}
// GC safety net via the Drop trait. flate2's ZlibEncoder already implements Drop,
// so the encoder frees itself when the struct is collected even if nobody called
// dispose(); for a raw FFI handle this is where you would free it.
impl Drop for Deflater {
fn drop(&mut self) {
drop(self.enc.take());
}
}
using makes cleanup deterministic: it calls the handle's Symbol.dispose method at the end of the enclosing scope, freeing the native state right there. C++ registers the symbol natively, through the Symbol::WellKnown(env, "dispose") entry in Init. napi.rs has no built-in equivalent, so register_disposer binds it from Rust. WebAssembly has neither, so you free the pointer (an offset into linear memory) yourself.
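For the wasm build the whole lifecycle lives in JavaScript. One way to get the same two layers the native bindings have, deterministic Symbol.dispose plus a GC safety net, is a small owner class with a FinalizationRegistry. This is a sketch: WasmHandle and FreeFn are hypothetical names, and free would be the module's exported free.

```typescript
// Frees one pointer, e.g. the wasm module's exported free().
export type FreeFn = (ptr: number) => void;

// GC safety net, the JS counterpart of a native finalizer: if a handle is
// collected without being disposed, free its pointer then. Timing is up to the GC.
// The held value must not reference the handle itself, or it is never collected.
const registry = new FinalizationRegistry<{ ptr: number; free: FreeFn }>(({ ptr, free }) => free(ptr));

export class WasmHandle {
  #ptr: number;
  readonly #free: FreeFn;

  constructor(ptr: number, free: FreeFn) {
    this.#ptr = ptr;
    this.#free = free;
    registry.register(this, { ptr, free }, this); // `this` doubles as the unregister token
  }

  get ptr(): number {
    if (this.#ptr === 0) throw new Error("WasmHandle already disposed");
    return this.#ptr;
  }

  // Deterministic cleanup for `using`. Idempotent: the first call unregisters the
  // finalizer and frees; later calls do nothing, so there is no double free.
  [Symbol.dispose](): void {
    if (this.#ptr === 0) return;
    registry.unregister(this);
    this.#free(this.#ptr);
    this.#ptr = 0;
  }
}
```

With it, `{ using h = new WasmHandle(wasm.malloc(64), wasm.free); ... }` frees at the closing brace, and a forgotten handle is still freed eventually by the registry.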
Using it from TypeScript
From TypeScript the two native addons read almost the same. The differences: napi.rs ships .d.ts types generated from the Rust signatures, and its Deflater needs registerDisposer() called once, best hidden behind a factory, before using works.
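The C++ build gets no generated types. A hand-written declaration plus a one-time runtime shape check is one reasonable substitute. The interface below is transcribed from the signature comments in the C++ code and the calls in this post; asFastDeflate and the interface names are hypothetical.

```typescript
export interface CompressOptions {
  level?: number;
}

export interface DeflaterHandle {
  push(chunk: Buffer): Buffer;
  finish(): Buffer;
  [Symbol.dispose](): void;
}

export interface FastDeflate {
  compress(input: Buffer, opts?: CompressOptions): Buffer;
  decompress(input: Buffer): Buffer;
  compressAsync(input: Buffer, opts?: CompressOptions): Promise<Buffer>;
  Deflater: new (opts?: CompressOptions) => DeflaterHandle;
}

const REQUIRED = ["compress", "decompress", "compressAsync", "Deflater"] as const;

// require() returns `any`. Check the shape once at load time, so a stale or wrong
// .node fails with a clear message instead of "is not a function" on first use.
export function asFastDeflate(mod: unknown, from = "addon"): FastDeflate {
  const m = mod as Record<string, unknown> | null | undefined;
  const missing = REQUIRED.filter((k) => typeof m?.[k] !== "function");
  if (missing.length > 0) throw new TypeError(`${from} is missing: ${missing.join(", ")}`);
  return mod as FastDeflate;
}
```

Then `const addon = asFastDeflate(require("./build/fast_deflate.node"), "fast_deflate.node");` gives the C++ build a typed surface comparable to what napi.rs generates. The NaN build would fail this check by design, since it exports only the sync functions.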
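// C++ (Node-API): load the built .node. In an ES module (needed for top-level
// await), require() has to be created first.
import { createRequire } from "node:module";
const require = createRequire(import.meta.url);
const { compressAsync, Deflater } = require("./build/fast_deflate.node");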
// one-shot, runs off the event loop
const packed: Buffer = await compressAsync(bigText, { level: 6 });
// streaming handle. `using` calls its [Symbol.dispose] at the end of this scope,
// freeing the native state deterministically instead of waiting for the GC.
using d = new Deflater({ level: 9 });
const out: Buffer = Buffer.concat([d.push(chunkA), d.push(chunkB), d.finish()]);
// napi.rs generates index.js plus index.d.ts typings from the Rust signatures.
import { Deflater as RsDeflater, compressAsync as rsCompress } from "fast-deflate";
const rsPacked: Buffer = await rsCompress(bigText, { level: 6 });
// napi.rs binds Symbol.dispose from Rust; call registerDisposer() once, then `using`.
function newDeflater(opts?: { level?: number }) {
const d = new RsDeflater(opts);
d.registerDisposer();
return d;
}
using rd = newDeflater({ level: 9 });
rd.push(chunkA);
Testing it
Because the core is a plain function, it is unit-tested without Node: GoogleTest for C++, cargo test for Rust. The built addons are then tested end-to-end from TypeScript with ava, which also covers wasm and uses Node’s own zlib to check that the output is standard zlib.
#include <gtest/gtest.h>
#include "deflate.hpp"
#include <string>
#include <vector>
#include <stdexcept>
using namespace fastdeflate;
static std::vector<uint8_t> bytes(const std::string &s) { return {s.begin(), s.end()}; }
TEST(Deflate, RoundTrips) {
std::string s;
for (int i = 0; i < 500; i++) s += "hello miniz ";
auto in = bytes(s);
auto packed = deflate(in.data(), in.size(), 6);
EXPECT_LT(packed.size(), in.size());
EXPECT_EQ(inflate(packed.data(), packed.size()), in);
}
TEST(Deflate, EmptyInput) {
auto packed = deflate(nullptr, 0, 6);
EXPECT_TRUE(inflate(packed.data(), packed.size()).empty());
}
TEST(Deflate, HigherLevelIsNotLarger) {
std::string s;
for (int i = 0; i < 2000; i++) s += "abcdefgh";
auto in = bytes(s);
EXPECT_LE(deflate(in.data(), in.size(), 9).size(),
deflate(in.data(), in.size(), 1).size());
}
TEST(Inflate, RejectsGarbage) {
std::vector<uint8_t> junk = {0xde, 0xad, 0xbe, 0xef, 0x00, 0x11};
EXPECT_THROW(inflate(junk.data(), junk.size()), std::runtime_error);
}
Running main() from /home/fkunc/Projects/FilipKuncCom/scripts/addon-example/cpp/build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 4 tests from 2 test suites.
[----------] Global test environment set-up.
[----------] 3 tests from Deflate
[ RUN ] Deflate.RoundTrips
[ OK ] Deflate.RoundTrips (0 ms)
[ RUN ] Deflate.EmptyInput
[ OK ] Deflate.EmptyInput (0 ms)
[ RUN ] Deflate.HigherLevelIsNotLarger
[ OK ] Deflate.HigherLevelIsNotLarger (0 ms)
[----------] 3 tests from Deflate (0 ms total)
[----------] 1 test from Inflate
[ RUN ] Inflate.RejectsGarbage
[ OK ] Inflate.RejectsGarbage (0 ms)
[----------] 1 test from Inflate (0 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 2 test suites ran. (0 ms total)
[ PASSED ] 4 tests.
#[cfg(test)]
mod tests {
use super::{deflate, inflate};
#[test]
fn round_trips() {
let data = b"hello miniz ".repeat(500);
let packed = deflate(&data, 6);
assert!(packed.len() < data.len());
assert_eq!(inflate(&packed).unwrap(), data);
}
#[test]
fn higher_level_is_not_larger() {
let data = b"abcdefgh".repeat(2000);
assert!(deflate(&data, 9).len() <= deflate(&data, 1).len());
}
#[test]
fn inflate_rejects_garbage() {
assert!(inflate(&[0xde, 0xad, 0xbe, 0xef]).is_err());
}
}
running 3 tests
test tests::inflate_rejects_garbage ... ok
test tests::round_trips ... ok
test tests::higher_level_is_not_larger ... ok
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
import test from "ava";
import zlib from "node:zlib";
import { createRequire } from "node:module";
const require = createRequire(import.meta.url);
// the two real addons built by this workspace
const addons: Record<string, any> = {
cpp: require("../cpp/build/fast_deflate.node"),
rust: require("../rust/index.js"),
};
for (const [name, m] of Object.entries(addons)) {
test(`${name}: compress round-trips`, (t) => {
const input = Buffer.from("hello miniz ".repeat(500));
const packed = m.compress(input, { level: 6 });
t.true(packed.length < input.length);
t.deepEqual(m.decompress(packed), input);
});
test(`${name}: compressAsync resolves a Buffer`, async (t) => {
const input = Buffer.from("async ".repeat(1000));
const packed = await m.compressAsync(input);
t.deepEqual(m.decompress(packed), input);
});
test(`${name}: Deflater streams chunks`, (t) => {
const d = new m.Deflater({ level: 9 });
const out = Buffer.concat([d.push(Buffer.from("aaaa")), d.push(Buffer.from("bbbb")), d.finish()]);
t.deepEqual(m.decompress(out), Buffer.from("aaaabbbb"));
});
test(`${name}: output is standard zlib`, (t) => {
const input = Buffer.from("interop ".repeat(200));
t.deepEqual(zlib.inflateSync(m.compress(input)), input); // Node's own zlib reads it
});
}
// NaN exposes only the sync API (it is the legacy binding), so test just that.
const nan = require("../nan/build/Release/nan_deflate.node");
test("nan: compress round-trips and is standard zlib", (t) => {
const input = Buffer.from("hello nan ".repeat(300));
const packed = nan.compress(input, { level: 6 });
t.deepEqual(nan.decompress(packed), input);
t.deepEqual(zlib.inflateSync(packed), input);
});
// `using` calls the handle's [Symbol.dispose] at scope end: deterministic cleanup,
// no waiting for the GC finalizer.
test("cpp: using disposes the Deflater handle", (t) => {
let out: Buffer;
{
using d = new addons.cpp.Deflater({ level: 9 });
out = Buffer.concat([d.push(Buffer.from("aaaa")), d.finish()]);
}
t.deepEqual(addons.cpp.decompress(out), Buffer.from("aaaa"));
t.is(typeof new addons.cpp.Deflater()[Symbol.dispose], "function");
});
// napi.rs binds Symbol.dispose from Rust (registerDisposer), so `using` works natively.
const newDeflater = (opts?: { level?: number }) => {
const d = new addons.rust.Deflater(opts);
d.registerDisposer();
return d;
};
test("rust: using disposes via native registerDisposer", (t) => {
let out: Buffer;
{
using d = newDeflater({ level: 9 });
out = Buffer.concat([d.push(Buffer.from("bbbb")), d.finish()]);
}
t.deepEqual(addons.rust.decompress(out), Buffer.from("bbbb"));
t.is(typeof newDeflater()[Symbol.dispose], "function");
});
✔ wasm › wasm: compress round-trips
✔ wasm › wasm: output is standard zlib
✔ lib › cpp: compress round-trips
✔ lib › cpp: Deflater streams chunks
✔ lib › cpp: output is standard zlib
✔ lib › rust: compress round-trips
✔ lib › rust: Deflater streams chunks
✔ lib › rust: output is standard zlib
✔ lib › nan: compress round-trips and is standard zlib
✔ lib › cpp: using disposes the Deflater handle
✔ lib › rust: using disposes via native registerDisposer
✔ lib › cpp: compressAsync resolves a Buffer
✔ lib › rust: compressAsync resolves a Buffer
─
13 tests passed
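All of these tests use fixed, highly repetitive inputs. A cheap way to widen coverage, sketched here as a suggestion rather than part of the workspace, is a seeded randomized round-trip that any binding can be dropped into, with Node's zlib as the independent oracle. Codec and checkRoundTrips are hypothetical names.

```typescript
import { inflateSync } from "node:zlib";

// The subset of the addon API the check needs; the C++, Rust and NaN builds all match it.
export interface Codec {
  compress(input: Buffer, opts?: { level?: number }): Buffer;
  decompress(input: Buffer): Buffer;
}

// mulberry32: a tiny seeded PRNG, so any failure can be replayed from its seed.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Random length (0 included) and random alphabet size: a 1-symbol alphabet
// compresses extremely well, a 256-symbol one barely at all.
function randomInput(rand: () => number): Buffer {
  const buf = Buffer.alloc(Math.floor(rand() * 8192));
  const alphabet = 1 + Math.floor(rand() * 256);
  for (let i = 0; i < buf.length; i++) buf[i] = Math.floor(rand() * alphabet);
  return buf;
}

export function checkRoundTrips(codec: Codec, seed = 1, cases = 100): void {
  const rand = mulberry32(seed);
  for (let i = 0; i < cases; i++) {
    const input = randomInput(rand);
    const level = Math.floor(rand() * 10); // deflate levels 0..9
    const packed = codec.compress(input, { level });
    const where = `seed ${seed}, case ${i}, level ${level}, ${input.length} bytes`;
    if (!inflateSync(packed).equals(input)) throw new Error(`${where}: Node's zlib disagrees`);
    if (!codec.decompress(packed).equals(input)) throw new Error(`${where}: round-trip mismatch`);
  }
}
```

In the ava file it would slot into the per-addon loop as `test(`${name}: random round-trips`, (t) => t.notThrows(() => checkRoundTrips(m)));`.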
Building and shipping it
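Only the C and C++ bindings need a build system. node-gyp is the traditional default: it is built on gyp-next, the Node.js project's fork of the GYP generator that Chromium abandoned, and it needs Python, plus the MSVC build tools on Windows. Plain CMake is more portable and can pull the Node-API headers straight from the node-api-headers and node-addon-api npm packages. The Rust side is a Cargo.toml with the compressor as an ordinary dependency, plus a build.rs that calls napi_build::setup().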
cmake_minimum_required(VERSION 3.15)
project(fast_deflate C CXX)
# Node-API headers come straight from the npm packages, so no node-gyp or cmake-js.
# Run node from this directory so require.resolve finds its node_modules.
execute_process(COMMAND node -p "require('path').dirname(require.resolve('node-api-headers/include/node_api.h'))"
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE NODE_API_INC OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(COMMAND node -p "require('path').dirname(require.resolve('node-addon-api/napi.h'))"
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE NODE_ADDON_API_INC OUTPUT_STRIP_TRAILING_WHITESPACE)
add_library(fast_deflate SHARED binding.cpp deflate.cpp ../vendor/miniz.c)
set_target_properties(fast_deflate PROPERTIES PREFIX "" SUFFIX ".node")
target_include_directories(fast_deflate PRIVATE
${CMAKE_CURRENT_SOURCE_DIR} ../vendor ${NODE_API_INC} ${NODE_ADDON_API_INC})
target_compile_definitions(fast_deflate PRIVATE NAPI_VERSION=10 NAPI_CPP_EXCEPTIONS)
# Node resolves the addon's symbols at load time, so leave them undefined.
# Linux allows that by default and macOS needs the flag below; Windows instead
# needs an import library for Node's exports, which this file does not set up.
if(APPLE)
target_link_options(fast_deflate PRIVATE -undefined dynamic_lookup)
endif()
[package]
name = "fast-deflate"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["cdylib", "rlib"]
[dependencies]
napi = { version = "3", features = ["tokio_rt"] }
napi-derive = "3"
flate2 = "1"
tokio = { version = "1", features = ["rt"] }
[build-dependencies]
napi-build = "2"
Building each is a one-liner, and napi.rs reaches WebAssembly from the same crate, one rustup target away.
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build # this machine
napi build --release
# or every platform from one Linux runner: --use-napi-cross uses napi.rs's own
# cross toolchain (Linux gnu targets), -x (--cross-compile) uses cargo-zigbuild or cargo-xwin
napi build --release --target aarch64-apple-darwin -x # macOS arm64
napi build --release --target x86_64-apple-darwin -x # macOS x64
napi build --release --target aarch64-unknown-linux-gnu --use-napi-cross # Linux arm64
napi build --release --target x86_64-unknown-linux-musl -x # Alpine / musl
napi build --release --target aarch64-pc-windows-msvc -x # Windows arm64
napi build --release --target x86_64-pc-windows-msvc -x # Windows x64
emcc -O3 -DMINIZ_NO_ZLIB_COMPATIBLE_NAMES -DMINIZ_NO_STDIO -Ivendor \
wasm/wasm_binding.c vendor/miniz.c -o fast_deflate.wasm \
-sSTANDALONE_WASM -Wl,--no-entry \
-sEXPORTED_FUNCTIONS=_compress,_decompress,_malloc,_free -sINITIAL_MEMORY=67108864
# the same crate compiled to wasm via emnapi, one target away
rustup target add wasm32-wasip1-threads
napi build --release --target wasm32-wasip1-threads
On a user’s machine nothing should compile, so ship prebuilt binaries and let a loader pick the right one: node-gyp-build for C++ prebuilds, the generated index.js for napi.rs. For Rust, the commands above are the whole distribution story: from one Linux runner, cargo-zigbuild and cargo-xwin cross-compile every platform, Windows and arm64 included, and each .node is published as its own package listed in optionalDependencies. Each platform package declares its os and cpu, so npm install downloads only the one matching the user’s machine. C++ reaches the same end with prebuilds, but needs one CI job per operating system, because cross-compiling a C++ addon for macOS or Windows from Linux is impractical. WebAssembly skips the matrix entirely with one artifact everywhere, and the same Rust crate compiles to wasm through emnapi.
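Whichever tool does it, the loader's job is small: work out the host's platform, CPU and libc, try the matching prebuilt package, and fall back to the portable wasm build. A sketch of that logic, assuming a hypothetical package naming scheme modeled on the per-platform packages napi.rs generates (`<name>-<platform>-<arch>[-<abi>]`, plus `<name>-wasm32-wasi`); the load callback is injected so the sketch runs without any binaries installed.

```typescript
export interface Target {
  platform: string;
  arch: string;
  musl: boolean;
}

// The host this process runs on. glibc builds of Node report the runtime glibc
// version in the diagnostic report header; musl builds (e.g. Alpine) do not.
export function currentTarget(): Target {
  const report = process.report?.getReport() as { header?: { glibcVersionRuntime?: string } } | undefined;
  const musl = process.platform === "linux" && !report?.header?.glibcVersionRuntime;
  return { platform: process.platform, arch: process.arch, musl };
}

// Candidate package names, most specific first, ending with the portable wasm build.
export function candidates(pkg: string, t: Target): string[] {
  const abi = t.platform === "linux" ? (t.musl ? "-musl" : "-gnu") : t.platform === "win32" ? "-msvc" : "";
  return [`${pkg}-${t.platform}-${t.arch}${abi}`, `${pkg}-wasm32-wasi`];
}

// Try each candidate with the injected loader (require in practice); the first that
// loads wins. If none does, report every failure, not just the last one.
export function loadFirst<T>(ids: string[], load: (id: string) => T): T {
  const errors: unknown[] = [];
  for (const id of ids) {
    try {
      return load(id);
    } catch (err) {
      errors.push(err);
    }
  }
  throw new AggregateError(errors, `no binding could be loaded (tried ${ids.join(", ")})`);
}
```

In an entry point this becomes `loadFirst(candidates("fast-deflate", currentTarget()), require)`, with require created via createRequire in an ES module.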