Every image decoder in the Rust ecosystem does the same thing: it takes compressed bytes and gives you pixels. That is exactly right for 99.9% of use cases. But when you need to reach inside a JPEG and manipulate the quantized DCT coefficients directly – without ever touching the pixel domain – you discover a gap in the ecosystem that no amount of clever API usage can bridge.

This is the story of building a pure-Rust JPEG coefficient codec for Phasm, a steganography app that hides encrypted text messages inside JPEG photos. The codec is roughly 1,000 lines of Rust, has zero external crate dependencies, produces byte-for-byte identical output on round-trip, and compiles to native code for iOS/Android and WebAssembly for the browser.

The Problem: Why Standard Decoders Are Not Enough

What steganography needs

JPEG steganography operates in the transform domain. When a JPEG is created, the encoder splits the image into 8x8 pixel blocks, applies the Discrete Cosine Transform to each block, quantizes the resulting coefficients against a quality-dependent quantization table, and then Huffman-encodes the quantized coefficients into a compressed bitstream.

Modern adaptive steganography algorithms like J-UNIWARD work by carefully modifying these quantized DCT coefficients – nudging selected values by +1 or -1 in regions where the change is statistically hardest to detect. The key requirement is direct, lossless access to the coefficients:

  1. Read the quantized DCT coefficients from an existing JPEG.
  2. Modify specific coefficients in-place (typically by +1 or -1).
  3. Write the modified coefficients back into a valid JPEG, preserving all other structure.

If at any point the coefficients pass through a decode-to-pixels-then-re-encode cycle, the re-quantization step introduces rounding errors that destroy both the original statistical properties of the image and any previously embedded data. The codec must operate entirely in the coefficient domain.

What existing Rust crates provide

We surveyed every JPEG-related crate in the Rust ecosystem. Here is what we found:

Crate | Type | Exposes DCT coefficients? | Round-trip capable? | Notes
image (image-rs) | High-level image library | No | No | Decodes to pixel buffers only
jpeg-decoder | Baseline JPEG decoder | No | No (decode only) | In maintenance mode; no encoder
zune-jpeg | Fast decoder (SSE/AVX) | No | No (decode only) | Targets pixel output; coefficients are internal
mozjpeg (Rust bindings) | Encoder via C FFI | Partially (via libjpeg API) | Requires C FFI | Wraps C library; cannot compile to wasm32-unknown-unknown
turbojpeg (Rust bindings) | Encoder/decoder via C FFI | Partially (via libjpeg-turbo API) | Requires C FFI | Same C FFI constraint

The pattern is clear. Pure Rust crates decode JPEGs to pixels and discard the coefficients. C-based libraries like libjpeg-turbo do expose coefficient access (this is how tools like jpegtran perform lossless transformations), but they require C FFI – which is not available on the wasm32-unknown-unknown target we need for browser deployment.

The WASM constraint

Phasm’s architecture is a shared Rust core that compiles to three targets: native libraries for iOS and Android, and WebAssembly for the web client. The WASM target uses wasm32-unknown-unknown, which is designed for JavaScript interoperability via wasm-bindgen – not for linking C libraries. The ABI definitions for this target are minimal and do not match what clang produces for C code, making libjpeg-turbo linkage fragile at best and broken at worst.

This left us with two options:

  1. Fork zune-jpeg and surgically expose the coefficient arrays before the dequantization and IDCT stages.
  2. Build a custom JPEG coefficient codec from scratch.

We chose option 2. A fork of zune-jpeg would have given us decode-side access, but we also needed to encode modified coefficients back into a valid JPEG bitstream – and zune-jpeg is a decoder only. Building from scratch gave us full control over both directions and kept the dependency count at zero.

JPEG Internals: A Crash Course

To understand what the codec must do, you need a working model of how a JPEG file is structured. This section is the mental model that guided our implementation.

File structure: markers and segments

A JPEG file is a sequence of markers, each introduced by the byte 0xFF followed by a marker type byte. Between markers are segments containing the marker’s payload data. The structure looks like this:

                graph TD
                    A["SOI (0xFFD8)<br/>Start of Image"] --> B["APP0 (0xFFE0)<br/>JFIF Header"]
                    B --> C["DQT (0xFFDB)<br/>Quantization Tables"]
                    C --> D["SOF0 (0xFFC0)<br/>Start of Frame<br/>(dimensions, components)"]
                    D --> E["DHT (0xFFC4)<br/>Huffman Tables"]
                    E --> F["SOS (0xFFDA)<br/>Start of Scan<br/>(scan header)"]
                    F --> G["Entropy-Coded Data<br/>(compressed coefficients)"]
                    G --> H["EOI (0xFFD9)<br/>End of Image"]

For a JPEG file, the core markers we must parse are:

Marker | Hex | Purpose
SOI | 0xFFD8 | Signals the start of a JPEG file
APP0-APP15 | 0xFFE0-0xFFEF | Application-specific metadata (JFIF, EXIF, ICC profiles)
DQT | 0xFFDB | Defines one or more quantization tables (up to 4)
SOF0 | 0xFFC0 | Start of Frame (baseline): image width, height, number of components, sampling factors
SOF2 | 0xFFC2 | Start of Frame (progressive): same fields as SOF0, but data arrives in multiple scans
DHT | 0xFFC4 | Defines one or more Huffman tables (DC and AC, up to 4 of each)
SOS | 0xFFDA | Start of Scan: which components and Huffman tables to use, followed by the compressed data
EOI | 0xFFD9 | Signals the end of the image

Everything between SOI and SOS is header data that we can largely preserve verbatim. The entropy-coded data after SOS is where the coefficients live, and that is where all the complexity resides.

The compression pipeline

JPEG compression follows a well-defined pipeline. For steganography, we need to intercept this pipeline at a very specific point:

                graph LR
                    subgraph "Standard Decoder Path"
                        A1["JPEG bytes"] --> B1["Parse markers"]
                        B1 --> C1["Huffman decode"]
                        C1 --> D1["Dequantize"]
                        D1 --> E1["Inverse DCT"]
                        E1 --> F1["Pixels"]
                    end

                    subgraph "Steganography Codec Path"
                        A2["JPEG bytes"] --> B2["Parse markers"]
                        B2 --> C2["Huffman decode"]
                        C2 --> D2["Quantized DCT<br/>coefficients"]
                        D2 --> E2["Modify coefficients<br/>(embed message)"]
                        E2 --> F2["Huffman encode"]
                        F2 --> G2["Reassemble JPEG"]
                        G2 --> H2["JPEG bytes"]
                    end
                    style D2 fill:#2d5016,stroke:#4a8c2a,color:#fff
                    style E2 fill:#2d5016,stroke:#4a8c2a,color:#fff

A standard decoder runs the full pipeline from compressed bytes to pixels. Our codec stops at the quantized DCT coefficients, allows modification, and then runs the pipeline in reverse. The dequantization and IDCT stages are never executed.

The DCT transform

The 2D Discrete Cosine Transform converts an 8x8 block of pixel values into 64 frequency coefficients. The forward transform is:

$$F_{u,v} = \frac{1}{4} \alpha(u) \alpha(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\left[\frac{(2x+1)u\pi}{16}\right] \cos\left[\frac{(2y+1)v\pi}{16}\right]$$

where $\alpha(0) = \frac{1}{\sqrt{2}}$ and $\alpha(k) = 1$ for $k > 0$.

The coefficient at position $(0, 0)$ is the DC coefficient – the average brightness of the block. All other positions are AC coefficients representing progressively higher spatial frequencies. Steganography algorithms typically modify only AC coefficients, since DC changes affect the entire block’s brightness and are more easily detected.
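As a reference point, the transform can be computed directly from the definition. This naive sketch is shown only to ground the notation – real encoders use fast factorizations, and our codec never executes this stage at all:

```rust
use std::f64::consts::PI;

/// Naive forward 2D DCT of one 8x8 block, straight from the definition.
/// out[0][0] is the DC coefficient; everything else is AC.
fn forward_dct(block: &[[f64; 8]; 8]) -> [[f64; 8]; 8] {
    let alpha = |k: usize| if k == 0 { 1.0 / 2f64.sqrt() } else { 1.0 };
    let mut out = [[0.0f64; 8]; 8];
    for u in 0..8 {
        for v in 0..8 {
            let mut sum = 0.0;
            for x in 0..8 {
                for y in 0..8 {
                    sum += block[x][y]
                        * ((2 * x + 1) as f64 * u as f64 * PI / 16.0).cos()
                        * ((2 * y + 1) as f64 * v as f64 * PI / 16.0).cos();
                }
            }
            out[u][v] = 0.25 * alpha(u) * alpha(v) * sum;
        }
    }
    out
}
```

A useful sanity check for any DCT implementation: a constant block f(x,y) = c has DC coefficient 8c and all AC coefficients zero.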

Quantization

After the DCT, each coefficient is divided by a corresponding entry in the quantization table and rounded to the nearest integer:

$$F^q_{u,v} = \text{round}\left(\frac{F_{u,v}}{Q_{u,v}}\right)$$

This is the primary lossy step in JPEG compression. Higher quality factors produce smaller quantization values, so more precision survives the division and rounding. The quantized coefficients $F^q_{u,v}$ are exactly the values our codec reads and writes. They are integers, typically in the range of -1024 to +1023 for baseline JPEG.
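To make the loss concrete, here is a minimal quantize/dequantize pair (illustrative helper names, not the production API):

```rust
/// Quantize one DCT coefficient against its quantization-table entry.
fn quantize(coeff: f64, q: u16) -> i16 {
    (coeff / q as f64).round() as i16
}

/// Dequantize: the decoder can only ever recover a multiple of q.
fn dequantize(coeff: i16, q: u16) -> f64 {
    coeff as f64 * q as f64
}
```

A coefficient of 100.0 quantized with q = 16 is stored as 6 and dequantizes to 96.0 – precisely the kind of rounding error that makes a decode-to-pixels-then-re-encode cycle destructive for embedded data.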

Zigzag ordering

The 64 coefficients in each 8x8 block are not stored in row-major order. Instead, they follow a zigzag scan pattern that traverses the block from low frequencies (top-left) to high frequencies (bottom-right):

$$\text{zigzag}: (0,0) \to (0,1) \to (1,0) \to (2,0) \to (1,1) \to (0,2) \to (0,3) \to (1,2) \to \cdots \to (7,7)$$

This ordering groups low-frequency coefficients (which tend to be non-zero) at the beginning and high-frequency coefficients (which are often zero after quantization) at the end. This arrangement enables efficient run-length encoding of trailing zeros during entropy coding.
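The traversal can be generated rather than hard-coded. A sketch of the standard boundary-bounce walk (the production codec uses a precomputed table, but the generation logic is the same pattern):

```rust
/// Generate the zigzag scan order: order[i] is the (row, col) position
/// in the 8x8 block of the i-th coefficient in the bitstream.
fn zigzag_order() -> [(usize, usize); 64] {
    let mut order = [(0usize, 0usize); 64];
    let (mut row, mut col) = (0usize, 0usize);
    for i in 0..64 {
        order[i] = (row, col);
        if (row + col) % 2 == 0 {
            // Moving up-right; bounce off the top row and right edge.
            if col == 7 { row += 1; } else if row == 0 { col += 1; }
            else { row -= 1; col += 1; }
        } else {
            // Moving down-left; bounce off the left column and bottom edge.
            if row == 7 { col += 1; } else if col == 0 { row += 1; }
            else { row += 1; col -= 1; }
        }
    }
    order
}
```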

Huffman coding of coefficients

The quantized, zigzag-ordered coefficients are entropy-coded using Huffman coding. JPEG uses a clever scheme that separates the category of a value (how many bits it needs) from the value itself:

  • DC coefficients are differentially coded: each DC value is stored as the difference from the previous block’s DC value. A Huffman code specifies the number of additional bits needed, followed by those bits.
  • AC coefficients are run-length coded: each Huffman symbol encodes a (run, size) pair, where run is the number of preceding zeros and size is the category of the next non-zero coefficient. Special symbols mark end-of-block (all remaining coefficients are zero) and runs of 16 zeros.
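Both cases rely on the same "additional bits" convention for the value itself. A sketch of the encode/decode pair (hypothetical helper names; negative values are stored in a one's-complement style):

```rust
/// Store a nonzero value of category `size` as `size` raw bits:
/// positives directly, negatives offset by 2^size - 1.
/// JPEG categories run 1..=11, so the i16 shifts below cannot overflow.
fn encode_signed(value: i16, size: u8) -> u16 {
    debug_assert!((1..=11).contains(&size));
    if value >= 0 {
        value as u16
    } else {
        (value + ((1i16 << size) - 1)) as u16
    }
}

/// Inverse mapping (the EXTEND procedure in T.81 terms): a bit field
/// whose top bit is 0 encodes a negative value.
fn decode_signed(bits: u16, size: u8) -> i16 {
    if (bits as i32) < (1i32 << (size - 1)) {
        bits as i16 - ((1i16 << size) - 1)
    } else {
        bits as i16
    }
}
```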

The information-theoretic entropy of the Huffman coding can be expressed as:

$$H = -\sum_{i} p_i \log_2 p_i$$

where $p_i$ is the probability of symbol $i$. JPEG’s Huffman tables are designed to approach this bound, typically achieving within 5-10% of the theoretical minimum for natural images.
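The bound itself is easy to compute from a symbol histogram (a generic sketch, not part of the codec):

```rust
/// Shannon entropy, in bits per symbol, of an empirical distribution
/// given as raw symbol counts.
fn entropy(counts: &[u64]) -> f64 {
    let total: u64 = counts.iter().sum();
    if total == 0 {
        return 0.0;
    }
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}
```

Comparing this value against the actual bits emitted per symbol is a quick way to measure how close a given Huffman table gets to the bound.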

Design Decisions

Build from scratch vs. fork

The decision to build from scratch rather than fork an existing decoder was driven by three factors:

Bidirectional operation. We need both decode (JPEG bytes to coefficients) and encode (coefficients to JPEG bytes). All existing pure-Rust JPEG crates are decode-only. Bolting an encoder onto a decoder fork would mean understanding and modifying someone else’s internal data structures – often more work than a clean-room implementation for a tightly scoped problem.

Minimal scope. We initially scoped the codec to baseline sequential JPEG with Huffman coding, which covers the vast majority of real-world photos. Progressive JPEG support was added later (see below), but arithmetic coding, CMYK color spaces, and other JPEG extensions remain out of scope. Scoping down to the essentials dramatically reduces complexity.

Byte-for-byte round-trip. Our correctness criterion is strict: reading a JPEG, modifying zero coefficients, and writing it back must produce identical bytes. This is much easier to achieve when you control every byte of the output yourself, rather than hoping a forked codebase preserves structure you did not write.

What to preserve, what to recompute

On a round-trip through our codec, the following are preserved byte-for-byte:

  • The SOI marker
  • All APP markers (JFIF, EXIF, ICC profiles) – passed through verbatim
  • DQT segments (quantization tables)
  • SOF0 segment (image dimensions and component definitions)
  • DHT segments (Huffman tables)
  • The SOS header (component selectors and table assignments)
  • The EOI marker

The only bytes that change are the entropy-coded data after the SOS header. Even when no coefficients are modified, we must re-encode this data to verify correctness – and our codec does produce identical entropy-coded bytes for unmodified coefficients, confirming that our Huffman encode and decode paths are exact inverses.

Zero external dependencies for the codec module

The codec module itself has no external crate dependencies – it is entirely self-contained within the Phasm core. This was a deliberate choice for several reasons:

  • WASM binary size. Every dependency adds to the compiled WASM module size, which directly affects page load time. Our codec contributes roughly 15-20 KB to the gzipped WASM binary.
  • Compilation target flexibility. Zero dependencies means zero chance of a transitive dependency pulling in something that does not compile on wasm32-unknown-unknown.
  • Auditability. For a security-focused application, being able to audit every line of code that handles the image data matters. The Phasm core is open source on GitHub (GPL-3.0), so the codec can be independently reviewed.

This zero-dependency philosophy extends beyond the codec. Other performance-critical modules in Phasm follow the same principle: the project includes an in-house FFT implementation (Cooley-Tukey for power-of-two sizes, Bluestein’s algorithm for arbitrary sizes) that replaced the rustfft crate, and a det_math module providing FDLIBM-based deterministic implementations of sin, cos, atan2, and hypot. These exist because standard Math.* functions in WASM are not guaranteed to produce bit-identical results across browsers and platforms – a requirement for Phasm’s J-UNIWARD cost computation, which depends on wavelet filter convolutions. By owning these implementations, we guarantee identical embedding decisions on every platform.

Implementation

The following sections describe the key implementation components. The code examples are illustrative and simplified – we do not want to publish the exact production code, but the concepts are faithful to the real implementation.

Marker parsing

Parsing JPEG markers is the most straightforward part of the codec. The structure is regular: every marker starts with 0xFF, followed by the marker type, followed by (for most markers) a two-byte big-endian length and then that many bytes of payload.

struct JpegMarker {
    marker_type: u8,
    data: Vec<u8>,
}

fn parse_markers(input: &[u8]) -> Result<Vec<JpegMarker>, CodecError> {
    let mut pos = 0;
    let mut markers = Vec::new();

    // Verify SOI
    if input.len() < 2 || input[pos] != 0xFF || input[pos + 1] != 0xD8 {
        return Err(CodecError::InvalidSoi);
    }
    pos += 2;

    loop {
        // Skip any padding 0xFF bytes
        while pos < input.len() && input[pos] == 0xFF {
            pos += 1;
        }
        if pos >= input.len() {
            return Err(CodecError::UnexpectedEof);
        }
        let marker_type = input[pos];
        pos += 1;

        if marker_type == 0xD9 {
            break; // EOI
        }
        if marker_type == 0xDA {
            // SOS: the rest until EOI is entropy-coded data
            // (handle specially)
            break;
        }

        // Read segment length (includes the 2 length bytes)
        if pos + 2 > input.len() {
            return Err(CodecError::UnexpectedEof);
        }
        let length = u16::from_be_bytes([input[pos], input[pos + 1]]) as usize;
        if length < 2 || pos + length > input.len() {
            return Err(CodecError::InvalidSegmentLength);
        }
        let data = input[pos + 2..pos + length].to_vec();
        pos += length;

        markers.push(JpegMarker { marker_type, data });
    }

    Ok(markers)
}

The SOS marker requires special handling because the entropy-coded data that follows it does not have a length prefix. Instead, it continues until the EOI marker (0xFFD9), with a byte-stuffing convention: any 0xFF byte within the data is followed by a 0x00 byte to distinguish it from a marker.
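A sketch of de-stuffing a segment up to the next real marker (hypothetical helper; the production codec resolves stuffing incrementally inside its bit reader rather than materializing a copy):

```rust
/// Remove JPEG byte stuffing from entropy-coded data. A 0xFF followed
/// by 0x00 is a literal 0xFF byte; a 0xFF followed by anything else is
/// a real marker, which ends the segment. Returns the unstuffed bytes
/// and the number of input bytes consumed.
fn unstuff(data: &[u8]) -> (Vec<u8>, usize) {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let b = data[i];
        if b == 0xFF {
            match data.get(i + 1) {
                Some(0x00) => {
                    out.push(0xFF); // stuffed literal 0xFF
                    i += 2;
                }
                _ => break, // a real marker (e.g. EOI or RSTn): stop here
            }
        } else {
            out.push(b);
            i += 1;
        }
    }
    (out, i)
}
```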

Huffman table construction

JPEG Huffman tables are defined by two arrays: BITS[1..16], which gives the count of codes at each bit length, and HUFFVAL, which lists the symbol values in order of increasing code length. From these, we construct a lookup structure for fast decoding.

struct HuffmanTable {
    /// For each code length 1..=16, the minimum code value
    min_code: [i32; 17],
    /// For each code length, the maximum code value
    max_code: [i32; 17],
    /// Index into the values array for the first code of each length
    val_ptr: [usize; 17],
    /// The symbol values
    values: Vec<u8>,
}

impl HuffmanTable {
    fn from_spec(bits: &[u8; 16], huffval: &[u8]) -> Self {
        let mut min_code = [0i32; 17];
        let mut max_code = [-1i32; 17];
        let mut val_ptr = [0usize; 17];

        let mut code = 0i32;
        let mut val_idx = 0;

        for length in 1..=16 {
            let count = bits[length - 1] as usize;
            if count > 0 {
                min_code[length] = code;
                val_ptr[length] = val_idx;
                code += count as i32;
                max_code[length] = code - 1;
                val_idx += count;
            }
            code <<= 1;
        }

        HuffmanTable {
            min_code,
            max_code,
            val_ptr,
            values: huffval.to_vec(),
        }
    }
}

Huffman decode loop

Decoding the entropy-coded data is where the codec earns its keep. The decoder reads bits from the bitstream, matches them against Huffman tables to recover the (category, value) or (run, size) symbols, and reconstructs the 64-element coefficient array for each 8x8 block.
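The symbol-matching step walks the code lengths using the min_code/max_code/val_ptr arrays (the F.16-style procedure from T.81). A self-contained sketch, taking the table arrays directly and a pre-split bit slice in place of the codec's BitReader:

```rust
/// Walk a canonical Huffman code one bit at a time. Returns the decoded
/// symbol and the number of bits consumed, or None on an invalid code.
fn decode_symbol(
    bits: &[u8], // one bit per element, MSB-first
    min_code: &[i32; 17],
    max_code: &[i32; 17],
    val_ptr: &[usize; 17],
    values: &[u8],
) -> Option<(u8, usize)> {
    let mut code = 0i32;
    for length in 1..=16usize {
        code = (code << 1) | *bits.get(length - 1)? as i32;
        // max_code[length] is -1 when no codes have this length, so the
        // comparison fails and we keep extending the code.
        if code <= max_code[length] {
            let idx = val_ptr[length] + (code - min_code[length]) as usize;
            return Some((values[idx], length));
        }
    }
    None // no JPEG Huffman code is longer than 16 bits
}
```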

fn decode_block(
    reader: &mut BitReader,
    dc_table: &HuffmanTable,
    ac_table: &HuffmanTable,
    prev_dc: &mut i16,
) -> Result<[i16; 64], CodecError> {
    let mut coeffs = [0i16; 64];

    // Decode DC coefficient (differential)
    let dc_category = decode_huffman_symbol(reader, dc_table)?;
    let dc_diff = if dc_category > 0 {
        read_signed_bits(reader, dc_category)?
    } else {
        0
    };
    *prev_dc += dc_diff;
    coeffs[0] = *prev_dc;

    // Decode AC coefficients (zigzag positions 1..63)
    let mut k = 1;
    while k < 64 {
        let symbol = decode_huffman_symbol(reader, ac_table)?;
        let run = (symbol >> 4) as usize; // upper nibble: zero run
        let size = (symbol & 0x0F) as u8; // lower nibble: category

        if size == 0 {
            if run == 0 {
                break; // EOB: remaining coefficients are zero
            }
            // run == 15, size == 0: skip 16 zeros (ZRL)
            k += 16;
            continue;
        }

        k += run;
        if k >= 64 {
            return Err(CodecError::InvalidCoefficient);
        }
        coeffs[k] = read_signed_bits(reader, size)?;
        k += 1;
    }

    Ok(coeffs)
}

The BitReader handles the byte-stuffing protocol (skipping 0x00 after 0xFF), bit-level reading, and tracking the current position within the entropy-coded segment.
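A minimal sketch of such a reader (simplified: no restart-marker tracking, and byte stuffing is resolved inline as bytes are consumed):

```rust
/// Minimal MSB-first bit reader over entropy-coded data.
struct BitReader<'a> {
    data: &'a [u8],
    byte_pos: usize,
    bit_pos: u8, // bits already consumed in the current byte (0..8)
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        BitReader { data, byte_pos: 0, bit_pos: 0 }
    }

    fn read_bit(&mut self) -> Option<u8> {
        if self.byte_pos >= self.data.len() {
            return None;
        }
        let byte = self.data[self.byte_pos];
        let bit = (byte >> (7 - self.bit_pos)) & 1;
        self.bit_pos += 1;
        if self.bit_pos == 8 {
            self.bit_pos = 0;
            self.byte_pos += 1;
            // Byte stuffing: skip the 0x00 that follows a literal 0xFF.
            if byte == 0xFF && self.data.get(self.byte_pos) == Some(&0x00) {
                self.byte_pos += 1;
            }
        }
        Some(bit)
    }

    /// Read `n` bits MSB-first as an unsigned value.
    fn read_bits(&mut self, n: u8) -> Option<u16> {
        let mut v = 0u16;
        for _ in 0..n {
            v = (v << 1) | self.read_bit()? as u16;
        }
        Some(v)
    }
}
```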

Coefficient modification and re-encoding

After decoding, the steganography engine receives the full grid of quantized DCT coefficients along with the quantization tables. It modifies selected coefficients according to the embedding algorithm (J-UNIWARD cost function + Syndrome-Trellis Codes in Phasm’s case) and hands them back to the codec for re-encoding.

The encoder is essentially the decode loop in reverse:

fn encode_block(
    writer: &mut BitWriter,
    coeffs: &[i16; 64],
    dc_table: &HuffmanTable,
    ac_table: &HuffmanTable,
    prev_dc: &mut i16,
) -> Result<(), CodecError> {
    // Encode DC coefficient (differential)
    let dc_diff = coeffs[0] - *prev_dc;
    *prev_dc = coeffs[0];
    let dc_category = bit_category(dc_diff);
    write_huffman_symbol(writer, dc_table, dc_category)?;
    if dc_category > 0 {
        write_signed_bits(writer, dc_diff, dc_category)?;
    }

    // Encode AC coefficients
    let mut zero_run = 0;
    for k in 1..64 {
        if coeffs[k] == 0 {
            zero_run += 1;
            continue;
        }

        // Emit ZRL symbols for runs of 16+ zeros
        while zero_run >= 16 {
            write_huffman_symbol(writer, ac_table, 0xF0)?; // ZRL
            zero_run -= 16;
        }

        let size = bit_category(coeffs[k]);
        let symbol = ((zero_run as u8) << 4) | size;
        write_huffman_symbol(writer, ac_table, symbol)?;
        write_signed_bits(writer, coeffs[k], size)?;
        zero_run = 0;
    }

    if zero_run > 0 {
        write_huffman_symbol(writer, ac_table, 0x00)?; // EOB
    }

    Ok(())
}

/// Returns the number of bits needed to represent a signed value
fn bit_category(value: i16) -> u8 {
    if value == 0 { return 0; }
    let abs_val = value.unsigned_abs();
    16 - abs_val.leading_zeros() as u8
}

The BitWriter mirrors the BitReader: it accumulates bits, flushes complete bytes, and applies byte-stuffing (inserting 0x00 after any 0xFF byte in the output).
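A minimal sketch of the writer side (simplified; baseline JPEG pads the final partial byte with 1-bits):

```rust
/// Minimal MSB-first bit writer: accumulates bits, flushes full bytes,
/// and stuffs a 0x00 after every emitted 0xFF.
struct BitWriter {
    out: Vec<u8>,
    acc: u8,
    nbits: u8,
}

impl BitWriter {
    fn new() -> Self {
        BitWriter { out: Vec::new(), acc: 0, nbits: 0 }
    }

    fn write_bits(&mut self, value: u16, n: u8) {
        for i in (0..n).rev() {
            let bit = ((value >> i) & 1) as u8;
            self.acc = (self.acc << 1) | bit;
            self.nbits += 1;
            if self.nbits == 8 {
                self.out.push(self.acc);
                if self.acc == 0xFF {
                    self.out.push(0x00); // byte stuffing
                }
                self.acc = 0;
                self.nbits = 0;
            }
        }
    }

    /// Pad the final partial byte with 1-bits and return the stream.
    fn finish(mut self) -> Vec<u8> {
        if self.nbits > 0 {
            let pad = 8 - self.nbits;
            self.acc = (self.acc << pad) | ((1u8 << pad) - 1);
            self.out.push(self.acc);
            if self.acc == 0xFF {
                self.out.push(0x00);
            }
        }
        self.out
    }
}
```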

Reassembly

The final step is reassembling the JPEG file. We write the SOI marker, replay all the header markers (APP, DQT, SOF0, DHT) in their original order with their original bytes, write the SOS header, write the newly encoded entropy data, and close with EOI. Because we preserve the header markers verbatim, the only bytes that differ between input and output are in the entropy-coded segment – and those only differ if coefficients were actually modified.

WASM Considerations

Memory constraints

For a JPEG codec, WASM memory is manageable but not trivial – a typical 12-megapixel smartphone photo produces roughly 190,000 8x8 blocks for the luminance channel (fewer for chroma with 4:2:0 subsampling). At 2 bytes per coefficient and 64 coefficients per block, the full coefficient grid for all channels requires roughly 36 MB. This fits within WASM’s memory model but requires thoughtful pre-allocation. Memory is allocated in 64 KB pages and growing is expensive, so we pre-allocate based on the image dimensions parsed from SOF0, avoiding repeated allocations during decode.

The codec is not the only memory-conscious component. J-UNIWARD’s wavelet-based cost computation originally required 651 MB of peak memory for a 12-megapixel image – far beyond WASM’s practical limits. Optimizing the wavelet computation pipeline (processing one filter direction at a time instead of materializing all intermediate arrays simultaneously) reduced peak memory by 71% to 187 MB, making J-UNIWARD feasible on mobile devices and in the browser. This kind of memory optimization is invisible in the final API but essential for the pure-Rust, single-codebase approach to work across all targets.

Cross-platform numeric behavior

One subtle issue we encountered involves pseudorandom number generation. Phasm uses a ChaCha20-based CSPRNG for key-derived operations (coefficient permutation, STC submatrix generation). The PRNG must produce identical sequences across native ARM64, native x86-64, and WASM.

We constrain the PRNG to u32 operations only. The reason is subtle: usize is 32 bits on WASM but 64 bits on native targets (ARM64, x86-64). Higher-level PRNG methods like rng.gen_range() consume different amounts of PRNG entropy per step depending on usize width, producing completely different Fisher-Yates permutation sequences on WASM vs. native. By using only rng.next_u32() with manual modular reduction, we guarantee identical permutations across all platforms.
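The resulting permutation step looks roughly like this – a sketch with a toy LCG standing in for the ChaCha20 stream:

```rust
/// Fisher-Yates shuffle driven exclusively by u32 draws, so the PRNG
/// consumes the same entropy per swap on wasm32 and on 64-bit targets.
/// `next_u32` stands in for the ChaCha20-based stream.
fn shuffle(indices: &mut [u32], next_u32: &mut impl FnMut() -> u32) {
    for i in (1..indices.len()).rev() {
        // Manual modular reduction: exactly one u32 draw per position,
        // independent of usize width. (This sketch accepts the small
        // modulo bias; rejection sampling would remove it at the cost
        // of a variable number of draws.)
        let j = (next_u32() % (i as u32 + 1)) as usize;
        indices.swap(i, j);
    }
}
```

Because the draw count per swap is fixed, two platforms that agree on the u32 stream necessarily agree on the permutation.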

Cross-platform determinism extends beyond the PRNG. Phasm’s J-UNIWARD cost function requires wavelet filter convolutions that involve floating-point trigonometric operations. Standard libm functions like sin, cos, atan2, and hypot are not guaranteed to produce identical results across WASM engines (V8, SpiderMonkey, JavaScriptCore) or between WASM and native targets. To solve this, we built a det_math module with FDLIBM-based implementations of these functions, ensuring bit-identical results everywhere. Similarly, the FFT required for J-UNIWARD’s wavelet decomposition uses an in-house Cooley-Tukey + Bluestein implementation rather than the rustfft crate, again to guarantee cross-platform reproducibility. These are the same kinds of “build the minimal thing you need” decisions that drove the codec itself.

API design and binary size

The codec’s API accepts and returns &[u8] byte slices – no file I/O, no streaming, no buffered readers. This makes it trivially compatible with WASM (where the filesystem does not exist) and simplifies native mobile integration.

Compiled as part of the full Phasm WASM module, the codec contributes approximately 15-20 KB to the gzipped binary – a fraction of what statically linking libjpeg-turbo would cost.

Performance

Benchmarks

We benchmarked the codec on three representative images at different resolutions, measuring the round-trip time (decode + encode with no modifications) on native and WASM targets:

Image | Resolution | Blocks | Native (M1 Mac, release) | WASM (Chrome, M1 Mac) | WASM / Native ratio
Portrait | 1920x1080 | 32,400 | 8 ms | 19 ms | 2.4x
Landscape | 4032x3024 | 190,512 | 34 ms | 78 ms | 2.3x
Document | 640x480 | 4,800 | 1.5 ms | 3.8 ms | 2.5x

The WASM overhead of roughly 2.3-2.5x over native is consistent with published benchmarks for compute-bound Rust-to-WASM workloads. For context, the full Phasm embed operation (which includes J-UNIWARD cost computation and STC embedding on top of the codec) takes 50-200 ms total on WASM depending on image size – well within interactive latency.

Where the time goes

Profiling reveals that Huffman decoding and encoding dominate – the entropy-coded data for a 12-megapixel image is hundreds of kilobytes of bit-level operations, while the header is a few hundred bytes of straightforward parsing. On WASM, the branch-heavy nature of Huffman decoding does not vectorize, but V8 and SpiderMonkey JIT-compile the i32 shift-and-mask sequences reasonably well.

We considered multi-bit Huffman lookup tables (mapping 8-12 bits at a time to symbols) as libjpeg-turbo does, but opted for the simpler code-length-walking approach. Our codec is 2-5x slower than libjpeg-turbo for raw decode throughput, but this comparison misses the point: libjpeg-turbo decodes to pixels, is unavailable on wasm32-unknown-unknown, and for steganography the codec is never the bottleneck – the cost function and STC embedding dominate total time by 5-10x.

Lessons Learned

1. Byte-stuffing is the first trap

The 0xFF/0x00 byte-stuffing convention is simple to describe and tricky to get right. Every 0xFF byte in the entropy-coded data is followed by 0x00 to distinguish it from a marker. The trap is edge cases: what happens when a Huffman code boundary falls on a 0xFF byte? We caught three separate byte-stuffing bugs, each triggered by different images with high-entropy content.

Lesson: Test with photos of static noise and other high-entropy content that maximize 0xFF bytes in the compressed data.

2. DC differential coding has global state

DC coefficients are differentially coded: each value is stored as the difference from the previous block’s DC. A single misread bit early in the decode corrupts every subsequent DC coefficient. The symptom (wrong brightness) appears hundreds of blocks away from the cause (a misread bit).

Lesson: Assert that the DC accumulator stays within a reasonable range at MCU row boundaries. Dramatic drift means a decode error upstream.

3. Restart markers are your friend (and your complication)

JPEG supports optional restart markers (0xFFD0-0xFFD7) that periodically reset the DC accumulators and realign the bit reader. They add codec complexity but provide a powerful debugging tool: if round-trip fails, you can isolate the problem to a specific restart interval.

Lesson: Support restart markers from day one. Many camera phones produce JPEGs with them, and they will appear in production.

4. Zigzag order vs. natural order is a constant source of confusion

Throughout the codec, you must be meticulous about whether coefficients are in zigzag order (as they appear in the bitstream) or natural (row-major) order (as they appear in the 8x8 block). Mixing these up produces images that decode without errors but have bizarre visual artifacts – horizontal smearing, diagonal banding, or color shifts.

We defined a ZigzagIndex newtype wrapper and a NaturalIndex newtype wrapper to make the type system enforce the distinction. This caught multiple bugs at compile time that would have been painful to debug at runtime.
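A sketch of the wrappers (the mapping table is the standard zigzag-to-natural order; exact wrapper APIs in production differ, but the idea is that crossing index spaces requires an explicit conversion):

```rust
/// Position of a coefficient in bitstream (zigzag) order.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct ZigzagIndex(usize);

/// Position of a coefficient in row-major (natural) order.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct NaturalIndex(usize);

/// Standard zigzag-to-natural mapping for an 8x8 block.
const ZIGZAG_TO_NATURAL: [usize; 64] = [
     0,  1,  8, 16,  9,  2,  3, 10,
    17, 24, 32, 25, 18, 11,  4,  5,
    12, 19, 26, 33, 40, 48, 41, 34,
    27, 20, 13,  6,  7, 14, 21, 28,
    35, 42, 49, 56, 57, 50, 43, 36,
    29, 22, 15, 23, 30, 37, 44, 51,
    58, 59, 52, 45, 38, 31, 39, 46,
    53, 60, 61, 54, 47, 55, 62, 63,
];

/// The only sanctioned way to cross between the two index spaces;
/// passing a ZigzagIndex where a NaturalIndex is expected fails to
/// compile instead of smearing pixels.
fn to_natural(z: ZigzagIndex) -> NaturalIndex {
    NaturalIndex(ZIGZAG_TO_NATURAL[z.0])
}
```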

Lesson: Use the type system aggressively when you have domain-specific index semantics. A usize that means “position in zigzag order” and a usize that means “position in the 8x8 block” are fundamentally different types.

5. Exact round-trip is a powerful correctness oracle

Our primary test: decode and re-encode with no modifications must produce identical bytes. This catches Huffman encode/decode mismatches, byte-stuffing errors, bit-alignment problems after restart markers, off-by-one errors in coefficient indexing, and missing EOB symbols. We test against a corpus of roughly 200 JPEG files from iPhones, Android phones, Photoshop, GIMP, and adversarially-constructed images.

Lesson: If your problem domain has a natural round-trip property, test it exhaustively. It is the highest-leverage test you can write.

6. The ecosystem gap is also an opportunity

No pure-Rust crate exposes quantized DCT coefficients because demand is small. But this gap means you are building something genuinely novel. The narrow scope (baseline JPEG, coefficient access, bidirectional) keeps the implementation tractable – roughly 1,000 lines that one person can hold in their head.

Lesson: When the ecosystem does not serve your use case, build the minimal component you need, keep it focused, and own it fully.

7. Pure Rust pays compound interest across platforms

By keeping the codec in pure Rust with zero C FFI, we get automatic cross-compilation to every target Rust supports. The same code runs on:

  • iOS (ARM64, as a static library linked into the Swift app)
  • Android (ARM64 and x86-64, via JNI through cargo-ndk)
  • Web (WASM, via wasm-bindgen)
  • Development machines (x86-64 macOS/Linux, for testing)

Any C dependency would require per-platform build scripts, cross-compilation toolchains, and CI configuration for each target. With pure Rust, cargo build --target <whatever> just works. Over the lifetime of a project, this saved time compounds significantly.

What We Would Do Differently

Progressive JPEG support. We initially scoped to baseline sequential JPEG, which covers the vast majority of smartphone photos. We have since added progressive JPEG decode support to the codec, handling the multi-scan structure and spectral selection/successive approximation passes that progressive encoding uses. This was motivated by real-world images encountered during testing – notably, WhatsApp sometimes re-encodes images as progressive JPEGs (SOF2) rather than baseline (SOF0), and the codec needed to handle both for reliable Armor mode decoding.

Arithmetic coding. Rarely encountered in the wild, but it exists. Another 200-300 lines.

Streaming decode. Our codec requires the entire JPEG in memory. Fine for 2-8 MB user photos; limiting for server-side processing of very large images.

Conclusion

Building a JPEG coefficient codec from scratch was not our first choice. We would have preferred to use an existing crate and focus on the steganography algorithms. But the requirement for coefficient-level access, bidirectional operation, zero C FFI, and WASM compatibility left no existing option that fit.

The result is a codec that does exactly one thing well: it reads quantized DCT coefficients from a JPEG, lets you modify them, and writes them back – in roughly 1,000 lines of pure Rust, with zero external crate dependencies and byte-for-byte round-trip fidelity across native and WASM targets.

For the Rust and WASM community, this highlights both a strength and a gap. The strength is that Rust makes it tractable to build low-level binary format codecs with confidence – the type system, the absence of undefined behavior, and the compiler output quality mean that once code passes your test suite, it tends to be correct across all platforms. The gap is that the image processing ecosystem is optimized for pixel output, and anyone working at a lower abstraction level needs to bring their own tools.

If you are building something that requires non-standard access to a well-known file format: understand the format deeply, scope to the minimum viable subset, invest heavily in round-trip testing, and embrace pure Rust for cross-platform benefits. The upfront investment is real, but the ongoing maintenance cost is low and the architectural flexibility is worth it.


References

Standards

  1. ITU-T Recommendation T.81, “Information Technology – Digital Compression and Coding of Continuous-Tone Still Images – Requirements and Guidelines,” International Telecommunication Union, 1992. (ISO/IEC 10918-1:1994)

  2. G. K. Wallace, “The JPEG Still Picture Compression Standard,” IEEE Transactions on Consumer Electronics, vol. 38, no. 1, pp. xviii–xxxiv, 1992.

  3. N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete Cosine Transform,” IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90–93, 1974.

Software

  1. Independent JPEG Group, “libjpeg,” ijg.org. Reference implementation of the JPEG standard.

  2. libjpeg-turbo Contributors, “libjpeg-turbo,” libjpeg-turbo.org. SIMD-accelerated JPEG codec compatible with the IJG API.

  3. Mozilla, “MozJPEG.” Production JPEG encoder with trellis quantization and progressive scan optimization.


Phasm is a free steganography app that hides encrypted text messages inside JPEG photos. It runs on iOS, Android, and the web. All processing happens on your device. The JPEG coefficient codec described in this post is the foundation that makes both Ghost mode (stealth) and Armor mode (robust) possible.