Abstract

Syndrome-Trellis Codes (STCs) are the de facto standard coding method for modern content-adaptive steganography, yet no comprehensive English-language implementation guide exists outside the original 2011 paper by Filler, Judas, and Fridrich. This post fills that gap. We trace the theoretical lineage from classical linear codes through syndrome coding to the trellis representation that enables Viterbi-based optimal embedding. We then provide a complete implementation walkthrough – including the parity-check matrix construction, the forward and backward Viterbi passes, the multi-layered extension for ternary JPEG embedding, and the engineering tradeoffs that arise when targeting pure Rust compiled to WebAssembly. Data tables compare coding efficiency, memory consumption, and performance across constraint heights from $h = 4$ to $h = 10$. Every formula, every design rule, and every pseudocode listing has been validated against the reference C++ and Python implementations. A competent programmer should be able to build a working STC from this post alone.

1. Background: From Linear Codes to Syndrome Embedding

1.1 The Steganographic Coding Problem

Modern content-adaptive steganography operates in two stages. First, a distortion cost function (such as J-UNIWARD or UERD) assigns a cost $\rho_i$ to modifying each cover element $x_i$. Second, a coding scheme embeds the message $\mathbf{m}$ into the cover $\mathbf{x}$ to produce a stego object $\mathbf{y}$ that minimizes the total embedding distortion:

$$\min_{\mathbf{y}} \sum_{i=1}^{n} \rho_i \, |x_i - y_i| \quad \text{subject to} \quad \mathbf{H} \cdot \mathbf{y} = \mathbf{m} \pmod{2}$$

Here $\mathbf{H}$ is a binary parity-check matrix of dimensions $m \times n$, where $m$ is the message length in bits and $n$ is the number of cover elements. The constraint $\mathbf{H}\mathbf{y} = \mathbf{m}$ ensures the receiver can extract the message by computing the syndrome of $\mathbf{y}$.

Without efficient coding, embedding a message requires changing roughly 50% of selected cover elements. STCs reduce this to within a few percent of the information-theoretic minimum – the rate-distortion bound – making the difference between detectable and undetectable steganography.

1.2 Linear Codes and Syndrome Coding

A binary linear code $\mathcal{C}$ of length $n$ and dimension $k$ is defined by its parity-check matrix $\mathbf{H} \in \{0,1\}^{m \times n}$, where $m = n - k$. Every codeword $\mathbf{c} \in \mathcal{C}$ satisfies $\mathbf{H}\mathbf{c} = \mathbf{0}$.

The syndrome of an arbitrary binary vector $\mathbf{y}$ is $\mathbf{s} = \mathbf{H}\mathbf{y}$. The key insight for steganography: for any target syndrome $\mathbf{m}$ (our message), there exists at least one vector $\mathbf{y}$ such that $\mathbf{H}\mathbf{y} = \mathbf{m}$. If we choose $\mathbf{y}$ to be as close to the cover $\mathbf{x}$ as possible (minimizing distortion), we have optimal steganographic embedding.

The problem is computational. For a general $\mathbf{H}$, finding the minimum-distortion $\mathbf{y}$ requires searching $2^k$ coset elements – exponential in the code dimension. The breakthrough of STCs is constructing $\mathbf{H}$ with a specific banded structure that makes this search tractable via the Viterbi algorithm.
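To see the cost of the naive approach concretely, here is a toy brute-force embedder for a hand-picked $3 \times 6$ parity-check matrix (all names, costs, and matrix values are illustrative, not from any reference code): it enumerates every candidate $\mathbf{y}$, which is exactly the exponential search that STCs avoid.

```rust
// Toy exhaustive syndrome embedding: try all 2^6 candidate vectors y,
// keep those with H*y = m (mod 2), and return the minimum-distortion one.
fn brute_force_embed(
    h: &[[u8; 6]; 3],
    x: &[u8; 6],
    rho: &[f64; 6],
    m: &[u8; 3],
) -> ([u8; 6], f64) {
    let mut best = ([0u8; 6], f64::INFINITY);
    for bits in 0u32..64 {
        let y: [u8; 6] = core::array::from_fn(|j| ((bits >> j) & 1) as u8);
        // Syndrome of this candidate: one XOR-fold per row of H
        let syn: [u8; 3] =
            core::array::from_fn(|r| (0..6).map(|j| h[r][j] & y[j]).fold(0, |a, b| a ^ b));
        if &syn != m {
            continue;
        }
        // Total distortion relative to the cover x
        let d: f64 = (0..6).filter(|&j| y[j] != x[j]).map(|j| rho[j]).sum();
        if d < best.1 {
            best = (y, d);
        }
    }
    best
}
```

At $n = 6$ the loop body runs 64 times; at realistic sizes the same search is astronomically infeasible, which is why the banded construction below matters.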

1.3 Embedding Rate and Efficiency

The relative payload (embedding rate) is:

$$\alpha = \frac{m}{n} \quad \text{(bits per cover element)}$$

In JPEG steganography, this is often expressed as bits per non-zero AC coefficient (bpnzAC). Phasm’s payload framing uses a 16-bit length prefix, supporting messages up to 64 KB. In practice, most messages are short text (under 1 KB), yielding $\alpha$ in the range 0.02 to 0.15 for typical smartphone photos.

The embedding efficiency $e(\alpha)$ measures how many message bits are embedded per unit of distortion:

$$e(\alpha) = \frac{\alpha}{D(\alpha)}$$

where $D(\alpha)$ is the average distortion per cover element at embedding rate $\alpha$. The theoretical upper bound for binary embedding with unit costs is:

$$e_{\max}(\alpha) = \frac{\alpha}{H^{-1}(\alpha)}$$

where $H^{-1}$ is the inverse of the binary entropy function $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$. STCs approach this bound to within 4–10% depending on the constraint height $h$.

1.4 The Coding Loss

The coding loss quantifies how far a practical code falls short of the theoretical optimum:

$$L = 1 - \frac{e_{\text{STC}}(\alpha)}{e_{\max}(\alpha)} = 1 - \frac{D_{\min}(\alpha)}{D_{\text{STC}}(\alpha)}$$

where $D_{\min}(\alpha)$ is the theoretical minimum distortion and $D_{\text{STC}}(\alpha)$ is the distortion achieved by the STC. A coding loss of 8% means the STC uses 8% more total distortion than theoretically necessary. For practical security, this difference is negligible – the distortion cost function (J-UNIWARD, UERD) matters far more than the last few percent of coding efficiency.

2. The STC Algorithm

2.1 The H-hat Submatrix

The full parity-check matrix $\mathbf{H}$ is never constructed explicitly. Instead, it is defined by a small binary submatrix $\hat{\mathbf{H}}$ of dimensions $h \times w$, where:

  • $h$ is the constraint height (typically 7–12)
  • $w = \lfloor 1/\alpha \rfloor = \lfloor n/m \rfloor$ is determined by the embedding rate (rounding down guarantees the $\approx n/w$ syndrome rows cover all $m$ message bits)

For example, embedding 8,192 message bits into 50,000 cover elements gives $\alpha \approx 0.164$, so $w = \lfloor 1/0.164 \rfloor = 6$, and $\hat{\mathbf{H}}$ is a $10 \times 6$ binary matrix (for $h = 10$).

The full $\mathbf{H}$ is constructed by tiling $\hat{\mathbf{H}}$ along the diagonal, shifted down by one row per tile:

$$\mathbf{H}[r][j] = \begin{cases} \hat{\mathbf{H}}[r - \lfloor j/w \rfloor][j \bmod w] & \text{if } 0 \leq r - \lfloor j/w \rfloor < h \\ 0 & \text{otherwise} \end{cases}$$

This banded structure means each column of $\mathbf{H}$ has at most $h$ non-zero entries, and each message bit depends on at most $h \cdot w$ cover elements (the $h$ width-$w$ tiles that overlap its row). The result is a convolutional code structure that admits a trellis representation with $2^h$ states.
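Although production code never materializes $\mathbf{H}$, building it explicitly once for toy parameters makes the tiling rule easy to inspect. A minimal sketch (the function name is illustrative):

```rust
// Materialize the banded H from the tiling rule, for inspection only:
// H[r][j] = H_hat[r - floor(j/w)][j mod w] when 0 <= r - floor(j/w) < h, else 0.
// Columns of H-hat are packed as h-bit integers (bit r = row r).
fn build_full_h(h_hat_cols: &[u16], h: usize, m_rows: usize, n: usize) -> Vec<Vec<u8>> {
    let w = h_hat_cols.len();
    let mut hm = vec![vec![0u8; n]; m_rows];
    for j in 0..n {
        let tile = j / w; // each tile of w columns is shifted down one row
        for r in 0..h {
            if tile + r < m_rows && (h_hat_cols[j % w] >> r) & 1 == 1 {
                hm[tile + r][j] = 1;
            }
        }
    }
    hm
}
```

Printing the result for small $h$, $w$, $n$ shows the staircase band descending one row every $w$ columns.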

                graph LR
                    subgraph "H-hat Submatrix (h x w)"
                        HH["h x w binary\nmatrix from CSPRNG"]
                    end

                    subgraph "Full H Matrix (m x n)"
                        R1["[ H-hat  0      0    ... ]"]
                        R2["[ 0  H-hat  0    ... ]"]
                        R3["[ 0    0  H-hat ... ]"]
                        R4["[ ...               ]"]
                    end

                    HH --> R1
                    HH --> R2
                    HH --> R3

                    style HH fill:#1a3a2a,stroke:#4ade80,color:#fff
                    style R1 fill:#1a2a3a,stroke:#60a5fa,color:#fff
                    style R2 fill:#1a2a3a,stroke:#60a5fa,color:#fff
                    style R3 fill:#1a2a3a,stroke:#60a5fa,color:#fff
                    style R4 fill:#1a2a3a,stroke:#60a5fa,color:#fff
                

2.2 Design Rules for H-hat

Filler et al. (2011) established two design rules for $\hat{\mathbf{H}}$ that ensure good coding performance:

Rule 1: Odd column weight. Each column of $\hat{\mathbf{H}}$ must have an odd number of ones. If a randomly generated column has even weight, flip one bit. This ensures every cover element can influence at least one new syndrome bit.

Rule 2: Random generation from a shared key. $\hat{\mathbf{H}}$ is generated pseudorandomly from a shared secret key using a cryptographically secure PRNG. Both sender and receiver generate the identical $\hat{\mathbf{H}}$ from the same key. In Phasm, we use ChaCha20 seeded with a 256-bit key derived from the user’s passphrase via Argon2id.

use rand_chacha::ChaCha20Rng;
use rand_core::{RngCore, SeedableRng};

fn generate_h_hat(key: &[u8; 32], h: usize, w: usize) -> Vec<u64> {
    // Each column stored as a packed u64 (h <= 64)
    let mut rng = ChaCha20Rng::from_seed(*key);
    let mut columns = Vec::with_capacity(w);

    for _ in 0..w {
        let mut col: u64 = 0;
        for bit in 0..h {
            if rng.next_u32() & 1 == 1 {
                col |= 1 << bit;
            }
        }
        // Design Rule 1: ensure odd weight
        if col.count_ones() % 2 == 0 {
            col ^= 1; // flip the lowest bit
        }
        columns.push(col);
    }
    columns
}
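Where the rand crates are unavailable (e.g., in a quick test harness), the same Rule 1 logic can be demonstrated with a toy xorshift generator standing in for ChaCha20 – purely illustrative; production code should keep the CSPRNG:

```rust
// Toy xorshift64 PRNG (NOT cryptographically secure) used only so this
// Rule 1 demo runs without external crates. Requires a nonzero seed.
fn xorshift64(s: &mut u64) -> u64 {
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    *s
}

// Generate w columns of h bits each (h < 64), forcing odd column weight.
fn toy_h_hat(seed: u64, h: usize, w: usize) -> Vec<u64> {
    let mut s = seed;
    (0..w)
        .map(|_| {
            let mut col = xorshift64(&mut s) & ((1u64 << h) - 1);
            if col.count_ones() % 2 == 0 {
                col ^= 1; // Design Rule 1: force odd weight
            }
            col
        })
        .collect()
}
```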

2.3 The Syndrome Trellis

The trellis has $n + 1$ columns (one per cover element plus an initial column) and $2^h$ rows (states). Each state is an $h$-bit integer representing the current partial syndrome – the running XOR accumulation of the parity-check contributions from cover elements processed so far.

                graph LR
                    S0_0((0)) -->|y=0| S1_0((0))
                    S0_0((0)) -->|y=1| S1_h((h_col))
                    S0_1((1)) -->|y=0| S1_1((1))
                    S0_1((1)) -->|y=1| S1_1h(("1 XOR h_col"))

                    S1_0 -->|y=0| S2_a(("..."))
                    S1_h -->|y=0| S2_b(("..."))
                    S1_1 -->|y=0| S2_c(("..."))
                    S1_1h -->|y=0| S2_d(("..."))

                    subgraph "Column 0"
                        S0_0
                        S0_1
                    end
                    subgraph "Column 1"
                        S1_0
                        S1_h
                        S1_1
                        S1_1h
                    end
                    subgraph "Column 2"
                        S2_a
                        S2_b
                        S2_c
                        S2_d
                    end

                    style S0_0 fill:#1a3a2a,stroke:#4ade80,color:#fff
                    style S0_1 fill:#2a2a2a,stroke:#888,color:#fff
                    style S1_0 fill:#1a3a2a,stroke:#4ade80,color:#fff
                    style S1_h fill:#1a2a3a,stroke:#60a5fa,color:#fff
                    style S1_1 fill:#2a2a2a,stroke:#888,color:#fff
                    style S1_1h fill:#2a2a2a,stroke:#888,color:#fff
                

At each step $j$, processing cover element $x_j$, the trellis encodes two possible transitions per state: keep $x_j$ unchanged ($y_j = x_j$, cost = 0) or flip it ($y_j = x_j \oplus 1$, cost = $\rho_j$). The choice $y_j = 1$ XORs the current state with the applicable column of $\hat{\mathbf{H}}$, while $y_j = 0$ leaves the state unchanged.

Every $w$ columns, a message bit is “emitted” – the least significant bit of the state is compared against the corresponding message bit, and the state shifts right by one position. This is the mechanism by which the syndrome constraint $\mathbf{H}\mathbf{y} = \mathbf{m}$ is enforced incrementally.

2.4 Viterbi Forward Pass

The forward pass is the computational core of STC embedding. It processes cover elements left to right, propagating minimum-cost paths through the trellis.

ALGORITHM: STC Viterbi Forward Pass

                Input:
                  x[0..n-1]         : cover bits (LSBs of DCT coefficients)
                  m[0..m_len-1]     : message bits
                  h_hat_cols[0..w-1]: columns of H-hat (each an h-bit integer)
                  rho[0..n-1]       : cost of flipping each cover element
                  h                  : constraint height
                  w                  : H-hat width = floor(1/alpha)

                Output:
                  back_ptr[0..n-1][0..2^h-1]: traceback pointers
                  cost_final[0..2^h-1]      : terminal state costs

                Initialization:
                  num_states = 2^h
                  cost_cur[0] = 0
                  cost_cur[s] = INFINITY   for s = 1..num_states-1

                Main loop:
                  for j = 0 to n-1:
                    cost_next[:] = INFINITY
                    h_col = h_hat_cols[j mod w]

                    // Check if a message bit is emitted at this boundary
                    emits_bit = (j > 0) AND (j mod w == 0)

                    for s = 0 to num_states-1:
                      if cost_cur[s] == INFINITY:
                        continue

                      // Enforce message constraint if a bit is being emitted
                      if emits_bit:
                        msg_idx = (j / w) - 1
                        if (s AND 1) != m[msg_idx]:
                          continue
                      end

                      // Shift state if emitting
                      s_shifted = if emits_bit then (s >> 1) else s

                      // Transition 1: keep x[j] (y_j = x[j])
                      next_s_keep = if x[j] == 1
                                    then s_shifted XOR h_col
                                    else s_shifted
                      cost_keep = cost_cur[s]   // no flip, zero cost

                      if cost_keep < cost_next[next_s_keep]:
                        cost_next[next_s_keep] = cost_keep
                        back_ptr[j][next_s_keep] = (s, x[j])

                      // Transition 2: flip x[j] (y_j = x[j] XOR 1)
                      next_s_flip = if x[j] == 0
                                    then s_shifted XOR h_col
                                    else s_shifted
                      cost_flip = cost_cur[s] + rho[j]

                      if cost_flip < cost_next[next_s_flip]:
                        cost_next[next_s_flip] = cost_flip
                        back_ptr[j][next_s_flip] = (s, x[j] XOR 1)

                    end for s

                    swap(cost_cur, cost_next)
                  end for j

                  cost_final = cost_cur
                

Complexity: The forward pass iterates over $n$ cover elements, and for each element examines $2^h$ states with two transitions each. Total operations: $O(n \cdot 2^h)$. For $n = 50{,}000$ and $h = 10$: approximately $10^8$ operations – well within real-time for modern hardware.

2.5 Viterbi Backward Pass (Traceback)

After the forward pass, we identify the terminal state that satisfies the remaining message bit constraints and has minimum cost. Then we trace backward through the stored pointers to recover the optimal stego sequence.

ALGORITHM: STC Viterbi Backward Pass (Traceback)

                Input:
                  back_ptr[0..n-1][0..2^h-1] : from forward pass
                  cost_final[0..2^h-1]       : terminal state costs
                  m[0..m_len-1]              : message bits
                  n, w, h                     : dimensions

                Output:
                  y[0..n-1] : optimal stego bits

                Step 1: Find best terminal state
                  // The terminal state must encode remaining message bits
                  // (those not yet emitted during the forward pass)
                  remaining_msg_bits = m[floor(n/w) .. m_len-1]
                  best_cost = INFINITY
                  best_state = 0

                  for s = 0 to 2^h - 1:
                    if low bits of s match remaining_msg_bits:
                      if cost_final[s] < best_cost:
                        best_cost = cost_final[s]
                        best_state = s

                Step 2: Trace backward
                  state = best_state
                  for j = n-1 downto 0:
                    (prev_state, y_j) = back_ptr[j][state]
                    y[j] = y_j
                    state = prev_state

                  return y
                
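Under the stated simplifications (full back-pointer storage, no windowing), the two passes can be condensed into a runnable Rust sketch. Names such as `stc_embed` are illustrative; the message length is assumed to be roughly $n/w$ bits, with the leftover (non-emitted) bits fitting into the terminal state.

```rust
const INF: u32 = u32::MAX;

/// Viterbi forward pass + traceback for a binary STC (sketch).
/// `x`: cover bits, `rho`: flip costs, `msg`: message bits,
/// `cols`: the w columns of H-hat, each packed as an h-bit mask.
/// Assumes h <= 16 and msg.len() - (n-1)/w <= h.
fn stc_embed(x: &[u8], rho: &[u32], msg: &[u8], cols: &[u16], h: usize) -> Vec<u8> {
    let (n, w) = (x.len(), cols.len());
    let states = 1usize << h;
    let mut cost = vec![INF; states];
    cost[0] = 0; // start in the all-zero partial syndrome
    let mut back = vec![vec![(0u16, 0u8); states]; n]; // (prev_state, y_j)

    for j in 0..n {
        let mut next = vec![INF; states];
        let h_col = cols[j % w] as usize;
        let emits = j > 0 && j % w == 0; // message bit emitted at tile boundary
        for s in 0..states {
            if cost[s] == INF {
                continue;
            }
            // Prune states whose LSB contradicts the emitted message bit
            if emits && (s & 1) as u8 != msg[j / w - 1] {
                continue;
            }
            let base = if emits { s >> 1 } else { s };
            for flip in 0..2u8 {
                let y = x[j] ^ flip;
                let ns = if y == 1 { base ^ h_col } else { base };
                let c = if flip == 1 { cost[s].saturating_add(rho[j]) } else { cost[s] };
                if c < next[ns] {
                    next[ns] = c;
                    back[j][ns] = (s as u16, y);
                }
            }
        }
        cost = next;
    }

    // Terminal state: its low bits must equal the not-yet-emitted message bits.
    let emitted = (n - 1) / w;
    let rest = &msg[emitted..];
    let (mut best_c, mut best_s) = (INF, 0usize);
    for s in 0..states {
        let ok = rest.iter().enumerate().all(|(k, &b)| ((s >> k) & 1) as u8 == b);
        if ok && cost[s] < best_c {
            best_c = cost[s];
            best_s = s;
        }
    }

    // Traceback through the stored pointers.
    let mut y = vec![0u8; n];
    let mut s = best_s;
    for j in (0..n).rev() {
        let (ps, yj) = back[j][s];
        y[j] = yj;
        s = ps as usize;
    }
    y
}

/// Receiver side: m = H*y via the banded structure (linear time).
fn syndrome(y: &[u8], cols: &[u16], h: usize, m_len: usize) -> Vec<u8> {
    let w = cols.len();
    let mut m = vec![0u8; m_len];
    for (j, &yj) in y.iter().enumerate() {
        if yj == 1 {
            for r in 0..h {
                if j / w + r < m_len && (cols[j % w] >> r) & 1 == 1 {
                    m[j / w + r] ^= 1;
                }
            }
        }
    }
    m
}
```

The companion `syndrome` function mirrors the extraction rule of the next section, so asserting $\mathbf{H}\mathbf{y} = \mathbf{m}$ on the output doubles as an end-to-end check of the trellis construction.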

2.6 Message Extraction

Extraction is dramatically simpler than embedding. The receiver generates the same $\hat{\mathbf{H}}$ from the shared key, applies the same coefficient permutation, and computes the syndrome:

$$\mathbf{m} = \mathbf{H} \cdot \mathbf{y} \pmod{2}$$

This is a linear-time operation – for each message bit $m_i$, XOR together the stego bits $y_j$ for which $\mathbf{H}[i][j] = 1$. Due to the banded structure, each message bit depends on at most $h \cdot w$ stego elements, and each stego element touches at most $h$ message bits. Total complexity: $O(n \cdot h)$ XOR operations – essentially instantaneous.

fn extract_message(
    stego_bits: &[u8],    // y[0..n-1] in permuted order
    h_hat_cols: &[u64],   // w columns of H-hat
    h: usize,
    w: usize,
    msg_len: usize,       // message length in bits
) -> Vec<u8> {
    // One message bit per byte of the output vector
    let mut message = vec![0u8; msg_len];

    for j in 0..stego_bits.len() {
        if stego_bits[j] == 0 {
            continue; // y[j]=0 contributes nothing
        }
        let col_idx = j % w;
        let h_col = h_hat_cols[col_idx];
        let row_offset = j / w;

        for r in 0..h {
            let msg_idx = row_offset + r;
            if msg_idx < msg_len && (h_col >> r) & 1 == 1 {
                message[msg_idx] ^= 1;
            }
        }
    }

    message
}

3. The Complete Embedding Pipeline

The following diagram shows how the STC fits into a full JPEG steganography system:

                flowchart TB
                    MSG["Secret Message\n(plaintext bytes)"] --> ENC["AES-256-GCM-SIV\nEncryption"]
                    ENC --> FRAME["Payload Framing\n(length prefix + ciphertext)"]
                    FRAME --> SPLIT["Split into m1, m2\n(multi-layer STC, Phase 2)"]

                    JPEG["Cover JPEG"] --> PARSE["Parse JPEG\n(Huffman decode)"]
                    PARSE --> DCT["Quantized DCT\nCoefficients"]
                    DCT --> COST["Distortion Cost Map\n(J-UNIWARD)"]
                    DCT --> LSB["Extract Cover LSBs\nx[j] = coeff[j] AND 1"]

                    KEY["Shared Secret Key"] --> HHAT["Generate H-hat\n(ChaCha20 CSPRNG)"]
                    KEY --> PERM["Generate Permutation\n(Fisher-Yates)"]

                    PERM --> PLSB["Permute coefficients\n+ costs"]
                    LSB --> PLSB
                    COST --> PLSB

                    PLSB --> VIT["Viterbi Forward Pass\n(2^h states)"]
                    HHAT --> VIT
                    SPLIT --> VIT
                    VIT --> TRACE["Viterbi Traceback\n(backward pass)"]
                    TRACE --> STEGO["Stego Bits y[j]"]

                    STEGO --> UNPERM["Un-permute +\nApply to DCT"]
                    DCT --> UNPERM
                    UNPERM --> WRITE["Huffman Re-encode\n+ Write JPEG"]
                    WRITE --> OUT["Stego JPEG"]

                    style MSG fill:#1a3a2a,stroke:#4ade80,color:#fff
                    style OUT fill:#1a3a2a,stroke:#4ade80,color:#fff
                    style VIT fill:#1a2a3a,stroke:#60a5fa,color:#fff
                    style TRACE fill:#1a2a3a,stroke:#60a5fa,color:#fff
                    style KEY fill:#3a2a1a,stroke:#f59e0b,color:#fff
                

4. Coding Efficiency and Constraint Height Tradeoffs

4.1 Efficiency at Various Constraint Heights

The constraint height $h$ is the single most important design parameter. It controls the number of trellis states ($2^h$), which directly determines both coding quality and resource consumption. The following table summarizes performance across practical values of $h$, based on the theoretical analysis in Filler et al. (2011) and validated against the reference implementations.

| Constraint Height $h$ | Trellis States $2^h$ | Approx. Coding Loss | Relative Efficiency | Memory per State Column | Practical Notes |
|---|---|---|---|---|---|
| 4 | 16 | ~20–25% | ~0.75–0.80 | 128 B | Too low for production use |
| 5 | 32 | ~15–18% | ~0.82–0.85 | 256 B | Marginal; testing only |
| 6 | 64 | ~12–14% | ~0.86–0.88 | 512 B | Acceptable for constrained devices |
| 7 | 128 | ~8–10% | ~0.90–0.92 | 1 KB | Phasm current: binary STC + J-UNIWARD |
| 8 | 256 | ~6–8% | ~0.92–0.94 | 2 KB | Sweet spot for many applications |
| 9 | 512 | ~5–7% | ~0.93–0.95 | 4 KB | Diminishing returns begin |
| 10 | 1,024 | ~5% | ~0.95 | 8 KB | Phasm roadmap: ternary STC |
| 11 | 2,048 | ~3–4% | ~0.96–0.97 | 16 KB | Near-optimal; high memory |
| 12 | 4,096 | ~3% | ~0.97 | 32 KB | Diminishing returns; $h = 10$ preferred |

The “Memory per State Column” refers to storing 8 bytes per state – a u32 path cost plus a u32 traceback entry – for a single trellis column. The full trellis memory depends on whether the entire trellis is stored (for full traceback) or a windowed approach is used.

4.2 Memory Requirements

For a full (non-windowed) trellis, the dominant memory cost is the back-pointer array. The cost arrays use double-buffering (only two columns of $2^h$ values are live at any time), while back pointers must be stored for every column:

$$\text{Memory} \approx n \times 2^h \times \text{sizeof(backptr)} + 2 \times 2^h \times \text{sizeof(cost)}$$

Using 1 byte per back pointer (storing only the stego bit decision per state) and u32 for costs:

| Configuration | $n = 25{,}000$ | $n = 50{,}000$ | $n = 130{,}000$ | Strategy |
|---|---|---|---|---|
| $h = 7$ (128 states) | ~3 MB | ~6 MB | ~17 MB | Full trellis fits easily |
| $h = 10$ (1,024 states) | ~25 MB | ~50 MB | ~130 MB | Requires windowed processing |
| $h = 12$ (4,096 states) | ~100 MB | ~200 MB | ~530 MB | Impractical without windowing |

For $h = 10$ with windowed Viterbi (window $W = 1{,}024$):

$$\text{Memory} = W \times 2^h \times 8 = 1{,}024 \times 1{,}024 \times 8 \approx 8 \text{ MB}$$

This is constant regardless of cover size – a critical property for WASM deployment where memory budgets are tight.
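The budgets in the table follow directly from the two formulas; as a sanity check, here they are as small helpers (assuming 1-byte back pointers and u32 costs, as above):

```rust
/// Full-trellis bytes: n * 2^h one-byte back pointers
/// plus two double-buffered columns of u32 costs.
fn full_trellis_bytes(n: usize, h: u32) -> usize {
    n * (1usize << h) + 2 * (1usize << h) * 4
}

/// Windowed-trellis bytes: W columns of 8 bytes per state,
/// independent of the cover size n.
fn windowed_trellis_bytes(window: usize, h: u32) -> usize {
    window * (1usize << h) * 8
}
```

For example, `full_trellis_bytes(50_000, 7)` is just over 6.4 MB, while `windowed_trellis_bytes(1_024, 10)` is exactly 8 MiB regardless of `n`.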

4.3 Binary vs. Ternary Capacity

JPEG steganography needs ternary modification: each DCT coefficient can change by +1, 0, or -1. Binary STC only embeds into the LSB (whether to flip or not). Multi-layered STC recovers the additional capacity from the modification direction.

| Approach | Bits per Modified Coefficient | Effective Capacity | Coding Passes |
|---|---|---|---|
| Binary STC (LSB only) | 1 | Baseline | 1 |
| Multi-layered STC (2-layer) | up to 2 | ~1.5–2× baseline | 2 |
| Multi-layered STC with symmetric costs | 2 (direction is free) | ~2× baseline | 1 + 1 (trivial) |

For J-UNIWARD with symmetric costs ($\rho_{+1} = \rho_{-1}$), the direction of each modification is “free” – it carries an extra message bit at zero additional distortion. This effectively doubles the embedding capacity at the same distortion level, or equivalently, halves the distortion at the same capacity.

5. Multi-Layered STC for Ternary Embedding

5.1 The Problem

Binary STCs choose $y_j \in \{0, 1\}$ – they decide whether the LSB of each DCT coefficient should be 0 or 1. But JPEG steganography modifies coefficients by $\{-1, 0, +1\}$, which is a ternary alphabet. A coefficient value of 5 can become 4 ($-1$), stay at 5 ($0$), or become 6 ($+1$). The direction of change carries additional information.

5.2 Two-Layer Decomposition

The multi-layered STC, described in detail in Filler’s dissertation (2010), decomposes ternary embedding into two binary passes:

Layer 1 – LSB embedding (which coefficients change):

  • Cover bits: $x_1[j] = \text{coeff}[j] \mathbin{\&} 1$ (the LSB)
  • Costs: $\rho_1[j] = \min(\rho_{+1}[j], \rho_{-1}[j])$ (cheaper direction)
  • Message portion: first $m_1$ bits
  • Run binary STC to determine which coefficients change

Layer 2 – Direction embedding (how they change):

  • For each coefficient that Layer 1 decided to flip, the preferred direction (lower cost) is “free” – it costs nothing – while choosing the non-preferred direction costs $|\rho_{+1}[j] - \rho_{-1}[j]|$
  • Message portion: $m_2$ additional bits embedded into the direction choices
  • Run a second binary STC on the subset of modified coefficients

                flowchart TB
                    subgraph "Layer 1: Which coefficients change"
                        M1["Message part 1 (m1)"] --> STC1["Binary STC\n(Viterbi)"]
                        X1["Cover LSBs x1[j]"] --> STC1
                        RHO1["rho1 = min(rho+, rho-)"] --> STC1
                        STC1 --> CHANGED["Changed positions\n(y1 != x1)"]
                    end

                    subgraph "Layer 2: Which direction (+1 or -1)"
                        M2["Message part 2 (m2)"] --> STC2["Binary STC\n(Viterbi)"]
                        DIR["Preferred direction bits"] --> STC2
                        RHOD["rho_dir = |rho+ - rho-|"] --> STC2
                        CHANGED --> STC2
                        STC2 --> FINAL["Final modifications\n(+1 or -1 per changed coeff)"]
                    end

                    style STC1 fill:#1a2a3a,stroke:#60a5fa,color:#fff
                    style STC2 fill:#1a2a3a,stroke:#60a5fa,color:#fff
                    style FINAL fill:#1a3a2a,stroke:#4ade80,color:#fff
                

5.3 The Symmetric Cost Simplification

For standard J-UNIWARD (without side information), costs are symmetric: $\rho_{+1}[j] = \rho_{-1}[j] = \rho[j]$. This has a profound practical consequence: the Layer 2 costs are all zero. The direction of every modification is free, meaning Layer 2 is trivially a direct copy of message bits to direction choices without any distortion penalty.

In this case:

  • Layer 1 embeds $m_1$ bits using standard binary STC with costs $\rho[j]$
  • Layer 2 embeds $m_2$ bits into the direction of each modification, at zero cost
  • $m_2$ equals the number of modifications made by Layer 1
  • Total capacity $\approx m_1 + m_2$, where $m_2$ grows with $m_1$

This symmetric-cost shortcut is what Phasm exploits with J-UNIWARD in Ghost mode: the direction bits are essentially free capacity. (Phasm’s current implementation uses binary STC at $h = 7$ with J-UNIWARD costs, upgraded from UERD in version 1.3.0. The multi-layered ternary STC extension at $h = 10$ remains a planned future upgrade.)

5.4 Capacity Planning with Multi-Layered STC

For Phasm’s practical deployment, the capacity calculation proceeds as follows:

  1. Count available (non-wet) cover elements $n$
  2. Estimate the achievable Layer 1 embedding rate $\alpha_1$ based on the distortion budget
  3. Layer 1 capacity: $m_1 = \alpha_1 \cdot n$ bits
  4. Expected number of modifications: $\approx H^{-1}(\alpha_1) \cdot n$ (from the rate-distortion bound)
  5. Layer 2 capacity (symmetric costs): $m_2 \approx H^{-1}(\alpha_1) \cdot n$ bits
  6. Total: $m_1 + m_2$ bits

For a typical 2 MP smartphone photo at QF 75 with $n \approx 50{,}000$ non-zero AC coefficients and a conservative $\alpha_1 = 0.1$:

  • $m_1 = 5{,}000$ bits
  • Expected modifications: $\approx 0.013 \times 50{,}000 = 650$ (since $H^{-1}(0.1) \approx 0.013$)
  • $m_2 \approx 650$ bits
  • Total: $\approx 5{,}650$ bits $\approx 706$ bytes – comfortably above a typical short text message, though Phasm’s 16-bit length prefix supports payloads up to 64 KB for users with sufficiently large cover images
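The steps above can be checked numerically with a bisection inverse of the binary entropy function (helper names are illustrative):

```rust
/// Binary entropy H(p) in bits.
fn binary_entropy(p: f64) -> f64 {
    if p <= 0.0 || p >= 1.0 {
        return 0.0;
    }
    -p * p.log2() - (1.0 - p) * (1.0 - p).log2()
}

/// Inverse of H on [0, 1/2] by bisection: the minimal change rate
/// needed to embed `alpha` bits per element (rate-distortion bound).
fn inv_binary_entropy(alpha: f64) -> f64 {
    let (mut lo, mut hi) = (0.0f64, 0.5f64);
    for _ in 0..60 {
        let mid = 0.5 * (lo + hi);
        if binary_entropy(mid) < alpha {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    0.5 * (lo + hi)
}

/// Capacity-planning steps 1-6 for symmetric costs:
/// returns (m1, m2) in bits for n usable cover elements.
fn plan_capacity(n: usize, alpha1: f64) -> (usize, usize) {
    let m1 = (alpha1 * n as f64) as usize;
    let modifications = inv_binary_entropy(alpha1) * n as f64;
    let m2 = modifications as usize; // one free direction bit per change
    (m1, m2)
}
```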

6. Implementation Details

6.1 State Representation and Transitions

For $h \leq 16$, each trellis state fits in a u16. For $h \leq 10$ (Phasm’s target), a u16 is more than sufficient. The state transition is a simple XOR:

// Transition: state s processes cover element with value y_j
// h_col is the applicable H-hat column (packed as u16)
fn next_state(s: u16, y_j: u8, h_col: u16) -> u16 {
    if y_j == 1 {
        s ^ h_col
    } else {
        s
    }
}

The message-emission shift (every $w$ columns):

fn emit_and_shift(s: u16, msg_bit: u8) -> Option<u16> {
    if (s & 1) as u8 == msg_bit {
        Some(s >> 1) // emit LSB, shift right
    } else {
        None // state incompatible with message -- prune
    }
}

6.2 Cost Quantization

The Viterbi algorithm can operate on floating-point costs, but integer quantization enables faster comparisons, SIMD-friendly processing, and predictable overflow behavior. We quantize costs to u32:

const COST_SCALE: f64 = 1_048_576.0; // 2^20

fn quantize_cost(rho: f64) -> u32 {
    if rho >= WET_COST {
        return u32::MAX;
    }
    let scaled = (rho * COST_SCALE).round();
    if scaled >= u32::MAX as f64 {
        u32::MAX - 1
    } else {
        scaled as u32
    }
}

The scale factor $2^{20}$ provides approximately 6 decimal digits of precision, covering the full practical range of J-UNIWARD costs. During the forward pass, use saturating addition to prevent overflow:

let total = cost_cur[s].saturating_add(rho_quantized[j]);
                

6.3 Windowed Viterbi Processing

For $h = 10$ with large cover sizes, storing the full trellis is impractical (50+ MB). The windowed approach processes the cover in segments of $W$ columns (e.g., $W = 1{,}024$):

  1. Forward pass over the current window, storing costs and back pointers for $W$ columns
  2. Checkpoint the state costs at the window boundary ($2^h$ values = 4 KB for $h = 10$ with u32 costs)
  3. Traceback within the current window, then advance to the next window
  4. At segment boundaries, the best state from the previous window seeds the next

The memory footprint becomes constant: $W \times 2^h \times 8$ bytes. For $W = 1{,}024$ and $h = 10$: approximately 8 MB regardless of image size.

A subtlety: the traceback must handle the window boundary correctly. The approach used in the reference C++ implementation divides the traceback into two phases – first a “partial traceback” within the window to find a stable merge point, then a “full traceback” from the merge point to recover stego bits. For implementation simplicity, Phasm uses a checkpoint-and-replay strategy: save checkpoint costs at each window boundary, then replay the forward pass during traceback when needed.

6.4 Handling Wet Coefficients

Certain DCT coefficients must never be modified:

  • DC coefficients (position $(0,0)$ in each $8 \times 8$ block): changing DC affects the entire block’s mean brightness
  • Zero-valued AC coefficients becoming non-zero: statistically detectable (creating “new” non-zero coefficients changes the histogram)
  • Coefficients in uniform blocks: all-zero blocks should remain untouched

These “wet” elements receive cost $\rho = \infty$ (in practice, a large sentinel value). The Viterbi algorithm naturally avoids them: any path that flips a wet element accumulates infinite cost and is pruned. No special-case logic is needed – the trellis handles wet elements for free.

const WET_COST: f64 = 1e13;

fn is_wet(_coeff: i16, freq_i: usize, freq_j: usize) -> bool {
    // DC coefficient
    if freq_i == 0 && freq_j == 0 {
        return true;
    }
    // Zero AC that would become non-zero (high cost, not always wet)
    // J-UNIWARD naturally assigns high cost; explicitly wet only DC
    false
}

6.5 Coefficient Permutation

The order in which DCT coefficients are processed by the STC must be randomized using the shared key. Without permutation, an attacker who discovers the STC structure could infer message bit positions.

use rand_chacha::ChaCha20Rng;
use rand_core::{RngCore, SeedableRng};

fn permute_indices(n: usize, key: &[u8; 32]) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..n).collect();
    let mut rng = ChaCha20Rng::from_seed(*key);

    // Fisher-Yates shuffle (modulo bias is negligible for n << 2^64)
    for i in (1..n).rev() {
        let j = (rng.next_u64() as usize) % (i + 1);
        indices.swap(i, j);
    }
    indices
}

Both sender and receiver must use the same permutation. The sender permutes before embedding; the receiver permutes before extraction. The CSPRNG guarantees that only holders of the shared key can reconstruct the permutation.
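One practical detail implied by the pipeline diagram’s “Un-permute” step: after traceback, the sender must map the permuted stego bits back to natural coefficient order, which requires inverting the permutation. A minimal sketch (the function name is illustrative):

```rust
/// Given perm where slot i reads from source index perm[i],
/// returns inv with inv[perm[i]] == i, undoing the shuffle.
fn invert_permutation(perm: &[usize]) -> Vec<usize> {
    let mut inv = vec![0usize; perm.len()];
    for (i, &p) in perm.iter().enumerate() {
        inv[p] = i;
    }
    inv
}
```

Composing the two mappings in either order yields the identity, so the same helper serves both ends of the pipeline.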

7. Rust and WASM Implementation Considerations

7.1 Pure Rust, No FFI

Phasm’s steganography engine is implemented in pure Rust with zero C dependencies. This is not merely a style choice – it is a hard requirement for WebAssembly compilation. WASM targets (wasm32-unknown-unknown) cannot link against native C libraries without complex Emscripten toolchains. Pure Rust compiles cleanly to WASM via wasm-pack or cargo build --target wasm32-unknown-unknown.

The STC algorithm is particularly well-suited to pure Rust:

  • All operations are integer XOR, comparison, and array indexing
  • No floating-point numerics in the inner loop (costs are quantized to u32)
  • No heap allocations in the hot path (trellis buffers are pre-allocated)
  • No platform-specific intrinsics required (though SIMD can be added as an optimization)
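
The cost-quantization step can be sketched as follows; the scale factor and wet-cost sentinel here are illustrative choices, not Phasm's actual constants:

```rust
/// Quantize floating-point distortion costs to u32 so the Viterbi inner
/// loop stays integer-only. SCALE and WET are illustrative.
const SCALE: f64 = 1_000_000.0;
const WET: u32 = u32::MAX / 2; // loses every comparison, yet two wet costs
                               // summed still fit in a u32

fn quantize_costs(rho: &[f64]) -> Vec<u32> {
    rho.iter()
        .map(|&r| {
            let scaled = r * SCALE;
            if !scaled.is_finite() || scaled >= WET as f64 {
                WET // infinite or out-of-range cost: treat as wet
            } else {
                scaled.round() as u32
            }
        })
        .collect()
}
```

In the accumulation loop, combining these with `saturating_add` keeps paths through multiple wet positions pinned at the ceiling instead of wrapping around.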

7.2 no_std Compatibility

The STC core can be written as #![no_std] compatible, using only alloc for Vec allocations. This opens the door to bare-metal targets and embedded platforms, though Phasm’s primary targets are WASM and native mobile.

Key design constraints for no_std:

  • Replace std::collections::HashMap with direct array indexing (trellis states are dense integers)
  • Use core::cmp::min instead of std::cmp::min (std merely re-exports the core version)
  • Avoid std::io for reading/writing; use byte slices passed from the host

7.3 Memory Management for WASM

WebAssembly linear memory defaults to 16 MB and grows in 64 KB pages. For $h = 7$ (128 states), the full trellis for a 50,000-element cover requires only ~6 MB – well within budget. For $h = 10$ with windowed processing at $W = 1{,}024$, the ~8 MB footprint is also manageable but requires careful pre-allocation.

pub struct StcEmbedder {
    h: usize,
    w: usize,
    num_states: usize,
    window_size: usize,

    // Pre-allocated buffers (reused across embeddings)
    cost_cur: Vec<u32>,
    cost_next: Vec<u32>,
    back_ptrs: Vec<u8>,    // window_size * num_states: surviving input bit per state
    h_hat_cols: Vec<u16>,  // w columns of H-hat
}

impl StcEmbedder {
    pub fn new(h: usize, w: usize, window_size: usize) -> Self {
        let num_states = 1 << h;
        Self {
            h,
            w,
            num_states,
            window_size,
            cost_cur: vec![u32::MAX; num_states],
            cost_next: vec![u32::MAX; num_states],
            back_ptrs: vec![0u8; window_size * num_states],
            h_hat_cols: vec![0u16; w],
        }
    }

    pub fn memory_usage_bytes(&self) -> usize {
        let state_bufs = 2 * self.num_states * 4;         // cost_cur + cost_next (u32)
        let trellis = self.window_size * self.num_states; // back_ptrs (1 byte per state)
        let hhat = self.w * 2;                            // h_hat_cols (u16)
        state_bufs + trellis + hhat
    }
}


7.4 Cross-Platform PRNG Determinism

A critical implementation detail: the ChaCha20 PRNG used to generate $\hat{\mathbf{H}}$ and the coefficient permutation must produce identical output across all platforms (native ARM, native x86, WASM). The rand_chacha crate guarantees this – ChaCha20 is a deterministic cipher with no platform-dependent behavior. However, care must be taken with the rand trait methods: rng.next_u64() is deterministic, but higher-level methods like rng.gen_range() may use rejection sampling with a variable number of iterations depending on the range. Phasm uses only next_u64() with manual modular reduction (performed in u64 before any cast to usize, so 32-bit and 64-bit targets agree) to avoid any ambiguity. Cross-platform determinism extends beyond the PRNG – the J-UNIWARD cost function requires deterministic trigonometric functions to produce bit-identical embedding decisions across all targets.
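
The safe reduction pattern is to take the modulo in u64 and cast only afterwards, so wasm32 (32-bit usize) and native 64-bit targets compute the same index; a minimal illustration (helper name hypothetical):

```rust
/// Reduce a raw 64-bit PRNG word to an index in 0..n. The modulo happens
/// in u64 *before* the cast to usize; casting first would silently keep
/// only the low 32 bits on wasm32 and diverge from 64-bit targets.
fn uniform_index(raw: u64, n: usize) -> usize {
    debug_assert!(n > 0);
    (raw % n as u64) as usize
}
```

The modular reduction carries a slight bias for ranges that do not divide $2^{64}$, but for index ranges below $2^{20}$ the bias is cryptographically negligible, and, unlike rejection sampling, the PRNG consumption is fixed.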

7.5 Performance Characteristics

Based on benchmarks from Phasm’s development (run on a MacBook Air M2 and cross-compiled to WASM). The $h = 7$ column reflects actual measurements with J-UNIWARD costs; the $h = 10$ column shows projected estimates for the planned ternary STC Phase 2 implementation:

| Operation | $h = 7$, $n = 50k$ | $h = 10$, $n = 50k$ (projected) | Notes |
| --- | --- | --- | --- |
| H-hat generation | < 1 ms | < 1 ms | ChaCha20 is fast |
| Coefficient permutation | ~2 ms | ~2 ms | Fisher-Yates on 50k elements |
| Viterbi forward pass (native) | ~15 ms | ~120 ms (est.) | Single-threaded |
| Viterbi forward pass (WASM) | ~40 ms | ~350 ms (est.) | ~2.5–3x native |
| Traceback | ~1 ms | ~3 ms (est.) | Linear scan |
| Message extraction | < 1 ms | < 1 ms | Just XOR operations |
| Total embedding (native) | ~20 ms | ~130 ms (est.) | |
| Total embedding (WASM) | ~50 ms | ~380 ms (est.) | |

These timings exclude the J-UNIWARD cost function computation, which dominates total embedding time. The STC itself is not the bottleneck.

8. Practical Considerations

8.1 Embedding Failure

STC embedding can fail if the message is too large for the available cover capacity. This manifests as all trellis states reaching infinite cost before the forward pass completes. Detection is straightforward:

fn check_embedding_feasible(cost_cur: &[u32]) -> bool {
    // At least one trellis state must still be reachable (finite cost)
    cost_cur.iter().any(|&c| c < u32::MAX)
}


When embedding fails, the application should report it to the user with guidance: use a larger image, shorten the message, or switch to a higher-capacity mode.

8.2 Security of the H-hat Key

The security of STC embedding rests on two secrets:

  1. The coefficient permutation key: without it, an attacker cannot determine which coefficients were processed in which order by the Viterbi algorithm
  2. The H-hat generation key: without it, an attacker cannot compute $\mathbf{H}$ and therefore cannot extract the syndrome

In Phasm, both are derived from a single 256-bit key using domain-separated HKDF:

let perm_key = hkdf_expand(master_key, b"phasm-permutation-v1");
let hhat_key = hkdf_expand(master_key, b"phasm-hhat-v1");


Even if an attacker knows that STC embedding was used, without the key they face a combinatorial explosion: for $n = 50{,}000$ elements, the permutation space is $50{,}000!$, and the $\hat{\mathbf{H}}$ space is $2^{h \times w}$. Brute force is infeasible.
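
To put the permutation-space figure in perspective, $\log_2(50{,}000!)$ can be computed directly as $\sum_{i=2}^{50000} \log_2 i$; a quick sketch (helper name illustrative):

```rust
/// Bits of entropy in a permutation keyspace: log2(n!) = sum_{i=2}^{n} log2(i).
fn log2_factorial(n: u64) -> f64 {
    (2..=n).map(|i| (i as f64).log2()).sum()
}
```

This comes to roughly $7 \times 10^5$ bits for $n = 50{,}000$ – though in practice the effective keyspace is bounded by the 256-bit master key from which both secrets are derived, which is still far beyond brute force.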

8.3 Choosing Between h=7 and h=10

The decision between constraint heights is driven by the deployment context:

Use $h = 7$ (Phasm current) when:

  • Memory is severely constrained (< 10 MB available)
  • Embedding speed is critical (real-time applications)
  • The distortion cost function already provides strong security (J-UNIWARD at low $\alpha$)
  • The 8–10% coding loss is acceptable (it almost always is)

Use $h = 10$ (Phasm planned) when:

  • The application operates at higher embedding rates ($\alpha > 0.2$)
  • Maximum embedding efficiency is required
  • Memory budget allows ~8 MB for windowed processing
  • The 5% coding loss improvement over $h = 7$ matters for the security margin

For Phasm’s typical operating point of $\alpha = 0.02$–$0.15$, the practical security difference between $h = 7$ and $h = 10$ is negligible. The choice of cost function (e.g., J-UNIWARD vs. UERD) matters far more than the last 3–5% of coding efficiency. Phasm currently uses binary STC at $h = 7$ with J-UNIWARD costs (upgraded from UERD in version 1.3.0); multi-layered ternary STC with $h = 10$ and windowed processing is on the roadmap as a future upgrade.

8.4 Testing and Verification

Correctness verification for an STC implementation requires:

  1. Round-trip test: embed a random message, extract it, verify equality
  2. Syndrome verification: after embedding, independently compute $\mathbf{H}\mathbf{y}$ and verify it equals $\mathbf{m}$
  3. Optimality spot-check: for small instances ($n < 100$, $h = 3$), compare the STC distortion against brute-force search over all valid $\mathbf{y}$
  4. Cross-validation: compare output against the reference C++ (Binghamton) or Python (conseal) implementations using identical inputs
  5. Platform parity: verify that native and WASM builds produce identical stego output for the same inputs
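
The optimality spot-check (item 3) can be sketched as a brute-force search over all $2^n$ candidate stego vectors; everything here (the packed-column representation, names, and test values) is illustrative:

```rust
/// Brute-force the minimum-distortion stego vector for a tiny instance:
/// enumerate all y in {0,1}^n and keep the cheapest y with H*y = m (mod 2).
/// `h_cols[i]` holds column i of H packed into the low m bits of a u32.
fn brute_force_embed(
    h_cols: &[u32],  // columns of H, bit r = row r
    x: &[u8],        // cover bits
    rho: &[u32],     // per-element distortion costs
    m_syndrome: u32, // message bits, packed like the columns
) -> Option<(Vec<u8>, u32)> {
    let n = h_cols.len();
    let mut best: Option<(Vec<u8>, u32)> = None;
    for bits in 0u32..(1u32 << n) {
        let y: Vec<u8> = (0..n).map(|i| ((bits >> i) & 1) as u8).collect();
        // Syndrome = XOR of the columns selected by y
        let syn = (0..n).filter(|&i| y[i] == 1).fold(0u32, |s, i| s ^ h_cols[i]);
        if syn != m_syndrome {
            continue;
        }
        // Distortion = total cost of positions where y differs from the cover
        let cost: u32 = (0..n).filter(|&i| y[i] != x[i]).map(|i| rho[i]).sum();
        if best.as_ref().map_or(true, |(_, c)| cost < *c) {
            best = Some((y, cost));
        }
    }
    best // None: the syndrome is unreachable (embedding infeasible)
}
```

On every random small instance, the STC's chosen $\mathbf{y}$ and its distortion must match the pair returned here; the syndrome filter inside the loop doubles as check 2.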

Phasm’s test suite includes the first three categories for the binary STC with J-UNIWARD costs, with cross-validation against conseal planned for Phase 2 when the ternary multi-layered extension is implemented.

9. Related Work


9.1 The Filler, Judas, and Fridrich Paper (2011)

The foundational paper, “Minimizing Additive Distortion in Steganography Using Syndrome-Trellis Codes,” was published in IEEE Transactions on Information Forensics and Security, Volume 6, Issue 3, pages 920–935, September 2011. It remains the definitive reference for STC theory and design.

Key contributions of the paper:

  • Formalized the connection between convolutional codes, trellis representations, and steganographic embedding
  • Proved that STCs can approximate the rate-distortion bound to arbitrary precision as $h \to \infty$
  • Introduced the $\hat{\mathbf{H}}$ submatrix construction with design rules for practical codes
  • Provided the multi-layered extension for non-binary (ternary) embedding
  • Demonstrated that $h = 10$–$12$ achieves near-optimal efficiency for practical applications

9.2 Prior Art

Before STCs, steganographic coding was dominated by matrix embedding using Hamming codes and their generalizations:

  • Hamming codes (Crandall, 1998): embed $p$ bits by modifying at most 1 out of $2^p - 1$ cover elements. Limited to specific code rates.
  • Wet paper codes (Fridrich et al., 2005): handle the “wet” element problem (some elements cannot be modified) but do not minimize distortion.
  • Random linear codes with belief propagation: theoretically near-optimal but computationally expensive and numerically unstable.

STCs unified and superseded these approaches by providing a single framework that handles arbitrary distortion profiles, wet elements, and non-binary alphabets – all with practical computational cost.

9.3 Subsequent Developments

Since 2011, several extensions have been proposed:

  • Irregular STCs (Butora and Ker, 2019): vary the constraint height across the trellis to reduce coding loss by 20–30% at the same average $h$
  • Variable-rate STCs (Wang and Ker, 2021): handle “bursty” channels where different regions have very different embedding capacities
  • Polar codes for steganography (2023): an alternative coding framework based on Arikan’s polar codes, offering competitive efficiency with different computational tradeoffs

For practical JPEG steganography in 2026, standard STCs with $h = 7$–$10$ remain the industry standard. The marginal improvements from irregular or polar variants do not yet justify the implementation complexity for most applications.

10. Conclusion

Syndrome-Trellis Codes are one of those rare algorithms where the theory is beautiful, the implementation is tractable, and the practical impact is transformative. Before STCs, achieving near-optimal coding efficiency required complex iterative decoders with dubious convergence guarantees. STCs reduced it to a clean Viterbi dynamic program with provable optimality guarantees and predictable resource consumption.

For implementers building their own STC:

  1. Start with $h = 7$. It is forgiving of bugs, fits in memory trivially, and the 8–10% coding loss is negligible at low embedding rates.
  2. Get the round-trip test passing first. If $\mathbf{H}\mathbf{y} = \mathbf{m}$, you are most of the way there.
  3. Worry about distortion optimality second. A correct but suboptimal STC (even with a bug in the cost propagation) is still a working steganographic code. An “optimal” STC that extracts garbage is useless.
  4. Graduate to $h = 10$ with windowed processing when you need the extra efficiency.
  5. Add multi-layered ternary embedding last – it is a wrapper around the binary STC, not a modification of it.

In Phasm, the STC engine is approximately 400 lines of Rust for the binary embedder and 80 lines for extraction. The planned multi-layered ternary wrapper (Phase 2) is estimated at approximately 150 additional lines. The JPEG coefficient codec and the cost function are each larger. The STC is the smallest and most self-contained component of the pipeline – which is exactly as it should be.

The Phasm app is currently in beta testing, and the core steganography engine is open source on GitHub (GPL-3.0). If you are a security researcher or steganography implementer interested in reviewing or testing the STC implementation, the source code is available for inspection. The best way to build trust in a steganographic tool is to invite the people who break such tools for a living to try.


Frequently Asked Questions

What are Syndrome-Trellis Codes?

Syndrome-Trellis Codes (STCs) are a coding scheme used in modern steganography to embed secret messages into cover images while minimizing the total number of detectable changes. Introduced by Filler, Judas, and Fridrich in 2011, STCs use a banded parity-check matrix that creates a trellis structure, enabling the Viterbi algorithm to find the optimal embedding in linear time. STCs achieve within 5–10% of the information-theoretic minimum distortion and are the de facto standard in content-adaptive steganography.

Why are STCs better than simple LSB replacement?

LSB (Least Significant Bit) replacement treats all pixel or coefficient positions equally and changes roughly 50% of selected cover elements, creating statistical artifacts that are trivially detectable by chi-squared analysis. STCs work with a distortion cost function that identifies which positions are safest to modify, then use Viterbi optimization to embed the message by changing only the lowest-cost subset of elements. This content-adaptive approach dramatically reduces detectable distortion – at low embedding rates, STCs paired with J-UNIWARD make detection accuracy indistinguishable from random chance.

What constraint height should I use for STC?

For most practical applications, constraint height h=7 (128 trellis states) provides an excellent balance of coding efficiency, memory usage, and speed. It achieves approximately 90–92% of the theoretical maximum embedding efficiency with only about 6 MB of memory for a typical smartphone photo. Higher constraint heights like h=10 (1,024 states) reduce coding loss to approximately 5% but require windowed processing to manage memory. At the low embedding rates typical of short-message steganography (0.02–0.04 bpnzAC), the practical security difference between h=7 and h=10 is negligible.

How does the Viterbi algorithm work in steganography?

In STC steganography, the Viterbi algorithm performs a dynamic programming search over a trellis with 2^h states to find the minimum-cost sequence of coefficient modifications that encodes the desired message. The forward pass processes each cover element left to right, tracking the cumulative distortion cost for every possible trellis state transition (keep the element unchanged at zero cost, or flip it at the element’s distortion cost). After processing all elements, a backward traceback recovers the optimal stego sequence. This guarantees the globally optimal embedding for the given cost function, not just a local approximation.

Is STC steganography open source?

Yes. The original reference implementations are available from Binghamton University (C++ toolbox) and the University of Innsbruck (conseal, a Python library). Phasm’s STC implementation is written in pure Rust and is part of the open-source phasmcore engine, available on GitHub under the GPL-3.0 license. The pure Rust implementation compiles to native code for iOS and Android as well as WebAssembly for in-browser steganography, with identical results across all platforms.


References

  1. R. Crandall, “Some Notes on Steganography,” posted on Steganography Mailing List, 1998.

  2. J. Fridrich, M. Goljan, P. Lisonek, and D. Soukal, “Writing on Wet Paper,” IEEE Transactions on Signal Processing, vol. 53, no. 10, pp. 3923–3935, 2005.

  3. T. Filler, “Imperfect Stegosystems – Asymptotic Laws and Near-Optimal Practical Constructions,” Ph.D. dissertation, Binghamton University, 2010.

  4. T. Filler, J. Judas, and J. Fridrich, “Minimizing Embedding Impact in Steganography Using Trellis-Coded Quantization,” Proceedings of SPIE, vol. 7541, Media Forensics and Security II, 2010.

  5. T. Filler, J. Judas, and J. Fridrich, “Minimizing Additive Distortion in Steganography Using Syndrome-Trellis Codes,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 920–935, 2011.

  6. V. Holub, J. Fridrich, and T. Denemark, “Universal Distortion Function for Steganography in an Arbitrary Domain,” EURASIP Journal on Information Security, vol. 2014, no. 1, 2014.

  7. M. Butora and A. D. Ker, “Reducing Coding Loss with Irregular Syndrome Trellis Codes,” Electronic Imaging, 2019.

  8. B. Lorch, “Off-by-One Implementation Error in J-UNIWARD,” arXiv preprint, arXiv:2305.19776, 2023.

Reference Implementations

  1. Binghamton University, “Syndrome-Trellis Codes Toolbox.” dde.binghamton.edu/download/syndrome/

  2. University of Innsbruck, “conseal: Simulators of Image Steganography Methods,” available on GitHub.

  3. D. Lerch, “stegolab STC.py,” available on GitHub.


This post is part of the Phasm Research Paper series, covering the algorithms and engineering behind the Phasm steganography app. Previous posts in this series cover adaptive steganography detection benchmarks, JPEG recompression survival, how 15 platforms process your photos, and building a pure Rust JPEG coefficient codec.