Abstract
Steganographic payloads embedded in JPEG images must survive a noisy channel – JPEG recompression – that introduces bit error rates (BER) ranging from 0.5% (same-QF, same-library) to 19% (aggressive cross-platform recompression such as WhatsApp). Standard Reed-Solomon codes alone cannot correct beyond approximately 6% symbol error rate for practical codeword sizes. This paper presents a concatenated coding architecture that bridges the gap: an inner repetition code with LLR-weighted soft majority voting reduces the effective BER from 19% to below 1%, after which an outer adaptive Reed-Solomon code corrects residual errors to zero. We derive the LLR formulas for both STDM (Spread Transform Dither Modulation) and BA-QIM (Block Average Quantization Index Modulation) extraction, analyze the post-voting BER as a function of raw channel BER and repetition factor, describe the adaptive parity tier selection and brute-force decode search, and validate the architecture against real WhatsApp round-trip experiments. The system operates without channel state information – the decoder does not know the repetition factor, RS parity, or even whether the image has been recompressed – and recovers both the coding parameters and the message through exhaustive candidate search.
1. The Steganographic Channel
1.1 JPEG Recompression as a Noisy Channel
When a JPEG image is opened and re-saved – whether by a photo editor, a messaging platform, or an operating system’s image pipeline – it passes through a pixel-domain round-trip:
$$\text{Entropy decode} \to \text{Dequantize} \to \text{IDCT} \to \text{Pixel clamp} \to \text{DCT} \to \text{Requantize} \to \text{Entropy encode}$$
The pixel clamping step (float-to-uint8 conversion) introduces rounding errors that propagate back into the DCT domain, flipping embedded bits. This process can be modeled as a binary symmetric channel (BSC) with crossover probability $p$, where $p$ – the raw BER – depends on the quality factor drop, the JPEG library implementation, and the embedding domain.
For a comprehensive treatment of the recompression pipeline and its impact on DCT coefficients, see our companion post on Armor mode’s recompression survival architecture.
1.2 Measured BER Values
Experimental measurements across three real JPEG encoders (sips/AppleJPEG, libjpeg-turbo, MozJPEG) and multiple test images establish the following raw BER values for the two embedding domains used in practice:
STDM (AC coefficient projection):
| Scenario | Raw BER |
|---|---|
| Same QF, same library | < 0.5% |
| Same QF, cross-library (delta=8x) | 2.5–2.7% |
| Mild QF drop (85 to 75) | 1–5% |
| Moderate QF drop (85 to 65) | 5–12% |
| Aggressive QF drop (85 to 50) | 12–22% |
BA-QIM (DC block average, QIM step=12):
| Scenario | Raw BER |
|---|---|
| Same QF, same library | 0% |
| Cross-library, QF 80 | < 1% |
| Cross-library, QF 53 (WeChat) | ~5% |
| WhatsApp standard (cross-platform) | ~17–19% |
The WhatsApp channel is the critical design point: at 19% BER, a naive RS(255, 191) code – which corrects up to $t = 32$ symbol errors out of 255 symbols, tolerating a symbol error rate of $32/255 \approx 12.5\%$ – is overwhelmed. Even under the optimistic assumption that bit errors distribute uniformly across symbols, a 19% bit-level BER translates to approximately $(1 - (1 - 0.19)^8) \approx 81\%$ of symbols containing at least one bit error, far exceeding any practical RS correction capability.
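The bit-to-symbol error amplification above is easy to verify numerically; a minimal Python sketch (the function name is illustrative):

```python
def symbol_error_rate(p, bits_per_symbol=8):
    """Probability that a GF(2^8) symbol contains at least one bit error,
    assuming independent bit errors at rate p (BSC model)."""
    return 1.0 - (1.0 - p) ** bits_per_symbol

ser = symbol_error_rate(0.19)   # ~0.815, vs. the 32/255 ~ 12.5% an RS(255, 191) code corrects
```

At 19% raw BER, roughly 81% of symbols are corrupted, which is why an inner code must run first.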
1.3 The Solution: Concatenated Inner + Outer Coding
The coding theory literature offers a well-known solution to this gap: concatenated codes, first proposed by Forney (1966). The architecture is:
- Inner code (repetition + soft majority voting): Reduces BER from 19% to below 1% by exploiting redundant copies and soft extraction confidence.
- Outer code (Reed-Solomon with adaptive parity): Corrects residual symbol errors from the inner code’s output, achieving zero residual errors.
The key innovation in our implementation is that the inner code operates in the continuous-valued domain rather than the binary domain – the decoder has access to log-likelihood ratios (LLRs) from the extraction process, not just hard bit decisions. This soft information provides 1–2 dB of effective coding gain over hard majority voting, which is the difference between success and failure at the 19% BER operating point.
2. Log-Likelihood Ratios from QIM Extraction
2.1 Hard Decision vs. Soft Decision
In a hard-decision decoder, the extractor produces a binary estimate $\hat{b} \in \{0, 1\}$ for each embedded bit and discards all information about confidence. In a soft-decision decoder, the extractor produces a real-valued log-likelihood ratio (LLR):
$$\Lambda = \ln \frac{P(b = 0 \mid \text{observation})}{P(b = 1 \mid \text{observation})}$$
The sign of $\Lambda$ gives the bit estimate ($\Lambda > 0 \Rightarrow \hat{b} = 0$, $\Lambda < 0 \Rightarrow \hat{b} = 1$), while the magnitude $|\Lambda|$ quantifies confidence. A large $|\Lambda|$ means the observation strongly favors one hypothesis; a small $|\Lambda|$ means the bit is ambiguous.
For the BSC with crossover probability $p$, an observation $\hat{b}$ has LLR:
$$\Lambda_{\text{BSC}} = (1 - 2\hat{b}) \cdot \ln\frac{1 - p}{p}$$
But when the extractor operates in a continuous domain – measuring distances rather than making binary decisions – the LLR carries much richer information than this binary model suggests.
2.2 LLR from BA-QIM Extraction
In Block Average QIM, each 8x8 block carries one bit embedded by quantizing the block’s average brightness to one of two interleaved grids:
- Bit 0 grid: $\{\ldots, -\Delta, 0, \Delta, 2\Delta, \ldots\}$
- Bit 1 grid: $\{\ldots, -\Delta/2, \Delta/2, 3\Delta/2, \ldots\}$
where $\Delta$ is the QIM step size (12 pixel-level units in the production system, adapted per-block by Watson perceptual masking factors). For background on Watson masking and its role in optimizing QIM step sizes, see our post on Watson perceptual masking for QIM steganography.
Given a received block average $\bar{x}$ (after recompression), the soft extractor computes:
$$d_0 = \left|\bar{x} - Q_\Delta(\bar{x})\right|, \quad d_1 = \left|\bar{x} - Q_{\Delta,1}(\bar{x})\right|$$
where $Q_\Delta(\bar{x}) = \mathrm{round}(\bar{x}/\Delta) \cdot \Delta$ is the nearest bit-0 grid point and $Q_{\Delta,1}(\bar{x}) = \mathrm{round}((\bar{x} - \Delta/2)/\Delta) \cdot \Delta + \Delta/2$ is the nearest bit-1 grid point.
The LLR is then:
$$\Lambda_{\text{QIM}} = d_1 - d_0$$
This is positive when the observation is closer to the bit-0 lattice (favoring bit 0) and negative when closer to the bit-1 lattice (favoring bit 1). The maximum magnitude is $\Delta/2$ (when the observation falls exactly on a lattice point), and the minimum is 0 (when the observation falls exactly on the decision boundary midway between lattices).
Under Gaussian noise with variance $\sigma^2$, this distance-based LLR is proportional to the true posterior LLR:
$$\Lambda \propto \frac{\Delta}{2\sigma^2}(d_1 - d_0)$$
In practice, we omit the scaling constant because only the sign and relative magnitudes matter for soft majority voting.
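The two nearest-grid-point distances and the resulting LLR fit in a few lines; a minimal Python sketch (the `qim_llr` helper is hypothetical, using the production default $\Delta = 12$):

```python
def qim_llr(x_bar, delta=12.0):
    """Soft BA-QIM extraction: LLR = d1 - d0 for a received block average x_bar.
    Positive LLR favors bit 0, negative favors bit 1; |LLR| <= delta/2."""
    q0 = round(x_bar / delta) * delta                            # nearest bit-0 grid point
    q1 = round((x_bar - delta / 2) / delta) * delta + delta / 2  # nearest bit-1 grid point
    d0, d1 = abs(x_bar - q0), abs(x_bar - q1)
    return d1 - d0

llr = qim_llr(25.0)   # d0 = 1, d1 = 5 -> LLR = +4 (confident bit 0)
```

A block average of 29.0 instead gives $d_0 = 5$, $d_1 = 1$, so the LLR is $-4$ (confident bit 1).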
2.3 LLR from STDM Extraction
STDM embeds a bit by quantizing the projection of a coefficient vector $\mathbf{x} = (x_1, \ldots, x_L)$ onto a unit-norm spreading vector $\mathbf{u}$:
$$p = \mathbf{x} \cdot \mathbf{u} = \sum_{i=1}^{L} x_i u_i$$
The projection $p$ is then quantized to one of two lattices:
- Bit 0 lattice ($Q_0$): $\{n \cdot \Delta : n \in \mathbb{Z}\}$
- Bit 1 lattice ($Q_1$): $\{(n + \tfrac{1}{2}) \cdot \Delta : n \in \mathbb{Z}\}$
On extraction, the received projection $p'$ yields:
$$d_0 = |p' - \mathrm{round}(p'/\Delta) \cdot \Delta|, \quad d_1 = |p' - (\mathrm{round}(p'/\Delta - \tfrac{1}{2}) + \tfrac{1}{2}) \cdot \Delta|$$
$$\Lambda_{\text{STDM}} = d_1 - d_0$$
The geometric interpretation is identical to the QIM case: the LLR measures the signed distance between the two nearest lattice points, with the sign indicating which lattice is closer. The spreading mechanism averages noise across $L$ coefficients, reducing the effective noise variance by approximately $1/L$ and increasing the expected $|\Lambda|$ by $\sqrt{L}$ relative to scalar QIM. For $L = 8$ (the production spreading length), this provides a factor of $\sqrt{8} \approx 2.83$ improvement in LLR magnitude.
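The STDM extraction path can be sketched the same way, projecting onto the spreading vector before measuring lattice distances (a hypothetical `stdm_llr` helper; the delta and spreading vector here are illustrative):

```python
import math

def stdm_llr(x, u, delta=8.0):
    """Soft STDM extraction: project coefficient vector x onto the unit-norm
    spreading vector u, then compare distances to the two projection lattices."""
    p = sum(xi * ui for xi, ui in zip(x, u))              # received projection p'
    d0 = abs(p - round(p / delta) * delta)                # distance to bit-0 lattice
    d1 = abs(p - (round(p / delta - 0.5) + 0.5) * delta)  # distance to bit-1 lattice
    return d1 - d0

# demo: L = 8 spreading, received projection p' = 17 (bit-0 lattice point at 16)
u = [1 / math.sqrt(8)] * 8
x = [17 / math.sqrt(8)] * 8
llr = stdm_llr(x, u)   # d0 = 1, d1 = 3 -> LLR = +2 (favors bit 0)
```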
2.4 Reference LLR and Signal Strength
For a pristine (unrecompressed) embedding, the expected LLR magnitude at each extraction unit is:
$$|\Lambda_{\text{ref}}| = \frac{\Delta}{2}$$
This is the maximum possible confidence – the observation falls exactly on the embedded lattice point, at distance $\Delta/2$ from the opposite lattice. After recompression, the observation shifts by some noise $\eta$, reducing $|\Lambda|$. The ratio:
$$S = \frac{\overline{|\Lambda|}}{\Lambda_{\text{ref}}}$$
where $\overline{|\Lambda|}$ is the average LLR magnitude across all extraction units, serves as a normalized signal strength metric. For pristine images $S \approx 1.0$; for heavily recompressed images $S$ may drop to 0.3–0.5.
3. Repetition Coding with Soft Majority Voting
3.1 Sequential Layout
The RS-encoded bitstream of length $n_{\text{RS}}$ bits is repeated $r$ times across $N$ available embedding positions. The layout is sequential: copy $j$ ($0 \leq j < r$) of RS bit $i$ ($0 \leq i < n_{\text{RS}}$) is placed at embedding position:
$$\text{pos}(i, j) = j \cdot n_{\text{RS}} + i$$
This means each copy forms a contiguous block in the embedding stream. The repetition factor is:
$$r = \left\lfloor \frac{N}{n_{\text{RS}}} \right\rfloor$$
forced to an odd value (for clean majority voting) and capped at 255 (to fit in a single byte). For the Fortress sub-mode, the minimum repetition factor is $r_{\min} = 15$.
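The layout and repetition-factor rules above can be sketched as follows (hypothetical helper names; the $N = 30{,}000$, $n_{\text{RS}} = 912$ bits example from Section 3.6 is used to exercise them):

```python
def repetition_factor(n_positions, n_rs_bits, r_min=15, r_max=255):
    """r = floor(N / n_RS), forced odd for clean voting, capped at 255.
    Returns None if the minimum repetition constraint cannot be met."""
    r = n_positions // n_rs_bits
    if r % 2 == 0:
        r -= 1                 # force odd
    r = min(r, r_max)          # must fit in a single byte
    return r if r >= r_min else None

def pos(i, j, n_rs_bits):
    """Embedding position of copy j of RS bit i (sequential layout)."""
    return j * n_rs_bits + i
```

For example, `repetition_factor(30000, 912)` yields 31, while `repetition_factor(30000, 2040)` yields None because the forced-odd result (13) falls below the Fortress minimum of 15.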
3.2 Hard Majority Voting
In hard majority voting, each copy provides a binary estimate $\hat{b}_{i,j} \in \{0, 1\}$, and the final decision is the majority:
$$\hat{b}_i = \begin{cases} 0 & \text{if } \sum_{j=0}^{r-1} \hat{b}_{i,j} < r/2 \\ 1 & \text{otherwise} \end{cases}$$
For a BSC with crossover probability $p$, each copy is independently correct with probability $1 - p$. The post-voting error probability for odd $r$ is:
$$P_{\text{hard}}(r, p) = \sum_{k=\lceil r/2 \rceil}^{r} \binom{r}{k} p^k (1 - p)^{r - k}$$
This is the tail probability of a binomial distribution. It decreases exponentially in $r$ when $p < 0.5$, but the rate of decrease depends critically on how far $p$ is from 0.5.
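Evaluating this binomial tail directly reproduces the hard-voting figures tabulated in Section 3.5; a minimal Python sketch:

```python
from math import comb

def p_hard(r, p):
    """Post-voting BER for hard majority voting over r copies on a BSC(p), odd r:
    probability that at least ceil(r/2) copies are in error."""
    return sum(comb(r, k) * p**k * (1 - p)**(r - k) for k in range((r + 1) // 2, r + 1))

ber = p_hard(3, 0.05)   # 0.00725, i.e. the 7.2e-3 table entry for p = 5%, r = 3
```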
3.3 Soft Majority Voting: Summing LLRs
In soft majority voting, we sum the LLRs across all copies:
$$\Lambda_i^{(\text{voted})} = \sum_{j=0}^{r-1} \Lambda_{i,j}$$
The final bit decision is:
$$\hat{b}_i = \begin{cases} 0 & \text{if } \Lambda_i^{(\text{voted})} \geq 0 \\ 1 & \text{otherwise} \end{cases}$$
This is the optimal combining rule for independent observations under any noise model where the LLR is a sufficient statistic (which includes Gaussian noise and, approximately, the quantization noise from JPEG recompression).
The advantage over hard voting is that uncertain copies – those with small $|\Lambda_{i,j}|$ – contribute less to the sum, while confident copies dominate the decision. A single highly confident copy can outweigh several ambiguous ones. In hard voting, every copy counts equally regardless of confidence, so a barely-correct decision and a strongly-correct decision carry the same weight.
3.4 Why Soft Beats Hard: An Illustrative Example
Consider $r = 5$ copies of a bit where the true value is 0:
| Copy $j$ | Hard decision $\hat{b}$ | LLR $\Lambda$ |
|---|---|---|
| 0 | 0 (correct) | +4.5 |
| 1 | 1 (error) | -0.3 |
| 2 | 0 (correct) | +3.8 |
| 3 | 1 (error) | -0.2 |
| 4 | 0 (correct) | +2.1 |
Hard voting: 3 correct out of 5 – majority is 0 (correct). But if one more copy had flipped, it would fail.
Soft voting: $\Lambda^{(\text{voted})} = 4.5 - 0.3 + 3.8 - 0.2 + 2.1 = +9.9$. The sum is strongly positive because the two errors had tiny LLR magnitudes (the decoder was already “unsure” about those copies). The system would need far more – and more confident – errors to flip the sign.
This resilience to low-confidence errors is precisely what matters in JPEG recompression, where the noise is spatially non-uniform: some blocks shift significantly (near quantization bin boundaries) while others barely change.
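The worked example translates directly into code; a minimal sketch of both voting rules (helper names are illustrative):

```python
def soft_vote(llrs):
    """Soft majority: sum the LLRs; the sign decides (>= 0 -> bit 0)."""
    s = sum(llrs)
    return (0 if s >= 0 else 1), s

def hard_vote(bits):
    """Hard majority over binary copies."""
    return 0 if sum(bits) < len(bits) / 2 else 1

llrs = [4.5, -0.3, 3.8, -0.2, 2.1]   # the five copies from the table above
bit, total = soft_vote(llrs)          # bit 0, sum +9.9: errors were low-confidence
```

Hard voting on the corresponding bits `[0, 1, 0, 1, 0]` also returns 0 here, but with a 3-of-5 margin rather than the +9.9 LLR margin.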
3.5 Post-Voting BER Analysis
For hard majority voting on a BSC($p$), the post-voting BER is given by the regularized incomplete beta function:
$$P_{\text{hard}}(r, p) = I_p\!\left(\left\lceil \frac{r}{2} \right\rceil, r - \left\lceil \frac{r}{2} \right\rceil + 1\right)$$
For soft voting under Gaussian noise $\mathcal{N}(0, \sigma^2)$, the summed LLR over $r$ independent copies is approximately Gaussian with mean $r\mu_\Lambda$ and variance $r\sigma_\Lambda^2$ (by the central limit theorem), leading to a post-voting error probability:
$$P_{\text{soft}}(r, p) \approx Q\!\left(\sqrt{r} \cdot \frac{\mu_\Lambda}{\sigma_\Lambda}\right)$$
where $Q(\cdot)$ is the complementary Gaussian CDF, $\mu_\Lambda = \mathbb{E}[\Lambda]$ is the mean LLR (positive for correct bit), and $\sigma_\Lambda = \sqrt{\text{Var}(\Lambda)}$.
The following table compares post-voting BER for hard and soft voting at various $(p, r)$ combinations, computed from the binomial distribution (hard) and derived from the Gaussian LLR model (soft):
| Raw BER $p$ | $r$ | Hard post-voting BER | Soft post-voting BER (approx.) |
|---|---|---|---|
| 5% | 3 | $7.2 \times 10^{-3}$ | $3.1 \times 10^{-3}$ |
| 5% | 7 | $1.9 \times 10^{-4}$ | $8.7 \times 10^{-6}$ |
| 5% | 15 | $1.8 \times 10^{-7}$ | $< 10^{-12}$ |
| 10% | 7 | $2.7 \times 10^{-3}$ | $8.4 \times 10^{-4}$ |
| 10% | 15 | $3.4 \times 10^{-5}$ | $3.6 \times 10^{-6}$ |
| 10% | 31 | $6.9 \times 10^{-9}$ | $< 10^{-12}$ |
| 15% | 15 | $6.1 \times 10^{-4}$ | $3.5 \times 10^{-4}$ |
| 15% | 31 | $2.0 \times 10^{-6}$ | $6.1 \times 10^{-7}$ |
| 15% | 61 | $6.1 \times 10^{-11}$ | $< 10^{-14}$ |
| 19% | 15 | $3.0 \times 10^{-3}$ | $8.5 \times 10^{-4}$ |
| 19% | 31 | $4.6 \times 10^{-5}$ | $3.8 \times 10^{-6}$ |
| 19% | 61 | $2.3 \times 10^{-8}$ | $< 10^{-10}$ |
The critical row is $p = 0.19$, $r = 15$ (the Fortress minimum): hard voting yields 0.3% post-voting BER – marginal for RS alone at short block lengths, but soft voting reduces this further to below 0.1%, providing comfortable margin for the outer RS code. At $r = 61$ (the adaptive cap for Fortress), even hard voting achieves $2.3 \times 10^{-8}$, and soft voting is effectively error-free.
3.6 Capacity Cost of Repetition
Repetition coding trades capacity for reliability at rate $R_{\text{inner}} = 1/r$. For a message of $m$ plaintext bytes with frame overhead $F$ bytes, the frame of $f = m + F$ bytes is protected by $P$ parity symbols per shortened RS block, with $B = \lceil f / (255 - P) \rceil$ blocks, so the required embedding capacity is:
$$N_{\text{required}} = r \cdot 8 \cdot (f + P \cdot B)$$
For the Fortress sub-mode (BA-QIM, 1 bit per 8x8 block), a 1200x1600 image provides approximately 30,000 payload blocks (after subtracting the 56-block magic header). With RS parity $P = 64$ and a 50-byte frame, a single shortened block encodes to $50 + 64 = 114$ bytes:
$$r = \left\lfloor \frac{30\text{,}000}{8 \cdot (50 + 64)} \right\rfloor = \left\lfloor \frac{30\text{,}000}{912} \right\rfloor = 32$$
forced odd to $r = 31$, which handles 19% BER with post-voting BER below $10^{-4}$. The capacity is limited to approximately 50 bytes – but for a passphrase, a short URL, or a brief instruction, this suffices.
4. The Concatenated Code Architecture
4.1 Architecture Overview
The concatenated code operates in two layers:
```mermaid
graph LR
    subgraph "Encode Pipeline"
        A["Plaintext"] --> B["Encrypt<br/>(AES-256-GCM-SIV)"]
        B --> C["Frame<br/>(length + salt + nonce + CRC)"]
        C --> D["RS Encode<br/>(adaptive parity)"]
        D --> E["Repetition Encode<br/>(r copies, sequential)"]
        E --> F["QIM / STDM Embed<br/>(into JPEG DCT)"]
    end
    subgraph "Decode Pipeline"
        G["Stego JPEG"] --> H["QIM / STDM Extract<br/>(soft: LLR per unit)"]
        H --> I["Soft Majority Vote<br/>(sum LLRs, sign decides)"]
        I --> J["RS Decode<br/>(Berlekamp-Massey)"]
        J --> K["Parse Frame + Decrypt"]
        K --> L["Plaintext"]
    end
```
The inner code (repetition + soft voting) operates in the continuous-valued LLR domain. The outer code (Reed-Solomon) operates in the symbol domain on the hard-decision output of the inner decoder. This separation is suboptimal compared to a fully soft outer decoder (which would pass LLR information to the RS decoder), but it is far simpler to implement and sufficient for the operating points of interest.
4.2 Outer Code: Adaptive Reed-Solomon
The outer code is RS($n$, $k$) over GF($2^8$), with adaptive parity selected from four tiers:
| Parity Tier $P$ | Code Parameters | Correction Capacity $t$ | Code Rate $k/n$ | Max Symbol Error Rate |
|---|---|---|---|---|
| 64 | RS(255, 191) | 32 | 0.749 | 12.5% |
| 128 | RS(255, 127) | 64 | 0.498 | 25.1% |
| 192 | RS(255, 63) | 96 | 0.247 | 37.6% |
| 240 | RS(255, 15) | 120 | 0.059 | 47.1% |
These four tiers – 64, 128, 192, and 240 parity symbols – are the only values the decoder needs to search, limiting the brute-force space. The encoder selects the smallest parity tier where the resulting RS-encoded bitstream, repeated $r \geq r_{\min}$ times, fits within the available embedding capacity.
For shortened codes (payload smaller than $k$ bytes), the RS block is zero-padded at the front to form a virtual full-length block during syndrome computation, then the padding is removed. This preserves the full error correction capability of the code for short messages.
4.3 Parameter Selection: Binary Search
On the encode side, the optimal (parity, $r$) pair is selected by iterating through parity tiers in ascending order and performing a binary search for the maximum frame size that fits:
For each parity tier $P$:
- Compute the RS-encoded length (shortened blocks): $n_{\text{RS}} = \text{frame\_len} + P \cdot \lceil \text{frame\_len} / (255 - P) \rceil$ bytes
- Compute $r = \lfloor N / (8 \cdot n_{\text{RS}}) \rfloor$, forced odd
- Accept if $r \geq r_{\min}$ (15 for Fortress, 3 for STDM)
The first tier that satisfies the minimum repetition constraint is chosen, giving the lowest parity (highest code rate) that still provides sufficient redundancy.
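The tier-selection loop can be sketched in Python (illustrative helper name; shortened RS blocks are assumed, consistent with the Fortress worked example in Section 3.6):

```python
TIERS = (64, 128, 192, 240)   # the four RS parity tiers

def select_params(frame_len, n_positions, r_min=15):
    """Pick the lowest parity tier whose RS-encoded stream, repeated at
    least r_min times, fits in n_positions embedding units.
    Returns (parity, r) or None. r_min is 15 for Fortress, 3 for STDM."""
    for parity in TIERS:
        blocks = -(-frame_len // (255 - parity))       # ceil division
        n_rs_bits = 8 * (frame_len + parity * blocks)  # shortened-block length
        r = n_positions // n_rs_bits
        if r % 2 == 0:
            r -= 1                                     # force odd
        if r >= r_min:
            return parity, min(r, 255)
    return None
```

For the Section 3.6 numbers, `select_params(50, 30000)` returns `(64, 31)`: the lowest-parity tier already satisfies the minimum repetition constraint.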
4.4 Brute-Force Decode: Trying All Candidates
A critical design decision is that no coding parameters are transmitted in the embedding. The repetition factor $r$ and parity tier $P$ are not encoded in the stego image – they are recovered by exhaustive search on decode.
For each parity tier $P \in \{64, 128, 192, 240\}$, the decoder:
- Computes all distinct $r$ values that could arise from valid frame lengths
- For each $(P, r)$ pair, extracts LLRs, performs soft majority voting to produce $n_{\text{RS}} = N / r$ RS-encoded bits
- Converts bits to bytes and attempts RS decode with parity $P$
- If RS succeeds, parses the frame, verifies CRC, decrypts, and returns the message
In the Fortress sub-mode, the adaptive QIM step depends on $r$ (via the adaptive_params function that interpolates between step 12 at $r = 15$ and step 6.5 at $r = 61$), so the LLR extraction itself varies per candidate. The decoder pre-computes block averages once and reuses them across candidates, making each trial computationally cheap (a sum over $r$ LLR values per bit position, followed by byte-level RS decode).
The total candidate count is typically 20–40 across all four parity tiers. With Rayon parallelism, the decoder evaluates candidates concurrently and returns the first successful match.
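The candidate enumeration can be sketched as follows; `try_decode` is a hypothetical stand-in for the real extract → vote → RS decode → CRC pipeline, and the shortened-block length formula matches the encode-side selection:

```python
def brute_force_decode(n_positions, frame_lens, try_decode, r_min=3):
    """Enumerate (parity, r) candidates; return the first message that
    survives RS decode + CRC. try_decode(parity, r) returns the recovered
    message or None (stand-in for the full soft-decode pipeline)."""
    for parity in (64, 128, 192, 240):
        tried = set()                                  # distinct r per tier
        for fl in frame_lens:                          # plausible frame lengths
            blocks = -(-fl // (255 - parity))          # ceil division
            n_rs_bits = 8 * (fl + parity * blocks)     # shortened-block length
            r = n_positions // n_rs_bits
            if r % 2 == 0:
                r -= 1                                 # force odd
            r = min(r, 255)
            if r >= r_min and r not in tried:
                tried.add(r)
                msg = try_decode(parity, r)
                if msg is not None:
                    return msg
    return None

# stand-in decoder that succeeds only at the true parameters (64, 31)
demo = lambda parity, r: "secret" if (parity, r) == (64, 31) else None
found = brute_force_decode(30000, range(40, 60), demo)
```

Because many frame lengths map to the same $r$, the per-tier `tried` set keeps the candidate count to the few dozen trials mentioned above.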
4.5 Why No Parameter Header?
Transmitting $r$ and $P$ in a header would seem simpler, but it introduces a fundamental fragility: the header bits themselves are subject to the same channel noise as the payload. A header corrupted by recompression would cause the decoder to attempt recovery with wrong parameters, guaranteeing failure.
The brute-force approach eliminates this single point of failure. The decoder tries all plausible parameter combinations, relying on the RS decode’s syndrome check and the frame’s CRC-32 to validate correct recovery. The computational cost is modest – tens of RS decode attempts, each taking microseconds – compared to the LLR extraction itself (which dominates decode time).
This “headerless” design is particularly important for the Fortress sub-mode, where the QIM step adapts based on $r$. If $r$ were transmitted and misread, the decoder would extract LLRs with the wrong step size, corrupting every bit position.
5. The LLR-Based Integrity Score
5.1 Beyond Binary Success/Failure
Traditional error correction provides binary feedback: either the RS decode succeeds (syndromes are zero after correction) or it fails. But for a user-facing application, this binary outcome is insufficient – a pristine image and a barely-recovered image both report “success,” but the latter is operating at the edge of its correction margin and may fail under slightly different conditions.
The LLR-based integrity score provides a continuous 0–100% confidence metric by combining two independent signals:
- Signal strength (70% weight): How strong is the embedded signal relative to the pristine reference?
- RS error margin (30% weight): How much of the RS correction capacity was consumed?
5.2 Signal Strength Component
The signal strength is computed from the average absolute LLR per copy per bit position during soft majority voting. For each bit position $i$:
$$S_i = \frac{|\Lambda_i^{(\text{voted})}|}{r}$$
This normalizes out the repetition factor, so the metric is comparable across different $r$ values. The average across all bit positions gives:
$$\bar{S} = \frac{1}{n_{\text{RS}}} \sum_{i=0}^{n_{\text{RS}}-1} S_i$$
The signal score is then:
$$\text{score}_{\text{signal}} = \min\!\left(\frac{\bar{S}}{\Lambda_{\text{ref}}},\; 1.0\right)$$
where $\Lambda_{\text{ref}} = \Delta/2$ is the expected LLR magnitude for pristine extraction ($\Delta$ is the QIM step for Fortress, or the STDM delta for standard Armor). For pristine images, $\bar{S} \approx \Lambda_{\text{ref}}$ and the score approaches 1.0. For recompressed images, $\bar{S}$ drops as noise reduces LLR magnitudes.
5.3 RS Error Margin Component
The RS error margin measures how much of the outer code’s correction budget was consumed:
$$\text{score}_{\text{RS}} = 1 - \frac{e_{\text{corrected}}}{e_{\text{capacity}}}$$
where $e_{\text{corrected}}$ is the total number of RS symbol errors corrected across all blocks and $e_{\text{capacity}} = t \cdot B$ is the maximum correctable errors ($t = P/2$ per block, $B$ blocks total).
When repetition coding absorbs all channel damage (as it typically does at the $r$ values used in practice), $e_{\text{corrected}} = 0$ and $\text{score}_{\text{RS}} = 1.0$. This is why the signal strength component receives 70% weight – it provides meaningful differentiation even when the RS score is uninformative.
5.4 Combined Integrity
$$\text{integrity} = 0.7 \cdot \text{score}_{\text{signal}} + 0.3 \cdot \text{score}_{\text{RS}}$$
Scaled to 0–100% for display. Empirical ranges:
| Condition | Integrity |
|---|---|
| Pristine (no recompression) | 95–100% |
| Mild recompression (same QF, cross-library) | 75–90% |
| WhatsApp standard (QF ~60–70, resize) | 55–75% |
| Severely degraded (barely recoverable) | 30–50% |
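The two components and the 70/30 combination reduce to a few lines; a minimal Python sketch (illustrative function name, with the Fortress default $\Delta = 12$ implying $\Lambda_{\text{ref}} = 6$):

```python
def integrity_score(mean_abs_llr, delta, errors_corrected, error_capacity):
    """Combined 0-100% integrity: 70% normalized signal strength,
    30% remaining RS correction margin."""
    llr_ref = delta / 2.0                                # pristine LLR magnitude
    score_signal = min(mean_abs_llr / llr_ref, 1.0)      # capped at 1.0
    score_rs = 1.0 - errors_corrected / error_capacity if error_capacity else 1.0
    return 100.0 * (0.7 * score_signal + 0.3 * score_rs)

pristine = integrity_score(6.0, 12.0, 0, 32)   # ~100%: full-strength signal
degraded = integrity_score(2.4, 12.0, 0, 32)   # ~58%: signal at 0.4 of reference
```

Note how the degraded case still shows a perfect RS score (repetition absorbed all damage) while the signal component reveals the recompression, which is exactly why signal strength carries the larger weight.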
6. Comparison with Advanced Coding Schemes
6.1 Why Not LDPC, Turbo, or Polar Codes?
The coding theory literature offers codes far more powerful than repetition: LDPC codes approach channel capacity within a fraction of a dB, turbo codes achieve similar performance with iterative decoding, and polar codes are provably capacity-achieving. Why use the seemingly primitive repetition code as the inner layer?
The answer lies in the specific constraints of steganographic decoding:
- No channel state information. The decoder does not know the channel BER, the repetition factor, or even whether recompression occurred. Advanced codes require at least approximate knowledge of the channel SNR for their iterative decoders to converge.
- Variable rate. The repetition factor $r$ varies from 3 to 255 depending on the message-to-capacity ratio. Designing and storing LDPC parity-check matrices for hundreds of different rates is impractical.
- Brute-force parameter search. The decoder tries dozens of $(P, r)$ candidates. Each trial requires decoding the inner code – if this takes milliseconds (as for repetition), the search completes in tens of milliseconds; if it takes seconds (as for iterative LDPC), the search becomes prohibitive.
- Sufficient performance. At the operating points of interest ($p \leq 0.19$, $r \geq 15$), repetition + soft voting already achieves post-voting BER below $10^{-3}$, which the outer RS code handles easily. There is no gap to close.
6.2 Comparison Table
| Coding Scheme | Inner Code | Outer Code | Decoding | Max Correctable BER | Requires CSI? | Steganographic Use |
|---|---|---|---|---|---|---|
| This work | Repetition ($r \leq 255$) + soft voting | RS (adaptive, 4 tiers) | LLR sum + Berlekamp-Massey | ~25% (at $r = 61$) | No | Designed for stego |
| Concatenated RS | RS inner | RS outer | Hard algebraic | ~15% | No | Rare in practice |
| Turbo code + RS | Turbo inner | RS outer | BCJR iterative + BM | ~30% | Yes (SNR) | Research only |
| LDPC + RS | LDPC inner | RS outer | Belief propagation + BM | ~30% | Yes (SNR) | Research only |
| Steganographic Polar Codes (Butora 2024) | Polar code | None (integrated) | Successive cancellation | Matched-QF only | Yes (QF matched) | Errorless stego |
| Convolutional + RS | Conv code ($k=7$) | RS outer | Viterbi + BM | ~20% | Partial | Classic telecom |
Steganographic Polar Codes (SPC), proposed by Butora et al. (2024), represent the state of the art for matched-QF recompression, achieving truly errorless embedding when the target QF is known. However, SPC requires exact knowledge of the target quantization tables – it is fundamentally a matched-channel technique. Our concatenated architecture, by contrast, is designed for the mismatched case (unknown encoder, unknown QF, unknown library) and recovers from channels that SPC cannot address.
For a comparison of Ghost mode’s alternative coding scheme – syndrome-trellis codes (STCs) optimized for minimal distortion rather than error correction – see our post on practical STC implementation.
6.3 Convolutional Inner Code Alternative
A natural alternative to repetition coding would be a convolutional code with soft Viterbi decoding as the inner code. A rate-$1/r$ convolutional code with constraint length $k = 7$ can achieve 2–3 dB better coding gain than repetition at the same rate. However:
- Variable rate: Convolutional codes at rate $1/r$ for $r \in \{3, 5, 7, \ldots, 255\}$ require puncturing tables for each rate.
- Trellis memory: The Viterbi decoder requires $2^{k-1} = 64$ states, which is manageable but adds complexity to the brute-force search.
- Marginal gain: At $p = 0.19$ and $r = 15$, repetition + soft voting already achieves post-voting BER below $10^{-3}$, which RS(255, 63) corrects to zero. The convolutional code would reduce this to $\sim 10^{-5}$, but the RS outer code already handles $10^{-3}$ comfortably. The 2 dB gain translates to requiring $r = 11$ instead of $r = 15$ for the same performance – a modest capacity improvement that does not justify the implementation complexity.
7. Empirical Validation: Real WhatsApp Round-Trips
7.1 Experimental Setup
The concatenated code architecture was validated through end-to-end WhatsApp round-trip tests, sending stego images through actual WhatsApp iOS and measuring decode success after the platform’s full processing pipeline.
Test procedure:
1. Capture a photo with iPhone camera (4032x3024, HEIC)
2. Convert and pre-size to 1200x1600 JPEG for WhatsApp standard
3. Encode a short message using Fortress (BA-QIM)
4. Send the stego JPEG through WhatsApp iOS to self
5. Save the received image
6. Decode with Phasm
7.2 Test 1: WhatsApp Standard with QT DC Change from 3 to 8
| Stage | File | Size | Dimensions | Format | QT DC |
|---|---|---|---|---|---|
| Original photo | IMG_3520.HEIC | 1,811,334 | 4032x3024 | HEIC | – |
| Encoded stego | IMG_3523.JPG | 400,131 | 1200x1600 | SOF0 Baseline | 3 |
| After WhatsApp | IMG_3524.JPG | 265,658 | 1200x1600 | SOF2 Progressive | 8 |
WhatsApp transformed the image from baseline to progressive JPEG and changed the luminance DC quantization step from 3 to 8 – a 2.67x increase. All 64 quantization table positions changed. The BA-QIM embedding survived because block-average brightness is invariant to quantization table changes (the dequantized DC value is preserved to within < 2 pixel levels regardless of the QT).
Decode result: Success. Message recovered correctly with Fortress sub-mode.
7.3 Test 2: WhatsApp Standard with QT DC Change from 8 to 6
| Stage | File | Size | Dimensions | Format | QT DC |
|---|---|---|---|---|---|
| Original photo | IMG_3554.HEIC | 1,806,984 | 4032x3024 | HEIC | – |
| Encoded stego | IMG_3555.JPG | 285,101 | 1200x1600 | SOF0 Baseline | 8 |
| After WhatsApp | IMG_3556.JPG | 279,683 | 1200x1600 | SOF0 Baseline | 6 |
In this test, WhatsApp preserved baseline JPEG format but still changed all quantization table values. The QT DC step decreased from 8 to 6, and the image size decreased by only 2% (compared to 34% in Test 1).
Decode result: Success. Message recovered correctly with Fortress sub-mode.
7.4 Key Observations
- WhatsApp’s encoder is non-deterministic. The same sending workflow produced progressive JPEG (SOF2) in Test 1 and baseline JPEG (SOF0) in Test 2, with different quantization tables. The concatenated code’s brute-force decode search handles this transparently – it does not depend on knowing the encoder or QT.
- No dimension changes. Pre-sizing the image to 1200x1600 prevented WhatsApp from resizing, preserving the 8x8 block grid. This is critical: resizing destroys block alignment and would require a fundamentally different synchronization mechanism. For a survey of how 15 major platforms process images, see How 15 Platforms Process Your Photos.
- BA-QIM is QT-invariant. The block-average brightness embedding domain eliminates the quantization table matching problem that plagues AC-coefficient methods. The decoder does not need to know – or guess – the encoder’s quantization tables.
- Adaptive Watson masking survived. The per-block QIM step adaptation (based on AC energy ratios) remained consistent across recompression because the computation uses only AC coefficients with $|c| \geq 2$, which are stable under recompression. The Watson factors for each block remained aligned between encode and decode despite the complete QT change.
7.5 Invariant Foundation
The success of the concatenated code architecture rests on the invariant properties of block-average brightness discovered during systematic experimentation:
- Maximum brightness shift < 2 pixel levels across all tested encoders, quality factors (including QF 53), and images
- Zero ordering flips in relative block brightness across 69,606 tested adjacent pairs
- QIM step of 12 provides a 4-level margin against a maximum observed shift of 1.875 levels
These invariants are the physical foundation that makes the 19% raw BER a reliable 19% – the errors are distributed according to a predictable noise model, not adversarially placed. This is why the Gaussian LLR approximation works well in practice.
For a deeper analysis of the three JPEG recompression invariants (block-average brightness, inter-block ordering, and coefficient sign stability) that underpin the Fortress embedding domain, see our post on JPEG recompression invariants.
8. Conclusion
The concatenated code architecture presented here solves a practical problem: bridging the gap between the 19% raw BER of the WhatsApp steganographic channel and the zero residual errors required for reliable message recovery. The solution combines three elements:
- Soft extraction: LLR computation from QIM/STDM distances provides continuous-valued confidence information, not just binary bit estimates.
- Inner repetition code with soft majority voting: Summing LLRs across $r$ copies reduces effective BER exponentially, from 19% to below $10^{-3}$ at $r = 15$ and below $10^{-7}$ at $r = 61$.
- Outer adaptive Reed-Solomon code: Four parity tiers (64, 128, 192, 240 symbols) cover the range from mild recompression (where minimal ECC suffices) to aggressive cross-platform channels (where maximum parity is needed).
The architecture operates without any channel state information – the decoder discovers the coding parameters through exhaustive search, validated by RS syndrome checks and CRC integrity verification. When the image has been geometrically transformed (rotated, scaled, or cropped), a separate DFT template-based recovery stage estimates and inverts the transform before applying the concatenated decode pipeline described here. This “headerless” design eliminates the single point of failure that would arise from transmitting fragile side information through the same noisy channel.
The LLR-based integrity score extends the system’s utility beyond binary success/failure, providing a continuous quality metric that differentiates pristine images from recompressed ones even when error correction succeeds completely. The 70/30 weighting of signal strength to RS margin ensures meaningful feedback when repetition coding absorbs all damage (the common case at high $r$).
Real-world validation through WhatsApp round-trips confirms that the system handles the non-determinism of platform image processing – variable JPEG formats, unpredictable quantization tables, and inconsistent compression ratios – without any platform-specific tuning. The combination of QT-invariant BA-QIM embedding, soft LLR extraction, and concatenated coding transforms a hostile channel into a reliable one.
For practitioners considering more sophisticated inner codes (LDPC, turbo, polar), the key insight is that the brute-force decode search – necessary because coding parameters are unknown – imposes a hard constraint on inner decoder complexity. Repetition + soft voting is fast enough to support dozens of trials per decode; iterative codes are not. At the operating points where steganographic repetition factors are practical ($r \geq 15$), this simple inner code is already sufficient, and the complexity budget is better spent on expanding the outer code’s search space.
References
- Forney, G. D. (1966). Concatenated Codes. MIT Press.
- Chen, B., and Wornell, G. W. (2001). “Quantization Index Modulation: A Class of Provably Good Methods for Digital Watermarking and Information Embedding.” IEEE Trans. Information Theory, 47(4), 1423–1443.
- Comesana, P., and Perez-Gonzalez, F. (2006). “On the capacity of stego-systems.” Proc. ACM Workshop on Multimedia and Security, 15–24.
- Butora, J., and Fridrich, J. (2023). “Errorless Robust JPEG Steganography using Outputs of JPEG Coders.” IEEE Trans. Dependable and Secure Computing.
- Butora, J., et al. (2024). “Errorless Robust JPEG Steganography Using Steganographic Polar Codes.” EURASIP J. Information Security.
- Watson, A. B. (1993). “DCTune: A Technique for Visual Optimization of DCT Quantization Matrices for Individual Images.” Society for Information Display Digest, 24, 946–949.
- Filler, T., Judas, J., and Fridrich, J. (2011). “Minimizing Additive Distortion in Steganography Using Syndrome-Trellis Codes.” IEEE Trans. Information Forensics and Security, 6(3), 920–935.
- Reed, I. S., and Solomon, G. (1960). “Polynomial Codes over Certain Finite Fields.” J. Society for Industrial and Applied Mathematics, 8(2), 300–304.
- Berlekamp, E. R. (1968). Algebraic Coding Theory. McGraw-Hill.
- Proakis, J. G. (2001). Digital Communications, 4th ed. McGraw-Hill.