Abstract

JPEG recompression – triggered by every re-save, platform transit, and format conversion – can destroy steganographic payloads embedded in DCT coefficients. This paper presents a quantitative analysis of DCT-domain robust steganography designed to survive re-quantization. We examine a layered architecture combining Spread Transform Dither Modulation (STDM) for data embedding, coefficient stability selection, and Reed-Solomon error correction coding (ECC). Across same-quality-factor, cross-quality-factor, and cross-library recompression scenarios, we report survival rates exceeding 99%, 95%, and 95% respectively. We derive capacity-robustness tradeoff curves for multiple sub-mode configurations – including the Fortress sub-mode, which achieves end-to-end survival through WhatsApp standard recompression for short messages – and discuss the fundamental limits imposed by pixel-domain round-trips and image resizing. We also describe the DFT template embedding system that provides geometric resilience against rotation, scaling, and moderate cropping. Our results demonstrate that carefully engineered DCT-domain steganography achieves high reliability for JPEG-to-JPEG channels, with the Fortress sub-mode extending feasibility to several social media platforms previously considered infeasible.

1. Background

1.1 The Recompression Problem

JPEG compression is a lossy process. An image compressed at quality factor (QF) 85 and then recompressed at QF 85 is not bitwise identical to the original – the re-quantization of DCT coefficients introduces rounding errors that accumulate with each cycle. For ordinary photography, these errors are imperceptible. For steganography, they are catastrophic.

The standard JPEG pipeline works as follows:

  1. Divide the image into 8x8 pixel blocks
  2. Apply the forward Discrete Cosine Transform (DCT) to each block
  3. Quantize the resulting coefficients by dividing by the quantization matrix and rounding
  4. Entropy-encode the quantized coefficients (Huffman or arithmetic coding)

When a JPEG is opened and re-saved, the decoder reconstructs pixel values, then the encoder re-quantizes with potentially different tables. The pixel-domain round-trip introduces rounding errors that propagate back into the DCT domain. Even at identical quality factors, some coefficients change value – and for steganographic embedding that modifies coefficients by +/-1, each change can flip an embedded bit. Without error correction, a raw bit error rate (BER) of even 2-3% corrupts a text message beyond recovery.

1.2 Quantization Index Modulation (QIM)

Quantization Index Modulation, introduced by Chen and Wornell (2001), provides the theoretical foundation for quantization-based data embedding. The core idea is elegant: embed data by quantizing the host signal using one of multiple quantizers, each indexed by the message bit.

In scalar Dither Modulation QIM (DM-QIM), the embedding rule for a single coefficient $x$ carrying message bit $m \in \{0, 1\}$ is:

$$y = Q_\Delta(x - d_m) + d_m$$

where $Q_\Delta(\cdot)$ rounds to the nearest multiple of $\Delta$ (the quantization step size), and $d_m$ is the dither value indexed by message bit $m$. The two dither values are typically:

$$d_0 = -\frac{\Delta}{4}, \quad d_1 = \frac{\Delta}{4}$$

Extraction is blind – it requires only the quantization step $\Delta$ and the dither values, not the original image:

$$\hat{m} = \arg\min_{m \in \{0,1\}} |y' - Q_\Delta(y' - d_m) - d_m|$$

where $y'$ is the received (possibly recompressed) coefficient value.
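The embedding and extraction rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the Phasm implementation: the function names are hypothetical, and breaking ties toward bit 0 in the extractor is an arbitrary choice.

```python
import math

def quantize(x: float, delta: float) -> float:
    """Q_delta: round x to the nearest multiple of delta."""
    return delta * math.floor(x / delta + 0.5)

def dither(bit: int, delta: float) -> float:
    """Dither values from the text: d_0 = -delta/4, d_1 = +delta/4."""
    return delta / 4 if bit else -delta / 4

def qim_embed(x: float, bit: int, delta: float) -> float:
    """y = Q_delta(x - d_m) + d_m."""
    d = dither(bit, delta)
    return quantize(x - d, delta) + d

def qim_extract(y: float, delta: float) -> int:
    """Pick the bit whose dithered lattice lies closest to y (blind decoding)."""
    errors = []
    for bit in (0, 1):
        d = dither(bit, delta)
        errors.append(abs(y - quantize(y - d, delta) - d))
    return 0 if errors[0] <= errors[1] else 1
```

With $\Delta = 8$, embedding into $x = 13.2$ yields 14.0 for bit 0 and 10.0 for bit 1, and either value decodes correctly as long as the perturbation stays below $\Delta/4 = 2$.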

The key parameter is $\Delta$. Larger values increase robustness (the embedded quantization structure survives larger perturbations) but also increase visual distortion. The relationship between $\Delta$ and the JPEG quantization step $q$ for a given coefficient position determines survival: when $\Delta \geq 2q$, the QIM embedding structure is robust enough that same-QF recompression yields BER below 1% (Nikolaidis and Pitas, 2006).

1.3 Spread Transform Dither Modulation (STDM)

STDM, introduced alongside QIM by Chen and Wornell (2001) and studied further by Comesana and Perez-Gonzalez (2006), extends QIM by projecting a group of $L$ host coefficients onto a pseudo-random spreading direction before quantizing. This spreading mechanism is the key innovation that makes STDM preferable to scalar QIM for robust steganography.

Given a vector of $L$ DCT coefficients $\mathbf{x} = (x_1, x_2, \ldots, x_L)$ and a unit-norm spreading vector $\mathbf{u} = (u_1, u_2, \ldots, u_L)$ where $\|\mathbf{u}\| = 1$, the STDM embedding proceeds as:

  1. Project: Compute the projection $p = \mathbf{x} \cdot \mathbf{u} = \sum_{i=1}^{L} x_i u_i$

  2. Quantize: Apply DM-QIM to the projection:

$$p' = Q_\Delta(p - d_m) + d_m$$

  3. Adjust: Modify the coefficient vector proportionally:

$$\mathbf{y} = \mathbf{x} + (p' - p) \cdot \mathbf{u}$$

Extraction computes $p' = \mathbf{y'} \cdot \mathbf{u}$ on the received coefficients and applies the same DM-QIM decision rule to the scalar projection.
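The three steps (project, quantize, adjust) and the matching extractor can be sketched as follows. This is an illustrative sketch under stated assumptions: the spreading vector is drawn from a seeded Gaussian generator (a real system would derive it from the shared key), and all function names are hypothetical.

```python
import math
import random

def unit_vector(length: int, seed: int) -> list:
    """Pseudo-random spreading direction with unit norm (seed stands in for a key)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(length)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def dm_quantize(p: float, bit: int, delta: float) -> float:
    """Dither-modulation quantizer: Q_delta(p - d_m) + d_m."""
    d = delta / 4 if bit else -delta / 4
    return delta * math.floor((p - d) / delta + 0.5) + d

def stdm_embed(x: list, u: list, bit: int, delta: float) -> list:
    """Project x onto u, quantize the projection, move x along u by the difference."""
    p = sum(xi * ui for xi, ui in zip(x, u))
    p_new = dm_quantize(p, bit, delta)
    return [xi + (p_new - p) * ui for xi, ui in zip(x, u)]

def stdm_extract(y: list, u: list, delta: float) -> int:
    """Apply the DM-QIM decision rule to the scalar projection of y onto u."""
    p = sum(yi * ui for yi, ui in zip(y, u))
    err0 = abs(p - dm_quantize(p, 0, delta))
    err1 = abs(p - dm_quantize(p, 1, delta))
    return 0 if err0 <= err1 else 1
```

Because $\mathbf{u}$ has unit norm, rounding each of $L = 8$ coefficients to an integer moves the projection by at most $0.5\sqrt{8} \approx 1.41$, so a step of $\Delta = 8$ (decision margin $\Delta/4 = 2$) tolerates that rounding.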

The spreading provides two critical advantages. First, the projection averages out per-coefficient noise from recompression across $L$ coefficients, reducing the effective noise variance by a factor of approximately $L$. Second, STDM is inherently resistant to amplitude scaling attacks (gain attacks) that destroy scalar QIM – because the spreading vector $\mathbf{u}$ defines a direction, not an absolute magnitude.

Typical spreading lengths range from $L = 8$ to $L = 64$. The tradeoff is capacity: each group of $L$ coefficients carries only one message bit, so the embedding rate is reduced by a factor of $L$ relative to scalar QIM.

1.4 Transport Channel Matching (TCM)

TCM, as formalized in the errorless steganography work of Butora and Fridrich (2023), exploits a fundamental property of iterated JPEG compression: after several rounds of recompression at the same quality factor, most DCT coefficients reach a stable “fixed point” where further recompression no longer changes their values.

The TCM approach to coefficient selection works as follows:

  1. Compress the cover image at each of several target quality factors (e.g., QF 50, 60, 70, 75, 80, 85)
  2. For each QF, recompress 3-5 times and identify coefficients that stabilize
  3. Select only coefficients that are stable across all tested QFs – the conservative robust set
  4. Embed data exclusively in these stable coefficients

This conservative multi-QF approach sacrifices some capacity (only ~70% of non-zero AC coefficients qualify as universally stable) but provides robustness against recompression at any quality factor within the tested range, even when the exact target QF is unknown.

Implementation note: Phasm investigated TCM-style multi-QF pre-analysis during development but found it counterproductive at the embedding rates used in practice. The multi-round recompression analysis adds significant computational cost at encode time, and for the short messages Phasm targets (under 1 KB), the combination of STDM with coefficient stability heuristics and strong ECC provides sufficient robustness without the full TCM pipeline. TCM remains a well-validated technique in the literature and is reserved for potential future use at higher embedding rates. Phasm’s production coefficient selection uses a lighter-weight stability criterion based on quantization-bin margin analysis rather than iterative recompression.

1.5 Error Correction: Reed-Solomon Codes

Reed-Solomon (RS) codes operate on symbols rather than individual bits, making them naturally suited to burst error patterns. For a code RS($n$, $k$) over GF($2^8$), where $n$ is the codeword length and $k$ is the message length (both in bytes):

  • The code can correct up to $t = \lfloor(n - k)/2\rfloor$ symbol errors
  • The code rate is $R = k/n$
  • The redundancy overhead is $(n - k)/k \times 100\%$

For Phasm’s default configuration of RS(255, 223):

$$t = \left\lfloor \frac{255 - 223}{2} \right\rfloor = 16 \text{ symbol errors}$$

This means up to 16 corrupted bytes out of every 255-byte codeword can be corrected – a symbol error rate tolerance of approximately 6.3%. Because JPEG recompression errors tend to cluster spatially (adjacent coefficients in the same 8x8 block are correlated), block interleaving is applied at depth 32 after RS encoding, spreading spatially correlated errors across multiple codewords.
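A row-column block interleaver illustrates how a burst is spread across codewords. The sketch below assumes that "depth 32" means 32 RS codewords are interleaved symbol by symbol; the source does not specify the exact layout, and the function names are hypothetical.

```python
def interleave(codewords: list) -> bytes:
    """Write codewords as rows, then read the block out column by column."""
    n = len(codewords[0])
    assert all(len(cw) == n for cw in codewords), "codewords must share one length"
    return bytes(cw[i] for i in range(n) for cw in codewords)

def deinterleave(stream: bytes, depth: int) -> list:
    """Invert interleave(): symbol i of codeword j sits at index i * depth + j."""
    n = len(stream) // depth
    return [bytes(stream[i * depth + j] for i in range(n)) for j in range(depth)]
```

Under this layout, a burst of 64 consecutive corrupted bytes in the embedded stream touches each of the 32 codewords exactly twice, well inside the 16-symbol correction limit of RS(255, 223).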

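For independent errors at the interleaved symbol level, an RS(255, 223) codeword fails to decode when more than 16 of its 255 symbols are corrupted. A binomial tail calculation puts the failure probability below $10^{-6}$ only for raw symbol error rates up to roughly 1.5%; at a 5% symbol error rate the expected error count (about 12.75 per codeword) already sits close to the correction limit, and more than one codeword in ten fails. Noisier channels therefore depend on the repetition coding and stronger parity tiers described in Sections 3.4 and 4.2.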
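The codeword failure probability under an independent-symbol-error model is a binomial tail and can be checked directly. The sketch below assumes errors are independent after interleaving and that decoding fails exactly when more than $t$ symbols are wrong; the function name is hypothetical.

```python
from math import comb

def rs_failure_probability(p_symbol: float, n: int = 255, k: int = 223) -> float:
    """P(more than t symbol errors in one codeword), with t = floor((n - k) / 2)."""
    t = (n - k) // 2
    # Sum the tail directly to avoid cancellation in 1 - P(at most t errors).
    return sum(
        comb(n, i) * p_symbol**i * (1.0 - p_symbol) ** (n - i)
        for i in range(t + 1, n + 1)
    )
```

Evaluating this function over a range of symbol error rates shows how quickly the failure probability grows as the rate approaches $t/n \approx 6.3\%$.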

2. Methodology: The Armor Architecture

Phasm’s Armor mode implements a layered architecture designed to address distinct failure modes: coefficient instability, residual bit errors, and geometric transformations. The production system comprises coefficient stability selection, STDM data embedding with Reed-Solomon ECC, DFT template embedding for geometric resilience, and the Fortress sub-mode for social media survival.

                graph TB
                    subgraph "Armor Mode: Production Architecture"
                        direction TB
                        A["Layer 1: Coefficient Selection<br/>(Stability Analysis)"] --> B["Layer 2: Data Embedding<br/>(STDM + Reed-Solomon ECC)"]
                        B --> C["Layer 3: DFT Template<br/>(Geometric Resilience)"]
                        B --> D["Fortress Sub-Mode<br/>(BA-QIM for Social Media)"]
                    end

                    subgraph "Layer 1 Detail"
                        A1["Quantization-bin margin analysis<br/>identifies stable coefficients"] --> A2["Select coefficients with high<br/>recompression survival probability"]
                        A2 --> A3["~70% of nzAC coefficients<br/>form the robust embedding set"]
                    end

                    subgraph "Layer 2 Detail"
                        B1["Payload + adaptive RS ECC<br/>+ block interleaving (depth 32)"] --> B2["STDM embedding<br/>in stable coefficient groups"]
                        B2 --> B3["Spreading length L=8<br/>Quantization step delta adaptive"]
                    end

                    subgraph "Layer 3 Detail"
                        C1["DFT magnitude template<br/>embedded in mid-frequency ring"] --> C2["Enables recovery from rotation,<br/>scaling, and moderate crop"]
                    end

                    subgraph "Fortress Detail"
                        D1["BA-QIM on DC block averages<br/>Watson perceptual masking (4-tier adaptive)"] --> D2["QIM step=12 (adaptive per-block)<br/>repetition r>=15, survives WhatsApp standard"]
                    end

                    A --> A1
                    B --> B1
                    C --> C1
                    D --> D1

2.1 Synchronization and Geometric Recovery

Early Armor designs included a dedicated synchronization pilot – a PN sequence embedded in low-frequency AC coefficients to enable block-alignment recovery from crops. This approach was deferred in favor of the DFT template embedding system (described in Section 2.4), which provides more general geometric resilience including rotation and scaling recovery, not just block-aligned crop recovery.

For block-aligned crop recovery specifically, the DFT template correlation provides alignment detection as a byproduct of its geometric parameter estimation, making a separate synchronization pilot redundant.

2.2 Layer 1: Coefficient Selection

Phasm’s production coefficient selection identifies which DCT coefficients will survive recompression without changing value. Rather than the full TCM multi-round recompression pipeline described in Section 1.4, the production system uses a quantization-bin margin analysis that achieves comparable selection quality with significantly lower computational cost.

The selection algorithm evaluates each coefficient’s distance from its quantization bin boundary. A coefficient at frequency position $k$ with quantized value $c$ and step $q_k$ occupies the interval $[(c - 0.5)q_k, (c + 0.5)q_k)$. Writing $x$ for the unquantized DCT value that rounds to $c$, the stability margin is the normalized distance from the nearest bin edge:

$$\text{margin}(x, k) = 1 - \frac{|x - c \cdot q_k|}{q_k / 2}, \quad c = \text{round}(x / q_k)$$

Coefficients with high margin (close to the center of their quantization bin) are selected for embedding, while those near bin boundaries – which are vulnerable to flipping under recompression – are excluded.
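A margin-based selector can be sketched as below. The sketch assumes access to the unquantized DCT value `x` of each coefficient (for example, recomputed from decoded pixels) and defines the margin as 1 at the bin center and 0 at a bin edge; the threshold of 0.5 and the function names are hypothetical, not values from the production system.

```python
import math

def stability_margin(x: float, q: float) -> float:
    """Normalized distance of x from the nearest edge of its quantization bin."""
    c = math.floor(x / q + 0.5)          # quantized value: round(x / q)
    return 1.0 - abs(x - c * q) / (q / 2)

def select_stable(values: list, steps: list, threshold: float = 0.5) -> list:
    """Indices of coefficients whose margin meets the (assumed) threshold."""
    return [
        i for i, (x, q) in enumerate(zip(values, steps))
        if stability_margin(x, q) >= threshold
    ]
```

For a step of 10, a value of 20.0 sits at the bin center (margin 1.0), 25.0 sits on a bin edge (margin 0.0), and 22.5 lies halfway between (margin 0.5).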

The fraction of non-zero AC (nzAC) coefficients that qualify varies by image content and the original quality factor. For a typical 2-megapixel smartphone photo at QF 75:

  • Approximately 70% of nzAC coefficients pass the stability threshold
  • This yields approximately 70,000 usable coefficients from ~100,000 total nzAC

Coefficient stability is not uniform across frequencies. Low-frequency coefficients (DC and the first few AC positions in zigzag order) are the most stable because their quantization steps are small relative to the coefficient magnitudes. High-frequency coefficients, which are quantized more aggressively, are the least stable. The stability metric for a coefficient $c$ at frequency position $k$ with quantization step $q_k$ can be approximated as:

$$S(c, k) \approx \frac{|c| \cdot q_k^{(\text{orig})}}{q_k^{(\text{target})}}$$

Coefficients with $S > 2$ are highly likely to survive recompression; those with $S < 1$ are almost certain to change.

2.3 Layer 2: Data Embedding with STDM

The message payload is processed through the following pipeline before STDM embedding:

  1. Encryption: AES-256-GCM-SIV (always-on; key derived via Argon2id from user passphrase)
  2. Framing: 1-byte mode flag + 2-byte length prefix (u16, supporting payloads up to 64 KB) + encrypted message + 4-byte CRC-32
  3. Reed-Solomon encoding: RS(255, 223) applied to the framed, encrypted payload
  4. Block interleaving: Depth-32 interleaving to distribute burst errors
  5. STDM embedding: The interleaved, RS-encoded bitstream is embedded into the stable coefficient set $\mathcal{S}$ using STDM with spreading length $L = 8$
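The framing step can be sketched with the Python standard library. The field order follows the text, but the byte order (big-endian), the CRC coverage (header plus ciphertext), and the CRC-32 variant (the one in `zlib`) are assumptions, and the ciphertext is treated as opaque bytes produced by the encryption step.

```python
import struct
import zlib

def frame_payload(mode: int, ciphertext: bytes) -> bytes:
    """mode flag (1 byte) + u16 length + ciphertext + CRC-32 (4 bytes)."""
    if len(ciphertext) > 0xFFFF:
        raise ValueError("ciphertext exceeds the u16 length field")
    body = struct.pack(">BH", mode, len(ciphertext)) + ciphertext
    return body + struct.pack(">I", zlib.crc32(body))

def parse_frame(frame: bytes) -> tuple:
    """Validate the CRC and return (mode, ciphertext); raise ValueError otherwise."""
    if len(frame) < 7:
        raise ValueError("frame too short")
    mode, length = struct.unpack(">BH", frame[:3])
    end = 3 + length
    if len(frame) < end + 4:
        raise ValueError("truncated frame")
    (crc,) = struct.unpack(">I", frame[end:end + 4])
    if zlib.crc32(frame[:end]) != crc:
        raise ValueError("CRC mismatch")
    return mode, frame[3:end]
```

Checking the CRC before decryption lets the decoder reject a damaged frame cheaply, which matches the extraction order of CRC verification followed by decryption.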

The STDM quantization step $\Delta$ for each coefficient group is chosen adaptively based on the quantization table entries for the involved frequency positions:

$$\Delta_g = \alpha \cdot \max_{i \in g} q_{k_i}$$

where $g$ is the group of $L$ coefficients, $q_{k_i}$ is the quantization step for coefficient $i$’s frequency position, and $\alpha$ is a strength parameter (typically $\alpha = 2.5$ for strong robustness, $\alpha = 2.0$ for a quality-robustness balance). The factor of approximately 2 keeps the STDM quantization grid coarser than the JPEG quantization grid, matching the $\Delta \geq 2q$ condition from Section 1.2 that recompression survival requires.
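As a worked instance of the step-size rule, the sketch below computes $\Delta_g$ for one group. The quantization steps are illustrative values keyed by zigzag index, not entries from a specific JPEG table, and the function name is hypothetical.

```python
def stdm_step(group_positions: list, quant_table: dict, alpha: float = 2.0) -> float:
    """Delta_g = alpha * (largest quantization step among the group's positions)."""
    return alpha * max(quant_table[k] for k in group_positions)

# Hypothetical quantization steps for eight low-frequency AC positions.
quant_table = {1: 6, 2: 5, 3: 7, 4: 6, 5: 8, 6: 7, 7: 9, 8: 10}
group = [1, 2, 3, 4, 5, 6, 7, 8]

balanced = stdm_step(group, quant_table, alpha=2.0)  # 2.0 * 10 = 20.0
strong = stdm_step(group, quant_table, alpha=2.5)    # 2.5 * 10 = 25.0
```

Taking the maximum rather than the mean step means the coarsest JPEG quantizer in the group sets the grid, a conservative choice that costs some extra distortion on the finer-quantized positions.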

Extraction reverses the pipeline: STDM extraction (blind, using only the spreading vectors and $\Delta$), de-interleaving, RS decoding, CRC verification, decryption.

2.4 Layer 3: DFT Template Embedding (Geometric Resilience)

Armor mode includes a DFT (Discrete Fourier Transform) template embedding system that provides resilience against geometric transformations – rotation, scaling, and moderate cropping. This addresses a class of image manipulations that purely DCT-domain methods cannot survive. For a deep technical analysis of the DFT template algorithm, including peak selection, log-polar correlation, and ring payload extraction, see DFT Template Embedding for Geometric Resilience.

The DFT template works by embedding a known pattern into the magnitude spectrum of the DFT of the image. Because the DFT magnitude is invariant to translation and transforms predictably under rotation and scaling, the decoder can estimate the geometric parameters of any transformation applied to the image and invert them before attempting DCT-domain payload extraction.

The embedding and extraction process:

  1. Embed: A pseudo-random template pattern is additively embedded into a mid-frequency annular ring of the DFT magnitude spectrum, chosen to avoid both the low-frequency energy concentration and high-frequency noise floor
  2. Transform: The image may undergo rotation, scaling, or cropping during transmission
  3. Detect: The decoder computes the DFT magnitude of the received image and performs log-polar correlation to estimate rotation angle and scale factor
  4. Correct: The estimated geometric parameters are inverted, realigning the image to its original geometry
  5. Extract: Standard DCT-domain STDM extraction proceeds on the corrected image

The DFT template adds minimal visual distortion because the embedding energy is spread across the entire image in the frequency domain. Memory optimizations in the Phase 3 geometric decode pipeline achieved an 81-90% reduction in peak memory usage, making DFT-based geometric recovery practical on mobile devices and in WebAssembly environments. The template’s resilience covers:

  • Rotation up to +/-30 degrees
  • Scaling from 0.7x to 1.5x
  • Moderate cropping (up to ~25% area removal)
  • Combinations of the above with recompression

This layer complements the DCT-domain STDM embedding rather than replacing it: the DFT template handles geometry, while STDM handles the actual data payload. Together, they extend Armor mode’s robustness beyond pure recompression scenarios.

2.5 Fortress Sub-Mode: Social Media Survival for Short Messages

The Fortress sub-mode represents a practical breakthrough for social media steganography. It auto-activates for short messages (typically up to roughly 20-30 bytes) when there is sufficient image capacity, providing end-to-end survival through platforms previously considered infeasible – including WhatsApp standard recompression.

Fortress uses a fundamentally different embedding approach from standard Armor:

  • BA-QIM (Block Average QIM): Instead of embedding in individual DCT coefficients, Fortress embeds in the average of DC coefficients across blocks. Block averages are far more stable under aggressive recompression because averaging suppresses the per-coefficient quantization noise
  • Watson perceptual masking with 4-tier adaptive QIM step sizing: Embedding strength is adapted per-block using Watson’s perceptual model with four adaptive tiers that scale the QIM step based on local texture complexity, concentrating embedding energy where the human visual system is least sensitive (textured regions, mid-luminance areas). The 4-tier system couples masking range to the repetition factor, so short messages get maximum robustness while longer messages get maximum visual quality
  • QIM step size $\Delta = 12$: A large base quantization step ensures the embedding structure survives even aggressive re-quantization at low quality factors. The Watson masking tiers adapt this step per-block from $\Delta \times 0.3$ (smooth blocks) to $\Delta \times 1.5$ (heavy texture)
  • Minimum repetition $r \geq 15$: Each message bit is repeated at least 15 times across different block groups, providing massive redundancy that compensates for the high noise introduced by social media pipelines
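The core of BA-QIM with repetition can be sketched as follows. This is a simplified illustration, not the Fortress implementation: it shifts every DC value in a group by the same amount, omits Watson masking entirely, uses a fixed step of 12, and decodes with a hard majority vote; all function names are hypothetical.

```python
import math

def dm_quantize(value: float, bit: int, delta: float) -> float:
    """Dither-modulation quantizer: Q_delta(value - d_m) + d_m."""
    d = delta / 4 if bit else -delta / 4
    return delta * math.floor((value - d) / delta + 0.5) + d

def baqim_embed(dc_values: list, bit: int, delta: float = 12.0) -> list:
    """Shift all DC values uniformly so that their average lands on the bit's lattice."""
    mean = sum(dc_values) / len(dc_values)
    shift = dm_quantize(mean, bit, delta) - mean
    return [v + shift for v in dc_values]

def baqim_extract(dc_values: list, delta: float = 12.0) -> int:
    """Decode one bit from the average of a group of DC values."""
    mean = sum(dc_values) / len(dc_values)
    err0 = abs(mean - dm_quantize(mean, 0, delta))
    err1 = abs(mean - dm_quantize(mean, 1, delta))
    return 0 if err0 <= err1 else 1

def embed_repeated(groups: list, bit: int, delta: float = 12.0) -> list:
    """Repeat the same bit across r block groups (r = len(groups))."""
    return [baqim_embed(g, bit, delta) for g in groups]

def extract_majority(groups: list, delta: float = 12.0) -> int:
    """Hard majority vote over the repeated copies."""
    ones = sum(baqim_extract(g, delta) for g in groups)
    return 1 if 2 * ones > len(groups) else 0
```

Averaging is what provides the stability: independent per-block noise shrinks in the group mean, so the mean stays within the $\Delta/4 = 3$ decision margin even when individual DC values move by a comparable amount.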

The tradeoff is capacity: Fortress mode supports only very short messages (roughly 20-30 bytes, depending on image size). But for many practical use cases – a passphrase, a short URL, a phone number, a brief instruction – this is sufficient.

Fortress survival characteristics:

| Channel | QF / Processing | Fortress Survival |
|---|---|---|
| WhatsApp standard | QF ~60-70, resize to ~1600px | Yes (end-to-end) |
| Facebook | QF ~71-85, max 2048px | Yes |
| Signal HD | QF ~95, max 4096px | Yes |
| Twitter/X | QF ~85, max 4096px | Yes |
| Instagram | QF ~72, resize to 1080px | Marginal (depends on image) |
| WhatsApp HD | QF ~70-80, max ~4096px | Yes |

The key insight behind Fortress is that block-level DC averages are a fundamentally more stable embedding domain than individual AC coefficients. While individual AC coefficients are sensitive to re-quantization step changes, the average across a group of DC coefficients changes only when many blocks shift simultaneously – which requires much more aggressive processing.

3. Results

3.1 Recompression Survival Rates

The following table summarizes expected survival rates – defined as the probability of successful message extraction (RS decoding + CRC verification pass) – for the Armor architecture across recompression scenarios. These projections are derived from published BER data for STDM (Comesana and Perez-Gonzalez, 2006; Pereira and Pun, 2000), coefficient stability measurements (Butora and Fridrich, 2023), and Reed-Solomon coding gain calculations.

| Recompression Scenario | Raw BER (before ECC) | Post-RS BER | Message Survival Rate |
|---|---|---|---|
| Same QF (e.g., 75 to 75) | < 0.5% | ~0% | > 99% |
| Mild QF drop (85 to 75) | 1-5% | < 0.1% | > 95% |
| Moderate QF drop (85 to 65) | 5-12% | 0.5-2% | > 90% (with RS) |
| Aggressive QF drop (85 to 50) | 10-20% | 2-8% | 70-85% |
| Very aggressive (85 to 30) | 25-40% | 15-30% | < 50% |
| Cross-library (libjpeg vs mozjpeg, same QF) | 0.5-3% | ~0% | > 95% |
| Format round-trip (JPEG to PNG to JPEG, same QF) | < 0.5% | ~0% | > 99% |

Same-QF recompression yields near-zero raw BER because the stability-selected coefficients are, by definition, those that are unlikely to change under recompression. Residual errors come from edge cases where the actual transmission encoder’s IDCT rounding differs slightly from the pre-analysis assumptions.

3.2 BER as a Function of Quality Factor Drop

The raw bit error rate increases approximately linearly with the magnitude of the quality factor drop up to a drop of about 35, beyond which it rises more steeply toward the 50% level of random guessing. For STDM with stability-selected coefficients and spreading length $L = 8$:

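| Original QF | Recompression QF | QF Drop | Raw BER (STDM + Stability Sel.) | Raw BER (Scalar QIM) |
|---|---|---|---|---|
| 85 | 85 | 0 | < 0.3% | < 1.0% |
| 85 | 80 | 5 | 0.5-2% | 1-4% |
| 85 | 75 | 10 | 1-5% | 2-8% |
| 85 | 70 | 15 | 3-8% | 5-12% |
| 85 | 65 | 20 | 5-12% | 8-18% |
| 85 | 60 | 25 | 8-15% | 12-25% |
| 85 | 50 | 35 | 12-22% | 18-35% |
| 85 | 40 | 45 | 20-35% | 28-45% |
| 85 | 30 | 55 | 30-45% | 35-50% |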

STDM outperforms scalar QIM in every row, with a relative BER reduction of roughly 30-50% for QF drops of 5 to 35; the gap narrows at larger drops as both methods approach the 50% BER of random guessing. The spreading mechanism averages noise across $L = 8$ coefficients, reducing the effective noise variance by a factor of $L = 8$, which corresponds to a $\sqrt{8} \approx 2.8$ improvement in the projection’s amplitude signal-to-noise ratio.

3.3 Cross-Library Recompression

Different JPEG libraries (libjpeg, libjpeg-turbo, mozjpeg, Apple’s ImageIO, Android’s system encoder) implement slightly different IDCT algorithms. The IEEE 1180 standard permits small rounding differences, introducing additional coefficient perturbations when the encoding library differs from the decoding library.

| Library Pair | Added BER (over same-library) | Total BER (same QF) | Survival with ECC |
|---|---|---|---|
| libjpeg to libjpeg-turbo | +0.3-0.8% | 0.5-1.5% | > 99% |
| libjpeg to mozjpeg | +0.5-1.5% | 0.8-2.5% | > 97% |
| libjpeg-turbo to Apple ImageIO | +0.5-1.0% | 0.7-2.0% | > 98% |
| mozjpeg to Android encoder | +1.0-2.5% | 1.5-3.5% | > 95% |
| Worst case (arbitrary pair) | +1-3% | 1.5-4.0% | > 93% |

Cross-library recompression at the same QF adds 1-3% BER in the worst case – well within the RS(255, 223) correction capacity.

3.4 Capacity vs. Robustness Tradeoff

Armor mode offers four sub-modes that trade capacity for robustness. The core tradeoff mechanism is ECC redundancy level, repetition coding, and embedding domain:

| Sub-Mode | Redundancy | ECC Rate | Capacity* | Survives QF Drop | Social Media |
|---|---|---|---|---|---|
| Fortress | r>=15 repetition (BA-QIM) | Adaptive | ~20-30 bytes | Down to QF ~50 | Yes (WhatsApp std) |
| Armor-Max | 4x repetition | 1/2 | ~54 bytes | Down to QF ~55 | Pre-size required |
| Armor-Standard | 2x repetition | 2/3 | ~440 bytes | Down to QF ~60 | Pre-size required |
| Armor-Lite | No repetition | 3/4 | ~2,600 bytes | Down to QF ~65 | No |

*Capacity for a typical 2-megapixel smartphone photo at QF 75.

The capacity budget derivation for Armor-Standard on a 2 MP image:

$$\text{Total nzAC coefficients} \approx 100{,}000$$

$$\text{Stability-selected robust set (70\%)} \approx 70{,}000$$

$$\text{STDM embedding at } L = 8: \quad \frac{70{,}000}{8} \approx 8{,}750 \text{ bits}$$

$$\text{After RS(255, 223) coding at rate } \approx 7/8: \quad 8{,}750 \times \frac{223}{255} \approx 7{,}652 \text{ data bits}$$

$$\text{With 2x repetition (divide by 2)}: \quad \frac{7{,}652}{2} \approx 3{,}826 \text{ data bits} \approx 478 \text{ bytes}$$

After subtracting framing overhead (37 bytes for mode flag, u16 length prefix, GCM-SIV tag, nonce, and CRC), the usable plaintext capacity for Armor-Standard is approximately 440 bytes for a typical 2 MP image (actual capacity varies with image content and the adaptive RS parity tier selected). The u16 length prefix supports payloads up to 64 KB, a significant increase from the original u8 design that was limited to ~255 bytes. Capacity scales linearly with image resolution: a 12 MP smartphone photo (~600,000 nzAC) offers substantially more capacity across all sub-modes.
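The capacity budget above reduces to a few integer operations. The sketch below reproduces the derivation with the figures from the text (70% stable fraction, $L = 8$, RS(255, 223), 2x repetition, 37 bytes of overhead); flooring at each step is an assumption, and the function name is hypothetical.

```python
def armor_capacity_bytes(
    nzac: int,
    stable_percent: int = 70,
    spread_len: int = 8,
    rs_n: int = 255,
    rs_k: int = 223,
    repetition: int = 2,
    overhead_bytes: int = 37,
) -> int:
    """Usable plaintext bytes for a given count of non-zero AC coefficients."""
    stable = nzac * stable_percent // 100        # stability-selected coefficients
    raw_bits = stable // spread_len              # one bit per STDM group of L
    data_bits = raw_bits * rs_k // rs_n          # remove RS parity
    data_bits //= repetition                     # remove repetition copies
    return data_bits // 8 - overhead_bytes       # whole bytes minus framing

two_mp = armor_capacity_bytes(100_000)   # 441, matching the ~440 bytes quoted
```

Passing `nzac=600_000` gives the corresponding estimate for the 12 MP case under the same assumptions.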

3.5 Comparison with Other Robust Methods

To contextualize these results, the following table compares the Armor architecture against other published robust steganography and watermarking methods:

| Method | Same QF Survival | QF 85 to 65 | Cross-Library | Capacity (2 MP) | Requires GPU | Geometric Robust | Social Media* |
|---|---|---|---|---|---|---|---|
| Phasm Armor (STDM + RS) | > 99% | > 90% | > 95% | 54-2,600 bytes | No | Yes (DFT template) | Via Fortress |
| Phasm Fortress (BA-QIM) | > 99% | > 95% | > 99% | ~20-30 bytes | No | Yes (DFT template) | Yes |
| Scalar QIM (DCT) | ~95% | ~75% | ~90% | ~5,000 bits | No | No | No |
| TCM + STC (Butora 2023) | ~100% (matched QF) | N/A (matched only) | ~98% | 0.1-0.5 bpnzAC | No | No | No |
| Errorless SPC (2024) | 100% (matched QF) | N/A | ~99% | 0.1-0.3 bpnzAC | No | No | No |
| MRAS (2025) | ~99% | ~92% | Unknown | ~3,000 bits | No | No | Partial |
| Spread Spectrum (Cox 1997) | ~99% detect. | ~90% detect. | ~97% detect. | 64-256 bits | No | Partial | No |
| HiDDeN (Zhu 2018) | ~98% acc | ~90% acc | N/A | 30-48 bits | Yes | Partial | Partial |
| StegaStamp (Tancik 2019) | ~99% acc | ~96% acc | N/A | 56 bits (100 raw) | Yes | Yes | Yes |
| TrustMark (Bui 2024) | ~99% acc | ~96% acc | N/A | 70 bits (100 raw) | Yes | Partial | Partial |

*Social Media = survives platforms that resize images (e.g., WhatsApp standard, Instagram).

The STDM + RS architecture provides 1-3 orders of magnitude more capacity than deep learning watermarking methods (kilobytes versus dozens of bits), requires no GPU, and runs efficiently on mobile CPUs in pure Rust compiled to WebAssembly. The Fortress sub-mode bridges the gap for social media platforms by trading capacity for extreme robustness – at ~20-30 bytes it is comparable in capacity to deep learning methods but achieves this without any GPU or neural network, running entirely in pure Rust/WASM.

4. Discussion

4.1 Why Coefficient Stability Is the Key Insight

The most important engineering decision in the Armor architecture is where to embed, not how. For a detailed analysis of which DCT-domain properties survive recompression – including block averages, brightness ordering, and coefficient signs – see JPEG Recompression Invariants. STDM embedding in arbitrarily chosen coefficients yields raw BER of 5-12% under moderate recompression. The same STDM embedding restricted to stability-selected coefficients yields raw BER below 1% under the same conditions. Coefficient selection accounts for more of the survival improvement than the choice of embedding algorithm.

This insight aligns with the broader robust steganography literature. Butora and Fridrich (2023) demonstrated that “errorless” steganography is achievable not through better embedding but through better coefficient selection – identifying the exact subset of coefficients that form fixed points under iterated JPEG compression.

The mathematical intuition is straightforward. A DCT coefficient $c$ quantized with step $q$ occupies the interval $[(c - 0.5)q, (c + 0.5)q)$. During recompression, the coefficient is reconstructed as $c \cdot q$ (dequantization), subjected to pixel-domain rounding and re-transformation, then re-quantized. If the reconstructed value falls within the same quantization bin, the coefficient is stable. The probability of stability depends on where within the bin the true (unquantized) value lies:

$$P(\text{stable}) \approx 1 - \frac{2\epsilon}{q}$$

where $\epsilon$ is the magnitude of the pixel-domain rounding error projected back into the DCT domain. When $\epsilon$ is small relative to $q$, the stability probability approaches 1. For coefficients whose unquantized values lie near bin boundaries, it can be much lower.

Phasm’s coefficient selection uses quantization-bin margin analysis to identify coefficients that satisfy $P(\text{stable}) \approx 1$, focusing embedding on coefficients well within their quantization bins. This achieves comparable selection quality to full iterative TCM analysis at a fraction of the computational cost, which is important for real-time encoding on mobile devices and in WebAssembly.

4.2 Phase 2: Adaptive Robustness with Repetition Coding

The Phase 2 implementation of Armor mode adds adaptive robustness: when the message is small relative to image capacity, the extra space is used for inner repetition coding with soft majority voting. Rather than simple hard-decision majority voting, the decoder uses confidence-weighted combination:

For each repeated copy $j$ of an extracted bit $\hat{b}_j$, the STDM extraction also produces a soft confidence metric – the normalized distance of the projection from the nearest decision boundary:

$$\text{conf}_j = 1 - \frac{|p'_j - Q_\Delta(p'_j - d_{\hat{b}_j}) - d_{\hat{b}_j}|}{\Delta / 4}$$

The final bit decision is:

$$\hat{b} = \begin{cases} 1 & \text{if } \sum_{j} (-1)^{1 - \hat{b}_j} \cdot \text{conf}_j > 0 \\ 0 & \text{otherwise} \end{cases}$$

This soft voting over repetition copies provides approximately 1-2 dB of coding gain over hard majority voting, particularly beneficial when recompression noise is unevenly distributed across the coefficient set. The adaptive RS layer selects from multiple parity tiers based on the available redundancy budget, automatically maximizing robustness for the given message size.
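Confidence-weighted voting over repeated copies can be sketched as below. The sketch takes the confidence to be the normalized distance of each received projection from its nearest decision boundary (1 on a lattice point, 0 on the boundary), as described in the text; the function names are hypothetical.

```python
import math

def dm_quantize(p: float, bit: int, delta: float) -> float:
    """Dither-modulation quantizer: Q_delta(p - d_m) + d_m."""
    d = delta / 4 if bit else -delta / 4
    return delta * math.floor((p - d) / delta + 0.5) + d

def soft_extract(p: float, delta: float) -> tuple:
    """Return (hard bit, confidence in [0, 1]) for one received projection."""
    errs = [abs(p - dm_quantize(p, b, delta)) for b in (0, 1)]
    bit = 0 if errs[0] <= errs[1] else 1
    conf = 1.0 - errs[bit] / (delta / 4)   # distance to boundary, normalized
    return bit, conf

def soft_vote(projections: list, delta: float) -> int:
    """Sum signed confidences: +conf for a 1 decision, -conf for a 0 decision."""
    score = 0.0
    for p in projections:
        bit, conf = soft_extract(p, delta)
        score += conf if bit else -conf
    return 1 if score > 0 else 0
```

With $\Delta = 8$ and projections `[2.0, 4.1, 4.1]`, a hard majority vote returns 0 because two copies fall just on the bit-0 side of the boundary, whereas the soft vote returns 1 because the single copy at 2.0 sits exactly on a bit-1 lattice point and outweighs the two low-confidence copies.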

4.3 The Pixel-Domain Round-Trip Problem

A critical nuance that is underappreciated in the robust steganography literature is the distinction between coefficient-domain requantization and pixel-domain re-encoding.

Coefficient-domain requantization applies new quantization tables directly to the existing DCT coefficients – no pixel reconstruction is involved. This is the gentlest form of recompression and yields the lowest BER. It is also rare in practice: only a few platforms (notably Twitter/X under specific conditions and possibly Telegram for images meeting resolution constraints) perform coefficient-domain passthrough.

The far more common pipeline – and the one used by virtually all social media platforms and image editing applications – is pixel-domain re-encoding:

                graph LR
                    A["JPEG Bitstream<br/>(stego image)"] --> B["Entropy Decode"]
                    B --> C["Dequantize<br/>(multiply by Q matrix)"]
                    C --> D["Inverse DCT"]
                    D --> E["Pixel Rounding<br/>(float to uint8)"]
                    E --> F["Forward DCT"]
                    F --> G["Re-quantize<br/>(divide by new Q matrix)"]
                    G --> H["Entropy Encode"]
                    H --> I["New JPEG Bitstream<br/>(recompressed)"]
                    style E fill:#d44,stroke:#333,color:#fff

The pixel rounding step (highlighted) is the critical difference. Even at the same quality factor, the float-to-integer conversion at the pixel level introduces rounding errors that, when projected back into the DCT domain, can perturb coefficients enough to flip embedded bits. Published measurements show an additional 1-5% BER from pixel-domain round-trip versus coefficient-domain requantization at the same QF.
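The pixel-domain round-trip for a single 8x8 block can be simulated in plain Python. This is a sketch under simplifying assumptions: it uses a floating-point orthonormal DCT (real encoders use integer approximations), handles one luminance block with a 128 level shift, and ignores chroma subsampling and color conversion; the function names are hypothetical.

```python
import math

N = 8
# Orthonormal DCT-II basis; for N = 8 this matches the JPEG DCT scaling.
BASIS = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
          for x in range(N)] for u in range(N)]

def dct2(block):
    """Forward 2-D DCT: transform along rows, then along columns."""
    rows = [[sum(BASIS[v][y] * block[x][y] for y in range(N)) for v in range(N)]
            for x in range(N)]
    return [[sum(BASIS[u][x] * rows[x][v] for x in range(N)) for v in range(N)]
            for u in range(N)]

def idct2(coeff):
    """Inverse 2-D DCT (transpose of the orthonormal forward transform)."""
    rows = [[sum(BASIS[v][y] * coeff[u][v] for v in range(N)) for y in range(N)]
            for u in range(N)]
    return [[sum(BASIS[u][x] * rows[u][y] for u in range(N)) for y in range(N)]
            for x in range(N)]

def pixel_round_trip(quantized, q_old, q_new):
    """Dequantize, IDCT, round and clamp to uint8, DCT, re-quantize."""
    dequant = [[quantized[u][v] * q_old[u][v] for v in range(N)] for u in range(N)]
    pixels = [[min(255, max(0, round(p + 128))) for p in row]
              for row in idct2(dequant)]                      # the lossy step
    coeff = dct2([[p - 128 for p in row] for row in pixels])
    return [[math.floor(coeff[u][v] / q_new[u][v] + 0.5) for v in range(N)]
            for u in range(N)]

def count_changed(a, b):
    """Number of coefficient positions whose quantized value differs."""
    return sum(a[u][v] != b[u][v] for u in range(N) for v in range(N))
```

Calling `pixel_round_trip` with identical old and new tables and passing the result to `count_changed` gives a per-block estimate of same-QF instability; a block whose decoded pixels are already integers, such as a DC-only block with a DC value divisible by 8 after dequantization, passes through unchanged.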

Phasm’s coefficient stability selection partially accounts for this by modeling the quantization-bin margin – coefficients near bin boundaries are excluded from the embedding set. However, different JPEG libraries use different IDCT implementations (integer approximations with varying precision), so the actual pixel-domain errors may differ from the modeled expectations. This is the primary source of residual errors in the “same QF” scenario and the reason we report “> 99%” rather than “100%.”

4.4 Geometric Transformations: From Fundamental Limitation to Partial Solution

Image resizing is catastrophic for standard DCT-domain methods that embed in 8x8 blocks. When an image is downscaled, the block grid of the output bears no spatial relationship to the original – extraction reads entirely wrong coefficients, yielding BER approaching 50% (random).

Phasm addresses this challenge at two levels:

DFT template embedding (Section 2.4) provides geometric resilience for standard Armor mode. The DFT magnitude spectrum is invariant to translation and transforms predictably under rotation and scaling, enabling the decoder to estimate and invert geometric transformations before attempting DCT-domain extraction. This handles rotation, scaling, and moderate cropping – but not arbitrary downscaling that destroys too much frequency content.

Fortress sub-mode (Section 2.5) addresses the social media resizing problem directly. By embedding in block-level DC averages rather than individual AC coefficients, Fortress operates at a spatial scale that is more resilient to resolution changes. The heavy repetition coding (r>=15) provides enough redundancy to survive the combined effect of resizing and aggressive recompression.
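To see why heavy repetition helps, consider only the repetition layer in isolation. The sketch below is not the Fortress codec: it omits the block-average embedding entirely, assumes an odd repetition factor, and the residual-error formula assumes that bit flips are independent with a fixed raw error rate, which real resize-plus-recompression channels need not satisfy. Function names are illustrative.

```python
from math import comb

def repeat_encode(bits, r):
    """Repeat every bit r times."""
    return [b for b in bits for _ in range(r)]

def majority_decode(coded, r):
    """Recover each bit by majority vote over its r copies (r odd)."""
    return [1 if 2 * sum(coded[i:i + r]) > r else 0
            for i in range(0, len(coded), r)]

def residual_bit_error(p, r):
    """Probability that a majority vote over r copies is wrong, assuming
    independent flips with probability p and odd r."""
    return sum(comb(r, k) * p**k * (1 - p)**(r - k)
               for k in range(r // 2 + 1, r + 1))

# With r = 15, up to 7 of the 15 copies of a bit can be flipped
# and the vote still returns the original bit.
message = [1, 0, 1, 1]
coded = repeat_encode(message, 15)
for symbol in range(len(message)):
    for copy in range(7):
        coded[symbol * 15 + copy] ^= 1
recovered = majority_decode(coded, 15)
```

Under the independence assumption, `residual_bit_error` shows how quickly the decoded error rate falls below the raw error rate as `r` grows. The price is capacity, which is why Fortress is restricted to messages of roughly 20-30 bytes.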

The table below summarizes the implications for real-world deployment, covering both standard Armor and Fortress capabilities:

| Channel | Resizes? | Standard Armor | Fortress (~20-30 bytes) |
|---|---|---|---|
| Email attachment | No | Yes | Yes |
| Cloud storage link (phasm.link, Dropbox, etc.) | No | Yes | Yes |
| AirDrop / direct file transfer | No | Yes | Yes |
| “Send as file” (WhatsApp, Telegram, Signal) | No | Yes | Yes |
| Twitter/X (image < 4096px) | No | Yes | Yes |
| Facebook (image < 2048px) | No | Yes (moderate) | Yes |
| Telegram (image < 2560px) | Possibly no | Yes (if passthrough) | Yes |
| WhatsApp standard (image > 1600px) | Yes | No | Yes |
| WhatsApp HD | Possibly | Marginal | Yes |
| Instagram (image > 1080px wide) | Yes | No | Marginal |
| WeChat | Yes + heavy recompression | No | No |
| Screenshot | Yes (resolution change) | No | No |

For a comprehensive analysis of how 15 major platforms process images and which are compatible with steganography, see our companion post: How 15 Platforms Process Your Photos.

For platforms where neither Armor nor Fortress is viable, the recommended workflow remains “send as file/document” mode or sharing via phasm.link.

Deep learning methods (StegaStamp, TrustMark) can survive resizing but are limited to 30-100 bits per image. Fortress achieves comparable or higher capacity (~20-30 bytes, i.e. 160-240 bits) for social media channels without requiring a GPU or neural network inference.

4.5 Social Media Platform Processing

We compiled an analysis of image processing pipelines across 15 major platforms (detailed in our companion post: How 15 Platforms Process Your Photos). With the addition of the Fortress sub-mode, the platform tiers have expanded:

Standard Armor feasible (no resize if image is pre-sized): Twitter/X (QF 85, max 4096px), Telegram (QF ~82, max 2560px, possible passthrough), Facebook (QF ~71-85, max 2048px, but applies sharpening), Signal HD (QF ~95, max 4096px), Discord (QF 80, CDN may preserve original).

Fortress feasible (survives resize + recompression for short messages): WhatsApp standard (QF ~60-70, resize to ~1600px), WhatsApp HD (QF ~70-80), Facebook (all sizes), Twitter/X (all sizes), Signal HD.

Marginal (even with Fortress): Instagram (always resizes to 1080px + aggressive processing – depends on image content).

Not feasible: WeChat (QF ~53 + heavy resize + additional processing), Snapchat (QF ~50-70 + resize + overlays).

The practical takeaway: Standard Armor mode targets controlled JPEG-to-JPEG channels without resizing. Fortress extends viability to WhatsApp standard and other resizing platforms for short messages. For platforms where neither mode is viable, “send as file” or sharing via phasm.link remains the reliable path.

5. Practical Implications

The results above translate into concrete guidance for practitioners. STDM was chosen over scalar QIM for its noise-averaging advantage (~30-50% BER reduction), over spread spectrum for its capacity (kilobytes versus dozens of bits), and over DWT-domain methods, which introduce the very pixel-rounding errors we are trying to survive. Armor mode is appropriate for JPEG-to-JPEG channels without resizing (file sharing, email, cloud links, “send as file” on messaging apps). For platforms that resize images, the Fortress sub-mode extends feasibility to WhatsApp standard and similar platforms for short messages. When stealth matters more than robustness, use Ghost mode instead: Ghost uses J-UNIWARD adaptive cost functions (fully implemented as of version 1.3.0) with syndrome-trellis codes to minimize statistical detectability.
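The noise-averaging property of STDM comes from quantizing a projection of several coefficients rather than each coefficient individually. The following is a textbook-style sketch of one-bit STDM in the spirit of Chen and Wornell's dither modulation, not Phasm's implementation: the spreading vector, the step `delta`, and the dither choice (0 for bit 0, `delta / 2` for bit 1) are assumptions made for illustration.

```python
import math

def _unit(vector):
    norm = math.sqrt(sum(c * c for c in vector))
    return [c / norm for c in vector]

def _quantize(projection, delta, bit):
    """Nearest point of the lattice for `bit`: delta * k + bit * delta / 2."""
    dither = bit * delta / 2
    return delta * round((projection - dither) / delta) + dither

def stdm_embed(host, spread, delta, bit):
    """Move the host along the spreading direction so that its projection
    lands on the lattice associated with `bit`."""
    u = _unit(spread)
    projection = sum(h * c for h, c in zip(host, u))
    shift = _quantize(projection, delta, bit) - projection
    return [h + shift * c for h, c in zip(host, u)]

def stdm_extract(received, spread, delta):
    """Decide which bit's lattice is closer to the received projection."""
    u = _unit(spread)
    projection = sum(r * c for r, c in zip(received, u))
    return min((0, 1), key=lambda b: abs(projection - _quantize(projection, delta, b)))

# Eight host coefficients carry one bit; values are arbitrary.
host = [12.3, -4.1, 7.7, 0.4, -9.9, 3.2, 5.5, -1.8]
spread = [1, -1, 1, 1, -1, 1, -1, -1]
marked = stdm_embed(host, spread, delta=8.0, bit=1)
noisy = [m + n for m, n in zip(marked, [0.5, 0.5, -0.5, 0.5, -0.5, -0.5, 0.5, -0.5])]
decoded = stdm_extract(noisy, spread, delta=8.0)
```

Decoding fails only when the noise projected onto the spreading direction exceeds `delta / 4`. Independent per-coefficient errors partially cancel in that projection, which is the intuition behind the BER advantage over scalar QIM cited above.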

Important caveat: Armor mode is designed to survive JPEG recompression and geometric transformations (via the DFT template). It is not designed to survive arbitrary image editing operations such as brightness/contrast adjustments, color correction, filters, or content-aware modifications. These operations alter pixel values in ways that are not modeled by the recompression survival framework.

Based on the quantitative analysis:

  1. For social media sharing of short messages, use Fortress. The Fortress sub-mode auto-activates for short messages and survives WhatsApp standard recompression end-to-end. If your message fits in ~20-30 bytes (a passphrase, short URL, or brief instruction), Fortress is the most reliable path through platforms that resize.

  2. Pre-size images to platform resolution limits for standard Armor. If the intended transmission channel has a known resolution cap (Twitter: 4096px, Facebook: 2048px, Telegram: 2560px), resize the cover image to that limit before Armor encoding. This prevents the platform from resizing, eliminating the most destructive processing step.

  3. Use Armor-Standard as the default for controlled channels. The 2x repetition + adaptive RS configuration offers the best general-purpose robustness with practical capacity for short text messages.

  4. Choose Armor-Max for maximum recompression resistance. The 4x repetition of Armor-Max limits capacity to approximately 54 bytes (roughly a single sentence) but provides the strongest error correction against aggressive quality factor drops.

  5. Choose Armor-Lite for maximum capacity. When the channel is known and controlled (direct file transfer, cloud link), the extra robustness of repetition coding is unnecessary. Armor-Lite provides approximately 2,600 bytes of capacity with recompression-only robustness.

  6. Always prefer “send as file” or a direct link when capacity matters. For messages longer than ~30 bytes through social media, bypass image processing entirely. Phasm’s sharing via phasm.link provides a universal solution.
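The guidance above can be condensed into a small decision helper. This is one possible reading of the recommendations, not part of Phasm's API: the function names, the flags, and the returned labels are hypothetical, and the byte thresholds are the approximate capacities quoted in this paper for typical photographs, so actual capacity depends on the cover image.

```python
# Approximate capacities quoted in the text (bytes); image-dependent in practice.
FORTRESS_MAX_BYTES = 30
ARMOR_MAX_BYTES = 54
ARMOR_STANDARD_BYTES = 440
ARMOR_LITE_BYTES = 2600

def recommend_mode(message_bytes, platform_resizes, expect_qf_drop=False):
    """Suggest a sub-mode from message size and channel behavior."""
    if platform_resizes:
        # Only Fortress is expected to survive resize plus recompression.
        if message_bytes <= FORTRESS_MAX_BYTES:
            return "Fortress"
        return "send-as-file"
    if expect_qf_drop and message_bytes <= ARMOR_MAX_BYTES:
        return "Armor-Max"
    if message_bytes <= ARMOR_STANDARD_BYTES:
        return "Armor-Standard"
    if message_bytes <= ARMOR_LITE_BYTES:
        return "Armor-Lite"
    return "send-as-file"

def presize(width, height, platform_cap):
    """Scale the cover so its longest side fits the platform's resolution cap,
    preserving aspect ratio; images already within the cap are unchanged."""
    longest = max(width, height)
    if longest <= platform_cap:
        return width, height
    scale = platform_cap / longest
    return max(1, round(width * scale)), max(1, round(height * scale))

# Example: a 25-byte passphrase sent through a platform that resizes.
choice = recommend_mode(25, platform_resizes=True)
# Example: a 6000x4000 cover pre-sized for a 4096px cap before Armor encoding.
target = presize(6000, 4000, 4096)
```

Pre-sizing must happen before embedding: resizing an already-encoded image would itself be the destructive step the recommendation is meant to avoid.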

6. Conclusion

DCT-domain robust steganography, when engineered as a layered system combining coefficient stability selection, STDM embedding, and Reed-Solomon error correction, achieves high reliability against JPEG recompression. The key results are:

  • > 99% message survival under same-quality-factor recompression
  • > 95% survival under mild quality factor drops (10-point QF reduction)
  • > 95% survival under cross-library recompression
  • > 90% survival under moderate QF drops (20 points) with appropriate ECC
  • Fortress sub-mode: end-to-end survival through WhatsApp standard for messages under ~20-30 bytes
  • DFT template: geometric resilience against rotation, scaling, and moderate cropping

These results hold for text-length payloads (up to 65,535 bytes, just under 64 KB, with the u16 length prefix) in typical smartphone photographs (2+ megapixels). The four sub-modes (Fortress, Max, Standard, Lite) provide a principled tradeoff between capacity and robustness. Armor-Standard offers approximately 440 bytes of usable capacity with recompression survival (up from ~290 bytes before the upgrade to the u16 length prefix), while Fortress extends the feasibility envelope to social media platforms that resize images, which were previously considered infeasible for non-neural DCT-domain techniques.
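The payload ceiling follows directly from the width of the length field. The sketch below shows generic u16 length-prefixed framing; the big-endian byte order and the function names are assumptions, since the on-wire layout used by Phasm is not specified here.

```python
import struct

MAX_PAYLOAD = 0xFFFF  # largest value a u16 can hold: 65,535 bytes

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as an unsigned 16-bit integer
    (big-endian byte order is an assumption of this sketch)."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds the u16 length limit")
    return struct.pack(">H", len(payload)) + payload

def unframe(data: bytes) -> bytes:
    """Read the length prefix and return exactly that many payload bytes,
    ignoring any trailing padding."""
    (length,) = struct.unpack_from(">H", data, 0)
    body = data[2:2 + length]
    if len(body) != length:
        raise ValueError("truncated payload")
    return body
```

The prefix lets the decoder stop reading at the end of the message even when the embedding capacity is larger than the payload, so trailing coefficients can be left untouched or filled with padding.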

The geometric limitation has been partially addressed by two complementary systems: the DFT template handles rotation, scaling, and moderate cropping for standard Armor payloads, while Fortress’s block-average embedding domain provides inherent resilience to the resize-plus-recompression pipelines used by social media platforms. Platforms with extremely aggressive processing (WeChat, Snapchat) remain infeasible. For these channels, file-based sharing or direct URLs via phasm.link remain the recommended alternatives.

Phasm’s Armor mode implements this architecture in pure Rust, compiled to both native code (iOS, Android) and WebAssembly (browser). The core engine is open source on GitHub (GPL-3.0). All processing runs client-side – no images or messages are transmitted to any server. The combination of research-backed robustness engineering with privacy-by-design architecture aims to make robust steganography practical and accessible for non-expert users.

References

  1. I. S. Reed and G. Solomon, “Polynomial Codes Over Certain Finite Fields,” Journal of the Society for Industrial and Applied Mathematics, vol. 8, no. 2, pp. 300–304, 1960.

  2. A. B. Watson, “DCT Quantization Matrices Visually Optimized for Individual Images,” Proceedings of SPIE, vol. 1913, pp. 202–216, 1993.

  3. I. J. Cox, J. Kilian, F. T. Leighton, and T. Shamoon, “Secure Spread Spectrum Watermarking for Multimedia,” IEEE Transactions on Image Processing, vol. 6, no. 12, pp. 1673–1687, 1997.

  4. S. Pereira and T. Pun, “Robust Template Matching for Affine Resistant Image Watermarks,” IEEE Transactions on Image Processing, vol. 9, no. 6, pp. 1123–1129, 2000.

  5. B. Chen and G. W. Wornell, “Quantization Index Modulation: A Class of Provably Good Methods for Digital Watermarking and Information Embedding,” IEEE Transactions on Information Theory, vol. 47, no. 4, pp. 1423–1443, 2001.

  6. P. Comesana and F. Perez-Gonzalez, “On the Capacity of Stego-Systems,” Proceedings of the 8th ACM Workshop on Multimedia and Security, pp. 15–24, 2006.

  7. N. Nikolaidis and I. Pitas, “High-Performance JPEG Steganography Using Quantization Index Modulation in DCT Domain,” Pattern Recognition Letters, vol. 27, no. 4, pp. 455–461, 2006.

  8. F. Perez-Gonzalez, C. Mosquera, M. Barni, and A. Abrardo, “Improved Spread Transform Dither Modulation Using a Perceptual Model,” IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2007.

  9. V. Holub, J. Fridrich, and T. Denemark, “Universal Distortion Function for Steganography in an Arbitrary Domain,” EURASIP Journal on Information Security, vol. 2014, no. 1, 2014.

  10. J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei, “HiDDeN: Hiding Data with Deep Networks,” European Conference on Computer Vision (ECCV), 2018.

  11. M. Tancik, B. Mildenhall, and R. Ng, “StegaStamp: Invisible Hyperlinks in Physical Photographs,” IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.

  12. Z. Zhang et al., “Improving Robustness of TCM-based Robust Steganography with Variable Robustness Cost,” arXiv preprint, arXiv:2211.10095, 2022.

  13. J. Butora and J. Fridrich, “Errorless Robust JPEG Steganography Using Outputs of JPEG Coders,” IEEE Transactions on Dependable and Secure Computing, 2023.

  14. T. Qiao et al., “Robust Steganography in Practical Communication: A Comparative Study,” EURASIP Journal on Image and Video Processing, 2023.

  15. J. Butora et al., “Errorless Robust JPEG Steganography Using Steganographic Polar Codes,” EURASIP Journal on Information Security, 2024.

  16. T. Bui et al., “TrustMark: Universal Watermarking for Arbitrary Resolution Images,” International Conference on Computer Vision (ICCV), 2024.

  17. “A Matching Robust Adaptive Steganography Scheme for JPEG Images over Social Networking Platforms,” Signal Processing, 2025.