Abstract

Content-adaptive steganography has become the standard paradigm for secure JPEG data hiding, with algorithms like J-UNIWARD and UERD representing the state of the art. A large body of steganalysis literature evaluates these methods at moderate-to-high embedding rates (0.2–0.4 bpnzAC), where detection accuracy differences between algorithms are pronounced. However, practical steganographic applications often operate at far lower rates. When hiding a short text message (under 1 KB) in a typical smartphone photograph (8–12 megapixels), the resulting embedding rate falls in the range of 0.02–0.04 bpnzAC – a regime where published benchmark data is sparse.

This post compiles detection benchmarks across seven steganographic embedding methods – LSB replacement, F5, nsF5, HUGO, WOW, UERD, and J-UNIWARD – evaluated against three deep-learning steganalyzers (SRNet, XuNet, YedroudjNet) at embedding rates spanning two orders of magnitude (0.01–0.4 bpnzAC). Our analysis shows that at rates below 0.05 bpnzAC, the detection accuracy gap between UERD and J-UNIWARD collapses to 1–2 percentage points, and both methods operate within 4–6 percentage points of the 50% random-chance baseline. We discuss the implications for practitioners choosing between these cost functions and the broader question of when computational overhead is justified by marginal security gains.

Editorial note (February 2026): Phasm initially shipped its Ghost mode with UERD, based in part on the analysis presented here – the data showed the detection gap was negligible at Phasm’s low operating rates. As of version 1.3.0 (February 24, 2026), Phasm has completed the upgrade to J-UNIWARD (Ghost Phase 2). A 71% reduction in J-UNIWARD’s peak memory usage (651 MB down to 187 MB through wavelet computation optimization) made this practical on mobile devices and WebAssembly. This post remains valuable as a technical comparison and as documentation of the data-driven reasoning behind both the initial UERD choice and the subsequent upgrade.


1. Background

1.1 The Adaptive Embedding Paradigm

Modern JPEG steganography follows a two-component architecture:

  1. A distortion cost function that assigns a numerical cost to modifying each DCT coefficient.
  2. A coding scheme – typically Syndrome-Trellis Codes (STCs; Filler, Judas & Fridrich, 2011) – that embeds the message payload while minimizing total distortion.

The cost function determines where to embed; the coding scheme determines how to embed efficiently. This separation of concerns has been the dominant design pattern since approximately 2010, when HUGO (Pevny, Filler & Bas, 2010) demonstrated that content-adaptive distortion minimization dramatically outperforms uniform embedding.

The key insight is simple: natural images contain regions of varying complexity. Smooth areas (sky, walls, skin) are easily modeled by statistical classifiers, so any modification there is readily detected. Textured regions (foliage, fabric, gravel) already exhibit high-frequency energy, so small changes are masked by the existing signal. A good cost function assigns low cost to modifications in textured regions and high cost (or infinite cost, “wet” status) to modifications in smooth regions.

                flowchart TD
                    A[Input: JPEG Cover Image] --> B[Parse DCT Coefficients]
                    B --> C{Choose Cost Function}
                    C -->|Simple| D[nsF5: Uniform Cost\nAll non-zero AC = equal]
                    C -->|DCT Domain| E[UERD: Block Energy\n+ Frequency Weighting]
                    C -->|Wavelet Domain| F[J-UNIWARD: Directional\nWavelet Relative Distortion]
                    D --> G[Cost Map: rho per coefficient]
                    E --> G
                    F --> G
                    G --> H[STC Embedding\nViterbi Minimization]
                    H --> I[Modified DCT Coefficients]
                    I --> J[Output: Stego JPEG]

                    style D fill:#ff6b6b,color:#000
                    style E fill:#ffd93d,color:#000
                    style F fill:#6bcb77,color:#000
                

1.2 Cost Function Taxonomy

We consider seven methods spanning three generations of steganographic design. The first three are non-adaptive or minimally adaptive; the latter four represent the content-adaptive paradigm.

Non-adaptive / minimally adaptive methods:

  • LSB replacement: Changes the least significant bit of pixel values (spatial domain) or DCT coefficients to match message bits. No cost function; all positions are equally likely to be modified. Trivially detectable by chi-squared analysis (Westfeld & Pfitzmann, 1999).

  • F5 (Westfeld, 2001): Embeds in non-zero AC DCT coefficients by subtracting 1 from the absolute value. Uses matrix encoding to improve efficiency but applies uniform embedding across all non-zero coefficients. The shrinkage of coefficients toward zero creates a characteristic histogram signature.

  • nsF5 (Fridrich, Pevny & Kodovsky, 2007): “Non-shrinkage F5” – an improved version that uses wet paper codes to handle the shrinkage problem. Still non-adaptive (uniform cost across all non-zero AC coefficients), but the coding is near-optimal. Represents the best achievable security without content adaptivity.

Content-adaptive methods (spatial domain):

  • HUGO (Pevny, Filler & Bas, 2010): Defines distortion as changes to high-dimensional SPAM-like features computed from pixel differences. Designed for the spatial domain, not directly applicable to JPEG, but included as a historical reference and conceptual precursor to UNIWARD.

  • WOW (Holub & Fridrich, 2012): “Wavelet Obtained Weights” – uses a bank of directional high-pass filters to compute embedding costs in the spatial domain. A predecessor to S-UNIWARD that demonstrated the effectiveness of wavelet-based cost assignment.

Content-adaptive methods (JPEG domain):

  • UERD (Guo, Ni & Shi, 2015): Uniform Embedding Revisited Distortion. Computes costs entirely in the DCT domain using block energy and frequency weighting. No decompression to spatial domain is required.

  • J-UNIWARD (Holub, Fridrich & Denemark, 2014): JPEG Universal Wavelet Relative Distortion. Decompresses to spatial domain, computes directional wavelet residuals using Daubechies db4 filters, and measures the relative change in wavelet coefficients caused by each possible DCT modification.

1.3 The UERD Cost Function

UERD assigns embedding costs based on two DCT-domain properties: the energy (texture complexity) of the 8x8 block and the frequency position of the coefficient within that block.

For a DCT coefficient at block position $b$ and frequency mode $k$, the UERD cost is:

$$\rho_{\text{UERD}}(b, k) = \frac{q_k}{e_b \cdot f_k + \epsilon}$$

where:

  • $q_k$ is the quantization step size for frequency mode $k$ (from the JPEG quantization table),
  • $e_b = \sum_{(i,j) \neq (0,0)} |c_b(i,j)|^2$ is the block energy (sum of squared AC coefficients in block $b$),
  • $f_k$ is a frequency-dependent weighting factor that increases with spatial frequency, and
  • $\epsilon$ is a small stabilization constant preventing division by zero.

The intuition is straightforward: high-energy blocks (textured regions) have large $e_b$, yielding low costs. Higher-frequency coefficients have larger $f_k$, also yielding low costs. The quantization step $q_k$ in the numerator accounts for the fact that coefficients quantized with larger step sizes already have more quantization noise, making them relatively easier for steganalysis to model.

The computational cost is minimal. Block energy is a sum of 63 squared values per block. Frequency weights and quantization steps are table lookups. No inverse DCT, no wavelet transforms, no spatial-domain computation of any kind.
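The two steps above translate directly into vectorized NumPy. This is an illustrative sketch: `uerd_costs` is a hypothetical helper, and the frequency weight $f_k = r + c + 1$ is a simple stand-in that grows with spatial frequency – the published UERD weighting differs in detail.

```python
import numpy as np

def uerd_costs(dct_blocks, quant_table, eps=1e-10):
    """Sketch of the UERD cost map from Section 1.3.

    dct_blocks: (B, 8, 8) quantized DCT coefficients, one 8x8 block per entry.
    quant_table: (8, 8) JPEG quantization table (the q_k values).
    Returns a (B, 8, 8) array of costs rho(b, k).
    """
    coeffs = dct_blocks.astype(np.float64)
    # Block energy e_b: sum of squared AC coefficients (DC term excluded).
    energy = (coeffs ** 2).reshape(len(coeffs), 64).sum(axis=1)
    energy -= coeffs[:, 0, 0] ** 2

    rows, cols = np.indices((8, 8))
    f = (rows + cols + 1).astype(np.float64)   # assumed frequency weight f_k
    q = quant_table.astype(np.float64)

    # rho = q_k / (e_b * f_k + eps): low cost in textured (high-energy)
    # blocks and at high spatial frequencies.
    return q[None, :, :] / (energy[:, None, None] * f[None, :, :] + eps)
```

Textured blocks (large $e_b$) and high-frequency positions (large $f_k$) receive proportionally lower costs, matching the intuition above.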

1.4 The J-UNIWARD Cost Function

J-UNIWARD operates on a fundamentally different principle. Rather than measuring distortion in the DCT domain, it measures distortion in the wavelet domain of the decompressed image.

For a DCT coefficient at block position $(b_r, b_c)$ and frequency mode $(i, j)$, the J-UNIWARD cost is:

$$\rho_{\text{J-UNI}}(b_r, b_c, i, j) = \sum_{k \in \{LH, HL, HH\}} \sum_{(u,v) \in \mathcal{W}} \frac{|\xi_k(u,v)|}{|W_k(u,v)| + \sigma}$$

where:

  • $W_k(u,v)$ is the wavelet residual of the cover image at position $(u,v)$ in subband $k$, computed by correlating the decompressed spatial image with a 2D Daubechies db4 directional filter,
  • $\xi_k(u,v)$ is the change in wavelet residual that would result from incrementing the DCT coefficient by $\pm 1$,
  • $\mathcal{W}$ is the 23x23 window of affected wavelet coefficients, and
  • $\sigma = 2^{-6} = 0.015625$ is a stabilization constant (Denemark et al., 2014).

The three subbands $LH$, $HL$, and $HH$ capture horizontal, vertical, and diagonal detail respectively, constructed as outer products of the Daubechies db4 low-pass filter $h$ and high-pass filter $g$:

$$F_{LH}[r][c] = h[r] \cdot g[c], \quad F_{HL}[r][c] = g[r] \cdot h[c], \quad F_{HH}[r][c] = g[r] \cdot g[c]$$

The numerator $|\xi_k|$ can be precomputed as lookup tables – 64 frequency modes times 3 subbands, each a 23x23 matrix – since it depends only on the DCT basis pattern and quantization table, not on image content. The denominator $|W_k| + \sigma$ must be computed once per image via three full-image 2D correlations.

The key property of this formulation is that the relative distortion (change divided by existing signal) forces embedding into regions where the wavelet response is already large. A region must be complex in all three directional subbands to receive a low cost. This is why J-UNIWARD is considered the strongest non-side-informed JPEG steganographic cost function.
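The filter construction above can be reproduced in a few lines. The db4 taps below are the standard 8-tap scaling-filter values; `wavelet_residual` is an illustrative helper that computes one subband residual $W_k$ by separable correlation, with plain zero padding at the borders rather than the mirror padding a faithful implementation would use.

```python
import numpy as np

# Daubechies db4 scaling (low-pass) filter taps, as commonly tabulated.
h = np.array([0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
              -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
              0.032883011666982945, -0.010597401784997278])
# Quadrature-mirror high-pass filter: g[n] = (-1)^n * h[L-1-n].
g = ((-1.0) ** np.arange(len(h))) * h[::-1]

# Directional 2D filters as outer products (Section 1.4).
F_LH = np.outer(h, g)   # horizontal detail
F_HL = np.outer(g, h)   # vertical detail
F_HH = np.outer(g, g)   # diagonal detail

def wavelet_residual(img, f_row, f_col):
    """Separable 2D correlation of img with outer(f_row, f_col).

    Sketch of the denominator term W_k(u, v): rows are correlated with
    f_col, then columns with f_row, keeping the 'same' output size
    (zero padding at the borders).
    """
    tmp = np.apply_along_axis(lambda r: np.convolve(r, f_col[::-1], mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, f_row[::-1], mode="same"), 0, tmp)
```

For example, `wavelet_residual(img, h, g)` yields the LH (horizontal-detail) residual; a constant image produces zero response away from the borders, since the high-pass taps sum to zero.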

                flowchart LR
                    subgraph UERD["UERD (DCT Domain Only)"]
                        U1[DCT Coefficients] --> U2[Block Energy\ne_b = sum of AC^2]
                        U1 --> U3[Frequency Weight f_k]
                        U1 --> U4[Quant Step q_k]
                        U2 --> U5["Cost = q_k / (e_b * f_k + eps)"]
                        U3 --> U5
                        U4 --> U5
                    end

                    subgraph JUNIWARD["J-UNIWARD (Wavelet Domain)"]
                        J1[DCT Coefficients] --> J2[IDCT to Spatial Domain]
                        J2 --> J3[Daubechies db4\nWavelet Filters]
                        J3 --> J4[LH Subband\nHorizontal Detail]
                        J3 --> J5[HL Subband\nVertical Detail]
                        J3 --> J6[HH Subband\nDiagonal Detail]
                        J4 --> J7["Sum of |change| / (|cover| + sigma)\nover 23x23 window x 3 subbands"]
                        J5 --> J7
                        J6 --> J7
                    end

                    style UERD fill:#fff3cd,color:#000
                    style JUNIWARD fill:#d4edda,color:#000
                

1.5 Embedding Rate and Capacity

The embedding rate in JPEG steganography is measured in bpnzAC – bits per non-zero AC coefficient. This normalizes payload by the number of embeddable coefficients, which varies with image content and JPEG quality factor.

For a message of $M$ bits embedded in an image with $N$ non-zero AC coefficients, the embedding rate is:

$$\alpha = \frac{M}{N} \quad \text{(bpnzAC)}$$

The number of non-zero AC coefficients depends on both image resolution and JPEG quality factor. For a typical smartphone photograph:

| Image Resolution | Quality Factor | Approx. Non-Zero AC Coefficients |
|---|---|---|
| 640 x 480 (0.3 MP) | 75 | ~25,000 |
| 1024 x 768 (0.8 MP) | 75 | ~50,000 |
| 1920 x 1080 (2 MP) | 75 | ~130,000 |
| 3264 x 2448 (8 MP) | 85 | ~400,000 |
| 4032 x 3024 (12 MP) | 85 | ~600,000 |

For a 1 KB plaintext message with AES-256-GCM-SIV encryption overhead (12-byte nonce + 16-byte authentication tag) and payload framing (3 bytes header + 4 bytes checksum), the total embedded payload is approximately 1,059 bytes = 8,472 bits.

The resulting embedding rates for various image sizes:

| Image | Non-Zero AC | Payload (bits) | Rate (bpnzAC) |
|---|---|---|---|
| 0.3 MP, QF 75 | 25,000 | 8,472 | 0.339 |
| 0.8 MP, QF 75 | 50,000 | 8,472 | 0.169 |
| 2 MP, QF 75 | 130,000 | 8,472 | 0.065 |
| 8 MP, QF 85 | 400,000 | 8,472 | 0.021 |
| 12 MP, QF 85 | 600,000 | 8,472 | 0.014 |
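The payload and rate arithmetic above is easy to reproduce – a quick sanity-check sketch:

```python
# Reproduce the payload and rate figures from the tables above.
MESSAGE_BYTES = 1024            # 1 KB plaintext
NONCE, TAG = 12, 16             # AES-256-GCM-SIV overhead
HEADER, CHECKSUM = 3, 4         # payload framing

payload_bits = (MESSAGE_BYTES + NONCE + TAG + HEADER + CHECKSUM) * 8
print(payload_bits)             # 1,059 bytes -> 8472 bits

for label, n_ac in [("0.3 MP", 25_000), ("8 MP", 400_000), ("12 MP", 600_000)]:
    print(f"{label}: alpha = {payload_bits / n_ac:.3f} bpnzAC")
```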

The critical observation: modern smartphone photographs produce embedding rates well below 0.05 bpnzAC for sub-kilobyte payloads. This is the operating regime that matters for practical applications, yet it is underrepresented in the steganalysis literature, which typically benchmarks at 0.1–0.4 bpnzAC using the BOSSBase 1.01 dataset of 10,000 grayscale 512x512 images.

1.6 Information-Theoretic Perspective

The detectability of steganographic embedding is fundamentally bounded by the KL divergence between the cover and stego distributions. For a given cost function and embedding rate, simple embedding modifies on the order of $n \cdot \alpha / 2$ coefficients on average – each message bit already matches the cover half the time, and STC coding reduces the count further – and each change contributes a “footprint” to the image statistics.

The detection probability of an optimal binary classifier (distinguishing cover from stego) is bounded by:

$$P_{\text{detect}} \leq \frac{1}{2} + \frac{1}{2}\sqrt{1 - e^{-2 \cdot D_{\text{KL}}(P_{\text{stego}} \| P_{\text{cover}})}}$$

where $D_{\text{KL}}$ denotes the Kullback-Leibler divergence. As the embedding rate $\alpha \to 0$, the KL divergence decreases, and $P_{\text{detect}} \to 0.5$ (random chance). The rate of this convergence depends on the cost function: better cost functions (those that concentrate modifications where the cover model has high variance) produce smaller KL divergence per embedded bit.
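The bound is easy to evaluate numerically; `detection_bound` is an illustrative helper name, with accuracy expressed as a fraction (0.5 = chance):

```python
import math

def detection_bound(d_kl):
    """Upper bound on an optimal detector's accuracy given the KL
    divergence between stego and cover distributions, per the
    inequality above. Returns accuracy as a fraction (0.5 = chance)."""
    return 0.5 + 0.5 * math.sqrt(1.0 - math.exp(-2.0 * d_kl))

# Zero divergence: stego is statistically identical to cover.
print(detection_bound(0.0))   # 0.5
# Detectability rises monotonically with divergence.
print(detection_bound(0.05) < detection_bound(0.5))   # True
```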

The practical question is: at what embedding rate does the difference between cost functions become negligible relative to the steganalyzer’s inherent noise floor?


2. Methodology

2.1 Benchmark Data Sources

The detection accuracy figures presented in this analysis are compiled from published steganalysis benchmarks in peer-reviewed venues, including:

  • Boroumand et al. (2019): SRNet evaluation against J-UNIWARD, UERD, and other methods on BOSSBase
  • Holub, Fridrich & Denemark (2014): Original J-UNIWARD paper with GFR and SRM feature-set evaluations
  • Guo, Ni & Shi (2015): Original UERD paper with comparative detection results
  • Yousfi, Butora, Fridrich & Giboulot (2020): Low embedding rate analysis
  • Cogranne, Giboulot & Bas (2020): Alaska challenge benchmarks providing cross-method comparisons at varied rates
  • Butora & Fridrich (2022): Comprehensive low-rate steganalysis study

For embedding rates below 0.05 bpnzAC, published data is sparse. We supplement with interpolated values based on the empirically observed relationship between detection accuracy and embedding rate, which follows a roughly sigmoidal curve on a log scale of $\alpha$. All interpolated values are marked as such and should be treated as estimates.

2.2 Steganalyzers

We consider three deep-learning steganalyzers that represent the current state of the art:

  • SRNet (Boroumand, Chen & Fridrich, 2019): A deep residual network that learns to maximize noise residuals introduced by steganographic embedding. Does not use conventional high-pass preprocessing filters. Widely considered the strongest general-purpose steganalyzer.

  • XuNet (Xu, Wu & Shi, 2016): An early deep learning steganalyzer using a constrained first-layer filter (high-pass) followed by batch normalization and TanH activations. Less accurate than SRNet but historically significant.

  • YedroudjNet (Yedroudj, Comby & Chaumont, 2018): Uses 30 SRM high-pass filters as a fixed preprocessing layer, followed by convolutional feature extraction. Effective but less robust to image transformations than SRNet.

Additionally, we reference results from the hand-crafted feature set GFR (Gabor Filter Residual; Song et al., 2015) paired with an ensemble classifier, which was the pre-deep-learning standard for JPEG steganalysis.

2.3 Detection Accuracy Convention

Detection accuracy throughout this paper is reported as the binary classification accuracy of the steganalyzer: the fraction of images correctly classified as either cover (clean) or stego (modified). This assumes equal priors (50% cover, 50% stego) in the test set.

  • 50% = random chance (the classifier cannot distinguish cover from stego)
  • 100% = perfect detection (every stego image is identified)

Some papers report detection error $P_E = \frac{1}{2}(P_{FA} + P_{MD})$ where $P_{FA}$ is false alarm rate and $P_{MD}$ is missed detection rate. In the equal-priors setting, detection accuracy $= 1 - P_E$. We convert all figures to the accuracy convention for consistency.

2.4 Embedding Rate Range

We evaluate detection at the following embedding rates:

$$\alpha \in \{0.01, 0.02, 0.04, 0.08, 0.1, 0.2, 0.4\} \quad \text{bpnzAC}$$

This range spans two orders of magnitude. The lower end (0.01–0.04) corresponds to short text messages in high-resolution smartphone photos. The upper end (0.2–0.4) corresponds to the standard benchmark regime used in most academic publications.


3. Results

3.1 Detection Accuracy vs. Embedding Rate (SRNet)

The following table presents detection accuracy (%) of SRNet against each steganographic method at seven embedding rates. Values at 0.1 bpnzAC and above are drawn from published results on BOSSBase 1.01. Values at 0.04 bpnzAC and below are interpolated from the empirical detection curves and corroborated where possible with low-rate studies (Butora & Fridrich, 2022; Yousfi et al., 2020).

| Method | 0.01 | 0.02 | 0.04 | 0.08 | 0.1 | 0.2 | 0.4 |
|---|---|---|---|---|---|---|---|
| LSB replacement | 54 | 59 | 68 | 82 | 88 | 97 | ~100 |
| F5 | 53 | 57 | 65 | 78 | 84 | 95 | 99 |
| nsF5 | 52 | 55 | 61 | 72 | 78 | 90 | 99 |
| HUGO | 51 | 53 | 57 | 64 | 68 | 80 | 90 |
| WOW | 51 | 53 | 56 | 63 | 67 | 78 | 88 |
| UERD | 51 | 52 | 56 | 62 | 65 | 78 | 84 |
| J-UNIWARD | 50 | 51 | 54 | 59 | 62 | 72 | 75 |

Detection accuracy (%) of SRNet, BOSSBase 1.01, 512x512 grayscale JPEGs at QF 75. Values at 0.04 bpnzAC and below are interpolated estimates (see Section 2.1). Accuracy of 50% = random chance.

3.2 Detection Accuracy vs. Embedding Rate (XuNet)

| Method | 0.01 | 0.02 | 0.04 | 0.08 | 0.1 | 0.2 | 0.4 |
|---|---|---|---|---|---|---|---|
| LSB replacement | 53 | 57 | 65 | 79 | 85 | 95 | ~100 |
| F5 | 52 | 56 | 63 | 75 | 81 | 93 | 99 |
| nsF5 | 52 | 54 | 59 | 69 | 74 | 87 | 97 |
| HUGO | 51 | 53 | 56 | 62 | 65 | 76 | 86 |
| WOW | 51 | 52 | 55 | 61 | 64 | 75 | 85 |
| UERD | 50 | 52 | 55 | 60 | 63 | 74 | 80 |
| J-UNIWARD | 50 | 51 | 53 | 57 | 60 | 69 | 72 |

Detection accuracy (%) of XuNet under the same conditions. XuNet is generally 2–5 percentage points less accurate than SRNet across all methods and rates.

3.3 Detection Accuracy vs. Embedding Rate (YedroudjNet)

| Method | 0.01 | 0.02 | 0.04 | 0.08 | 0.1 | 0.2 | 0.4 |
|---|---|---|---|---|---|---|---|
| LSB replacement | 54 | 58 | 67 | 80 | 86 | 96 | ~100 |
| F5 | 53 | 57 | 64 | 77 | 82 | 94 | 99 |
| nsF5 | 52 | 55 | 60 | 70 | 76 | 89 | 98 |
| HUGO | 51 | 53 | 56 | 63 | 66 | 78 | 88 |
| WOW | 51 | 52 | 55 | 62 | 65 | 77 | 87 |
| UERD | 50 | 52 | 55 | 61 | 64 | 76 | 82 |
| J-UNIWARD | 50 | 51 | 53 | 58 | 61 | 70 | 73 |

Detection accuracy (%) of YedroudjNet. Performance falls between SRNet and XuNet for most method/rate combinations.

3.4 The UERD vs. J-UNIWARD Detection Gap

The central question of this analysis is the detection gap between UERD and J-UNIWARD. We extract this directly from the SRNet results:

| Embedding Rate (bpnzAC) | UERD Accuracy | J-UNIWARD Accuracy | Gap (pp) |
|---|---|---|---|
| 0.01 | 51% | 50% | 1 |
| 0.02 | 52% | 51% | 1 |
| 0.04 | 56% | 54% | 2 |
| 0.08 | 62% | 59% | 3 |
| 0.1 | 65% | 62% | 3 |
| 0.2 | 78% | 72% | 6 |
| 0.4 | 84% | 75% | 9 |

UERD vs. J-UNIWARD detection accuracy gap (percentage points) under SRNet. pp = percentage points.

The pattern is clear: the gap scales roughly linearly with the logarithm of the embedding rate. At 0.4 bpnzAC, J-UNIWARD provides a substantial 9 percentage-point advantage. At 0.1, the advantage shrinks to 3 points. At 0.02 bpnzAC – the operating range for short text in 8+ megapixel photographs – the gap is a single percentage point, well within the variance of classifier training.

3.5 GFR Feature-Based Detection (Pre-Deep-Learning)

For completeness, we present detection error $P_E$ (lower = easier to detect) using the GFR feature set with an ensemble classifier, which was the standard evaluation method before deep learning dominance:

| Method | $P_E$ at 0.1 bpnzAC | $P_E$ at 0.2 bpnzAC | $P_E$ at 0.4 bpnzAC |
|---|---|---|---|
| nsF5 | 0.18 | 0.08 | 0.01 |
| UERD | 0.32 | 0.20 | 0.10 |
| J-UNIWARD | 0.38 | 0.25 | 0.17 |

Detection error $P_E$ using GFR + ensemble classifier on BOSSBase. Higher $P_E$ = harder to detect. Both UERD and J-UNIWARD substantially outperform nsF5. J-UNIWARD outperforms UERD by 5–7 percentage points across rates, consistent with the SRNet results scaled to the feature-based classifier’s lower sensitivity.


4. Discussion

4.1 The Low-Rate Convergence Effect

The most striking finding is the convergence of detection accuracy across all adaptive methods at low embedding rates. At 0.02 bpnzAC, UERD, J-UNIWARD, WOW, and HUGO all cluster within a 2 percentage-point band around 51–53% – indistinguishable from chance in any practical classifier deployment.

This convergence has a principled explanation. At low rates, the STC coding scheme has abundant cover elements to choose from, and even a moderately informative cost function provides enough guidance to avoid the most detectable positions. The STC’s Viterbi optimization selects the lowest-cost subset of coefficients, and when the total number of changes is very small relative to the cover, the “easy wins” – coefficients in heavily textured regions with high-frequency content – are available to all adaptive cost functions.

In information-theoretic terms, when $\alpha$ is small, the total distortion $D_{\text{total}} = \sum_i \rho_i \cdot |x_i - y_i|$ is dominated by the lowest-cost coefficients regardless of the precision of the cost function. The marginal benefit of a more accurate cost assignment (J-UNIWARD’s wavelet-domain measurement vs. UERD’s DCT-domain approximation) matters most for the “borderline” coefficients – those whose true detectability is ambiguous. At low rates, these borderline coefficients are never selected because there are enough unambiguously safe coefficients available.

4.2 When Does the Gap Matter?

The data suggests a threshold at approximately 0.1 bpnzAC where the choice of cost function begins to have operationally meaningful consequences. Below this threshold:

  • The UERD–J-UNIWARD gap is 1–3 percentage points
  • All adaptive methods are within 6 points of random chance
  • Classifier variance across training runs typically spans 1–2 percentage points
  • The defender (steganalyzer operator) cannot achieve actionable detection confidence

Above 0.1 bpnzAC, the gap widens rapidly. At 0.4 bpnzAC, UERD is detected with 84% accuracy while J-UNIWARD achieves 75% – a meaningful difference if you are operating in this regime.

The practical question for system designers is therefore: at what embedding rate will your application operate?

For applications embedding short text messages (under 1 KB) in photographs of 2+ megapixels, the answer is consistently below 0.05 bpnzAC, where cost function choice has negligible impact on detection resistance.

4.3 Computational Cost Tradeoff

J-UNIWARD’s superior cost assignment comes at substantial computational expense:

  1. IDCT decompression: Reconstruct the spatial-domain image from quantized DCT coefficients.
  2. Wavelet filtering: Three full-image 2D correlations with 8x8 Daubechies db4 kernels (exploitable as separable 1D passes, but still $O(N \cdot 16)$ per subband where $N$ is total pixel count).
  3. Lookup table precomputation: 64 frequency modes $\times$ 3 subbands $\times$ 23 $\times$ 23 entries ($\approx$ 800 KB of precomputed data).
  4. Cost summation: For each of $64B$ coefficients (where $B$ is the number of 8x8 blocks), sum over $3 \times 23 \times 23 = 1,587$ terms.

UERD, by contrast, requires only:

  1. Block energy: One sum of 63 squared values per block.
  2. Cost lookup: One division per coefficient using the quantization table and frequency weight.

The result is approximately a 10:1 computational cost ratio in favor of UERD. For a single image on modern hardware, this difference is sub-second vs. hundreds of milliseconds – not operationally significant in isolation. On resource-constrained platforms (mobile devices, WebAssembly runtimes), the primary concern is memory rather than speed: a naive J-UNIWARD implementation requires decompressing the full spatial-domain image and computing three full-resolution wavelet subbands simultaneously, which peaked at 651 MB for a 12 MP photograph. Phasm’s optimized implementation reduces this to 187 MB by processing wavelet subbands in tiles and reusing buffers – a 71% reduction that made J-UNIWARD practical on mobile devices with constrained memory budgets.

The STC coding phase (Viterbi algorithm) is identical for both cost functions. At constraint height $h = 7$, the trellis has $2^7 = 128$ states and requires approximately 6 MB of memory. The Viterbi forward pass is $O(n \cdot 2^h)$ where $n$ is the number of cover elements, and is the same regardless of cost function choice.
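For scale, the ~6 MB figure is consistent with a back-of-envelope model that stores one backtracking bit per trellis state per cover element – an illustrative assumption, not a description of any particular implementation:

```python
# Back-of-envelope STC memory estimate (assumption: one path bit per
# trellis state per cover element, as in a straightforward Viterbi backtrace).
h = 7
states = 2 ** h                  # 128 trellis states
n = 400_000                      # non-zero AC coefficients (8 MP photo, Section 1.5)

path_memory_mb = n * states / 8 / 1e6   # bits -> bytes -> MB
print(f"{path_memory_mb:.1f} MB")       # 6.4 MB
```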

4.4 The Role of STC Coding Efficiency

It is worth emphasizing that the coding scheme contributes independently to security. STCs achieve near-optimal embedding efficiency: the “coding loss” – the ratio of actual distortion to the information-theoretic minimum – is approximately 8–10% at $h = 7$ and ~5% at $h = 10$ (Filler, Judas & Fridrich, 2011).

The coding loss matters more at higher embedding rates, where every unnecessary coefficient change contributes to detectability. At low rates, the absolute number of changes is so small that even suboptimal coding has little impact. The theoretical minimum number of coefficient changes for embedding $M$ bits at rate $\alpha$ with a binary code is bounded by:

$$N_{\text{changes}} \geq n \cdot H_2^{-1}(\alpha)$$

where $n = M/\alpha$ is the number of cover elements and $H_2^{-1}$ is the inverse of the binary entropy function. For $\alpha = 0.02$ and an 8,472-bit payload, $n \approx 423{,}600$ and $H_2^{-1}(0.02) \approx 0.0019$, giving a minimum of roughly 810 changes – an embedding efficiency of about 10.4 bits per change. With 8–10% coding loss at $h = 7$, the actual number is closer to 880–890 changes – still a tiny fraction of the 400,000+ available coefficients in a typical smartphone photo.
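The inverse entropy term can be evaluated numerically – a short bisection sketch:

```python
import math

def h2(x):
    """Binary entropy H_2(x) in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def h2_inv(alpha, iters=100):
    """Invert H_2 on (0, 0.5) by bisection (H_2 is increasing there)."""
    lo, hi = 1e-12, 0.5 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h2(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta = h2_inv(0.02)      # optimal change rate at alpha = 0.02
print(round(beta, 5))    # ~0.0019
```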

4.5 Comparison with Non-Adaptive Methods

While the gap between adaptive methods narrows at low rates, the gap between adaptive and non-adaptive methods remains substantial. At 0.04 bpnzAC under SRNet:

  • nsF5 (non-adaptive): 61% detection accuracy
  • UERD (adaptive): 56% detection accuracy
  • J-UNIWARD (adaptive): 54% detection accuracy

The 5–7 percentage point advantage of content adaptivity over uniform embedding persists even at very low rates. This confirms that the fundamental principle of adaptive steganography – embedding where the image is complex – provides value across all operating regimes. The diminishing returns are specific to the precision of the cost function, not to the principle of content adaptation.

This finding has a clear practical implication: content adaptivity is essential; the specific cost function is secondary (at low rates).

4.6 Limitations of This Analysis

Several caveats apply to the interpretation of these results:

  1. BOSSBase bias: Most benchmarks use BOSSBase 1.01 (10,000 grayscale 512x512 images). These are relatively small, low-resolution images. Real-world smartphone photos are larger, higher-resolution, and in color. The detection gap may behave differently on more representative image datasets. The ALASKA dataset (Cogranne et al., 2020) partially addresses this but remains limited to 512x512 crops.

  2. Interpolated low-rate values: Published steganalysis benchmarks rarely report accuracy at 0.01–0.04 bpnzAC. Our interpolated values assume smooth, monotonic detection curves, which is consistent with empirical observations but not guaranteed.

  3. Matched training: Steganalyzers in these benchmarks are trained and tested on the same steganographic method. In practice, an attacker may not know which method was used, and mismatched detectors perform worse. This generally works in the steganographer’s favor.

  4. Color channel considerations: J-UNIWARD and UERD are defined for grayscale (luminance). Color JPEG images have chroma channels (Cb, Cr) that may be exploited for additional capacity or may leak information. The relative performance of UERD vs. J-UNIWARD on chroma channels is less studied.

  5. Image source matters: Detection accuracy depends heavily on the source image distribution. Images from modern smartphone cameras with heavy noise reduction and sharpening may behave differently from the DSLR images in BOSSBase.


5. Implications for Practitioners

5.1 Choosing a Cost Function

Based on this analysis, we recommend J-UNIWARD as the default choice for new implementations. The computational and memory barriers that once favored UERD have been substantially reduced through optimization (see Section 4.3). However, the data shows UERD remains a defensible choice in specific scenarios:

UERD is sufficient when:

  • Your embedding rate is below 0.05 bpnzAC (short messages in large images)
  • You need a working prototype quickly and plan to upgrade later
  • Memory is severely constrained (below 200 MB available)

J-UNIWARD is preferred (and is now practical) when:

  • You want the maximum security margin at any embedding rate
  • Your application may encounter higher rates (small images, larger messages)
  • You want future-proofing against advances in steganalysis
  • Memory is no longer prohibitive with optimized implementations (187 MB peak for 12 MP images)

Always use content-adaptive methods over non-adaptive methods. The 5–7 percentage point advantage of even the simplest adaptive cost function (UERD) over non-adaptive embedding (nsF5) persists across all rates and is larger than the UERD-to-J-UNIWARD gap at rates below 0.1 bpnzAC.

Phasm’s trajectory illustrates a pragmatic approach: ship with UERD to validate the full pipeline, then upgrade to J-UNIWARD once the engineering challenges (deterministic cross-platform math, memory optimization) are solved. The benchmarks in this post confirmed that the initial UERD release was operating safely within the undetectable regime, giving the team time to get J-UNIWARD right.

5.2 Embedding Rate as the Primary Security Control

The single most effective way to reduce detection risk is to reduce the embedding rate. This can be achieved by:

  1. Using larger cover images: A 12 MP photograph at QF 85 has approximately 600,000 non-zero AC coefficients, yielding 0.014 bpnzAC for a 1 KB payload. At this rate, detection is indistinguishable from random chance under any known classifier.

  2. Minimizing payload size: Compressing the message, using shorter text, or accepting a lower redundancy level all reduce $M$ and therefore $\alpha$.

  3. Selecting high-complexity images: Images with rich texture (landscapes, cityscapes, crowds) have more non-zero AC coefficients than simple images (studio portraits, product shots). They also have more low-cost embedding positions.

The relationship between embedding rate and detection accuracy is approximately:

$$P_{\text{detect}} \approx 50\% + c \cdot \alpha^{\beta}$$

where $c$ and $\beta$ are method-dependent constants. For J-UNIWARD with SRNet, empirical fitting suggests $\beta \approx 0.6$–$0.7$, meaning detection accuracy grows sublinearly with embedding rate. Halving the embedding rate does not halve the excess detection probability, but it does provide meaningful improvement.
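As a worked example of this sublinearity (taking $\beta = 0.65$, an assumed midpoint of the fitted range):

```python
# Worked example of the sublinear scaling, using an assumed midpoint
# beta = 0.65 for J-UNIWARD under SRNet.
beta = 0.65

# Halving alpha multiplies the excess detection probability
# (P_detect - 50%) by 2^(-beta), not by 1/2.
reduction = 0.5 ** beta
print(f"excess detection scales by {reduction:.2f} when the rate is halved")
```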

5.3 How These Benchmarks Informed Phasm’s Approach

Phasm, the steganography application that motivated this research, initially shipped its Ghost mode with UERD as the cost function. The benchmarks presented in this post were central to that decision: at Phasm’s typical operating range of 0.02–0.04 bpnzAC (short text messages in smartphone photographs), the practical detection difference between UERD and J-UNIWARD is negligible – both produce stego images that are near-undetectable by any known steganalysis method. UERD’s simpler DCT-domain computation allowed a faster initial implementation across all three platforms (iOS, Android, and WebAssembly via a pure Rust JPEG codec).

As of version 1.3.0 (February 24, 2026), Phasm has completed the upgrade to J-UNIWARD, marking the completion of Ghost Phase 2. Two engineering challenges had to be solved to make this practical. First, J-UNIWARD’s wavelet-domain computation requires deterministic trigonometric functions to guarantee identical embedding decisions across iOS, Android, and WebAssembly – addressed by Phasm’s FDLIBM-based math module. Second, J-UNIWARD’s peak memory consumption was reduced by 71% (from 651 MB to 187 MB for a 12 MP photograph) through tiled wavelet processing and buffer reuse, making it viable on mobile devices with limited memory. The data in this post validated that the initial UERD release was operating safely within the undetectable regime while these optimizations were developed, and the upgrade to J-UNIWARD now provides the additional security margin discussed in Section 4.2 – particularly benefiting users who embed at higher rates (larger messages or smaller images, where the gap widens to 3–9 percentage points).

Phasm pairs J-UNIWARD with binary Syndrome-Trellis Codes at constraint height $h = 7$ (128 trellis states, approximately 6 MB memory) and always-on AES-256-GCM-SIV authenticated encryption. The full implementation is open source on GitHub (GPL-3.0). The encryption layer ensures that even if the steganographic embedding were somehow detected and the message extracted, the payload would appear as uniformly random ciphertext, indistinguishable from the random noise inherent in JPEG quantization residuals.

5.4 Beyond Cost Functions: Other Security Factors

Cost function choice is only one factor in a steganographic system’s security. Several other design decisions have comparable or greater impact:

Coefficient selection and permutation: The order in which DCT coefficients are presented to the STC must be pseudorandomly permuted using a key-derived CSPRNG (e.g., Fisher-Yates shuffle seeded with ChaCha20). Without this, the sequential structure of the embedding could leak information about the STC’s parity-check matrix.
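A minimal sketch of such a key-derived permutation follows. To stay self-contained it derives its byte stream from SHA-256 in counter mode as a stand-in DRBG; a production implementation should use an actual ChaCha20 keystream as described above. All names here are illustrative:

```python
import hashlib

class HashDRBG:
    """Deterministic byte stream from a key: SHA-256 in counter mode.
    Stand-in for a ChaCha20 keystream; illustrative only."""
    def __init__(self, key: bytes):
        self.key = key
        self.counter = 0
        self.buf = b""

    def next_bytes(self, n: int) -> bytes:
        while len(self.buf) < n:
            block = hashlib.sha256(
                self.key + self.counter.to_bytes(8, "big")).digest()
            self.buf += block
            self.counter += 1
        out, self.buf = self.buf[:n], self.buf[n:]
        return out

    def randbelow(self, bound: int) -> int:
        # Rejection sampling keeps the draw unbiased.
        nbytes = (bound.bit_length() + 7) // 8
        limit = (256 ** nbytes // bound) * bound
        while True:
            r = int.from_bytes(self.next_bytes(nbytes), "big")
            if r < limit:
                return r % bound

def keyed_permutation(n: int, key: bytes) -> list:
    """Fisher-Yates shuffle of coefficient indices 0..n-1, keyed DRBG-driven."""
    idx = list(range(n))
    rng = HashDRBG(key)
    for i in range(n - 1, 0, -1):
        j = rng.randbelow(i + 1)
        idx[i], idx[j] = idx[j], idx[i]
    return idx
```

Both sender and receiver derive the same key (e.g. via a KDF from the shared secret) and therefore recover the identical coefficient ordering; without the key, the visit order is computationally indistinguishable from a uniformly random permutation.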

Payload encryption: Always encrypt before embedding. An unencrypted payload has low entropy in its plaintext regions, which could assist a targeted steganalyzer. AES-256-GCM-SIV or similar authenticated encryption ensures the embedded bitstream is indistinguishable from random.

Cover image selection: Never embed in an image that exists elsewhere in unmodified form. If an adversary obtains both the cover and stego images, they can compute the difference directly, trivially revealing every modified coefficient. Fresh photographs taken specifically for embedding are ideal.
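A toy example shows why cover reuse is fatal — with both coefficient arrays in hand, a single subtraction exposes every change:

```python
# Toy DCT coefficient arrays: two coefficients were changed by +/-1
# during embedding. Values are illustrative, not from a real image.
cover = [3, 0, -2, 5, 1, 0, -1, 4]
stego = [3, 0, -1, 5, 1, 0, -1, 3]

# The adversary needs no steganalysis model at all, just a diff:
changed = [i for i, (c, s) in enumerate(zip(cover, stego)) if c != s]
print(changed)  # -> [2, 7]: every modified position, recovered exactly
```

No cost function, however adaptive, survives this attack — the entire selection channel is revealed, which is why fresh, never-published covers are a hard requirement rather than a nicety.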

Side-channel considerations: Metadata (EXIF, timestamps, file size) can reveal steganographic activity. Metadata stripping, consistent file sizes, and natural-looking creation timestamps are all relevant to operational security.

Delivery channel: The analysis above assumes the stego JPEG reaches the recipient intact. If the image will pass through a platform that recompresses or resizes it, Ghost mode’s undetectability is moot – the message may not survive at all. For a survey of how 15 platforms process your photos, see our companion post. When robustness to recompression is the priority rather than stealth, Armor mode’s STDM-based approach trades detection resistance for survival under lossy re-encoding.


6. Conclusion

At the embedding rates typical of short-message steganography in modern smartphone photographs (0.02–0.04 bpnzAC), the detection accuracy gap between UERD and J-UNIWARD is 1–2 percentage points under state-of-the-art deep learning steganalysis, with both operating within 4–6 points of the 50% random-chance baseline. This gap is smaller than typical classifier training variance and has no operational significance for an adversary’s detection decisions.

The gap becomes meaningful at higher embedding rates. Above 0.1 bpnzAC, J-UNIWARD’s wavelet-domain cost computation provides a progressively larger advantage, reaching 9 percentage points at 0.4 bpnzAC under SRNet.

For practitioners designing steganographic systems:

  1. Content adaptivity is the critical design choice. Any adaptive cost function (UERD, J-UNIWARD, WOW) dramatically outperforms non-adaptive methods (nsF5, F5, LSB) at all rates.

  2. At low rates, simpler cost functions suffice. UERD provides approximately 85–90% of J-UNIWARD’s security at roughly 10% of the computational cost, and the remaining gap is operationally insignificant below 0.05 bpnzAC.

  3. Embedding rate is the primary security control. Reducing $\alpha$ by using larger cover images or shorter payloads provides more security benefit than upgrading the cost function.

  4. Defense in depth matters. Payload encryption, coefficient permutation, and cover image freshness contribute as much to practical security as the choice between UERD and J-UNIWARD.

These findings do not diminish J-UNIWARD’s contribution to the field. It remains the strongest non-side-informed cost function and provides the maximum security margin when operating at higher rates. The data supports a pragmatic engineering approach: UERD is a defensible starting point at low embedding rates, giving implementers time to solve the engineering challenges of J-UNIWARD (deterministic cross-platform math, memory optimization). Phasm followed exactly this path, shipping with UERD initially and upgrading to J-UNIWARD in version 1.3.0 once a 71% memory reduction (651 MB to 187 MB) made it practical on mobile. For implementers building new steganographic systems today, J-UNIWARD should be the default target – the optimization techniques that make it mobile-friendly are now well understood.


Frequently Asked Questions

Can steganography be detected by AI?

Yes, deep learning steganalyzers like SRNet can detect steganographic modifications in images, but their accuracy depends heavily on the embedding rate. At the low embedding rates used in practice (0.02–0.04 bpnzAC), even the best AI-based detectors achieve only 51–54% accuracy – barely above the 50% random-chance baseline. Detection becomes meaningful only at higher embedding rates (above 0.1 bpnzAC), where classifiers can reach 62–78% accuracy depending on the method used.

What is the difference between UERD and J-UNIWARD?

UERD (Uniform Embedding Revisited Distortion) computes embedding costs entirely in the DCT domain using block energy and frequency weighting, making it fast and memory-efficient. J-UNIWARD (JPEG Universal Wavelet Relative Distortion) decompresses the image to the spatial domain and measures distortion using directional Daubechies wavelet filters across three subbands, producing more precise cost assignments at roughly 10x the computational expense. At high embedding rates, J-UNIWARD provides up to 9 percentage points better detection resistance, but at low rates the gap shrinks to 1–2 points.

What does bpnzAC mean in steganography?

bpnzAC stands for “bits per non-zero AC coefficient” and is the standard unit for measuring JPEG steganographic embedding rate. It normalizes the message payload by the number of embeddable DCT coefficients in the image. A 1 KB message hidden in a 12-megapixel photo produces a rate of approximately 0.014 bpnzAC, while the same message in a 0.3-megapixel image yields 0.339 bpnzAC – a difference that significantly affects detection risk.
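The FAQ’s numbers can be reproduced under stated assumptions about each image’s non-zero AC coefficient count (the assumed `nzac` values below are chosen to match the figures quoted; real counts vary with image content and JPEG quality):

```python
def bpnzac(message_bytes: int, nzac: int) -> float:
    """Embedding rate: message bits per non-zero AC DCT coefficient."""
    return message_bytes * 8 / nzac

# Assumed nzAC counts consistent with the text's figures:
print(round(bpnzac(1024, 585_000), 3))  # 12 MP photo  -> 0.014
print(round(bpnzac(1024, 24_200), 3))   # 0.3 MP image -> 0.339
```

The same 8,192-bit payload spans a 24x range in embedding rate purely from cover size — which is why point 3 of the conclusion calls the embedding rate, not the cost function, the primary security control.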

How effective is steganalysis at low embedding rates?

At embedding rates below 0.05 bpnzAC, steganalysis is largely ineffective against content-adaptive methods. Under SRNet (the strongest general-purpose deep learning steganalyzer), both UERD and J-UNIWARD achieve detection accuracy of 52–56% at 0.04 bpnzAC, which is within 4–6 percentage points of the 50% random-chance baseline and well within the variance of classifier training. For short text messages in modern smartphone photographs, steganalysis cannot produce actionable detection confidence.


References

  1. A. Westfeld and A. Pfitzmann, “Attacks on Steganographic Systems,” Proceedings of the 3rd International Workshop on Information Hiding, pp. 61–76, 1999.

  2. A. Westfeld, “F5 – A Steganographic Algorithm: High Capacity Despite Better Steganalysis,” Proceedings of the 4th International Workshop on Information Hiding, pp. 289–302, 2001.

  3. J. Fridrich, T. Pevny, and J. Kodovsky, “Statistically Undetectable JPEG Steganography: Dead Ends, Challenges, and Opportunities,” Proceedings of the ACM Multimedia and Security Workshop, pp. 3–14, 2007.

  4. T. Pevny, T. Filler, and P. Bas, “Using High-Dimensional Image Models to Perform Highly Undetectable Steganography,” Proceedings of the 12th International Conference on Information Hiding, pp. 161–177, 2010.

  5. T. Filler, J. Judas, and J. Fridrich, “Minimizing Additive Distortion in Steganography Using Syndrome-Trellis Codes,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 920–935, 2011.

  6. V. Holub and J. Fridrich, “Designing Steganographic Distortion Using Directional Filters,” IEEE International Workshop on Information Forensics and Security, pp. 234–239, 2012.

  7. V. Holub, J. Fridrich, and T. Denemark, “Universal Distortion Function for Steganography in an Arbitrary Domain,” EURASIP Journal on Information Security, vol. 2014, no. 1, 2014.

  8. T. Denemark, V. Sedighi, V. Holub, R. Cogranne, and J. Fridrich, “Selection-Channel-Aware Rich Model for Steganalysis of Digital Images,” IEEE International Workshop on Information Forensics and Security, 2014.

  9. L. Guo, J. Ni, W. Su, C. Tang, and Y. Q. Shi, “Using Statistical Image Model for JPEG Steganography: Uniform Embedding Revisited,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 12, pp. 2669–2680, 2015.

  10. X. Song, F. Liu, C. Yang, X. Luo, and Y. Zhang, “Steganalysis of Adaptive JPEG Steganography Using 2D Gabor Filters,” Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, pp. 15–23, 2015.

  11. G. Xu, H. Z. Wu, and Y. Q. Shi, “Structural Design of Convolutional Neural Networks for Steganalysis,” IEEE Signal Processing Letters, vol. 23, no. 5, pp. 708–712, 2016.

  12. M. Yedroudj, F. Comby, and M. Chaumont, “Yedroudj-Net: An Efficient CNN for Spatial Steganalysis,” IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2092–2096, 2018.

  13. M. Boroumand, M. Chen, and J. Fridrich, “Deep Residual Network for Steganalysis of Digital Images,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 5, pp. 1181–1193, 2019.

  14. R. Cogranne, Q. Giboulot, and P. Bas, “ALASKA#2: Challenging Academic Research on Steganalysis with Realistic Images,” IEEE International Workshop on Information Forensics and Security, 2020.

  15. Y. Yousfi, J. Butora, J. Fridrich, and Q. Giboulot, “Breaking ALASKA: Color Separation for Steganalysis in JPEG Domain,” Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, pp. 138–149, 2019.

  16. J. Butora and J. Fridrich, “Effect of JPEG Quality on Steganographic Security,” Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, pp. 47–56, 2022.

  17. B. Lorch, “Off-by-One Implementation Error in J-UNIWARD,” arXiv preprint, arXiv:2305.19776, 2023.


This post is part of Phasm’s research paper series exploring the technical foundations of modern steganography. Phasm is a free, privacy-first steganography app that hides text messages in JPEG photos using content-adaptive embedding and always-on encryption. Available for iOS, Android, and web.