Abstract
Standard content-adaptive steganography treats every coefficient modification as equally costly in both directions: pushing a DCT coefficient up by one costs the same as pushing it down by one. When the encoder has access to the original uncompressed pixels – because the user captured a photo in-app or uploaded a PNG – it knows something the decoder does not: the quantization rounding errors. SI-UNIWARD exploits this side information to assign asymmetric embedding costs, preferentially modifying coefficients that were already near the quantization boundary. The result is roughly 1.5–2x capacity at the same steganalysis detection risk, or equivalently, the same capacity with significantly lower statistical distortion. We explain the theory behind side-informed embedding, walk through Phasm’s pure-Rust implementation, and prove that the decoder requires exactly zero changes – SI-UNIWARD-encoded images decode with the same ghost_decode() function that handles standard J-UNIWARD.
1. The Rounding Error Insight
Every JPEG image is built on the same lossy pipeline: divide the image into 8x8 pixel blocks, apply the forward Discrete Cosine Transform (DCT), divide each coefficient by a quantization table value, and round to the nearest integer. That final rounding step is where information is irreversibly lost – and where the opportunity for side-informed embedding begins.
Consider a single DCT coefficient. After the forward DCT and division by the quantization step, suppose the continuous value is 7.4. Standard JPEG rounds this to 7. The rounding error is:
$$e = 7.4 - 7 = +0.4$$
This error tells us something valuable: the coefficient was almost 8. Pushing it from 7 to 8 is a small perturbation – the pixel-domain change is minimal because we are moving the coefficient back toward where it “wanted” to be before quantization. Pushing it from 7 to 6, on the other hand, moves it away from the pre-quantization value, introducing more perceptual distortion.
Now consider a coefficient with continuous value 7.02, rounded to 7. The rounding error is +0.02 – this coefficient landed squarely on the integer. Flipping it in either direction costs roughly the same.
The key insight: coefficients with large rounding errors (close to +/-0.5) are cheap to modify in the preferred direction. A coefficient at 6.52 rounds to 7 with error -0.48, so it is practically sitting on the boundary between 6 and 7. Nudging it from 7 to 6 costs almost nothing perceptually.
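The arithmetic in this section can be sketched in a few lines of Rust. The function names `quantize_with_error` and `preferred_step` are illustrative assumptions, not part of Phasm's API, and the input is assumed to be a coefficient already divided by its quantization step.

```rust
/// Round a continuous coefficient and return the integer value together
/// with the rounding error e = x - round(x), which lies in [-0.5, 0.5].
fn quantize_with_error(x: f64) -> (i32, f64) {
    let q = x.round();
    (q as i32, x - q)
}

/// Preferred modification direction: +1 if the pre-quantization value was
/// above the integer, -1 if it was at or below.
fn preferred_step(e: f64) -> i32 {
    if e > 0.0 {
        1
    } else {
        -1
    }
}
```

With these definitions, `quantize_with_error(7.4)` yields the integer 7 with an error of about +0.4 (preferred step up), while `quantize_with_error(6.52)` yields 7 with an error of about -0.48 (preferred step down).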
In standard J-UNIWARD (which we covered in our detection benchmarks and STC implementation guide), costs are symmetric: rho_plus = rho_minus = rho. The Syndrome-Trellis Code decides whether to flip each coefficient, and the direction is chosen independently using the nsF5 convention (toward zero for coefficients with absolute value > 1).
SI-UNIWARD breaks this symmetry. When side information is available, the cost of the preferred direction drops dramatically while the anti-preferred direction stays at the original cost. The STC sees lower costs overall, which means it can embed the same payload with fewer total modifications – or embed a larger payload at the same distortion budget.
2. J-UNIWARD vs SI-UNIWARD: What Changes
2.1 Cost Modulation
In standard J-UNIWARD, the embedding cost for each AC coefficient is computed from the wavelet-domain distortion caused by a +/-1 change. Because the UNIWARD distortion function uses absolute values in the numerator, both directions produce identical costs:
$$\rho^+_{i,j} = \rho^-_{i,j} = \rho_{i,j}$$
SI-UNIWARD modulates these symmetric costs using the rounding error $e$:
$$\rho^{SI}_{i,j} = \rho_{i,j} \times (1 - 2|e_{i,j}|)$$
The factor $(1 - 2|e|)$ ranges from 1.0 (when $|e| = 0$, no benefit) to 0.0 (when $|e| = 0.5$, maximum benefit). In practice, we clamp the modulated cost to a small positive floor ($10^{-6}$) to avoid the half-coefficient artifact – zero-cost embedding at quantization midpoints is a known detectable pattern in the steganalysis literature.
| Rounding error $\|e\|$ | Cost multiplier | Interpretation |
|---|---|---|
| 0.00 | 1.00 | No benefit – squarely on the integer |
| 0.10 | 0.80 | Slight discount |
| 0.25 | 0.50 | Half the original cost |
| 0.40 | 0.20 | 80% cheaper |
| 0.49 | 0.02 | Nearly free |
| 0.50 | clamped to $\epsilon$ | Half-coefficient floor |
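The table can be reproduced with a small helper. This is a minimal sketch: `si_multiplier` and `si_cost` are hypothetical names, and the floor constant reuses the $10^{-6}$ value quoted above.

```rust
/// Floor applied to modulated costs (the 1e-6 value quoted in the text).
const MIN_SI_COST: f32 = 1e-6;

/// Multiplier (1 - 2|e|) for a rounding error e in [-0.5, 0.5].
fn si_multiplier(e: f64) -> f32 {
    (1.0 - 2.0 * e.abs()) as f32
}

/// Side-informed cost: the symmetric cost times the multiplier,
/// clamped from below so it never reaches zero.
fn si_cost(rho: f32, e: f64) -> f32 {
    (rho * si_multiplier(e)).max(MIN_SI_COST)
}
```

The multiplier depends only on $|e|$, so errors of +0.4 and -0.4 give the same discount; the sign of $e$ only matters later, when the modification direction is chosen.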
2.2 Modification Direction
In standard Ghost mode, the modification direction follows the nsF5 convention: coefficients with $|c| > 1$ are pushed toward zero, and coefficients with $|c| = 1$ are pushed away from zero (anti-shrinkage, preventing the coefficient from collapsing to zero and becoming unrecoverable).
SI-UNIWARD changes the rule for $|c| > 1$: instead of toward zero, the modification goes toward the pre-quantization value:
- Rounding error $e > 0$ (precover was above the integer): modify $c \to c + 1$
- Rounding error $e \le 0$ (precover was at or below): modify $c \to c - 1$
The anti-shrinkage rule for $|c| = 1$ is preserved unchanged. These positions are always pushed away from zero regardless of the rounding error, because allowing a coefficient to reach zero would make the modification irreversible (zero cannot encode direction information).
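For the $|c| > 1$ case, a short worked equation shows why the direction rule matters. Writing the pre-quantization value as $x = c + e$ with $0 < |e| \le 0.5$ (in units of the quantization step), a unit change toward the precover leaves a smaller residual than a unit change away from it:

$$|x - (c + \operatorname{sign}(e))| = 1 - |e|, \qquad |x - (c - \operatorname{sign}(e))| = 1 + |e|$$

For $e = +0.4$, the preferred direction ends up 0.6 steps from the precover, while the opposite direction ends up 1.4 steps away.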
2.3 What Stays the Same
This is the crucial part: SI-UNIWARD only changes the cost map and the modification direction. Everything else in the Ghost mode pipeline remains identical:
- STC encoder: The binary Viterbi trellis at $h = 7$ is unchanged. It sees a cost map with lower values on some positions, but the algorithm is the same.
- STC decoder: Syndrome extraction reads LSBs. Both $+1$ and $-1$ modifications produce the same LSB, so the direction is invisible to the decoder.
- Frame format: Same salt, nonce, CRC structure. No version flag needed.
- Key derivation: Same Argon2id + ChaCha20 PRNG. Same structural keys.
- Coefficient selection: Same non-DC, finite-cost AC positions.
- Permutation: Same Fisher-Yates shuffle with the same seeds.
3. When Do You Have Side Information?
Side information requires the pre-quantization DCT coefficients – the continuous values before rounding. This is only possible when the encoder performs the JPEG compression itself:
| Input format | Side info available? | Why |
|---|---|---|
| PNG, HEIC, WebP, RAW | Yes | App decodes to raw pixels, then JPEG-compresses. The rounding errors are computable. |
| In-app camera capture | Yes | On iOS (HEIC default) and Android (raw bitmap), the app has raw pixels before JPEG encoding. |
| Existing JPEG file | No | The original pre-quantization values are lost. Standard J-UNIWARD is used. |
Phasm detects the input format automatically on all platforms. When side information is available, SI-UNIWARD activates without user intervention – the app labels this Deep Cover with a green indicator pill, mirroring how Fortress mode auto-activates for short Armor messages.
On iOS, the default camera format is HEIC, so most in-app captures qualify for Deep Cover. On Android, the camera API returns a raw Bitmap. On the web, PNG and WebP uploads are decoded to ImageData pixels via canvas before JPEG encoding.
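The availability rule from the table can be expressed as a simple predicate. The sketch below uses a hypothetical `InputSource` enum; it is not Phasm's actual format-detection code, which is not shown in this article.

```rust
/// Hypothetical classification of encoder inputs (names are illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputSource {
    Png,
    Heic,
    WebP,
    Raw,
    CameraCapture,
    ExistingJpeg,
}

/// Side information exists only when the encoder performs the JPEG
/// compression itself, i.e. for every source except an existing JPEG.
fn has_side_info(source: InputSource) -> bool {
    !matches!(source, InputSource::ExistingJpeg)
}
```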
4. The Capacity Gain
4.1 Literature Results
The capacity advantage of side-informed embedding is well-established in the steganography literature. Fridrich and Kodovsky (2013) showed that access to the pre-quantization image roughly doubles the secure embedding capacity at a given detectability threshold. The intuition is straightforward: approximately half of all coefficients have rounding errors above 0.25 in magnitude, and for those coefficients, the preferred-direction cost is less than half the symmetric cost. The STC optimizer exploits these cheap positions, achieving the same syndrome with significantly fewer expensive modifications.
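The "approximately half" figure follows from a simple idealization. Assuming, purely for illustration, that rounding errors are uniformly distributed on $[-\tfrac12, \tfrac12]$ (real images deviate from this, especially for coefficients near zero), the fraction of discounted coefficients and the average cost multiplier are:

$$\Pr\left(|e| > \tfrac14\right) = \tfrac12, \qquad \mathbb{E}\left[1 - 2|e|\right] = 1 - 2 \cdot \tfrac14 = \tfrac12$$

Under this assumption the preferred-direction cost is, on average, half the symmetric cost.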
4.2 Phasm’s Conservative Estimate
Phasm uses a capacity ratio to determine the maximum safe embedding rate. For standard J-UNIWARD, the ratio is 5.0 – meaning one payload bit per 5 usable AC coefficients. For SI-UNIWARD, the ratio drops to 3.5:
$$\text{capacity}_{SI} = \frac{\text{usable positions}}{3.5 \times 8} \text{ bytes}$$
This gives approximately 43% more capacity than standard J-UNIWARD. We deliberately underestimate relative to the literature’s ~2x figure because:
- Real-world photos have varying DCT complexity – heavily textured regions benefit more than smooth areas.
- Platform JPEG encoders may use slightly different DCT implementations, introducing small mismatches between our forward DCT and theirs.
- We’d rather guarantee undetectability than maximize throughput.
For a concrete example: a 1024x768 photo with ~30,000 usable AC coefficients yields approximately 750 bytes capacity in J-UNIWARD and approximately 1,070 bytes in SI-UNIWARD – enough to hold a substantially longer message or a small file attachment.
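The capacity formula is easy to check numerically. The following is a minimal sketch with hypothetical names; it floors to whole bytes and ignores any frame overhead, both of which are assumptions rather than documented Phasm behavior.

```rust
/// Capacity ratios quoted in the text: usable coefficients per payload bit.
const RATIO_J_UNIWARD: f64 = 5.0;
const RATIO_SI_UNIWARD: f64 = 3.5;

/// Capacity in whole bytes for a given number of usable AC positions.
fn capacity_bytes(usable_positions: usize, ratio: f64) -> usize {
    (usable_positions as f64 / (ratio * 8.0)).floor() as usize
}
```

For 30,000 usable positions, `capacity_bytes(30_000, RATIO_J_UNIWARD)` gives 750 and `capacity_bytes(30_000, RATIO_SI_UNIWARD)` gives 1071, matching the figures above.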
4.3 Or: Same Message, Lower Distortion
The capacity framing tells only half the story. Most messages are short – a sentence or two. For these, SI-UNIWARD doesn’t increase the message size but instead reduces the statistical footprint of the embedding. The STC concentrates modifications on coefficients where the cost is lowest (those near the quantization boundary), making the resulting stego image even harder to distinguish from the unmodified cover.
At Phasm’s typical embedding rates of 0.02–0.04 bpnzAC, J-UNIWARD already pushes detection accuracy to near random chance: SRNet on BOSSBase reaches only ~52–56% even at the slightly higher rate of 0.05 bpnzAC. SI-UNIWARD pushes it further, with fewer modifications, each one smaller in perceptual impact.
5. Implementation Deep Dive
The full implementation lives in Phasm’s open-source Rust core (phasmcore). Here are the key components.
5.1 Unquantized Forward DCT
The existing dct_block() function performs forward DCT on an 8x8 pixel block and returns quantized i16 coefficients. The new dct_block_unquantized() is a near-copy that skips the final .round():
```rust
/// Forward DCT -> divide by QT, but do NOT round.
/// Returns continuous f64 values.
pub fn dct_block_unquantized(
    pixels: &[f64; 64],
    qt: &[u16; 64],
) -> [f64; 64] {
    // Same DCT as dct_block, but return val / qt as f64
    // instead of (val / qt).round() as i16
}
```
The rounding error for each coefficient is then unquantized[k] - quantized[k] as f64, which lies in $[-0.5, +0.5]$ whenever both values come from the same DCT implementation.
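Since the function body above is elided, here is a self-contained sketch of what an unquantized forward DCT can look like. It is a naive textbook DCT-II, not Phasm's actual implementation, and it assumes the input block is already level-shifted and that both arrays are in row-major (not zigzag) order. The names `dct_block_unquantized_sketch` and `rounding_errors` are hypothetical.

```rust
use std::f64::consts::{FRAC_1_SQRT_2, PI};

/// Naive 8x8 forward DCT-II followed by division by the quantization table,
/// without the final rounding step.
fn dct_block_unquantized_sketch(pixels: &[f64; 64], qt: &[u16; 64]) -> [f64; 64] {
    let mut out = [0.0f64; 64];
    for v in 0..8 {
        for u in 0..8 {
            // Normalization factors: 1/sqrt(2) for the zero frequency.
            let cu = if u == 0 { FRAC_1_SQRT_2 } else { 1.0 };
            let cv = if v == 0 { FRAC_1_SQRT_2 } else { 1.0 };
            let mut sum = 0.0;
            for y in 0..8 {
                for x in 0..8 {
                    sum += pixels[y * 8 + x]
                        * ((2 * x + 1) as f64 * u as f64 * PI / 16.0).cos()
                        * ((2 * y + 1) as f64 * v as f64 * PI / 16.0).cos();
                }
            }
            let k = v * 8 + u;
            out[k] = 0.25 * cu * cv * sum / f64::from(qt[k]);
        }
    }
    out
}

/// Rounding errors relative to the rounded coefficients of the same block.
fn rounding_errors(unquantized: &[f64; 64]) -> [f64; 64] {
    let mut errors = [0.0f64; 64];
    for k in 0..64 {
        errors[k] = unquantized[k] - unquantized[k].round();
    }
    errors
}
```

A production encoder would normally use a fast separable DCT; the quadruple loop is kept here only because it maps directly onto the DCT-II definition.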
5.2 Side Information Computation
The SideInfo struct holds per-coefficient rounding errors in DctGrid flat order:
```rust
pub struct SideInfo {
    pub rounding_errors: Vec<f64>,
    pub blocks_wide: usize,
    pub blocks_tall: usize,
}
```
SideInfo::compute() takes raw RGB pixels and the cover JPEG’s DctGrid:
- Convert RGB to Y (luminance) as 8x8 block arrays using BT.601 coefficients
- For each block: forward DCT, divide by quantization table (no rounding)
- Subtract the cover integer coefficient
- Clamp to $[-0.5, +0.5]$ for robustness against minor floating-point differences between platform JPEG encoders and our deterministic DCT implementation
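Two of the steps above, luminance conversion and the final clamp, can be sketched in isolation. The helper names are hypothetical, and the sketch uses the standard full-range BT.601 luma weights without any level shift; how Phasm orders or fuses these steps is not shown in this article.

```rust
/// BT.601 luma from 8-bit RGB (full-range weights, no level shift applied).
fn rgb_to_luma(r: u8, g: u8, b: u8) -> f64 {
    0.299 * f64::from(r) + 0.587 * f64::from(g) + 0.114 * f64::from(b)
}

/// Rounding error of one coefficient: the unquantized value minus the
/// cover's integer coefficient, clamped to [-0.5, 0.5].
fn clamped_rounding_error(unquantized: f64, cover_coeff: i16) -> f64 {
    (unquantized - f64::from(cover_coeff)).clamp(-0.5, 0.5)
}
```

The clamp only changes the result when the platform encoder and the local DCT disagree by enough to put the difference outside the half-step range; in that case the error saturates at the boundary instead of producing a negative cost multiplier.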
5.3 Cost Modulation
modulate_costs_si() walks the J-UNIWARD cost map and applies the $(1 - 2|e|)$ factor:
```text
pub fn modulate_costs_si(
    cost_map: &mut CostMap,
    side_info: &SideInfo,
    cover_grid: &DctGrid,
) {
    // Pseudocode: the loop and the skip test are abbreviated;
    // `cost` is the current J-UNIWARD cost at this position.
    for each AC coefficient (i, j) in each block (br, bc):
        if DC or WET or |coeff| == 1: skip
        let error = side_info.error_at(flat_idx);
        let factor = (1.0 - 2.0 * error.abs()) as f32;
        let modulated = (cost * factor).max(MIN_SI_COST);
        cost_map.set(br, bc, i, j, modulated);
}
```
Three categories are explicitly skipped:
- DC coefficients remain WET (infinite cost) – modifying the DC coefficient introduces visible brightness shifts
- WET positions (already infinite cost) stay untouched
- |coeff| = 1 positions keep their symmetric cost because anti-shrinkage forces the direction regardless of the rounding error
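The same pass, including the three skip rules, can be written as runnable Rust over flat parallel slices. This is a sketch under stated assumptions: `modulate_costs_flat` and the `is_dc` flag slice are hypothetical stand-ins for the `CostMap`, `SideInfo`, and `DctGrid` types, and WET positions are assumed to be encoded as an infinite `f32` cost.

```rust
/// Floor applied to modulated costs (the 1e-6 value quoted in the text).
const MIN_SI_COST: f32 = 1e-6;

/// Apply the (1 - 2|e|) factor in place. All slices are indexed by the same
/// flat coefficient index; `is_dc[k]` marks DC positions.
fn modulate_costs_flat(costs: &mut [f32], errors: &[f64], coeffs: &[i16], is_dc: &[bool]) {
    assert_eq!(costs.len(), errors.len());
    assert_eq!(costs.len(), coeffs.len());
    assert_eq!(costs.len(), is_dc.len());
    for k in 0..costs.len() {
        // Skip DC, WET (infinite cost), and |coeff| == 1 positions.
        let skip = is_dc[k] || costs[k].is_infinite() || coeffs[k].unsigned_abs() == 1;
        if skip {
            continue;
        }
        let factor = (1.0 - 2.0 * errors[k].abs()) as f32;
        costs[k] = (costs[k] * factor).max(MIN_SI_COST);
    }
}
```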
5.4 Direction Selection
The modification direction is determined at embedding time:
```rust
pub fn si_modify_coefficient(coeff: i16, rounding_error: f64) -> i16 {
    if coeff == 1 { 2 }            // anti-shrinkage
    else if coeff == -1 { -2 }     // anti-shrinkage
    else if rounding_error > 0.0 {
        coeff + 1                  // precover was above -> go up
    } else {
        coeff - 1                  // precover was at/below -> go down
    }
}
```
Compare this with the standard nsF5 direction used in J-UNIWARD, where |coeff| > 1 always moves toward zero:
```rust
pub fn nsf5_modify_coefficient(coeff: i16) -> i16 {
    if coeff == 1 { 2 }
    else if coeff == -1 { -2 }
    else if coeff > 0 { coeff - 1 } // toward zero
    else { coeff + 1 }              // toward zero
}
```
5.5 Public API
The public interface mirrors the existing Ghost encode API:
```rust
pub fn ghost_encode_si(
    image_bytes: &[u8],      // cover JPEG
    raw_pixels_rgb: &[u8],   // original RGB pixels
    pixel_width: u32,
    pixel_height: u32,
    message: &str,
    passphrase: &str,
) -> Result<Vec<u8>, StegoError>
```
Internally, both ghost_encode() and ghost_encode_si() call the same ghost_encode_impl() with an Option<SideInfo> parameter. When None, standard J-UNIWARD is used. When Some(si), cost modulation and SI-direction are applied.
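The `Option`-based dispatch can be illustrated on the direction rule alone. The sketch below is not `ghost_encode_impl()`; it is a hypothetical helper, `modify_coefficient`, that merges the two direction functions from section 5.4 and selects between them based on whether a rounding error is available.

```rust
/// Direction rule selected by the presence of side information.
/// `rounding_error` is `None` for existing-JPEG covers (nsF5 rule) and
/// `Some(e)` when the pre-quantization value is known (SI rule).
fn modify_coefficient(coeff: i16, rounding_error: Option<f64>) -> i16 {
    match (coeff, rounding_error) {
        (1, _) => 2,                      // anti-shrinkage
        (-1, _) => -2,                    // anti-shrinkage
        (c, Some(e)) if e > 0.0 => c + 1, // SI: precover was above
        (c, Some(_)) => c - 1,            // SI: precover was at/below
        (c, None) if c > 0 => c - 1,      // nsF5: toward zero
        (c, None) => c + 1,               // nsF5: toward zero
    }
}
```

For a coefficient of 7 with rounding error +0.4, the side-informed path returns 8 while the `None` path returns 6; for coefficients of magnitude 1 both paths agree.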
6. Backward Compatibility: Why the Decoder Doesn’t Care
This is worth stating explicitly, because it is the property that makes SI-UNIWARD a clean upgrade with zero migration risk.
The Ghost mode decoder performs syndrome extraction:
$$\textbf{m} = \hat{H} \times \textbf{s}$$
where $\textbf{s}$ is the vector of LSBs from selected stego coefficients and $\hat{H}$ is the submatrix of the parity-check matrix. The decoder reads LSBs. It does not know or care:
- Which direction the coefficient was modified (both $+1$ and $-1$ produce the same LSB for any starting value)
- What cost the encoder assigned to each position (costs are encoder-side only)
- Whether the encoder used J-UNIWARD or SI-UNIWARD cost computation
The coefficient selection (which positions are embeddable), the permutation order, the frame format, and the key derivation are all identical. Therefore:
ghost_decode(si_encoded_image, passphrase) == ghost_decode(j_encoded_image, passphrase)
Both paths produce the correct message. No version flag. No fallback logic. No compatibility shim.
This also means that all existing J-UNIWARD-encoded images continue to decode without any change. The SI-UNIWARD upgrade is entirely encoder-side.
7. The Detection Angle
Steganalyzers – both traditional feature-based methods (SRM, maxSRMd2) and deep learning classifiers (SRNet, ZhuNet, Steg-GMAN) – work by detecting statistical anomalies that deviate from the “natural” distribution of JPEG DCT coefficients.
SI-UNIWARD makes detection harder for two complementary reasons:
Fewer modifications at the same payload size. Because the STC can route more of the embedding through cheap (high rounding error) positions, the total number of modified coefficients decreases. Fewer changes means a smaller statistical footprint.
Each modification looks more natural. When a coefficient with continuous pre-quantization value 7.48 is pushed from 7 to 8, it moves to a value that is perceptually almost identical to the original uncompressed signal. The modification blends into the natural quantization noise of the JPEG format itself. By contrast, the standard nsF5 direction (toward zero, so 7 to 6) moves the coefficient further from its pre-quantization value, introducing a larger statistical deviation.
At Phasm’s conservative embedding rates (~0.02–0.04 bpnzAC), the detection advantage of SI-UNIWARD over J-UNIWARD is modest in absolute terms, since both are already near the classifier’s chance level. The practical benefit is a larger safety margin: if a future steganalyzer improves detection of J-UNIWARD at 0.03 bpnzAC by a few percentage points, SI-UNIWARD images at the same rate should still sit closer to chance level.
For a deeper dive into detection benchmarks at low embedding rates, see our UERD vs J-UNIWARD comparison.
8. Try It Yourself
SI-UNIWARD is live in Phasm across all platforms:
- Web: Open phasm.app, select Ghost mode, and upload a PNG image. The capacity indicator will show the SI-UNIWARD (Deep Cover) capacity – roughly 43% more than if you upload the same image as JPEG.
- iOS / Android: Take a photo in-app or pick a non-JPEG from your library. The “Deep Cover” pill appears automatically next to the Ghost mode selector.
- Source code: The implementation is in core/src/stego/side_info.rs, approximately 200 lines of documented Rust.
For a hands-on comparison: encode the same short message in the same photo, once as JPEG (J-UNIWARD) and once as PNG (SI-UNIWARD). Compare the capacity shown by Phasm. Then decode both – the same ghost_decode handles both transparently.
Frequently Asked Questions
Does SI-UNIWARD work with existing JPEG photos?
No. If the input is already a JPEG, the pre-quantization values are lost – they were discarded during the original compression. Phasm uses standard J-UNIWARD for JPEG inputs. SI-UNIWARD activates only for non-JPEG inputs (PNG, HEIC, WebP, RAW, or in-app camera captures) where Phasm controls the JPEG compression step.
Can someone detect whether SI-UNIWARD or J-UNIWARD was used?
The decoder cannot distinguish them – it reads LSBs identically. A steganalyzer looking at the stego image also cannot reliably tell which variant was used, because SI-UNIWARD produces fewer and less detectable modifications. If anything, SI-UNIWARD images are harder to classify as stego than J-UNIWARD images at the same payload size.
Does this affect Armor mode?
No. Armor mode uses STDM/QIM embedding with Reed-Solomon error correction – a completely different embedding strategy optimized for robustness rather than stealth. SI-UNIWARD is Ghost mode only, where the goal is to minimize the statistical footprint of embedding.
Is the Deep Cover label visible to recipients?
No. “Deep Cover” is an encoder-side indicator only. The recipient sees a normal Ghost mode decode result with no indication of whether the sender used J-UNIWARD or SI-UNIWARD. The decode quality and integrity score are computed identically in both cases.
References
- Holub, V., Fridrich, J., and Denemark, T. (2014). “Universal Distortion Function for Steganography in an Arbitrary Domain.” EURASIP Journal on Information Security, 2014:1.
- Fridrich, J. and Kodovsky, J. (2013). “Steganalysis of JPEG images using rich models.” Proceedings of SPIE, Media Watermarking, Security, and Forensics.
- Filler, T., Judas, J., and Fridrich, J. (2011). “Minimizing Additive Distortion in Steganography Using Syndrome-Trellis Codes.” IEEE Transactions on Information Forensics and Security, 6(3):920–935.
- Chen, B. and Wornell, G.W. (2001). “Quantization Index Modulation: A Class of Provably Good Methods for Digital Watermarking and Information Embedding.” IEEE Transactions on Information Theory, 47(4):1423–1443.
- Phasm core engine (phasmcore): github.com/cgaffga/phasmcore: core/src/stego/side_info.rs, core/src/stego/cost/mod.rs, core/src/jpeg/pixels.rs.