ELA Pixel Detective
AGT-FOR-002 employs Error Level Analysis (ELA) combined with a convolutional neural network classifier to detect digitally manipulated regions within claim images. The agent re-compresses each submitted JPEG image at a fixed quality level (typically 95%) and computes the absolute pixel-wise difference between the re-compressed version and the original. In an unmodified image, all regions exhibit a uniform error level because they were all compressed together. Regions that were copy-pasted, airbrushed, or AI-generated exhibit significantly different error levels because they originate from a different compression history. The CNN classifier, trained on over 400,000 labelled authentic and manipulated images, interprets the ELA heatmap to localise tampering regions with bounding boxes and assigns a manipulation probability score.
Tech Stack
Python · FastAPI · Pillow · NumPy · ONNX Runtime · PyTorch · OpenCV
Input
A single JPEG or PNG image file. For best results, the image should not have been re-saved after the original capture.
Accepted Formats
JPEG, PNG
Fields
| Name | Type | Req | Description |
|---|---|---|---|
| image_file | binary | Yes | Raw image bytes |
| ela_quality | int | No | JPEG re-compression quality level 1–100 (default: 95) |
| ela_amplification | int | No | Brightness multiplier for ELA visualisation (default: 20) |
| cnn_threshold | float | No | Manipulation probability threshold to trigger FLAG (default: 0.65) |
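All three optional parameters can be left at their defaults. A minimal sketch of assembling the request payload, using the field names from the table above (the `/analyze` path appears in the architecture diagram; any host name would be a deployment detail):

```python
def build_analyze_request(image_bytes: bytes,
                          ela_quality: int = 95,
                          ela_amplification: int = 20,
                          cnn_threshold: float = 0.65) -> dict:
    """Assemble multipart form fields for POST /analyze."""
    if not 1 <= ela_quality <= 100:
        raise ValueError("ela_quality must be in 1-100")
    return {
        "files": {"image_file": image_bytes},
        "data": {
            "ela_quality": ela_quality,
            "ela_amplification": ela_amplification,
            "cnn_threshold": cnn_threshold,
        },
    }

# e.g. requests.post("https://<host>/analyze", **build_analyze_request(raw))
```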
Output
ELA heatmap image, manipulation probability, bounding boxes of suspected tampered regions, and an overall verdict.
Format: JSON
Fields
| Name | Type | Description |
|---|---|---|
| ela_heatmap_b64 | base64 string | PNG heatmap image encoded in base64 for inline display |
| manipulation_probability | float | CNN confidence that the image contains at least one manipulated region (0.0–1.0) |
| tampered_regions | array<object> | List of bounding boxes: {x, y, w, h, local_score} for each suspicious area |
| ela_mean_error | float | Mean ELA pixel error across the image |
| ela_std_error | float | Standard deviation of ELA pixel error — high std indicates uneven compression history |
| flags | array<string> | FLAG_REGION_SPLICE, FLAG_CLONE_STAMP, FLAG_AI_GENERATED, FLAG_RECOMPRESSED |
| risk_score | float | Normalised risk contribution 0.0–1.0 |
| verdict | string | PASS \| FLAG \| INCONCLUSIVE |
Example Response
{
"manipulation_probability": 0.89,
"tampered_regions": [
{"x": 142, "y": 310, "w": 88, "h": 56, "local_score": 0.94}
],
"ela_mean_error": 12.4,
"ela_std_error": 38.7,
"flags": ["FLAG_REGION_SPLICE"],
"risk_score": 0.87,
"verdict": "FLAG"
}
How It Works
Error Level Analysis is a forensic technique based on the lossy nature of JPEG compression. Every time a JPEG is saved, its pixel values change slightly to accommodate the compression algorithm's block-based DCT quantisation. If an image is unmodified, all regions share the same compression history — they were all saved together at the same quality settings. When someone pastes a region from another image into the photo (for example, changing a vehicle registration plate or removing visible damage), that pasted region has a different compression history.
AGT-FOR-002 exploits this asymmetry by re-compressing the image at a fixed quality (95%) and computing the absolute difference heatmap. Authentic regions show low, uniform error. Spliced or cloned regions show anomalously high or anomalously low error, creating bright spots in the heatmap.
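The re-compress-and-diff step described above can be sketched in a few lines with Pillow and NumPy (defaults taken from the input table; this is a minimal illustration, not the production engine):

```python
import io

import numpy as np
from PIL import Image


def ela_heatmap(img: Image.Image, quality: int = 95,
                amplification: int = 20) -> np.ndarray:
    """Re-compress `img` as JPEG and return the amplified absolute difference."""
    img = img.convert("RGB")
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)   # second compression pass
    buf.seek(0)
    recompressed = Image.open(buf).convert("RGB")
    diff = np.abs(np.asarray(img, dtype=np.int16) -
                  np.asarray(recompressed, dtype=np.int16))
    return np.clip(diff * amplification, 0, 255).astype(np.uint8)


# A flat synthetic image compresses almost losslessly, so its ELA is near zero;
# a real spliced photo would instead show bright spots at the pasted region.
flat = Image.new("RGB", (64, 64), (128, 128, 128))
heat = ela_heatmap(flat)
print(heat.shape, heat.mean())
```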
However, raw ELA is noisy and can produce false positives on legitimate image features like edges and text. This is where the CNN adds value. The network has learned, from hundreds of thousands of labelled examples, which ELA patterns correspond to genuine manipulation versus natural high-error regions. It outputs a probability score and, via GradCAM, localises the specific bounding boxes of suspicious areas.
For AI-generated images (deepfakes, Stable Diffusion outputs), the ELA pattern is distinctive: the entire image tends to have uniformly high error because generative models do not use standard JPEG compression during synthesis. The agent detects this pattern with a dedicated FLAG_AI_GENERATED flag.
The output is a structured JSON response plus a base64-encoded heatmap image that adjudicators can view in the claims portal to understand exactly what the model found suspicious.
Thinking Steps
Image Ingestion & Format Normalisation
Load the submitted binary using Pillow. If PNG, convert to RGB (removing alpha channel). Verify the image dimensions are at least 64×64 pixels to ensure ELA has meaningful signal.
Very small images (thumbnails) produce noisy ELA with many false positives.
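A minimal sketch of this normalisation step with Pillow, assuming the 64-pixel floor noted above:

```python
import io

from PIL import Image

MIN_SIDE = 64  # thumbnails below this yield noisy ELA with false positives


def normalise_image(raw: bytes) -> Image.Image:
    """Load bytes, drop any alpha channel, and enforce the minimum size."""
    img = Image.open(io.BytesIO(raw))
    if img.mode != "RGB":          # e.g. PNG with alpha, palette, greyscale
        img = img.convert("RGB")
    if min(img.size) < MIN_SIDE:
        raise ValueError(f"image too small for ELA: {img.size}")
    return img


# Round-trip a PNG with an alpha channel through the normaliser.
buf = io.BytesIO()
Image.new("RGBA", (128, 96), (10, 20, 30, 255)).save(buf, format="PNG")
img = normalise_image(buf.getvalue())
print(img.mode, img.size)
```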
ELA Re-compression
Save a copy of the image to an in-memory buffer at the configured quality level (default 95). The key insight: a genuine photo at this quality has been compressed once; a manipulated photo has regions with different prior compression histories, which will manifest as uneven error levels.
Quality 95 is chosen because it preserves enough detail to reveal forensic differences while still introducing measurable compression artefacts.
Error Map Computation
Compute the absolute pixel-wise difference between the original image and its re-compressed copy using NumPy. Multiply by the amplification factor (default 20) to make subtle differences visible. This produces the ELA heatmap.
Higher amplification helps human reviewers see differences but does not change the CNN's numerical input.
Statistical Analysis of Error Distribution
Compute the mean and standard deviation of ELA values across all pixels. A high standard deviation relative to the mean indicates regions with inconsistent compression history — the primary statistical signature of splicing.
Background sky regions typically show very low ELA while text overlays (phone numbers added after capture) show high ELA.
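The statistics themselves are one-liners in NumPy; the 3.0 std/mean splice ratio below is the threshold used in the decision tree:

```python
import numpy as np

SPLICE_RATIO = 3.0  # std/mean ratio threshold from the decision tree


def ela_statistics(heatmap: np.ndarray) -> dict:
    """Mean/std of ELA pixel error plus the splice-indicator ratio check."""
    mean = float(heatmap.mean())
    std = float(heatmap.std())
    ratio = std / mean if mean > 0 else 0.0
    return {"ela_mean_error": mean, "ela_std_error": std,
            "splice_suspect": ratio > SPLICE_RATIO}


# Mostly-quiet heatmap with one bright pasted patch -> std far exceeds mean.
heat = np.zeros((100, 100), dtype=np.float64)
heat[40:60, 40:60] = 200.0
print(ela_statistics(heat))
```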
CNN Classification
Resize the ELA heatmap to 224×224 and normalise with ImageNet mean and standard deviation. Pass it through the fine-tuned ResNet-50 classifier to obtain a manipulation probability. The CNN was trained on the CASIA 2.0 and Columbia Uncompressed Image Splicing datasets, augmented with insurance-domain synthetic manipulations.
The model runs in ONNX Runtime for ~50 ms inference latency on CPU.
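Inference itself needs the ONNX model file, so only the preprocessing is sketched here; the runtime call is shown in comments with an illustrative model file name:

```python
import numpy as np
from PIL import Image

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_ela(heatmap: np.ndarray) -> np.ndarray:
    """Resize an HxWx3 uint8 ELA heatmap to the CNN's 1x3x224x224 input."""
    img = Image.fromarray(heatmap).resize((224, 224), Image.BILINEAR)
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD          # ImageNet statistics
    return x.transpose(2, 0, 1)[np.newaxis, ...]    # HWC -> NCHW

x = preprocess_ela(np.zeros((480, 640, 3), dtype=np.uint8))
print(x.shape, x.dtype)

# The classification step would then be, roughly (model file name is an
# illustrative assumption, not the deployed artefact's name):
#   sess = onnxruntime.InferenceSession("ela_resnet50_int8.onnx")
#   prob = sess.run(None, {sess.get_inputs()[0].name: x})[0]
```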
Region Localisation via GradCAM
If the CNN probability exceeds 0.5, run GradCAM on the final convolutional layer to produce a saliency map. Threshold the saliency map at the 90th percentile and find contours using OpenCV to extract bounding boxes of tampered regions.
GradCAM ensures the model's decision is spatially interpretable — adjudicators can see exactly which region the CNN found suspicious.
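Production uses OpenCV contours to cut the thresholded saliency map into per-region boxes. A dependency-free sketch of the 90th-percentile thresholding, simplified to merge all above-threshold pixels into a single box:

```python
import numpy as np


def localise(saliency: np.ndarray, percentile: float = 90.0):
    """Threshold a saliency map and return one {x, y, w, h, local_score} box.

    The service extracts one box per OpenCV contour; this sketch merges all
    above-threshold pixels into a single bounding box for brevity.
    """
    mask = saliency > np.percentile(saliency, percentile)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    x, y = int(xs.min()), int(ys.min())
    w, h = int(xs.max() - x + 1), int(ys.max() - y + 1)
    return {"x": x, "y": y, "w": w, "h": h,
            "local_score": float(saliency[mask].mean())}


sal = np.zeros((50, 50))
sal[10:20, 30:40] = 0.9   # one hot region in the saliency map
print(localise(sal))
```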
Verdict Assembly
Combine statistical ELA score (std_error/mean_error ratio), CNN probability, and number of localised regions into a final risk score. Assign verdict: PASS if risk_score < 0.20, FLAG if >= 0.65, INCONCLUSIVE otherwise.
The inconclusive band (0.20–0.65) triggers a human review workflow rather than automatic rejection.
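The exact weighting of the three signals is not specified in this document; the sketch below uses assumed illustrative weights, with only the 0.20 / 0.65 verdict cut-offs taken from the step above:

```python
def assemble_verdict(ela_ratio: float, cnn_prob: float, n_regions: int) -> dict:
    """Combine the three signals into a risk score and verdict.

    The 0.20 / 0.65 cut-offs come from the spec; the weights below are an
    illustrative assumption, not the production formula.
    """
    stat_score = min(ela_ratio / 3.0, 1.0)      # saturate at the 3.0 splice ratio
    region_score = min(n_regions / 3.0, 1.0)
    risk = 0.6 * cnn_prob + 0.3 * stat_score + 0.1 * region_score
    if risk < 0.20:
        verdict = "PASS"
    elif risk >= 0.65:
        verdict = "FLAG"
    else:
        verdict = "INCONCLUSIVE"
    return {"risk_score": round(risk, 2), "verdict": verdict}


print(assemble_verdict(ela_ratio=4.9, cnn_prob=0.89, n_regions=1))
```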
Thinking Tree
- Root Question: Has the submitted image been digitally manipulated?
  - Compute ELA heatmap
    - Is ELA standard deviation anomalously high?
      - Yes — strong compression inconsistency detected
      - No — uniform compression history
    - CNN classification of ELA map
      - Manipulation probability > 0.65?
        - Yes → FLAG, run GradCAM localisation
        - 0.20–0.65 → INCONCLUSIVE, request human review
        - < 0.20 → PASS
    - Check for AI-generation signature
      - Uniform high ELA across entire image → FLAG_AI_GENERATED
      - Localised high ELA spots → FLAG_REGION_SPLICE
Decision Tree
1. Is the image format JPEG or PNG? No → INCONCLUSIVE — Non-standard format; ELA not applicable
2. ELA std_error / mean_error ratio > 3.0? Contributes the statistical component of the risk score
3. CNN manipulation probability > 0.65? Yes → FLAG:
   - Uniform high ELA across > 80% of image → FLAG — AI-generated image signature; not a genuine photograph
   - Otherwise → FLAG — Region splicing or clone stamp detected with high confidence
4. CNN probability 0.20–0.65? Yes → INCONCLUSIVE — Moderate manipulation signal; escalate to human reviewer
5. Otherwise → PASS — No significant manipulation detected
Technical Design
Architecture
AGT-FOR-002 is a stateless async FastAPI microservice. ELA computation is pure NumPy (CPU-bound, ~20 ms for a 12 MP image). CNN inference uses ONNX Runtime with a quantised INT8 model for ~50 ms on CPU. GradCAM is computed only when manipulation probability exceeds 0.5, keeping the happy-path latency low. The service is containerised with a 512 MB memory limit.
Components
| Component | Role | Technology |
|---|---|---|
| ImageNormaliser | Loads binary, converts format, validates dimensions | Pillow 10.x |
| ELAEngine | Re-compresses image and computes difference heatmap | Pillow + NumPy |
| StatisticalAnalyser | Computes mean, std, and ratio of ELA pixel errors | NumPy |
| CNNClassifier | Classifies ELA map as authentic or manipulated | ONNX Runtime + ResNet-50 INT8 |
| GradCAMLocaliser | Produces saliency map and bounding boxes | PyTorch hooks + OpenCV contours |
| VerdictAssembler | Combines scores into risk_score and verdict | Pure Python |
Architecture Diagram
┌────────────────────────────┐
│ POST /analyze (image) │
└─────────────┬──────────────┘
│
▼
┌────────────────────────────┐
│ ImageNormaliser │
│ (format check, RGB conv) │
└─────────────┬──────────────┘
│
┌─────┴─────┐
▼ ▼
┌───────────┐ ┌────────────┐
│ ELAEngine │ │ Original │
│(recompress│ │ image copy │
│ & diff) │ └────────────┘
└─────┬─────┘
│ ELA heatmap
▼
┌─────────────────────────────┐
│ StatisticalAnalyser │
│ (mean, std, ratio) │
└──────────┬──────────────────┘
│
▼
┌─────────────────────────────┐
│ CNNClassifier │
│ (ResNet-50 ONNX, 224×224) │
└──────────┬──────────────────┘
│ prob > 0.5?
▼
┌─────────────────────────────┐
│ GradCAMLocaliser │
│ (saliency → bboxes) │
└──────────┬──────────────────┘
│
▼
┌─────────────────────────────┐
│ VerdictAssembler │
└─────────────────────────────┘
Data Flow