ELA Pixel Detective
AGT-FOR-002 employs Error Level Analysis (ELA) combined with a convolutional neural network classifier to detect digitally manipulated regions within claim images. The agent re-compresses each submitted JPEG image at a fixed quality level (typically 95%) and computes the absolute pixel-wise difference between the re-compressed version and the original. In an unmodified image, all regions exhibit a uniform error level because they were all compressed together. Regions that were copy-pasted, airbrushed, or AI-generated exhibit significantly different error levels because they originate from a different compression history. The CNN classifier, trained on over 400,000 labelled authentic and manipulated images, interprets the ELA heatmap to localise tampering regions with bounding boxes and assigns a manipulation probability score.
Tech Stack
Python · FastAPI · Pillow · NumPy · ONNX Runtime · PyTorch · OpenCV
Input
A single JPEG or PNG image file. For best results, the image should not have been re-saved after the original capture.
Accepted Formats
JPEG, PNG
Fields
| Name | Type | Req | Description |
|---|---|---|---|
| image_file | binary | Yes | Raw image bytes |
| ela_quality | int | No | JPEG re-compression quality level 1–100 (default: 95) |
| ela_amplification | int | No | Brightness multiplier for ELA visualisation (default: 20) |
| cnn_threshold | float | No | Manipulation probability threshold to trigger FLAG (default: 0.65) |
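All three optional parameters can be left at their defaults. A minimal sketch of assembling the request payload, using the field names from the table above (the `/analyze` path appears in the architecture diagram; any host name would be a deployment detail):

```python
def build_analyze_request(image_bytes: bytes,
                          ela_quality: int = 95,
                          ela_amplification: int = 20,
                          cnn_threshold: float = 0.65) -> dict:
    """Assemble multipart form fields for POST /analyze."""
    if not 1 <= ela_quality <= 100:
        raise ValueError("ela_quality must be in 1-100")
    return {
        "files": {"image_file": image_bytes},
        "data": {
            "ela_quality": ela_quality,
            "ela_amplification": ela_amplification,
            "cnn_threshold": cnn_threshold,
        },
    }

# e.g. requests.post("https://<host>/analyze", **build_analyze_request(raw))
```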
Output
ELA heatmap image, manipulation probability, bounding boxes of suspected tampered regions, and an overall verdict.
Format: JSON
Fields
| Name | Type | Description |
|---|---|---|
| ela_heatmap_b64 | base64 string | PNG heatmap image encoded in base64 for inline display |
| manipulation_probability | float | CNN confidence that the image contains at least one manipulated region (0.0–1.0) |
| tampered_regions | array<object> | List of bounding boxes: {x, y, w, h, local_score} for each suspicious area |
| ela_mean_error | float | Mean ELA pixel error across the image |
| ela_std_error | float | Standard deviation of ELA pixel error — high std indicates uneven compression history |
| flags | array<string> | FLAG_REGION_SPLICE, FLAG_CLONE_STAMP, FLAG_AI_GENERATED, FLAG_RECOMPRESSED |
| risk_score | float | Normalised risk contribution 0.0–1.0 |
| verdict | string | PASS \| FLAG \| INCONCLUSIVE |
Example Response
{
"manipulation_probability": 0.89,
"tampered_regions": [
{"x": 142, "y": 310, "w": 88, "h": 56, "local_score": 0.94}
],
"ela_mean_error": 12.4,
"ela_std_error": 38.7,
"flags": ["FLAG_REGION_SPLICE"],
"risk_score": 0.87,
"verdict": "FLAG"
}
How It Works
Error Level Analysis is a forensic technique based on the lossy nature of JPEG compression. Every time a JPEG is saved, its pixel values change slightly to accommodate the compression algorithm's block-based DCT quantisation. If an image is unmodified, all regions share the same compression history — they were all saved together at the same quality settings. When someone pastes a region from another image into the photo (for example, changing a vehicle registration plate or removing visible damage), that pasted region has a different compression history.
AGT-FOR-002 exploits this asymmetry by re-compressing the image at a fixed quality (95%) and computing the absolute difference heatmap. Authentic regions show low, uniform error. Spliced or cloned regions show anomalously high or anomalously low error, creating bright spots in the heatmap.
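The re-compress-and-diff step described above can be sketched in a few lines with Pillow and NumPy (defaults taken from the input table; this is a minimal illustration, not the production engine):

```python
import io

import numpy as np
from PIL import Image


def ela_heatmap(img: Image.Image, quality: int = 95,
                amplification: int = 20) -> np.ndarray:
    """Re-compress `img` as JPEG and return the amplified absolute difference."""
    img = img.convert("RGB")
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)   # second compression pass
    buf.seek(0)
    recompressed = Image.open(buf).convert("RGB")
    diff = np.abs(np.asarray(img, dtype=np.int16) -
                  np.asarray(recompressed, dtype=np.int16))
    return np.clip(diff * amplification, 0, 255).astype(np.uint8)


# A flat synthetic image compresses almost losslessly, so its ELA is near zero;
# a real spliced photo would instead show bright spots at the pasted region.
flat = Image.new("RGB", (64, 64), (128, 128, 128))
heat = ela_heatmap(flat)
print(heat.shape, heat.mean())
```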
However, raw ELA is noisy and can produce false positives on legitimate image features like edges and text. This is where the CNN adds value. The network has learned, from hundreds of thousands of labelled examples, which ELA patterns correspond to genuine manipulation versus natural high-error regions. It outputs a probability score and, via GradCAM, localises the specific bounding boxes of suspicious areas.
For AI-generated images (deepfakes, Stable Diffusion outputs), the ELA pattern is distinctive: the entire image tends to have uniformly high error because generative models do not use standard JPEG compression during synthesis. The agent detects this pattern with a dedicated FLAG_AI_GENERATED flag.
The output is a structured JSON response plus a base64-encoded heatmap image that adjudicators can view in the claims portal to understand exactly what the model found suspicious.
Thinking Steps
Image Ingestion & Format Normalisation
Load the submitted binary using Pillow. If PNG, convert to RGB (removing alpha channel). Verify the image dimensions are at least 64×64 pixels to ensure ELA has meaningful signal.
Very small images (thumbnails) produce noisy ELA with many false positives.
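A minimal sketch of this normalisation step with Pillow, assuming the 64-pixel floor noted above:

```python
import io

from PIL import Image

MIN_SIDE = 64  # thumbnails below this yield noisy ELA with false positives


def normalise_image(raw: bytes) -> Image.Image:
    """Load bytes, drop any alpha channel, and enforce the minimum size."""
    img = Image.open(io.BytesIO(raw))
    if img.mode != "RGB":          # e.g. PNG with alpha, palette, greyscale
        img = img.convert("RGB")
    if min(img.size) < MIN_SIDE:
        raise ValueError(f"image too small for ELA: {img.size}")
    return img


# Round-trip a PNG with an alpha channel through the normaliser.
buf = io.BytesIO()
Image.new("RGBA", (128, 96), (10, 20, 30, 255)).save(buf, format="PNG")
img = normalise_image(buf.getvalue())
print(img.mode, img.size)
```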
ELA Re-compression
Save a copy of the image to an in-memory buffer at the configured quality level (default 95). The key insight: a genuine photo at this quality has been compressed once; a manipulated photo has regions with different prior compression histories, which will manifest as uneven error levels.
Quality 95 is chosen because it preserves enough detail to reveal forensic differences while still introducing measurable compression artefacts.
Error Map Computation
Compute the absolute pixel-wise difference between the original image and its re-compressed copy using NumPy. Multiply by the amplification factor (default 20) to make subtle differences visible. This produces the ELA heatmap.
Higher amplification helps human reviewers see differences but does not change the CNN's numerical input.
Statistical Analysis of Error Distribution
Compute the mean and standard deviation of ELA values across all pixels. A high standard deviation relative to the mean indicates regions with inconsistent compression history — the primary statistical signature of splicing.
Background sky regions typically show very low ELA while text overlays (phone numbers added after capture) show high ELA.
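The statistics themselves are one-liners in NumPy; the 3.0 std/mean splice ratio below is the threshold used in the decision tree:

```python
import numpy as np

SPLICE_RATIO = 3.0  # std/mean ratio threshold from the decision tree


def ela_statistics(heatmap: np.ndarray) -> dict:
    """Mean/std of ELA pixel error plus the splice-indicator ratio check."""
    mean = float(heatmap.mean())
    std = float(heatmap.std())
    ratio = std / mean if mean > 0 else 0.0
    return {"ela_mean_error": mean, "ela_std_error": std,
            "splice_suspect": ratio > SPLICE_RATIO}


# Mostly-quiet heatmap with one bright pasted patch -> std far exceeds mean.
heat = np.zeros((100, 100), dtype=np.float64)
heat[40:60, 40:60] = 200.0
print(ela_statistics(heat))
```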
CNN Classification
Resize the ELA heatmap to 224×224 and normalise with ImageNet mean and standard deviation. Pass it through the fine-tuned ResNet-50 classifier to obtain a manipulation probability. The CNN was trained on the CASIA 2.0 and Columbia Uncompressed Image Splicing datasets, augmented with insurance-domain synthetic manipulations.
The model runs in ONNX Runtime for ~50 ms inference latency on CPU.
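Inference itself needs the ONNX model file, so only the preprocessing is sketched here; the runtime call is shown in comments with an illustrative model file name:

```python
import numpy as np
from PIL import Image

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_ela(heatmap: np.ndarray) -> np.ndarray:
    """Resize an HxWx3 uint8 ELA heatmap to the CNN's 1x3x224x224 input."""
    img = Image.fromarray(heatmap).resize((224, 224), Image.BILINEAR)
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD          # ImageNet statistics
    return x.transpose(2, 0, 1)[np.newaxis, ...]    # HWC -> NCHW

x = preprocess_ela(np.zeros((480, 640, 3), dtype=np.uint8))
print(x.shape, x.dtype)

# The classification step would then be, roughly (model file name is an
# illustrative assumption, not the deployed artefact's name):
#   sess = onnxruntime.InferenceSession("ela_resnet50_int8.onnx")
#   prob = sess.run(None, {sess.get_inputs()[0].name: x})[0]
```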
Region Localisation via GradCAM
If the CNN probability exceeds 0.5, run GradCAM on the final convolutional layer to produce a saliency map. Threshold the saliency map at the 90th percentile and find contours using OpenCV to extract bounding boxes of tampered regions.
GradCAM ensures the model's decision is spatially interpretable — adjudicators can see exactly which region the CNN found suspicious.
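Production uses OpenCV contours to cut the thresholded saliency map into per-region boxes. A dependency-free sketch of the 90th-percentile thresholding, simplified to merge all above-threshold pixels into a single box:

```python
import numpy as np


def localise(saliency: np.ndarray, percentile: float = 90.0):
    """Threshold a saliency map and return one {x, y, w, h, local_score} box.

    The service extracts one box per OpenCV contour; this sketch merges all
    above-threshold pixels into a single bounding box for brevity.
    """
    mask = saliency > np.percentile(saliency, percentile)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    x, y = int(xs.min()), int(ys.min())
    w, h = int(xs.max() - x + 1), int(ys.max() - y + 1)
    return {"x": x, "y": y, "w": w, "h": h,
            "local_score": float(saliency[mask].mean())}


sal = np.zeros((50, 50))
sal[10:20, 30:40] = 0.9   # one hot region in the saliency map
print(localise(sal))
```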
Verdict Assembly
Combine statistical ELA score (std_error/mean_error ratio), CNN probability, and number of localised regions into a final risk score. Assign verdict: PASS if risk_score < 0.20, FLAG if >= 0.65, INCONCLUSIVE otherwise.
The inconclusive band (0.20–0.65) triggers a human review workflow rather than automatic rejection.
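The exact weighting of the three signals is not specified in this document; the sketch below uses assumed illustrative weights, with only the 0.20 / 0.65 verdict cut-offs taken from the step above:

```python
def assemble_verdict(ela_ratio: float, cnn_prob: float, n_regions: int) -> dict:
    """Combine the three signals into a risk score and verdict.

    The 0.20 / 0.65 cut-offs come from the spec; the weights below are an
    illustrative assumption, not the production formula.
    """
    stat_score = min(ela_ratio / 3.0, 1.0)      # saturate at the 3.0 splice ratio
    region_score = min(n_regions / 3.0, 1.0)
    risk = 0.6 * cnn_prob + 0.3 * stat_score + 0.1 * region_score
    if risk < 0.20:
        verdict = "PASS"
    elif risk >= 0.65:
        verdict = "FLAG"
    else:
        verdict = "INCONCLUSIVE"
    return {"risk_score": round(risk, 2), "verdict": verdict}


print(assemble_verdict(ela_ratio=4.9, cnn_prob=0.89, n_regions=1))
```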
Thinking Tree
- Root Question: Has the submitted image been digitally manipulated?
  - Compute ELA heatmap
    - Is ELA standard deviation anomalously high?
      - Yes — strong compression inconsistency detected
      - No — uniform compression history
    - CNN classification of ELA map
      - Manipulation probability > 0.65?
        - Yes → FLAG, run GradCAM localisation
        - 0.20–0.65 → INCONCLUSIVE, request human review
        - < 0.20 → PASS
    - Check for AI-generation signature
      - Uniform high ELA across entire image → FLAG_AI_GENERATED
      - Localised high ELA spots → FLAG_REGION_SPLICE
Decision Tree
1. Is the image format JPEG or PNG? No → INCONCLUSIVE — Non-standard format; ELA not applicable
2. ELA std_error / mean_error ratio > 3.0? Contributes the statistical component of the risk score
3. CNN manipulation probability > 0.65? Yes → FLAG:
   - Uniform high ELA across > 80% of image → FLAG — AI-generated image signature; not a genuine photograph
   - Otherwise → FLAG — Region splicing or clone stamp detected with high confidence
4. CNN probability 0.20–0.65? Yes → INCONCLUSIVE — Moderate manipulation signal; escalate to human reviewer
5. Otherwise → PASS — No significant manipulation detected
Technical Design
Architecture
AGT-FOR-002 is a stateless async FastAPI microservice. ELA computation is pure NumPy (CPU-bound, ~20 ms for a 12 MP image). CNN inference uses ONNX Runtime with a quantised INT8 model for ~50 ms on CPU. GradCAM is computed only when manipulation probability exceeds 0.5, keeping the happy-path latency low. The service is containerised with a 512 MB memory limit.
Components
| Component | Role | Technology |
|---|---|---|
| ImageNormaliser | Loads binary, converts format, validates dimensions | Pillow 10.x |
| ELAEngine | Re-compresses image and computes difference heatmap | Pillow + NumPy |
| StatisticalAnalyser | Computes mean, std, and ratio of ELA pixel errors | NumPy |
| CNNClassifier | Classifies ELA map as authentic or manipulated | ONNX Runtime + ResNet-50 INT8 |
| GradCAMLocaliser | Produces saliency map and bounding boxes | PyTorch hooks + OpenCV contours |
| VerdictAssembler | Combines scores into risk_score and verdict | Pure Python |
Architecture Diagram
┌────────────────────────────┐
│ POST /analyze (image) │
└─────────────┬──────────────┘
│
▼
┌────────────────────────────┐
│ ImageNormaliser │
│ (format check, RGB conv) │
└─────────────┬──────────────┘
│
┌─────┴─────┐
▼ ▼
┌───────────┐ ┌────────────┐
│ ELAEngine │ │ Original │
│(recompress│ │ image copy │
│ & diff) │ └────────────┘
└─────┬─────┘
│ ELA heatmap
▼
┌─────────────────────────────┐
│ StatisticalAnalyser │
│ (mean, std, ratio) │
└──────────┬──────────────────┘
│
▼
┌─────────────────────────────┐
│ CNNClassifier │
│ (ResNet-50 ONNX, 224×224) │
└──────────┬──────────────────┘
│ prob > 0.5?
▼
┌─────────────────────────────┐
│ GradCAMLocaliser │
│ (saliency → bboxes) │
└──────────┬──────────────────┘
│
▼
┌─────────────────────────────┐
│ VerdictAssembler │
└─────────────────────────────┘
Data Flow