Sentinel Core

AGT-FOR-001 · Digital Forensics · Rule-based

EXIF Metadata Analyst

AGT-FOR-001 is a deterministic, rule-based agent that extracts and cross-validates the embedded metadata of image and PDF files submitted as claim evidence. It reads the EXIF, IPTC, and XMP metadata layers using Phil Harvey's ExifTool and the Python piexif/Pillow stack, then compares the GPS coordinates, capture timestamp, and device model against the claimant's declared incident location, time, and equipment. Any discrepancy beyond a configurable threshold is flagged as a spatial, temporal, or device anomaly: for example, GPS coordinates more than 2 km from the declared scene, a capture timestamp outside the claimed incident window, or a device model inconsistent with the policyholder's registered equipment. Because the logic is entirely deterministic, every result can be reproduced and audited, so the agent produces high-confidence evidence chains built to support court admissibility.

Tech Stack

Python 3.11 | Runtime
ExifTool 12.x | Primary EXIF/IPTC/XMP extraction engine
Pillow 10.x | Image I/O and secondary EXIF read
piexif 1.1.x | Low-level EXIF byte-structure access
PyMuPDF (fitz) | PDF metadata extraction
geopy 2.x | GPS coordinate distance calculation
FastAPI | REST API endpoint exposure

Input

The agent accepts a single image or PDF file plus the claimant-declared incident metadata for cross-validation.

Accepted Formats

JPEG, PNG, TIFF, HEIC, PDF

Fields

Name | Type | Required | Description
file | binary | Yes | Raw bytes of the submitted image or PDF file
declared_lat | float | Yes | Claimant-declared GPS latitude of the incident
declared_lon | float | Yes | Claimant-declared GPS longitude of the incident
declared_timestamp | ISO-8601 string | Yes | Claimant-declared date and time of the incident
declared_device_model | string | No | Registered device model (e.g. iPhone 14 Pro)
gps_tolerance_km | float | No | Override for the GPS mismatch threshold in km (default: 2.0)
time_tolerance_hours | float | No | Override for the timestamp mismatch threshold in hours (default: 1.0)
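
The input contract can be sketched as a plain dataclass. This is a minimal illustration, assuming WGS-84 decimal-degree coordinates and an ISO-8601 timestamp that datetime.fromisoformat can parse; the class name DeclaredIncident and its validation rules are hypothetical, and a FastAPI service would more typically declare the same fields on a Pydantic model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class DeclaredIncident:
    """Hypothetical container for the declared fields (the file bytes travel separately)."""

    declared_lat: float
    declared_lon: float
    declared_timestamp: str                 # ISO-8601, e.g. "2024-08-14T20:45:00"
    declared_device_model: Optional[str] = None
    gps_tolerance_km: float = 2.0           # documented default
    time_tolerance_hours: float = 1.0       # documented default

    def __post_init__(self) -> None:
        # Reject coordinates outside the valid latitude/longitude ranges.
        if not -90.0 <= self.declared_lat <= 90.0:
            raise ValueError("declared_lat must be within [-90, 90]")
        if not -180.0 <= self.declared_lon <= 180.0:
            raise ValueError("declared_lon must be within [-180, 180]")
        # A non-positive tolerance is not a meaningful threshold.
        if self.gps_tolerance_km <= 0 or self.time_tolerance_hours <= 0:
            raise ValueError("tolerances must be positive")
        # Fail fast on a malformed timestamp (fromisoformat raises ValueError).
        datetime.fromisoformat(self.declared_timestamp)


incident = DeclaredIncident(10.7758, 106.7004, "2024-08-14T20:45:00")
print(incident.gps_tolerance_km, incident.time_tolerance_hours)  # 2.0 1.0
```

Omitted optional fields fall back to the documented defaults, so only the three required declared values need to be supplied.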

Output

A structured JSON verdict containing extracted metadata fields, individual sub-checks, and an overall flag decision.

Format: JSON

Fields

Name | Type | Description
exif_gps_lat | float | GPS latitude extracted from EXIF
exif_gps_lon | float | GPS longitude extracted from EXIF
gps_distance_km | float | Geodesic distance in km between the EXIF GPS position and the declared location
exif_timestamp | string | Original capture timestamp from the EXIF DateTimeOriginal tag
time_delta_hours | float | Absolute difference in hours between the EXIF and declared timestamps
device_make | string | Camera/phone make from the EXIF Make tag
device_model | string | Camera/phone model from the EXIF Model tag
software_edited | string or null | Value of the Software tag when it indicates post-capture processing; null otherwise
flags | array<string> | Triggered flag codes: GPS_MISMATCH, TIMESTAMP_MISMATCH, DEVICE_MISMATCH, NO_EXIF, EXIF_STRIPPED
risk_score | float | Normalised risk contribution, 0.0–1.0
verdict | string | One of PASS, FLAG, INCONCLUSIVE
evidence_chain | array<string> | Human-readable audit log of each check performed

Example Response

{
  "exif_gps_lat": 10.9821,
  "exif_gps_lon": 106.3045,
  "gps_distance_km": 48.9,
  "exif_timestamp": "2024-08-14T03:22:10",
  "time_delta_hours": 17.4,
  "device_make": "Apple",
  "device_model": "iPhone 11",
  "software_edited": null,
  "flags": ["GPS_MISMATCH", "TIMESTAMP_MISMATCH"],
  "risk_score": 0.80,
  "verdict": "FLAG",
  "evidence_chain": [
    "EXIF GPS (10.9821, 106.3045) is 48.9 km from declared location (10.7758, 106.7004); threshold 2.0 km",
    "EXIF DateTimeOriginal 2024-08-14T03:22:10 differs from declared 2024-08-14T20:45:00 by 17.4 h; threshold 1.0 h"
  ]
}

How It Works

AGT-FOR-001 operates as a synchronous request-response microservice. When a claim document is submitted to the fraud detection pipeline, the orchestrator posts the file and declared metadata to the agent's REST endpoint.

The agent first validates the file's magic bytes to confirm its true format, then invokes ExifTool, a de facto standard metadata extractor in digital forensics, as a subprocess and requests full JSON output of all tag groups. The result is a dictionary of potentially hundreds of metadata fields.

Three primary validation engines then run in sequence. The GPS engine reads the Composite:GPSLatitude and Composite:GPSLongitude fields and computes the geodesic distance on the WGS-84 ellipsoid (via geopy) to the claimant-declared coordinates. A distance greater than 2 km indicates that the photo was not taken where the claimant says.

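The timestamp engine reads EXIF:DateTimeOriginal, which the camera firmware writes at the moment the shutter is pressed, and compares it with the declared incident time. A delta greater than one hour triggers a flag. The tag is trustworthy for untouched camera output but it is not tamper-proof: common metadata editors, ExifTool included, can rewrite it, so it counts as one signal among several rather than proof on its own.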

The device engine compares the EXIF Make+Model fingerprint to the policyholder's registered device. It also inspects the Software tag: legitimate claim photos should not have been processed in Photoshop or GIMP.

Finally, an absence-detection pass checks whether EXIF was stripped or zeroed. Deliberate stripping is itself a strong fraud signal. All check results are assembled into a structured evidence chain that is both machine-readable for downstream aggregation and human-readable for adjudicators.

Thinking Steps

1

Ingest & Validate File

Receive the uploaded binary and check its magic bytes to confirm that the declared MIME type matches the actual file signature. Reject files that masquerade as JPEG but contain PDF headers, or vice versa.

Magic-byte validation stops attackers from sidestepping format-specific checks by disguising one format as another, for example by renaming a PDF with a .jpg extension.
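
Signature sniffing for the five accepted formats can be sketched with the standard library alone. This is an illustrative stand-in for the python-magic component named in the technical design, not its implementation; the function names, the HEIC brand list, and the convention of declaring formats by the names used under Accepted Formats are assumptions.

```python
from typing import Optional

# Major brands treated as HEIC/HEIF (assumed list; "mif1" is the generic HEIF image brand).
HEIC_BRANDS = {b"heic", b"heix", b"mif1"}


def sniff_format(head: bytes) -> Optional[str]:
    """Return the format implied by the leading bytes, or None if unrecognised."""
    if head.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if head.startswith((b"II*\x00", b"MM\x00*")):  # little- and big-endian TIFF
        return "TIFF"
    if head.startswith(b"%PDF-"):
        return "PDF"
    # ISO-BMFF container: 4-byte box size, then "ftyp", then the 4-byte major brand.
    if len(head) >= 12 and head[4:8] == b"ftyp" and head[8:12] in HEIC_BRANDS:
        return "HEIC"
    return None


def validate_declared_type(head: bytes, declared: str) -> str:
    """Raise ValueError if the signature is unknown or disagrees with the declared format."""
    actual = sniff_format(head)
    if actual is None:
        raise ValueError("unrecognised file signature")
    if actual != declared.upper():
        raise ValueError(f"declared {declared!r} but signature indicates {actual}")
    return actual
```

Only the first 12 bytes of the upload are needed, so the check can run before the file is handed to any extractor.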

2

Extract Full Metadata Tree

Invoke ExifTool via subprocess in JSON output mode (exiftool -j -G -n) to extract all tag groups (EXIF, IPTC, XMP, ICC_Profile, Composite). Store the complete tag dictionary for downstream checks.

ExifTool's Composite group synthesises derived values such as GPS coordinates; with the -n option they are reported as signed decimal degrees, which saves manual DMS conversion.
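
A minimal sketch of the ExifToolBridge, assuming exiftool is on PATH and that -n is passed so coordinates arrive as signed decimal degrees. The function names and the 30-second timeout are hypothetical. Keeping command construction and JSON parsing separate from the subprocess call makes both testable without ExifTool installed.

```python
import json
import subprocess
from pathlib import Path
from typing import Any


def build_exiftool_cmd(path: Path) -> list[str]:
    # -j: JSON output; -G: prefix each tag with its family-0 group (e.g. "EXIF:Make");
    # -n: numeric values, so Composite:GPSLatitude is signed decimal degrees.
    return ["exiftool", "-j", "-G", "-n", str(path)]


def parse_exiftool_json(stdout: str) -> dict[str, Any]:
    # ExifTool's -j output is a JSON array with one object per input file.
    records = json.loads(stdout)
    if not records:
        raise ValueError("ExifTool returned no records")
    return records[0]


def extract_tags(path: Path, timeout_s: float = 30.0) -> dict[str, Any]:
    # check=True raises CalledProcessError on a non-zero exit status;
    # timeout raises TimeoutExpired if ExifTool hangs on a malformed file.
    proc = subprocess.run(
        build_exiftool_cmd(path),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return parse_exiftool_json(proc.stdout)
```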

3

GPS Cross-Validation

Read GPSLatitude and GPSLongitude from the Composite group. Compute the geodesic distance on the WGS-84 ellipsoid to the declared incident coordinates with geopy.distance.geodesic. Flag GPS_MISMATCH if the distance exceeds gps_tolerance_km (default 2.0 km).

2 km tolerance accounts for GPS receiver accuracy variance in urban canyons.
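
The component table names geopy.distance.geodesic, which measures distance on the WGS-84 ellipsoid; with geopy installed the call is geodesic((lat1, lon1), (lat2, lon2)).km. To stay dependency-free, the sketch below swaps in the spherical haversine formula instead. It can differ from the ellipsoidal result by up to roughly 0.5%, which is negligible next to the 2 km tolerance for distances near the threshold. The function names are illustrative.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0088  # IUGG mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere; inputs in signed decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against rounding pushing sqrt(a) fractionally above 1.
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def check_gps(exif_lat: float, exif_lon: float, declared_lat: float,
              declared_lon: float, tolerance_km: float = 2.0) -> tuple[float, bool]:
    """Return (distance_km, gps_mismatch) for one EXIF/declared coordinate pair."""
    distance = haversine_km(exif_lat, exif_lon, declared_lat, declared_lon)
    return distance, distance > tolerance_km


distance, mismatch = check_gps(10.9821, 106.3045, 10.7758, 106.7004)
print(f"{distance:.1f} km, GPS_MISMATCH={mismatch}")
```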

4

Timestamp Cross-Validation

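Read DateTimeOriginal (preferred), falling back to CreateDate, then ModifyDate. Parse the value to a timezone-aware datetime: EXIF date tags carry no time zone, so the offset comes from the matching OffsetTime tag (EXIF 2.31 and later) when present; otherwise a zone has to be assumed. Compute the absolute delta against declared_timestamp and flag TIMESTAMP_MISMATCH if it exceeds time_tolerance_hours (default 1.0 h).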

If only ModifyDate is present (no DateTimeOriginal), confidence is lower — evidence_chain notes the tag source.
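
A sketch of the fallback order and zone handling described above, assuming ExifTool's -G tag names (EXIF:DateTimeOriginal, EXIF:CreateDate, EXIF:ModifyDate and the OffsetTime tags that pair with them). Treating a timestamp without zone information as UTC is a placeholder assumption; a real deployment needs an explicit policy, such as the incident's local zone. Returning the source tag lets the evidence chain record which field the comparison used.

```python
from datetime import datetime, timezone, tzinfo
from typing import Any, Optional

# Preference order from this step, each paired with its EXIF 2.31 offset tag.
TIME_TAGS = (
    ("EXIF:DateTimeOriginal", "EXIF:OffsetTimeOriginal"),
    ("EXIF:CreateDate", "EXIF:OffsetTimeDigitized"),
    ("EXIF:ModifyDate", "EXIF:OffsetTime"),
)
EXIF_FMT = "%Y:%m:%d %H:%M:%S"


def pick_capture_time(tags: dict[str, Any],
                      default_tz: tzinfo = timezone.utc) -> Optional[tuple[datetime, str]]:
    """Return (aware datetime, source tag) for the first usable tag, else None."""
    for tag, offset_tag in TIME_TAGS:
        raw = str(tags.get(tag) or "")[:19]   # "YYYY:MM:DD HH:MM:SS" is 19 chars
        offset = tags.get(offset_tag)         # e.g. "+07:00"
        try:
            if offset:
                dt = datetime.strptime(raw + str(offset), EXIF_FMT + "%z")
            else:
                dt = datetime.strptime(raw, EXIF_FMT).replace(tzinfo=default_tz)
        except ValueError:
            continue  # missing or zeroed value, e.g. "0000:00:00 00:00:00"
        return dt, tag
    return None


def time_delta_hours(capture: datetime, declared_iso: str,
                     default_tz: tzinfo = timezone.utc) -> float:
    """Absolute difference in hours; a naive declared value gets the same assumed zone."""
    declared = datetime.fromisoformat(declared_iso)
    if declared.tzinfo is None:
        declared = declared.replace(tzinfo=default_tz)
    return abs((capture - declared).total_seconds()) / 3600
```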

5

Device Model Cross-Validation

Read Make + Model tags. If declared_device_model is provided, normalise both strings (lowercase, strip spaces) and compare. Flag DEVICE_MISMATCH on a mismatch. Also check Software tag for signs of post-processing tools (e.g. 'Adobe Photoshop', 'GIMP').

A claimant filing with a phone model they have never registered in the policy is a soft fraud signal.
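
A sketch of step 5, assuming ExifTool -G tag names. Comparing against both Model and Make + Model is an assumption made so that a model-only declaration such as iPhone 14 Pro and a fuller Apple iPhone 14 Pro both match; the editor list is illustrative and would need extending in practice. The helper names are hypothetical.

```python
import re
from typing import Any, Optional

# Illustrative, non-exhaustive markers taken from the examples in this step.
EDITOR_MARKERS = ("photoshop", "gimp")


def normalise(value: str) -> str:
    """Lowercase and remove all whitespace, as described in this step."""
    return re.sub(r"\s+", "", value).lower()


def check_device(tags: dict[str, Any], declared_model: Optional[str]) -> list[str]:
    """Return ["DEVICE_MISMATCH"] when the EXIF device disagrees with the declaration."""
    make = str(tags.get("EXIF:Make", ""))
    model = str(tags.get("EXIF:Model", ""))
    if not declared_model or not model:
        return []  # nothing to compare; absent EXIF is handled by step 6
    declared = normalise(declared_model)
    # Accept either "iPhone 11" or "Apple iPhone 11" as the declared form.
    if declared in (normalise(model), normalise(make + model)):
        return []
    return ["DEVICE_MISMATCH"]


def detect_editor(tags: dict[str, Any]) -> Optional[str]:
    """Return the Software tag value if it names a known editor, else None."""
    software = str(tags.get("EXIF:Software", ""))
    return software if any(m in software.lower() for m in EDITOR_MARKERS) else None
```

The value returned by detect_editor maps naturally onto the software_edited output field, which stays null for ordinary firmware strings.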

6

Metadata Absence / Strip Detection

If no EXIF group is found at all, raise NO_EXIF flag. If GPSInfo exists but all coordinates are zero, raise EXIF_STRIPPED. These patterns indicate deliberate metadata removal before submission.

Modern social media platforms auto-strip EXIF; if the claimant claims to have downloaded their own photo from Facebook, EXIF absence is explainable — context matters.
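
Step 6 reduces to two checks on the extracted tag dictionary. This sketch assumes ExifTool's -G -n output, in which GPS-IFD tags are reported in the EXIF group as numbers; the function name and the choice to raise neither flag when the GPS block is simply absent are assumptions, since the step only specifies the zeroed case.

```python
from typing import Any


def detect_absence(tags: dict[str, Any]) -> list[str]:
    """Return ["NO_EXIF"] or ["EXIF_STRIPPED"] per step 6, or [] if neither applies."""
    # With ExifTool -G, EXIF tags (including the GPS IFD) carry the "EXIF:" prefix.
    if not any(key.startswith("EXIF:") for key in tags):
        return ["NO_EXIF"]
    lat = tags.get("EXIF:GPSLatitude")
    lon = tags.get("EXIF:GPSLongitude")
    # GPS tags present but both coordinates zero suggests they were blanked.
    # Caveat: (0, 0) is also a real, if unlikely, point in the Gulf of Guinea.
    if lat is not None and lon is not None and float(lat) == 0.0 and float(lon) == 0.0:
        return ["EXIF_STRIPPED"]
    return []
```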

7

Risk Score Aggregation & Verdict

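Each flag contributes a weighted score: GPS_MISMATCH = 0.45, TIMESTAMP_MISMATCH = 0.35, DEVICE_MISMATCH = 0.15, NO_EXIF = 0.25, EXIF_STRIPPED = 0.40. Sum the weights of the raised flags and clip the total to 1.0. The verdict is INCONCLUSIVE if NO_EXIF is the only flag; this rule takes precedence, because NO_EXIF alone scores 0.25. Otherwise the verdict is FLAG if the score is ≥ 0.20 and PASS if it is < 0.20.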

Weights are configurable via environment variables; these defaults were calibrated on 12,000 historical claims.
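
The aggregation rule above can be sketched as follows. The default weights and the 0.20 cut-off come from this step; the EXIF_WEIGHT_<FLAG> environment-variable names are hypothetical, since the actual variable names are not documented here.

```python
import os
from typing import Iterable, Mapping

DEFAULT_WEIGHTS = {
    "GPS_MISMATCH": 0.45,
    "TIMESTAMP_MISMATCH": 0.35,
    "DEVICE_MISMATCH": 0.15,
    "NO_EXIF": 0.25,
    "EXIF_STRIPPED": 0.40,
}
FLAG_THRESHOLD = 0.20


def load_weights(env: Mapping[str, str] = os.environ) -> dict[str, float]:
    """Override the defaults from hypothetical EXIF_WEIGHT_<FLAG> variables."""
    return {flag: float(env.get(f"EXIF_WEIGHT_{flag}", default))
            for flag, default in DEFAULT_WEIGHTS.items()}


def aggregate(flags: Iterable[str], weights: Mapping[str, float]) -> tuple[float, str]:
    """Sum flag weights, clip to 1.0, and derive the verdict."""
    raised = set(flags)  # count each flag once
    score = min(1.0, sum((weights[f] for f in raised), 0.0))
    if raised == {"NO_EXIF"}:
        verdict = "INCONCLUSIVE"  # takes precedence over the 0.20 cut-off
    elif score >= FLAG_THRESHOLD:
        verdict = "FLAG"
    else:
        verdict = "PASS"
    return round(score, 4), verdict
```

With the default weights, DEVICE_MISMATCH on its own (0.15) stays under the 0.20 cut-off, so it is recorded in flags without changing the verdict, while GPS_MISMATCH plus TIMESTAMP_MISMATCH sums to 0.80.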

Thinking Tree

  • Root Question: Is the submitted image authentic evidence of the declared incident?
    • Does EXIF data exist?
      • Yes: proceed to validation checks
        • GPS coordinates present?
          • GPS coordinates zeroed → FLAG EXIF_STRIPPED
          • GPS within 2 km of declared location → PASS (GPS)
          • GPS mismatch > 2 km → FLAG GPS_MISMATCH
        • DateTimeOriginal present?
          • Timestamp within 1 h of declared → PASS (time)
          • Timestamp delta > 1 h → FLAG TIMESTAMP_MISMATCH
        • Device model declared by claimant?
          • EXIF model matches declared → PASS (device)
          • EXIF model differs → FLAG DEVICE_MISMATCH
      • No EXIF at all → FLAG NO_EXIF (INCONCLUSIVE)

Decision Tree

d1: Does the file contain any EXIF metadata?
Yes → d2 | No → flag_no_exif

d2: Are GPS coordinates present and non-zero?
Yes → d3 | No → flag_stripped

d3: GPS distance from declared location ≤ 2 km?
Yes → d4 | No → flag_gps

d4: DateTimeOriginal delta from declared time ≤ 1 h?
Yes → d5 | No → flag_time

d5: Device model consistent with registered equipment?
Yes → pass | No → flag_device

Outcomes:

flag_no_exif → INCONCLUSIVE: No EXIF found; possible social-media download or deliberate strip
flag_stripped → FLAG (EXIF_STRIPPED): GPS zeroed; likely deliberate metadata removal
flag_gps → FLAG (GPS_MISMATCH): Photo taken far from declared incident scene
flag_time → FLAG (TIMESTAMP_MISMATCH): Photo predates or postdates declared incident
flag_device → FLAG (DEVICE_MISMATCH): Image captured by unregistered device
pass → PASS: All metadata checks consistent with declared incident
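
Read top to bottom, the decision tree is a short-circuit walk in which the first failing node decides the outcome. The sketch below uses the node IDs from the tree as comments; the function signature is hypothetical. Unlike the additive scoring in step 7, the walk reports a single outcome even when several checks would fail.

```python
def decide(has_exif: bool, gps_nonzero: bool, gps_km: float, delta_h: float,
           device_ok: bool, gps_tol_km: float = 2.0, time_tol_h: float = 1.0) -> str:
    """Walk nodes d1..d5 in order and return the first terminal node reached."""
    if not has_exif:             # d1: any EXIF metadata?
        return "flag_no_exif"
    if not gps_nonzero:          # d2: GPS present and non-zero?
        return "flag_stripped"
    if gps_km > gps_tol_km:      # d3: within GPS tolerance?
        return "flag_gps"
    if delta_h > time_tol_h:     # d4: within time tolerance?
        return "flag_time"
    if not device_ok:            # d5: device consistent with registration?
        return "flag_device"
    return "pass"


# GPS and timestamp both fail here, but the walk stops at d3.
print(decide(True, True, gps_km=48.9, delta_h=17.4, device_ok=True))  # flag_gps
```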

Technical Design

Architecture

AGT-FOR-001 is a stateless synchronous microservice built with FastAPI. Each request is fully self-contained: the file and declared metadata arrive together, processing happens in memory apart from a short-lived temporary file that ExifTool reads, and the response is returned before the HTTP connection closes. ExifTool runs as a per-request subprocess (not a persistent daemon), so no state is shared between requests, and the agent can be scaled horizontally behind a load balancer.
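
One way to guarantee that the per-request temporary file never outlives the request is a private temporary directory, sketched below with a hypothetical helper name. ExifTool can also read from standard input when given - as the file name, which would avoid the temporary file altogether; this sketch assumes the file-path route shown in the data flow.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temp_file(data: bytes, extract: Callable[[Path], T], suffix: str = "") -> T:
    """Write the upload to a private temp dir, run `extract`, then delete everything."""
    # TemporaryDirectory removes the directory on exit, including when extract raises,
    # so no file from one request remains visible to the next.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"upload{suffix}"
        path.write_bytes(data)
        return extract(path)
```

In use, extract would be a function that runs ExifTool on the given path and returns the parsed tag dictionary.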

Components

Component | Role | Technology
MagicByteValidator | Confirms the true file type from its binary signature | python-magic (the stdlib imghdr module is deprecated as of Python 3.11 and cannot identify PDF or HEIC)
ExifToolBridge | Shells out to ExifTool and parses its JSON output | subprocess + ExifTool 12.x
PillowFallback | Secondary EXIF read when ExifTool is unavailable | Pillow 10.x Image.getexif()
GPSValidator | Computes the geodesic distance between two coordinate pairs | geopy.distance.geodesic (WGS-84)
TimestampValidator | Parses EXIF datetime strings and computes the delta | Python datetime + pytz
DeviceValidator | Normalises and compares Make/Model tags | Python string operations
RiskAggregator | Weights flags into a 0–1 risk score | Pure Python arithmetic
EvidenceChainBuilder | Constructs human-readable audit log entries | Python f-strings + list builder
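
A sketch of how EvidenceChainBuilder might format its entries, modeled on the strings in the example response; the function names are hypothetical. Passing the source tag name into the timestamp entry is one way to meet step 4's requirement that the evidence chain notes which tag was used.

```python
def gps_entry(exif_lat: float, exif_lon: float, decl_lat: float, decl_lon: float,
              distance_km: float, tol_km: float) -> str:
    """Audit line for the GPS check."""
    return (f"EXIF GPS ({exif_lat}, {exif_lon}) is {distance_km:.1f} km from declared "
            f"location ({decl_lat}, {decl_lon}); threshold {tol_km:.1f} km")


def time_entry(tag: str, exif_ts: str, declared_ts: str,
               delta_h: float, tol_h: float) -> str:
    """Audit line for the timestamp check, naming the tag that was used."""
    name = tag.split(":", 1)[-1]  # "EXIF:DateTimeOriginal" -> "DateTimeOriginal"
    return (f"EXIF {name} {exif_ts} differs from declared {declared_ts} "
            f"by {delta_h:.1f} h; threshold {tol_h:.1f} h")


chain = [
    gps_entry(10.9821, 106.3045, 10.7758, 106.7004, 48.93, 2.0),
    time_entry("EXIF:DateTimeOriginal", "2024-08-14T03:22:10", "2024-08-14T20:45:00", 17.38, 1.0),
]
print("\n".join(chain))
```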

Architecture Diagram

┌─────────────────────────────┐
│   POST /analyze (file +     │
│   declared metadata)        │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│      MagicByteValidator     │
│  (reject format spoofing)   │
└──────────────┬──────────────┘
               │
               ▼
┌─────────────────────────────┐
│       ExifToolBridge        │
│  exiftool -j -G -n <file>   │
└──────┬─────────┬────────────┘
       │         │
       ▼         ▼
┌──────────┐ ┌──────────────┐
│   GPS    │ │  Timestamp   │
│Validator │ │  Validator   │
└──────┬───┘ └──────┬───────┘
       │            │
       └─────┬──────┘
             │
             ▼
  ┌──────────────────┐
  │  DeviceValidator │
  └────────┬─────────┘
           │
           ▼
  ┌──────────────────────┐
  │   RiskAggregator +   │
  │ EvidenceChainBuilder │
  └──────────┬───────────┘
             │
             ▼
   JSON verdict response

Data Flow

From → To | Payload
API Gateway → MagicByteValidator | Raw file bytes + declared metadata JSON
MagicByteValidator → ExifToolBridge | Validated temp file path
ExifToolBridge → GPSValidator | Composite:GPSLatitude, Composite:GPSLongitude
ExifToolBridge → TimestampValidator | EXIF:DateTimeOriginal (fallback: EXIF:CreateDate, EXIF:ModifyDate)
ExifToolBridge → DeviceValidator | EXIF:Make, EXIF:Model, EXIF:Software
GPSValidator → RiskAggregator | GPS flag + distance_km
TimestampValidator → RiskAggregator | Time flag + delta_hours
DeviceValidator → RiskAggregator | Device flag
RiskAggregator → EvidenceChainBuilder | Flags list + individual scores
EvidenceChainBuilder → API Gateway | Full JSON verdict + evidence_chain array