EXIF Metadata Analyst
AGT-FOR-001 is a deterministic, rule-based agent that extracts and cross-validates the metadata embedded in image and PDF files submitted as claim evidence. It reads the EXIF, IPTC, and XMP metadata layers using Phil Harvey's ExifTool and the Python piexif/Pillow stack, then compares the GPS coordinates, capture timestamp, and device model against the claimant's declared incident location, time, and equipment. A discrepancy beyond a configurable threshold is flagged as a spatial, temporal, or device anomaly respectively: GPS coordinates more than 2 km from the declared scene, a capture timestamp outside the claimed incident window, or a device model inconsistent with the policyholder's registered equipment. Because the logic is entirely deterministic, every verdict is reproducible and auditable, and the agent produces evidence chains intended to support adjudication and legal review.
Tech Stack
Input
The agent accepts a single image or PDF file plus the claimant-declared incident metadata for cross-validation.
Accepted Formats
Fields
| Name | Type | Req | Description |
|---|---|---|---|
| file | binary | Yes | Raw bytes of the submitted image or PDF file |
| declared_lat | float | Yes | Claimant-declared GPS latitude of the incident |
| declared_lon | float | Yes | Claimant-declared GPS longitude of the incident |
| declared_timestamp | ISO-8601 string | Yes | Claimant-declared date and time of the incident |
| declared_device_model | string | No | Optional: registered device model (e.g. iPhone 14 Pro) |
| gps_tolerance_km | float | No | Override GPS mismatch threshold in km (default: 2.0) |
| time_tolerance_hours | float | No | Override timestamp mismatch threshold in hours (default: 1.0) |
Output
A structured JSON verdict containing extracted metadata fields, individual sub-checks, and an overall flag decision.
Format: JSON
Fields
| Name | Type | Description |
|---|---|---|
| exif_gps_lat | float | GPS latitude extracted from EXIF |
| exif_gps_lon | float | GPS longitude extracted from EXIF |
| gps_distance_km | float | Geodesic distance between EXIF GPS and declared GPS |
| exif_timestamp | string | Original capture timestamp from EXIF DateTimeOriginal tag |
| time_delta_hours | float | Absolute difference in hours between EXIF and declared timestamps |
| device_make | string | Camera/phone make from EXIF Make tag |
| device_model | string | Camera/phone model from EXIF Model tag |
| software_edited | string | Software tag if image was processed post-capture |
| flags | array<string> | List of triggered flag codes: GPS_MISMATCH, TIMESTAMP_MISMATCH, DEVICE_MISMATCH, NO_EXIF, EXIF_STRIPPED |
| risk_score | float | Normalised risk contribution 0.0–1.0 |
| verdict | string | One of PASS, FLAG, or INCONCLUSIVE |
| evidence_chain | array<string> | Human-readable audit log of each check performed |
Example Response
{
  "exif_gps_lat": 10.9821,
  "exif_gps_lon": 106.3045,
  "gps_distance_km": 48.9,
  "exif_timestamp": "2024-08-14T03:22:10",
  "time_delta_hours": 17.4,
  "device_make": "Apple",
  "device_model": "iPhone 11",
  "software_edited": null,
  "flags": ["GPS_MISMATCH", "TIMESTAMP_MISMATCH"],
  "risk_score": 0.80,
  "verdict": "FLAG",
  "evidence_chain": [
    "EXIF GPS (10.9821, 106.3045) is 48.9 km from declared location (10.7758, 106.7004); threshold 2.0 km",
    "EXIF DateTimeOriginal 2024-08-14T03:22:10 differs from declared 2024-08-14T20:45:00 by 17.4 h; threshold 1.0 h"
  ]
}
How It Works
AGT-FOR-001 operates as a synchronous request-response microservice. When a claim document is submitted to the fraud detection pipeline, the orchestrator posts the file and declared metadata to the agent's REST endpoint.
The agent first validates the file's magic bytes to confirm the true format, then shells out to ExifTool, a widely used metadata extraction tool, requesting JSON output of all tag groups. This produces a structured dictionary that can contain hundreds of metadata fields.
Three primary validation engines then run in sequence. The GPS engine reads the Composite:GPSLatitude and Composite:GPSLongitude fields and computes the geodesic distance to the claimant-declared coordinates using geopy on the WGS-84 ellipsoid; the spherical Haversine (great-circle) formula is a close approximation at these distances. A distance greater than 2 km suggests the photo was not taken where the claimant says.
The timestamp engine reads EXIF:DateTimeOriginal, the moment of capture as written by the camera firmware, and compares it to the declared incident time. A delta greater than one hour triggers a flag. The tag can be rewritten with ordinary metadata editors (ExifTool included), so a matching timestamp is corroborating rather than conclusive.
The device engine compares the EXIF Make+Model fingerprint to the policyholder's registered device. It also inspects the Software tag: legitimate claim photos should not have been processed in Photoshop or GIMP.
Finally, an absence-detection pass checks whether EXIF was stripped or zeroed. Deliberate stripping is itself a strong fraud signal. All check results are assembled into a structured evidence chain that is both machine-readable for downstream aggregation and human-readable for adjudicators.
Thinking Steps
Ingest & Validate File
Receive the uploaded binary, check magic bytes to confirm the declared MIME type matches the actual file signature. Reject files that masquerade as JPEG but contain PDF headers, or vice versa.
Magic-byte validation stops a file from evading the format-specific extraction path by carrying a misleading extension or MIME type, for example a PDF renamed to .jpg.
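A minimal sketch of the signature check, using only the standard library. The `SIGNATURES` table and the function names are illustrative assumptions, not the agent's actual code, and the table covers only three formats; the component list names python-magic for the real check.

```python
from typing import Optional

# Leading byte signatures for a few formats (illustrative subset only).
SIGNATURES = {
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
    "application/pdf": b"%PDF-",
}


def sniff_mime(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for mime, signature in SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return None


def matches_declared_type(data: bytes, declared_mime: str) -> bool:
    """True only when the sniffed signature agrees with the declared MIME type."""
    return sniff_mime(data) == declared_mime
```

A request whose bytes start with `%PDF-` but whose declared type is `image/jpeg` would fail `matches_declared_type` and could be rejected before any metadata extraction runs.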
Extract Full Metadata Tree
Invoke ExifTool via subprocess with JSON output mode to extract all tag groups (EXIF, IPTC, XMP, ICC_Profile, Composite). Store the complete tag dictionary for downstream checks.
ExifTool's Composite group synthesises derived values such as combined GPS coordinates. With numeric output enabled (the -n option) they are returned as signed decimal degrees, which avoids manual DMS conversion.
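The extraction step might look like the sketch below. The function names and the timeout value are assumptions. The `-j` and `-G` options come from the architecture diagram; `-n` is added here as an assumption so that GPS values arrive as numbers rather than formatted strings.

```python
import json
import subprocess
from typing import Any, Dict, List


def build_exiftool_command(path: str) -> List[str]:
    # -j: JSON output; -G: prefix each tag with its group name;
    # -n: numeric values (assumed here so GPS comes back as decimal degrees).
    return ["exiftool", "-j", "-G", "-n", path]


def parse_exiftool_json(stdout: str) -> Dict[str, Any]:
    """ExifTool emits a JSON array with one object per input file."""
    records = json.loads(stdout)
    return records[0] if records else {}


def extract_metadata(path: str, timeout_s: float = 10.0) -> Dict[str, Any]:
    """Run ExifTool on a single file and return its tag dictionary."""
    result = subprocess.run(
        build_exiftool_command(path),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # hypothetical limit; guards against hung processes
        check=True,         # raise CalledProcessError on a non-zero exit code
    )
    return parse_exiftool_json(result.stdout)
```

Passing the command as a list rather than a shell string keeps an attacker-controlled file name from being interpreted by a shell.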
GPS Cross-Validation
Read GPSLatitude and GPSLongitude from the Composite group. Compute the distance to the declared incident coordinates with geopy (geodesic distance on the WGS-84 ellipsoid; the spherical Haversine formula is a close approximation). Flag if the distance exceeds gps_tolerance_km (default 2.0 km).
The 2 km default is deliberately generous: it comfortably absorbs GPS receiver error, including the degraded accuracy typical of urban canyons.
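The distance check can be sketched with the spherical Haversine formula and nothing beyond the standard library. This is an approximation: geopy's `geodesic`, named in the component table, works on the WGS-84 ellipsoid, and the two generally agree to within about 0.5%. The function names and the tuple-based signature are assumptions.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (spherical approximation)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to protect asin() from floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def gps_mismatch(exif: Tuple[float, float],
                 declared: Tuple[float, float],
                 tolerance_km: float = 2.0) -> Tuple[float, bool]:
    """Return (distance_km, flagged); flagged is True beyond the tolerance."""
    distance = haversine_km(exif[0], exif[1], declared[0], declared[1])
    return distance, distance > tolerance_km
```

Because the tolerance is measured in kilometres, the sub-percent difference between the spherical and ellipsoidal models only matters for photos taken almost exactly at the threshold distance.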
Timestamp Cross-Validation
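Read DateTimeOriginal (preferred), CreateDate, or ModifyDate tags. Parse to a timezone-aware datetime; EXIF timestamps carry no zone of their own, so the offset has to come from OffsetTimeOriginal where present or from an explicit assumption otherwise. Compute the absolute delta against declared_timestamp. Flag if the delta exceeds time_tolerance_hours (default 1.0 h).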
If only ModifyDate is present (no DateTimeOriginal), confidence is lower — evidence_chain notes the tag source.
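A sketch of the timestamp comparison using the standard library (the component table names pytz; `datetime.timezone` is used here to keep the example self-contained). The function names are hypothetical, and the rule that a value lacking zone information inherits the other value's zone is an assumption, not documented agent behaviour.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # EXIF layout, e.g. "2024:08:14 03:22:10"


def parse_exif_datetime(value: str, offset: Optional[str] = None) -> datetime:
    """Parse an EXIF timestamp; `offset` is an OffsetTimeOriginal-style "+07:00"."""
    naive = datetime.strptime(value, EXIF_FORMAT)
    if offset is None:
        return naive  # no zone information available
    sign = -1 if offset.startswith("-") else 1
    hours, minutes = offset.lstrip("+-").split(":")
    delta = sign * timedelta(hours=int(hours), minutes=int(minutes))
    return naive.replace(tzinfo=timezone(delta))


def time_delta_hours(exif_dt: datetime, declared_iso: str) -> float:
    """Absolute difference in hours between the EXIF and declared timestamps."""
    declared = datetime.fromisoformat(declared_iso)
    # Assumption: a value lacking zone information shares the other value's zone.
    if exif_dt.tzinfo is None and declared.tzinfo is not None:
        exif_dt = exif_dt.replace(tzinfo=declared.tzinfo)
    elif declared.tzinfo is None and exif_dt.tzinfo is not None:
        declared = declared.replace(tzinfo=exif_dt.tzinfo)
    return abs((exif_dt - declared).total_seconds()) / 3600.0
```

The zone assumption matters: with a one-hour default tolerance, a wrongly guessed offset is enough on its own to raise TIMESTAMP_MISMATCH, so recording the assumption in evidence_chain would be prudent.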
Device Model Cross-Validation
Read Make + Model tags. If declared_device_model is provided, normalise both strings (lowercase, strip spaces) and compare. Flag DEVICE_MISMATCH on a mismatch. Also check Software tag for signs of post-processing tools (e.g. 'Adobe Photoshop', 'GIMP').
A claimant filing with a phone model they have never registered in the policy is a soft fraud signal.
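The normalise-and-compare step is small enough to sketch directly. The helper names and the `EDITING_TOOLS` tuple are illustrative assumptions; the tuple holds only the two tools named in the text, and this sketch compares the Model tag alone rather than Make plus Model.

```python
import re
from typing import Optional

EDITING_TOOLS = ("adobe photoshop", "gimp")  # the examples named in the text


def normalise(text: str) -> str:
    """Lowercase the string and remove all whitespace."""
    return re.sub(r"\s+", "", text.lower())


def device_mismatch(exif_model: Optional[str],
                    declared_model: Optional[str]) -> bool:
    """True when both models are known and differ after normalisation."""
    if not exif_model or not declared_model:
        return False  # nothing to compare, so no flag is raised
    return normalise(exif_model) != normalise(declared_model)


def editing_software(software_tag: Optional[str]) -> Optional[str]:
    """Return the Software tag if it names a known editing tool, else None."""
    if not software_tag:
        return None
    lowered = software_tag.lower()
    return software_tag if any(tool in lowered for tool in EDITING_TOOLS) else None
```

On phones the Software tag usually holds the OS version, so a substring match against known editors avoids flagging every image that merely has the tag populated.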
Metadata Absence / Strip Detection
If no EXIF group is found at all, raise NO_EXIF flag. If GPSInfo exists but all coordinates are zero, raise EXIF_STRIPPED. These patterns indicate deliberate metadata removal before submission.
Modern social media platforms auto-strip EXIF; if the claimant claims to have downloaded their own photo from Facebook, EXIF absence is explainable — context matters.
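The two absence rules can be sketched over a group-prefixed tag dictionary. The function name is hypothetical, and the sketch assumes keys of the form `EXIF:...` and `Composite:...` with numeric GPS values, as produced by ExifTool when grouped, numeric JSON output is requested.

```python
from typing import Any, Dict, List


def absence_flags(tags: Dict[str, Any]) -> List[str]:
    """Return NO_EXIF or EXIF_STRIPPED when the metadata looks removed or zeroed."""
    has_exif = any(key.startswith("EXIF:") for key in tags)
    if not has_exif:
        return ["NO_EXIF"]

    lat = tags.get("Composite:GPSLatitude")
    lon = tags.get("Composite:GPSLongitude")
    # GPS tags exist but both coordinates are exactly zero.
    if lat is not None and lon is not None and float(lat) == 0.0 and float(lon) == 0.0:
        return ["EXIF_STRIPPED"]
    return []
```

An image with EXIF but no GPS tags at all returns no flag here, since many cameras have no GPS receiver and location tagging is often disabled by the user.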
Risk Score Aggregation & Verdict
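Each flag contributes a weighted score: GPS_MISMATCH=0.45, TIMESTAMP_MISMATCH=0.35, DEVICE_MISMATCH=0.15, NO_EXIF=0.25, EXIF_STRIPPED=0.40. The weights of all triggered flags are summed and the total is clipped to 1.0. Verdict: INCONCLUSIVE if NO_EXIF is the only flag raised; otherwise FLAG if score >= 0.20 and PASS if score < 0.20. The INCONCLUSIVE rule has to be evaluated first, because NO_EXIF alone scores 0.25 and would otherwise be reported as FLAG.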
Weights are configurable via environment variables; these defaults were calibrated on 12,000 historical claims.
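A sketch of the aggregation using the default weights listed above. The names `DEFAULT_WEIGHTS`, `FLAG_THRESHOLD`, and `aggregate` are assumptions, and the weights are hard-coded because the environment variable names are not given. Checking INCONCLUSIVE before the threshold is one reading of the verdict rule.

```python
from typing import Dict, List, Tuple

DEFAULT_WEIGHTS: Dict[str, float] = {
    "GPS_MISMATCH": 0.45,
    "TIMESTAMP_MISMATCH": 0.35,
    "DEVICE_MISMATCH": 0.15,
    "NO_EXIF": 0.25,
    "EXIF_STRIPPED": 0.40,
}
FLAG_THRESHOLD = 0.20


def aggregate(flags: List[str],
              weights: Dict[str, float] = DEFAULT_WEIGHTS) -> Tuple[float, str]:
    """Sum the weights of the triggered flags, clip to 1.0, and derive a verdict."""
    total = sum((weights[flag] for flag in flags), 0.0)
    score = round(min(1.0, total), 2)

    # Checked first: NO_EXIF alone scores 0.25, which is above the FLAG threshold.
    if set(flags) == {"NO_EXIF"}:
        return score, "INCONCLUSIVE"
    if score >= FLAG_THRESHOLD:
        return score, "FLAG"
    return score, "PASS"
```

For instance, `aggregate(["GPS_MISMATCH", "TIMESTAMP_MISMATCH"])` sums 0.45 and 0.35 and yields a FLAG verdict, while `aggregate(["NO_EXIF"])` yields INCONCLUSIVE despite its 0.25 score.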
Thinking Tree
- Root Question: Is the submitted image authentic evidence of the declared incident?
  - Does EXIF data exist?
    - Yes: proceed to validation checks
      - GPS coordinates present?
        - GPS within 2 km of declared location → PASS (GPS)
        - GPS mismatch > 2 km → FLAG GPS_MISMATCH
      - DateTimeOriginal present?
        - Timestamp within 1 h of declared → PASS (time)
        - Timestamp delta > 1 h → FLAG TIMESTAMP_MISMATCH
      - Device model declared by claimant?
        - EXIF model matches declared → PASS (device)
        - EXIF model differs → FLAG DEVICE_MISMATCH
    - No EXIF at all → FLAG NO_EXIF (INCONCLUSIVE)
    - EXIF present but GPS zeroed → FLAG EXIF_STRIPPED
Decision Tree
- Does the file contain any EXIF metadata? If not → INCONCLUSIVE: No EXIF found; possible social-media download or deliberate strip
- Are GPS coordinates present and non-zero? If not → FLAG EXIF_STRIPPED: GPS zeroed; likely deliberate metadata removal
- GPS distance from declared location ≤ 2 km? If not → FLAG GPS_MISMATCH: Photo taken far from declared incident scene
- DateTimeOriginal delta from declared time ≤ 1 h? If not → FLAG TIMESTAMP_MISMATCH: Photo predates or postdates declared incident
- Device model consistent with registered equipment? If not → FLAG DEVICE_MISMATCH: Image captured by unregistered device
- All checks answered yes → PASS: All metadata checks consistent with declared incident
Technical Design
Architecture
AGT-FOR-001 is a stateless synchronous microservice built with FastAPI. Each request is fully self-contained: the file and declared metadata arrive together, all processing happens in-memory, and the response is returned before the HTTP connection closes. ExifTool runs as a subprocess (not a persistent daemon) so there is no shared state between requests. The agent is horizontally scalable behind a load balancer.
Components
| Component | Role | Technology |
|---|---|---|
| MagicByteValidator | Confirms true file type from binary signature | python-magic / imghdr (stdlib; removed in Python 3.13) |
| ExifToolBridge | Shells out to ExifTool and parses JSON output | subprocess + ExifTool 12.x |
| PillowFallback | Secondary EXIF read when ExifTool unavailable | Pillow 10.x Image.getexif() |
| GPSValidator | Computes geodesic distance between two coordinate pairs | geopy.distance.geodesic (WGS-84) |
| TimestampValidator | Parses EXIF datetime strings and computes delta | Python datetime + pytz |
| DeviceValidator | Normalises and compares Make/Model tags | Python string ops |
| RiskAggregator | Weights flags into a 0–1 risk score | Pure Python arithmetic |
| EvidenceChainBuilder | Constructs human-readable audit log entries | Python f-strings + list builder |
Architecture Diagram
┌─────────────────────────────┐
│ POST /analyze (file + │
│ declared metadata) │
└──────────────┬──────────────┘
│
▼
┌─────────────────────────────┐
│ MagicByteValidator │
│ (reject format spoofing) │
└──────────────┬──────────────┘
│
▼
┌─────────────────────────────┐
│ ExifToolBridge │
│ exiftool -j -G <file> │
└──────┬─────────┬────────────┘
│ │
▼ ▼
┌──────────┐ ┌──────────────┐
│ GPS │ │ Timestamp │
│Validator │ │ Validator │
└──────┬───┘ └──────┬───────┘
│ │
└─────┬──────┘
│
▼
┌──────────────────┐
│ DeviceValidator │
└────────┬─────────┘
│
▼
┌──────────────────────┐
│ RiskAggregator + │
│ EvidenceChainBuilder │
└──────────┬───────────┘
│
▼
JSON verdict response
Data Flow