Reverse Image Search
AGT-FOR-006 detects stolen, recycled, or internet-sourced images that are fraudulently submitted as genuine claim evidence. The agent computes a perceptual hash of each submitted image and queries multiple reverse image search services in parallel: the Google Vision API, the TinEye API, and a proprietary internal hash database of previously seen claim images. It also checks whether the image, or a visually near-identical copy, already exists publicly on the internet (news sites, auto dealer listings, social media). A claimant submitting a publicly available photo of a vehicle fire as evidence of their own incident is a definitive fraud indicator. The agent returns all discovered source URLs, similarity scores, and the earliest known publication date to establish an unambiguous prior-existence timeline.
Tech Stack
Input
A single image file and optional context about the claimed incident for relevance filtering.
Accepted Formats
Fields
| Name | Type | Req | Description |
|---|---|---|---|
| image_file | binary | Yes | Raw image bytes to search for |
| claim_id | string | Yes | Claim ID for tracking and internal hash cross-reference |
| incident_type | string | No | Incident category for result relevance scoring (e.g. motor, fire, flood) |
| max_results | int | No | Maximum number of matching URLs to return per source (default: 10) |
Output
Match results from all search sources, the computed image hashes, and a final verdict on whether the image is original or recycled.
Format: JSON
Fields
| Name | Type | Description |
|---|---|---|
| phash | string | 64-bit perceptual hash (hex) of the submitted image |
| internal_match | object \| null | If found in internal database: {claim_id, submission_date, similarity_pct} |
| google_matches | array<object> | Google Vision web detection results: {url, title, score, first_seen} |
| tineye_matches | array<object> | TinEye results: {url, crawl_date, score, image_url} |
| earliest_known_date | string \| null | ISO-8601 date of the earliest known web publication of this image |
| flags | array<string> | FLAG_INTERNET_SOURCE, FLAG_DUPLICATE_CLAIM, FLAG_STOCK_PHOTO, FLAG_NEWS_ARTICLE |
| risk_score | float | Normalised risk contribution 0.0–1.0 |
| verdict | string | PASS \| FLAG \| INCONCLUSIVE |
Example Response
{
  "phash": "f8e4c2a0b6d4e8f0",
  "internal_match": null,
  "google_matches": [
    {"url": "https://vnexpress.net/...", "title": "Xe bốc cháy trên cao tốc", "score": 0.98, "first_seen": "2023-03-12"}
  ],
  "tineye_matches": [],
  "earliest_known_date": "2023-03-12",
  "flags": ["FLAG_INTERNET_SOURCE", "FLAG_NEWS_ARTICLE"],
  "risk_score": 0.97,
  "verdict": "FLAG"
}
How It Works
AGT-FOR-006 operates on the forensic principle that a genuine incident photo is unique — it has never appeared anywhere on the internet before the moment of submission. Any prior appearance on the web, regardless of where, indicates the image was not taken at the claimant's incident.
The agent's first layer of defence is the internal claim image database. Every image submitted to any claim is hashed and indexed. Before querying expensive external APIs, the agent checks whether this hash (or a close match) already exists in a previous claim — the most direct form of duplicate fraud detection.
The second layer uses external reverse image search services in parallel (via aiohttp async calls) to query both Google Vision API and TinEye. These services have indexed billions of web pages and can find an image even if it has been resized, recompressed, cropped, or watermarked.
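The concurrent fan-out can be sketched with asyncio.gather. The stub coroutines below are hypothetical stand-ins for the real aiohttp-backed Google Vision and TinEye clients and return canned data; only the concurrency pattern is the point here:

```python
import asyncio

# Hypothetical stub clients: real versions would make aiohttp calls to the
# Google Vision and TinEye APIs. Shown only to illustrate the fan-out.
async def query_google_vision(image_bytes: bytes) -> list[dict]:
    await asyncio.sleep(0)  # placeholder for the network round-trip
    return [{"url": "https://example.com/a.jpg", "score": 0.97}]

async def query_tineye(image_bytes: bytes) -> list[dict]:
    await asyncio.sleep(0)  # placeholder for the network round-trip
    return []

async def search_all(image_bytes: bytes) -> dict:
    # Both external services are queried concurrently, so total latency is
    # bounded by the slowest service rather than the sum of both.
    google, tineye = await asyncio.gather(
        query_google_vision(image_bytes),
        query_tineye(image_bytes),
    )
    return {"google_matches": google, "tineye_matches": tineye}
```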
The critical insight is the temporal analysis: if the earliest known publication of the image predates the claimed incident, the image cannot be genuine evidence of that incident. For example, a claimant submitting a photo of a car fire that appeared in a newspaper three months ago cannot claim that photo as evidence of their own incident today.
The agent also detects stock photo usage (images from Getty Images or Shutterstock commonly appear in fraudulent claims) and social media reuse (an image posted to Facebook or Twitter before the incident).
All matches are returned with URLs, dates, and similarity scores, providing adjudicators with direct evidence they can verify independently.
Thinking Steps
Image Preprocessing & Hash Computation
Load the image with Pillow, convert to grayscale, resize to 64×64 for hashing. Compute three hash types: pHash (DCT-based, robust to compression), dHash (difference hash, fast), and wHash (wavelet hash, robust to blur). Store all three for multi-algorithm matching.
Using multiple hash algorithms reduces both false positives and false negatives: pHash catches resized/recompressed copies, wHash catches blurred or watermarked versions.
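As an illustration of the difference-hash step, here is a pure-Python dHash over a grayscale pixel grid. It mirrors the compare-adjacent-pixels scheme that imagehash.dhash applies after a Pillow resize, with a simple box downsample standing in for Pillow's resampling:

```python
def dhash(pixels: list[list[int]], hash_size: int = 8) -> int:
    """Difference hash over a grayscale grid (rows of 0-255 values).

    The grid is box-downsampled to (hash_size+1) x hash_size, then each
    cell is compared to its right-hand neighbour: brighter -> 1, else 0.
    With the default hash_size=8 this yields a 64-bit hash.
    """
    h, w = len(pixels), len(pixels[0])
    tw, th = hash_size + 1, hash_size

    def cell(r: int, c: int) -> float:
        # Average the block of source pixels mapping to target cell (r, c).
        r0, r1 = r * h // th, max(r * h // th + 1, (r + 1) * h // th)
        c0, c1 = c * w // tw, max(c * w // tw + 1, (c + 1) * w // tw)
        vals = [pixels[i][j] for i in range(r0, r1) for j in range(c0, c1)]
        return sum(vals) / len(vals)

    bits = 0
    for r in range(th):
        for c in range(hash_size):
            bits = (bits << 1) | (1 if cell(r, c) > cell(r, c + 1) else 0)
    return bits
```

Because the hash is computed on a heavily downsampled grid, resizing or recompressing the source image leaves most bits unchanged, which is exactly why near-duplicates land within a small Hamming distance of each other.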
Internal Database Cross-Reference
Query the Redis hash index (built from all previously submitted claim images) using Hamming distance ≤ 10 bits as the similarity threshold. A match here means this exact (or near-identical) image was previously submitted in another claim — a definitive duplicate fraud signal.
The Redis sorted set structure enables O(log N) approximate nearest-neighbour search on the hash space.
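A minimal sketch of the Hamming-distance matching, with a plain dict standing in for the Redis index. A linear scan is shown for clarity; production code would first prune candidates via the sorted-set lookup:

```python
def hamming(a: int, b: int) -> int:
    # Number of differing bits between two 64-bit perceptual hashes.
    return bin(a ^ b).count("1")

def find_internal_matches(phash: int, index: dict[str, int],
                          max_distance: int = 10) -> list[tuple[str, int]]:
    """Return (claim_id, distance) pairs within the Hamming threshold,
    closest first. `index` maps claim IDs to previously stored hashes."""
    hits = [(cid, hamming(phash, h)) for cid, h in index.items()]
    return sorted(((c, d) for c, d in hits if d <= max_distance),
                  key=lambda t: t[1])
```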
Google Vision Web Detection
Submit the image to Google Cloud Vision API's webDetection feature and extract: fullMatchingImages (exact copies on the web), partialMatchingImages (cropped or partially matching), visuallySimilarImages, and webEntities. Record all URLs and their first-crawled dates.
Google's webDetection often finds images that TinEye misses because Google crawls more recently indexed pages.
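A sketch of flattening that payload into uniform match records. The input is assumed to be a plain dict following the shape of Vision's WebDetection message (fullMatchingImages, partialMatchingImages, pagesWithMatchingImages), not the SDK's protobuf object:

```python
def extract_web_matches(web_detection: dict, max_results: int = 10) -> list[dict]:
    """Flatten a webDetection-shaped dict into {url, kind} records,
    exact and partial image matches first, then hosting pages."""
    matches = []
    for kind in ("fullMatchingImages", "partialMatchingImages"):
        for item in web_detection.get(kind, []):
            matches.append({"url": item.get("url"), "kind": kind})
    for page in web_detection.get("pagesWithMatchingImages", []):
        matches.append({"url": page.get("url"), "kind": "page"})
    return matches[:max_results]
```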
TinEye Reverse Search
Submit the image to TinEye's REST API, which specialises in exact and near-duplicate matching across its index of over 60 billion images. TinEye returns the match count, URLs, and crawl dates. Its strength is finding images that have been slightly cropped or recoloured.
TinEye excels at finding recycled stock photos, which are a common source for fraudulent 'damage' photos.
Timeline Analysis
Extract the earliest known publication date from all discovered URLs. If the earliest publication date predates the claimed incident date, this proves the image cannot be an original photo of the incident.
A delta of even one day before the incident is conclusive evidence of image recycling.
Source Classification & Flag Assignment
Classify discovered sources: news articles → FLAG_NEWS_ARTICLE, stock photo sites (Shutterstock, Getty) → FLAG_STOCK_PHOTO, previous claims in internal DB → FLAG_DUPLICATE_CLAIM, any web source → FLAG_INTERNET_SOURCE. Risk score increases with the number of web matches and their similarity.
A single exact match on a news article from before the incident date is sufficient for a high-confidence FLAG regardless of other signals.
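A simplified sketch of the domain-based classification. The domain sets here are illustrative stand-ins for the agent's real allowlists:

```python
from urllib.parse import urlparse

# Illustrative stand-ins for the production domain allowlists.
STOCK_DOMAINS = {"shutterstock.com", "gettyimages.com", "istockphoto.com"}
NEWS_DOMAINS = {"vnexpress.net", "bbc.com", "reuters.com"}

def classify_source(url: str) -> str:
    """Map a match URL to the most specific flag its domain supports;
    any other web source falls through to FLAG_INTERNET_SOURCE."""
    host = urlparse(url).netloc.lower()
    domain = ".".join(host.split(".")[-2:])  # strip subdomains like "www."
    if domain in STOCK_DOMAINS:
        return "FLAG_STOCK_PHOTO"
    if domain in NEWS_DOMAINS:
        return "FLAG_NEWS_ARTICLE"
    return "FLAG_INTERNET_SOURCE"
```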
Thinking Tree
- Root Question: Is this image original and unpublished before the claimed incident?
  - Check internal claim database
    - Hash match found in previous claim → FLAG_DUPLICATE_CLAIM
    - No internal match → proceed to web search
  - Google Vision Web Detection
    - Exact match found on web
      - Publication date before incident → FLAG_INTERNET_SOURCE (high confidence)
      - Publication date after incident → weak signal, note only
    - No exact web match → check TinEye
  - TinEye near-duplicate search
    - Stock photo site match → FLAG_STOCK_PHOTO
    - News article match → FLAG_NEWS_ARTICLE
    - No match on any service → PASS
Decision Tree
1. Does the image hash match any previous claim in the internal DB?
   Yes → FLAG — DUPLICATE_CLAIM: Same image used in a previous claim
2. Does Google Vision find an exact or near-duplicate web match?
3. Is the earliest known publication date before the incident date?
   Yes to both → FLAG — INTERNET_SOURCE: Image was published online before the incident
4. Does TinEye find a match on a stock photo site?
   Yes → FLAG — STOCK_PHOTO: Image sourced from commercial stock photo library
5. Does TinEye find a match on any news or media site?
   Yes → FLAG — NEWS_ARTICLE: Image found in news/media coverage unrelated to claimant
All No → PASS: No prior web publication found; image appears original
Technical Design
Architecture
AGT-FOR-006 is an async FastAPI microservice. All three search operations (internal DB, Google Vision, TinEye) run concurrently via asyncio.gather to minimise latency. The internal Redis hash index enables sub-millisecond duplicate detection before expensive external API calls. Total p95 latency is approximately 3–5 seconds depending on Google Vision response time.
Components
| Component | Role | Technology |
|---|---|---|
| HashComputer | Computes pHash, dHash, wHash from image | imagehash 4.x + Pillow |
| InternalDBChecker | Queries Redis hash index with Hamming distance filter | Redis ZRANGEBYSCORE + Python bitcount |
| GoogleVisionClient | Calls Google Cloud Vision webDetection endpoint | google-cloud-vision Python SDK |
| TinEyeClient | Calls TinEye REST API | aiohttp + TinEye API v2 |
| TimelineAnalyser | Extracts and compares publication dates | Python datetime + dateparser |
| SourceClassifier | Categorises match URLs into source types | URL pattern matching + domain allowlist |
| ResultAggregator | Merges results from all sources into unified verdict | Pure Python |
Architecture Diagram
┌──────────────────────────────┐
│ POST /analyze (image + │
│ claim_id) │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ HashComputer │
│ (pHash + dHash + wHash) │
└──────┬───────────────────────┘
│
├──────────────────────┐
▼ ▼
┌──────────────┐ ┌──────────────────────┐
│InternalDB │ │ Async API Calls │
│Checker │ │ ┌─────────────────┐ │
│(Redis pHash) │ │ │ GoogleVisionCli │ │
└──────┬───────┘ │ └────────┬────────┘ │
│ │ │ │
│ │ ┌────────▼────────┐ │
│ │ │ TinEyeClient │ │
│ │ └────────┬────────┘ │
│ └──────────┼───────────┘
│ │
└──────────┬───────────┘
│
▼
┌────────────────────────┐
│ TimelineAnalyser + │
│ SourceClassifier │
└──────────┬─────────────┘
│
▼
┌────────────────────────┐
│ ResultAggregator │
└────────────────────────┘
Data Flow