Sentinel Core

AGT-FOR-006 | Digital Forensics | AI-based

Reverse Image Search

AGT-FOR-006 detects stolen, recycled, or internet-sourced images that are fraudulently submitted as genuine claim evidence. The agent computes a perceptual hash of each submitted image and queries multiple reverse image search services in parallel: the Google Vision API, the TinEye API, and a proprietary internal hash database of previously seen claim images. It also checks whether the image, or a visually near-identical copy, already exists publicly on the internet (news sites, auto dealer listings, social media). A claimant submitting a publicly available photo of a vehicle fire as evidence of their own incident is a definitive fraud indicator. The agent returns all discovered source URLs, similarity scores, and the earliest known publication date to establish an unambiguous prior-existence timeline.

Tech Stack

Python 3.11 | Runtime
Google Cloud Vision API | Web detection and reverse image search
TinEye API | Exact and near-duplicate web search
imagehash 4.x | Perceptual hash (pHash, dHash, wHash) computation
Pillow 10.x | Image loading and preprocessing
aiohttp 3.x | Async HTTP client for parallel API calls
Redis 7.x | Internal hash database for previously seen claim images
PostgreSQL 15 | Persistent storage of image hash index

Input

A single image file and optional context about the claimed incident for relevance filtering.

Accepted Formats

JPEG, PNG, WEBP, GIF

Fields

Name | Type | Req | Description
image_file | binary | Yes | Raw image bytes to search for
claim_id | string | Yes | Claim ID for tracking and internal hash cross-reference
incident_type | string | No | Incident category for result relevance scoring (e.g. motor, fire, flood)
max_results | int | No | Maximum number of matching URLs to return per source (default: 10)

Output

Match results from all search sources, the computed image hashes, and a final verdict on whether the image is original or recycled.

Format: JSON

Fields

Name | Type | Description
phash | string | 64-bit perceptual hash (hex) of the submitted image
internal_match | object or null | If found in internal database: {claim_id, submission_date, similarity_pct}
google_matches | array&lt;object&gt; | Google Vision web detection results: {url, title, score, first_seen}
tineye_matches | array&lt;object&gt; | TinEye results: {url, crawl_date, score, image_url}
earliest_known_date | string or null | ISO-8601 date of the earliest known web publication of this image
flags | array&lt;string&gt; | FLAG_INTERNET_SOURCE, FLAG_DUPLICATE_CLAIM, FLAG_STOCK_PHOTO, FLAG_NEWS_ARTICLE
risk_score | float | Normalised risk contribution, 0.0–1.0
verdict | string | PASS, FLAG, or INCONCLUSIVE

Example Response

{
  "phash": "f8e4c2a0b6d4e8f0",
  "internal_match": null,
  "google_matches": [
    {"url": "https://vnexpress.net/...", "title": "Car catches fire on the expressway", "score": 0.98, "first_seen": "2023-03-12"}
  ],
  "tineye_matches": [],
  "earliest_known_date": "2023-03-12",
  "flags": ["FLAG_INTERNET_SOURCE", "FLAG_NEWS_ARTICLE"],
  "risk_score": 0.97,
  "verdict": "FLAG"
}

How It Works

AGT-FOR-006 operates on the forensic principle that a genuine incident photo is unique — it has never appeared anywhere on the internet before the moment of submission. Any prior appearance on the web, regardless of where, indicates the image was not taken at the claimant's incident.

The agent's first layer of defence is the internal claim image database. Every image submitted to any claim is hashed and indexed. Before querying expensive external APIs, the agent checks whether this hash (or a close match) already exists in a previous claim — the most direct form of duplicate fraud detection.

The second layer uses external reverse image search services in parallel (via aiohttp async calls) to query both Google Vision API and TinEye. These services have indexed billions of web pages and can find an image even if it has been resized, recompressed, cropped, or watermarked.

The critical insight is the temporal analysis: if the earliest known publication of the image predates the claimed incident, the image cannot be genuine evidence of that incident. For example, a claimant submitting a photo of a car fire that appeared in a newspaper three months ago cannot claim that photo as evidence of their own incident today.

The agent also detects stock photo usage (images from Getty Images or Shutterstock commonly appear in fraudulent claims) and social media reuse (an image posted to Facebook or Twitter before the incident).

All matches are returned with URLs, dates, and similarity scores, providing adjudicators with direct evidence they can verify independently.

Thinking Steps

Step 1: Image Preprocessing & Hash Computation

Load the image with Pillow, convert to grayscale, resize to 64×64 for hashing. Compute three hash types: pHash (DCT-based, robust to compression), dHash (difference hash, fast), and wHash (wavelet hash, robust to blur). Store all three for multi-algorithm matching.

Using multiple hash algorithms reduces both false positives and false negatives: pHash catches resized/recompressed copies, wHash catches blurred or watermarked versions.

Step 2: Internal Database Cross-Reference

Query the Redis hash index (built from all previously submitted claim images) using Hamming distance ≤ 10 bits as the similarity threshold. A match here means this exact (or near-identical) image was previously submitted in another claim — a definitive duplicate fraud signal.

The Redis sorted set index supports O(log N) range lookups (ZRANGEBYSCORE); candidate hashes inside the range are then filtered by exact Hamming distance in Python.
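The Hamming-distance filter itself reduces to an XOR and a bit count. A minimal sketch (the function names here are illustrative; the production component is the InternalDBChecker listed under Components):

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two 64-bit hex hash strings."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")

def is_near_duplicate(hash_a: str, hash_b: str, threshold: int = 10) -> bool:
    """Apply the <= 10-bit similarity threshold described above."""
    return hamming_distance(hash_a, hash_b) <= threshold
```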

Step 3: Google Vision Web Detection

Submit the image to the Google Cloud Vision API's webDetection feature, which returns fullMatchingImages (exact copies on the web), partialMatchingImages (cropped or partially matching copies), visuallySimilarImages, and webEntities. Record all URLs and their first-crawled dates.

Google's webDetection often finds images that TinEye misses because Google crawls more recently indexed pages.
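As a hedged sketch, the relevant part of a webDetection response can be reduced to the match shape used in the output. The field names (fullMatchingImages, url, score) follow the Vision API's JSON response, but the payload below is a mock rather than a live API call:

```python
def extract_full_matches(web_detection: dict, max_results: int = 10) -> list[dict]:
    """Pull URL and confidence score from fullMatchingImages entries."""
    return [
        {"url": img.get("url"), "score": img.get("score")}
        for img in web_detection.get("fullMatchingImages", [])[:max_results]
    ]

# Mock payload shaped like a Vision webDetection response
mock = {
    "fullMatchingImages": [{"url": "https://example.com/fire.jpg", "score": 0.98}],
    "partialMatchingImages": [],
}
print(extract_full_matches(mock))  # [{'url': 'https://example.com/fire.jpg', 'score': 0.98}]
```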

Step 4: TinEye Reverse Search

Submit the image to TinEye's REST API, which specialises in exact and near-duplicate matching across its index of over 60 billion images. TinEye returns the match count, URLs, and crawl dates. Its strength is finding images that have been slightly cropped or recoloured.

TinEye excels at finding recycled stock photos, which are a common source for fraudulent 'damage' photos.
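A sketch of normalising TinEye match records into the tineye_matches output shape defined earlier. The input fields (backlinks, backlink, crawl_date, score, image_url) follow TinEye's API v2 match objects, but the record used below is a mock, not a live response:

```python
def normalise_tineye_matches(raw_matches: list[dict]) -> list[dict]:
    """Flatten TinEye match/backlink records into {url, crawl_date, score, image_url}."""
    out = []
    for match in raw_matches:
        for link in match.get("backlinks", []):
            out.append({
                "url": link.get("backlink"),          # page where the image appears
                "crawl_date": link.get("crawl_date"),
                "score": match.get("score"),
                "image_url": match.get("image_url"),
            })
    return out
```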

Step 5: Timeline Analysis

Extract the earliest known publication date from all discovered URLs. If the earliest publication date predates the claimed incident date, this proves the image cannot be an original photo of the incident.

A delta of even one day before the incident is conclusive evidence of image recycling.
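The timeline check reduces to a single date comparison. A minimal sketch, assuming ISO-8601 date strings as in the earliest_known_date output field:

```python
from datetime import date

def predates_incident(publication_dates: list[str], incident_date: str) -> bool:
    """True when the earliest known web publication precedes the claimed incident."""
    if not publication_dates:
        return False  # nothing found on the web: no timeline evidence
    earliest = min(date.fromisoformat(d) for d in publication_dates)
    return earliest < date.fromisoformat(incident_date)
```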

Step 6: Source Classification & Flag Assignment

Classify discovered sources: news articles → FLAG_NEWS_ARTICLE, stock photo sites (Shutterstock, Getty) → FLAG_STOCK_PHOTO, previous claims in internal DB → FLAG_DUPLICATE_CLAIM, any web source → FLAG_INTERNET_SOURCE. Risk score increases with the number of web matches and their similarity.

A single exact match on a news article from before the incident date is sufficient for a high-confidence FLAG regardless of other signals.
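A hedged sketch of the flag assignment. The domain sets here are small illustrative stand-ins for the domain allowlist named in the Components table:

```python
from urllib.parse import urlparse

STOCK_DOMAINS = {"shutterstock.com", "gettyimages.com"}  # illustrative subset
NEWS_DOMAINS = {"vnexpress.net", "bbc.com"}              # illustrative subset

def classify_sources(match_urls: list[str], internal_match: bool) -> list[str]:
    """Map discovered sources to the flags defined in the Output section."""
    flags = set()
    if internal_match:
        flags.add("FLAG_DUPLICATE_CLAIM")
    for url in match_urls:
        domain = urlparse(url).netloc.removeprefix("www.")
        flags.add("FLAG_INTERNET_SOURCE")  # any web source at all
        if domain in STOCK_DOMAINS:
            flags.add("FLAG_STOCK_PHOTO")
        if domain in NEWS_DOMAINS:
            flags.add("FLAG_NEWS_ARTICLE")
    return sorted(flags)
```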

Thinking Tree

  • Root Question: Is this image original and unpublished before the claimed incident?
    • Check internal claim database
      • Hash match found in previous claim → FLAG_DUPLICATE_CLAIM
      • No internal match — proceed to web search
    • Google Vision Web Detection
      • Exact match found on web
        • Publication date before incident → FLAG_INTERNET_SOURCE (high confidence)
        • Publication date after incident → weak signal, note only
      • No exact web match — check TinEye
    • TinEye near-duplicate search
      • Stock photo site match → FLAG_STOCK_PHOTO
      • News article match → FLAG_NEWS_ARTICLE
      • No match on any service → PASS

Decision Tree

d1. Does image hash match any previous claim in internal DB?
    Yes → flag_dup | No → d2

d2. Google Vision finds exact or near-duplicate web match?
    Yes → d3 | No → d4

d3. Earliest known publication date before incident date?
    Yes → flag_internet | No → d4

d4. TinEye finds match on stock photo site?
    Yes → flag_stock | No → d5

d5. TinEye finds match on any news or media site?
    Yes → flag_news | No → pass

flag_dup: FLAG — DUPLICATE_CLAIM: Same image used in a previous claim
flag_internet: FLAG — INTERNET_SOURCE: Image was published online before the incident
flag_stock: FLAG — STOCK_PHOTO: Image sourced from commercial stock photo library
flag_news: FLAG — NEWS_ARTICLE: Image found in news/media coverage unrelated to claimant
pass: PASS — No prior web publication found; image appears original

Technical Design

Architecture

AGT-FOR-006 is an async FastAPI microservice. All three search operations (internal DB, Google Vision, TinEye) run concurrently via asyncio.gather to minimise latency. The internal Redis hash index enables sub-millisecond duplicate detection before expensive external API calls. Total p95 latency is approximately 3–5 seconds depending on Google Vision response time.
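The concurrent fan-out can be sketched as follows; the three coroutines are stubs standing in for the real InternalDBChecker, GoogleVisionClient, and TinEyeClient so the shape of the asyncio.gather call is visible:

```python
import asyncio

async def check_internal_db(phash: str) -> dict:
    return {"source": "internal", "match": None}   # stub for the Redis lookup

async def search_google_vision(image: bytes) -> dict:
    return {"source": "google", "matches": []}     # stub for Vision webDetection

async def search_tineye(image: bytes) -> dict:
    return {"source": "tineye", "matches": []}     # stub for the TinEye REST call

async def run_all_searches(phash: str, image: bytes) -> list[dict]:
    # gather() runs all three concurrently, so total latency is bounded
    # by the slowest source rather than the sum of the three.
    return await asyncio.gather(
        check_internal_db(phash),
        search_google_vision(image),
        search_tineye(image),
    )

results = asyncio.run(run_all_searches("f8e4c2a0b6d4e8f0", b"..."))
```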

Components

Component | Role | Technology
HashComputer | Computes pHash, dHash, wHash from image | imagehash 4.x + Pillow
InternalDBChecker | Queries Redis hash index with Hamming distance filter | Redis ZRANGEBYSCORE + Python bitcount
GoogleVisionClient | Calls Google Cloud Vision webDetection endpoint | google-cloud-vision Python SDK
TinEyeClient | Calls TinEye REST API | aiohttp + TinEye API v2
TimelineAnalyser | Extracts and compares publication dates | Python datetime + dateparser
SourceClassifier | Categorises match URLs into source types | URL pattern matching + domain allowlist
ResultAggregator | Merges results from all sources into unified verdict | Pure Python

Architecture Diagram

┌──────────────────────────────┐
│  POST /analyze (image +      │
│  claim_id)                   │
└──────────────┬───────────────┘
               │
               ▼
┌──────────────────────────────┐
│       HashComputer           │
│  (pHash + dHash + wHash)     │
└──────┬───────────────────────┘
       │
       ├──────────────────────┐
       ▼                      ▼
┌──────────────┐   ┌──────────────────────┐
│InternalDB    │   │  Async API Calls      │
│Checker       │   │ ┌─────────────────┐  │
│(Redis pHash) │   │ │ GoogleVisionCli │  │
└──────┬───────┘   │ └────────┬────────┘  │
       │           │          │            │
       │           │ ┌────────▼────────┐  │
       │           │ │  TinEyeClient   │  │
       │           │ └────────┬────────┘  │
       │           └──────────┼───────────┘
       │                      │
       └──────────┬───────────┘
                  │
                  ▼
     ┌────────────────────────┐
     │   TimelineAnalyser +   │
     │   SourceClassifier     │
     └──────────┬─────────────┘
                │
                ▼
     ┌────────────────────────┐
     │    ResultAggregator    │
     └────────────────────────┘

Data Flow

API Gateway → HashComputer | Raw image binary
HashComputer → InternalDBChecker | pHash hex string
HashComputer → GoogleVisionClient | Image bytes (base64)
HashComputer → TinEyeClient | Image bytes (multipart)
GoogleVisionClient → TimelineAnalyser | URL list with crawl dates
TinEyeClient → TimelineAnalyser | Match objects with crawl dates
TimelineAnalyser → SourceClassifier | Dated URL list
SourceClassifier → ResultAggregator | Classified matches with flags
InternalDBChecker → ResultAggregator | Internal match result
ResultAggregator → API Gateway | Full JSON verdict