AGT-BEH-014 Behavioural Analysis AI-based

Identity & Criminal Matcher

AGT-BEH-014 performs facial recognition and biometric identity verification against an insurance fraud blacklist database and public criminal records. The agent extracts faces from submitted ID documents (national ID cards, passports, driver's licences) and self-portrait photos (selfies), then computes 512-dimensional face embeddings with the FaceNet512 and ArcFace models. These embeddings are compared against three sources: a proprietary blacklist of confirmed insurance fraudsters, a shared industry blacklist (cross-insurer data sharing), and publicly accessible criminal conviction databases. A cosine similarity score at or above the configured threshold triggers an identity match alert. The agent also runs liveness detection on the selfie to block photo injection attacks, and cross-validates the selfie face against the portrait printed on the ID document.

Tech Stack

Technology	Role
Python 3.11	Runtime
DeepFace 0.0.79	Unified facial recognition framework: face detection, alignment, embedding
FaceNet512	512-dimensional face embedding model; primary matching engine
ArcFace (InsightFace)	Secondary embedding model for ensemble voting
MTCNN	Multi-task Cascaded CNN for face detection and landmark localisation
OpenCV 4.x	Image preprocessing, face crop normalisation
Faiss 1.7.x	Approximate nearest neighbour search over blacklist embedding database
PostgreSQL 15	Blacklist database with embedding vectors and case metadata

Input

ID document image, selfie/portrait photo, and optional policy claimant data for context.

Accepted Formats

JPEG, PNG

Fields

Name	Type	Required	Description
id_document_image	binary	Yes	National ID, passport, or driver's licence scan
selfie_image	binary	No	Recent self-portrait photo of the claimant for liveness-assured matching
claim_id	string	Yes	Claim ID for audit trail
claimant_id_number	string	No	Declared national ID number for cross-referencing against document OCR
match_threshold	float	No	Cosine similarity threshold for a positive match (default: 0.68)
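
The accepted formats and required fields can be checked before any model runs. The following is a minimal sketch, not the service's actual validation: `sniff_image_format` and `VerifyRequest` are hypothetical names, the field names mirror the table above, and the format check relies only on the standard JPEG and PNG file signatures.

```python
from dataclasses import dataclass
from typing import Optional

# Standard file signatures (magic bytes) for the two accepted formats.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_image_format(data: bytes) -> Optional[str]:
    """Return 'JPEG' or 'PNG' based on the leading bytes, else None."""
    if data.startswith(JPEG_MAGIC):
        return "JPEG"
    if data.startswith(PNG_MAGIC):
        return "PNG"
    return None


@dataclass
class VerifyRequest:
    """Hypothetical request model mirroring the input fields table."""
    id_document_image: bytes
    claim_id: str
    selfie_image: Optional[bytes] = None
    claimant_id_number: Optional[str] = None
    match_threshold: float = 0.68  # documented default

    def validate(self) -> None:
        """Raise ValueError if a required field is missing or malformed."""
        if sniff_image_format(self.id_document_image) is None:
            raise ValueError("id_document_image must be JPEG or PNG")
        if (self.selfie_image is not None
                and sniff_image_format(self.selfie_image) is None):
            raise ValueError("selfie_image must be JPEG or PNG")
        if not self.claim_id:
            raise ValueError("claim_id is required")
        if not 0.0 <= self.match_threshold <= 1.0:
            raise ValueError("match_threshold must be within 0.0-1.0")
```

Rejecting unsupported uploads at this stage keeps malformed input away from the face detector and gives the caller a clear error instead of a FLAG_FACE_NOT_FOUND result.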

Output

Identity verification result, blacklist match details, liveness verdict, and overall risk assessment.

Format:

JSON

Fields

Name	Type	Description
id_face_detected	boolean	Whether a face was successfully detected in the ID document
selfie_face_detected	boolean	Whether a face was detected in the selfie
id_selfie_similarity	float | null	Cosine similarity between ID face and selfie embeddings (0.0–1.0)
id_selfie_match	boolean | null	Whether ID and selfie belong to the same person
blacklist_matches	array<object>	Blacklist hits: {blacklist_id, similarity, blacklist_source, fraud_type, case_reference}
liveness_score	float | null	Anti-spoofing liveness score for selfie (0.0–1.0; null if no selfie)
id_number_ocr_match	boolean | null	Whether OCR-extracted ID number matches declared ID number
flags	array<string>	FLAG_BLACKLIST_HIT, FLAG_IDENTITY_MISMATCH, FLAG_LIVENESS_FAIL, FLAG_ID_NUMBER_MISMATCH, FLAG_FACE_NOT_FOUND
risk_score	float	Normalised risk contribution 0.0–1.0
verdict	string	PASS | FLAG | INCONCLUSIVE

Example Response

{
  "id_face_detected": true,
  "selfie_face_detected": true,
  "id_selfie_similarity": 0.73,
  "id_selfie_match": true,
  "blacklist_matches": [
    {
      "blacklist_id": "BL-0047",
      "similarity": 0.91,
      "blacklist_source": "INTERNAL",
      "fraud_type": "staged_motor_claim",
      "case_reference": "CLM-MT-2023-0088"
    }
  ],
  "liveness_score": 0.96,
  "id_number_ocr_match": true,
  "flags": ["FLAG_BLACKLIST_HIT"],
  "risk_score": 0.96,
  "verdict": "FLAG"
}

How It Works

Insurance fraud is often a repeat enterprise. Once a fraudster has successfully extracted a payout, they or their associates frequently attempt the same scheme again — sometimes with the same identity, sometimes with slight variations. Maintaining a cross-insurer blacklist and matching new claimants against it closes this repeat-offender loophole.

AGT-BEH-014 operates in two verification modes simultaneously. The first mode is identity verification: confirming that the selfie matches the ID document. This catches identity theft where someone files a claim under another person's name. The second mode is blacklist matching: checking whether the face in the submitted documents matches any known fraudster.

The use of dual embedding models (FaceNet512 and ArcFace) for ensemble voting is a deliberate design choice. Face recognition models fail in specific ways — lighting conditions, aging, partial occlusion — that are not perfectly correlated between models. Requiring both models to agree reduces false positives without significantly reducing sensitivity.

The liveness detection component addresses a specific attack vector: photograph injection, where a fraudster submits a photo of their accomplice (who is not on the blacklist) rather than a genuine selfie of themselves. The anti-spoofing model detects the texture and depth cues that distinguish a live face capture from a photograph of a photograph.

The Faiss approximate nearest-neighbour index makes blacklist matching fast enough for real-time use. Rather than computing exact cosine similarities against all 500,000 blacklist entries, Faiss uses a Hierarchical Navigable Small World (HNSW) graph index to find approximate nearest neighbours in milliseconds.

A positive blacklist match is the most serious flag in the entire fraud detection system and triggers immediate escalation to the special investigation unit.

Thinking Steps

Face Detection & Alignment

Run MTCNN on both the ID document image and selfie to detect face bounding boxes and 5-point facial landmarks (eye centres, nose, mouth corners). Use the landmarks to align the face to a canonical pose (eyes horizontal, nose centred) for consistent embedding computation.
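
The "eyes horizontal" part of the alignment comes down to measuring the tilt of the line through the two eye centres and rotating by the opposite angle. The sketch below shows that geometry on landmark points only; the function names are hypothetical, and `left_eye` is assumed to mean the eye with the smaller x coordinate in the image. In practice an image library such as OpenCV would apply the same rotation to the pixels.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def eye_roll_angle_degrees(left_eye: Point, right_eye: Point) -> float:
    """Angle of the line through the eye centres relative to horizontal.

    Points are (x, y) in image coordinates, where y grows downwards.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(p: Point, centre: Point, angle_degrees: float) -> Point:
    """Rotate p about centre by -angle_degrees, which levels the eye line."""
    theta = math.radians(-angle_degrees)
    x, y = p[0] - centre[0], p[1] - centre[1]
    return (
        centre[0] + x * math.cos(theta) - y * math.sin(theta),
        centre[1] + x * math.sin(theta) + y * math.cos(theta),
    )


# Usage: level a tilted pair of eye landmarks about their midpoint.
left, right = (30.0, 40.0), (70.0, 60.0)
angle = eye_roll_angle_degrees(left, right)
centre = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
levelled = [rotate_point(p, centre, angle) for p in (left, right)]
```

After the rotation both eye centres share the same y coordinate, which is the canonical pose the embedding models expect.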

MTCNN is used rather than simpler detectors because it explicitly handles the partial face occlusions common in ID document photos (shadows, plastic sheen, hologram interference).

Dual-Model Embedding Computation

Compute face embeddings using both FaceNet512 (512-dim) and ArcFace (512-dim) on the aligned face crops. FaceNet512 was trained on VGGFace2 (3.3M images); ArcFace uses angular margin loss for better inter-class separability. Both embeddings are L2-normalised.

Using two independent models provides ensemble robustness: a positive identification requires both models to agree above threshold, reducing false positive rate.
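
The agreement rule can be sketched as a logical AND over per-model decisions. This is an illustration under assumptions: `ensemble_match` is a hypothetical name, and the ArcFace threshold in the usage lines is a placeholder rather than a calibrated value (only the 0.68 FaceNet512 default comes from this document).

```python
from typing import Dict


def ensemble_match(similarities: Dict[str, float],
                   thresholds: Dict[str, float]) -> bool:
    """Return True only if every model's similarity meets its own threshold."""
    missing = set(thresholds) - set(similarities)
    if missing:
        raise KeyError(f"missing similarity for: {sorted(missing)}")
    return all(similarities[model] >= cutoff
               for model, cutoff in thresholds.items())


# Usage. The ArcFace cut-off below is a hypothetical placeholder.
thresholds = {"FaceNet512": 0.68, "ArcFace": 0.50}
both_agree = ensemble_match({"FaceNet512": 0.91, "ArcFace": 0.77}, thresholds)
one_dissents = ensemble_match({"FaceNet512": 0.91, "ArcFace": 0.40}, thresholds)
```

Keeping a separate threshold per model matters because similarity scores from different embedding spaces are not directly comparable.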

ID / Selfie Cross-Verification

Compute cosine similarity between the ID face embedding and the selfie face embedding. The 0.68 threshold is calibrated for FaceNet512 to achieve a False Accept Rate (FAR) below 0.1% at a True Accept Rate (TAR) of 99.0%. A similarity below the threshold raises FLAG_IDENTITY_MISMATCH.

The similarity threshold must be calibrated per-model. A threshold of 0.68 for FaceNet512 does not apply to other models — this is a common implementation error.
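
Per-model calibration can be illustrated as follows: collect similarity scores for different-person (impostor) pairs under one model, then pick the lowest threshold whose FAR stays within the target. This is a simplified sketch with hypothetical function names and invented toy scores, not the procedure used to derive the 0.68 figure.

```python
from typing import Sequence


def far_at(threshold: float, impostor_scores: Sequence[float]) -> float:
    """False accept rate: share of different-person pairs scoring >= threshold."""
    return sum(s >= threshold for s in impostor_scores) / len(impostor_scores)


def tar_at(threshold: float, genuine_scores: Sequence[float]) -> float:
    """True accept rate: share of same-person pairs scoring >= threshold."""
    return sum(s >= threshold for s in genuine_scores) / len(genuine_scores)


def calibrate_threshold(impostor_scores: Sequence[float],
                        target_far: float,
                        step: float = 0.01) -> float:
    """Smallest threshold on a fixed grid whose FAR does not exceed target_far."""
    steps = int(round(1.0 / step))
    for i in range(steps + 1):
        threshold = round(i * step, 10)
        if far_at(threshold, impostor_scores) <= target_far:
            return threshold
    raise ValueError("no threshold in [0, 1] meets the target FAR")


# Toy scores, invented for illustration only.
impostors = [0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.62, 0.65, 0.66, 0.70]
genuines = [0.90, 0.80, 0.70, 0.60]
cutoff = calibrate_threshold(impostors, target_far=0.1)
```

Running the same routine on scores from a second model would generally yield a different cut-off, which is why a threshold cannot be copied from one model to another.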

Liveness Detection (Anti-Spoofing)

Apply a Silent Face Anti-Spoofing network to the selfie to detect whether it is a live capture or a photo-of-photo attack (pointing a phone at a printed photo or another phone screen). The anti-spoofing model outputs a liveness score. A score below 0.5 raises FLAG_LIVENESS_FAIL.

Photo injection attacks (submitting a pre-existing photo as a selfie) are the most common way fraudsters attempt to bypass facial recognition.

Blacklist Database Search (Faiss ANN)

Run an approximate nearest-neighbour search with Faiss over the blacklist embedding index. The index holds embeddings of previously confirmed insurance fraudsters from the internal blacklist and from other insurers (shared via the industry consortium). Any match with cosine similarity ≥ match_threshold triggers FLAG_BLACKLIST_HIT.

The Faiss HNSW index enables sub-10 ms search over a 500,000-entry blacklist database on CPU.
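
To make clear what the approximate index is approximating, the sketch below runs the exact search in plain Python. It is deliberately not Faiss and would be far too slow at production scale; the function names and the tiny three-dimensional vectors are illustrative assumptions. It shows the key property the index relies on: for L2-normalised embeddings, the dot product equals cosine similarity.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def exact_search(query: Sequence[float],
                 blacklist: Sequence[Tuple[str, Sequence[float]]],
                 match_threshold: float = 0.68,
                 k: int = 5) -> List[Tuple[str, float]]:
    """Exact top-k cosine search; the result an ANN index approximates."""
    q = l2_normalise(query)
    hits = []
    for blacklist_id, embedding in blacklist:
        e = l2_normalise(embedding)
        similarity = sum(a * b for a, b in zip(q, e))  # dot product == cosine
        if similarity >= match_threshold:
            hits.append((blacklist_id, similarity))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits[:k]


# Toy 3-dimensional "embeddings"; real ones are 512-dimensional.
toy_blacklist = [
    ("BL-0001", [1.0, 0.0, 0.0]),
    ("BL-0002", [0.9, 0.1, 0.0]),
    ("BL-0003", [0.0, 1.0, 0.0]),
]
matches = exact_search([1.0, 0.05, 0.0], toy_blacklist)
```

An HNSW index trades a small amount of recall for speed by visiting only a fraction of the entries, so a periodic comparison against an exact search like this one is a common way to measure how many true matches the approximation misses.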

ID Number OCR Verification

Extract the ID number text from the ID document image using Tesseract OCR (with custom national ID font tessdata). Compare against the declared claimant_id_number. A mismatch suggests the ID document has been modified or does not belong to the claimant.

Vietnamese CCCD (citizen identity card) numbers follow a fixed 12-digit format that encodes a province code and a birth-year code, so the number's structure can be validated independently of the declared value.
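
A structural check of this kind might look like the sketch below. The layout used here (3-digit province code, 1-digit gender/century code, 2-digit birth year, 6-digit serial) is stated as an assumption about the CCCD format, the function names are hypothetical, and the province code is not checked against an official list.

```python
import re
from typing import Dict, Optional


def normalise_id_number(raw: str) -> str:
    """Strip spaces, dots and hyphens that OCR or users commonly introduce."""
    return re.sub(r"[\s.\-]", "", raw)


def parse_cccd(raw: str) -> Optional[Dict[str, str]]:
    """Split a 12-digit CCCD number into its assumed parts, or return None."""
    number = normalise_id_number(raw)
    if re.fullmatch(r"[0-9]{12}", number) is None:
        return None
    return {
        "province_code": number[0:3],
        "gender_century_code": number[3],
        "birth_year_suffix": number[4:6],
        "serial": number[6:],
    }


def id_number_ocr_match(ocr_text: str, declared: str) -> bool:
    """Compare OCR output with the declared number after normalisation."""
    return normalise_id_number(ocr_text) == normalise_id_number(declared)
```

Normalising both sides before comparison avoids raising FLAG_ID_NUMBER_MISMATCH for purely cosmetic differences such as spacing.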

Risk Escalation

A FLAG_BLACKLIST_HIT with similarity ≥ 0.85 triggers an immediate CRITICAL escalation (bypassing normal queue) to the anti-fraud investigation team, with the case reference from the blacklist database. This is the only agent in the system that can trigger an immediate policy freeze.

A policy freeze requires human confirmation within 4 hours; otherwise it is lifted automatically, which prevents false positives from blocking legitimate customers for long.
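
The escalation tiers and the 4-hour confirmation window can be sketched as two small functions. The names and the "REVIEW" label are hypothetical; the 0.85 cut-off, the 0.68 default and the 4-hour window come from this document.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CRITICAL_SIMILARITY = 0.85                       # documented escalation cut-off
FREEZE_CONFIRMATION_WINDOW = timedelta(hours=4)  # documented auto-lift window


def blacklist_severity(similarity: float,
                       match_threshold: float = 0.68) -> Optional[str]:
    """Map a blacklist similarity to None, 'REVIEW' or 'CRITICAL'."""
    if similarity < match_threshold:
        return None
    if similarity >= CRITICAL_SIMILARITY:
        return "CRITICAL"
    return "REVIEW"


def freeze_is_active(frozen_at: datetime, now: datetime,
                     confirmed_at: Optional[datetime] = None) -> bool:
    """A freeze persists if confirmed in time; otherwise it lapses after 4 hours."""
    deadline = frozen_at + FREEZE_CONFIRMATION_WINDOW
    if confirmed_at is not None and confirmed_at <= deadline:
        return True
    return now < deadline


# Usage: an unconfirmed freeze is still active after 3 hours, lifted after 5.
t0 = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
still_frozen = freeze_is_active(t0, t0 + timedelta(hours=3))
auto_lifted = not freeze_is_active(t0, t0 + timedelta(hours=5))
```

Deriving the freeze state from timestamps, rather than relying on a scheduled job to clear it, means an unconfirmed freeze expires even if the background task that would lift it fails.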

Thinking Tree

Root Question: Is the claimant's identity legitimate and not on the fraud blacklist?
- Face detection on ID document
  - Face detected → proceed to embedding
  - No face detected → FLAG_FACE_NOT_FOUND
- ID vs selfie cross-verification (if selfie provided)
  - Similarity ≥ 0.68 → same person
  - Similarity < 0.68 → FLAG_IDENTITY_MISMATCH
- Liveness detection on selfie
  - Liveness score ≥ 0.5 → genuine selfie
  - Liveness score < 0.5 → FLAG_LIVENESS_FAIL
- Blacklist database search
  - No match above threshold → PASS
  - Match found (0.68–0.84) → FLAG_BLACKLIST_HIT (review)
  - Match found (≥ 0.85) → FLAG_BLACKLIST_HIT (CRITICAL escalation)
- ID number OCR validation
  - OCR matches declared ID number → PASS
  - OCR mismatch → FLAG_ID_NUMBER_MISMATCH
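
The branches of the tree above can be evaluated independently and their flags gathered into the `flags` array. The sketch below is one possible reading, with assumptions: `collect_flags` and `best_blacklist_similarity` are hypothetical names, null inputs are treated as "check not run", and face-dependent checks are skipped when no ID face is found.

```python
from typing import List, Optional


def collect_flags(id_face_detected: bool,
                  id_selfie_similarity: Optional[float],
                  liveness_score: Optional[float],
                  best_blacklist_similarity: Optional[float],
                  id_number_ocr_match: Optional[bool],
                  match_threshold: float = 0.68,
                  id_selfie_threshold: float = 0.68,
                  liveness_threshold: float = 0.5) -> List[str]:
    """Evaluate each branch independently and gather the flags it raises."""
    flags: List[str] = []
    if not id_face_detected:
        flags.append("FLAG_FACE_NOT_FOUND")
    else:
        # These checks need an embedding of the ID face.
        if (best_blacklist_similarity is not None
                and best_blacklist_similarity >= match_threshold):
            flags.append("FLAG_BLACKLIST_HIT")
        if (id_selfie_similarity is not None
                and id_selfie_similarity < id_selfie_threshold):
            flags.append("FLAG_IDENTITY_MISMATCH")
    if liveness_score is not None and liveness_score < liveness_threshold:
        flags.append("FLAG_LIVENESS_FAIL")
    if id_number_ocr_match is False:  # None means no declared number to compare
        flags.append("FLAG_ID_NUMBER_MISMATCH")
    return flags


# Usage with the values from the example response.
example_flags = collect_flags(True, 0.73, 0.96, 0.91, True)
```

With the example response values this yields a single FLAG_BLACKLIST_HIT, matching the `flags` array shown there.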

Decision Tree

d1: Is a face successfully detected in the ID document image?
    Yes → d2 | No → flag_no_face

d2: Is a selfie provided?
    Yes → d3 | No → d5

d3: ID and selfie cosine similarity ≥ 0.68 (same person)?
    Yes → d4 | No → flag_id_mismatch

d4: Selfie liveness score ≥ 0.50?
    Yes → d5 | No → flag_liveness

d5: Blacklist search returns match ≥ threshold?
    Yes → flag_blacklist | No → d6

d6: OCR-extracted ID number matches declared ID number?
    Yes → pass | No → flag_id_num

flag_no_face: FLAG (FACE_NOT_FOUND). Cannot extract face from submitted documents.

flag_id_mismatch: FLAG (IDENTITY_MISMATCH). Selfie does not match ID document face.

flag_liveness: FLAG (LIVENESS_FAIL). Selfie failed anti-spoofing check; may be a photograph.

flag_blacklist: FLAG (BLACKLIST_HIT). Face matches confirmed insurance fraudster in blacklist database.

flag_id_num: FLAG (ID_NUMBER_MISMATCH). OCR-extracted number differs from claimant's declared number.

pass: PASS. Identity verified, no blacklist match, liveness confirmed.
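
The decision tree can be expressed as a small lookup table that is walked until a terminal node is reached. This is a sketch under assumptions: the node names follow the edge labels above (with `d1` assumed for the root question), the result keys `selfie_provided` and `best_blacklist_similarity` are hypothetical, and a missing selfie is routed straight to the blacklist check because a liveness score only exists for a selfie.

```python
from typing import Callable, Dict, Tuple

# node id -> (predicate, next node if yes, next node if no)
Node = Tuple[Callable[[dict], bool], str, str]

TREE: Dict[str, Node] = {
    "d1": (lambda r: r["id_face_detected"], "d2", "flag_no_face"),
    "d2": (lambda r: r["selfie_provided"], "d3", "d5"),
    "d3": (lambda r: r["id_selfie_similarity"] >= 0.68, "d4", "flag_id_mismatch"),
    "d4": (lambda r: r["liveness_score"] >= 0.50, "d5", "flag_liveness"),
    "d5": (lambda r: r["best_blacklist_similarity"] >= r["match_threshold"],
           "flag_blacklist", "d6"),
    "d6": (lambda r: r["id_number_ocr_match"], "pass", "flag_id_num"),
}


def walk(result: dict, start: str = "d1") -> str:
    """Follow yes/no edges from the start node until a terminal is reached."""
    node = start
    while node in TREE:
        predicate, if_yes, if_no = TREE[node]
        node = if_yes if predicate(result) else if_no
    return node


# Usage with values taken from the example response.
outcome = walk({
    "id_face_detected": True,
    "selfie_provided": True,
    "id_selfie_similarity": 0.73,
    "liveness_score": 0.96,
    "best_blacklist_similarity": 0.91,
    "match_threshold": 0.68,
    "id_number_ocr_match": True,
})
```

Unlike flag collection, this walk stops at the first failing check, so it reports one outcome per request.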

Technical Design

Architecture

AGT-BEH-014 is a synchronous FastAPI microservice, because real-time onboarding requires low latency. All models are loaded into memory at startup. The Faiss HNSW index is loaded from disk and kept in memory. The blacklist database is refreshed daily from the industry consortium data feed. Total request latency is 800–1500 ms for dual-model embedding plus Faiss search.

Components

Component	Role	Technology
MTCNNDetector	Face detection and landmark-based alignment	MTCNN via facenet-pytorch
FaceNet512Embedder	Primary 512-dim face embedding	DeepFace FaceNet512 model
ArcFaceEmbedder	Secondary 512-dim face embedding for ensemble	InsightFace ArcFace (buffalo_l)
LivenessDetector	Anti-spoofing classification on selfie	Silent Face Anti-Spoofing (MiniFASNet)
FaissIndexSearcher	Approximate NN search over blacklist embeddings	Faiss HNSW index (CPU)
IDNumberOCR	Extracts ID number text from document image	Tesseract 5.x + custom CCCD tessdata
EmbeddingStore	Persists new embeddings for future matching	PostgreSQL pgvector extension
EscalationRouter	Triggers immediate alert on critical blacklist hit	FastAPI BackgroundTasks + webhook

Architecture Diagram

┌───────────────────────────────────┐
│  POST /verify                     │
│  (id_document + selfie)           │
└────────────────┬──────────────────┘
                 │
          ┌──────┴──────┐
          ▼             ▼
┌──────────────┐ ┌──────────────────┐
│ ID Document  │ │     Selfie       │
│ MTCNNDetect  │ │  MTCNNDetect +   │
│              │ │  LivenessDetect  │
└──────┬───────┘ └──────┬───────────┘
       │                │
       ▼                ▼
┌──────────────┐ ┌──────────────────┐
│ FaceNet512   │ │  ArcFace         │
│ Embedder     │ │  Embedder        │
└──────┬───────┘ └──────┬───────────┘
       │                │
       └───────┬────────┘
               │ ensemble
               ▼
┌──────────────────────────────┐
│     FaissIndexSearcher       │
│  (blacklist ANN search)      │
└──────────────┬───────────────┘
               │
        ┌──────┴──────┐
        ▼             ▼
┌──────────────┐ ┌──────────────────┐
│ IDNumberOCR  │ │ EscalationRouter │
│              │ │ (if critical)    │
└──────┬───────┘ └──────────────────┘
       │
       ▼
  JSON verdict

Data Flow

API Gateway → MTCNNDetector | ID document image + selfie image

MTCNNDetector → FaceNet512Embedder | Aligned face crops (112×112)

MTCNNDetector → ArcFaceEmbedder | Aligned face crops (112×112)

MTCNNDetector → LivenessDetector | Selfie face crop

FaceNet512Embedder → FaissIndexSearcher | 512-dim L2-normalised embedding

ArcFaceEmbedder → FaissIndexSearcher | 512-dim L2-normalised embedding

FaissIndexSearcher → EscalationRouter | Match objects with similarity scores

FaissIndexSearcher → EmbeddingStore | New embedding for future matching

EscalationRouter → API Gateway | Full JSON verdict + alert status
