Identity & Criminal Matcher
AGT-BEH-014 performs facial recognition and biometric identity verification against an insurance fraud blacklist database and public criminal records. The agent extracts faces from submitted ID documents (national ID cards, passports, driver's licences) and selfies, then computes 512-dimensional face embeddings using the FaceNet512 and ArcFace models. These embeddings are compared against a proprietary blacklist of confirmed insurance fraudsters, a shared industry blacklist (cross-insurer data sharing), and publicly accessible criminal conviction databases. A cosine similarity at or above the configured threshold triggers an identity match alert. The agent also runs liveness detection on the selfie to prevent photo injection attacks, and cross-validates the selfie face against the portrait printed on the ID document.
Tech Stack
Input
ID document image, selfie/portrait photo, and optional policy claimant data for context.
Accepted Formats
Fields
| Name | Type | Req | Description |
|---|---|---|---|
| id_document_image | binary | Yes | National ID, passport, or driver's licence scan |
| selfie_image | binary | No | Recent self-portrait photo of the claimant for liveness-assured matching |
| claim_id | string | Yes | Claim ID for audit trail |
| claimant_id_number | string | No | Declared national ID number for cross-referencing against document OCR |
| match_threshold | float | No | Cosine similarity threshold for a positive match (default: 0.68) |
Output
Identity verification result, blacklist match details, liveness verdict, and overall risk assessment.
Format: JSON
Fields
| Name | Type | Description |
|---|---|---|
| id_face_detected | boolean | Whether a face was successfully detected in the ID document |
| selfie_face_detected | boolean | Whether a face was detected in the selfie |
| id_selfie_similarity | float or null | Cosine similarity between ID face and selfie embeddings (0.0–1.0) |
| id_selfie_match | boolean or null | Whether ID and selfie belong to the same person |
| blacklist_matches | array<object> | Blacklist hits: {blacklist_id, similarity, blacklist_source, fraud_type, case_reference} |
| liveness_score | float or null | Anti-spoofing liveness score for selfie (0.0–1.0; null if no selfie) |
| id_number_ocr_match | boolean or null | Whether OCR-extracted ID number matches declared ID number |
| flags | array<string> | FLAG_BLACKLIST_HIT, FLAG_IDENTITY_MISMATCH, FLAG_LIVENESS_FAIL, FLAG_ID_NUMBER_MISMATCH, FLAG_FACE_NOT_FOUND |
| risk_score | float | Normalised risk contribution 0.0–1.0 |
| verdict | string | PASS, FLAG, or INCONCLUSIVE |
Example Response
{
  "id_face_detected": true,
  "selfie_face_detected": true,
  "id_selfie_similarity": 0.73,
  "id_selfie_match": true,
  "blacklist_matches": [
    {
      "blacklist_id": "BL-0047",
      "similarity": 0.91,
      "blacklist_source": "INTERNAL",
      "fraud_type": "staged_motor_claim",
      "case_reference": "CLM-MT-2023-0088"
    }
  ],
  "liveness_score": 0.96,
  "id_number_ocr_match": true,
  "flags": ["FLAG_BLACKLIST_HIT"],
  "risk_score": 0.96,
  "verdict": "FLAG"
}
How It Works
Insurance fraud is often a repeat enterprise. Once a fraudster has successfully extracted a payout, they or their associates frequently attempt the same scheme again — sometimes with the same identity, sometimes with slight variations. Maintaining a cross-insurer blacklist and matching new claimants against it closes this repeat-offender loophole.
AGT-BEH-014 operates in two verification modes simultaneously. The first mode is identity verification: confirming that the selfie matches the ID document. This catches identity theft where someone files a claim under another person's name. The second mode is blacklist matching: checking whether the face in the submitted documents matches any known fraudster.
The use of dual embedding models (FaceNet512 and ArcFace) for ensemble voting is a deliberate design choice. Face recognition models fail in specific ways — lighting conditions, aging, partial occlusion — that are not perfectly correlated between models. Requiring both models to agree reduces false positives without significantly reducing sensitivity.
The liveness detection component addresses a specific attack vector: photograph injection, where a fraudster submits a photo of their accomplice (who is not on the blacklist) rather than a genuine selfie of themselves. The anti-spoofing model detects the texture and depth cues that distinguish a live face capture from a photograph of a photograph.
The Faiss approximate nearest-neighbour index makes blacklist matching fast enough for real-time use. Rather than computing exact cosine similarities against all 500,000 blacklist entries, the index uses a Hierarchical Navigable Small World (HNSW) graph to find approximate nearest neighbours in milliseconds.
A positive blacklist match is the most serious flag in the entire fraud detection system and triggers immediate escalation to the special investigation unit.
Thinking Steps
Face Detection & Alignment
Run MTCNN on both the ID document image and selfie to detect face bounding boxes and 5-point facial landmarks (eye centres, nose, mouth corners). Use the landmarks to align the face to a canonical pose (eyes horizontal, nose centred) for consistent embedding computation.
MTCNN is used rather than simpler detectors because it explicitly handles the partial face occlusions common in ID document photos (shadows, plastic sheen, hologram interference).
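The alignment step can be illustrated with a minimal sketch that levels the eye line using only the landmark coordinates. It assumes landmarks arrive as `(x, y)` pixel tuples; the function names and the landmark keys (`left_eye`, `right_eye`, `nose`) are hypothetical, and a real pipeline would apply the same rotation to the image crop rather than to the points alone.

```python
import math


def roll_angle_degrees(left_eye, right_eye):
    """Angle of the line through the eye centres relative to horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(point, centre, angle_degrees):
    """Rotate a point about a centre by the given angle (degrees)."""
    theta = math.radians(angle_degrees)
    x, y = point[0] - centre[0], point[1] - centre[1]
    return (
        centre[0] + x * math.cos(theta) - y * math.sin(theta),
        centre[1] + x * math.sin(theta) + y * math.cos(theta),
    )


def level_landmarks(landmarks):
    """Rotate all landmarks about the eye midpoint so the eyes are horizontal.

    `landmarks` is a dict of name -> (x, y); the key names are assumptions.
    """
    left_eye, right_eye = landmarks["left_eye"], landmarks["right_eye"]
    angle = roll_angle_degrees(left_eye, right_eye)
    centre = (
        (left_eye[0] + right_eye[0]) / 2,
        (left_eye[1] + right_eye[1]) / 2,
    )
    # Rotating by the negative roll angle cancels the tilt.
    return {
        name: rotate_point(point, centre, -angle)
        for name, point in landmarks.items()
    }
```

After levelling, both eye centres share the same y coordinate, which is the property the embedding models rely on for consistent input.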
Dual-Model Embedding Computation
Compute face embeddings using both FaceNet512 (512-dim) and ArcFace (512-dim) on the aligned face crops. FaceNet512 was trained on VGGFace2 (3.3M images); ArcFace uses angular margin loss for better inter-class separability. Both embeddings are L2-normalised.
Using two independent models provides ensemble robustness: a positive identification requires both models to agree above threshold, reducing false positive rate.
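The agreement rule can be sketched as follows. This is an illustration under stated assumptions, not the agent's actual code: embeddings are plain sequences of floats keyed by model name, `ensemble_match` is a hypothetical helper, and any ArcFace threshold passed in is a placeholder because the spec only gives a calibrated value (0.68) for FaceNet512.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def ensemble_match(probe, candidate, thresholds):
    """Return True only if every model's similarity meets its own threshold.

    probe / candidate: dict of model name -> embedding (hypothetical layout).
    thresholds: dict of model name -> per-model cosine threshold.
    """
    return all(
        cosine_similarity(probe[model], candidate[model]) >= threshold
        for model, threshold in thresholds.items()
    )
```

Because the result is a logical AND across models, a single model that is fooled by lighting or occlusion cannot produce a positive identification on its own.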
ID / Selfie Cross-Verification
Compute cosine similarity between the ID face embedding and the selfie face embedding. The 0.68 threshold is calibrated for FaceNet512 to achieve a FAR (False Accept Rate) below 0.1% at a TAR (True Accept Rate) of 99.0%. A similarity below the threshold raises FLAG_IDENTITY_MISMATCH.
The similarity threshold must be calibrated per-model. A threshold of 0.68 for FaceNet512 does not apply to other models — this is a common implementation error.
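Per-model calibration can be sketched as picking the lowest threshold whose false accept rate on a labelled set of impostor pairs stays within a target. The helper names are hypothetical, the score lists would come from a validation set that the spec does not describe, and the sketch treats the FAR target as "at most" rather than "strictly below".

```python
import math


def false_accept_rate(impostor_scores, threshold):
    """Fraction of different-person pairs scoring at or above the threshold."""
    accepted = sum(1 for score in impostor_scores if score >= threshold)
    return accepted / len(impostor_scores)


def true_accept_rate(genuine_scores, threshold):
    """Fraction of same-person pairs scoring at or above the threshold."""
    accepted = sum(1 for score in genuine_scores if score >= threshold)
    return accepted / len(genuine_scores)


def calibrate_threshold(impostor_scores, target_far):
    """Lowest threshold for which FAR on the given scores is <= target_far."""
    if not impostor_scores:
        raise ValueError("need at least one impostor score")
    ordered = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs that may be accepted without exceeding the target.
    allowed = int(target_far * len(ordered))
    if allowed >= len(ordered):
        return ordered[-1]  # every impostor pair may be accepted
    # Place the threshold just above the first score that must be rejected.
    return math.nextafter(ordered[allowed], math.inf)
```

Running `calibrate_threshold` separately on each model's impostor scores yields one threshold per model, and `true_accept_rate` on the genuine pairs then shows what sensitivity that threshold costs.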
Liveness Detection (Anti-Spoofing)
Apply a Silent Face Anti-Spoofing network to the selfie to detect if it is a live capture or a photo-of-photo attack (pointing a phone at a printed photo or another phone screen). The anti-spoofing model outputs a liveness score. Scores below 0.5 flag FLAG_LIVENESS_FAIL.
Photo injection attacks (submitting a pre-existing photo as a selfie) are the most common way fraudsters attempt to bypass facial recognition.
Blacklist Database Search (Faiss ANN)
Run approximate nearest-neighbour search using Faiss on the internal blacklist embedding database. The database contains embeddings of all previously confirmed insurance fraudsters across all insurers (shared via industry consortium). Any match with cosine similarity ≥ match_threshold triggers FLAG_BLACKLIST_HIT.
Faiss HNSW index enables sub-10 ms search over a 500,000-entry blacklist database on CPU.
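For reference, the exact search that the HNSW index approximates can be written as a brute-force scan. This sketch is deliberately not Faiss code: it assumes L2-normalised embeddings (so the dot product equals cosine similarity), and the function name, the dictionary layout, and the blacklist IDs are hypothetical.

```python
import heapq


def top_k_blacklist_matches(query, blacklist, k, match_threshold):
    """Exact top-k search over L2-normalised embeddings.

    query: normalised embedding of the claimant's face.
    blacklist: dict of blacklist_id -> normalised embedding.
    Returns (blacklist_id, similarity) pairs, best first, keeping only
    entries whose similarity is at or above match_threshold.
    """
    scored = (
        (sum(q * v for q, v in zip(query, embedding)), blacklist_id)
        for blacklist_id, embedding in blacklist.items()
    )
    best = heapq.nlargest(k, scored)
    return [
        (blacklist_id, similarity)
        for similarity, blacklist_id in best
        if similarity >= match_threshold
    ]
```

This scan costs one dot product per blacklist entry on every request; an approximate index trades a small amount of recall for avoiding that full pass.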
ID Number OCR Verification
Extract the ID number text from the ID document image using Tesseract OCR (with custom national ID font tessdata). Compare against the declared claimant_id_number. A mismatch suggests the ID document has been modified or does not belong to the claimant.
Vietnamese CCCD (citizen card) numbers follow a specific 12-digit format with province and date codes that can be validated independently.
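The comparison and format check might look like the sketch below. The normalisation rules and function names are hypothetical. The field layout in `split_cccd` (3-digit province code, 1-digit gender/century code, 2-digit birth year, 6-digit serial) is the commonly described structure and is treated here as an assumption to verify against the official specification.

```python
import re

CCCD_PATTERN = re.compile(r"\d{12}")


def normalise_id_number(text):
    """Remove whitespace and separators that OCR commonly introduces."""
    return re.sub(r"[\s.\-]", "", text)


def split_cccd(number):
    """Split a 12-digit number into its assumed component codes."""
    if not CCCD_PATTERN.fullmatch(number):
        raise ValueError("expected exactly 12 digits")
    return {
        "province_code": number[0:3],
        "gender_century_code": number[3],
        "birth_year": number[4:6],
        "serial": number[6:12],
    }


def id_number_ocr_match(ocr_text, declared_number):
    """Return None if nothing was declared, else whether OCR and declaration agree."""
    if declared_number is None:
        return None
    ocr_number = normalise_id_number(ocr_text)
    if not CCCD_PATTERN.fullmatch(ocr_number):
        return False  # assumption: a malformed OCR result counts as a mismatch
    return ocr_number == normalise_id_number(declared_number)
```

Returning `None` when no number was declared mirrors the nullable `id_number_ocr_match` field in the output table.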
Risk Escalation
A FLAG_BLACKLIST_HIT with similarity ≥ 0.85 triggers an immediate CRITICAL escalation (bypassing normal queue) to the anti-fraud investigation team, with the case reference from the blacklist database. This is the only agent in the system that can trigger an immediate policy freeze.
A policy freeze must be confirmed by a human reviewer within 4 hours; otherwise it is lifted automatically, so that a false positive cannot block a legitimate customer indefinitely.
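The escalation tiers and the 4-hour confirmation window can be sketched as two small functions. The constant and function names are hypothetical, the "REVIEW" label is borrowed from the thinking tree, and the sketch assumes timezone-aware timestamps.

```python
from datetime import datetime, timedelta, timezone

CRITICAL_SIMILARITY = 0.85
FREEZE_CONFIRMATION_WINDOW = timedelta(hours=4)


def escalation_level(similarity, match_threshold=0.68):
    """Map a blacklist similarity to an escalation tier, or None if no hit."""
    if similarity >= CRITICAL_SIMILARITY:
        return "CRITICAL"
    if similarity >= match_threshold:
        return "REVIEW"
    return None


def freeze_is_active(frozen_at, now, confirmed_at=None):
    """A freeze persists only if a human confirmed it inside the window.

    Without a timely confirmation the freeze lapses once the window has passed.
    """
    if confirmed_at is not None:
        if confirmed_at - frozen_at <= FREEZE_CONFIRMATION_WINDOW:
            return True
    return now - frozen_at < FREEZE_CONFIRMATION_WINDOW
```

For example, a freeze placed at 09:00 and left unconfirmed is reported as inactive from 13:00 onward, while one confirmed at 10:00 remains active.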
Thinking Tree
- Root Question: Is the claimant's identity legitimate and not on the fraud blacklist?
  - Face detection on ID document
    - Face detected → proceed to embedding
    - No face detected → FLAG_FACE_NOT_FOUND
  - ID vs selfie cross-verification (if selfie provided)
    - Similarity ≥ 0.68 → same person
    - Similarity < 0.68 → FLAG_IDENTITY_MISMATCH
  - Liveness detection on selfie
    - Liveness score ≥ 0.5 → genuine selfie
    - Liveness score < 0.5 → FLAG_LIVENESS_FAIL
  - Blacklist database search
    - No match above threshold → PASS
    - Match found (0.68–0.84) → FLAG_BLACKLIST_HIT (review)
    - Match found (≥ 0.85) → FLAG_BLACKLIST_HIT (CRITICAL escalation)
  - ID number OCR validation
    - OCR matches declared ID number → PASS
    - OCR mismatch → FLAG_ID_NUMBER_MISMATCH
Decision Tree
Is a face successfully detected in the ID document image?
Is a selfie provided?
ID and selfie cosine similarity ≥ 0.68 (same person)?
Selfie liveness score ≥ 0.50?
Blacklist search returns match ≥ threshold?
OCR-extracted ID number matches declared ID number?
FLAG — FACE_NOT_FOUND: Cannot extract face from submitted documents
FLAG — IDENTITY_MISMATCH: Selfie does not match ID document face
FLAG — LIVENESS_FAIL: Selfie failed anti-spoofing check; may be a photograph
FLAG — BLACKLIST_HIT: Face matches confirmed insurance fraudster in blacklist database
FLAG — ID_NUMBER_MISMATCH: OCR-extracted number differs from claimant's declared number
PASS — Identity verified, no blacklist match, liveness confirmed
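The checks above can be combined in a short sketch that collects flags and derives a verdict. Several points are assumptions: the `Signals` container and `evaluate` function are hypothetical, all checks are evaluated rather than stopping at the first failure, and the INCONCLUSIVE verdict is omitted because the conditions that produce it are not defined here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Signals:
    """Inputs to the decision; None means the check did not run."""
    id_face_detected: bool
    id_selfie_similarity: Optional[float]
    liveness_score: Optional[float]
    best_blacklist_similarity: Optional[float]
    id_number_ocr_match: Optional[bool]


def evaluate(
    signals: Signals,
    identity_threshold: float = 0.68,
    liveness_threshold: float = 0.5,
    match_threshold: float = 0.68,
) -> Tuple[List[str], str]:
    """Return (flags, verdict) using the thresholds stated in the spec."""
    flags: List[str] = []
    if not signals.id_face_detected:
        flags.append("FLAG_FACE_NOT_FOUND")
    similarity = signals.id_selfie_similarity
    if similarity is not None and similarity < identity_threshold:
        flags.append("FLAG_IDENTITY_MISMATCH")
    liveness = signals.liveness_score
    if liveness is not None and liveness < liveness_threshold:
        flags.append("FLAG_LIVENESS_FAIL")
    best = signals.best_blacklist_similarity
    if best is not None and best >= match_threshold:
        flags.append("FLAG_BLACKLIST_HIT")
    if signals.id_number_ocr_match is False:
        flags.append("FLAG_ID_NUMBER_MISMATCH")
    # Assumption: any flag yields FLAG, otherwise PASS.
    return flags, ("FLAG" if flags else "PASS")
```

Feeding in the values from the example response (similarity 0.73, liveness 0.96, best blacklist similarity 0.91, OCR match true) yields the single flag FLAG_BLACKLIST_HIT and the verdict FLAG.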
Technical Design
Architecture
AGT-BEH-014 is a synchronous FastAPI microservice (low latency required for real-time onboarding). All models are loaded into memory at startup. Faiss HNSW index is loaded from disk and kept in memory. The blacklist database is updated daily from the industry consortium data feed. Total request latency: 800–1500 ms for dual-model embedding + Faiss search.
Components
| Component | Role | Technology |
|---|---|---|
| MTCNNDetector | Face detection and landmark-based alignment | MTCNN via facenet-pytorch |
| FaceNet512Embedder | Primary 512-dim face embedding | DeepFace FaceNet512 model |
| ArcFaceEmbedder | Secondary 512-dim face embedding for ensemble | InsightFace ArcFace (buffalo_l) |
| LivenessDetector | Anti-spoofing classification on selfie | Silent Face Anti-Spoofing (MiniFASNet) |
| FaissIndexSearcher | Approximate NN search over blacklist embeddings | Faiss HNSW index (CPU) |
| IDNumberOCR | Extracts ID number text from document image | Tesseract 5.x + custom CCCD tessdata |
| EmbeddingStore | Persists new embeddings for future matching | PostgreSQL pgvector extension |
| EscalationRouter | Triggers immediate alert on critical blacklist hit | FastAPI BackgroundTasks + webhook |
Architecture Diagram
    ┌───────────────────────────────────┐
    │           POST /verify            │
    │      (id_document + selfie)       │
    └────────────────┬──────────────────┘
                     │
              ┌──────┴──────┐
              ▼             ▼
       ┌──────────────┐  ┌──────────────────┐
       │ ID Document  │  │ Selfie           │
       │ MTCNNDetect  │  │ MTCNNDetect +    │
       │              │  │ LivenessDetect   │
       └──────┬───────┘  └──────┬───────────┘
              │                 │
              ▼                 ▼
       ┌──────────────┐  ┌──────────────────┐
       │ FaceNet512   │  │ ArcFace          │
       │ Embedder     │  │ Embedder         │
       └──────┬───────┘  └──────┬───────────┘
              │                 │
              └───────┬─────────┘
                      │ ensemble
                      ▼
       ┌──────────────────────────────┐
       │ FaissIndexSearcher           │
       │ (blacklist ANN search)       │
       └──────────────┬───────────────┘
                      │
               ┌──────┴──────┐
               ▼             ▼
       ┌──────────────┐  ┌──────────────────┐
       │ IDNumberOCR  │  │ EscalationRouter │
       │              │  │ (if critical)    │
       └──────┬───────┘  └──────────────────┘
              │
              ▼
        JSON verdict
Data Flow