Sentinel Core

AGT-REA-002 · Reasoning & Synthesis · AI-based

Risk Aggregator

AGT-REA-002 is the central reasoning engine that synthesises all individual agent outputs into a single, calibrated fraud risk score with full explainability. Rather than simply averaging scores, the agent uses Bayesian inference (PyMC3) to properly model the statistical dependencies between agent signals — if both the ELA Pixel Detective and the EXIF Metadata Analyst flag the same image, this is not twice the evidence, but evidence of a systematic image manipulation that should be weighted accordingly. An XGBoost meta-learner, trained on thousands of historical adjudication decisions, maps the multi-agent signal vector to a final risk probability. SHAP (SHapley Additive exPlanations) provides per-agent contribution explanations that are surfaced directly to adjudicators, showing exactly which agents drove the final score.

Tech Stack

Python 3.11 Runtime
PyMC3 / PyMC 4.x Bayesian probabilistic inference and signal correlation modelling
XGBoost 2.x Gradient boosting meta-learner for final risk probability
SHAP 0.43 Shapley value computation for per-agent contribution explanations
scikit-learn 1.x Preprocessing, calibration, and baseline classifiers
pandas 2.x Signal vector construction and feature assembly
Platt scaling Probability calibration to ensure output is a true probability
FastAPI REST API endpoint

Input

The structured output objects from all upstream agents for a single claim, plus claim metadata.

Accepted Formats

JSON

Fields

Name Type Req Description
claim_id string Yes Claim identifier
agent_results object Yes Dict keyed by agent ID containing each agent's full output JSON: {"AGT-FOR-001": {...}, "AGT-FOR-002": {...}, ...}
claim_metadata object Yes Claim context: {claim_type, claim_amount_vnd, claimant_age, policy_tenure_years, prior_claims_count}
run_bayesian boolean No Whether to run full Bayesian inference (slower but more calibrated) vs XGBoost-only (default: true)
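An illustrative request body matching the fields above (the claim identifier, metadata values, and abbreviated agent outputs are hypothetical):

```json
{
  "claim_id": "CLM-2025-000123",
  "agent_results": {
    "AGT-FOR-001": {"risk_score": 0.92, "flags": ["gps_mismatch"]},
    "AGT-BEH-014": {"risk_score": 0.96, "flags": ["blacklist_match"]}
  },
  "claim_metadata": {
    "claim_type": "motor_collision",
    "claim_amount_vnd": 85000000,
    "claimant_age": 41,
    "policy_tenure_years": 2,
    "prior_claims_count": 3
  },
  "run_bayesian": true
}
```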

Output

Final calibrated fraud risk score, per-agent SHAP contributions, Bayesian posterior, and a human-readable verdict narrative.

Format:

JSON

Fields

Name Type Description
final_risk_score float Calibrated fraud probability 0.0–1.0
risk_tier string LOW (< 0.25) | MEDIUM (0.25–0.60) | HIGH (0.60–0.85) | CRITICAL (> 0.85)
xgboost_probability float XGBoost meta-learner raw probability before calibration
bayesian_posterior_mean float Posterior mean from Bayesian inference model
bayesian_95ci array<float> [lower, upper] 95% credible interval of fraud probability
shap_contributions array<object> Per-agent SHAP values: {agent_id, agent_name, contribution, direction, raw_score}
top_fraud_signals array<string> Top 3 human-readable fraud signals driving the score
recommended_action string AUTO_APPROVE | STANDARD_REVIEW | PRIORITY_REVIEW | INVESTIGATE | AUTO_DENY
confidence_interval object {lower_95: float, upper_95: float} — model uncertainty range
verdict_narrative string 2–3 sentence human-readable summary for the adjudicator
flags_summary array<string> All unique flags raised by any agent
agents_run int Number of agents that provided valid results
risk_score float Same as final_risk_score (unified interface field)
verdict string PASS | FLAG | ESCALATE

Example Response

{
  "final_risk_score": 0.92,
  "risk_tier": "CRITICAL",
  "bayesian_posterior_mean": 0.91,
  "bayesian_95ci": [0.84, 0.97],
  "shap_contributions": [
    {"agent_id": "AGT-FOR-001", "agent_name": "EXIF Metadata Analyst", "contribution": 0.31, "direction": "increase", "raw_score": 0.92},
    {"agent_id": "AGT-BEH-014", "agent_name": "Identity Matcher", "contribution": 0.28, "direction": "increase", "raw_score": 0.96},
    {"agent_id": "AGT-MOT-006", "agent_name": "ALPR Detective", "contribution": 0.19, "direction": "increase", "raw_score": 0.95}
  ],
  "top_fraud_signals": [
    "Image GPS coordinates 34.7 km from declared incident location",
    "Claimant face matches confirmed fraudster BL-0047 (similarity 0.91)",
    "Vehicle plate is subject to active court seizure order"
  ],
  "recommended_action": "INVESTIGATE",
  "verdict_narrative": "This claim exhibits three independent high-confidence fraud indicators: GPS metadata mismatch, identity blacklist match, and a legally seized vehicle. The Bayesian model assigns a 92% fraud probability (95% CI: 84%–97%). Immediate referral to the Special Investigation Unit is recommended.",
  "risk_score": 0.92,
  "verdict": "ESCALATE"
}

How It Works

The Risk Aggregator is the final stage in the multi-agent fraud detection pipeline and addresses a fundamental challenge in ensemble learning: how to combine multiple imperfect, correlated detectors into a single calibrated probability estimate.

Simple averaging — or even weighted averaging — fails in practice because it ignores the statistical dependencies between agents. When both the EXIF Analyst and the ELA Pixel Detective flag the same image, this is not two independent pieces of evidence — they both examined the same file and may have detected different manifestations of the same underlying manipulation. Treating them as independent doubles the weight inappropriately.
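To make the double-counting concrete, here is a toy log-odds calculation (pure Python, illustrative numbers and a hypothetical `overlap` discount, not the production model): treating two detectors as independent adds their log-likelihood ratios in full, while a correlation-aware combination discounts the overlapping portion of the second signal.

```python
import math

def combined_probability(prior, llrs, overlap=0.0):
    """Combine detector log-likelihood ratios into a posterior probability.
    `overlap` (0..1) discounts evidence shared with the first detector;
    overlap=0 reproduces the naive independence assumption."""
    prior_odds = prior / (1.0 - prior)
    total_llr = llrs[0] + (1.0 - overlap) * sum(llrs[1:])
    odds = prior_odds * math.exp(total_llr)
    return odds / (1.0 + odds)

llrs = [math.log(6.0), math.log(6.0)]             # each detector: 6x likelihood ratio
naive = combined_probability(0.15, llrs)          # treated as independent -> ~0.86
adjusted = combined_probability(0.15, llrs, 0.6)  # 60% shared evidence   -> ~0.68
```

Starting from the same 15% prior, the independence assumption inflates the posterior by nearly twenty points relative to the overlap-adjusted combination.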

The Bayesian inference layer, implemented in PyMC3, explicitly models these correlations using a hierarchical generative model. Agent signals are treated as noisy observations of the latent fraud variable, with correlation structure learned from historical data. The posterior distribution over the fraud probability correctly accounts for shared evidence while properly weighting independent signals from different modalities (image forensics vs. behavioural vs. vehicle databases).

The XGBoost meta-learner serves a complementary role: it learns non-linear interaction patterns from the training data that the Bayesian model cannot capture without explicit prior specification. Together, the two approaches produce a final score that is both statistically rigorous (Bayesian) and empirically optimised on historical adjudication outcomes (XGBoost).
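The document does not pin down the exact merge rule between the two models (the architecture diagram labels it only "ensemble merge"); as a minimal sketch, assuming a fixed-weight convex blend of the two calibrated outputs:

```python
def merge_scores(xgb_prob, bayes_mean, w_xgb=0.6):
    """Convex blend of the XGBoost probability and the Bayesian posterior
    mean. The 0.6 weight is a placeholder, not the production setting."""
    return w_xgb * xgb_prob + (1.0 - w_xgb) * bayes_mean

merged = merge_scores(0.93, 0.91)  # always lands between the two inputs
```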

The SHAP explanation layer is not merely an add-on — it is a core requirement for regulatory compliance. Insurance regulators in many jurisdictions require that denial decisions be explainable. SHAP provides the mathematically grounded, per-agent attribution that enables adjudicators to understand and justify their decisions.

The output is designed to support the adjudicator's workflow directly: a tier label, a recommended action, the top three signals in plain language, and a confidence interval that quantifies how certain the model is. For CRITICAL-tier claims, the system generates a pre-filled investigation report template.

Thinking Steps

1. Signal Vector Construction

Parse each agent's output and extract the key numerical signals: risk_score, flag count, and confidence values for specific checks. Handle missing agent results gracefully: if an agent could not run (file format issue, timeout), impute its contribution with the prior probability of 0.15, the portfolio's base fraud rate.

Imputing missing agents with the prior rather than zero prevents the model from artificially lowering risk scores when data is unavailable.
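A minimal sketch of this step in pure Python (the agent ID list is an illustrative subset, not the full roster):

```python
PRIOR_FRAUD_RATE = 0.15  # portfolio base rate, used as the imputation value

# Illustrative subset of upstream agent IDs
EXPECTED_AGENTS = ["AGT-FOR-001", "AGT-FOR-002", "AGT-BEH-014", "AGT-MOT-006"]

def build_signal_vector(agent_results):
    """Pull each agent's risk_score; a missing or failed agent is imputed
    with the prior so absent data does not read as evidence of low risk."""
    return {
        agent_id: (agent_results.get(agent_id) or {}).get("risk_score", PRIOR_FRAUD_RATE)
        for agent_id in EXPECTED_AGENTS
    }

vec = build_signal_vector({"AGT-FOR-001": {"risk_score": 0.92}})
# present agent keeps its score; the three absent agents fall back to 0.15
```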

2. Feature Engineering for Meta-Learner

Beyond raw agent scores, compute interaction features: (AGT-FOR-001 score × AGT-FOR-002 score) captures cases where both image forensic agents flag the same image; (AGT-BEH-001 score × AGT-MOT-006 score) captures the combination of network fraud and vehicle fraud. Also include claim metadata features: claim amount z-score relative to type, policy tenure, and prior claims count.

Interaction features are among the most important features in the XGBoost model — they capture the synergistic evidence from multiple agents flagging simultaneously.
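A sketch of the interaction features named above (pure Python; `type_mean` and `type_std` are hypothetical per-claim-type statistics assumed to be supplied by the caller):

```python
def add_interaction_features(scores, metadata):
    """Extend raw agent scores with pairwise interactions and claim
    metadata features, as described in this step."""
    feats = dict(scores)
    # Both image-forensics agents flagging the same image
    feats["forensics_interaction"] = scores["AGT-FOR-001"] * scores["AGT-FOR-002"]
    # Network fraud combined with vehicle fraud
    feats["network_vehicle_interaction"] = scores["AGT-BEH-001"] * scores["AGT-MOT-006"]
    # Claim amount z-score relative to its claim type
    feats["amount_zscore"] = (
        (metadata["claim_amount_vnd"] - metadata["type_mean"]) / metadata["type_std"]
    )
    feats["policy_tenure_years"] = metadata["policy_tenure_years"]
    feats["prior_claims_count"] = metadata["prior_claims_count"]
    return feats
```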

3. XGBoost Meta-Learner Inference

Pass the feature vector through the XGBoost gradient boosting model. The model was trained on 28,000 historical claims with ground-truth adjudication labels (FRAUD/LEGITIMATE/INCONCLUSIVE). It learns optimal non-linear combinations of agent scores, outperforming simple weighted averaging by 8.3 AUC points.

XGBoost handles missing values natively, which is critical when some agents cannot process certain document types.

4. Bayesian Inference (Signal Dependency Modelling)

Run a PyMC3 hierarchical Bayesian model that explicitly models the statistical dependencies between agent signals. For instance, AGT-FOR-001 and AGT-FOR-002 both analyse images from the same file — their signals are correlated. The Bayesian model uses MCMC sampling to compute a posterior distribution over fraud probability, accounting for these correlations.

The Bayesian model produces uncertainty estimates (credible intervals) that the XGBoost model cannot. These intervals are valuable for borderline cases — a 0.60 score with CI [0.55, 0.65] is very different from [0.40, 0.80].
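The production model is a hierarchical PyMC model; as a deliberately simplified conjugate stand-in, the stdlib-only sketch below samples Beta posteriors to show how a credible interval is read off, mirroring the narrow-vs-wide contrast described above:

```python
import random

def beta_credible_interval(alpha, beta, n=20_000, seed=7):
    """Approximate 95% credible interval of a Beta(alpha, beta) posterior
    by sorting Monte Carlo draws (stdlib only; no PyMC dependency)."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n))
    return draws[int(0.025 * n)], draws[int(0.975 * n)]

# Two posteriors with the same mean (0.60) but very different certainty:
tight = beta_credible_interval(120, 80)  # strong, consistent evidence
wide = beta_credible_interval(6, 4)      # sparse evidence
```

Both posteriors centre on 0.60, but only the first supports a confident decision; the second demands human review.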

5. Probability Calibration

Both the XGBoost and Bayesian outputs are calibrated using Platt scaling, fitted on the held-out validation set. Calibration ensures the output is a true probability: a score of 0.8 should mean 80% of claims at that score are actually fraudulent, not just 'relatively high risk'.

Calibration is critical for the AUTO_APPROVE threshold: if the model outputs 0.10 for a claim, we need to be confident that represents a 10% fraud rate, not a poorly calibrated 'low score'.
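Applying a fitted Platt calibrator is a single sigmoid transform. A sketch with placeholder coefficients (real values come from fitting, e.g. via scikit-learn's CalibratedClassifierCV, on the held-out validation set):

```python
import math

def platt_calibrate(raw_score, a=-3.2, b=1.4):
    """Platt scaling: p = 1 / (1 + exp(a * f + b)).
    The coefficients a and b here are illustrative, not fitted values."""
    return 1.0 / (1.0 + math.exp(a * raw_score + b))

low, high = platt_calibrate(0.10), platt_calibrate(0.90)  # monotone in raw_score
```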

6. SHAP Explainability

Compute SHAP (SHapley Additive exPlanations) values for the XGBoost prediction. SHAP values provide the theoretically correct attribution of each feature's contribution to the final score, based on cooperative game theory. Aggregate individual feature contributions back to agent level for the shap_contributions output.

SHAP is the unique additive attribution method satisfying consistency and local accuracy — other methods like LIME or simple feature importance carry no such guarantees and can be misleading.
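Rolling feature-level SHAP values back up to agent level can be done by assigning each engineered feature to the agents it involves and splitting credit for shared interaction features; the feature-to-agent mapping below is hypothetical:

```python
from collections import defaultdict

# Hypothetical mapping from engineered features to the agents they involve;
# interaction features are split across the participating agents.
FEATURE_AGENTS = {
    "AGT-FOR-001_score": ["AGT-FOR-001"],
    "AGT-FOR-002_score": ["AGT-FOR-002"],
    "forensics_interaction": ["AGT-FOR-001", "AGT-FOR-002"],
}

def aggregate_shap(shap_values):
    """Sum feature-level SHAP values into per-agent contributions,
    splitting interaction-feature credit equally among its agents."""
    contributions = defaultdict(float)
    for feature, value in shap_values.items():
        agents = FEATURE_AGENTS[feature]
        for agent in agents:
            contributions[agent] += value / len(agents)
    return dict(contributions)

agg = aggregate_shap({"AGT-FOR-001_score": 0.20,
                      "AGT-FOR-002_score": 0.10,
                      "forensics_interaction": 0.08})
```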

7. Action Recommendation & Narrative Generation

Map final_risk_score to recommended_action thresholds (configured per claim type and amount). Generate a verdict_narrative that summarises the top 3 SHAP contributors in plain language, designed to be read by a non-technical adjudicator in under 30 seconds.

Action thresholds are configurable and can be adjusted based on risk appetite. Higher claim amounts lower the AUTO_APPROVE threshold.
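The default mapping implied by the tier table and the decision tree's auto-approve rule can be sketched as follows (thresholds shown are the defaults; production values are configured per claim type and amount):

```python
def route(score, ci_upper=None):
    """Map a calibrated risk score (plus optional Bayesian 95% CI upper
    bound) to a risk tier and recommended action."""
    if score > 0.85:
        return "CRITICAL", "INVESTIGATE"
    if score >= 0.60:
        return "HIGH", "PRIORITY_REVIEW"
    if score >= 0.25:
        return "MEDIUM", "STANDARD_REVIEW"
    # Auto-approval additionally requires a narrow credible interval
    if ci_upper is not None and ci_upper < 0.30:
        return "LOW", "AUTO_APPROVE"
    return "LOW", "STANDARD_REVIEW"
```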

Thinking Tree

  • Root Question: What is the calibrated fraud probability for this claim, and which agents drove it?
    • Signal vector assembly from all agents
      • All agents responded → full feature vector
      • Some agents missing → impute with prior 0.15
    • XGBoost meta-learner inference
      • Considers interaction features between agent pairs
      • Considers claim metadata (amount, tenure, prior claims)
    • Bayesian posterior computation
      • Models correlations between same-modality agents
      • Outputs posterior mean + 95% credible interval
    • Final risk tier assignment
      • < 0.25 → LOW → AUTO_APPROVE
      • 0.25–0.60 → MEDIUM → STANDARD_REVIEW
      • 0.60–0.85 → HIGH → PRIORITY_REVIEW
      • > 0.85 → CRITICAL → INVESTIGATE

Decision Tree

d1: Are at least 3 agent results available?
    Yes → d2 | No → d2_partial

d2: Final calibrated risk score > 0.85?
    Yes → escalate | No → d3

d2_partial: Available agents show consensus risk > 0.70?
    Yes → escalate | No → d3

d3: Risk score 0.60–0.85?
    Yes → priority_review | No → d4

d4: Risk score 0.25–0.60?
    Yes → standard_review | No → d5

d5: Risk score < 0.25 AND Bayesian 95% CI upper bound < 0.30?
    Yes → auto_approve | No → standard_review

Terminal nodes:

escalate: ESCALATE (CRITICAL tier). Risk score > 85%. Refer to Special Investigation Unit immediately.
priority_review: FLAG (HIGH tier). Risk score 60–85%. Priority queue for senior adjudicator review.
standard_review: FLAG (MEDIUM tier). Risk score 25–60%. Standard adjudicator review required.
auto_approve: PASS (LOW tier). Risk score < 25% with narrow CI. Eligible for auto-approval.
inconclusive: INCONCLUSIVE. Insufficient agent data; default to manual review.

Technical Design

Architecture

AGT-REA-002 is a synchronous FastAPI microservice. XGBoost inference runs on CPU in <10 ms. PyMC3 Bayesian inference uses pre-compiled MCMC chains from a variational inference warm-start, completing in ~200–500 ms. SHAP values are computed post-hoc using the TreeExplainer (optimised for tree-based models). The service is the final stage in the claim processing pipeline and is called after all other agents have returned results.

Components

Component Role Technology
SignalVectorBuilder Assembles feature vector from all agent outputs pandas + custom feature engineering
XGBoostMetaLearner Gradient boosting final risk probability XGBoost 2.x (ONNX or native)
BayesianInferenceEngine Hierarchical Bayesian model with agent correlation structure PyMC3 / NUTS sampler
ProbabilityCalibrator Platt scaling to calibrate raw probabilities scikit-learn CalibratedClassifierCV
SHAPExplainer Shapley value computation for XGBoost model SHAP TreeExplainer
ActionRouter Maps risk score to recommended action thresholds Python rule engine + config
NarrativeGenerator Generates human-readable verdict summary Python f-string templates + SHAP top features
AuditLogger Persists full decision record for compliance PostgreSQL + SQLAlchemy

Architecture Diagram

┌────────────────────────────────────┐
│  POST /aggregate                   │
│  (agent_results{} + claim_metadata)│
└────────────────┬───────────────────┘
                 │
                 ▼
┌────────────────────────────────────┐
│       SignalVectorBuilder          │
│  (raw scores + interaction features│
│   + claim metadata features)       │
└────────────────┬───────────────────┘
                 │
          ┌──────┴──────┐
          ▼             ▼
┌──────────────┐  ┌──────────────────┐
│  XGBoost     │  │   Bayesian       │
│  Meta-       │  │   Inference      │
│  Learner     │  │   Engine (PyMC3) │
└──────┬───────┘  └──────┬───────────┘
       │                 │
       ▼                 │
┌──────────────┐         │
│ Probability  │◄────────┘
│ Calibrator   │  ensemble merge
└──────┬───────┘
       │
       ▼
┌──────────────────────┐
│    SHAPExplainer     │
│  (TreeExplainer)     │
└──────────┬───────────┘
           │
    ┌──────┴──────┐
    ▼             ▼
┌──────────┐  ┌──────────────┐
│ Action   │  │  Narrative   │
│ Router   │  │  Generator   │
└──────┬───┘  └──────┬───────┘
       └──────┬───────┘
              │
              ▼
    ┌──────────────────┐
    │   AuditLogger    │
    └──────────────────┘

Data Flow

Pipeline Orchestrator → SignalVectorBuilder | All agent result objects + claim metadata
SignalVectorBuilder → XGBoostMetaLearner | Numerical feature vector (45 features)
SignalVectorBuilder → BayesianInferenceEngine | Agent score array + correlation priors
XGBoostMetaLearner → ProbabilityCalibrator | Raw XGBoost probability
BayesianInferenceEngine → ProbabilityCalibrator | Posterior mean + credible interval
ProbabilityCalibrator → SHAPExplainer | Calibrated probability + XGBoost model reference
SHAPExplainer → NarrativeGenerator | Per-agent SHAP contributions (sorted)
NarrativeGenerator → ActionRouter | Verdict narrative text
ActionRouter → AuditLogger | Full decision record
AuditLogger → Pipeline Orchestrator | JSON verdict with SHAP + narrative + recommendation