Risk Aggregator
AGT-REA-002 is the central reasoning engine that synthesises all individual agent outputs into a single, calibrated fraud risk score with full explainability. Rather than simply averaging scores, the agent uses Bayesian inference (PyMC3) to properly model the statistical dependencies between agent signals — if both the ELA Pixel Detective and the EXIF Metadata Analyst flag the same image, this is not twice the evidence, but evidence of a systematic image manipulation that should be weighted accordingly. An XGBoost meta-learner, trained on thousands of historical adjudication decisions, maps the multi-agent signal vector to a final risk probability. SHAP (SHapley Additive exPlanations) provides per-agent contribution explanations that are surfaced directly to adjudicators, showing exactly which agents drove the final score.
Tech Stack
Python, FastAPI, XGBoost, PyMC3, SHAP, scikit-learn, pandas, PostgreSQL.
Input
The structured output objects from all upstream agents for a single claim, plus claim metadata.
Accepted Formats
JSON (structured agent output objects).
Fields
| Name | Type | Req | Description |
|---|---|---|---|
| claim_id | string | Yes | Claim identifier |
| agent_results | object | Yes | Dict keyed by agent ID containing each agent's full output JSON: {AGT-FOR-001: {...}, AGT-FOR-002: {...}, ...} |
| claim_metadata | object | Yes | Claim context: {claim_type, claim_amount_vnd, claimant_age, policy_tenure_years, prior_claims_count} |
| run_bayesian | boolean | No | Run full Bayesian inference in addition to XGBoost (slower but better calibrated); if false, XGBoost-only. Default: true |
Output
Final calibrated fraud risk score, per-agent SHAP contributions, Bayesian posterior, and a human-readable verdict narrative.
Format: JSON
Fields
| Name | Type | Description |
|---|---|---|
| final_risk_score | float | Calibrated fraud probability 0.0–1.0 |
| risk_tier | string | LOW (< 0.25) / MEDIUM (0.25–0.60) / HIGH (0.60–0.85) / CRITICAL (> 0.85) |
| xgboost_probability | float | XGBoost meta-learner raw probability before calibration |
| bayesian_posterior_mean | float | Posterior mean from Bayesian inference model |
| bayesian_95ci | array<float> | [lower, upper] 95% credible interval of fraud probability |
| shap_contributions | array<object> | Per-agent SHAP values: {agent_id, agent_name, contribution, direction, raw_score} |
| top_fraud_signals | array<string> | Top 3 human-readable fraud signals driving the score |
| recommended_action | string | AUTO_APPROVE / STANDARD_REVIEW / PRIORITY_REVIEW / INVESTIGATE / AUTO_DENY |
| confidence_interval | object | {lower_95: float, upper_95: float} — model uncertainty range |
| verdict_narrative | string | 2–3 sentence human-readable summary for the adjudicator |
| flags_summary | array<string> | All unique flags raised by any agent |
| agents_run | int | Number of agents that provided valid results |
| risk_score | float | Same as final_risk_score (unified interface field) |
| verdict | string | PASS / FLAG / ESCALATE |
Example Response
{
  "final_risk_score": 0.92,
  "risk_tier": "CRITICAL",
  "bayesian_posterior_mean": 0.91,
  "bayesian_95ci": [0.84, 0.97],
  "shap_contributions": [
    {"agent_id": "AGT-FOR-001", "agent_name": "EXIF Metadata Analyst", "contribution": 0.31, "direction": "increase", "raw_score": 0.92},
    {"agent_id": "AGT-BEH-014", "agent_name": "Identity Matcher", "contribution": 0.28, "direction": "increase", "raw_score": 0.96},
    {"agent_id": "AGT-MOT-006", "agent_name": "ALPR Detective", "contribution": 0.19, "direction": "increase", "raw_score": 0.95}
  ],
  "top_fraud_signals": [
    "Image GPS coordinates 34.7 km from declared incident location",
    "Claimant face matches confirmed fraudster BL-0047 (similarity 0.91)",
    "Vehicle plate is subject to active court seizure order"
  ],
  "recommended_action": "INVESTIGATE",
  "verdict_narrative": "This claim exhibits three independent high-confidence fraud indicators: GPS metadata mismatch, identity blacklist match, and a legally seized vehicle. The Bayesian model assigns a 92% fraud probability (95% CI: 84%–97%). Immediate referral to the Special Investigation Unit is recommended.",
  "risk_score": 0.92,
  "verdict": "ESCALATE"
}
How It Works
The Risk Aggregator is the final stage in the multi-agent fraud detection pipeline and addresses a fundamental challenge in ensemble learning: how to combine multiple imperfect, correlated detectors into a single calibrated probability estimate.
Simple averaging — or even weighted averaging — fails in practice because it ignores the statistical dependencies between agents. When both the EXIF Analyst and the ELA Pixel Detective flag the same image, this is not two independent pieces of evidence — they both examined the same file and may have detected different manifestations of the same underlying manipulation. Treating them as independent doubles the weight inappropriately.
The Bayesian inference layer, implemented in PyMC3, explicitly models these correlations using a hierarchical generative model. Agent signals are treated as noisy observations of the latent fraud variable, with correlation structure learned from historical data. The posterior distribution over the fraud probability correctly accounts for shared evidence while properly weighting independent signals from different modalities (image forensics vs. behavioural vs. vehicle databases).
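The effect of modelling these dependencies can be shown with a deliberately simplified sketch. This is a toy log-odds model with an invented correlation discount, not the production PyMC3 hierarchy; the prior, the detector rates (`tpr`, `fpr`), and `rho` are illustrative assumptions.

```python
import math

def posterior_fraud(flags, rho=0.6, prior=0.15, tpr=0.85, fpr=0.10):
    """Combine binary detector flags into a fraud posterior (toy model).

    With rho = 0 the flags are treated as independent and each one
    contributes its full likelihood ratio; with rho > 0 every flag
    after the first is discounted by (1 - rho), so correlated
    detectors stop double-counting the same evidence.
    """
    log_odds = math.log(prior / (1 - prior))
    for i, flagged in enumerate(flags):
        if flagged:
            weight = 1.0 if i == 0 else (1.0 - rho)  # discount repeats
            log_odds += weight * math.log(tpr / fpr)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Two image-forensics agents flag the same file:
independent = posterior_fraud([True, True], rho=0.0)  # treated as separate evidence
correlated = posterior_fraud([True, True], rho=0.6)   # shared signal discounted
```

Under independence (`rho = 0`) the two flags lift the posterior to about 0.93; with `rho = 0.6` the same flags give about 0.78, much closer to the single-flag posterior of 0.60.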
The XGBoost meta-learner serves a complementary role: it learns non-linear interaction patterns from the training data that the Bayesian model cannot capture without explicit prior specification. Together, the two approaches produce a final score that is both statistically rigorous (Bayesian) and empirically optimised on historical adjudication outcomes (XGBoost).
The SHAP explanation layer is not merely an add-on — it is a core requirement for regulatory compliance. Insurance regulators in many jurisdictions require that denial decisions be explainable. SHAP provides the mathematically grounded, per-agent attribution that enables adjudicators to understand and justify their decisions.
The output is designed to support the adjudicator's workflow directly: a tier label, a recommended action, the top three signals in plain language, and a confidence interval that quantifies how certain the model is. For CRITICAL-tier claims, the system generates a pre-filled investigation report template.
Thinking Steps
Signal Vector Construction
Parse each agent's output and extract the key numerical signals: risk_score, flag count, and confidence values for specific checks. Handle missing agent results gracefully: if an agent could not run (file-format issue, timeout), impute its contribution as the prior probability of 0.15, the portfolio's base fraud rate.
Imputing missing agents with the prior rather than zero prevents the model from artificially lowering risk scores when data is unavailable.
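A minimal sketch of the prior-imputation rule; the `risk_score` field name follows the output schemas above, and the helper itself is illustrative.

```python
FRAUD_PRIOR = 0.15  # portfolio base fraud rate, used when an agent is missing

def extract_scores(agent_results: dict, expected_agents: list) -> dict:
    """Pull each expected agent's risk_score, imputing the portfolio
    prior when the agent failed to run or returned no score."""
    scores = {}
    for agent_id in expected_agents:
        result = agent_results.get(agent_id) or {}
        scores[agent_id] = result.get("risk_score", FRAUD_PRIOR)
    return scores

scores = extract_scores(
    {"AGT-FOR-001": {"risk_score": 0.92}},   # only one agent responded
    ["AGT-FOR-001", "AGT-FOR-002"],
)
# AGT-FOR-002 is imputed at the 0.15 prior rather than 0.0
```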
Feature Engineering for Meta-Learner
Beyond raw agent scores, compute interaction features: (AGT-FOR-001 score × AGT-FOR-002 score) captures cases where both image forensic agents flag the same image; (AGT-BEH-001 score × AGT-MOT-006 score) captures the combination of network fraud and vehicle fraud. Also include claim metadata features: claim amount z-score relative to type, policy tenure, and prior claims count.
Interaction features are among the most important features in the XGBoost model — they capture the synergistic evidence from multiple agents flagging simultaneously.
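The interaction terms are plain products of prior-imputed agent scores. A sketch, with invented feature names; the production builder covers more agent pairs than the two named above.

```python
def build_feature_vector(scores: dict, meta: dict) -> dict:
    """Raw agent scores plus pairwise interactions and claim metadata.

    The pairs mirror the text above: the image-forensics pair
    (AGT-FOR-001 x AGT-FOR-002) and the network-fraud x vehicle-fraud
    pair (AGT-BEH-001 x AGT-MOT-006).
    """
    feats = dict(scores)
    feats["for001_x_for002"] = scores["AGT-FOR-001"] * scores["AGT-FOR-002"]
    feats["beh001_x_mot006"] = scores["AGT-BEH-001"] * scores["AGT-MOT-006"]
    feats["claim_amount_z"] = meta["claim_amount_z"]
    feats["policy_tenure_years"] = meta["policy_tenure_years"]
    feats["prior_claims_count"] = meta["prior_claims_count"]
    return feats

fv = build_feature_vector(
    {"AGT-FOR-001": 0.9, "AGT-FOR-002": 0.8, "AGT-BEH-001": 0.15, "AGT-MOT-006": 0.15},
    {"claim_amount_z": 2.1, "policy_tenure_years": 0.5, "prior_claims_count": 3},
)
```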
XGBoost Meta-Learner Inference
Pass the feature vector through the XGBoost gradient boosting model. The model was trained on 28,000 historical claims with ground-truth adjudication labels (FRAUD/LEGITIMATE/INCONCLUSIVE). It learns optimal non-linear combinations of agent scores, outperforming simple weighted averaging by 8.3 AUC points.
XGBoost handles missing values natively, which is critical when some agents cannot process certain document types.
Bayesian Inference (Signal Dependency Modelling)
Run a PyMC3 hierarchical Bayesian model that explicitly models the statistical dependencies between agent signals. For instance, AGT-FOR-001 and AGT-FOR-002 both analyse images from the same file — their signals are correlated. The Bayesian model uses MCMC sampling to compute a posterior distribution over fraud probability, accounting for these correlations.
The Bayesian model produces uncertainty estimates (credible intervals) that the XGBoost model cannot. These intervals are valuable for borderline cases — a 0.60 score with CI [0.55, 0.65] is very different from [0.40, 0.80].
Probability Calibration
Both the XGBoost and Bayesian outputs are calibrated using Platt scaling, fitted on the held-out validation set. Calibration ensures the output is a true probability: a score of 0.8 should mean 80% of claims at that score are actually fraudulent, not just 'relatively high risk'.
Calibration is critical for the AUTO_APPROVE threshold: if the model outputs 0.10 for a claim, we need to be confident that represents a 10% fraud rate, not a poorly calibrated 'low score'.
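Platt scaling is a one-dimensional logistic remapping of the raw score. A sketch with made-up coefficients; the real A and B are fitted by logistic regression on the held-out validation set.

```python
import math

# Hypothetical Platt coefficients; in production these come from
# fitting a logistic regression on (raw_score, fraud_label) pairs.
PLATT_A, PLATT_B = 4.0, -2.0

def platt_calibrate(raw_score: float) -> float:
    """sigma(A * s + B): a monotone squashing of the raw score so
    that the output behaves like a true probability. Monotonicity
    means ranking is preserved; only the magnitudes change."""
    return 1.0 / (1.0 + math.exp(-(PLATT_A * raw_score + PLATT_B)))
```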
SHAP Explainability
Compute SHAP (Shapley Additive exPlanations) values for the XGBoost prediction. SHAP values provide the theoretically correct attribution of each feature's contribution to the final score, based on cooperative game theory. Aggregate individual feature contributions back to agent level for the shap_contributions output.
Among additive attribution methods, SHAP is the only one with guarantees of consistency and local accuracy; alternatives such as LIME or plain feature importance can be misleading.
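The Shapley attribution itself is simple to state: average each agent's marginal contribution over all join orders. The brute-force computation below is a toy illustration of that definition, with an invented additive value function; the service itself uses SHAP's TreeExplainer on the XGBoost model, not this code.

```python
from itertools import permutations
from math import factorial, isclose

def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over
    every ordering; O(n!), so only viable for a handful of agents."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            phi[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}

# Invented additive risk game: each flagging agent adds a fixed lift
# over the 0.15 base rate. (Real models are non-additive, which is
# exactly why a principled attribution method is needed.)
LIFT = {"EXIF": 0.31, "FACE": 0.28, "ALPR": 0.19}
risk = lambda coalition: 0.15 + sum(LIFT[a] for a in coalition)

phi = shapley_values(list(LIFT), risk)
# For an additive game each agent's Shapley value equals its own lift,
# and the values sum to risk(all) - risk(none) (the efficiency property).
```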
Action Recommendation & Narrative Generation
Map final_risk_score to recommended_action thresholds (configured per claim type and amount). Generate a verdict_narrative that summarises the top 3 SHAP contributors in plain language, designed to be read by a non-technical adjudicator in under 30 seconds.
Action thresholds are configurable and can be adjusted based on risk appetite. Higher claim amounts lower the AUTO_APPROVE threshold.
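A sketch of the threshold mapping with hypothetical config values: the 50M VND cut-off, the tightened 0.15 cap, and the CI gate are illustrative stand-ins for the per-claim-type configuration.

```python
def recommend_action(score: float, claim_amount_vnd: int, ci_upper: float) -> tuple:
    """Map the calibrated score (plus the Bayesian CI upper bound)
    to a (tier, action) pair. The AUTO_APPROVE cap tightens for
    large claims, reflecting the rule that higher claim amounts
    lower the AUTO_APPROVE threshold."""
    if score > 0.85:
        return "CRITICAL", "INVESTIGATE"
    if score > 0.60:
        return "HIGH", "PRIORITY_REVIEW"
    if score >= 0.25:
        return "MEDIUM", "STANDARD_REVIEW"
    auto_cap = 0.25 if claim_amount_vnd < 50_000_000 else 0.15
    if score < auto_cap and ci_upper < 0.30:
        return "LOW", "AUTO_APPROVE"
    return "LOW", "STANDARD_REVIEW"  # low score but uncertain, or high-value claim

tier, action = recommend_action(0.10, 10_000_000, ci_upper=0.22)  # → ("LOW", "AUTO_APPROVE")
```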
Thinking Tree
- Root Question: What is the calibrated fraud probability for this claim, and which agents drove it?
  - Signal vector assembly from all agents
    - All agents responded → full feature vector
    - Some agents missing → impute with prior 0.15
  - XGBoost meta-learner inference
    - Considers interaction features between agent pairs
    - Considers claim metadata (amount, tenure, prior claims)
  - Bayesian posterior computation
    - Models correlations between same-modality agents
    - Outputs posterior mean + 95% credible interval
  - Final risk tier assignment
    - < 0.25 → LOW → AUTO_APPROVE
    - 0.25–0.60 → MEDIUM → STANDARD_REVIEW
    - 0.60–0.85 → HIGH → PRIORITY_REVIEW
    - > 0.85 → CRITICAL → INVESTIGATE
Decision Tree
- Are at least 3 agent results available?
  - No → INCONCLUSIVE — Insufficient agent data; default to manual review
- Final calibrated risk score > 0.85, or (with agents missing) do the available agents show consensus risk > 0.70?
  - Yes → ESCALATE — CRITICAL tier: Refer to Special Investigation Unit immediately
- Risk score 0.60–0.85?
  - Yes → FLAG — HIGH tier: Priority queue for senior adjudicator review
- Risk score 0.25–0.60?
  - Yes → FLAG — MEDIUM tier: Standard adjudicator review required
- Risk score < 0.25 AND Bayesian 95% CI upper bound < 0.30?
  - Yes → PASS — LOW tier: Narrow CI; eligible for auto-approval
Technical Design
Architecture
AGT-REA-002 is a synchronous FastAPI microservice. XGBoost inference runs on CPU in <10 ms. PyMC3 Bayesian inference uses pre-compiled MCMC chains from a variational inference warm-start, completing in ~200–500 ms. SHAP values are computed post-hoc using the TreeExplainer (optimised for tree-based models). The service is the final stage in the claim processing pipeline and is called after all other agents have returned results.
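The control flow can be summarised as a single synchronous pass through the stages. In the sketch below every function is a stub standing in for the corresponding component, and the 50/50 merge of the two model outputs is an illustrative rule, not the production ensemble logic.

```python
# Placeholder stages; each stands in for a real component.
def build_signal_vector(results, meta):
    return [r.get("risk_score", 0.15) for r in results.values()]

def xgboost_predict(feats):                 # stub meta-learner
    return sum(feats) / len(feats)

def bayesian_posterior(feats):              # stub PyMC3 engine
    mean = sum(feats) / len(feats)
    return mean, (mean - 0.05, mean + 0.05)

def calibrate(p):                           # stub Platt step
    return min(max(p, 0.0), 1.0)

def aggregate(agent_results: dict, claim_metadata: dict) -> dict:
    """One aggregation pass in the order the service runs it:
    build features, score with both models, merge and calibrate."""
    feats = build_signal_vector(agent_results, claim_metadata)
    xgb_p = xgboost_predict(feats)                 # <10 ms on CPU
    post_mean, ci = bayesian_posterior(feats)      # ~200-500 ms, VI warm-start
    final = calibrate(0.5 * xgb_p + 0.5 * post_mean)  # illustrative merge
    return {"final_risk_score": final, "bayesian_95ci": list(ci)}

result = aggregate({"AGT-FOR-001": {"risk_score": 0.92}}, {"claim_type": "motor"})
```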
Components
| Component | Role | Technology |
|---|---|---|
| SignalVectorBuilder | Assembles feature vector from all agent outputs | pandas + custom feature engineering |
| XGBoostMetaLearner | Gradient-boosted estimator of the final risk probability | XGBoost 2.x (native or ONNX) |
| BayesianInferenceEngine | Hierarchical Bayesian model with agent correlation structure | PyMC3 / NUTS sampler |
| ProbabilityCalibrator | Platt scaling to calibrate raw probabilities | scikit-learn CalibratedClassifierCV |
| SHAPExplainer | Shapley value computation for XGBoost model | SHAP TreeExplainer |
| ActionRouter | Maps risk score to recommended action thresholds | Python rule engine + config |
| NarrativeGenerator | Generates human-readable verdict summary | Python f-string templates + SHAP top features |
| AuditLogger | Persists full decision record for compliance | PostgreSQL + SQLAlchemy |
Architecture Diagram
┌────────────────────────────────────┐
│ POST /aggregate │
│ (agent_results{} + claim_metadata)│
└────────────────┬───────────────────┘
│
▼
┌────────────────────────────────────┐
│ SignalVectorBuilder │
│ (raw scores + interaction features│
│ + claim metadata features) │
└────────────────┬───────────────────┘
│
┌──────┴──────┐
▼ ▼
┌──────────────┐ ┌──────────────────┐
│ XGBoost │ │ Bayesian │
│ Meta- │ │ Inference │
│ Learner │ │ Engine (PyMC3) │
└──────┬───────┘ └──────┬───────────┘
│ │
▼ │
┌──────────────┐ │
│ Probability │◄────────┘
│ Calibrator │ ensemble merge
└──────┬───────┘
│
▼
┌──────────────────────┐
│ SHAPExplainer │
│ (TreeExplainer) │
└──────────┬───────────┘
│
┌──────┴──────┐
▼ ▼
┌──────────┐ ┌──────────────┐
│ Action │ │ Narrative │
│ Router │ │ Generator │
└──────┬───┘ └──────┬───────┘
└──────┬───────┘
│
▼
┌──────────────────┐
│ AuditLogger │
└──────────────────┘
Data Flow