Stage 04 — Risk Framework

Development of AI-XR Risk Framework for Safety-Critical Domains

Current AI- and XR-based systems in safety-critical domains lack a unified, systematic framework to identify, assess, and prioritize human-centered risks during design and deployment.

This gap leads to inconsistent risk evaluation and delayed mitigation, increasing the likelihood of human error, safety incidents, and inefficient technology integration.


Methodology

Decision Points Identification

Systematic mapping of task branching points where AI or XR output affects technician decision-making.

SHERPA

Systematic Human Error Reduction and Prediction Approach — a structured error taxonomy applied to maintenance workflows.

Failure Modes and Effects Analysis

FMEA applied to AI and XR subsystems, identifying failure modes and their impact on task performance and safety.
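FMEA prioritizes failure modes by a Risk Priority Number (RPN), the product of severity, occurrence, and detection ratings. A minimal sketch of that scoring step, using hypothetical failure modes and ratings for illustration (the ratings below are not the study's actual FMEA worksheet):

```python
# Minimal FMEA sketch: RPN = severity x occurrence x detection.
# Failure modes and ratings are hypothetical illustrations only.
from dataclasses import dataclass

@dataclass
class FailureMode:
    subsystem: str   # "AI", "XR", or "Human"
    description: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (easily detected) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

modes = [
    FailureMode("AI", "Hallucinated AMM tolerance value", 9, 3, 7),
    FailureMode("XR", "Display resolution too low for fine print", 6, 5, 4),
    FailureMode("Human", "Omitted manual verification (automation bias)", 8, 6, 8),
]

# Rank failure modes so mitigation effort goes to the highest RPN first.
for fm in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(f"{fm.subsystem:6s} RPN={fm.rpn:3d}  {fm.description}")
```

Ranking by RPN is what lets the framework direct mitigation effort toward the riskiest modes first rather than treating all 25 equally.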

Example of Decision Points

Do we have all the tools and materials required for the tire inspection task?

Level 3 (Projection)
- Predict task disruption from not having a required tool
- Assess the safety impact of not having the right tool

Level 2 (Comprehension)
- Match tools to task-specific needs
- Comprehend FAA-specific guidelines
- Interpret the job card

Level 1 (Perception)
- Job card / checklist
- Aircraft type
- List of standard tools
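The three-level decision point above can be sketched as a small data structure, with elements grouped by situation-awareness (SA) level. The labels follow the tire-inspection example; the dict layout itself is an illustrative choice, not part of the framework:

```python
# The tire-inspection decision point, organized by SA level.
# Layout is an illustrative assumption, not the framework's own schema.
decision_point = {
    "question": "Do we have all the tools and materials required "
                "for the tire inspection task?",
    "levels": {
        1: {  # Perception: raw inputs the technician must perceive
            "name": "Perception",
            "elements": ["Job card / checklist", "Aircraft type",
                         "List of standard tools"],
        },
        2: {  # Comprehension: making sense of the inputs
            "name": "Comprehension",
            "elements": ["Interpret the job card",
                         "Comprehend FAA-specific guidelines",
                         "Match tools to task-specific needs"],
        },
        3: {  # Projection: anticipating downstream consequences
            "name": "Projection",
            "elements": ["Assess the safety impact of not having the right tool",
                         "Predict task disruption from a missing required tool"],
        },
    },
}

# Walk levels bottom-up, the order in which situation awareness is built.
for level in sorted(decision_point["levels"]):
    info = decision_point["levels"][level]
    print(f"L{level} {info['name']}: {', '.join(info['elements'])}")
```

Encoding decision points this way makes the later failure-mode mapping mechanical: each FMEA entry can point at the SA level whose element it breaks.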
Failure Modes and Effects Analysis (FMEA)

- AI subsystem: 56% of failures (14 of 25 failure modes)
- Human interaction: 32% of failures (8 of 25 failure modes)
- XR hardware: 12% of failures (3 of 25 failure modes)

Failure Modes by SA Level and Source

                  AI   XR   Human
L1 Perception      5    3     2
L2 Comprehension   8    0     2
L3 Projection      1    0     4
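The per-source totals and percentages reported above follow directly from the SA-level counts. A small tally sketch, using only the numbers from the chart:

```python
# Failure-mode counts by SA level (rows) and source (columns),
# taken from the chart above.
counts = {
    "L1 Perception":    {"AI": 5, "XR": 3, "Human": 2},
    "L2 Comprehension": {"AI": 8, "XR": 0, "Human": 2},
    "L3 Projection":    {"AI": 1, "XR": 0, "Human": 4},
}

# Sum each column to get per-source totals.
by_source = {}
for row in counts.values():
    for source, n in row.items():
        by_source[source] = by_source.get(source, 0) + n

total = sum(by_source.values())  # 25 failure modes overall
for source, n in by_source.items():
    print(f"{source}: {n} of {total} failure modes ({n / total:.0%})")
```

Running this reproduces the FMEA breakdown: AI 14 of 25 (56%), XR 3 of 25 (12%), human interaction 8 of 25 (32%).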
Key Risk Themes
THEME 01

Automation bias is the dominant human risk

All 8 human failure modes are omission or commission errors — technicians skip independent checks because AI overlays appear authoritative. No forcing function exists to require manual verification.

Strongest at L2 + L3
THEME 02

AI comprehension failures are the largest cluster

8 of 14 AI failures sit at L2: wrong tolerance bands, hallucinated AMM values, missing load corrections, non-compliant procedures. The AI interprets confidently from incomplete rule bases.

Concentrated at L2
THEME 03

XR degrades the very perception it should enhance

All 3 XR failures hit L1: display resolution too low to read fine print, motion blur on gauge needles, and gloved-hand / noisy-hangar input failures. The technician loses independent verification ability.

Isolated to L1
THEME 04

Projection capability is underdeveloped

Only 1 AI failure at L3 — and it's a gap, not a malfunction: no degradation model, no service-life trend, no persistent tire history. The system can't forecast, so humans inherit all projection risk.

L3 gap