Stage 03 — MRAG Pipeline

Research Summary

Reducing Manual Search Burden with a Multimodal RAG for the Cessna 172 Maintenance Manual

A study on building a multimodal retrieval-augmented generation pipeline that helps aircraft technicians find maintenance information faster — across text, diagrams, and tables.


The Problem

Why Technicians Skip the Manual

Aircraft technicians spend 20–40% of their maintenance time just searching for information. When it's hard to find, they sometimes skip the manual entirely — risking safety.

Time-Consuming Search

Navigating hundreds of pages across chapters, sub-chapters, and ATA codes takes significant time under strict schedules.

Keyword Search Falls Short

PDF keyword search struggles with varying terminology across manufacturers and returns overwhelming, unfiltered results.

Text-Only RAGs Miss Visuals

Existing RAG solutions only retrieve text — but technicians rely heavily on diagrams, tables, and figures for maintenance tasks.

The Solution

MRAG Pipeline for C172 Maintenance Manual

A two-stage pipeline: a Multimodal Manual Retriever (MMR) finds the right pages, then a Vision-Language Model generates answers.

1

Natural Language Query

Technician asks a question in plain English

2

MMR (ColQwen2)

Retrieves top-5 most relevant manual pages as images

3

VLM (GPT-4.1)

Reads retrieved pages and generates a grounded answer

4

Answer + Sources

Technician gets a response with traceable manual pages
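The four steps above can be sketched as a retrieve-then-generate loop. This is a minimal illustration, not the study's implementation: ColPali-family retrievers such as ColQwen2 score a query against a page with late interaction (each query vector takes its best-matching page vector, and the maxima are summed), and the toy two-dimensional vectors below stand in for real embeddings. The names Page, maxsim, retrieve, and answer are hypothetical, and the generate callable is a stub in place of a real VLM call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


@dataclass
class Page:
    number: int            # page number in the manual
    vectors: List[Vector]  # one embedding per image patch (toy values here)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vectors: List[Vector], page_vectors: List[Vector]) -> float:
    # Late-interaction score: for each query vector keep the best dot
    # product against any page vector, then sum those maxima.
    return sum(max(dot(q, p) for p in page_vectors) for q in query_vectors)


def retrieve(query_vectors: List[Vector], pages: List[Page], k: int = 5) -> List[Page]:
    # Every page is scored on its own; the k highest-scoring pages are returned.
    ranked = sorted(pages, key=lambda pg: maxsim(query_vectors, pg.vectors), reverse=True)
    return ranked[:k]


def answer(question: str, query_vectors: List[Vector], pages: List[Page],
           generate: Callable[[str, List[Page]], str], k: int = 5) -> Dict[str, object]:
    top_pages = retrieve(query_vectors, pages, k)
    text = generate(question, top_pages)  # stand-in for the VLM call
    # Return the page numbers too, so the answer stays traceable to the manual.
    return {"answer": text, "sources": [pg.number for pg in top_pages]}


# Hypothetical toy data: three pages and a one-vector query.
pages = [
    Page(101, [[1.0, 0.0], [0.0, 1.0]]),
    Page(102, [[0.2, 0.1]]),
    Page(103, [[0.9, 0.1]]),
]
query_vectors = [[1.0, 0.0]]
result = answer(
    "placeholder question",
    query_vectors,
    pages,
    generate=lambda q, pgs: f"stub answer from {len(pgs)} pages",
    k=2,
)
print(result)
```

Returning the source page numbers alongside the generated text mirrors step 4, where the technician receives both the answer and the pages it was grounded on.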

Key Results

How Well Does It Work?

Evaluated on 400 synthetic test questions across 4 chapter groups and 4 question types, validated by certified A&P technicians.

Retrieval Accuracy: 92.9% (Recall @ top-5)
Answer Quality: 87.2% (BERT F1 similarity)
Response Time: ~17 s (retrieval + generation)
Operational Cost: $0.009 per question (GPT-4.1)
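For readers unfamiliar with the retrieval metric, recall at top-k can be sketched as follows. This uses the standard definition (the share of a question's relevant pages that appear among the first k retrieved pages, averaged over questions); the summary does not say how many ground-truth pages each test question has, so the data layout and the function names recall_at_k and mean_recall_at_k are assumptions. With one relevant page per question, the metric reduces to a hit rate.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(ranked_pages: Sequence[int], relevant_pages: Set[int], k: int) -> float:
    # Share of the relevant pages that appear in the first k retrieved pages.
    if not relevant_pages:
        raise ValueError("each question needs at least one relevant page")
    hits = relevant_pages.intersection(ranked_pages[:k])
    return len(hits) / len(relevant_pages)


def mean_recall_at_k(runs: List[Tuple[Sequence[int], Set[int]]], k: int) -> float:
    # Average the per-question recall over the whole test set.
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical retrieval runs: (ranked page numbers, ground-truth pages).
runs = [
    ([12, 7, 3], {7}),
    ([5, 9, 1], {1}),
    ([4, 2, 8], {6}),
]
for k in (1, 2, 3):
    print(k, mean_recall_at_k(runs, k))
```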

Recall Performance by Question Type at Different Top-k Settings

Question Type      Recall @ k=1   Recall @ k=3   Recall @ k=5
Procedural         69%            86.5%          91.5%
Diagram            50.8%          77.5%          85.5%
Caution / Safety   77%            88%            95.5%
Specification      75.5%          94%            99%
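As a quick consistency check, the per-type figures can be averaged at each k. The unweighted mean at k=5 is 92.875, which rounds to the 92.9% headline figure; whether the study computed its headline this way, or how the 400 questions are split across types, is not stated here, so the equal weighting below is an assumption. The function names are hypothetical.

```python
# Recall (%) by question type and top-k, copied from the results above.
recall_by_type = {
    "Procedural":       {1: 69.0, 3: 86.5, 5: 91.5},
    "Diagram":          {1: 50.8, 3: 77.5, 5: 85.5},
    "Caution / Safety": {1: 77.0, 3: 88.0, 5: 95.5},
    "Specification":    {1: 75.5, 3: 94.0, 5: 99.0},
}


def mean_recall(table: dict, k: int) -> float:
    # Unweighted mean across question types (assumes equal weighting).
    values = [row[k] for row in table.values()]
    return sum(values) / len(values)


def spread(table: dict, k: int) -> float:
    # Gap between the best and worst question type at a given k.
    values = [row[k] for row in table.values()]
    return max(values) - min(values)


for k in (1, 3, 5):
    print(f"k={k}: mean={mean_recall(recall_by_type, k):.1f}%, "
          f"spread={spread(recall_by_type, k):.1f} points")
```

The spread narrows as k grows, with Diagram questions the weakest type at every k, which matches the diagram retrieval gap listed among the limitations.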
Limitations

Known Challenges

01

Page Disconnection

The ColPali-style retriever (ColQwen2) embeds each page independently, so it cannot connect information that spans consecutive pages, and continuous procedures become fragmented.

02

Diagram Retrieval Gaps

Figures with minimal text labels are harder to retrieve. The manual's indirect figure references (e.g., 'see Figure 201') add further ambiguity.

03

Cascading Errors

When the MMR retrieves wrong pages, the VLM generates incorrect answers — errors propagate through the pipeline.

04

Synthetic Evaluation Only

The study used AI-generated test questions (validated by experts) — no real-world user study with actual technicians was conducted.

Key Takeaway

Multimodal context in about 17 seconds.

The MRAG pipeline for the Cessna 172 Maintenance Manual shows strong potential to reduce the 20–40% of maintenance time technicians spend searching for information, delivering relevant multimodal content in about 17 seconds at less than a penny per query.

Low Interaction Cost · Fast Pinpointing · Visual Context Support · Minimal Navigation Depth

NSF Award #2326187 · Study Summary · Cessna 172 MRAG Research