← Writings
April 7, 2026 · 6 min read

Your Smartwatch Knows You're Overwhelmed Before You Do

Heart rate variability can detect cognitive overload before a person consciously feels it. Here's what that means for how we design AI systems.


A participant once told me a task was easy. His heart said something different.

We were running a cognitive workload study using ECG data from a smartwatch. The participant had just completed a mentally demanding sequence - tracking multiple variables, making decisions under time pressure. When I asked him to rate the difficulty, he gave it a 3 out of 10. "Pretty straightforward," he said.

His heart rate variability data from that same window told a different story. The markers we were analyzing - specifically, a metric called RMSSD, which reflects beat-to-beat variation in the intervals between heartbeats - had dropped significantly during the task. That drop is a physiological signature of elevated cognitive load. His nervous system was working harder than his self-report suggested.
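RMSSD itself is simple to compute once you have clean beat-to-beat intervals. A minimal sketch (the interval values below are made up for illustration):

```python
import numpy as np

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between R-R intervals (ms)."""
    diffs = np.diff(rr_intervals_ms)       # beat-to-beat differences
    return float(np.sqrt(np.mean(diffs ** 2)))

# Steadier intervals (low variability) vs. more variable ones
steady = [800, 802, 799, 801, 800]         # ms between beats
variable = [800, 760, 840, 780, 820]
# RMSSD is lower for the steadier series — the pattern associated with load
```

Lower RMSSD means the heartbeats are arriving at more regular intervals, which is the drop this participant's data showed.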

This is the core finding that made me interested in physiological measures for cognitive workload in the first place. What people say they experience and what their bodies are doing are not always the same thing. And in contexts where workload matters - aviation, healthcare, air traffic control - that gap is worth measuring.

Why Cognitive Workload Measurement Matters for AI Design

Cognitive workload is not just an academic concept. It is a direct predictor of performance, error rate, and safety in high-stakes environments. A controller managing an overloaded sector is more likely to miss a conflict. A technician whose working memory is saturated is more likely to skip a step. A surgeon under excessive cognitive load makes decisions differently than one who isn't.

For decades, the standard tool for measuring cognitive workload in research was the NASA Task Load Index - NASA-TLX. It's a subjective rating scale: after completing a task, you rate your mental demand, physical demand, temporal demand, performance, effort, and frustration. It's reliable, widely validated, and extraordinarily useful for comparing conditions across participants.

It also only tells you what someone thinks their workload was after the fact.

This is fine for many research questions. But it has limits. People are not always accurate reporters of their own cognitive states - particularly when they're busy, stressed, or in a professional context where admitting difficulty feels problematic. And it can't tell you anything about dynamic fluctuations within a task - the moments when workload spikes and then recovers.

ECG-based measures - specifically heart rate variability - can.

How HRV Detects Cognitive Load

The heart does not beat like a metronome. Even at rest, the intervals between heartbeats vary slightly from one beat to the next. This variation - heart rate variability - reflects the activity of the autonomic nervous system, which regulates the body's response to demands and stressors.


When cognitive load increases, the sympathetic nervous system becomes more active. One measurable consequence is a reduction in HRV - the beat-to-beat intervals become more regular, less variable. This isn't a perfect signal, but it's a real one, and it has been validated across a substantial body of research in human factors, medicine, and cognitive science.

The challenge is extracting this signal cleanly from wrist-based smartwatch ECG data. Unlike clinical ECG devices with multiple electrodes, a single-lead smartwatch ECG picks up a noisy signal. The Pan-Tompkins algorithm - first published in 1985 and still the standard approach - processes the raw waveform to identify R-peaks (the sharp upward spikes in each heartbeat cycle) and compute the R-R intervals from which HRV metrics are derived.

In the study I published at ECCE 2025, we applied this pipeline to smartwatch ECG data to assess cognitive workload during mental task conditions. The process involves: bandpass filtering to isolate the frequency range where cardiac signals live; derivative-based enhancement to sharpen the peaks; squaring and moving-window integration to amplify the signal; and adaptive thresholding to identify true R-peaks while rejecting noise and artifacts.
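Those stages can be sketched in a few lines. This is an illustrative simplification, not the study's actual pipeline — the filter band, window length, and the single fixed threshold are textbook defaults, whereas the original Pan-Tompkins algorithm refines its thresholds adaptively as beats are detected:

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def pan_tompkins_sketch(ecg, fs=250):
    """Simplified Pan-Tompkins R-peak detection (illustrative thresholds)."""
    # 1. Bandpass filter (~5-15 Hz) to isolate the QRS energy band
    b, a = butter(3, [5 / (fs / 2), 15 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecg)
    # 2. Derivative emphasises the steep slopes of the QRS complex
    derivative = np.gradient(filtered)
    # 3. Squaring makes values positive and amplifies large slopes
    squared = derivative ** 2
    # 4. Moving-window integration (~150 ms) merges each beat into one bump
    window = int(0.150 * fs)
    integrated = np.convolve(squared, np.ones(window) / window, mode="same")
    # 5. Fixed threshold + refractory distance; real implementations adapt this
    threshold = 0.5 * integrated.max()
    peaks, _ = find_peaks(integrated, height=threshold, distance=int(0.2 * fs))
    return peaks
```

The R-R intervals for HRV metrics are then just the differences between successive peak times.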

Getting this right matters. A missed beat or a falsely detected one changes the HRV metrics and can produce spurious findings. The preprocessing pipeline is where most of the analytical work happens - and where most of the differences between studies come from.

What Physiological Measures Can Tell You (And What They Can't)

HRV-based workload measurement gives you something NASA-TLX cannot: a continuous, objective signal that doesn't require interrupting the task or relying on the participant's self-perception. In a study where workload fluctuates - because the task demands vary over time, or because different design conditions create different cognitive loads - this is enormously valuable.

But it comes with real limitations that researchers and practitioners need to understand.

Individual differences are large. HRV baselines vary substantially between people. A decrease that represents high workload for one participant may be within normal range for another. This means you need individual baseline recordings and you need to be cautious about population-level thresholds.
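One common way to handle this — a sketch, not the study's method — is to express a task measurement relative to the same person's resting baseline rather than comparing raw values across people:

```python
import numpy as np

def workload_index(task_rmssd, baseline_rmssd_samples):
    """Task RMSSD as a z-score against the individual's own resting baseline.

    Negative values mean HRV dropped below this person's baseline — the
    direction associated with elevated cognitive load.
    """
    mu = np.mean(baseline_rmssd_samples)
    sigma = np.std(baseline_rmssd_samples)
    return float((task_rmssd - mu) / sigma)

# A drop from a personal baseline around 40 ms reads as elevated load
# for this person, even if 30 ms would be unremarkable for someone else.
```

Per-individual normalisation is what makes between-person comparisons meaningful at all.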

Physical activity contaminates the signal. HRV responds to physical as well as cognitive demands. In a purely sedentary lab study, this is manageable. In a field study where technicians are physically moving around an aircraft, the signal is much harder to interpret. Separating cognitive from physical load in active work environments remains an open problem.

It measures arousal, not always workload specifically. HRV reflects autonomic nervous system activity broadly - which includes emotional states, stress responses, and physical demands, not only cognitive load. High emotional arousal (excitement, anxiety) can produce HRV patterns similar to high cognitive load. Context matters for interpretation.

The signal needs time. HRV metrics calculated over 5-minute windows are more stable and valid than those over shorter windows. This limits temporal resolution - you can detect that workload was elevated over a 5-minute period, but not exactly when within that period the peak occurred.
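In practice this means computing HRV metrics over a sliding window. A minimal sketch with an assumed 5-minute window and 1-minute step:

```python
import numpy as np

def windowed_rmssd(rr_ms, rr_times_s, window_s=300, step_s=60):
    """RMSSD over sliding windows (default: 5-min window, 1-min step).

    rr_ms:       R-R intervals in milliseconds
    rr_times_s:  timestamp in seconds at which each interval ends
    """
    results = []
    t = rr_times_s[0] + window_s
    while t <= rr_times_s[-1]:
        mask = (rr_times_s > t - window_s) & (rr_times_s <= t)
        rr_win = rr_ms[mask]
        if len(rr_win) > 1:
            diffs = np.diff(rr_win)
            results.append((t, float(np.sqrt(np.mean(diffs ** 2)))))
        t += step_s
    return results
```

Overlapping windows buy back some temporal resolution, but each value still summarises a full five minutes — you know the window was elevated, not which moment inside it.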

What This Means for AI System Design

Here's where this gets interesting for anyone building AI systems for high-stakes environments.

If you can measure cognitive workload in real time - imperfectly but reliably - you can design AI systems that respond to it. A controller whose HRV suggests elevated load could receive a different level of AI support than one who is operating comfortably within their capacity. An AI system that tracks operator state and adjusts its behavior accordingly - offering more proactive alerts when a human is overloaded, reducing information density when working memory is near saturation - is a meaningfully different kind of system than one that delivers the same outputs regardless of the human's current state.
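At its simplest, the adaptation logic could be a mapping from operator state to support mode. This is a hypothetical sketch — the thresholds and mode names are invented for illustration, and a real system would need per-operator calibration and far more context than one physiological signal:

```python
def support_level(hrv_z):
    """Hypothetical mapping from baseline-relative HRV (z-score) to AI support mode.

    Thresholds are illustrative only, not validated values.
    """
    if hrv_z < -2.0:          # HRV well below personal baseline: likely high load
        return "proactive"    # surface alerts, reduce information density
    elif hrv_z < -1.0:        # moderately below baseline
        return "assistive"    # pre-stage options, offer help on request
    else:
        return "passive"      # operating comfortably: stay out of the way
```

The hard part is not this mapping but everything around it: signal quality, calibration, and deciding what each mode is actually allowed to change.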

This concept goes by several names: adaptive automation, state-aware AI, human-in-the-loop systems with closed-loop feedback. The research case for it is strong. The implementation is genuinely hard - because real-time physiological monitoring introduces its own practical problems, and because the ethics of monitoring workers' physiological states during professional activities requires careful handling.

But the direction matters. AI systems in safety-critical environments that treat the human as a fixed component - inputs in, decisions out - are missing something fundamental about how humans work. Workload is dynamic. Attention fluctuates. The same person at different points in a shift is not the same operator. AI systems designed with this reality in mind will be safer, more effective, and more appropriately humble about what human cognition can be asked to do.

The participant who rated his task as easy had already finished it by the time I showed him the HRV data. He looked at the graph for a moment. Then he said: "Huh. I guess it was harder than I thought."

That's the gap this method is designed to close.

This post extends work published at the 2024 International Conference on Electrical, Computer and Communication Engineering (ECCE). More at mdrashi.com/research.