As some of you may know, I have spent nearly every weekday since graduating from Berkeley working at a startup called Somnee, helping people sleep better. Somnee’s flagship service is a 15-minute neural stimulation session before bedtime that gently nudges the brain toward sleep using personalized electrical stimulation.

But what happens after those 15 minutes is what I’ll talk about today.

For most of modern history, if you wanted to truly understand what was happening in your sleep — your brain waves, your sleep stages, your awakenings — you had to go to a clinical sleep lab, get wired up with electrodes, and spend the night under observation. Today, many of those measurements (or at least reasonable approximations of them) are available in consumer devices you can wear on your finger, wrist, forehead, or even place under your mattress.

So I wanted to step back and look at the landscape. How accurate are the most popular sleep trackers — Oura, WHOOP, Apple Watch, Fitbit, Garmin, and others? How do EEG-based devices compare? And where does Somnee’s tracking sit in that ecosystem?

Quick note: I’m not writing this on behalf of Somnee. These are my own views based on publicly available research and peer-reviewed validation studies.

The Basics

Let’s start with a ten-second crash course on what a “good night of sleep” actually is — because what we measure shapes what we optimize.

A good night of sleep is not a steady plunge into “deep” and then back out. Instead, your brain cycles through stages roughly every 90 minutes, four to six times per night. You begin in lighter non-REM sleep (N1 and N2), descend into deep slow-wave sleep (N3), then move into REM sleep — the dream-heavy stage — before cycling back again.

Early in the night, deep sleep dominates. Later in the night, REM expands. Across seven to nine hours, healthy adults typically spend about half their night in N2, 15–25% in deep sleep (N3), and 20–25% in REM. N1 is brief — the on-ramp.
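
To make those percentages concrete, here is a quick back-of-the-envelope sketch that turns them into minutes for a given night. The function name and the single 50% figure for N2 are my own simplifications for illustration, not clinical cutoffs.

```python
# Typical adult stage shares quoted above, as fractions of total sleep.
# N2 is "about half"; using a flat 0.50 is a simplification for illustration.
STAGE_SHARES = {
    "N2": (0.50, 0.50),
    "N3": (0.15, 0.25),
    "REM": (0.20, 0.25),
}

def stage_minutes(total_sleep_hours):
    """Return {stage: (low_minutes, high_minutes)} for a night of the given length."""
    total_minutes = total_sleep_hours * 60
    return {
        stage: (round(low * total_minutes), round(high * total_minutes))
        for stage, (low, high) in STAGE_SHARES.items()
    }

budget = stage_minutes(8)
for stage, (low, high) in budget.items():
    print(f"{stage}: {low}-{high} min")
```

For an eight-hour night that works out to roughly 240 minutes of N2, 72 to 120 minutes of deep sleep, and 96 to 120 minutes of REM.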

Deep sleep is particularly important for physical recovery, immune function, hormone release, and metabolic clearance in the brain. REM appears central to emotional regulation, memory integration, and creativity. But the key takeaway is this: you don’t micromanage stages. You “earn” healthy stage distribution by protecting total sleep time, consistency, and minimizing fragmentation (I say as I write this way past my bedtime).

This context matters because consumer sleep trackers promise to measure precisely these stages. The question is: how well do they actually do it?

The Gold Standard

The gold standard remains polysomnography (PSG) — the full sleep lab setup. PSG measures EEG (brain waves), EOG (eye movements), EMG (muscle tone), ECG (heart rhythm), breathing airflow, respiratory effort, oxygen saturation, and more. Sleep is scored in 30-second epochs by trained technicians following standardized AASM rules. If you want to definitively identify N1 versus N2 versus N3 versus REM, PSG is still the benchmark.
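
If it helps to picture what "scored in 30-second epochs" produces, a night of sleep ends up as a long list of stage labels, one per epoch. The sketch below uses made-up labels for a single 90-minute cycle and a hypothetical helper to convert them into minutes per stage; real scoring output is far messier.

```python
from collections import Counter

EPOCH_SECONDS = 30  # the scoring window mentioned above

def summarize_hypnogram(epochs):
    """Convert a list of per-epoch stage labels into minutes per stage."""
    counts = Counter(epochs)
    return {stage: n * EPOCH_SECONDS / 60 for stage, n in counts.items()}

# Toy 90-minute cycle: invented labels, not real data.
toy_cycle = ["N1"] * 10 + ["N2"] * 90 + ["N3"] * 40 + ["REM"] * 40

summary = summarize_hypnogram(toy_cycle)
print(len(toy_cycle), "epochs")
print(summary)
```

Every accuracy figure later in this post is, at bottom, a comparison between two of these label lists: one from PSG and one from a device.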

The Consumer Landscape

Consumer devices approximate this in different ways.

The most common approach, used by Oura, WHOOP, Apple Watch, Fitbit, Garmin, Samsung, and similar wearables, combines accelerometry (movement) and photoplethysmography (PPG), which measures pulse waveforms from the skin. Movement helps determine sleep versus wake. Heart rate and heart rate variability shift across sleep stages, providing additional signal for machine learning models to infer REM versus non-REM and, sometimes, deep sleep.

These devices are generally very good at detecting sleep versus wake. Multiple validation studies show two-stage (sleep/wake) accuracy around 85–93% compared to PSG for leading devices like Oura and Apple Watch. Sensitivity (detecting actual sleep) is high — often above 90%. Specificity (detecting wake) is lower — often 50–75%. In plain English: they are good at knowing when you’re asleep, but they often miss quiet wakefulness and overestimate sleep efficiency.
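
Here is a minimal sketch of how those three numbers fall out of an epoch-by-epoch comparison. The toy night is entirely made up (it is not from any study), and I am treating "sleep" as the positive class, which is the usual convention in these validation papers.

```python
def sleep_wake_metrics(psg, device):
    """Epoch-by-epoch agreement, treating 'sleep' as the positive class."""
    pairs = list(zip(psg, device))
    tp = sum(p == "sleep" and d == "sleep" for p, d in pairs)  # sleep scored as sleep
    tn = sum(p == "wake" and d == "wake" for p, d in pairs)    # wake scored as wake
    fn = sum(p == "sleep" and d == "wake" for p, d in pairs)   # sleep scored as wake
    fp = sum(p == "wake" and d == "sleep" for p, d in pairs)   # wake scored as sleep
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy night: PSG says 900 sleep epochs and 100 wake epochs (invented numbers).
psg = ["sleep"] * 900 + ["wake"] * 100
# The hypothetical device catches most sleep but calls 45 of the 100 wake epochs sleep.
device = ["sleep"] * 870 + ["wake"] * 30 + ["sleep"] * 45 + ["wake"] * 55

metrics = sleep_wake_metrics(psg, device)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

In this toy night the device lands at 92.5% accuracy and about 97% sensitivity, yet only 55% specificity. Because most of the night is sleep, a device can miss nearly half of the wake epochs and still post a high headline accuracy. It also scores 915 epochs as sleep against PSG's 900, which is the sleep-efficiency overestimate in miniature.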

When it comes to stage classification, performance drops. Deep sleep is commonly under- or over-estimated depending on the algorithm version. REM detection is better in some devices (Apple and Oura perform relatively well), but agreement with PSG is still moderate, not perfect. Cohen’s kappa values for stage agreement in wrist wearables typically range from ~0.4 to 0.6 — respectable for consumer tech, but not clinical-grade.
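
Cohen’s kappa is worth a short worked example, because it is not the same thing as percent agreement: it discounts the agreement two scorers would reach by chance. The sketch below implements the standard formula, kappa = (observed - expected) / (1 - expected), on an invented 100-epoch confusion table. The stage labels and counts are assumptions for illustration only.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters labeled at random with their own base rates.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented confusion table: (PSG stage, device stage) -> number of epochs.
confusion = {
    ("wake", "wake"): 5, ("wake", "light"): 5,
    ("light", "light"): 38, ("light", "deep"): 6, ("light", "REM"): 6,
    ("deep", "deep"): 12, ("deep", "light"): 8,
    ("REM", "REM"): 13, ("REM", "light"): 6, ("REM", "wake"): 1,
}

psg, device = [], []
for (true_stage, predicted_stage), count in confusion.items():
    psg += [true_stage] * count
    device += [predicted_stage] * count

raw_agreement = sum(p == d for p, d in zip(psg, device)) / len(psg)
kappa = cohens_kappa(psg, device)
print(f"raw agreement: {raw_agreement:.0%}, kappa: {kappa:.2f}")
```

Here the two label lists agree on 68% of epochs, but kappa comes out at about 0.50, squarely in the wrist-wearable range above. So a percent-agreement figure and a kappa figure are not directly comparable, even when they describe the same device.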

EEG-based consumer devices sit a tier above. Products like Dreem (now discontinued but well studied) and Muse S place electrodes on the forehead or around the ears and directly record brain activity. Because they measure actual slow waves, sleep spindles, and REM-associated patterns, their staging reaches roughly 80–90% agreement with PSG in published studies. That’s meaningfully closer to lab-grade sleep scoring.

However, these devices still lack full PSG montages — no eye leads, limited muscle channels — so they are not identical to a sleep lab. They are, though, far more physiologically grounded than PPG-only devices.

Contactless systems — under-mattress mats, radar units like Nest Hub, or smart beds like Eight Sleep — measure motion and respiration through ballistocardiography or radar. They can reliably detect when someone is in bed and broadly asleep, and some are validated for moderate-to-severe sleep apnea screening. But they struggle with precise staging, shared beds, and environmental noise.

Smartphone-only sleep apps are at the bottom of the hierarchy. Without direct physiological measurement, their staging accuracy is poor, and their output should be treated as a very rough approximation of bedtime and wake time.

Where does Somnee sit?

Somnee occupies a distinct category within the sleep technology landscape because it combines high-resolution physiological tracking with active neuromodulation. Unlike wrist- or ring-based devices that infer sleep stages indirectly through motion and cardiovascular signals, Somnee incorporates frontal EEG sensors capable of directly measuring cortical activity. This allows it to detect hallmark features of sleep architecture — slow-wave activity, sleep spindles, and REM-associated patterns — with a level of physiological specificity that aligns more closely with consumer EEG systems than with PPG-based wearables.

From a tracking standpoint, this matters. EEG provides direct access to the neural oscillations that define sleep stages. While full polysomnography includes additional channels (EOG, EMG, respiratory effort, airflow), frontal EEG alone captures the core dynamics necessary for distinguishing wake, N2, N3 (slow-wave sleep), and REM with substantially higher fidelity than movement-plus-PPG systems. In this respect, Somnee’s measurement capability sits in the same technical tier as other validated EEG headbands, which have demonstrated staging accuracies in the ~80–90% range versus PSG in controlled studies.

In comparative terms, wrist-based devices are optimized for scalable, unobtrusive longitudinal trend monitoring. Consumer EEG headbands are optimized for higher-fidelity staging.

Popularity does not equal accuracy

There’s also an important distinction between popularity and accuracy. Apple, Samsung, and Fitbit dominate in unit sales. Oura and WHOOP have strong brand recognition in performance and longevity circles. But the most widely used devices are not necessarily the most scientifically accurate.

A useful mental framework looks something like this:

At the top sits full PSG. Just below it are consumer EEG headbands like Somnee. Then come best-in-class wearables (Oura, Apple, recent Fitbit). Below that are mid-tier wearables and contactless systems. At the bottom are smartphone-only apps.

For most people, best-in-class wearables are accurate enough for trend monitoring: tracking total sleep time, sleep consistency, and broad shifts over weeks and months. They are not reliable enough for depth: diagnosing insomnia, quantifying exact minutes of deep sleep, or replacing medical evaluation.

That nuance matters.

A tracker telling you that you had 47 minutes of deep sleep should be interpreted as a directional estimate, not a precise physiological truth. But if your average deep sleep drops sharply after a week of heavy drinking or stress, that trend is meaningful. The signal lives in patterns, not single nights.
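
A small sketch of what "patterns, not single nights" looks like in practice: average the nightly estimates by week before reading anything into them. The numbers are invented, and the seven-night window is simply a convenient assumption, not a validated threshold.

```python
from statistics import mean

def weekly_means(nightly_minutes, window=7):
    """Average consecutive, non-overlapping windows of nightly values."""
    return [
        mean(nightly_minutes[i:i + window])
        for i in range(0, len(nightly_minutes) - window + 1, window)
    ]

# Made-up deep sleep estimates in minutes: a baseline week, then a rough week.
baseline_week = [62, 47, 70, 55, 66, 58, 62]
rough_week = [41, 52, 38, 45, 36, 49, 40]

averages = weekly_means(baseline_week + rough_week)
print(averages)
```

Notice that the 47-minute night in the baseline week is lower than the best night of the rough week (52 minutes), so comparing individual nights tells you very little. The weekly averages, 60 versus 43 minutes, are where the drop actually shows up.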

What this means

Zooming out, what strikes me most is how extraordinary this moment is.

We now have near-clinical sleep science — multi-sensor physiological tracking, real-time EEG, algorithmic sleep staging — available commercially to millions of people. Ten years ago, this would have required a hospital visit. Today it fits on your finger or forehead.

This democratization of sleep measurement will likely accelerate two parallel trends. First, individuals will become far more aware of sleep as a foundational health behavior — quantified, visualized, and optimized. Second, we will see tighter integration between measurement and intervention: closed-loop systems that detect, stimulate, adapt, and personalize in real time.

The future of sleep tech is not just tracking. It’s adaptive, physiology-informed modulation.

We are at the early stages of bringing lab-grade neuroscience into the bedroom — and if done responsibly, with scientific rigor and thoughtful validation, that could meaningfully shift how we think about recovery, mental health, performance, and long-term brain health.

And that’s pretty amazing.

Until next time,

—Daniel

The Neurotech Napkin