How accurate are my sleep metrics? Here’s the ultimate guide to sleep monitoring with wearables

Dr. Adam Bataineh, MD

Chief Medical Officer

Reviewer

Quick read

  • Wearables are not yet as accurate as lab tests, but they can still be useful
  • Wearables are accurate enough when measuring basic sleep metrics such as time spent asleep and time awake
  • When interpreting sleep stage metrics such as deep and REM sleep, focus more on the trend than on the number
  • Always correlate your data with other related metrics and, more importantly, with how you feel


How accurate are my wearable sleep metrics?

The gold standard for sleep tracking is a lab test called polysomnography. To assess the accuracy of sleep metrics from a wearable device, researchers compare them to that gold standard. A few studies have compared different wearables to polysomnography. In this article I focus primarily on Whoop and Oura, as they are the most common devices owned by Span users.


Both Whoop and Oura give reliable measurements of simple sleep metrics such as total sleep time and wake time. Whoop was shown to slightly overestimate total sleep time on average, but did not differ significantly from polysomnography. Oura was mostly accurate in estimating the time it takes to fall asleep (sleep onset latency or SOL), how long you spend awake after falling asleep (wakefulness after sleep onset or WASO) and total sleep time.


When it comes to more detailed metrics, these devices become less reliable. Whoop was the more accurate of the two, with 68% agreement with polysomnography when measuring deep sleep and 70% for REM sleep. However, for estimating minutes awake after sleep onset, its agreement was only around 51%. Oura was less accurate, with an agreement of only 51% with polysomnography for deep sleep and 61% for REM sleep.
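
To make a figure like "68% agreement" more concrete, here is a minimal Python sketch of one common way to compare two sleep recordings: split the night into short epochs, then check how many of the epochs the reference scored as a given stage were given the same label by the device. This is an illustration only. The studies cited in this article may define agreement differently, and the function name, the 30-second epoch length and all the labels below are made up.

```python
from typing import Sequence


def stage_agreement(reference: Sequence[str], device: Sequence[str], stage: str) -> float:
    """Fraction of reference epochs scored as `stage` that the device also scored as `stage`."""
    if len(reference) != len(device):
        raise ValueError("both recordings must cover the same epochs")
    # Epochs that the reference (e.g. polysomnography) scored as the stage of interest.
    reference_epochs = [i for i, label in enumerate(reference) if label == stage]
    if not reference_epochs:
        raise ValueError(f"reference contains no '{stage}' epochs")
    matches = sum(1 for i in reference_epochs if device[i] == stage)
    return matches / len(reference_epochs)


# Made-up labels for ten 30-second epochs, for illustration only.
psg      = ["light", "deep", "deep", "deep", "deep", "light", "rem", "rem", "light", "wake"]
wearable = ["light", "deep", "deep", "light", "deep", "light", "rem", "light", "light", "light"]

deep_agreement = stage_agreement(psg, wearable, "deep")
rem_agreement = stage_agreement(psg, wearable, "rem")
print(f"deep: {deep_agreement:.0%}, REM: {rem_agreement:.0%}")
```

In this toy night the device matches three of the four deep sleep epochs and one of the two REM epochs.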


From these results, we can say with confidence that, in general, both devices are accurate enough at measuring basic sleep metrics such as time spent asleep and time awake. Both devices, however, are significantly flawed when measuring sleep stages such as deep sleep and REM sleep, with Whoop being the better of the two. This means that when looking at our data, we need to keep in mind that there will be a significant amount of bias.


This doesn’t mean we can’t use the data. It means we have to put it in the correct context.


The next question is, is this bias constant? If my deep sleep is off by a certain percentage, is that percentage fixed every time? If yes, then there is utility in tracking the trend instead of the specific number.
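
Here is a small Python sketch of why a fixed bias would still leave the trend usable. It assumes, purely for illustration, that a device over-reports deep sleep by the same 25% every night. All the numbers and names are made up; whether real devices behave this way is exactly the open question.

```python
# Hypothetical "true" deep sleep in minutes over five nights (made-up values).
true_minutes = [60, 75, 90, 80, 95]

# Assumption: the device over-reports by the same fixed percentage every night.
fixed_bias = 1.25
measured_minutes = [m * fixed_bias for m in true_minutes]


def percent_changes(values):
    """Relative change between each night and the one before it."""
    return [(later - earlier) / earlier for earlier, later in zip(values, values[1:])]


true_trend = percent_changes(true_minutes)
measured_trend = percent_changes(measured_minutes)

for true_change, measured_change in zip(true_trend, measured_trend):
    print(f"true: {true_change:+.1%}  measured: {measured_change:+.1%}")
```

Every measured number is wrong, yet the night-to-night percentage changes match the true ones. If the bias varied from night to night, that would no longer hold.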


We actually don’t know the answer for sure. There are studies showing that using these devices to track sleep and gather feedback does improve sleep quality. This suggests that the data we are getting is meaningful enough to make decisions. When Matthew Walker, a renowned sleep expert, was asked by Peter Attia about this, he seemed to concur. From my experience working with users of Span, I have seen meaningful changes in sleep metrics after implementing certain lifestyle changes. This suggests to me that there is a lot of value in the data when taken in context.


"Sleep is so significant to our health, you really can't sacrifice non-REM deep sleep or REM sleep without damage"

– Matthew Walker, author of Why We Sleep


How to interpret my data

The whole point of continuous monitoring is to show us continuous data that changes over time. This is what is useful. In a sense, we forgo accuracy of a single snapshot in favour of more data that shows change over time.


The main things to consider when understanding your data are the following.


The trend

Notice whether the trend is moving in the right or wrong direction. There are two types of trends: short term and long term. Short term trends (one day to another) can give you feedback on a specific behaviour from the day before, such as drinking alcohol. Long term trends (7 days or more) can give you feedback on longer term lifestyle changes such as starting a new job or moving home.
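
If you export your data, both kinds of trend are easy to compute yourself. The Python sketch below is one simple way to do it, assuming you have a list of nightly total sleep times in hours. The values and function names are made up, and the 7-night window is simply the "7 days or more" rule of thumb from above.

```python
from statistics import mean

# Made-up total sleep time in hours for 14 consecutive nights.
sleep_hours = [7.0, 7.5, 6.0, 7.0, 7.5, 8.0, 6.5,
               7.5, 7.5, 7.0, 8.0, 8.0, 7.5, 7.0]


def short_term_change(values):
    """Change from the previous night to the most recent night."""
    return values[-1] - values[-2]


def rolling_average(values, window=7):
    """Average of each run of `window` consecutive nights."""
    return [mean(values[i - window:i]) for i in range(window, len(values) + 1)]


weekly = rolling_average(sleep_hours)
# Compare the most recent 7-night average with the earliest one.
long_term_change = weekly[-1] - weekly[0]

print(f"last night vs the night before: {short_term_change(sleep_hours):+.1f} h")
print(f"latest 7-night average vs the first: {long_term_change:+.1f} h")
```

In this example the last night is down on the night before, while the 7-night average is up, which is why it helps to look at both.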


Correlation with other related markers

It is rare in medicine to make decisions based on one specific metric or test, no matter how accurate it may be. It is always best to correlate metrics with other related markers of health. For example, if my total sleep time is getting worse, I would expect my heart rate variability to go down and possibly my resting heart rate to rise.
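
One rough way to check whether your markers move together is a correlation coefficient. The Python sketch below computes a Pearson correlation between nightly sleep time and two other markers. The week of values is made up, the helper is my own and not part of any device's software, and a single week of data is far too little to draw firm conclusions from.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length with at least two points")
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a series never changes")
    return covariance / sqrt(spread_x * spread_y)


# Made-up nightly values for one week.
sleep_hours = [7.5, 6.0, 8.0, 5.5, 7.0, 6.5, 8.0]
hrv_ms = [62, 48, 66, 45, 58, 52, 64]
resting_hr_bpm = [54, 60, 52, 62, 55, 58, 53]

sleep_vs_hrv = pearson(sleep_hours, hrv_ms)
sleep_vs_rhr = pearson(sleep_hours, resting_hr_bpm)

print(f"sleep vs HRV: {sleep_vs_hrv:+.2f}")
print(f"sleep vs resting heart rate: {sleep_vs_rhr:+.2f}")
```

A value near +1 means the two markers rise and fall together, and a value near -1 means one goes up as the other goes down. In this made-up week, sleep and heart rate variability move together while resting heart rate moves the opposite way, which is the pattern described above.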


How you feel

More importantly, correlate the data with how you feel. If your device is telling you that you are getting enough high quality sleep but you still feel like your sleep is inadequate, you should listen to your body first. At the end of the day, the goal of wearables is to improve our health and help us feel better, not just give us numbers on a screen.

A validation study of the WHOOP strap against polysomnography to assess sleep
https://pubmed.ncbi.nlm.nih.gov/32713257/

The Sleep of the Ring: Comparison of the ŌURA Sleep Tracker Against Polysomnography
https://www.tandfonline.com/doi/full/10.1080/15402002.2017.1300587?src=recsys&

March 4, 2021