Pillow App Science Test: Apple Watch Sleep Review
Summary
TLDR: In this video, Rob, a postdoctoral scientist in Vienna, compares the sleep tracking capabilities of the Pillow app on the Apple Watch against a scientific EEG device. Over 10 nights, he records sleep stages and movements, analyzing the accuracy of sleep stage detection. Results show the Pillow app often confuses deep, light, and REM sleep stages, with only 13.9% of actual REM sleep correctly identified. Awake detection performs well, but the app sometimes registers sleep while the user is still awake or not in bed. Rob concludes that other Apple Watch sleep apps, like Sleep Cycle and AutoSleep, offer better tracking.
Takeaways
- 🧪 Rob, a postdoctoral scientist in Vienna, conducted a 10-night test comparing the Pillow app on Apple Watch to a scientific EEG device for sleep tracking.
- 📱 The Pillow app tracks sleep stages including deep sleep, light sleep, REM sleep, and wakefulness, and provides a sleep score and heart rate analysis.
- 🔬 Rob manually scored his sleep stages from EEG recordings and compared them to the Pillow app's data, noting discrepancies in sleep stage detection.
- 📈 The Pillow app showed poor correlation with the EEG device, particularly in detecting REM sleep, which was often misclassified as light or deep sleep.
- 🛌 Awake detection by the Pillow app was accurate, with only slight delays in detecting sleep onset and wake times.
- 🔄 The app seemed to follow hard-coded patterns, which may have contributed to the fragmented sleep stages observed.
- 📊 Statistical analysis revealed significant confusion between sleep stages by the Pillow app, with REM sleep particularly misidentified.
- 🆚 When compared to other Apple Watch sleep apps, the Pillow app was found to be less accurate and informative.
- 💤 Rob did not use the premium features of the Pillow app, which might offer additional insights but are behind a paywall.
- 📝 Limitations of the study include the small sample size, lack of a full polysomnography setup for comparison, and the subjective nature of manual sleep stage scoring.
Q & A
What is the main purpose of the Pillow app?
-The Pillow app is used in combination with the Apple Watch to track sleep stages, including deep sleep, light sleep, REM sleep, and awake time. It also provides a sleep score, heart rate analysis, and helps users take naps.
What scientific device did Rob use to compare the accuracy of the Pillow app?
-Rob used a portable scientific EEG device called the Hypnodyne ZMax, which is used by several of his colleagues in scientific studies, to compare the accuracy of the Pillow app.
How long did Rob wear both the Apple Watch with the Pillow app and the EEG device?
-Rob wore both the Apple Watch with the Pillow app and the EEG device for 10 nights to collect data for comparison.
What did Rob manually record and analyze from the EEG device and the Pillow app?
-Rob manually went through the recordings of the EEG device and scored each part of the night for different sleep stages. He also manually went through the Pillow app sleep stages and noted them down in an Excel table for analysis.
How did Rob evaluate the Pillow app's detection of when he fell asleep and woke up?
-Rob used an infrared camera recording to check his movements and see if the Pillow app correctly predicted when he was awake. He also evaluated the app's automatic detection of the moment he fell asleep.
What were the main issues Rob found with the Pillow app's sleep stage tracking?
-Rob found that the Pillow app had a poor match for deep sleep and REM sleep detection, often confusing these stages with light sleep. The app also had issues detecting sleep cycles accurately.
How did the Pillow app perform in detecting the times Rob was awake?
-The Pillow app performed quite well in detecting when Rob woke up during the night, although there was a slight delay in detecting the start of sleep.
Did Rob find any patterns in the Pillow app's sleep stage tracking?
-Yes, Rob noticed that the Pillow app's algorithm seemed to have hard-coded rules, such as always preceding deep sleep with light sleep and following any sleep stage with light or wake.
What were the statistical findings from Rob's comparison of the Pillow app and the EEG device?
-The Pillow app predicted almost double the amount of deep sleep, about half the amount of light sleep, and more than double the awake time compared to the EEG device. It also confused most REM sleep with deep and light sleep.
How does the Pillow app compare to other Apple Watch sleep apps according to Rob's tests?
-Rob found that other Apple Watch apps like AutoSleep and Sleep Cycle performed better in sleep tracking. He plans to make a dedicated video comparing different Apple Watch sleep apps soon.
What are some limitations Rob mentioned in his analysis of the Pillow app?
-Rob mentioned that he only tested the app on himself for 10 nights, which is a limited sample size. He also noted that a full scientific polysomnography setup would be needed for a complete sleep comparison, and he is not a professional sleep stage scorer, which could introduce some subjectivity.
Outlines
📊 Pillow App Sleep Tracking Accuracy Test
Rob, a postdoctoral scientist based in Vienna, Austria, conducted a test comparing the Pillow app on the Apple Watch to a scientific EEG device, the Hypnodyne ZMax, used in research projects. Over 10 nights, he wore both devices to compare their sleep stage tracking accuracy. The Pillow app tracks sleep stages like deep sleep, light sleep, REM sleep, and awake time, providing a sleep score and heart rate analysis. Rob manually analyzed the data from both devices and found discrepancies in sleep stage detection, particularly with REM sleep. The app also detected sleep onset and awake times with reasonable accuracy.
🔍 Detailed Analysis of Sleep Stage Tracking Discrepancies
In the second paragraph, Rob discusses the detailed analysis of the sleep data. He found that the Pillow app often confused deep sleep with light sleep and REM sleep. The app predicted more deep sleep than the EEG device and had difficulty accurately tracking REM sleep, often mistaking it for deep or light sleep. The sleep cycles, which should show a pattern of deep and light sleep followed by REM sleep, were not clearly visible in the Pillow app's data. Awake detection was mostly accurate, but there were slight delays in sleep onset detection. Rob also noticed that the app seemed to have hardcoded rules for sleep stage progression, which may contribute to the fragmented sleep stages observed.
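The hard-coded progression rules Rob suspects (deep sleep always preceded by light sleep, REM sleep always preceded by deep sleep) can be checked mechanically on any exported hypnogram. The following is a minimal Python sketch of such a check, not Rob's actual analysis: the stage labels, the `night` list, and the helper names `transitions` and `predecessors` are all hypothetical and made up for illustration.

```python
from collections import Counter


def transitions(stages):
    """Count (previous, current) pairs between consecutive, distinct stage segments."""
    # Collapse runs of the same label so only real stage changes are counted.
    collapsed = [s for i, s in enumerate(stages) if i == 0 or s != stages[i - 1]]
    return Counter(zip(collapsed, collapsed[1:]))


def predecessors(stages, target):
    """Return the set of stages that directly precede `target` anywhere in the night."""
    return {prev for (prev, cur) in transitions(stages) if cur == target}


# Illustrative, invented night (NOT data from the video).
night = ["awake", "light", "deep", "rem", "light", "deep",
         "light", "deep", "rem", "awake"]

print(predecessors(night, "deep"))  # {'light'}
print(predecessors(night, "rem"))   # {'deep'}
```

If every tracked night yields a single-element predecessor set for deep and REM sleep, that would be consistent with fixed rules in the algorithm, although it would not prove them.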
📊 Statistical Overview and Comparison with Other Apps
The third paragraph summarizes the statistical findings from the sleep tracking experiment. Rob found significant discrepancies between the Pillow app's predictions and the EEG device's measurements, particularly for REM sleep: only 13.9% of actual REM sleep was tracked as REM, with most of it recorded as deep or light sleep instead. He also compared the Pillow app's performance to other Apple Watch sleep apps, like Sleep Cycle and AutoSleep, finding that they performed better in tracking sleep stages. Rob concludes that he cannot recommend the Pillow app for accurate sleep stage tracking and suggests that there are better alternatives available. He also acknowledges the limitations of his study, such as testing the app only on himself for a limited duration and not using a full polysomnography setup for comparison.
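The two statistics described here, the overall share of each stage and the table in which each column (one per EEG stage) sums to 100, can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes both recordings have been aligned into equal-length lists of per-epoch labels, and the names `STAGES`, `stage_share`, `column_percentages`, `eeg`, and `app` as well as the sample values are invented, not taken from Rob's spreadsheet.

```python
from collections import Counter

STAGES = ["deep", "light", "rem", "awake"]


def stage_share(labels):
    """Percentage of all epochs spent in each stage."""
    counts = Counter(labels)
    return {s: 100.0 * counts[s] / len(labels) for s in STAGES}


def column_percentages(reference, predicted):
    """For each reference stage, the percentage of its epochs given each predicted label.

    Each inner dict corresponds to one column of the table and sums to 100
    (or to 0 if the reference stage never occurs).
    """
    if len(reference) != len(predicted):
        raise ValueError("both recordings must cover the same epochs")
    counts = Counter(zip(reference, predicted))
    table = {}
    for ref in STAGES:
        total = sum(counts[(ref, p)] for p in STAGES)
        table[ref] = {
            p: (100.0 * counts[(ref, p)] / total if total else 0.0) for p in STAGES
        }
    return table


# Invented example epochs (NOT the measurements from the video).
eeg = ["light", "light", "deep", "deep", "rem", "rem", "rem", "rem", "awake", "light"]
app = ["light", "deep", "deep", "light", "deep", "light", "rem", "deep", "awake", "light"]

table = column_percentages(eeg, app)
print(stage_share(eeg))
print(stage_share(app))
print(table["rem"])  # how actual REM epochs were labelled by the app
```

Normalizing per reference stage is what makes a statement like "only 13.9% of actual REM sleep was tracked as REM" readable directly from the REM column.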
Keywords
💡Postdoctoral Scientist
💡Pillow App
💡Apple Watch
💡EEG Device
💡Sleep Stages
💡REM Sleep
💡Infrared Camera
💡Sleep Score
💡Hypnodyne ZMax
💡Calibration
💡Polysomnography
Highlights
Postdoctoral scientist Rob tests the Pillow app on the Apple Watch against a scientific EEG device.
The experiment involved wearing both devices for 10 nights to compare sleep stage tracking results.
Pillow app tracks deep sleep, light sleep, REM sleep, and awake time, providing a sleep score and heart rate analysis.
The EEG device, Hypnodyne ZMax, is used in scientific studies and measures brain waves and muscle movements.
Rob manually scored sleep stages from the EEG recordings and compared them with the Pillow app data.
The Pillow app's deep sleep detection showed a partial match with the EEG device but overestimated deep sleep at later time points.
REM sleep detection by the Pillow app was poor, with actual REM sleep often misclassified as light or deep sleep.
Sleep cycles were not accurately represented by the Pillow app, unlike the EEG device which showed clear cycles.
The Pillow app accurately detected when Rob woke up during the night.
There was a slight delay in detecting the moment Rob fell asleep, but overall it was quite accurate.
The Pillow app overestimated deep sleep and underestimated light sleep compared to the EEG device.
Only 13.9% of actual REM sleep was detected as REM sleep by the Pillow app.
Awake detection was the most accurate feature of the Pillow app.
The Pillow app sometimes detected sleep while Rob was not even in bed.
Other Apple Watch apps like AutoSleep and Sleep Cycle performed better in previous tests.
Rob suspects the app's poor performance might be due to hard-coded patterns in its algorithm.
The study's limitations include testing on a single participant and a small sample size.
Rob plans to build his own polysomnography device for more comprehensive sleep tracking tests.
Rob does not recommend the Pillow app for accurate sleep stage tracking based on his tests.
Transcripts
hello everyone
my name is rob and i'm a postdoctoral
scientist
based in vienna austria in this video i
test the pillow app on the apple watch
against this small scientific eeg device
that's being used in several research
projects
i wore both of these for 10 nights and i
will directly compare their results
as always i do not want to waste your
time so timestamps are in the
description below and also on the
timeline
[Music]
[Applause]
[Music]
for those of you who are not familiar
with the app called pillow
it's used in combination with the apple
watch to track your sleep
among other things the app tracks the
sleep stages you go through each night
specifically it tracks deep sleep light
sleep rem sleep and awake
it also provides a sleep score does a
heart rate analysis and helps you take
naps
in this video i'll focus on analyzing
the accuracy of the sleep stage tracking
once i've collected many more nights of
data i might have a look at the sleep
score accuracy as well
in order to do the sleep comparison i
wore the apple watch to bed for 10
nights
at the same time i also wore this
portable scientific eeg device
and i recorded myself using an infrared
camera
the eeg device measures brain waves and
muscle movements
it's called the hypnodyne zmax and is
used by several of my colleagues in
scientific studies
if you're interested in this device for
scientific studies i will link it below
i manually went through the recording of
the eeg and scored each part of the
night for the different sleep stages
i also manually went through the pillow
app sleep stages and noted those down in
an excel
table so i could actually analyze them i
had to do this because the export i got
from the app did not
include the details that i needed in
addition to tracking sleep stages the
app automatically detects when you fall
asleep and when you wake up
so i'll also test how accurate this was
with the infrared recording i can
actually check what my movements were
like
and see if the pillow app correctly
predicts when i'm awake
let's first have a look at the 10
individual nights
where i compare the sleep stages of the
pillow app to the sleep stages i went
through according to the eeg device
i will go through the first few nights
in detail and i will just highlight the
most important parts of some of the
later nights
here we see the first night i recorded
on top you see the sleep stages as they
were recorded using the eeg device
on the horizontal axis we have the time
of the night and as you can see i went
to bed quite late a little bit after
midnight
on the vertical axis you have the
different sleep stages
deep sleep light sleep rem sleep and
awake
the sleep stages are plotted in the
order they are usually displayed in
research
on the bottom you can see a similar plot
but now for the sleep stages as they
were recorded using the pillow app
if we first look at deep sleep according
to the eeg device which is marked here
in purple
we do see there's a partial match
between the pillow app and the eeg
device
the first deep sleep section matches
pretty well however the pillow app
predicts much more deep sleep at later
time points also the last deep sleep
stage is recognized as rem sleep by the
pillow app
overall the match between the deep sleep
stages is rather poor
next if we look at rem sleep we see a
pretty bad match between the eeg device
and the pillow app
there's almost no overlap and rem sleep
according to the pillow app appears to
have been mostly light sleep in reality
to see the sleep cycles i added non-rem
sleep in blue
and again marked rem sleep in red each
sleep cycle starts with a combination of
deep sleep and light sleep together
called non-rem
and always ends in rem again non-rem is
marked in blue
and rem in red looking at the sleep
cycles there's quite bad overlap between
the pillow app and the eeg device
this is not unexpected since we already
saw problems with the detection of rem
sleep
which is vital to the detection of sleep
cycles looking just at the pillow app
data i would not have been able to see
any of my sleep cycles
next let's have a look at the times that
i was awake which i marked here in green
here the pillow app did perform quite
well it detected correctly when i woke
up during the night
when we evaluate the quality of the
automatic detection of the moment i fell
asleep
this was quite okay there was a slight
delay in the moment i fell asleep
but otherwise it was quite accurate now
let's have a look at the next night
this was a night where i woke up quite a
bit as you can see here on top in the
eeg plot
if we first look at deep sleep again we
only see a partial match between the eeg
device on top
and the pillow app on the bottom pillow
shows many very short deep sleep
segments which actually appear more
frequent at the end of the night
normally to put it very generally deep
sleep should decrease at the end of the
night
whereas rem sleep should increase which
is not what we see here
again also for rem sleep we see very
little overlap
most rem sleep actually appears to be
tracked by the pillow app as deep sleep
and light sleep
this also means that the sleep cycles
are not really visible in the graph
produced by the pillow app
just viewing the pillow app output i
would honestly not be able to see any of
my sleep cycles
awake detection was quite okay again it
appears to have detected the longer
awake moments
and the others were marked mostly as
light sleep if we look at sleep start
detection
again there was a slight delay in
detecting my start of sleep according to
the pillow app
but no major problems the wake up time
detection was a bit worse with it not
detecting the final part of my night
if we look at the next night deep sleep
again seems to only have a partial match
with way too much deep sleep predicted
by the pillow app
rem sleep again was mostly predicted as
light sleep and deep sleep
which also means that the sleep cycles
are not really visible in the pillow
app awake detection was again okay
and we also see the same slight delay
in sleep start we saw before
but overall detecting the moment i fell
asleep has been of ok quality so far
i will not go through all the nights for
the final nights i will just show the
most important parts
for this night here we again see pretty
poor deep sleep detection as is marked
here in blue
however the awake detection is okay as
you see in green
again there's some delay in detecting
sleep onset which we saw more often
however in the next few nights i
actually saw the opposite
where the pillow app detected sleep when
i was still awake
let me show you what that looked like
here we have the first example where the
pillow app detected some light sleep
when i was still fully awake and not
even in bed
if we look at the next night we see that
it even detected some deep sleep in a
moment where i was still working on my
computer
interestingly the next night is actually
the opposite where it had trouble
detecting the moment i fell asleep
and it predicted this at a much later
time than i actually fell asleep
however for the last two nights i want
to show you these fake sleep detections
were even worse
as you can see here especially for this
last night here
here you can see that the watch
basically detected the equivalent of a
whole night's sleep
before i even went to bed one thing i
noticed while looking at these sleep
stages is that the algorithm appears to
have some rules hard-coded into it
let me show you what i mean here you can
see one of the nights tracked with the
pillow app
first of all what i noticed that if it
tracks deep sleep this is always
preceded by light sleep
so there always needs to be light sleep
before it will track deep sleep
now as a second rule if any rem sleep
was tracked before that there was always
deep sleep
finally light sleep and wake seem to
follow any sleep stage
however having these strict rules
encoded in the algorithm does have
consequences
and my nights as tracked by the pillow
app seem to be basically a combination
of two patterns
the first pattern is light sleep
followed by deep sleep
followed by rem sleep and the second
pattern is light sleep followed by deep
sleep
if we look at this night we can see
that it's basically a combination of
just these two patterns
and periods of awake here i marked the
first pattern in purple
and as you can see it occurs eight times
now here i also mark the second pattern
in green
and this occurs six times it does make
me wonder if the fact that these
patterns seem to be hard-coded in the
algorithm
is actually the cause of the poor
performance we've seen so far
it does seem to increase the likelihood
of having small fragmented sleep stages
which is one of the problems of the
pillow app
now that we've visually inspected the
individual nights what does it look
like in terms of statistics
based on what we saw in the individual
nights i would expect
a lot of confusion between most sleep
stages
i expect especially rem sleep to be
often detected as light sleep
though awake detection appears to be
quite good let's take a look
first let's look at the total percentage
of each sleep stage that the eeg
and pillow have predicted overall we can
see that these percentages are pretty
far
off the pillow app predicts almost
double the amount of deep sleep i had
about half the amount of light sleep and
more than double the awake time
this is very much in line with what we
saw for the individual plots before
we can actually check which sleep stages
are mostly confused by the pillow app
and that's what i displayed here on top
we have the sleep stages according to
the eeg device
and on the left we have the sleep stages
according to the pillow app
now each column here sums to 100
meaning that we can see what percentage
of each of the actual sleep stages
was recorded as each sleep stage by the
pillow app
first we indeed see that what was
actually deep sleep is basically tracked
as an equal amount of deep sleep
but also light sleep and rem sleep by
the pillow app
this is much worse than many of the
other devices and apps i've tested
the only good thing is that deep sleep
is almost never confused with awake
time
next if we look at light sleep we indeed
see that this was mostly detected as
light sleep
though almost the same amount was
predicted as deep sleep and rem sleep
rem sleep is even more problematic only
13.9 percent of what was actually rem
sleep
was predicted as rem sleep most of it
was actually tracked as deep sleep
and light sleep by the pillow app
finally looking at awake time
this is the most positive thing about
these results most awake time was indeed
detected as awake
and what was confused was tracked as
light sleep
so far the pillow app does not yet look
very promising for me
but before i draw my final conclusions i
want to put the pillow app in the context
of two other apple watch apps that i
looked at in previous videos
the sleep cycle app and the autosleep
app here i plotted the results from
several apps at once
on top we have the eeg device below that
we have the sleep cycle app
the third app is the autosleep app and
on the bottom we have the pillow app
if we first look at deep sleep according
to the eeg
we see that sleep cycle indeed shows
some deeper sleep around these areas
also to some degree autosleep has some
deeper sleep here
however for pillow it's really a mix of
light sleep deep sleep and rem sleep
most interestingly if we look at sleep
cycles we can clearly see those depicted
in the app called sleep cycle
we see higher values when in rem and
lower values when in non-rem
if we look at autosleep this is not as
well represented
however we can still very roughly see
the sleep cycles
and as many people commented if i
recalibrated the app it might look even
better
however if we look at pillow i don't
really see any of the sleep cycles
out of these three apps i would judge
pillow to be the least informative for
me
however i will make a dedicated more
detailed video comparing different apple
watch sleep apps soon
so to summarize deep sleep light sleep
and rem sleep are very often confused by
the pillow app
rem sleep is the most problematic only
13.9 percent of what was actually rem
sleep
was also predicted as rem sleep by the
pillow app
most of it was actually tracked as deep
sleep and light sleep
awake detection was quite okay though
additionally on several occasions the
pillow app detected sleep while i was
not even in bed yet
other apps for the apple watch like
autosleep but especially sleep cycle
performed much better in sleep tracking
at least in my previous tests
overall i cannot really recommend the
pillow app for the tracking of your
sleep stages
there are in my opinion better apps
available on the apple watch
and also many other fitness trackers
have better sleep tracking capabilities
as i mentioned i wonder if the poor
performance is partially due to the
hard-coded patterns that seem to be
included in the algorithm
there are a few things i should mention
before i finish
first of all i entered all the
information that the pillow app asked of
me
but i did not tweak my results in the
morning you can
re-analyze the night by tweaking the
awake time but i decided not to do this
since this basically means that i'm
adding subjective data to my tracking
i want a sleep tracking device to give
me objective tracking of my sleep
since i want to find out if these
patterns match my subjective feelings
second when i released my video on the
autosleep app
many people commented that i could
improve results by tweaking the
sensitivity
i really appreciated that input so if
you have any more information or
thoughts on the pillow app please leave
it in the comments below
finally i did not use any of the premium
features of the app in my analysis
a big downside of the app is that you
can only see your previous night
if you do not pay the premium
subscription
i should mention some of the limitations
of the data that i showed here
first of all i just tested the app on me
and just for 10 nights
a better study would include multiple
participants of different demographic
backgrounds
second to do a full sleep comparison it
would be good to also test the apple
watch apps against a full scientific
polysomnography setup
i'm actually building my own
polysomnography device using
openbci components as we speak that way
i'll not have to rely on sleep labs for
my testing which is especially difficult
in these times of corona
finally i'm not a professional sleep
stage scorer
i think i did a decent job but for some
parts of the night i might have been a
little bit off
in my videos i do scientific tests on
different devices like the oura ring the
fitbit
and the scan watch and in the end i hope
to use tracking to improve my life
so if you like that subject and like
this video consider subscribing to my
channel
and also consider giving it a thumbs up
because it makes it easier for other
people to find my videos
thank you so much for watching and see
you in the next video