oura-hrv-activity-sleep.ipynb
A notebook adapted from Herman de Vries' blogpost and code on using Oura data to explore the links between HRV, physical activity and sleep
Please log in to comment.from data from oura ring 2022 on like 7th or 8th cell: Error in runSum(x, n): n = 7 is outside valid range: [1, 6] Traceback: 1. df %>% rename
This notebook is based on the work by Herman de Vries, which he published in a very interesting blog post. Herman also presented his work at one of the Open Humans self-research chats. Following the chat Herman was so kind to publish the code he used for analyzing his Oura data as an HTML render of his R code.
If you want to learn more about what Herman learned and how he approached his learning from his Oura data you should definitely read his blog post, as this code is not fully annotated yet.
I took Herman's code and copy & pasted most of it right into the Exploratory framework of Open Humans. I only made minor changes to the preamble of the notebook in which the data is being loaded. Herman used a CSV export from Oura, while this notebook makes use of the JSON file of your historic data that's stored in Open Humans.
What this means: If you want to run this notebook right away on your own Oura data you have to import it into Open Humans by going to https://oura.openhumans.org/. Once your data is stored in Open Humans the notebook should run from start to finish without any issues.
Below is the boring code that loads your personal Oura data and puts it into a format that's in line with Herman's original code. If you're not interested in such things you can straight jump to the next headline.
Do the physical activity variables predict TST?
This notebook is based on the work by Herman de Vries, which he published in a very interesting blog post. Herman also presented his work at one of the Open Humans self-research chats. Following the chat Herman was so kind to publish the code he used for analyzing his Oura data as an HTML render of his R code.
If you want to learn more about what Herman learned and how he approached his learning from his Oura data you should definitely read his blog post, as this code is not fully annotated yet.
I took Herman's code and copy & pasted most of it right into the Exploratory framework of Open Humans. I only made minor changes to the preamble of the notebook in which the data is being loaded. Herman used a CSV export from Oura, while this notebook makes use of the JSON file of your historic data that's stored in Open Humans.
What this means: If you want to run this notebook right away on your own Oura data you have to import it into Open Humans by going to https://oura.openhumans.org/. Once your data is stored in Open Humans the notebook should run from start to finish without any issues.
Below is the boring code that loads your personal Oura data and puts it into a format that's in line with Herman's original code. If you're not interested in such things you can straight jump to the next headline.
library(tidyverse) # Includes the packages dplyr and ggplot2 that I'll be using a lot.
library(TTR) # Used to calculate the rolling averages with the SMA() function.
library('httr')
library('jsonlite')
options(repr.plot.width=15, repr.plot.height=8)
access_token <- Sys.getenv("OH_ACCESS_TOKEN")
url <- paste("https://www.openhumans.org/api/direct-sharing/project/exchange-member/?access_token=",access_token,sep="")
resp <- GET(url)
user <- content(resp, "parsed")
for (data_source in user$data){
if (data_source$source == "direct-sharing-184"){
fitbit_url <- data_source$download_url
}
}
temp <- tempfile()
download.file(fitbit_url,temp,method='wget')
json_data <- fromJSON(txt=temp)
sleep <- json_data$sleep
activity <- json_data$activity
readiness <- json_data$readiness
df <- merge(merge(sleep, readiness, by='summary_date'),activity,by='summary_date')
df_oura <- df %>%
rename(
tbt_min = `duration`,
tst_min = `total.x`,
awake_min = `awake`,
rem_min = `rem`,
light_min = `light`,
deep_min = `deep`,
restless = `restless`,
se = `efficiency`,
sol_min = `onset_latency`,
start = `bedtime_start`,
end = `bedtime_start`,
rhr_avg = `hr_average`,
rhr_low = `hr_lowest`,
hrv = `rmssd`,
temp = `temperature_delta`,
rr = `breath_average`,
cal_act = `cal_active`,
cal_tot = `cal_total`,
cal_tar = `target_calories`,
steps = `steps`,
active = `daily_movement`,
sed = `inactive`,
rest = `rest`,
lig = `met_min_low`,
mod = `met_min_medium`,
vig = `met_min_high`,
nwt = `non_wear`,
met = `average_met`,
) %>%
mutate(
tbt_min = tbt_min / 60,
tbt_hr = tbt_min / 60,
tst_min = tst_min / 60,
tst_hr = tst_min / 60,
awake_min = awake_min / 60,
awake_hr = awake_min / 60,
rem_min = rem_min / 60,
rem_hr = rem_min / 60,
light_min = light_min / 60,
light_hr = light_min / 60,
deep_min = deep_min / 60,
deep_hr = deep_min / 60,
mvpa = mod+vig,
tst_hr_7dra = SMA(tst_hr, 7),
hrv_7dra = SMA(hrv, 7)
)
# Exploring the distribution of the HRV data:
hist_hrv <- ggplot(df_oura, aes(hrv)) +
geom_histogram() +
labs(
title = "Histogram of nocturnal HRV",
subtitle = "Measured with OURA ring (v2)",
x = "RMSSD (ms)",
y = "Count"
) +
theme_light()
hist_hrv
# That distribution actually looks relatively symmetric already. Let's logarithmically transform it just in case (this is a common practice in HRV research) and see if the distribution improves:
df_oura <- df_oura %>%
mutate(lnhrv = log(hrv))
# Create the actual histogram to see that distribution:
hist_lnhrv <- ggplot(df_oura, aes(lnhrv)) +
geom_histogram() +
labs(
title = "Histogram of natural log of nocturnal HRV",
subtitle = "Measured with OURA ring (v2)",
x = "lnRMSSD (ms)",
y = "Count"
) +
theme_light()
hist_lnhrv
# That arguably looks slightly worse. Let's do some normality tests to see explore which of the two is the most normal (gaussian):
shapiro.test(df_oura$hrv)
shapiro.test(df_oura$lnhrv)
# Confirmed. The first is almost significant, the second very significant. I'll work with regular HRV in my analysis.
# Let's also check out the TST distribution:
hist_tst <- ggplot(df_oura, aes(tst_hr)) +
geom_histogram() +
labs(
title = "Histogram of TST",
subtitle = "Measured with OURA ring (v2)",
x = "TST (hr)",
y = "Count"
) +
theme_light()
hist_tst
shapiro.test(df_oura$tst_hr)
install.packages('performance')
library(performance) # A package with useful functions to easily assess the model diagnostics, e.g. with the check_model() function
# First: See if time sedentary, light, moderate and vigorous physical activity predict the nocturnal HRV:
lm_hrv_pa <- lm(lead(hrv) ~ vig + mod + lig + sed, df_oura)
summary(lm_hrv_pa)
install.packages('see')
install.packages('gridExtra')
check_model(lm_hrv_pa)
# Yes they do, all of them except for light PA. Model diagnostics look great. 9.4% explained variance.
# Second: See if my nocturnal HRV predicts my subsequent daytime moderate-to-vigorous physical activity (MVPA).
lm_mvpa_hrv <- lm(mvpa ~ hrv, df_oura)
summary(lm_mvpa_hrv) # It does. 8.8% explained variance.
check_model(lm_mvpa_hrv)
# Model diagnostics look iffy. Redisuals are clearly not normally distributed, and the homoscedasticity graph shows a clear floor effect of the HRV, whereas the Q-Q plot shows that particularly for very high HRV values, the residuals move away from the line. Perhaps this same model but with the logarithmically transformed HRV is better?
lm_mvpa_lnhrv <- lm(mvpa ~ lnhrv, df_oura)
summary(lm_mvpa_lnhrv)
check_model(lm_mvpa_lnhrv)
# No, doesn't look any better, potentially even worse. Which is why I originally chose to use the regular HRV values in the first place off course.
# Third: Does my nocturnal HRV predict my subsequent sedentary time?
lm_sed_hrv <- lm(sed ~ hrv, df_oura)
summary(lm_sed_hrv)
check_model(lm_sed_hrv)
# It does. Model diagnostics look great. 3.0% explained variance.
# Perhaps I can use the 7 day rolling average of HRV as well in my models to account for the multi-day effect?
lm_mvpa_hrv_7dra <- lm(mvpa ~ hrv_7dra, df_oura)
summary(lm_mvpa_hrv_7dra)
check_model(lm_mvpa_hrv_7dra)
lm_sed_hrv_7dra <- lm(sed ~ hrv_7dra, df_oura)
summary(lm_sed_hrv_7dra)
check_model(lm_sed_hrv_7dra)
# Looks like the explained variance actually goes down when you use this. Maybe the 3 day rolling average that the auto arima suggested works?
df_oura <- df_oura %>%
mutate(hrv_3dra = SMA(hrv, 3))
lm_mvpa_hrv_3dra <- lm(mvpa ~ hrv_3dra, df_oura)
summary(lm_mvpa_hrv_3dra)
check_model(lm_mvpa_hrv_3dra)
lm_sed_hrv_3dra <- lm(sed ~ hrv_3dra, df_oura)
summary(lm_sed_hrv_3dra)
check_model(lm_sed_hrv_3dra)
# Also lower explained variance. Looks like incorporating the lagged effect this way dilutes the actual effect of just using the intra-day values. Lastly; perhaps the same but with the 2 day rolling average?
df_oura <- df_oura %>%
mutate(hrv_2dra = SMA(hrv, 2))
lm_mvpa_hrv_2dra <- lm(mvpa ~ hrv_2dra, df_oura)
summary(lm_mvpa_hrv_2dra)
check_model(lm_mvpa_hrv_2dra)
lm_sed_hrv_2dra <- lm(sed ~ hrv_2dra, df_oura)
summary(lm_sed_hrv_2dra)
check_model(lm_sed_hrv_2dra)
# Also a lower explained variance than by just using the intra-day values. Looks like a dead end, I'll stick with the auto.arima() based (3,1,1) model that used the regular HRV data.
# Version with the original HRV data:
chart_hrv <- ggplot(df_oura, aes(as.Date(summary_date), hrv)) +
geom_line() +
geom_hline(yintercept = mean(df_oura$hrv), linetype = 2) +
labs(
title = "Nocturnal Heart Rate Variability (HRV) over time",
subtitle = "Measured with OURA ring (v2)",
x = "Date (MM-'YY)",
y = "r-MSSD (ms)"
) +
scale_x_date(breaks = "month", date_labels = "%m-'%y") +
theme_light()
chart_hrv
# Version with the 7 day rolling average HRV data, which I personally prefer:
chart_hrv_7dra <- ggplot(df_oura, aes(as.Date(summary_date), hrv_7dra)) +
geom_line() +
geom_hline(yintercept = mean(df_oura$hrv_7dra, na.rm = TRUE), linetype = 2) +
labs(
title = "7-day rolling average of the nocturnal Heart Rate Variability (HRV) over time",
subtitle = "Measured with OURA ring (v2)",
x = "Date (MM-'YY)",
y = "r-MSSD (ms)"
) +
scale_x_date(breaks = "month", date_labels = "%m-'%y") +
theme_light()
chart_hrv_7dra
install.packages('astsa')
install.packages('forecast')
library(astsa) # For the time series analysis tools that I used.
library(forecast) # For the forecast plot that I create below.
# There is clear autocorrelation present in all residuals. Let's assess the (P)ACF.
acf2(diff(df_oura$hrv))
# Looks like (7,1,3) to me, although I'm not completely sure this interpretation is correct. Based on this (https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/time-series/how-to/partial-autocorrelation/interpret-the-results/partial-autocorrelation-function-pacf/) article, you can also interpret the PACF that it just suggests that a moving average term is present. Let's check out what the auto.arima() function that basically does this itself says.
auto.arima(df_oura$hrv)
# Auto arima suggests (3,1,1). Let's compare the (7,1,1), (7,1,3) and (3,1,1) models and see what comes out myself.
arima(df_oura$hrv, order = c(7,1,1))
arima(df_oura$hrv, order = c(7,1,3))
arima(df_oura$hrv, order = c(3,1,1))
arima_711 <- arima(df_oura$hrv, order = c(7,1,1))
arima_711
tsdiag(arima_711)
plot(forecast(arima_711))
arima_713 <- arima(df_oura$hrv, order = c(7,1,3))
arima_713 # Note that this model produces NA's!
tsdiag(arima_713)
plot(forecast(arima_713))
arima_auto <- auto.arima(df_oura$hrv)
arima_auto
tsdiag(arima_auto)
plot(forecast(arima_auto))
# (7,1,3) indeed has the lowest AIC, but note that it apparently found problems in the data and could not complete the model in full. Probably need to disregard this. I'll go for the (3,1,1) model that the auto.arima() function suggested.
# Are TST and nocturnal HRV related?
lm_hrv_tst <- lm(hrv ~ tst_hr, df_oura)
summary(lm_hrv_tst)
check_model(lm_hrv_tst)
# Yes they are, though just 1.7% explained variance. Note that both are measured simultaneously (cross-sectional), so don't interpret this as a causal relationship.
# Are my sleep stages and nocturnal HRV related?
lm_hrv_stages <- lm(hrv ~ awake_min + rem_min + light_min + deep_min, df_oura)
summary(lm_hrv_stages)
check_model(lm_hrv_stages)
# Yes they are, 48% explained variance. However; this is likely due to the use of the HRV data in the sleep stages classification algorithm.
Do the physical activity variables predict TST?
lm_tst_pa <- lm(lead(tst_hr) ~ vig + mod + lig + sed, df_oura)
summary(lm_tst_pa)
check_model(lm_tst_pa)
# Significant multicollinearity, particularly between light PA and sedentary time. Need to disregard this model. I'll assess all relationships univariately.
# Does sedentary time predict TST?
lm_tst_sed <- lm(lead(tst_hr) ~ sed, df_oura)
summary(lm_tst_sed)
check_model(lm_tst_sed)
# Does MVPA predict TST?
lm_tst_mvpa <- lm(lead(tst_hr) ~ mvpa, df_oura)
summary(lm_tst_mvpa)
check_model(lm_tst_mvpa)
# Does light PA predict TST?
lm_tst_lpa <- lm(lead(tst_hr) ~ lig, df_oura)
summary(lm_tst_lpa)
check_model(lm_tst_lpa)
# Does moderate PA predict TST?
lm_tst_mpa <- lm(lead(tst_hr) ~ mod, df_oura)
summary(lm_tst_mpa)
check_model(lm_tst_mpa)
# Does vigorous PA predict TST?
lm_tst_vpa <- lm(lead(tst_hr) ~ vig, df_oura)
summary(lm_tst_vpa)
check_model(lm_tst_vpa)
# Does TST predict MVPA?
lm_mvpa_tst <- lm(mvpa ~ tst_hr, df_oura)
summary(lm_mvpa_tst)
check_model(lm_mvpa_tst)
# Does TST predict sedentary time?
lm_sed_tst <- lm(sed ~ tst_hr, df_oura)
summary(lm_sed_tst)
check_model(lm_sed_tst)
# Are MVPA and sedentary time correlated?
cor.test(df_oura$sed, df_oura$mvpa)
# Regular line chart with my TST over time:
chart_tst_hr <- ggplot(df_oura, aes(as.Date(summary_date), tst_hr)) +
geom_line() +
geom_hline(yintercept = mean(df_oura$tst_hr), linetype = 2) +
labs(
title = "Total Sleep Time (TST) over time",
subtitle = "Measured with OURA ring (v2)",
x = "Date (MM-'YY)",
y = "TST (hours)"
) +
scale_x_date(breaks = "month", date_labels = "%m-'%y") +
theme_light()
chart_tst_hr
# Same TST over time chart as above, but just using the 7 day rolling average version:
chart_tst_hr_7dra <- ggplot(df_oura, aes(as.Date(summary_date), tst_hr_7dra)) +
geom_line() +
geom_hline(yintercept = mean(df_oura$tst_hr_7dra, na.rm = TRUE), linetype = 2) +
labs(
title = "7-day rolling average of Total Sleep Time (TST) over time",
subtitle = "Measured with OURA ring (v2)",
x = "Date (MM-'YY)",
y = "TST (hours)"
) +
scale_x_date(breaks = "month", date_labels = "%m-'%y") +
theme_light()
chart_tst_hr_7dra
# Regular line chart with my HRV over time:
chart_hrv <- ggplot(df_oura, aes(as.Date(summary_date), hrv)) +
geom_line() +
geom_hline(yintercept = mean(df_oura$hrv), linetype = 2) +
labs(
title = "Nocturnal Heart Rate Variability (HRV) over time",
subtitle = "Measured with OURA ring (v2)",
x = "Date (MM-'YY)",
y = "r-MSSD (ms)"
) +
scale_x_date(breaks = "month", date_labels = "%m-'%y") +
theme_light()
chart_hrv
# Same HRV over time chart but using the 7 day rolling average variable:
chart_hrv_7dra <- ggplot(df_oura, aes(as.Date(summary_date), hrv_7dra)) +
geom_line() +
geom_hline(yintercept = mean(df_oura$hrv_7dra, na.rm = TRUE), linetype = 2) +
labs(
title = "7-day rolling average of the nocturnal Heart Rate Variability (HRV) over time",
subtitle = "Measured with OURA ring (v2)",
x = "Date (MM-'YY)",
y = "r-MSSD (ms)"
) +
scale_x_date(breaks = "month", date_labels = "%m-'%y") +
theme_light()
chart_hrv_7dra
# I'd like to make a chart with both TST and HRV in it, and compare them. To do so, one way would be to standardize both variables. This means that you center each observation first (subtract the mean from each observation) and then divide that by the standard deviation of the observations up to that point. In doing so, you're putting both variables on the same scale. The number that you get is essentially the distance in the # of standard deviations from the mean value.
# First a for-loop to create the standardized datapoints:
for (i in 1:nrow(df_oura)) {
df_oura$tst_hr_std[i] <- (df_oura$tst_hr[i] - mean(df_oura$tst_hr[0:i])) / sd(df_oura$tst_hr[0:i])
df_oura$tst_hr_7dra_std[i] <- (df_oura$tst_hr_7dra[i] -
mean(df_oura$tst_hr_7dra[0:i], na.rm = TRUE)) / sd(df_oura$tst_hr_7dra[0:i], na.rm = TRUE)
df_oura$hrv_std[i] <- (df_oura$hrv[i] - mean(df_oura$hrv[0:i])) / sd(df_oura$hrv[0:i])
df_oura$hrv_7dra_std[i] <- (df_oura$hrv_7dra[i] -
mean(df_oura$hrv_7dra[0:i], na.rm = TRUE)) / sd(df_oura$hrv_7dra[0:i], na.rm = TRUE)
}
# Chart of the standardized TST over time:
chart_tst_hr_std <- ggplot(df_oura, aes(as.Date(summary_date), tst_hr_std)) +
geom_line() +
geom_hline(yintercept = mean(df_oura$tst_hr_std, na.rm = TRUE), linetype = 2) +
labs(
title = "Standardized Total Sleep Time (TST) over time",
subtitle = "Measured with OURA ring (v2)",
x = "Date (MM-'YY)",
y = "TST (SD from mean up to that point)"
) +
scale_x_date(breaks = "month", date_labels = "%m-'%y") +
theme_light()
chart_tst_hr_std
# Chart of the standardized HRV over time:
chart_hrv_std <- ggplot(df_oura, aes(as.Date(summary_date), hrv_std)) +
geom_line() +
geom_hline(yintercept = mean(df_oura$hrv_std, na.rm = TRUE), linetype = 2) +
labs(
title = "Standardized Heart Rate Variability (HRV; r-MSSD in milliseconds) over time",
subtitle = "Measured with OURA ring (v2)",
x = "Date (MM-'YY)",
y = "HRV (SD from mean up to that point)"
) +
scale_x_date(breaks = "month", date_labels = "%m-'%y") +
theme_light()
chart_hrv_std
# Chart that combines the standardized HRV and TST over time:
chart_combined_std <- ggplot(df_oura, aes(as.Date(summary_date))) +
geom_line(aes(y = tst_hr_std, colour = "TST"), alpha = 0.5) +
geom_line(aes(y = hrv_std, colour = "HRV"), alpha = 0.5) +
geom_hline(yintercept = mean(df_oura$hrv_std, na.rm = TRUE), linetype = 2) +
labs(
title = "Standardized Total Sleep Time (TST) and Heart Rate Variability (HRV) over time",
subtitle = "Measured with OURA ring (v2)",
x = "Date (MM-'YY)",
y = "SD from mean up to that point"
) +
scale_x_date(breaks = "month", date_labels = "%m-'%y") +
scale_color_discrete(name = "Variable", breaks = c("TST","HRV")) +
theme_light()
chart_combined_std
# Same as the previous, but now using the standardized 7 day rolling average values:
chart_combined_7dra_std <- ggplot(df_oura, aes(as.Date(summary_date))) +
geom_line(aes(y = tst_hr_7dra_std, colour = "TST"), alpha = 0.5) +
geom_line(aes(y = hrv_7dra_std, colour = "HRV"), alpha = 0.5) +
geom_hline(yintercept = mean(df_oura$hrv_std, na.rm = TRUE), linetype = 2) +
labs(
title = "Standardized 7-day rolling average of TST and HRV over time",
subtitle = "Measured with OURA ring (v2)",
x = "Date (MM-'YY)",
y = "SD from mean up to that point"
) +
scale_x_date(breaks = "month", date_labels = "%m-'%y") +
scale_color_discrete(name = "Variable", breaks = c("TST","HRV")) +
theme_light()
chart_combined_7dra_std