Details for oura-hrv-activity-sleep.ipynb

Published by gedankenstuecke

Description

A notebook adapted from Herman de Vries' blog post and code on using Oura data to explore the links between HRV, physical activity and sleep.


Tags & Data Sources

hrv heart rate sleep activity Oura Connect

Comments

from data from oura ring 2022 on like 7th or 8th cell: Error in runSum(x, n): n = 7 is outside valid range: [1, 6] Traceback: 1. df %>% rename

DG, 1 year, 11 months ago

Notebook
Last updated 3 years, 6 months ago

Relationship between HRV, Sleep & physical activity in personal Oura ring data.

Background

This notebook is based on the work of Herman de Vries, which he published in a very interesting blog post. Herman also presented his work at one of the Open Humans self-research chats. Following the chat, Herman was so kind as to publish the code he used for analyzing his Oura data as an HTML render of his R code.

If you want to learn more about what Herman found and how he approached learning from his Oura data, you should definitely read his blog post, as the code here is not fully annotated yet.

This notebook

I took Herman's code and copied & pasted most of it right into the Exploratory framework of Open Humans. I only made minor changes to the preamble of the notebook, in which the data is loaded: Herman used a CSV export from Oura, while this notebook makes use of the JSON file of your historic data that is stored in Open Humans.

What this means: if you want to run this notebook right away on your own Oura data, you have to import it into Open Humans by going to https://oura.openhumans.org/. Once your data is stored in Open Humans, the notebook should run from start to finish without any issues.

Data import

Below is the boring code that loads your personal Oura data and puts it into a format that matches Herman's original code. If you're not interested in such things, you can jump straight to the next headline.

In [1]:
library(tidyverse) # Includes the packages dplyr and ggplot2 that I'll be using a lot.
library(TTR) # Used to calculate the rolling averages with the SMA() function.
library('httr')
library('jsonlite')
options(repr.plot.width=15, repr.plot.height=8)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

 ggplot2 3.3.0      purrr   0.3.4
 tibble  3.0.1      dplyr   0.8.5
 tidyr   1.1.0      stringr 1.4.0
 readr   1.3.1      forcats 0.5.0

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
 dplyr::filter() masks stats::filter()
 dplyr::lag()    masks stats::lag()


Attaching package: ‘jsonlite’


The following object is masked from ‘package:purrr’:

    flatten


In [2]:
access_token <- Sys.getenv("OH_ACCESS_TOKEN")
# Ask the Open Humans API which data files are available for this member
url <- paste("https://www.openhumans.org/api/direct-sharing/project/exchange-member/?access_token=",access_token,sep="")
resp <- GET(url)
user <- content(resp, "parsed")

# Pick out the Oura data file (source ID "direct-sharing-184") and remember its
# download URL. (The variable is called fitbit_url, but it holds the Oura JSON's URL.)
for (data_source in user$data){
    if (data_source$source == "direct-sharing-184"){
         fitbit_url <- data_source$download_url
    }
}

# Download the JSON export to a temporary file and parse it
temp <- tempfile()
download.file(fitbit_url,temp,method='wget')
json_data <- fromJSON(txt=temp)
In [3]:
sleep <- json_data$sleep
In [4]:
activity <- json_data$activity
In [5]:
readiness <- json_data$readiness
In [6]:
df <- merge(merge(sleep, readiness, by='summary_date'),activity,by='summary_date')
In [7]:
df_oura <- df %>%
  # Rename Oura's column names to the shorter names used in Herman's original code
  rename(
    tbt_min = `duration`,
    tst_min = `total.x`,
    awake_min = `awake`,
    rem_min = `rem`,
    light_min = `light`,
    deep_min = `deep`,
    restless = `restless`,
    se = `efficiency`,
    sol_min = `onset_latency`,
    start = `bedtime_start`,
    end = `bedtime_end`,
    rhr_avg = `hr_average`,
    rhr_low = `hr_lowest`,
    hrv = `rmssd`,
    temp = `temperature_delta`,
    rr = `breath_average`,
    cal_act = `cal_active`,
    cal_tot = `cal_total`,
    cal_tar = `target_calories`,
    steps = `steps`,
    active = `daily_movement`,
    sed = `inactive`,
    rest = `rest`,
    lig = `met_min_low`,
    mod = `met_min_medium`,
    vig = `met_min_high`,
    nwt = `non_wear`,
    met = `average_met`
  ) %>%
  # Oura reports the sleep durations in seconds: convert them to minutes and hours,
  # and add the derived variables (MVPA, 7-day rolling averages)
  mutate(
    tbt_min = tbt_min / 60,
    tbt_hr = tbt_min / 60,
    tst_min = tst_min / 60,
    tst_hr = tst_min / 60,
    awake_min = awake_min / 60,
    awake_hr = awake_min / 60,
    rem_min = rem_min / 60,
    rem_hr = rem_min / 60,
    light_min = light_min / 60,
    light_hr = light_min / 60,
    deep_min = deep_min / 60,
    deep_hr = deep_min / 60,
    mvpa = mod + vig,
    tst_hr_7dra = SMA(tst_hr, 7),
    hrv_7dra = SMA(hrv, 7)
  )
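A note prompted by the comment near the top of this page: SMA(x, 7) throws an error along the lines of "n = 7 is outside valid range" when the export contains fewer than seven days of data. A minimal, hypothetical guard (not part of Herman's code) would cap the rolling-average window at the number of available days:

# Hypothetical helper, only needed for accounts with fewer than 7 days of data:
# cap the rolling-average window at the number of observations.
safe_sma <- function(x, n) TTR::SMA(x, n = min(n, length(x)))
# df_oura <- df_oura %>% mutate(tst_hr_7dra = safe_sma(tst_hr, 7), hrv_7dra = safe_sma(hrv, 7))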
In [8]:
# Exploring the distribution of the HRV data:
hist_hrv <- ggplot(df_oura, aes(hrv)) +
  geom_histogram() +
  labs(
    title = "Histogram of nocturnal HRV",
    subtitle = "Measured with OURA ring (v2)",
    x = "RMSSD (ms)",
    y = "Count"
  ) + 
  theme_light()
hist_hrv

# That distribution actually looks relatively symmetric already. Let's logarithmically transform it just in case (this is a common practice in HRV research) and see if the distribution improves:
df_oura <- df_oura %>%
  mutate(lnhrv = log(hrv))

# Create the actual histogram to see that distribution:
hist_lnhrv <- ggplot(df_oura, aes(lnhrv)) +
  geom_histogram() +
  labs(
    title = "Histogram of natural log of nocturnal HRV",
    subtitle = "Measured with OURA ring (v2)",
    x = "lnRMSSD (ms)",
    y = "Count"
  ) + 
  theme_light()
hist_lnhrv
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

In [9]:
# That arguably looks slightly worse. Let's do some normality tests to explore which of the two is the most normal (Gaussian):
shapiro.test(df_oura$hrv)
	Shapiro-Wilk normality test

data:  df_oura$hrv
W = 0.98304, p-value = 3.867e-07
In [10]:
shapiro.test(df_oura$lnhrv)
	Shapiro-Wilk normality test

data:  df_oura$lnhrv
W = 0.98515, p-value = 1.945e-06
In [11]:
# Confirmed. The first is almost significant, the second very significant. I'll work with regular HRV in my analysis.

# Let's also check out the TST distribution:
hist_tst <- ggplot(df_oura, aes(tst_hr)) +
  geom_histogram() +
  labs(
    title = "Histogram of TST",
    subtitle = "Measured with OURA ring (v2)",
    x = "TST (hr)",
    y = "Count"
  ) + 
  theme_light()
hist_tst

shapiro.test(df_oura$tst_hr)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

	Shapiro-Wilk normality test

data:  df_oura$tst_hr
W = 0.99591, p-value = 0.06978

HRV versus physical activity

In [12]:
install.packages('performance')
library(performance) # A package with useful functions to easily assess the model diagnostics, e.g. with the check_model() function
Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done

In [13]:
# First: See if the time spent sedentary and in light, moderate and vigorous physical activity predicts the nocturnal HRV:
lm_hrv_pa <- lm(lead(hrv) ~ vig + mod + lig + sed, df_oura)
summary(lm_hrv_pa)
Call:
lm(formula = lead(hrv) ~ vig + mod + lig + sed, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-28.644  -6.416  -0.143   6.276  33.682 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 23.346561   2.947085   7.922 9.56e-15 ***
vig          0.003677   0.006083   0.604    0.546    
mod          0.014330   0.002558   5.601 3.09e-08 ***
lig          0.050185   0.006174   8.129 2.05e-15 ***
sed          0.005587   0.003750   1.490    0.137    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.621 on 680 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.157,	Adjusted R-squared:  0.152 
F-statistic: 31.66 on 4 and 680 DF,  p-value: < 2.2e-16
In [14]:
install.packages('see')
install.packages('gridExtra')
check_model(lm_hrv_pa)
Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done

Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 685 rows containing missing values (geom_text).”
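As a side note (this is an addition, not part of the original notebook): since the performance package is already loaded, you can also pull the explained variance out of a fitted model directly instead of reading it off summary():

# Report R² / adjusted R² for the physical-activity model fitted above
r2(lm_hrv_pa)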
In [15]:
# Yes they do, all of them except for light PA. Model diagnostics look great. 9.4% explained variance.

# Second: See if my nocturnal HRV predicts my subsequent daytime moderate-to-vigorous physical activity (MVPA).
lm_mvpa_hrv <- lm(mvpa ~ hrv, df_oura)
summary(lm_mvpa_hrv) # It does. 8.8% explained variance.
Call:
lm(formula = mvpa ~ hrv, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-301.72 -103.91  -11.79   77.76 1084.15 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 188.9329    23.5265   8.031 4.24e-15 ***
hrv           2.5645     0.5931   4.324 1.76e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 162.1 on 684 degrees of freedom
Multiple R-squared:  0.02661,	Adjusted R-squared:  0.02518 
F-statistic:  18.7 on 1 and 684 DF,  p-value: 1.761e-05
In [16]:
check_model(lm_mvpa_hrv)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 686 rows containing missing values (geom_text).”
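If you want a numeric complement to the visual diagnostics above (again an addition on top of Herman's code), you can test the residuals of the model directly:

# Shapiro-Wilk test on the residuals of lm(mvpa ~ hrv)
shapiro.test(residuals(lm_mvpa_hrv))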
In [17]:
# Model diagnostics look iffy. Residuals are clearly not normally distributed, the homoscedasticity graph shows a clear floor effect of the HRV, and the Q-Q plot shows that particularly for very high HRV values the residuals move away from the line. Perhaps this same model, but with the logarithmically transformed HRV, is better?
lm_mvpa_lnhrv <- lm(mvpa ~ lnhrv, df_oura)
summary(lm_mvpa_lnhrv)
Call:
lm(formula = mvpa ~ lnhrv, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-295.95 -102.46  -13.37   79.55 1091.88 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -10.69      79.57  -0.134 0.893168    
lnhrv          82.57      22.00   3.753 0.000189 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 162.6 on 684 degrees of freedom
Multiple R-squared:  0.02018,	Adjusted R-squared:  0.01875 
F-statistic: 14.09 on 1 and 684 DF,  p-value: 0.0001892
In [18]:
check_model(lm_mvpa_lnhrv)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 686 rows containing missing values (geom_text).”
In [19]:
# No, it doesn't look any better, potentially even worse. Which is why I originally chose to use the regular HRV values in the first place, of course.

# Third: Does my nocturnal HRV predict my subsequent sedentary time?
lm_sed_hrv <- lm(sed ~ hrv, df_oura)
summary(lm_sed_hrv)
Call:
lm(formula = sed ~ hrv, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-646.24  -62.30    8.82   71.24  298.32 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 715.5453    15.0898  47.419  < 2e-16 ***
hrv          -2.2770     0.3804  -5.986 3.48e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 104 on 684 degrees of freedom
Multiple R-squared:  0.04977,	Adjusted R-squared:  0.04838 
F-statistic: 35.83 on 1 and 684 DF,  p-value: 3.477e-09
In [20]:
check_model(lm_sed_hrv)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 686 rows containing missing values (geom_text).”
In [21]:
# It does. Model diagnostics look great. 3.0% explained variance.

# Perhaps I can use the 7 day rolling average of HRV as well in my models to account for the multi-day effect?
lm_mvpa_hrv_7dra <- lm(mvpa ~ hrv_7dra, df_oura)
summary(lm_mvpa_hrv_7dra)
Call:
lm(formula = mvpa ~ hrv_7dra, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-322.87 -104.15  -14.11   74.43 1083.56 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 126.1401    33.2742   3.791 0.000163 ***
hrv_7dra      4.2020     0.8539   4.921 1.08e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 161.9 on 678 degrees of freedom
  (6 observations deleted due to missingness)
Multiple R-squared:  0.03449,	Adjusted R-squared:  0.03306 
F-statistic: 24.22 on 1 and 678 DF,  p-value: 1.082e-06
In [22]:
check_model(lm_mvpa_hrv_7dra)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 680 rows containing missing values (geom_text).”
In [23]:
lm_sed_hrv_7dra <- lm(sed ~ hrv_7dra, df_oura)
summary(lm_sed_hrv_7dra)
Call:
lm(formula = sed ~ hrv_7dra, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-643.13  -61.16    4.49   71.26  314.27 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 724.3684    21.5777  33.570  < 2e-16 ***
hrv_7dra     -2.4854     0.5537  -4.488 8.43e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 105 on 678 degrees of freedom
  (6 observations deleted due to missingness)
Multiple R-squared:  0.02886,	Adjusted R-squared:  0.02742 
F-statistic: 20.15 on 1 and 678 DF,  p-value: 8.429e-06
In [24]:
check_model(lm_sed_hrv_7dra)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 680 rows containing missing values (geom_text).”
In [25]:
# Looks like the explained variance actually goes down when you use this. Maybe the 3 day rolling average that the auto arima suggested works?
df_oura <- df_oura %>%
  mutate(hrv_3dra = SMA(hrv, 3))

lm_mvpa_hrv_3dra <- lm(mvpa ~ hrv_3dra, df_oura)
summary(lm_mvpa_hrv_3dra)
Call:
lm(formula = mvpa ~ hrv_3dra, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-308.37 -103.96  -13.96   74.67 1083.77 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 138.5130    28.6674   4.832 1.67e-06 ***
hrv_3dra      3.8899     0.7316   5.317 1.43e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 161.1 on 682 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.0398,	Adjusted R-squared:  0.03839 
F-statistic: 28.27 on 1 and 682 DF,  p-value: 1.434e-07
In [26]:
check_model(lm_mvpa_hrv_3dra)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 684 rows containing missing values (geom_text).”
In [27]:
lm_sed_hrv_3dra <- lm(sed ~ hrv_3dra, df_oura)
summary(lm_sed_hrv_3dra)
Call:
lm(formula = sed ~ hrv_3dra, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-641.59  -60.66    4.43   72.85  313.92 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 722.5982    18.5762  38.899  < 2e-16 ***
hrv_3dra     -2.4491     0.4741  -5.166 3.15e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 104.4 on 682 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.03765,	Adjusted R-squared:  0.03624 
F-statistic: 26.69 on 1 and 682 DF,  p-value: 3.147e-07
In [28]:
check_model(lm_sed_hrv_3dra)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 684 rows containing missing values (geom_text).”
In [29]:
# Also lower explained variance. Looks like incorporating the lagged effect this way dilutes the actual effect of just using the intra-day values. Lastly, perhaps the same but with the 2 day rolling average?
df_oura <- df_oura %>%
  mutate(hrv_2dra = SMA(hrv, 2))

lm_mvpa_hrv_2dra <- lm(mvpa ~ hrv_2dra, df_oura)
summary(lm_mvpa_hrv_2dra)
Call:
lm(formula = mvpa ~ hrv_2dra, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-315.73 -101.96  -13.96   75.53 1078.46 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 151.6990    26.7461   5.672 2.09e-08 ***
hrv_2dra      3.5415     0.6802   5.207 2.54e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 161.2 on 683 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.03818,	Adjusted R-squared:  0.03677 
F-statistic: 27.11 on 1 and 683 DF,  p-value: 2.544e-07
In [30]:
check_model(lm_mvpa_hrv_2dra)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 685 rows containing missing values (geom_text).”
In [31]:
lm_sed_hrv_2dra <- lm(sed ~ hrv_2dra, df_oura)
summary(lm_sed_hrv_2dra)
Call:
lm(formula = sed ~ hrv_2dra, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-646.56  -58.96    5.93   73.70  321.45 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 721.8524    17.2940  41.740  < 2e-16 ***
hrv_2dra     -2.4358     0.4398  -5.539 4.35e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 104.2 on 683 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.04298,	Adjusted R-squared:  0.04158 
F-statistic: 30.68 on 1 and 683 DF,  p-value: 4.352e-08
In [32]:
check_model(lm_sed_hrv_2dra)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 685 rows containing missing values (geom_text).”
In [33]:
# Also a lower explained variance than by just using the intra-day values. Looks like a dead end; I'll stick with the auto.arima() based (3,1,1) model that used the regular HRV data.

HRV over time line chart

In [34]:
# Version with the original HRV data:
chart_hrv <- ggplot(df_oura, aes(as.Date(summary_date), hrv)) +
  geom_line() +
  geom_hline(yintercept = mean(df_oura$hrv), linetype = 2) + 
  labs(
    title = "Nocturnal Heart Rate Variability (HRV) over time",
    subtitle = "Measured with OURA ring (v2)",
    x = "Date (MM-'YY)",
    y = "r-MSSD (ms)"
  ) + 
  scale_x_date(breaks = "month", date_labels = "%m-'%y") + 
  theme_light()
chart_hrv
In [35]:
# Version with the 7 day rolling average HRV data, which I personally prefer:
chart_hrv_7dra <- ggplot(df_oura, aes(as.Date(summary_date), hrv_7dra)) +
  geom_line() +
  geom_hline(yintercept = mean(df_oura$hrv_7dra, na.rm = TRUE), linetype = 2) + 
  labs(
    title = "7-day rolling average of the nocturnal Heart Rate Variability (HRV)  over time",
    subtitle = "Measured with OURA ring (v2)",
    x = "Date (MM-'YY)",
    y = "r-MSSD (ms)"
  ) + 
  scale_x_date(breaks = "month", date_labels = "%m-'%y") + 
  theme_light()
chart_hrv_7dra
Warning message:
“Removed 6 row(s) containing missing values (geom_path).”

HRV time series analysis

In [36]:
install.packages('astsa')
install.packages('forecast')
library(astsa) # For the time series analysis tools that I used.
library(forecast) # For the forecast plot that I create below.

# There is clear autocorrelation present in all residuals. Let's assess the (P)ACF.
acf2(diff(df_oura$hrv))
Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done

Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 


Attaching package: ‘forecast’


The following object is masked from ‘package:astsa’:

    gas


A matrix: 2 × 37 of type dbl (first 20 lags shown)

lag      1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19    20
ACF  -0.41 -0.03  0.00 -0.05  0.00 -0.01  0.07 -0.03 -0.03 -0.01  0.12 -0.09 -0.03  0.05 -0.01 -0.03 -0.01 -0.02  0.05  0.01
PACF -0.41 -0.24 -0.15 -0.16 -0.14 -0.13 -0.01 -0.02 -0.06 -0.08  0.10  0.02 -0.04  0.00  0.02 -0.04 -0.07 -0.13 -0.04  0.00
In [37]:
# Looks like (7,1,3) to me, although I'm not completely sure this interpretation is correct. Based on this (https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/time-series/how-to/partial-autocorrelation/interpret-the-results/partial-autocorrelation-function-pacf/) article, you can also interpret the PACF as simply suggesting that a moving average term is present. Let's check what the auto.arima() function, which basically does this by itself, says.
auto.arima(df_oura$hrv)
Series: df_oura$hrv 
ARIMA(1,0,2) with non-zero mean 

Coefficients:
         ar1      ma1      ma2     mean
      0.9094  -0.5644  -0.0758  38.2464
s.e.  0.0294   0.0492   0.0430   1.2981

sigma^2 estimated as 75.48:  log likelihood=-2454.71
AIC=4919.43   AICc=4919.52   BIC=4942.08
In [38]:
# Auto arima suggests (3,1,1). Let's compare the (7,1,1), (7,1,3) and (3,1,1) models and see for myself what comes out.
arima(df_oura$hrv, order = c(7,1,1))
Call:
arima(x = df_oura$hrv, order = c(7, 1, 1))

Coefficients:
         ar1     ar2     ar3     ar4     ar5     ar6     ar7      ma1
      0.3461  0.1424  0.0778  0.0108  0.0453  0.0526  0.0837  -1.0000
s.e.  0.0381  0.0402  0.0405  0.0406  0.0405  0.0402  0.0380   0.0104

sigma^2 estimated as 74.52:  log likelihood = -2450.65,  aic = 4919.3
In [39]:
arima(df_oura$hrv, order = c(7,1,3))
Call:
arima(x = df_oura$hrv, order = c(7, 1, 3))

Coefficients:
         ar1      ar2     ar3     ar4     ar5     ar6     ar7      ma1     ma2
      0.4975  -0.4475  0.2493  0.0799  0.0906  0.0603  0.1106  -1.1496  0.6958
s.e.  0.2558   0.3096  0.1021  0.0618  0.0539  0.0427  0.0480   0.2559  0.4297
          ma3
      -0.5462
s.e.   0.2703

sigma^2 estimated as 74.3:  log likelihood = -2449.73,  aic = 4921.45
In [40]:
arima(df_oura$hrv, order = c(3,1,1))
Call:
arima(x = df_oura$hrv, order = c(3, 1, 1))

Coefficients:
         ar1     ar2     ar3      ma1
      0.3334  0.1375  0.0861  -0.9663
s.e.  0.0463  0.0445  0.0441   0.0253

sigma^2 estimated as 76.22:  log likelihood = -2456.97,  aic = 4923.94
In [41]:
arima_711 <- arima(df_oura$hrv, order = c(7,1,1))
arima_711
Call:
arima(x = df_oura$hrv, order = c(7, 1, 1))

Coefficients:
         ar1     ar2     ar3     ar4     ar5     ar6     ar7      ma1
      0.3461  0.1424  0.0778  0.0108  0.0453  0.0526  0.0837  -1.0000
s.e.  0.0381  0.0402  0.0405  0.0406  0.0405  0.0402  0.0380   0.0104

sigma^2 estimated as 74.52:  log likelihood = -2450.65,  aic = 4919.3
In [42]:
tsdiag(arima_711)
In [43]:
plot(forecast(arima_711))
In [44]:
arima_713 <- arima(df_oura$hrv, order = c(7,1,3))
arima_713 # Note that this model produces NA's!
Call:
arima(x = df_oura$hrv, order = c(7, 1, 3))

Coefficients:
         ar1      ar2     ar3     ar4     ar5     ar6     ar7      ma1     ma2
      0.4975  -0.4475  0.2493  0.0799  0.0906  0.0603  0.1106  -1.1496  0.6958
s.e.  0.2558   0.3096  0.1021  0.0618  0.0539  0.0427  0.0480   0.2559  0.4297
          ma3
      -0.5462
s.e.   0.2703

sigma^2 estimated as 74.3:  log likelihood = -2449.73,  aic = 4921.45
In [45]:
tsdiag(arima_713)
In [46]:
plot(forecast(arima_713))
In [47]:
arima_auto <- auto.arima(df_oura$hrv)
arima_auto
Series: df_oura$hrv 
ARIMA(1,0,2) with non-zero mean 

Coefficients:
         ar1      ma1      ma2     mean
      0.9094  -0.5644  -0.0758  38.2464
s.e.  0.0294   0.0492   0.0430   1.2981

sigma^2 estimated as 75.48:  log likelihood=-2454.71
AIC=4919.43   AICc=4919.52   BIC=4942.08
In [48]:
tsdiag(arima_auto)
In [49]:
plot(forecast(arima_auto))
In [50]:
# (7,1,3) indeed has the lowest AIC, but note that it apparently found problems in the data and could not complete the model in full. Probably need to disregard this. I'll go for the (3,1,1) model that the auto.arima() function suggested.
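A compact alternative (not in the original notebook) to fitting the candidate models one by one is to collect their AICs in a single call:

# Fit each candidate order and compare the AICs side by side
candidate_orders <- list(c(7, 1, 1), c(7, 1, 3), c(3, 1, 1))
sapply(candidate_orders, function(o) AIC(arima(df_oura$hrv, order = o)))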

HRV versus sleep

In [51]:
# Are TST and nocturnal HRV related?
lm_hrv_tst <- lm(hrv ~ tst_hr, df_oura)
summary(lm_hrv_tst)
Call:
lm(formula = hrv ~ tst_hr, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-28.129  -7.062  -1.215   6.446  44.298 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  36.6053     2.5409  14.406   <2e-16 ***
tst_hr        0.2498     0.3763   0.664    0.507    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.45 on 684 degrees of freedom
Multiple R-squared:  0.0006438,	Adjusted R-squared:  -0.0008172 
F-statistic: 0.4407 on 1 and 684 DF,  p-value: 0.507
In [52]:
check_model(lm_hrv_tst)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 686 rows containing missing values (geom_text).”
In [53]:
# Yes they are, though just 1.7% explained variance. Note that both are measured simultaneously (cross-sectional), so don't interpret this as a causal relationship.

# Are my sleep stages and nocturnal HRV related?
lm_hrv_stages <- lm(hrv ~ awake_min + rem_min + light_min + deep_min, df_oura)
summary(lm_hrv_stages)
Call:
lm(formula = hrv ~ awake_min + rem_min + light_min + deep_min, 
    data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-30.638  -5.911  -0.478   5.363  38.451 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 22.628170   2.536519   8.921  < 2e-16 ***
awake_min    0.050508   0.013087   3.859 0.000124 ***
rem_min     -0.035798   0.007367  -4.859 1.46e-06 ***
light_min    0.034752   0.007004   4.962 8.84e-07 ***
deep_min     0.165188   0.014660  11.268  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.296 on 681 degrees of freedom
Multiple R-squared:  0.2119,	Adjusted R-squared:  0.2072 
F-statistic: 45.76 on 4 and 681 DF,  p-value: < 2.2e-16
In [54]:
check_model(lm_hrv_stages)
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 686 rows containing missing values (geom_text).”
In [55]:
# Yes they are, 48% explained variance. However, this is likely due to the use of the HRV data in the sleep stages classification algorithm.

Sleep versus physical activity

Do the physical activity variables predict TST?

In [56]:
lm_tst_pa <- lm(lead(tst_hr) ~ vig + mod + lig + sed, df_oura)
summary(lm_tst_pa)
Call:
lm(formula = lead(tst_hr) ~ vig + mod + lig + sed, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5863 -0.7116 -0.0810  0.6995  4.4372 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.920e+00  3.224e-01  21.462  < 2e-16 ***
vig          6.919e-04  6.655e-04   1.040  0.29888    
mod          4.277e-04  2.799e-04   1.528  0.12702    
lig         -2.213e-03  6.755e-04  -3.276  0.00111 ** 
sed         -8.321e-05  4.103e-04  -0.203  0.83935    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.053 on 680 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.01947,	Adjusted R-squared:  0.0137 
F-statistic: 3.375 on 4 and 680 DF,  p-value: 0.009528
In [57]:
check_model(lm_tst_pa)
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 685 rows containing missing values (geom_text).”
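The next cell's comment points to multicollinearity between the activity variables. If you want a numeric read on it (an addition on top of Herman's code), the performance package reports the variance inflation factors directly, and a plain correlation shows the light-PA / sedentary-time overlap:

# Variance inflation factors for the multivariable activity model above
check_collinearity(lm_tst_pa)
# Raw correlation between light physical activity and sedentary time
cor(df_oura$lig, df_oura$sed, use = "complete.obs")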
In [58]:
# Significant multicollinearity, particularly between light PA and sedentary time. Need to disregard this model. I'll assess all relationships univariately.

# Does sedentary time predict TST?
lm_tst_sed <- lm(lead(tst_hr) ~ sed, df_oura)
summary(lm_tst_sed)
Call:
lm(formula = lead(tst_hr) ~ sed, data = df_oura)

Residuals:
   Min     1Q Median     3Q    Max 
-3.509 -0.689 -0.080  0.732  4.325 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.607e+00  2.425e-01  27.245   <2e-16 ***
sed         9.464e-05  3.804e-04   0.249    0.804    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.061 on 683 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  9.061e-05,	Adjusted R-squared:  -0.001373 
F-statistic: 0.06189 on 1 and 683 DF,  p-value: 0.8036
In [59]:
check_model(lm_tst_sed)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 685 rows containing missing values (geom_text).”
In [60]:
# Does MVPA predict TST?
lm_tst_mvpa <- lm(lead(tst_hr) ~ mvpa, df_oura)
summary(lm_tst_mvpa)
Call:
lm(formula = lead(tst_hr) ~ mvpa, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5520 -0.6920 -0.0812  0.7396  4.4083 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.5810271  0.0815435  80.706   <2e-16 ***
mvpa        0.0002972  0.0002467   1.205    0.229    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.06 on 683 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.002121,	Adjusted R-squared:  0.00066 
F-statistic: 1.452 on 1 and 683 DF,  p-value: 0.2287
In [61]:
check_model(lm_tst_mvpa)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 685 rows containing missing values (geom_text).”
In [62]:
# Does light PA predict TST?
lm_tst_lpa <- lm(lead(tst_hr) ~ lig, df_oura)
summary(lm_tst_lpa)
Call:
lm(formula = lead(tst_hr) ~ lig, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5282 -0.7109 -0.0747  0.7000  4.3262 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.9577974  0.1019238  68.265  < 2e-16 ***
lig         -0.0019692  0.0006326  -3.113  0.00193 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.053 on 683 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.01399,	Adjusted R-squared:  0.01254 
F-statistic: 9.689 on 1 and 683 DF,  p-value: 0.001931
In [63]:
check_model(lm_tst_lpa)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 685 rows containing missing values (geom_text).”
In [64]:
# Does moderate PA predict TST?
lm_tst_mpa <- lm(lead(tst_hr) ~ mod, df_oura)
summary(lm_tst_mpa)
Call:
lm(formula = lead(tst_hr) ~ mod, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5405 -0.6879 -0.0797  0.7327  4.3834 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.6124632  0.0825907  80.063   <2e-16 ***
mod         0.0001955  0.0002613   0.748    0.455    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.06 on 683 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.000819,	Adjusted R-squared:  -0.000644 
F-statistic: 0.5598 on 1 and 683 DF,  p-value: 0.4546
In [65]:
check_model(lm_tst_mpa)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 685 rows containing missing values (geom_text).”
In [66]:
# Does vigorous PA predict TST?
lm_tst_vpa <- lm(lead(tst_hr) ~ vig, df_oura)
summary(lm_tst_vpa)
Call:
lm(formula = lead(tst_hr) ~ vig, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5132 -0.6811 -0.0760  0.7439  4.3522 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.6560898  0.0411833 161.621   <2e-16 ***
vig         0.0008936  0.0006655   1.343     0.18    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.059 on 683 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.002633,	Adjusted R-squared:  0.001173 
F-statistic: 1.803 on 1 and 683 DF,  p-value: 0.1798
In [67]:
check_model(lm_tst_vpa)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 685 rows containing missing values (geom_text).”
In [68]:
# Does TST predict MVPA?
lm_mvpa_tst <- lm(mvpa ~ tst_hr, df_oura)
summary(lm_mvpa_tst)
Call:
lm(formula = mvpa ~ tst_hr, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-293.60 -107.09   -8.94   75.68 1121.85 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  267.376     39.954   6.692 4.58e-11 ***
tst_hr         2.955      5.917   0.499    0.618    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 164.2 on 684 degrees of freedom
Multiple R-squared:  0.0003644,	Adjusted R-squared:  -0.001097 
F-statistic: 0.2493 on 1 and 684 DF,  p-value: 0.6177
In [69]:
check_model(lm_mvpa_tst)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 686 rows containing missing values (geom_text).”
In [70]:
# Does TST predict sedentary time?
lm_sed_tst <- lm(sed ~ tst_hr, df_oura)
summary(lm_sed_tst)
Call:
lm(formula = sed ~ tst_hr, data = df_oura)

Residuals:
    Min      1Q  Median      3Q     Max 
-601.58  -62.06    8.85   69.75  292.73 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  770.123     25.355   30.37  < 2e-16 ***
tst_hr       -21.253      3.755   -5.66 2.23e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 104.2 on 684 degrees of freedom
Multiple R-squared:  0.04474,	Adjusted R-squared:  0.04334 
F-statistic: 32.03 on 1 and 684 DF,  p-value: 2.231e-08
In [71]:
check_model(lm_sed_tst)
Not enough model terms in the conditional part of the model to check for multicollinearity.
`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`geom_smooth()` using formula 'y ~ x'

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Warning message:
“Removed 686 rows containing missing values (geom_text).”
In [72]:
# Are MVPA and sedentary time correlated?
cor.test(df_oura$sed, df_oura$mvpa)
	Pearson's product-moment correlation

data:  df_oura$sed and df_oura$mvpa
t = -8.4985, df = 684, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.3752182 -0.2397329
sample estimates:
       cor 
-0.3090427 

Bonus: standardized charts

In [73]:
# Regular line chart with my TST over time:
chart_tst_hr <- ggplot(df_oura, aes(as.Date(summary_date), tst_hr)) +
  geom_line() +
  geom_hline(yintercept = mean(df_oura$tst_hr), linetype = 2) + 
  labs(
    title = "Total Sleep Time (TST) over time",
    subtitle = "Measured with OURA ring (v2)",
    x = "Date (MM-'YY)",
    y = "TST (hours)"
  ) + 
  scale_x_date(breaks = "month", date_labels = "%m-'%y") + 
  theme_light()
chart_tst_hr
In [74]:
# Same TST over time chart as above, but just using the 7 day rolling average version:
chart_tst_hr_7dra <- ggplot(df_oura, aes(as.Date(summary_date), tst_hr_7dra)) +
  geom_line() +
  geom_hline(yintercept = mean(df_oura$tst_hr_7dra, na.rm = TRUE), linetype = 2) + 
  labs(
    title = "7-day rolling average of Total Sleep Time (TST) over time",
    subtitle = "Measured with OURA ring (v2)",
    x = "Date (MM-'YY)",
    y = "TST (hours)"
  ) + 
  scale_x_date(breaks = "month", date_labels = "%m-'%y") + 
  theme_light()
chart_tst_hr_7dra
Warning message:
“Removed 6 row(s) containing missing values (geom_path).”
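The 7-day rolling average columns used here (such as tst_hr_7dra) are created earlier in the notebook during data preparation. For reference, below is a minimal sketch of how such a trailing 7-day average could be computed, assuming the zoo package is available; the notebook itself may use a different rolling-window helper, and tst_hr_7dra_alt is just an illustrative column name.

# Hypothetical alternative computation of a trailing 7-day rolling average:
df_oura$tst_hr_7dra_alt <- zoo::rollmean(df_oura$tst_hr, k = 7, fill = NA, align = "right")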
In [75]:
# Regular line chart with my HRV over time:
chart_hrv <- ggplot(df_oura, aes(as.Date(summary_date), hrv)) +
  geom_line() +
  geom_hline(yintercept = mean(df_oura$hrv), linetype = 2) + 
  labs(
    title = "Nocturnal Heart Rate Variability (HRV) over time",
    subtitle = "Measured with OURA ring (v2)",
    x = "Date (MM-'YY)",
    y = "r-MSSD (ms)"
  ) + 
  scale_x_date(breaks = "month", date_labels = "%m-'%y") + 
  theme_light()
chart_hrv
In [76]:
# Same HRV over time chart but using the 7 day rolling average variable:
chart_hrv_7dra <- ggplot(df_oura, aes(as.Date(summary_date), hrv_7dra)) +
  geom_line() +
  geom_hline(yintercept = mean(df_oura$hrv_7dra, na.rm = TRUE), linetype = 2) + 
  labs(
    title = "7-day rolling average of the nocturnal Heart Rate Variability (HRV)  over time",
    subtitle = "Measured with OURA ring (v2)",
    x = "Date (MM-'YY)",
    y = "r-MSSD (ms)"
  ) + 
  scale_x_date(breaks = "month", date_labels = "%m-'%y") + 
  theme_light()
chart_hrv_7dra
Warning message:
“Removed 6 row(s) containing missing values (geom_path).”
In [77]:
# I'd like to make a chart with both TST and HRV in it and compare them. One way to do so
# is to standardize both variables: for each day, subtract the mean of all observations up
# to and including that day, then divide by the standard deviation of those same
# observations. This puts both variables on the same scale; the resulting number is the
# distance from the running mean, expressed in standard deviations.

# First a for-loop to create the standardized datapoints:
for (i in 1:nrow(df_oura)) {
  df_oura$tst_hr_std[i] <- (df_oura$tst_hr[i] - mean(df_oura$tst_hr[1:i])) / sd(df_oura$tst_hr[1:i])
  df_oura$tst_hr_7dra_std[i] <- (df_oura$tst_hr_7dra[i] -
                                mean(df_oura$tst_hr_7dra[1:i], na.rm = TRUE)) / sd(df_oura$tst_hr_7dra[1:i], na.rm = TRUE)
  df_oura$hrv_std[i] <- (df_oura$hrv[i] - mean(df_oura$hrv[1:i])) / sd(df_oura$hrv[1:i])
  df_oura$hrv_7dra_std[i] <- (df_oura$hrv_7dra[i] -
                                mean(df_oura$hrv_7dra[1:i], na.rm = TRUE)) / sd(df_oura$hrv_7dra[1:i], na.rm = TRUE)
}
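The loop above computes expanding-window (cumulative) z-scores. The same result can be obtained with a small vectorized helper; the sketch below (expanding_z is a hypothetical name, not part of the original code) assumes a column without missing values, so the rolling-average columns would additionally need the na.rm handling used in the loop.

expanding_z <- function(x) {
  n <- seq_along(x)
  cum_mean <- cumsum(x) / n                    # mean of observations up to each day
  cum_sd <- sapply(n, function(i) sd(x[1:i]))  # sd() of a single value is NA, as in the loop
  (x - cum_mean) / cum_sd
}
# e.g. df_oura$hrv_std <- expanding_z(df_oura$hrv)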
In [78]:
# Chart of the standardized TST over time:
chart_tst_hr_std <- ggplot(df_oura, aes(as.Date(summary_date), tst_hr_std)) +
  geom_line() +
  geom_hline(yintercept = mean(df_oura$tst_hr_std, na.rm = TRUE), linetype = 2) + 
  labs(
    title = "Standardized Total Sleep Time (TST) over time",
    subtitle = "Measured with OURA ring (v2)",
    x = "Date (MM-'YY)",
    y = "TST (SD from mean up to that point)"
  ) + 
  scale_x_date(breaks = "month", date_labels = "%m-'%y") + 
  theme_light()
chart_tst_hr_std
Warning message:
“Removed 2 row(s) containing missing values (geom_path).”
In [79]:
# Chart of the standardized HRV over time:
chart_hrv_std <- ggplot(df_oura, aes(as.Date(summary_date), hrv_std)) +
  geom_line() +
  geom_hline(yintercept = mean(df_oura$hrv_std, na.rm = TRUE), linetype = 2) + 
  labs(
    title = "Standardized Heart Rate Variability (HRV; r-MSSD in milliseconds) over time",
    subtitle = "Measured with OURA ring (v2)",
    x = "Date (MM-'YY)",
    y = "HRV (SD from mean up to that point)"
  ) + 
  scale_x_date(breaks = "month", date_labels = "%m-'%y") + 
  theme_light()
chart_hrv_std
Warning message:
“Removed 2 row(s) containing missing values (geom_path).”
In [80]:
# Chart that combines the standardized HRV and TST over time:
chart_combined_std <- ggplot(df_oura, aes(as.Date(summary_date))) +
  geom_line(aes(y = tst_hr_std, colour = "TST"), alpha = 0.5) +
  geom_line(aes(y = hrv_std, colour = "HRV"), alpha = 0.5) +
  geom_hline(yintercept = mean(df_oura$hrv_std, na.rm = TRUE), linetype = 2) + 
  labs(
    title = "Standardized Total Sleep Time (TST) and Heart Rate Variability (HRV) over time",
    subtitle = "Measured with OURA ring (v2)",
    x = "Date (MM-'YY)",
    y = "SD from mean up to that point"
  ) + 
  scale_x_date(breaks = "month", date_labels = "%m-'%y") +  
  scale_color_discrete(name = "Variable", breaks = c("TST","HRV")) + 
  theme_light()
chart_combined_std
Warning message:
“Removed 2 row(s) containing missing values (geom_path).”
Warning message:
“Removed 2 row(s) containing missing values (geom_path).”
In [81]:
# Same as the previous, but now using the standardized 7 day rolling average values:
chart_combined_7dra_std <- ggplot(df_oura, aes(as.Date(summary_date))) +
  geom_line(aes(y = tst_hr_7dra_std, colour = "TST"), alpha = 0.5) +
  geom_line(aes(y = hrv_7dra_std, colour = "HRV"), alpha = 0.5) +
  geom_hline(yintercept = mean(df_oura$hrv_7dra_std, na.rm = TRUE), linetype = 2) + 
  labs(
    title = "Standardized 7-day rolling average of TST and HRV over time",
    subtitle = "Measured with OURA ring (v2)",
    x = "Date (MM-'YY)",
    y = "SD from mean up to that point"
  ) + 
  scale_x_date(breaks = "month", date_labels = "%m-'%y") + 
  scale_color_discrete(name = "Variable", breaks = c("TST","HRV")) + 
  theme_light()
chart_combined_7dra_std
Warning message:
“Removed 7 row(s) containing missing values (geom_path).”
Warning message:
“Removed 7 row(s) containing missing values (geom_path).”
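The two combined charts above add each series with its own geom_line() call and a manual colour aesthetic. An equivalent long-format approach, shown below as a hypothetical sketch (not part of the original analysis) using pivot_longer() from the already-loaded tidyr, maps colour to the variable name and scales more easily if further series are added.

df_std_long <- df_oura %>%
  select(summary_date, TST = tst_hr_std, HRV = hrv_std) %>%
  pivot_longer(c(TST, HRV), names_to = "variable", values_to = "value")

ggplot(df_std_long, aes(as.Date(summary_date), value, colour = variable)) +
  geom_line(alpha = 0.5) +
  labs(x = "Date (MM-'YY)", y = "SD from mean up to that point", colour = "Variable") +
  scale_x_date(breaks = "month", date_labels = "%m-'%y") +
  theme_light()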
In [ ]: