Details for mapping-noise-overland-apple-watch.ipynb

Published by gedankenstuecke

Description

This notebook expects environmental noise tracking data from an Apple Watch in addition to GPS data from Overland. It then maps out where in your environment you encounter noise.


Tags & Data Sources

apple watch environmental noise mapping noise Overland connection


Notebook
Last updated 4 years, 4 months ago

Where do I encounter environmental noise?

I saw that the latest Apple Watch hardware/software passively keeps track of environmental noise you encounter. I thought it would be interesting to see where around the city (in my case in Paris) I encounter environmental noise.

Prerequisites for this notebook

This notebook makes use of two data sources:

  1. The Overland connection for Open Humans. It passively tracks your GPS data and stores the data in Open Humans.

  2. The environmental noise data as collected by your Apple Watch. Right now there is no easy way to perform this extraction of data through Open Humans. Instead you will have to manually export the data from your phone, process it locally on your computer and then upload a correctly formatted file. Otherwise this notebook will not be able to run.

A description of how to get your environmental noise data from your Apple Watch can be found further down in this notebook. The notebook itself is written in R and performs the analysis & visualization of the data.

Getting started

For a start let's load our required packages. This can take a bit of time, as two packages need to be installed.

With this out of the way, we can load our Overland data from Open Humans as a first step. As the GPS records can grow pretty large, each year-month gets its own file. You can select 3 months of data by editing the year-month values in the bit below, to make sure you grab the data you are interested in. In my case I'm getting the data from October to December 2019.

Now we can start downloading the data. In the end this data will be stored in the variable loc.

longitude | latitude | activity | altitude | battery_level | battery_state | deferred | desired_accuracy | horizontal_accuracy | motion | pauses | significant_change | speed | timestamp | vertical_accuracy | wifi | velocity | date | lon | lat
2.369964 | 48.88470 | other | 48 | 0.31 | charging | 0 | 100 | 65 | stationary | False | 0 | -1 | 2019-09-30T23:58:25Z | 10 | Bbox-31F92D2C | -1 | 2019-09-30T23:58:25Z | 2.369964 | 48.88470
2.369953 | 48.88472 | other | 48 | 0.31 | charging | 0 | 100 | 65 | stationary | False | 0 | -1 | 2019-09-30T23:58:31Z | 10 | Bbox-31F92D2C | -1 | 2019-09-30T23:58:31Z | 2.369953 | 48.88472
2.369967 | 48.88470 | other | 52 | 0.31 | charging | 0 | 100 | 9 | stationary | False | 0 | 0 | 2019-09-30T23:58:35Z | 9 | Bbox-31F92D2C | 0 | 2019-09-30T23:58:35Z | 2.369967 | 48.88470
2.369960 | 48.88469 | other | 48 | 0.31 | charging | 0 | 100 | 65 | stationary | False | 0 | -1 | 2019-09-30T23:58:54Z | 10 | Bbox-31F92D2C | -1 | 2019-09-30T23:58:54Z | 2.369960 | 48.88469
2.369982 | 48.88469 | other | 48 | 0.31 | charging | 0 | 100 | 65 | stationary | False | 0 | -1 | 2019-09-30T23:58:59Z | 10 | Bbox-31F92D2C | -1 | 2019-09-30T23:58:59Z | 2.369982 | 48.88469
2.369866 | 48.88475 | other | 47 | 0.34 | charging | 0 | 100 | 65 | stationary | False | 0 | -1 | 2019-10-01T00:00:39Z | 10 | Bbox-31F92D2C | -1 | 2019-10-01T00:00:39Z | 2.369866 | 48.88475

We have columns for our latitude & longitude, along with data on if/how we moved around and the speed. For further analyses we might be interested in a number of things:

  1. Was the given data collected on a weekend or weekday?
  2. At which hour was the data collected?

That way we can plot our maps in a way that tells us when we were at a given space, helping us to better understand the noise measured at that time. The cell below performs this processing:

Now comes one of the trickiest parts of using this notebook: for the visualization to work properly you need to define the boundaries of the map by giving the correct boundary_ values below. Those are the latitude & longitude values that define how big or small the map section we will see is.

There is no easy way at this point to find 'good' boundaries, and it will take some fiddling around with those numbers to get the map you are actually interested in. The values provided by default give a good view of central Paris, but you are likely interested in a different place.

The cell above will download the map according to your boundaries. Run the cell below to see the map and evaluate whether it matches the area you are interested in. If it doesn't, adjust the boundaries above, run the cell above again and then plot again to check whether you now have the right area. Rinse & repeat until you are happy with the map itself.

Loading the noise data

Okay, you are happy with the map. Now it's time to load the noise data that you got from your Apple Watch. To export the data from your iPhone, open the Health app and tap on your user profile image; from there you will get an option to export the data. More detailed instructions on where to find it can be found here.

Creating this export will take a while, depending on how much data is on your phone. In my case it took 5-10 minutes. Once the file is created you get a regular iOS sharing option. Beware: the export you create will be a Zip file that can be big! My own Zip archive with all health data was 117 MB (and blew up to over 2 GB after unzipping)!

The best way forward with this data is to Airdrop it to a Mac, if you have one handy. Once that is done you should open your terminal and process the data inside the export:

unzip export.zip
cd apple_health_export
cat export.xml|grep dBASPL|grep -v Headphone|grep Record |sed "s/.*startDate=\"//"|sed "s/\" endDate=\"/,/"|sed "s/\" value=\"/,/"|sed "s/\"\/>//" > environmental_noise.csv

This will unzip the whole Apple Health archive, go into the folder this creates and then process the large XML dump with all data.

It finds all data points for environmental noise and stores those records as a simple CSV file with 3 columns:

  1. Start date/time of recording
  2. End date/time of recording
  3. Noise level in dB

Upload this data to your own notebook server and then you can run the code below to read it:

In addition to loading the data, this also identifies the halfway date/time point of each data point (individual recordings can have a total length of around 30 minutes). By calculating the halfway point we just pretend that the dB value was recorded in the middle of the recording.

Now we can look at our noise data:

start | end | noise_level | diff | halfway | datetime
2019-09-26 11:39:48 | 2019-09-26 12:09:46 | 80.5925 | 1798 secs | 2019-09-26 11:54:47 | 2019-09-26 11:54:47
2019-09-26 12:09:46 | 2019-09-26 12:39:46 | 65.5902 | 1800 secs | 2019-09-26 12:24:46 | 2019-09-26 12:24:46
2019-09-26 12:39:46 | 2019-09-26 13:09:46 | 60.8837 | 1800 secs | 2019-09-26 12:54:46 | 2019-09-26 12:54:46
2019-09-26 13:09:46 | 2019-09-26 13:39:46 | 55.3789 | 1800 secs | 2019-09-26 13:24:46 | 2019-09-26 13:24:46
2019-09-26 13:39:46 | 2019-09-26 14:09:41 | 63.3795 | 1795 secs | 2019-09-26 13:54:43 | 2019-09-26 13:54:43
2019-09-26 14:09:41 | 2019-09-26 14:39:36 | 66.8587 | 1795 secs | 2019-09-26 14:24:38 | 2019-09-26 14:24:38

Merging the GPS & Noise data

We're close to making our first map. The only thing left to do is to join the data. We do this by matching each GPS entry to the noise recording that is closest in time to that GPS data point. As the noise data is most likely much more coarse-grained than the GPS data, we will end up assigning the same noise level recording to many GPS points, but that's the best we can do.

Time to map!

For a start let's look at the noise levels in rough categories across town. To this end we bin the individual dB values into different groups:

  1. 0-40 dB (really quiet)
  2. 40-70 dB (conversational levels)
  3. 70-85 dB (this is close to the boundary of being too loud)
  4. 85+ dB (definitely too loud for longer periods of time)

For each of those categories we create one map, showing where most of those recordings were done:

Warning message:
“Removed 10501 rows containing non-finite values (stat_bin2d).”

(For the interactive notebooks: If you double click on the plot above, you can zoom into it. If you are looking at the graph in the notebook exploratory you can right-click it and press "View Image").

Okay, we can see that there's a few points where we do record very low levels (0-40 dB) of noise (in my case it's mainly two locations). At least one of those frequent points with little noise will most likely be your home, as hopefully you will have only little noise around when you're trying to sleep. Depending on your job, the second location might be your office (for me it is).

Other locations with low noise levels might be more curious. In my own map I do recognize some routes that I probably walked home late at night, when there was very little traffic. But other data points seem weird, especially at busy roads, where I don't remember walking/biking past at night. To better understand those points we will have to look at when the data was recorded.

For the loud environmental noise recordings a few interesting bits can show up as well. In my case there is the same busy road around the center of the map that was also recorded with virtually no noise, but my own home shows up as well. This could be because my apartment is very close to a busy road (real signal) or because I also wear my watch in the shower, which creates lots of noise for a very short amount of time when the water splashes on it (in a way an artifact of the data collection).

To better understand those different potential influences let's dive into the data a bit more. Instead of looking at the frequency of the data, let us check the noise levels in relation to

  1. when the data was recorded
  2. how I moved around when it was recorded.

When being outside the time might make a big difference (e.g. in terms of traffic during rush hour). Also whether I'm walking, cycling or being stationary could affect the data. We break the data down according to the motion type and the noise level. The individual data points will then be colored according to the time of recording (all of these times are UTC, which is close to Paris time, but might be very off for your own time zone).

Warning message:
“Removed 10501 rows containing missing values (geom_point).”

(For the interactive notebooks: If you double click on the plot above, you can zoom into it. If you are looking at the graph in the notebook exploratory you can right-click it and press "View Image").

This should have produced a 3x4 grid, with the horizontal plots still giving the different noise levels and the vertical ones being the different movement types (cycling, being stationary, and walking). This can help us understand some of the effects discussed earlier.

A busy road with very little noise?!

In the top-left (cycling, very little noise) category we can see that the busy road stretch that was registered as being quiet was recorded while I was cycling there during the day. This makes it unlikely that there was indeed no noise, as there would have been plenty of cars. And I do remember this bike ride! It was a terribly rainy day, which means my watch was tucked away under layers and layers of clothes! So of course it couldn't properly register any noise!

A noisy home?

When looking at my home and the observed high noise levels it becomes clear that both of my hypotheses seem to hold. There are lots of data points collected during the day while walking around, which were most likely recorded outside on the busy sidewalk. But there are also lots of stationary data points, nearly all collected in the early morning. Those are most likely the ones collected in the shower!

Are there differences between movement types and noise?

Let's see, how is the noise level distributed between being stationary, walking and cycling?

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

There's comparatively very little data on cycling, but the distributions between walking and being stationary are pretty similar. The main difference being that the stationary one seems slightly shifted to the left. Let's calculate the mean noise level for those two conditions:

Group.1 | x
(empty) | 61.82774
cycling | 57.88643
driving | 65.79838
driving stationary | 62.87044
running | 59.32646
stationary | 60.04457
walking | 63.29208

Yep, there's indeed a small shift. The mean noise level for walking is ~63 dB, the one for being stationary is ~60 dB! And interestingly, driving is the loudest with ~66 dB!
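The per-group means in the table above can be reproduced with base R's aggregate(). The following is only a minimal sketch on made-up values; the data frame demo and its numbers are assumptions for illustration, and the real notebook would use loc with its motion and noise_level columns.

```r
# Toy data standing in for loc (values are invented for illustration)
demo <- data.frame(
  motion = c("walking", "walking", "stationary", "stationary", "cycling", "walking"),
  noise_level = c(62, 64, 59, 61, 58, NA),
  stringsAsFactors = FALSE
)

# Mean noise level per motion type; the formula interface drops rows
# with a missing noise_level before averaging
means <- aggregate(noise_level ~ motion, data = demo, FUN = mean)

# Number of usable recordings per motion type, to judge how much data
# each mean is based on
counts <- aggregate(noise_level ~ motion, data = demo, FUN = length)

print(means)
print(counts)
```

Looking at the counts next to the means helps to avoid over-interpreting groups with very little data, such as cycling in my case.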

Noise throughout the day

Let us also quickly check how the median noise levels change over the day. Again, all time stamps are UTC and not time zone aware, so this only works if you haven't moved across time zones for the data.

It looks like the noise data itself is pretty noisy. Nevertheless, we can see how the noise levels go particularly down between 2am and 5am UTC (that's 3-6am in Paris time), after that they shift back to higher levels, especially around 8am local time, which would be when I'm either in the metro or on the bike to work! Another fun bump is at noon UTC (1pm Paris) time, which should be lunch break time!
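A minimal sketch of how such an hourly median could be computed, again on invented values. The data frame demo and the choice of the Europe/Paris time zone are assumptions for illustration. Converting to a named time zone instead of adding a fixed hour also takes care of daylight saving time: Paris is UTC+1 in winter but UTC+2 in summer.

```r
# Toy recordings with UTC timestamps (values are invented for illustration)
demo <- data.frame(
  timestamp = as.POSIXct(c("2019-12-03 02:10:00", "2019-12-03 02:40:00",
                           "2019-12-03 07:05:00", "2019-12-03 07:30:00",
                           "2019-12-03 07:55:00"), tz = "UTC"),
  noise_level = c(35, 41, 70, 66, 74)
)

# Hour of the day in UTC and in local Paris time
demo$hour_utc <- as.POSIXlt(demo$timestamp, tz = "UTC")$hour
demo$hour_local <- as.POSIXlt(demo$timestamp, tz = "Europe/Paris")$hour

# Median noise level per local hour
by_hour <- aggregate(noise_level ~ hour_local, data = demo, FUN = median)
print(by_hour)
```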

Notebook
Last updated 4 years, 4 months ago

Where do I encounter environmental noise?

I saw that the latest Apple Watch hardware/software passively keeps track of environmental noise you encounter. I thought it would be interesting to see where around the city (in my case in Paris) I encounter environmental noise.

Prerequisites for this notebook

This notebook makes use of two data sources:

  1. The Overland connection for Open Humans. It passively tracks your GPS data and stores the data in Open Humans.

  2. The environmental noise data as collected by your Apple Watch. Right now there is no easy way to perform this extraction of data through Open Humans. Instead you will have to manually export the data from your phone, process it locally on your computer and then upload a correctly formatted file. Otherwise this notebook will not be able to run.

A description of how to get your environmental noise data from your Apple Watch can be found further down in this notebook. The notebook itself is written in R and performs the analysis & visualization of the data.

Getting started

For a start let's load our required packages. This can take a bit of time, as two packages need to be installed.

In [1]:
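library(httr)       # HTTP requests to the Open Humans API
library(jsonlite)
library(ggplot2)
library(devtools)
install.packages('ggmap')
library(purrr)
library(lubridate)  # hour() for extracting the hour of a timestamp
library(zoo)
library(ggmap)      # map tiles and plotting on top of them
install.packages('data.table')
library(data.table) # rolling join used to merge GPS and noise data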
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done

Attaching package: ‘purrr’

The following object is masked from ‘package:jsonlite’:

    flatten


Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

    date


Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
Please cite ggmap if you use it! See citation("ggmap") for details.
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done

Attaching package: ‘data.table’

The following objects are masked from ‘package:lubridate’:

    hour, isoweek, mday, minute, month, quarter, second, wday, week,
    yday, year

The following object is masked from ‘package:purrr’:

    transpose

With this out of the way, we can load our Overland data from Open Humans as a first step. As the GPS records can grow pretty large, each year-month gets its own file. You can select 3 months of data by editing the year-month values in the cell below, to make sure you grab the data you are interested in. In my case I'm getting the data from October to December 2019.

In [2]:
month <- '2019-10'
month2 <- '2019-11'
month3 <- '2019-12'
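If you would rather not type the three values by hand, consecutive year-month strings can be generated from a starting month. This is only a sketch; year_months is a hypothetical helper that is not part of the original notebook.

```r
# Hypothetical helper (not in the original notebook): build n consecutive
# "YYYY-MM" strings starting at first_month
year_months <- function(first_month, n = 3) {
  start <- as.Date(paste0(first_month, "-01"))
  format(seq(start, by = "month", length.out = n), "%Y-%m")
}

months <- year_months("2019-10")
print(months)

# Year boundaries are handled by seq() as well
print(year_months("2019-12", n = 2))
```

The resulting values could then be assigned to month, month2 and month3.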

Now we can start downloading the data. In the end this data will be stored in the variable loc.

In [3]:
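# The access token is provided by the notebook server as an environment variable
access_token <- Sys.getenv("OH_ACCESS_TOKEN")
url <- paste("https://www.openhumans.org/api/direct-sharing/project/exchange-member/?access_token=",access_token,sep="")
resp <- GET(url)
user <- content(resp, "parsed")
# Only the first month gets the file name prefix; month2 and month3 are
# matched against the file names by their bare year-month string
month <- paste('overland-data-',month,sep='')

# Go through all files of the member and read the ones for the selected months
for (data_source in user$data){
    if (grepl(month, data_source$basename)){
        loc <- read.csv(url(data_source$download_url))
    }
    if (grepl(month2, data_source$basename)){
        loc2 <- read.csv(url(data_source$download_url))
    }
    if (grepl(month3, data_source$basename)){
        loc3 <- read.csv(url(data_source$download_url))
    }

}

# Combine the three months (this fails if one of the months has no file)
loc <- rbind(loc, loc2)
loc <- rbind(loc, loc3)
# Convenience copies under shorter column names
loc$velocity <- loc$speed
loc$date <- loc$timestamp
loc$lon <- loc$longitude
loc$lat <- loc$latitude

head(loc)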
longitude | latitude | activity | altitude | battery_level | battery_state | deferred | desired_accuracy | horizontal_accuracy | motion | pauses | significant_change | speed | timestamp | vertical_accuracy | wifi | velocity | date | lon | lat
2.369964 | 48.88470 | other | 48 | 0.31 | charging | 0 | 100 | 65 | stationary | False | 0 | -1 | 2019-09-30T23:58:25Z | 10 | Bbox-31F92D2C | -1 | 2019-09-30T23:58:25Z | 2.369964 | 48.88470
2.369953 | 48.88472 | other | 48 | 0.31 | charging | 0 | 100 | 65 | stationary | False | 0 | -1 | 2019-09-30T23:58:31Z | 10 | Bbox-31F92D2C | -1 | 2019-09-30T23:58:31Z | 2.369953 | 48.88472
2.369967 | 48.88470 | other | 52 | 0.31 | charging | 0 | 100 | 9 | stationary | False | 0 | 0 | 2019-09-30T23:58:35Z | 9 | Bbox-31F92D2C | 0 | 2019-09-30T23:58:35Z | 2.369967 | 48.88470
2.369960 | 48.88469 | other | 48 | 0.31 | charging | 0 | 100 | 65 | stationary | False | 0 | -1 | 2019-09-30T23:58:54Z | 10 | Bbox-31F92D2C | -1 | 2019-09-30T23:58:54Z | 2.369960 | 48.88469
2.369982 | 48.88469 | other | 48 | 0.31 | charging | 0 | 100 | 65 | stationary | False | 0 | -1 | 2019-09-30T23:58:59Z | 10 | Bbox-31F92D2C | -1 | 2019-09-30T23:58:59Z | 2.369982 | 48.88469
2.369866 | 48.88475 | other | 47 | 0.34 | charging | 0 | 100 | 65 | stationary | False | 0 | -1 | 2019-10-01T00:00:39Z | 10 | Bbox-31F92D2C | -1 | 2019-10-01T00:00:39Z | 2.369866 | 48.88475

We have columns for our latitude & longitude, along with data on if/how we moved around and the speed. For further analyses we might be interested in a number of things:

  1. Was the given data collected on a weekend or weekday?
  2. At which hour was the data collected?

That way we can plot our maps in a way that tells us when we were at a given space, helping us to better understand the noise measured at that time. The cell below performs this processing:

In [4]:
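# Parse the ISO timestamps; without a tz argument they are interpreted in the
# time zone of the R session
loc$timestamp <- as.POSIXct(loc$timestamp,format="%Y-%m-%dT%H:%M:%SZ")
# Label each record as weekend or weekday (weekdays() returns day names in
# the session's locale, so the comparison assumes English names)
loc$weekday <- weekdays(loc$timestamp)
loc$weekend <- loc$weekday %in% c('Sunday','Saturday')
loc$weekend <- ifelse(loc$weekend, 'weekend', 'weekday')
# Hour of the day, used later for coloring the points
loc$hour <- hour(loc$timestamp)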
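The trailing Z in the Overland timestamps marks them as UTC. A slightly more defensive variant of the processing could parse them explicitly as UTC and derive the weekday from a number instead of a locale-dependent name. This is a sketch on two example timestamps using base R only; the conversion to Europe/Paris is an assumption that fits my data, so replace it with your own time zone.

```r
ts <- c("2019-09-30T23:58:25Z", "2019-10-01T00:00:39Z")

# Parse explicitly as UTC so the result does not depend on the session time zone
utc <- as.POSIXct(ts, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Hour of the day in UTC and in local time
hour_utc <- as.POSIXlt(utc, tz = "UTC")$hour
hour_paris <- as.POSIXlt(utc, tz = "Europe/Paris")$hour

# wday is 0 for Sunday up to 6 for Saturday, independent of the locale
wday <- as.POSIXlt(utc, tz = "UTC")$wday
weekend <- ifelse(wday %in% c(0, 6), "weekend", "weekday")

print(data.frame(ts, hour_utc, hour_paris, weekend))
```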

Now comes one of the trickiest parts of using this notebook: for the visualization to work properly you need to define the boundaries of the map by giving the correct boundary_ values below. Those are the latitude & longitude values that define how big or small the map section we will see is.

There is no easy way at this point to find 'good' boundaries, and it will take some fiddling around with those numbers to get the map you are actually interested in. The values provided by default give a good view of central Paris, but you are likely interested in a different place.

In [5]:
boundary_west=2.3
boundary_east=2.40 
boundary_south=48.815 
boundary_north=48.9

my_map <- get_stamenmap(bbox=c(boundary_west,
                               boundary_south,
                               boundary_east,
                               boundary_north),zoom=14,maptype='toner',force=TRUE)
42 tiles needed, this may take a while (try a smaller zoom).
Source : http://tile.stamen.com/toner/14/8296/5633.png
Source : http://tile.stamen.com/toner/14/8297/5633.png
Source : http://tile.stamen.com/toner/14/8298/5633.png
Source : http://tile.stamen.com/toner/14/8299/5633.png
Source : http://tile.stamen.com/toner/14/8300/5633.png
Source : http://tile.stamen.com/toner/14/8301/5633.png
Source : http://tile.stamen.com/toner/14/8296/5634.png
Source : http://tile.stamen.com/toner/14/8297/5634.png
Source : http://tile.stamen.com/toner/14/8298/5634.png
Source : http://tile.stamen.com/toner/14/8299/5634.png
Source : http://tile.stamen.com/toner/14/8300/5634.png
Source : http://tile.stamen.com/toner/14/8301/5634.png
Source : http://tile.stamen.com/toner/14/8296/5635.png
Source : http://tile.stamen.com/toner/14/8297/5635.png
Source : http://tile.stamen.com/toner/14/8298/5635.png
Source : http://tile.stamen.com/toner/14/8299/5635.png
Source : http://tile.stamen.com/toner/14/8300/5635.png
Source : http://tile.stamen.com/toner/14/8301/5635.png
Source : http://tile.stamen.com/toner/14/8296/5636.png
Source : http://tile.stamen.com/toner/14/8297/5636.png
Source : http://tile.stamen.com/toner/14/8298/5636.png
Source : http://tile.stamen.com/toner/14/8299/5636.png
Source : http://tile.stamen.com/toner/14/8300/5636.png
Source : http://tile.stamen.com/toner/14/8301/5636.png
Source : http://tile.stamen.com/toner/14/8296/5637.png
Source : http://tile.stamen.com/toner/14/8297/5637.png
Source : http://tile.stamen.com/toner/14/8298/5637.png
Source : http://tile.stamen.com/toner/14/8299/5637.png
Source : http://tile.stamen.com/toner/14/8300/5637.png
Source : http://tile.stamen.com/toner/14/8301/5637.png
Source : http://tile.stamen.com/toner/14/8296/5638.png
Source : http://tile.stamen.com/toner/14/8297/5638.png
Source : http://tile.stamen.com/toner/14/8298/5638.png
Source : http://tile.stamen.com/toner/14/8299/5638.png
Source : http://tile.stamen.com/toner/14/8300/5638.png
Source : http://tile.stamen.com/toner/14/8301/5638.png
Source : http://tile.stamen.com/toner/14/8296/5639.png
Source : http://tile.stamen.com/toner/14/8297/5639.png
Source : http://tile.stamen.com/toner/14/8298/5639.png
Source : http://tile.stamen.com/toner/14/8299/5639.png
Source : http://tile.stamen.com/toner/14/8300/5639.png
Source : http://tile.stamen.com/toner/14/8301/5639.png

The cell above will download the map according to your boundaries. Run the cell below to see the map and evaluate whether it matches the area you are interested in. If it doesn't, adjust the boundaries above, run the cell above again and then plot again to check whether you now have the right area. Rinse & repeat until you are happy with the map itself.

In [6]:
ggmap(my_map)
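One possible starting point for the boundaries is to derive them from the GPS data itself, for example from quantiles of the coordinates, so that a few far-away trips do not stretch the map. The following is a sketch on toy coordinates; suggest_bbox is a hypothetical helper, and the 5%/95% cut-offs are an arbitrary assumption that you may need to tune.

```r
# Toy coordinates standing in for loc$lon and loc$lat: 99 points in Paris
# plus one far-away outlier (values are invented for illustration)
lon <- c(seq(2.30, 2.40, length.out = 99), 5.10)
lat <- c(seq(48.815, 48.90, length.out = 99), 43.60)

# Hypothetical helper: bounding box in the order left, bottom, right, top
suggest_bbox <- function(lon, lat, probs = c(0.05, 0.95)) {
  lon_q <- quantile(lon, probs, na.rm = TRUE, names = FALSE)
  lat_q <- quantile(lat, probs, na.rm = TRUE, names = FALSE)
  c(left = lon_q[1], bottom = lat_q[1], right = lon_q[2], top = lat_q[2])
}

bbox <- suggest_bbox(lon, lat)
print(bbox)
```

The four values could then be used as boundary_west, boundary_south, boundary_east and boundary_north, followed by the usual manual fine-tuning.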

Loading the noise data

Okay, you are happy with the map. Now it's time to load the noise data that you got from your Apple Watch. To export the data from your iPhone, open the Health app and tap on your user profile image; from there you will get an option to export the data. More detailed instructions on where to find it can be found here.

Creating this export will take a while, depending on how much data is on your phone. In my case it took 5-10 minutes. Once the file is created you get a regular iOS sharing option. Beware: the export you create will be a Zip file that can be big! My own Zip archive with all health data was 117 MB (and blew up to over 2 GB after unzipping)!

The best way forward with this data is to Airdrop it to a Mac, if you have one handy. Once that is done you should open your terminal and process the data inside the export:

unzip export.zip
cd apple_health_export
cat export.xml|grep dBASPL|grep -v Headphone|grep Record |sed "s/.*startDate=\"//"|sed "s/\" endDate=\"/,/"|sed "s/\" value=\"/,/"|sed "s/\"\/>//" > environmental_noise.csv

This will unzip the whole Apple Health archive, go into the folder this creates and then process the large XML dump with all data.

It finds all data points for environmental noise and stores those records as a simple CSV file with 3 columns:

  1. Start date/time of recording
  2. End date/time of recording
  3. Noise level in dB
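To make clearer what the grep/sed pipeline does to a single line, here is the same extraction sketched in R. The record below is made up for illustration; it only assumes the startDate, endDate and value attributes that the pipeline relies on, and get_attr is a hypothetical helper.

```r
# Illustrative record line (invented values, not taken from a real export)
line <- paste0('<Record type="HKQuantityTypeIdentifierEnvironmentalAudioExposure" ',
               'unit="dBASPL" startDate="2019-09-26 11:39:48 +0200" ',
               'endDate="2019-09-26 12:09:46 +0200" value="80.5925"/>')

# Hypothetical helper: pull the value of one XML attribute out of a line
get_attr <- function(x, name) {
  m <- regmatches(x, regexpr(paste0(name, '="[^"]*"'), x))
  sub(paste0('^', name, '="'), '', sub('"$', '', m))
}

# The three fields joined by commas, i.e. one row of the CSV
csv_row <- paste(get_attr(line, "startDate"),
                 get_attr(line, "endDate"),
                 get_attr(line, "value"), sep = ",")
print(csv_row)
```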

Upload this data to your own notebook server and then you can run the code below to read it:

In [7]:
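# Read the CSV; the columns used below (start, end, noise_level) are taken
# from the header row, so the file needs one as its first line
noise <- read.csv(file='environmental_noise.csv',header=TRUE)
noise$start <- as.POSIXct(noise$start)
noise$end <- as.POSIXct(noise$end)
# Length of each recording and its halfway point in time
noise$diff <- noise$end - noise$start
noise$halfway <- noise$diff / 2
noise$halfway <- noise$start + noise$halfway
# The halfway point is the time used for matching against the GPS data
noise$datetime <- noise$halfway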

In addition to loading the data, this also identifies the halfway date/time point of each data point (individual recordings can have a total length of around 30 minutes). By calculating the halfway point we just pretend that the dB value was recorded in the middle of the recording.
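The halfway calculation can be tried out on two rows without any file. This sketch also shows one way to read a CSV that has no header line by passing the column names directly; parsing the times as UTC is an assumption made for the example.

```r
# Two example rows in the three-column layout described above
csv_text <- "2019-09-26 11:39:48,2019-09-26 12:09:46,80.5925
2019-09-26 12:09:46,2019-09-26 12:39:46,65.5902"

# No header line in the text, so the column names are given explicitly
noise_demo <- read.csv(text = csv_text, header = FALSE,
                       col.names = c("start", "end", "noise_level"),
                       stringsAsFactors = FALSE)
noise_demo$start <- as.POSIXct(noise_demo$start, tz = "UTC")
noise_demo$end <- as.POSIXct(noise_demo$end, tz = "UTC")

# Half of the recording length in seconds, added to the start time
half_secs <- as.numeric(difftime(noise_demo$end, noise_demo$start, units = "secs")) / 2
noise_demo$halfway <- noise_demo$start + half_secs

halfway_text <- format(noise_demo$halfway, "%Y-%m-%d %H:%M:%S")
print(halfway_text)
```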

Now we can look at our noise data:

In [8]:
head(noise)
start | end | noise_level | diff | halfway | datetime
2019-09-26 11:39:48 | 2019-09-26 12:09:46 | 80.5925 | 1798 secs | 2019-09-26 11:54:47 | 2019-09-26 11:54:47
2019-09-26 12:09:46 | 2019-09-26 12:39:46 | 65.5902 | 1800 secs | 2019-09-26 12:24:46 | 2019-09-26 12:24:46
2019-09-26 12:39:46 | 2019-09-26 13:09:46 | 60.8837 | 1800 secs | 2019-09-26 12:54:46 | 2019-09-26 12:54:46
2019-09-26 13:09:46 | 2019-09-26 13:39:46 | 55.3789 | 1800 secs | 2019-09-26 13:24:46 | 2019-09-26 13:24:46
2019-09-26 13:39:46 | 2019-09-26 14:09:41 | 63.3795 | 1795 secs | 2019-09-26 13:54:43 | 2019-09-26 13:54:43
2019-09-26 14:09:41 | 2019-09-26 14:39:36 | 66.8587 | 1795 secs | 2019-09-26 14:24:38 | 2019-09-26 14:24:38

Merging the GPS & Noise data

We're close to making our first map. The only thing left to do is to join the data. We do this by matching each GPS entry to the noise recording that is closest in time to that GPS data point. As the noise data is most likely much more coarse-grained than the GPS data, we will end up assigning the same noise level recording to many GPS points, but that's the best we can do.

In [9]:
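# Both tables need a column with the same name to join on
loc$datetime <- loc$timestamp
# Rolling join: for every GPS row take the noise_level of the noise row
# whose datetime is nearest in time
loc$noise_level <- setDT(noise)[loc, noise_level, roll = "nearest", on = "datetime"]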
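To see what "nearest" means here, the same matching can be written out in plain base R on three GPS times and three noise recordings. This is an illustration of the idea on invented values, not the data.table join itself, and it would be too slow for real data because it compares every GPS point with every recording.

```r
# Toy GPS times and noise recordings (halfway times), all in UTC
gps_time <- as.POSIXct(c("2019-09-26 11:45:00", "2019-09-26 12:05:00",
                         "2019-09-26 12:50:00"), tz = "UTC")
noise_time <- as.POSIXct(c("2019-09-26 11:54:47", "2019-09-26 12:24:46",
                           "2019-09-26 12:54:46"), tz = "UTC")
noise_level <- c(80.5925, 65.5902, 60.8837)

# For each GPS time, index of the noise recording with the smallest time gap
nearest_idx <- vapply(as.numeric(gps_time),
                      function(t) which.min(abs(as.numeric(noise_time) - t)),
                      integer(1))
matched <- noise_level[nearest_idx]
print(data.frame(gps_time, matched))
```

The first two GPS points both receive the first noise value, which is exactly the reuse of one recording for many GPS points described above. Note that a nearest match has no maximum distance, so a GPS point recorded hours away from any noise recording still gets a value.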

Time to map!

For a start let's look at the noise levels in rough categories across town. To this end we bin the individual dB values into different groups:

  1. 0-40 dB (really quiet)
  2. 40-70 dB (conversational levels)
  3. 70-85 dB (this is close to the boundary of being too loud)
  4. 85+ dB (definitely too loud for longer periods of time)
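How values on the group boundaries are assigned can be checked with a few example numbers. The breaks and labels below mirror the four groups listed above, with 120 dB as an assumed upper limit; the example values are made up.

```r
noisebreaks <- c(0, 40, 70, 85, 120)
noiselabels <- c("0-40 dB", "40-70 dB", "70-85 dB", "85+ dB")

# Example values, including the exact boundaries
levels_demo <- c(35.2, 40, 69.9, 70, 85, 119.9, 120)

# right = FALSE makes the intervals left-closed: [0,40), [40,70), [70,85), [85,120)
groups <- cut(levels_demo, breaks = noisebreaks, right = FALSE, labels = noiselabels)
print(data.frame(levels_demo, groups))
```

A value of exactly 40 dB lands in the 40-70 dB group, and anything at or above the last break (120 dB here) becomes NA. Using Inf as the last break would avoid that.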

For each of those categories we create one map, showing where most of those recordings were done:

In [10]:
library(repr)
# Breaks and labels for the four noise groups
noisebreaks <- c(0,40,70,85,120)
noiselabels <- c("0-40 dB",'40-70 dB',"70-85 dB","85+ dB")

# Assign each GPS point to a noise group; intervals are left-closed, e.g. [40, 70)
setDT(loc)[ , noisegroups := cut(noise_level, 
                                breaks = noisebreaks, 
                                right = FALSE, 
                                labels = noiselabels)]
# Large plotting area so the facets fit side by side
options(repr.plot.width=30, repr.plot.height=20)
# 2D-binned map per noise group; the fill is each bin's count divided by the
# total count of its facet panel, i.e. a relative frequency within the panel
ggmap(my_map) + 
    geom_bin2d(data = subset(loc, 
                                    loc$motion %in% c('stationary', 'walking', 'cycling')), 
                    aes(x = lon, 
                        y = lat,
                       fill = (..count..)/tapply(..count..,..PANEL..,sum)[..PANEL..]),bins=60, alpha=0.7) + 
                
  theme(legend.position = "right") + theme_minimal(base_size=25) + facet_grid(. ~ noisegroups) + scale_fill_continuous('Frequency')
Warning message:
“Removed 10501 rows containing non-finite values (stat_bin2d).”
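The fill expression in the plot above divides each bin's count by the total count of its facet panel. What that computes can be seen on a small example outside of ggplot2; the counts and panel numbers below are invented.

```r
# Five bins: three in panel 1, two in panel 2 (invented counts)
count <- c(10, 30, 60, 5, 15)
panel <- c(1, 1, 1, 2, 2)

# Total count per panel, then each bin's share of its own panel
panel_totals <- tapply(count, panel, sum)
share <- as.numeric(count / panel_totals[panel])
print(share)

# The shares add up to 1 within each panel
print(tapply(share, panel, sum))
```

This is why the colors are comparable between the four maps even though the groups contain very different numbers of recordings.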

(For the interactive notebooks: If you double click on the plot above, you can zoom into it. If you are looking at the graph in the notebook exploratory you can right-click it and press "View Image").

Okay, we can see that there's a few points where we do record very low levels (0-40 dB) of noise (in my case it's mainly two locations). At least one of those frequent points with little noise will most likely be your home, as hopefully you will have only little noise around when you're trying to sleep. Depending on your job, the second location might be your office (for me it is).

Other locations with low noise levels might be more curious. In my own map I do recognize some routes that I probably walked home late at night, when there was very little traffic. But other data points seem weird, especially at busy roads, where I don't remember walking/biking past at night. To better understand those points we will have to look at when the data was recorded.

For the loud environmental noise recordings a few interesting bits can show up as well. In my case there is the same busy road around the center of the map that was also recorded with virtually no noise, but my own home shows up as well. This could be because my apartment is very close to a busy road (real signal) or because I also wear my watch in the shower, which creates lots of noise for a very short amount of time when the water splashes on it (in a way an artifact of the data collection).

To better understand those different potential influences let's dive into the data a bit more. Instead of looking at the frequency of the data, let us check the noise levels in relation to

  1. when the data was recorded
  2. how I moved around when it was recorded.

When being outside the time might make a big difference (e.g. in terms of traffic during rush hour). Also whether I'm walking, cycling or being stationary could affect the data. We break the data down according to the motion type and the noise level. The individual data points will then be colored according to the time of recording (all of these times are UTC, which is close to Paris time, but might be very off for your own time zone).

In [11]:
options(repr.plot.width=30, repr.plot.height=20)

# 24 gradient stops, roughly one per hour of the day (UTC)
hour_colors <- c(rep('darkblue', 5),
                 rep('orange', 3),
                 rep('yellow', 9),
                 rep('violet', 3),
                 rep('lightblue', 4))

ggmap(my_map) + 
    geom_point(data = subset(loc, motion %in% c('stationary', 'walking', 'cycling')), 
               aes(x = lon, y = lat, color = hour),
               alpha = 0.3, size = 0.5) +
    scale_color_gradientn(colors = hour_colors) +
    # apply the complete theme first, otherwise it would reset the legend setting
    theme_minimal(base_size = 25) + theme(legend.position = "right") +
    facet_grid(motion ~ noisegroups)
Warning message:
“Removed 10501 rows containing missing values (geom_point).”

(For the interactive notebooks: If you double click on the plot above, you can zoom into it. If you are looking at the graph in the notebook exploratory you can right-click it and press "View Image").
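About the warning on removed rows: as far as I understand, ggplot2 counts a point as "missing" both when one of the plotted columns is `NA` and when the point falls outside the axis limits of the map. A rough way to see how many rows fall into each case is sketched below. The toy data frame `loc_demo` and the bounding box `bbox` are made up for illustration; for the real data you would use `loc` and the extent of your own map.

```r
# Toy stand-in for the notebook's `loc` data frame (values are made up)
loc_demo <- data.frame(
  lon  = c(2.35, 2.36, NA, 2.90, 2.34),
  lat  = c(48.85, 48.86, 48.85, 48.85, NA),
  hour = c(8, 9, 10, 11, 12)
)

# Hypothetical bounding box of the map (not taken from the notebook)
bbox <- list(left = 2.25, right = 2.45, bottom = 48.80, top = 48.92)

# Rows where any of the plotted columns is missing
is_missing <- !complete.cases(loc_demo[, c("lon", "lat", "hour")])

# Rows with complete coordinates that lie outside the box
is_outside <- !is_missing &
  (loc_demo$lon < bbox$left | loc_demo$lon > bbox$right |
   loc_demo$lat < bbox$bottom | loc_demo$lat > bbox$top)

dropped <- c(missing = sum(is_missing), outside = sum(is_outside))
print(dropped)
```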

This should have produced a 3x4 grid, with the horizontal plots still giving the different noise levels and the vertical ones being the different movement types (cycling, being stationary, and walking). This can help us understand some of the effects discussed earlier.

A busy road with very little noise?!

In the top-left (cycling, very little noise) category we can see that the busy stretch of road that was registered as quiet was recorded while I was cycling there during the day. That makes it unlikely that there was indeed no noise, as there would have been plenty of cars. And I do remember this bike ride! It was a terribly rainy day, which means my watch was tucked away under layers and layers of clothes, so of course it couldn't properly register any noise!

A noisy home?

When looking at my home and the observed high noise levels, it becomes clear that both of my hypotheses seem to hold. There are lots of data points collected during the day while walking around, which were most likely recorded outside on the busy sidewalk. But there are also lots of stationary data points, nearly all collected in the early morning. Those are most likely the ones collected in the shower!
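To check this kind of pattern without squinting at the map, one could cross-tabulate the loud data points by motion type and hour. The sketch below uses a made-up `loc_demo` data frame and an assumed cut-off of 80 dB for "loud"; the notebook's own `noisegroups` may be defined differently.

```r
# Toy stand-in for `loc` (made-up values, not the notebook's data)
loc_demo <- data.frame(
  motion      = c("stationary", "stationary", "walking", "walking", "stationary", "walking"),
  hour        = c(6, 6, 14, 15, 6, 13),
  noise_level = c(85, 88, 82, 81, 45, 83)
)

# Keep only the loud recordings (80 dB is an assumed cut-off)
loud <- subset(loc_demo, noise_level >= 80)

# Count loud recordings per motion type (rows) and hour of day (columns)
counts <- table(loud$motion, loud$hour)
print(counts)
```

A cluster of loud, stationary points in a single early-morning hour would fit the shower explanation, while loud walking points spread over the day would fit the busy sidewalk.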

Are there differences between movement types and noise?

Let's see how the noise level is distributed between being stationary, walking and cycling.

In [12]:
ggplot(data = subset(loc,loc$motion %in% c('stationary', 'walking', 'cycling') & loc$timestamp > as.POSIXct('2019-09-30 00:00')), aes(noise_level,fill=motion)) + 
    geom_histogram() + scale_x_continuous('environmental noise in dB') +
    theme_minimal(base_size = 40) + facet_grid(. ~ motion)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

There's comparatively little data on cycling, but the distributions for walking and being stationary are pretty similar. The main difference is that the stationary one seems slightly shifted to the left. Let's calculate the mean noise level for each motion type:

In [13]:
aggregate(loc$noise_level, by=list(loc$motion),FUN=mean)
Group.1             x
                    61.82774
cycling             57.88643
driving             65.79838
driving stationary  62.87044
running             59.32646
stationary          60.04457
walking             63.29208

Yep, there's indeed a small shift. The mean noise level for walking is ~63 dB, while the one for being stationary is ~60 dB! And interestingly, driving is the loudest with ~66 dB!
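Since the cycling group is small, it can help to look at the number of samples and the median next to the mean. A minimal sketch of such a summary, using a made-up `loc_demo` data frame rather than the real `loc`, might look like this:

```r
# Toy stand-in for `loc` (made-up values, not the notebook's data)
loc_demo <- data.frame(
  motion      = c("walking", "walking", "walking",
                  "stationary", "stationary", "cycling", ""),
  noise_level = c(60, 64, 68, 55, 61, 58, 70)
)

# Keep only the motion types used in the plots
keep <- subset(loc_demo, motion %in% c("stationary", "walking", "cycling"))

# Mean, median and number of samples per motion type
summary_by_motion <- aggregate(
  noise_level ~ motion,
  data = keep,
  FUN = function(x) c(mean = mean(x), median = median(x), n = length(x))
)
print(summary_by_motion)
```

Filtering first also drops rows with an empty motion label, which otherwise show up as an unnamed group in the output.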

Noise throughout the day

Let us also quickly check how the median noise levels change over the day. Again, all time stamps are UTC and not time zone aware, so this only works if you haven't moved across time zones for the data.

In [14]:
ggplot(data = subset(loc,loc$motion %in% c('stationary', 'walking', 'cycling') & loc$timestamp > as.POSIXct('2019-10-01 00:00')), 
        aes(y=noise_level,x=hour,group=hour)) + geom_boxplot(notch=TRUE) + theme_minimal(base_size = 40) + scale_y_continuous('environmental noise in dB')

It looks like the noise data itself is pretty noisy. Nevertheless, we can see that the noise levels drop markedly between 2am and 5am UTC (that's 3-6am Paris time). After that they shift back to higher levels, especially around 8am local time, which is when I'm either in the metro or on the bike to work! Another fun bump is at noon UTC (1pm Paris time), which should be lunch break time!
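The medians drawn in the boxplot can also be computed as plain numbers, which makes it easier to name the quietest and loudest hours. The following is only a sketch on a made-up `loc_demo` data frame; with the real data you would pass the same subset of `loc` that is used for the plot.

```r
# Toy stand-in for `loc` with an hour-of-day column (made-up values)
loc_demo <- data.frame(
  hour        = c(3, 3, 3, 8, 8, 8, 12, 12),
  noise_level = c(35, 40, 42, 70, 66, 74, 62, 68)
)

# Median noise level for each hour of the day
median_by_hour <- aggregate(noise_level ~ hour, data = loc_demo, FUN = median)

# Hours with the lowest and the highest median
quietest <- median_by_hour$hour[which.min(median_by_hour$noise_level)]
loudest  <- median_by_hour$hour[which.max(median_by_hour$noise_level)]

print(median_by_hour)
```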
