plot-google-location-history.ipynb
Take a Google Location History archive and plot where you've been, how fast you moved, and how much data was collected. Looking for contributors who will extend this notebook to make more cool stuff!
This notebook requires the Google Location History upload, which gets your GPS data into Open Humans.
It then uses the GPS data to plot your personal movement history on maps at different scales.
To start, let's load the required packages. We use httr to request the data, jsonlite to parse the JSON and ggplot2 for plotting. Any package that is missing needs to be installed with install.packages first:
#install.packages('rjson')
library(httr)
#library("rjson")
library(jsonlite)
library(ggplot2)
With that out of the way we can access our Google Location History data from our Open Humans account:
access_token <- Sys.getenv("OH_ACCESS_TOKEN")
url <- paste("https://www.openhumans.org/api/direct-sharing/project/exchange-member/?access_token=",access_token,sep="")
resp <- GET(url)
user <- content(resp, "parsed")
for (data_source in user$data){
  if (data_source$source == "direct-sharing-182"){
    gps_data_url <- data_source$download_url
  }
}
temp <- tempfile()
download.file(gps_data_url,temp,method='wget')
json_data <- fromJSON(txt=temp)
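The loop above leaves gps_data_url undefined when no file from the Google Location History source is found, and the download then fails. Here is a minimal sketch of a more defensive lookup, run against a made-up list that mimics the shape of user$data. The helper name find_download_url and the example URLs are hypothetical and not part of the Open Humans API:

```r
# Hypothetical helper: return the download URL of the first file whose
# source matches source_id, or NULL when there is none.
find_download_url <- function(data_files, source_id) {
  for (data_file in data_files) {
    if (identical(data_file$source, source_id)) {
      return(data_file$download_url)
    }
  }
  NULL
}

# Made-up stand-in for user$data, for illustration only
toy_files <- list(
  list(source = "direct-sharing-1", download_url = "https://example.org/other.json"),
  list(source = "direct-sharing-182", download_url = "https://example.org/location.json")
)

gps_url <- find_download_url(toy_files, "direct-sharing-182")
missing_url <- find_download_url(toy_files, "direct-sharing-999")

# A NULL result can be turned into a readable message before downloading
if (is.null(missing_url)) {
  message("No matching file found; upload the data to Open Humans first.")
}
```

With the real response, the call would be find_download_url(user$data, "direct-sharing-182"), followed by a check for NULL before download.file runs.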
Now that we have our Google Location History data stored in json_data we can start to work with it. Much of this notebook is adapted from Shirin Glander's excellent blog post.
Let's read the location data into the loc variable, parse the timestamps that record when we were at a place, and convert the E7-formatted coordinates to regular coordinates:
# extracting the locations dataframe
loc = json_data$locations
# converting time column from posix milliseconds into a readable time scale
loc$time = as.POSIXct(as.numeric(json_data$locations$timestampMs)/1000, origin = "1970-01-01")
# converting longitude and latitude from E7 to GPS coordinates
loc$lat = loc$latitudeE7 / 1e7
loc$lon = loc$longitudeE7 / 1e7
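To make the two conversions concrete, here is a small sketch on made-up records. The field names match the ones used above, but the values are invented for illustration. E7 coordinates are degrees multiplied by 10^7 and stored as integers, and timestampMs counts milliseconds since 1970-01-01 UTC:

```r
# Made-up records with the same fields as json_data$locations
toy <- data.frame(
  timestampMs = c("1500000000000", "1500000060000"),
  latitudeE7  = c(501109220, 473768870),
  longitudeE7 = c(86821270, 85416940),
  stringsAsFactors = FALSE
)

# Milliseconds since the epoch -> seconds -> POSIXct (UTC used here for clarity)
toy$time <- as.POSIXct(as.numeric(toy$timestampMs) / 1000,
                       origin = "1970-01-01", tz = "UTC")

# E7 integers -> decimal degrees
toy$lat <- toy$latitudeE7 / 1e7
toy$lon <- toy$longitudeE7 / 1e7

first_time <- format(toy$time[1], "%Y-%m-%d %H:%M:%S", tz = "UTC")
print(toy[, c("time", "lat", "lon")])
```

The first record comes out as 2017-07-14 02:40:00 UTC at latitude 50.110922 and longitude 8.682127.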
Now we can look at what our data looks like:
tail(loc)
Now that we have collected and formatted the data we can start analyzing it!
Let's start by seeing how much data Google has about my location. After some more data processing we can plot it:
install.packages('ggmap')
library(lubridate)
library(zoo)
# loc$time is already POSIXct, so no format string is needed here
loc$date <- as.Date(loc$time)
loc$year <- year(loc$date)
loc$month_year <- as.yearmon(loc$date)
points_p_day <- data.frame(table(loc$date), group = "day")
points_p_month <- data.frame(table(loc$month_year), group = "month")
points_p_year <- data.frame(table(loc$year), group = "year")
# set up plotting theme
library(ggplot2)
library(ggmap)
my_theme <- function(base_size = 12, base_family = "sans"){
theme_grey(base_size = base_size, base_family = base_family) +
theme(
axis.text = element_text(size = 12),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
axis.title = element_text(size = 14),
panel.grid.major = element_line(color = "grey"),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "aliceblue"),
strip.background = element_rect(fill = "lightgrey", color = "grey", size = 1),
strip.text = element_text(face = "bold", size = 12, color = "navy"),
legend.position = "right",
legend.background = element_blank(),
panel.spacing = unit(.5, "lines"),
panel.border = element_rect(color = "grey", fill = NA, size = 0.5)
)
}
points <- rbind(points_p_day[, -1], points_p_month[, -1], points_p_year[, -1])
ggplot(points, aes(x = group, y = Freq)) +
geom_point(position = position_jitter(width = 0.2), alpha = 0.3) +
geom_boxplot(aes(color = group), size = 1, outlier.colour = NA) +
facet_grid(group ~ ., scales = "free") + my_theme() +
theme(
legend.position = "none",
strip.placement = "outside",
strip.background = element_blank(),
strip.text = element_blank(),
axis.text.x = element_text(angle = 0, vjust = 0.5, hjust = 0.5)
) +
labs(
x = "",
y = "Number of data points",
title = "How many data points did Google collect about me?",
subtitle = "Number of data points per day, month and year"
)
There is some variation in the amount of data Google recorded per time unit, but it comes out at around 50 data points per day on average. This probably also depends on how much we travel: the more we travel, the more data can be collected.
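The per-day average can also be computed directly rather than read off the boxplot. Here is a minimal sketch on made-up timestamps. The variable names are chosen for illustration and the values say nothing about the real data:

```r
# Four made-up timestamps: three on one day, one on the next
toy_times <- as.POSIXct(
  c("2017-01-01 08:00:00", "2017-01-01 12:00:00", "2017-01-01 18:30:00",
    "2017-01-02 09:00:00"),
  tz = "UTC"
)

# Count the points per calendar day, then summarise the counts
per_day <- table(as.Date(toy_times))
counts <- as.vector(per_day)
mean_per_day <- mean(counts)
median_per_day <- median(counts)

print(per_day)
print(c(mean = mean_per_day, median = median_per_day))
```

Replacing toy_times with loc$time gives the same summary for the real data.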
Google also gives an estimate of how accurate each recorded location is. Let's plot that accuracy:
accuracy <- data.frame(accuracy = loc$accuracy, group = ifelse(loc$accuracy < 800, "high", ifelse(loc$accuracy < 5000, "middle", "low")))
accuracy$group <- factor(accuracy$group, levels = c("high", "middle", "low"))
ggplot(accuracy, aes(x = accuracy, fill = group)) +
geom_histogram() +
facet_grid(group ~ ., scales="free") +
my_theme() +
theme(
legend.position = "none",
strip.placement = "outside",
strip.background = element_blank(),
axis.text.x = element_text(angle = 0, vjust = 0.5, hjust = 0.5)
) +
labs(
x = "Accuracy in metres",
y = "Count",
title = "How accurate is the location data?",
subtitle = "Histogram of accuracy of location points",
caption = "\nMost data points are pretty accurate,
but there are still many data points with a high inaccuracy.
These were probably from areas with bad satellite reception."
)
Overall the accuracy of the data is pretty good and the vast majority of the data points are in the smallest error category. There are some outliers with lower accuracy, but they make up only a small share overall.
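The three accuracy groups are built above with nested ifelse calls. As an alternative sketch, base R's cut expresses the same thresholds (below 800 is "high", below 5000 is "middle", everything else is "low") in one call and returns a factor with the levels already in order. The accuracy values here are made up:

```r
# Made-up accuracy values in metres, including both threshold values
acc <- c(10, 799, 800, 4999, 5000, 20000)

# right = FALSE makes each interval closed on the left: [800, 5000) is "middle"
acc_group <- cut(
  acc,
  breaks = c(-Inf, 800, 5000, Inf),
  labels = c("high", "middle", "low"),
  right = FALSE
)

# Share of points in each group
group_share <- prop.table(table(acc_group))
print(data.frame(acc, acc_group))
print(group_share)
```

The share table puts a number on how small the "middle" and "low" groups are.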
Next we look at a world map and plot each of the data points that Google has. This gives us a good indication of where we have been and where we might want to zoom in more:
options(warn = -1)
world <- get_map(location=c(-179,-60,179,67), source = "stamen",maptype='toner')
ggmap(world) +
geom_point(data = loc, aes(x=lon, y=lat), size=0.8,alpha=0.7,color='red') +
#stat_density_2d(geom = "point", data = loc, aes(x=lon, y=lat, size = stat(density)), n = 20, contour = FALSE)
#geom_density_2d(bins = 300, data = loc, aes(x = lon, y = lat), alpha = 0.5, color='red') +
#stat_summary_2d(geom = "tile", bins = 300, data = loc, aes(x = lon, y = lat, z = accuracy), alpha = 0.5) +
#scale_fill_gradient(low = "blue", high = "red", guide = guide_legend(title = "Accuracy")) +
labs(
x = "Longitude",
y = "Latitude",
title = "Location history data points around the world")
In my case Central Europe is more or less one huge red blob, which makes sense given that I have been commuting between Frankfurt, Germany and Zurich, Switzerland for quite some time. So let's zoom in on this part of the world.
To do so we have to define a bounding box through latitudes and longitudes, using the four variables below. If you want to zoom in on another part of the world, adjust these four variables to the location you are interested in. (Googling "coordinates PLACE_OF_INTEREST" is really useful for this.)
europe_boundary_west=4
europe_boundary_east=12
europe_boundary_south=45
europe_boundary_north=53.5
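The four boundaries can also be used to filter the points themselves, which shows how many records fall inside the box before any map is drawn. Here is a minimal sketch on made-up points. The helper name in_bbox and the approximate city coordinates are illustrative assumptions:

```r
# Hypothetical helper: TRUE for points inside the west/south/east/north box
in_bbox <- function(lon, lat, west, south, east, north) {
  lon >= west & lon <= east & lat >= south & lat <= north
}

# Made-up points with approximate coordinates, for illustration only
toy <- data.frame(
  place = c("Frankfurt", "Zurich", "Berkeley"),
  lon = c(8.68, 8.54, -122.27),
  lat = c(50.11, 47.37, 37.87),
  stringsAsFactors = FALSE
)

# Same boundary values as the Europe box above
inside <- in_bbox(toy$lon, toy$lat, west = 4, south = 45, east = 12, north = 53.5)
toy_europe <- toy[inside, ]
print(toy_europe)
```

The order west, south, east, north is the same left, bottom, right, top order in which the boundaries are passed to get_map.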
With this out of the way we can start plotting with the code below. To make it a bit more interesting we plot not just the points but also the velocity recorded with each data point. That way we can see where we moved fast and where we moved slowly.
loc_2 <- loc[which(!is.na(loc$velocity)), ]
loc_2 <- subset(loc_2, loc_2$velocity > 0)
europe <- get_map(
location=c(europe_boundary_west,
europe_boundary_south,
europe_boundary_east,
europe_boundary_north),
source = "stamen",
maptype='toner')
ggmap(europe) + geom_point(data = loc_2, aes(x = lon, y = lat, color = velocity), alpha = 0.1,size=0.5) +
theme(legend.position = "right") +
labs(x = "Longitude", y = "Latitude",
title = "Location history data points in Europe",
subtitle = "Color scale shows velocity measured for location") +
scale_colour_gradient(low = "blue", high = "red", guide = guide_legend(title = "Velocity"))
The blue spots (e.g. around Frankfurt in the center of the map and around Zurich in the south) show where I moved rather slowly, i.e. on foot. Between these we can see the more purplish connections, which are me driving on the highways.
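The plot only uses rows where Google recorded a velocity, and all other rows are dropped. As a rough sketch of how a speed could be estimated for any pair of consecutive points, the haversine formula gives the great-circle distance between them, which is then divided by the elapsed time. This is an illustration on made-up points that assumes a spherical Earth with a radius of 6371 km. It is not how Google derives its velocity field, and the helper name haversine_m is hypothetical:

```r
# Great-circle distance in metres between points given in decimal degrees.
# Assumes a spherical Earth; radius is the mean Earth radius in metres.
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Two made-up points on the equator, recorded one minute apart
toy <- data.frame(
  time = as.POSIXct(c("2017-01-01 08:00:00", "2017-01-01 08:01:00"), tz = "UTC"),
  lat = c(0, 0),
  lon = c(0, 0.01)
)

n <- nrow(toy)
# Distance and elapsed time between each point and the next one
dist_m <- haversine_m(toy$lat[-n], toy$lon[-n], toy$lat[-1], toy$lon[-1])
dt_s <- as.numeric(diff(toy$time), units = "secs")
speed_ms <- dist_m / dt_s
print(speed_ms)
```

On real data the points would first need to be sorted by time, and pairs with a zero time difference would need to be skipped.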
Let's now zoom in some more on my movements in and around Frankfurt, where I lived for a good while. We again define our bounding box through four cutoff latitudes and longitudes. Adjust these if you want to zoom in on another place:
frankfurt_boundary_west=8.6
frankfurt_boundary_east=8.8
frankfurt_boundary_south=50.075
frankfurt_boundary_north=50.20
Now we can do the same plot as before but zoomed in at a given place:
loc_2 <- loc[which(!is.na(loc$velocity)), ]
loc_2 <- subset(loc_2, loc_2$velocity > 0)
frankfurt <- get_map(location=c(
frankfurt_boundary_west,
frankfurt_boundary_south,
frankfurt_boundary_east,
frankfurt_boundary_north), source = "stamen",maptype='toner')
ggmap(frankfurt) + geom_point(data = loc_2, aes(x = lon, y = lat, color = velocity), alpha = 0.3,size=0.7) +
theme(legend.position = "right") +
labs(x = "Longitude", y = "Latitude",
title = "Location history data points in Frankfurt",
subtitle = "Color scale shows velocity measured for location") +
scale_colour_gradient(low = "blue", high = "red", guide = guide_legend(title = "Velocity"))
Here we can see a similar pattern. The more purplish axis from south to north is a highway, one that I'd regularly take on my way to work. The more bluish axis from west to east is one of my regular walking routes that took me from my home to downtown Frankfurt. And the blue hook to the east? One of my regular running/walking routes!
But I haven't only lived in Frankfurt. In 2017 I moved to Berkeley. Let's have a look at the map there. Again, we start with the boundaries:
berkeley_boundary_west=-122.325
berkeley_boundary_east=-122.23
berkeley_boundary_south=37.84
berkeley_boundary_north=37.89
loc_2 <- loc[which(!is.na(loc$velocity)), ]
loc_2 <- subset(loc_2, loc_2$velocity > 0)
berkeley <- get_map(location=c(berkeley_boundary_west,
berkeley_boundary_south,
berkeley_boundary_east,
berkeley_boundary_north), source = "stamen",maptype='toner')
ggmap(berkeley) + geom_point(data = loc_2, aes(x = lon, y = lat, color = velocity), alpha = 0.3,size=0.7) +
theme(legend.position = "right") +
labs(x = "Longitude", y = "Latitude",
title = "Location history data points in Berkeley",
subtitle = "Color scale shows velocity measured for location") +
scale_colour_gradient(low = "blue", high = "red", guide = guide_legend(title = "Velocity"))
This looks different: there is much less purple and it's nearly all blue. But why? Oh, right, I didn't drive at all for 90% of my time living in Berkeley, so I had to do everything on foot! And it seems that I have walked all over Berkeley by now!