Details for plot-google-location-history.ipynb

Published by gedankenstuecke

Description

Take a Google Location History archive and plot where you've been, how fast you moved and how much data was collected. Looking for contributors that'll extend this notebook to make more cool stuff!

1

Tags & Data Sources

GPS location data location movement google location history

Comments

Please log in to comment.

Notebook
Last updated 2 months, 1 week ago

Analyzing your Google Location History in R

This notebook requires you to use the Google Location History upload to get your GPS data into Open Humans.

This notebook then uses the GPS data to plot your personal movement history on different maps on different scales.

For a start let's load our required packages. To load the JSON we are using rjson, which needs to be installed with install.packages in the first step:

In [1]:
#install.packages('rjson')
library(httr)
#library("rjson")
library(jsonlite)
library(ggplot2)

With that out of the way we can access our Google Location History data from our Open Humans account:

In [2]:
access_token <- Sys.getenv("OH_ACCESS_TOKEN")
url <- paste("https://www.openhumans.org/api/direct-sharing/project/exchange-member/?access_token=",access_token,sep="")
resp <- GET(url)
user <- content(resp, "parsed")

for (data_source in user$data){
    if (data_source$source == "direct-sharing-182"){
         gps_data_url <- data_source$download_url
    }
}

temp <- tempfile()
download.file(gps_data_url,temp,method='wget')
json_data <- fromJSON(txt=temp)

Getting started with the data processing

Now that we have our fitbit data stored in json_data we can start to work with that data. Much of this notebook is adapted code from Shirin Glander's excellent blogpost.

Let's read the location data in to the loc variable and properly parse the timestamps of when we have been at a place and convert the E7 formatted coordinates to regular coordinates:

In [3]:
# extracting the locations dataframe
loc = json_data$locations

# converting time column from posix milliseconds into a readable time scale
loc$time = as.POSIXct(as.numeric(json_data$locations$timestampMs)/1000, origin = "1970-01-01")

# converting longitude and latitude from E7 to GPS coordinates
loc$lat = loc$latitudeE7 / 1e7
loc$lon = loc$longitudeE7 / 1e7

Now we can look at how our data looks like:

In [4]:
tail(loc)
timestampMslatitudeE7longitudeE7accuracyaltitudeverticalAccuracyvelocityheadingtimelatlon
1079991280558133517 526128500 81255040 17 NA NA NA NA 2010-07-31 06:35:3352.61285 8.125504
1080001280558011227 525723000 81104850 17 NA NA NA NA 2010-07-31 06:33:3152.57230 8.110485
1080011280557969000 525613248 81128391 5 NA NA NA NA 2010-07-31 06:32:4952.56132 8.112839
1080021280557967983 525573400 81134790 47 NA NA NA NA 2010-07-31 06:32:4752.55734 8.113479
1080031280557901000 525368945 81131936 500 NA NA NA NA 2010-07-31 06:31:4152.53689 8.113194
1080041280557864967 523251200 80977320 38205 NA NA NA NA 2010-07-31 06:31:0452.32512 8.097732

Analyzing the data

Now that we have collected and formatted the data we can start analyzing it!

How much data does Google have?

Let's start off to see how much data Google has about my location? After some more data processing we want to plot:

  • How many data points does Google have
    • per Day?
    • per Month?
    • per Year?
In [5]:
install.packages('ggmap')
library(lubridate)
library(zoo)

loc$date <- as.Date(loc$time, '%Y/%m/%d')
loc$year <- year(loc$date)
loc$month_year <- as.yearmon(loc$date)

points_p_day <- data.frame(table(loc$date), group = "day")
points_p_month <- data.frame(table(loc$month_year), group = "month")
points_p_year <- data.frame(table(loc$year), group = "year")
# set up plotting theme
library(ggplot2)
library(ggmap)

my_theme <- function(base_size = 12, base_family = "sans"){
  theme_grey(base_size = base_size, base_family = base_family) +
  theme(
    axis.text = element_text(size = 12),
    axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
    axis.title = element_text(size = 14),
    panel.grid.major = element_line(color = "grey"),
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill = "aliceblue"),
    strip.background = element_rect(fill = "lightgrey", color = "grey", size = 1),
    strip.text = element_text(face = "bold", size = 12, color = "navy"),
    legend.position = "right",
    legend.background = element_blank(),
    panel.margin = unit(.5, "lines"),
    panel.border = element_rect(color = "grey", fill = NA, size = 0.5)
  )
}

points <- rbind(points_p_day[, -1], points_p_month[, -1], points_p_year[, -1])

ggplot(points, aes(x = group, y = Freq)) + 
  geom_point(position = position_jitter(width = 0.2), alpha = 0.3) + 
  geom_boxplot(aes(color = group), size = 1, outlier.colour = NA) + 
  facet_grid(group ~ ., scales = "free") + my_theme() +
  theme(
    legend.position = "none",
    strip.placement = "outside",
    strip.background = element_blank(),
    strip.text = element_blank(),
    axis.text.x = element_text(angle = 0, vjust = 0.5, hjust = 0.5)
  ) +
  labs(
    x = "",
    y = "Number of data points",
    title = "How many data points did Google collect about me?",
    subtitle = "Number of data points per day, month and year"
  )
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done

Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

    date


Attaching package: ‘zoo’

The following objects are masked from ‘package:base’:

    as.Date, as.Date.numeric

Warning message:
“`panel.margin` is deprecated. Please use `panel.spacing` property instead”

There's some variation in the amount of data Google recorded for each time unit, but nevertheless it seems to come out at around 50 data points per day on average. This probably also depends on how much we travel, the more we are traveling, the more data can be collected.

How accurate is the data Google stores?

Google gives an estimation on how accurate the data is it recorded about our location. Let's plot how accurate that data is:

In [6]:
accuracy <- data.frame(accuracy = loc$accuracy, group = ifelse(loc$accuracy < 800, "high", ifelse(loc$accuracy < 5000, "middle", "low")))

accuracy$group <- factor(accuracy$group, levels = c("high", "middle", "low"))

ggplot(accuracy, aes(x = accuracy, fill = group)) + 
  geom_histogram() + 
  facet_grid(group ~ ., scales="free") + 
  my_theme() +
  theme(
    legend.position = "none",
    strip.placement = "outside",
    strip.background = element_blank(),
    axis.text.x = element_text(angle = 0, vjust = 0.5, hjust = 0.5)
  ) +
  labs(
    x = "Accuracy in metres",
    y = "Count",
    title = "How accurate is the location data?",
    subtitle = "Histogram of accuracy of location points",
    caption = "\nMost data points are pretty accurate, 
but there are still many data points with a high inaccuracy.
    These were probably from areas with bad satellite reception."
  )
Warning message:
“`panel.margin` is deprecated. Please use `panel.spacing` property instead”`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning message:
“Removed 482 rows containing non-finite values (stat_bin).”

Overall the accuracy of the data is pretty good and the vast majority of the data points is in the smallest error category. Then there are some outliers of data being less accurate, but that's only a small amount overall.

Let's make some maps

Travel around the world

We will start by looking at a world map and plot each of the data points that Google has. This gives us a good indication of where we have been and where we might want to zoom in more:

In [7]:
options(warn = -1)
world <- get_map(location=c(-179,-60,179,67), source = "stamen",maptype='toner')
ggmap(world) + 
    geom_point(data = loc, aes(x=lon, y=lat), size=0.8,alpha=0.7,color='red') + 
    #stat_density_2d(geom = "point", data = loc, aes(x=lon, y=lat, size = stat(density)), n = 20, contour = FALSE)
    #geom_density_2d(bins = 300, data = loc, aes(x = lon, y = lat), alpha = 0.5, color='red') + 
  #stat_summary_2d(geom = "tile", bins = 300, data = loc, aes(x = lon, y = lat, z = accuracy), alpha = 0.5) + 
  #scale_fill_gradient(low = "blue", high = "red", guide = guide_legend(title = "Accuracy")) +
 labs(
    x = "Longitude", 
    y = "Latitude", 
    title = "Location history data points around the world")
Map from URL : http://tile.stamen.com/toner/2/0/0.png
Map from URL : http://tile.stamen.com/toner/2/1/0.png
Map from URL : http://tile.stamen.com/toner/2/2/0.png
Map from URL : http://tile.stamen.com/toner/2/3/0.png
Map from URL : http://tile.stamen.com/toner/2/0/1.png
Map from URL : http://tile.stamen.com/toner/2/1/1.png
Map from URL : http://tile.stamen.com/toner/2/2/1.png
Map from URL : http://tile.stamen.com/toner/2/3/1.png
Map from URL : http://tile.stamen.com/toner/2/0/2.png
Map from URL : http://tile.stamen.com/toner/2/1/2.png
Map from URL : http://tile.stamen.com/toner/2/2/2.png
Map from URL : http://tile.stamen.com/toner/2/3/2.png

Zooming in on a place

In my case Central Europe is more or less one huge red blob. Which makes sense given that I have been commuting between Frankfurt, Germany and Zurich, Switzerland for quite some time. So let's zoom in on this part of the world.

To do so we have to define some boundary boxes through latitudes & longitudes. This can be done by the variables below. If you want to zoom in onto another part of the World: Adjust these four variables to map down to the location you are interested in. (Googling: coordinates PLACE_OF_INTEREST is really useful for this)

In [8]:
europe_boundary_west=4
europe_boundary_east=12
europe_boundary_south=45
europe_boundary_north=53.5

With this out of the way we can start the plotting with the code below. And to make it a bit more interesting we are not just plotting the points, but also the velocity we had at the time the data was recorded. That way we can see where we moved fast & slow.

In [9]:
loc_2 <- loc[which(!is.na(loc$velocity)), ]
loc_2 <- subset(loc_2, loc_2$velocity > 0)

europe <- get_map(
        location=c(europe_boundary_west,
                   europe_boundary_south,
                   europe_boundary_east,
                   europe_boundary_north), 
        source = "stamen",
        maptype='toner')
ggmap(europe) + geom_point(data = loc_2, aes(x = lon, y = lat, color = velocity), alpha = 0.1,size=0.5) + 
  theme(legend.position = "right") + 
  labs(x = "Longitude", y = "Latitude", 
       title = "Location history data points in Europe",
       subtitle = "Color scale shows velocity measured for location") +
  scale_colour_gradient(low = "blue", high = "red", guide = guide_legend(title = "Velocity"))
Map from URL : http://tile.stamen.com/toner/7/65/41.png
Map from URL : http://tile.stamen.com/toner/7/66/41.png
Map from URL : http://tile.stamen.com/toner/7/67/41.png
Map from URL : http://tile.stamen.com/toner/7/68/41.png
Map from URL : http://tile.stamen.com/toner/7/65/42.png
Map from URL : http://tile.stamen.com/toner/7/66/42.png
Map from URL : http://tile.stamen.com/toner/7/67/42.png
Map from URL : http://tile.stamen.com/toner/7/68/42.png
Map from URL : http://tile.stamen.com/toner/7/65/43.png
Map from URL : http://tile.stamen.com/toner/7/66/43.png
Map from URL : http://tile.stamen.com/toner/7/67/43.png
Map from URL : http://tile.stamen.com/toner/7/68/43.png
Map from URL : http://tile.stamen.com/toner/7/65/44.png
Map from URL : http://tile.stamen.com/toner/7/66/44.png
Map from URL : http://tile.stamen.com/toner/7/67/44.png
Map from URL : http://tile.stamen.com/toner/7/68/44.png
Map from URL : http://tile.stamen.com/toner/7/65/45.png
Map from URL : http://tile.stamen.com/toner/7/66/45.png
Map from URL : http://tile.stamen.com/toner/7/67/45.png
Map from URL : http://tile.stamen.com/toner/7/68/45.png
Map from URL : http://tile.stamen.com/toner/7/65/46.png
Map from URL : http://tile.stamen.com/toner/7/66/46.png
Map from URL : http://tile.stamen.com/toner/7/67/46.png
Map from URL : http://tile.stamen.com/toner/7/68/46.png

The blue spots (e.g. around Frankfurt in the center of the map and around Zurich in the South) show where I moved rather slowly, i.e. on foot. Between these we can see the more purpleish connections, which are me driving on the highways.

Enhance!

Let's now zoom in some more on my movements in and around Frankfurt, where I lived for a good while. We again define our boundary box through the four cutoff-latitude/longitudes. Adjust these again if you want to zoom in on another place:

In [10]:
frankfurt_boundary_west=8.6
frankfurt_boundary_east=8.8
frankfurt_boundary_south=50.075
frankfurt_boundary_north=50.20

Now we can do the same plot as before but zoomed in at a given place:

In [11]:
loc_2 <- loc[which(!is.na(loc$velocity)), ]
loc_2 <- subset(loc_2, loc_2$velocity > 0)
frankfurt <- get_map(location=c(
    frankfurt_boundary_west,
    frankfurt_boundary_south,
    frankfurt_boundary_east,
    frankfurt_boundary_north), source = "stamen",maptype='toner')
ggmap(frankfurt) + geom_point(data = loc_2, aes(x = lon, y = lat, color = velocity), alpha = 0.3,size=0.7) + 
  theme(legend.position = "right") + 
  labs(x = "Longitude", y = "Latitude", 
       title = "Location history data points in Frankfurt",
       subtitle = "Color scale shows velocity measured for location") +
  scale_colour_gradient(low = "blue", high = "red", guide = guide_legend(title = "Velocity"))
Map from URL : http://tile.stamen.com/toner/12/2145/1385.png
Map from URL : http://tile.stamen.com/toner/12/2146/1385.png
Map from URL : http://tile.stamen.com/toner/12/2147/1385.png
Map from URL : http://tile.stamen.com/toner/12/2148/1385.png
Map from URL : http://tile.stamen.com/toner/12/2145/1386.png
Map from URL : http://tile.stamen.com/toner/12/2146/1386.png
Map from URL : http://tile.stamen.com/toner/12/2147/1386.png
Map from URL : http://tile.stamen.com/toner/12/2148/1386.png
Map from URL : http://tile.stamen.com/toner/12/2145/1387.png
Map from URL : http://tile.stamen.com/toner/12/2146/1387.png
Map from URL : http://tile.stamen.com/toner/12/2147/1387.png
Map from URL : http://tile.stamen.com/toner/12/2148/1387.png