spotify-find-bpm.ipynb
Find songs within a given range of BPM!
This notebook requires data from the Spotify integration in your Open Humans account.
With the notebook we want to look into
To get started, we import the libraries we need and then access your Spotify data:
from ohapi import api
import os
import requests
import json
import pandas as pd
import datetime
member = api.exchange_oauth2_member(os.environ.get('OH_ACCESS_TOKEN'))
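# Pick out the two Spotify files (listening archive and track metadata)
# among all files in the account; both come from source 'direct-sharing-176'.
for f in member['data']:
    if f['source'] == 'direct-sharing-176' and f['basename'] == 'spotify-listening-archive.json':
        sp_songs = requests.get(f['download_url'])
    if f['source'] == 'direct-sharing-176' and f['basename'] == 'spotify-track-metadata.json':
        sp_meta = requests.get(f['download_url'])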
sp_data = json.loads(sp_songs.content)
sp_metadata = json.loads(sp_meta.content)
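If one of the two files is missing from the account, the loop above leaves `sp_songs` or `sp_meta` undefined and the next line fails with a `NameError`. A minimal sketch of a lookup that fails with a clearer message is shown below. The helper name `find_download_url` and the example URLs are hypothetical, and the list of dicts only mimics the keys that the loop above reads from `member['data']`.

```python
def find_download_url(files, source, basename):
    """Return the download URL of the first file matching source and basename.

    `files` is assumed to look like member['data']: a list of dicts with
    'source', 'basename' and 'download_url' keys.
    """
    for f in files:
        if f.get('source') == source and f.get('basename') == basename:
            return f['download_url']
    raise LookupError('no file named {!r} from {!r}'.format(basename, source))


# Hypothetical stand-in for member['data'], so the sketch runs offline.
example_files = [
    {'source': 'direct-sharing-176',
     'basename': 'spotify-listening-archive.json',
     'download_url': 'https://example.org/archive'},
    {'source': 'direct-sharing-176',
     'basename': 'spotify-track-metadata.json',
     'download_url': 'https://example.org/metadata'},
]

archive_url = find_download_url(
    example_files, 'direct-sharing-176', 'spotify-listening-archive.json')
print(archive_url)
```

In the notebook itself you would pass `member['data']` instead of `example_files` and hand the returned URL to `requests.get`.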
Now that we have all of your data, we want to transform the rather complex Spotify JSON format into something that is easier to read: a simple table, also called a dataframe. The lines below do this:
track_title = []
artist_name = []
album_name = []
played_at = []
popularity = []
duration_ms = []
explicit = []
track_id = []
tempo = []
# Collect one value per play. The tempo is not part of the listening archive;
# it is looked up by track id in the separate metadata file
# (this raises a KeyError if a track has no metadata entry).
for sp in sp_data:
    track_title.append(sp['track']['name'])
    artist_name.append(sp['track']['artists'][0]['name'])
    album_name.append(sp['track']['album']['name'])
    played_at.append(sp['played_at'])
    popularity.append(sp['track']['popularity'])
    duration_ms.append(sp['track']['duration_ms'])
    explicit.append(sp['track']['explicit'])
    track_id.append(sp['track']['id'])
    tempo.append(sp_metadata[sp['track']['id']]['tempo'])
def parse_timestamp(lst):
    timestamps = []
    for item in lst:
        try:
            timestamp = datetime.datetime.strptime(
                item,
                '%Y-%m-%dT%H:%M:%S.%fZ')
        except ValueError:
            timestamp = datetime.datetime.strptime(
                item,
                '%Y-%m-%dT%H:%M:%SZ')
        timestamps.append(timestamp)
    return timestamps
played_at = parse_timestamp(played_at)
dataframe = pd.DataFrame(data={
    'track_id': track_id,
    'track': track_title,
    'artist': artist_name,
    'album': album_name,
    'popularity': popularity,
    'duration_ms': duration_ms,
    'explicit': explicit,
    'played_at': played_at,
    'tempo': tempo
})
dataframe = dataframe.set_index(dataframe['played_at'])
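The flattening step can also be written as one function that returns a list of flat records, which is a shape `pd.DataFrame` accepts directly. The sketch below is an alternative under stated assumptions, not the code used in this notebook: the names `parse_played_at` and `flatten_plays` are hypothetical, the example records are made up to mimic the nested structure read above, and only a subset of the columns is kept. Unlike the loop above, it stores `None` as the tempo when a track has no metadata entry instead of raising a `KeyError`.

```python
import datetime


def parse_played_at(value):
    """Parse a 'played_at' string with or without fractional seconds."""
    for fmt in ('%Y-%m-%dT%H:%M:%S.%fZ', '%Y-%m-%dT%H:%M:%SZ'):
        try:
            return datetime.datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError('unrecognised timestamp: {!r}'.format(value))


def flatten_plays(plays, metadata):
    """Turn nested play records into flat dicts, one per play.

    Tracks without a metadata entry get tempo=None instead of a KeyError.
    """
    rows = []
    for play in plays:
        track = play['track']
        rows.append({
            'track_id': track['id'],
            'track': track['name'],
            'artist': track['artists'][0]['name'],
            'played_at': parse_played_at(play['played_at']),
            'tempo': metadata.get(track['id'], {}).get('tempo'),
        })
    return rows


# Made-up records that mimic the structure used above (assumption).
example_plays = [
    {'played_at': '2019-01-01T10:00:00.123Z',
     'track': {'id': 'a1', 'name': 'Song A', 'artists': [{'name': 'Artist A'}]}},
    {'played_at': '2019-01-01T10:05:00Z',
     'track': {'id': 'b2', 'name': 'Song B', 'artists': [{'name': 'Artist B'}]}},
]
example_metadata = {'a1': {'tempo': 159.5}}

rows = flatten_plays(example_plays, example_metadata)
for row in rows:
    print(row['track'], row['played_at'].isoformat(), row['tempo'])
```

With pandas available, `pd.DataFrame(rows)` would turn these records into a table with one column per key.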
Let's have a look at how the tempo of the music we listen to is distributed:
%load_ext rpy2.ipython
%%R -i dataframe -w 4 -h 2 --units in -r 200
library(ggplot2)
ggplot(dataframe, aes(tempo)) +
    geom_histogram(binwidth=5) +
    scale_x_continuous('tempo') +
    theme_minimal() +
    geom_vline(xintercept=mean(dataframe$tempo), color='red') +
    ggtitle('red bar is average')
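If R and rpy2 are not available, the numbers behind the histogram can be approximated in plain Python. The sketch below counts tempos in 5 bpm bins and prints a rough text histogram together with the mean. The function name `tempo_histogram` and the example tempos are made up for illustration, and the bin edges here start at multiples of the bin width, which may not match where ggplot2 places its bins.

```python
from collections import Counter
from statistics import mean


def tempo_histogram(tempos, binwidth=5):
    """Count tempos per bin; each key is the lower edge of a bin."""
    counts = Counter(int(t // binwidth) * binwidth for t in tempos)
    return dict(sorted(counts.items()))


# Made-up tempos in bpm, standing in for dataframe['tempo'] (assumption).
example_tempos = [92.0, 118.4, 121.9, 124.0, 159.5, 160.2, 161.8]
binwidth = 5

hist = tempo_histogram(example_tempos, binwidth)
for lower, count in hist.items():
    # One '#' per song in the bin [lower, lower + binwidth).
    print('{:>3}-{:<3} {}'.format(lower, lower + binwidth, '#' * count))
print('mean tempo: {:.1f}'.format(mean(example_tempos)))
```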
Now we can start filtering the music down to our own speed requirements. Change the bpm_range value below to the tempo you're looking for. Despite its name, bpm_range holds a single target tempo: the default value is 160 bpm, and we will extract all songs within ±2 bpm of this target:
bpm_range = 160
Now we can filter the dataframe and look at the list of matching songs:
print('The following songs fall within the range in terms of bpm.\n')
print('The table gives track name, artist and BPM!\n')
# Keep plays whose tempo lies within ±2 bpm of the target
k1 = dataframe[(dataframe.tempo >= bpm_range-2) & (dataframe.tempo <= bpm_range+2)]
k1 = k1[['artist','track', 'tempo']]
# A song played several times should only be listed once
k1 = k1.drop_duplicates()
for index,row in k1.iterrows():
    print("{}\t{}\t{}".format(row['artist'], row['track'], row['tempo']))
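To try several target tempos or a wider window, the same filter can be wrapped in a small function. The sketch below works on plain dicts so that it runs without any Spotify data. The function name `songs_near_tempo`, its `tolerance` parameter and the example rows are hypothetical; the comparison and the removal of repeated songs mirror the dataframe code above.

```python
def songs_near_tempo(rows, target_bpm, tolerance=2):
    """Return unique (artist, track, tempo) tuples within target_bpm ± tolerance."""
    seen = set()
    matches = []
    for row in rows:
        tempo = row['tempo']
        if tempo is None:
            continue  # no tempo known for this play
        if target_bpm - tolerance <= tempo <= target_bpm + tolerance:
            key = (row['artist'], row['track'], tempo)
            if key not in seen:  # list each song once, even if replayed
                seen.add(key)
                matches.append(key)
    return matches


# Made-up plays standing in for the rows of the dataframe (assumption).
example_rows = [
    {'artist': 'Artist A', 'track': 'Song A', 'tempo': 159.5},
    {'artist': 'Artist A', 'track': 'Song A', 'tempo': 159.5},  # replay
    {'artist': 'Artist B', 'track': 'Song B', 'tempo': 162.0},
    {'artist': 'Artist C', 'track': 'Song C', 'tempo': 120.0},
    {'artist': 'Artist D', 'track': 'Song D', 'tempo': None},
]

matches = songs_near_tempo(example_rows, target_bpm=160)
for artist, track, tempo in matches:
    print('{}\t{}\t{}'.format(artist, track, tempo))
```

Calling `songs_near_tempo(example_rows, 120, tolerance=5)` would return only Song C, which shows how the window can be moved and widened.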