Details for Eye color and 23andMe data.ipynb

Published by madprime

Description

This notebook compares your 23andMe data to public data from openSNP to explore how well eye color can be predicted from genetic data. It compares your genetic data at three positions to ask: "Do you have the same eye color as people with a similar genotype?"


Tags & Data Sources

trait prediction genotyping eye color 23andMe Upload

Last updated 6 months, 1 week ago

Eye color and 23andMe data

This notebook explores the connection between genetics and eye color in individuals with European ancestry. It compares your genetic data to public data from openSNP to ask this question:

Do people with genotypes like yours have the same eye color as you?

Data you need

This notebook was designed to work with data from 23andMe. If you have 23andMe data, you can add it to Open Humans using this tool: https://www.openhumans.org/activity/23andme-upload/

How it works

This notebook compares your data to that of people participating in openSNP, who have publicly shared their genetic data along with survey responses, including one survey about eye color.

(Do you have an openSNP account? You can connect it to Open Humans!)

The notebook uses three genetic locations known to be associated with eye color: rs12913832, rs16891982, and rs12203592.

Hit "Run" to start!

Hit the "Run" button above to run each step in the code below. (Or select "Run All" from the "Cell" menu above to run everything at once.) First, we'll get your genetic data stored in Open Humans.

In [1]:
import os
import tempfile

import requests

print("Checking for 23andMe data in Open Humans...\n")

# Ask Open Humans for the list of files you have shared with this notebook.
response = requests.get(
    "https://www.openhumans.org/api/direct-sharing/project/exchange-member/",
    params={'access_token': os.environ.get('OH_ACCESS_TOKEN')})
response.raise_for_status()

# Look for a 23andMe Upload file (source "direct-sharing-128"), skipping the VCF version.
file_url_23andme = None
for entry in response.json()['data']:
    if (entry['source'] == "direct-sharing-128" and
            'vcf' not in entry['metadata'].get('tags', [])):
        file_url_23andme = entry['download_url']
        break

if file_url_23andme is None:
    print("Sorry, you first need to add 23andMe data to Open Humans!\n"
          "You can do that here: https://www.openhumans.org/activity/23andme-upload/")
else:
    print("Great, you have 23andMe data in Open Humans! We'll retrieve this...\n")
    # Download the raw data into a temporary file that the next steps read from.
    file_23andme = tempfile.NamedTemporaryFile()
    file_23andme.write(requests.get(file_url_23andme).content)
    file_23andme.flush()
    print("Done!")
Checking for 23andMe data in Open Humans...

Great, you have 23andMe data in Open Humans! We'll retrieve this...

Done!

Step 2: Find your data at three genetic locations.

Each line of 23andMe data represents your genetic information at a particular location, called a single nucleotide polymorphism (SNP).
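For reference, 23andMe raw data files are tab-separated text. Lines starting with `#` are comments or the header, and each data line has four columns: rsID, chromosome, position, and genotype. Below is a minimal sketch of parsing one line. It uses a hypothetical helper, `parse_23andme_line`, that is not part of the notebook, and made-up example lines. The position value is a placeholder, not a real coordinate.

```python
def parse_23andme_line(line):
    """Return (rsid, genotype) for a data line, or None for comment/blank lines.

    Hypothetical helper: assumes the four tab-separated 23andMe columns
    rsid, chromosome, position, genotype.
    """
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    rsid, _chromosome, _position, genotype = line.split('\t')
    return rsid, genotype


# Made-up example lines; 12345 is a placeholder position.
print(parse_23andme_line("# rsid\tchromosome\tposition\tgenotype"))
print(parse_23andme_line("rs12913832\t15\t12345\tAG"))
```

The genotype is the fourth column, which is why the notebook's code reads `line_data[3]`.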

This notebook's eye color prediction method uses data from three locations: rs12913832, rs16891982, and rs12203592. These three SNPs have been reported to be associated with eye color and are used by various eye color prediction algorithms in the literature.

Keep hitting "Run" to continue running the notebook. The code below will scan your data and get your genetic information at these locations.

In [2]:
# The three SNPs used in this notebook; None means "not found in your file".
snps = {
    'rs12913832': None,
    'rs16891982': None,
    'rs12203592': None
}

# 23andMe data lines are tab-separated: rsid, chromosome, position, genotype.
file_23andme.seek(0)
for line in file_23andme:
    line = line.decode('utf-8').strip()
    if line.startswith('#'):  # skip header/comment lines
        continue
    line_data = line.split('\t')
    if line_data[0] in snps:
        snps[line_data[0]] = line_data[3]

for snp, genotype in snps.items():
    print('{}:\t{}'.format(snp, genotype or 'Unknown'))

# Keep this SNP order (rs12203592, rs12913832, rs16891982); the openSNP comparison uses the same one.
your_genotype = (snps['rs12203592'], snps['rs12913832'], snps['rs16891982'])
rs12913832:	AG
rs16891982:	GG
rs12203592:	CC
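The comparison later in this notebook matches genotype strings exactly. The two alleles in a genotype are unordered, so if some files list a genotype as "GA" while yours says "AG", those people would not be counted as matches. The notebook does not adjust for this. Below is a minimal sketch of one way to do it, using a hypothetical `normalize_genotype` helper that sorts the alleles. It does not handle strand differences (for example, a file reporting the complementary "CT" instead of "AG"), which would need more care.

```python
def normalize_genotype(genotype):
    """Sort the alleles so that, e.g., 'GA' and 'AG' compare equal.

    Hypothetical helper: treats None or '--' (a 23andMe no-call) as missing.
    Does not correct for strand differences.
    """
    if not genotype or genotype == '--':
        return None
    return ''.join(sorted(genotype.upper()))


# Example genotype tuple in the notebook's SNP order (rs12203592, rs12913832, rs16891982).
your_genotype_normalized = tuple(normalize_genotype(g) for g in ('CC', 'GA', 'GG'))
print(your_genotype_normalized)  # ('CC', 'AG', 'GG')
```

To use this, you would apply the same normalization to both your genotype and the openSNP genotypes before grouping them.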

Get data from openSNP to compare against

This step loads files that were generated using the openSNP API: https://github.com/openSNP/snpr/wiki/JSON-API

(Copies are stored in Google Drive so that openSNP's servers don't get queried each time someone runs this notebook.)

In [3]:
opensnp_eyecolors = requests.get('https://drive.google.com/uc?export=view&id=1KYeLz0hoSnyv2jHYHiKqLhv2akiIb5xr').json()
opensnp_rs12203592 = requests.get('https://drive.google.com/uc?export=view&id=1opmYjbG_0nVSzw3l0iuFLRVUmZ8LvC80').json()
opensnp_rs12913832 = requests.get('https://drive.google.com/uc?export=view&id=15f9lFEmRsHEFvZskPAzy_l7V3YBDyjeg').json()
opensnp_rs16891982 = requests.get('https://drive.google.com/uc?export=view&id=1yPC4d4hWljODlHWDS9b1M_NTodbslJRl').json()
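The Google Drive copies spare openSNP's servers, but the cell above still downloads every file on each run. If you rerun the notebook often, you could keep a local copy as well. Below is a minimal sketch, assuming you can write to the current directory. It uses a hypothetical `fetch_json_cached` helper that relies on the standard library's `urllib.request` instead of `requests`. The cache file names are arbitrary choices.

```python
import json
import os
import urllib.request


def fetch_json_cached(url, cache_path):
    """Return parsed JSON from url, downloading only if cache_path doesn't exist yet.

    Hypothetical helper; cache_path is any local file path you choose.
    """
    if not os.path.exists(cache_path):
        with urllib.request.urlopen(url) as response:
            data = response.read()
        with open(cache_path, 'wb') as cache_file:
            cache_file.write(data)
    # json.load accepts a binary file and decodes UTF-8 JSON.
    with open(cache_path, 'rb') as cache_file:
        return json.load(cache_file)


# Example usage (downloads once, then reuses the local file):
# opensnp_eyecolors = fetch_json_cached(
#     'https://drive.google.com/uc?export=view&id=1KYeLz0hoSnyv2jHYHiKqLhv2akiIb5xr',
#     'opensnp_eyecolors.json')
```

Delete the cached files if you want to pick up newer copies.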

Next, we need to process this data so we can compare your genotype to it.

The code below groups this data to produce a list of reported eye colors for each genotype combination.

In [4]:
from collections import defaultdict

# Eye color reported by each openSNP user, keyed by user ID.
eyecolor_by_uid = {item['user_id']: item['variation'].lower() for item in opensnp_eyecolors['users']}


def genotype_by_uid(snp_data):
    """Map user ID -> genotype for users who have a genotype at this SNP."""
    return {item['user']['id']: item['user']['genotypes'][0]['local_genotype']
            for item in snp_data if item['user']['genotypes']}


rs12203592_by_uid = genotype_by_uid(opensnp_rs12203592)
rs12913832_by_uid = genotype_by_uid(opensnp_rs12913832)
rs16891982_by_uid = genotype_by_uid(opensnp_rs16891982)

# Only users with eye color data AND genotypes at all three SNPs can be compared.
joint_uids = [uid for uid in eyecolor_by_uid if uid in rs12203592_by_uid and
              uid in rs12913832_by_uid and uid in rs16891982_by_uid]

# Group reported eye colors by genotype (same SNP order as your_genotype).
genotypes_to_color = defaultdict(list)
for uid in joint_uids:
    genotype = (rs12203592_by_uid[uid], rs12913832_by_uid[uid], rs16891982_by_uid[uid])
    genotypes_to_color[genotype].append(eyecolor_by_uid[uid])
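To see what this grouping produces, here is the same idea run on a few made-up records. The field names (`user_id`, `variation`, `user`, `genotypes`, `local_genotype`) are the ones the cell above reads. The user IDs, genotypes, and eye colors are invented. Only two SNPs are used, for brevity. `group_colors_by_genotype` and `snp_record` are hypothetical helpers, not part of the notebook.

```python
from collections import defaultdict


def group_colors_by_genotype(eyecolor_records, snp_records_list):
    """Map each genotype tuple to the list of eye colors reported for it.

    eyecolor_records: list of {'user_id': ..., 'variation': ...} dicts.
    snp_records_list: one list of records per SNP, in genotype-tuple order.
    """
    eyecolor_by_uid = {r['user_id']: r['variation'].lower() for r in eyecolor_records}
    genotype_maps = [
        {r['user']['id']: r['user']['genotypes'][0]['local_genotype']
         for r in records if r['user']['genotypes']}
        for records in snp_records_list
    ]
    genotypes_to_color = defaultdict(list)
    for uid, color in eyecolor_by_uid.items():
        # Skip users missing a genotype at any SNP.
        if all(uid in gmap for gmap in genotype_maps):
            genotype = tuple(gmap[uid] for gmap in genotype_maps)
            genotypes_to_color[genotype].append(color)
    return dict(genotypes_to_color)


def snp_record(uid, genotype):
    # Builds a record shaped like the openSNP genotype data read above.
    return {'user': {'id': uid,
                     'genotypes': [{'local_genotype': genotype}] if genotype else []}}


eyecolors = [{'user_id': 1, 'variation': 'Brown'},
             {'user_id': 2, 'variation': 'Blue'},
             {'user_id': 3, 'variation': 'brown'},
             {'user_id': 4, 'variation': 'Green'}]
snp_a = [snp_record(1, 'CC'), snp_record(2, 'CC'), snp_record(3, 'CC'), snp_record(4, None)]
snp_b = [snp_record(1, 'AG'), snp_record(2, 'GG'), snp_record(3, 'AG'), snp_record(4, 'AG')]

print(group_colors_by_genotype(eyecolors, [snp_a, snp_b]))
# {('CC', 'AG'): ['brown', 'brown'], ('CC', 'GG'): ['blue']}
```

User 4 is dropped because it has no genotype at the first SNP, just as the notebook drops users missing any of the three SNPs.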
In [5]:
from collections import Counter

# Tally the eye colors reported by openSNP users who share your genotype.
matching_colors = genotypes_to_color.get(your_genotype, [])
color_counts = Counter(matching_colors).most_common()  # most common first
color_count_sum = len(matching_colors)

if color_count_sum == 0:
    print("No one in the openSNP data shares your genotype {}.".format(your_genotype))
else:
    print("\nOut of {} people sharing this genotype in openSNP data, they report...\n".format(color_count_sum))
    for color, count in color_counts:
        print('{0:.0f}%\t{1}'.format(100 * count / color_count_sum, color))
Out of 131 people sharing this genotype in openSNP data, they report...

46%	brown
16%	hazel
13%	brown-green
5%	dark brown
3%	green
2%	green-brown
2%	olive-brown ringing burnt umber-brown
2%	blue-green
2%	indeterminate brown-green with a subtle grey caste
2%	hazel/light brown
2%	hazel (brown/green)
1%	blue-grey
1%	blue-green 
1%	blue-grey; broken amber collarette
1%	brown-(green when external temperature rises)
1%	light-mixed green
1%	rs12913832 ag (they
1%	green-gray
1%	brown - brown and green in bright sunlight
1%	gray-blue
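The eye color labels are free-text survey answers, so small differences can split one answer into several rows. For example, "blue-green" appears twice above, apparently because one answer has a trailing space. The notebook lowercases labels but does not trim whitespace. Below is a minimal sketch of tallying after normalizing case and whitespace. It uses a hypothetical `tally_colors` helper and made-up labels. Merging related labels (such as "brown" and "dark brown") would be a further, more subjective step.

```python
from collections import Counter


def tally_colors(colors):
    """Return (label, count, percent) rows, most common first.

    Labels are lowercased, trimmed, and internal whitespace is collapsed,
    so 'Blue-green ' and 'blue-green' are counted together.
    """
    normalized = [' '.join(color.lower().split()) for color in colors]
    counts = Counter(normalized)
    total = len(normalized)
    return [(label, count, 100 * count / total) for label, count in counts.most_common()]


example = ['brown', 'Brown', 'blue-green', 'blue-green ', 'hazel']
for label, count, percent in tally_colors(example):
    print('{0:.0f}%\t{1}'.format(percent, label))
```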

Did it work for you?

It worked for me - I do have brown eyes, and 46% of people matching my genotype report the same. But genetics is complicated! We say "blue eyes" are recessive, but it turns out eye color isn't due to a single gene. :)

According to Wikipedia, researchers have found 10 genes associated with eye color - and these only explain 50% of eye color variation!