Big Data May Provide Answers to Big Problems

Computers are amazing. The ability to store and access data has been tremendously improved by computers and the internet. Most of the journal articles I read come from online databases. I’m using a computer to write this and then put it on the internet for you to read. As much as I enjoy computers being fast and small enough to let me talk to you in this way, I think what biologists and geneticists are able to do now is much cooler.

Advances in DNA sequencing have allowed scientists to gather and investigate genes as they never could before. Now we have DNA sequences of humans, mice, bacteria, plants, fungi and all sorts of other things. My DNA is about 99.9% the same as yours, but what do the differences in the other 0.1% (about 3 million bases of DNA) mean? Managing these giant data sets becomes a serious task. In the image below is a short sequence of DNA repeated 14 times. There are three lines that have differences in them. Even with this very small example, it would take a lot of time to find the changes by eye. Instead of the 75 bases in each line, imagine that you needed to find the few differences hiding in 14 lines, each 3 billion letters long. If you can sort through all the data to answer a question, there are serious rewards.
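To give a feel for why computers make this kind of search easy, here is a minimal Python sketch. It assumes every line has the same length and that the most common base at each position is the "expected" one. The sequences are made up for illustration, and the function names `consensus` and `find_differences` are hypothetical. Real genome comparisons rely on specialized alignment tools rather than a simple letter-by-letter check like this.

```python
from collections import Counter


def consensus(seqs):
    """Return the most common base at each position (assumes equal-length lines)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def find_differences(seqs):
    """List (line number, position, expected base, found base), counting from 1."""
    reference = consensus(seqs)
    differences = []
    for line_no, seq in enumerate(seqs, start=1):
        for pos, (expected, found) in enumerate(zip(reference, seq), start=1):
            if expected != found:
                differences.append((line_no, pos, expected, found))
    return differences


# Made-up example: five short lines, two of which hide a single change.
seqs = ["ACGTACGTAC"] * 5
seqs[1] = "ACGTACCTAC"  # line 2: position 7 changed from G to C
seqs[3] = "TCGTACGTAC"  # line 4: position 1 changed from A to T

for line_no, pos, expected, found in find_differences(seqs):
    print(f"line {line_no}, position {pos}: expected {expected}, found {found}")
```

The same loop that checks ten letters here would check billions just as faithfully, although at that scale memory and running time become the real problem.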



23andMe is a company that collects DNA from customers, finds the order of bases in the DNA, and then provides a report back to the customer. This report includes genetic markers for diseases like cystic fibrosis or conditions like lactose intolerance that a person may have, as well as an ancestry report (they can even detect how much Neanderthal DNA you have). If you have interesting markers, you can volunteer to participate in research by donating your DNA sequence, allowing researchers to find out more about the genetic causes of certain diseases and conditions. 23andMe published a paper claiming that certain genes are involved in making people morning people or night owls.

In the paper they found that people with the same genes answered questions the same way, but that does not mean that those genes cause that behavior. Big datasets are likely to turn up such patterns, but lots of work remains in studying these genes in depth to determine what they actually do. Studies like the ones published by 23andMe provide other scientists with a list of genes that may be responsible for the disease they are studying. Now that 23andMe has access to these lists of genes, they are partnering with drug companies to create new treatments for common and rare diseases. They also recently began their own drug discovery group, called 23andMe Therapeutics, to make use of the massive dataset that they are creating. For example, 23andMe has collected data from people who have been diagnosed with inflammatory bowel disease and is using the data in partnership with the drug company Pfizer to find new genes that might be involved in the disease. These new genes then become new possibilities for drug targets. This same approach is being taken to find new treatments for lupus and Parkinson’s disease.

The ability to collect and analyze huge amounts of data has been great for progress toward curing disease and learning more about the world we live in. Big Data promises that with enough sampling all the hidden patterns of the world will become clearer, but it is important that we do not get ahead of ourselves and forget to prove that these associations are more than coincidence.

Bryan Visser
Bryan is a second-year graduate student studying DNA replication. He plans on making a career in science advocacy, working at a museum or in Washington, DC. In his free time, Bryan enjoys board games and ballroom dancing.

