Big Data and Health: A Primer


By Makda Zewde, NPHR Editor

You’ve probably noticed some hype surrounding the term “big data” in recent years. The concept is relatively new in human history: an often-quoted 2013 ScienceDaily article states that as much as 90% of the world’s data had been generated over the preceding two years [1]. Part of this novelty is thanks to new data sources such as apps, sensors, social media, smartphones, and other smart devices. Increasingly, our activities and interactions are being recorded and stored as data points. Big data analytics uses sophisticated tools to glean new insights from our recorded lives, and these insights are helping companies, hospitals, and government agencies make better decisions.

Big data analytics is the process of examining large and varied data sets — i.e., big data — to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make more-informed business decisions [2].

Target & Targeted Ads

Though big data analytics is fairly new, we’ve been experiencing its impact on advertising for quite a while now. You might have heard the 2012 story about Target’s accurate prediction of a teenage girl’s pregnancy. Target’s statistical team analyzed specific purchasing patterns, such as how often women purchased unscented lotions or vitamin supplements, to assign each customer a “pregnancy prediction” score. The practice made news when the company sent a teenage girl coupons for baby items before her own father knew she was pregnant [3]. In marketing, big data analytics is used to understand customers’ behaviors and preferences (or pregnancy status, in this case) so that ads can be aimed at specific target demographics.
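To make the idea of a purchase-based score more concrete, here is a minimal sketch in Python. It is only an illustration: Target's actual model is not described in this article, so the point values, customer names, and the function `prediction_score` are all invented assumptions.

```python
# Hypothetical point values, invented for illustration only.
# They are NOT taken from Target's actual model.
HYPOTHETICAL_POINTS = {
    "unscented lotion": 3,
    "vitamin supplements": 2,
}


def prediction_score(purchases):
    """Add up points for every flagged purchase, so repeat purchases count more."""
    return sum(HYPOTHETICAL_POINTS.get(item, 0) for item in purchases)


# Made-up purchase histories for two made-up customers.
customers = {
    "customer_a": ["unscented lotion", "unscented lotion", "vitamin supplements", "bread"],
    "customer_b": ["bread", "milk"],
}

scores = {name: prediction_score(items) for name, items in customers.items()}
print(scores)
```

A real system would presumably learn such weights from historical data rather than set them by hand, but the basic idea is the same: each customer's purchases are condensed into a single number that can be ranked or compared against a cutoff.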

Personalized Medicine

In medicine, big data analytics is solving different kinds of problems. Suppose you are diagnosed with cancer, and your doctor is evaluating treatment options. It is likely that the decisions your doctor makes are based not only on scientific and medical knowledge, but also on another source of big data: your genome. Precision medicine, in a similar vein to targeted marketing, allows doctors to analyze mutation patterns in your DNA and then select the treatment options that are most likely to benefit you. A whole subfield of pharmacology, called pharmacogenomics, deals with how a person’s genetic makeup influences their response to drugs [4].

“Smart” Mammograms

Another application of data in medicine is medical imaging. Breast cancer is the most common solid cancer [11] in the United States, with an estimated 252,710 new cases per year [5]. Although mammograms are currently the best method of detecting breast cancer, up to half of mammogram screenings result in false positives [6], leading patients to undergo unnecessary, painful, and expensive breast biopsies for further examination. Machine learning, a branch of computer science, is helping to solve this problem. Rather than following a static set of programmed instructions, machine learning software makes predictions based on data it collects over time; essentially, it learns from its mistakes. Now, radiologists can turn to software that has been “fed” thousands of mammography records and clinical reports to augment their decision-making [7].
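What “learning from mistakes” means can be sketched with one of the simplest machine learning methods, the perceptron. This is a toy example under stated assumptions: it is not the method used in the study cited above, and the two numeric features and their labels are made-up numbers, not real mammography data.

```python
def predict(weights, bias, features):
    """Return 1 (flag for follow-up) if the weighted sum is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(examples, epochs=20, learning_rate=1.0):
    """Start with zero weights and adjust them only when a prediction is wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)  # 0 if correct
            if error != 0:
                # A mistake: nudge the weights toward the correct answer.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
    return weights, bias


# Made-up training data: (two hypothetical image features, label).
examples = [
    ((1, 1), 0), ((2, 1), 0), ((1, 2), 0),
    ((4, 5), 1), ((5, 4), 1), ((5, 5), 1),
]

weights, bias = train_perceptron(examples)
predictions = [predict(weights, bias, features) for features, _ in examples]
print(predictions)
```

The program is never told a rule for separating the two groups. It starts out guessing, and each wrong guess shifts its internal weights until its predictions match the labels it has been shown. Software used in clinical settings is far more sophisticated, but it relies on the same principle of correcting itself against known outcomes.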

Data Ethics

Big data is an exciting new source of innovation that is quickly transforming the world’s industries. Despite its immense potential for impact, it raises ethical challenges that will need to be addressed. Questions of privacy and consent will continue to arise when we consider the fact that our devices are constantly generating data [9]. If machines will one day be responsible for diagnosing patients, we will also have to think about who will be liable for their mistakes [8]. And finally, there will remain the problem of bias. While research studies are often carefully planned to mitigate bias, predictions made from inconsistent data might worsen existing racial and ethnic disparities in disease outcomes [10].

Over the next few months, this column will explore the big data revolution and its impact on all areas of health, from personalized therapeutics and medical imaging, to smartphone apps and beyond.


1. “Big Data, for better or worse: 90% of world’s data generated over last two years.” ScienceDaily. SINTEF.

2. “What is Big Data Analytics?” TechTarget.

3. “How Target Figured Out a Teen Girl was Pregnant Before Her Father Did.” Forbes.

4. “What is Pharmacogenomics?” NIH.

5. “Cancer Stat Facts: Female Breast Cancer.” NIH.

6. “Artificial Intelligence Is Helping Doctors Find Breast Cancer Risk 30 Times Faster.” Forbes.

7. “Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods.” Cancer.

8. “AI vs. MD.” The New Yorker.

9. “The DeepMind debacle demands dialogue on data.” Nature.

10. “Machine Learning and Prediction in Medicine — Beyond the Peak of Inflated Expectations.” The New England Journal of Medicine.

11. “What is a Solid Tumor?” St. Jude Children’s Hospital.



About NPHR Blog
This is the blog of the Northwestern Public Health Review journal. The blog and journal are both student run and contain research articles, opinions, interviews and other content pertaining to public health.

