Tracking Health Disparities with Big Data


By Makda Zewde, NPHR Editor

We are constantly generating data. Through population surveys, electronic health records, medical imaging, lab tests and diagnoses, we contribute information toward research that might ultimately improve our health. I have already described the meaningful ways in which big data analytics has been improving health care, particularly through diagnostics and targeted therapies. However, health care, and health itself, varies greatly among different populations, and many health issues that disproportionately affect lower-socioeconomic and minority racial/ethnic groups are entirely preventable. The National Institute on Minority Health and Health Disparities (NIMHD) has identified several steps toward eliminating health disparities, the first of which is correctly identifying and documenting them. [1] As one might imagine, this is no straightforward task. Big data analytics may once again contribute to public health by helping to identify and measure disparities, though deliberate efforts must be made to ensure the methods used are accurate and fair.


Health Disparities in Cardiovascular Disease and Hypertension. Source: Continuum Health

Health Disparities: Cardiovascular Disease and Diabetes

Health disparities are defined as any difference in health outcomes between populations, although for this column I will focus on race and ethnicity. A particularly well-studied example is outcomes related to heart disease, the leading cause of death in the United States and worldwide. [2] Studies have shown that the burden of disease falls disproportionately on African Americans, who are two to three times as likely to die from cardiovascular disease as Whites. [3] Diabetes is another stark example: African Americans and Latinos are 70% more likely to develop type 2 diabetes than non-Hispanic Whites, while Asian Americans are 20% more likely. [4] Many factors are at play here, including genetics, lifestyle, and access to care. There may be more nuanced factors as well, such as mistrust of the medical system. Studies have shown that African Americans express less trust in the medical system, in part because of the country’s history of medical exploitation of Black and brown people. [5] Data analysts working to address such disparities must account for the interacting effects of multiple, often complex, factors contributing to health. Some of these factors are also measured inconsistently or inaccurately, which limits the utility of survey and census data. Therefore, creativity and sophisticated data analytics tools will be required to obtain the most meaningful data about the socioeconomic factors affecting disparity populations.
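Statements such as “70% more likely” and “two to three times the likelihood” are relative risks: the rate of an outcome in one group divided by the rate in a comparison group. The minimal Python sketch below shows that arithmetic using hypothetical cohort counts (not data from the cited studies); the function names are illustrative, and published estimates are usually age-adjusted, which this sketch does not attempt.

```python
def relative_risk(cases_a: int, total_a: int, cases_b: int, total_b: int) -> float:
    """Return the risk (cases / total) in group A divided by the risk in group B."""
    risk_a = cases_a / total_a
    risk_b = cases_b / total_b
    return risk_a / risk_b


def percent_more_likely(rr: float) -> float:
    """Express a risk ratio as 'X% more likely' (a ratio of 1.7 means 70% more likely)."""
    return (rr - 1.0) * 100.0


# Hypothetical counts: 170 of 1,000 people in group A develop the condition,
# versus 100 of 1,000 in comparison group B.
rr = relative_risk(cases_a=170, total_a=1000, cases_b=100, total_b=1000)
print(f"risk ratio = {rr:.2f}, i.e. {percent_more_likely(rr):.0f}% more likely")
```

By the same arithmetic, “two to three times as likely” corresponds to a risk ratio between 2 and 3.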


Health Disparities in Diabetes. Source: Continuum Health

Missing Data

Given that improved data collection is key to correctly identifying disparities, careful attention must be paid to the implications of missing data. For instance, reports have shown that Hispanic Americans are less likely to die of old age than White Americans, but these data have been called into question because mortality statistics for Hispanics are less reliable owing to age misreporting or undercounts of deaths. [5] Latino immigrants may return to their countries of origin to die, for instance, and these deaths would be excluded from U.S. mortality data altogether. Missing or incomplete data can easily lead to false conclusions; in such a context, policymakers might underestimate the disparities affecting specific populations.
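A toy calculation shows how undercounted deaths can push an apparent disparity in the wrong direction. In this hypothetical Python sketch, 20% of one group’s deaths never reach U.S. records; the population, death count, missing share, and comparison rate are all assumptions chosen for illustration, not real statistics. The group’s true rate is above the comparison group’s, but its observed rate falls below it.

```python
def mortality_rate(deaths: int, population: int, per: int = 100_000) -> float:
    """Return deaths per `per` people in the population."""
    return deaths / population * per


# All numbers below are hypothetical and chosen only to illustrate the effect.
population = 200_000      # people in the group of interest
true_deaths = 1_600       # deaths that actually occurred in a year
unrecorded_share = 0.20   # assumed share of deaths missing from U.S. records
reference_rate = 700.0    # comparison group's rate per 100,000, assumed fully counted

recorded_deaths = round(true_deaths * (1 - unrecorded_share))

true_rate = mortality_rate(true_deaths, population)
observed_rate = mortality_rate(recorded_deaths, population)

print(f"true rate:     {true_rate:.0f} per 100,000 (ratio {true_rate / reference_rate:.2f})")
print(f"observed rate: {observed_rate:.0f} per 100,000 (ratio {observed_rate / reference_rate:.2f})")
```

Undercounting the population denominator instead would bias the rate upward; this sketch models only missing deaths.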

Missing data related to income and race/ethnicity is another major challenge, particularly when analyzing survey and census records, and even clinical trial data. The analysis of Electronic Health Record (EHR) data is a promising avenue for identifying disparities, not only because EHRs will likely contain more information on disparity populations, who are usually underrepresented in clinical trials, but also because data on race and ethnicity is more readily available in them. Other key factors, like geography, clinical notes, and insurance information, are available in EHRs as well. Not all plans and providers collect race and ethnicity information, however, and incentives or regulations might be required to make this information more widely available. [6]
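Before using EHR data to compare groups, an analyst would typically check how complete the race and ethnicity field is for each plan or provider. The Python sketch below assumes a hypothetical extract in which each record is a dictionary with `provider` and `race_ethnicity` keys; real EHR schemas differ, and these field names and values are illustrative only.

```python
from collections import defaultdict

# Hypothetical EHR extract: field names and values are illustrative only.
records = [
    {"provider": "A", "race_ethnicity": "Black"},
    {"provider": "A", "race_ethnicity": None},
    {"provider": "A", "race_ethnicity": "Hispanic"},
    {"provider": "B", "race_ethnicity": None},
    {"provider": "B", "race_ethnicity": ""},
    {"provider": "B", "race_ethnicity": "White"},
]


def missing_share_by_provider(rows: list[dict]) -> dict[str, float]:
    """Return, for each provider, the fraction of rows with no race/ethnicity recorded."""
    totals: dict[str, int] = defaultdict(int)
    missing: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["provider"]] += 1
        # Treat None and empty strings as missing.
        if not row.get("race_ethnicity"):
            missing[row["provider"]] += 1
    return {provider: missing[provider] / totals[provider] for provider in totals}


print(missing_share_by_provider(records))
```

A provider with a high missing share could bias group comparisons if patients with missing values differ systematically from those whose race and ethnicity were recorded.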

Because big data analytics can incorporate large and varied datasets, we can anticipate the integration of even more forms of data in the future. In addition to socioeconomic factors available through EHR data, we can extract activity footprints stored in our smartphone apps and smart devices to get insights about lifestyle-related risk factors for disease. A Nature study published earlier this year analyzed smartphone activity data of more than 700,000 people in 111 countries worldwide. The study found that ‘activity inequality’, or inequality in how activity is distributed within countries, was a better predictor of obesity prevalence than average activity volume. It also found that in more walkable cities, activity was greater across age, gender, and body mass index (BMI) groups. [7] Smartphones can provide valuable information about lifestyle differences that contribute to health disparities, and integration of this data will be useful to policymakers involved in public health interventions and even urban planning.
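The Gini coefficient is one standard way to summarize how unevenly a quantity is spread across a population, from 0 (everyone equal) toward 1 (nearly all of it concentrated in a few people). The Python sketch below applies it to hypothetical daily step counts for two cities with the same average activity. Treat it as an illustration of the idea of activity inequality under that assumption, not as a reproduction of the Nature study’s methods or data.

```python
from statistics import mean
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form for ascending-sorted data with ranks i = 1..n:
    #   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical daily step counts for four residents of each city.
city_a = [6000, 6000, 6000, 6000]
city_b = [2000, 4000, 8000, 10000]

for name, steps in [("city A", city_a), ("city B", city_b)]:
    print(f"{name}: mean = {mean(steps):.0f} steps, Gini = {gini(steps):.2f}")
```

Both hypothetical cities average 6,000 steps, but only the second has unequal activity, which is the distinction the study found mattered for predicting obesity prevalence.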

Addressing genetic disparities through pharmacogenomics

The theory that genetics underlie disparities in disease outcomes has been a subject of debate. Some have argued that disparities exist because of socioeconomic factors, and not genetics. But in diseases like breast cancer, genetics have already been shown to explain at least some of the disparities in outcomes between Black and White women. Considering the inadequate enrollment of minority populations in randomized clinical trials, studies focusing on these populations are crucial to understanding disparities in disease outcomes. The Perera Lab at Northwestern University studies pharmacogenomics, or how a person’s genetic makeup influences their response to drugs, with a focus on minority populations. Recently, the lab identified polymorphisms predominant in African Americans that are associated with increased risk of venous thromboembolism, a disease that disproportionately affects African Americans and is the third most common life-threatening cardiovascular condition in the United States. [8] Studies like these are instrumental in achieving precision medicine, an emerging approach in disease prevention and treatment that aims to incorporate individual variations in genes, environment and lifestyle into treatment decisions. [9] With more studies focusing on minority populations, we will hopefully achieve precision medicine for all populations.

Data analytics is not necessarily fair, and must be controlled for bias

A concern with the use of big data analytics for policy decisions is that decisions based on predictive analytics are not necessarily fair. In her book “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy,” Cathy O’Neil outlines examples in which the use of algorithms led to harmful policies. [10] One such example was PredPol, a crime prediction software adopted by a police department in Reading, Pennsylvania, to predict the occurrence of violent crimes using data from “nuisance” crimes prevalent in low-income neighborhoods. Even though the system was designed to be “color-blind,” in practice the model more frequently targeted Black and Hispanic people living in low-income neighborhoods. Though these actions were based on data, the police department made specific choices about where to direct its attention. The parameters included in any algorithm are chosen by humans and are thus subject to bias. Predictive algorithms can indeed lead to harmful policies, which are then harder to dispute because of their basis in ‘science.’
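One mechanism by which data-driven choices can entrench an initial imbalance is a feedback loop, where a model is retrained on records that its own predictions helped generate. This hypothetical Python sketch is not PredPol’s algorithm, which this column does not describe. It assumes two areas with identical true offending, sends most patrols to whichever area has more recorded incidents, and records incidents only where patrols are present; all parameter values are illustrative.

```python
def simulate_patrols(initial_records: list[int], rounds: int,
                     total_patrols: int = 10, hotspot_patrols: int = 8,
                     detections_per_patrol: int = 10) -> list[int]:
    """Toy feedback loop: patrols follow past records, and records follow patrols.

    Every area is assumed to have the same true offending, so each patrol
    detects the same number of incidents wherever it is sent.
    """
    if len(initial_records) < 2:
        raise ValueError("need at least two areas")
    records = list(initial_records)
    others = len(records) - 1
    for _ in range(rounds):
        # Predict the "hot spot" as the area with the most recorded incidents so far.
        hot = max(range(len(records)), key=lambda i: records[i])
        for i in range(len(records)):
            patrols = hotspot_patrols if i == hot else (total_patrols - hotspot_patrols) // others
            # Incidents are only recorded where officers are present to observe them.
            records[i] += patrols * detections_per_patrol
    return records


final = simulate_patrols([55, 45], rounds=10)
print(final, f"share in area 0: {final[0] / sum(final):.0%}")
```

Starting from a 55/45 split in historical records, the first area ends up with about 78% of recorded incidents after ten rounds (855 of 1,100), even though the sketch assumes equal underlying offending in both areas.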

The growing capabilities of big data analytics make it a promising tool to measure and perhaps one day eliminate health disparities. With extra focus on developing unbiased and accurate methods to translate big data findings into policy, there is hope for greater health and health care equity for all populations.


10. O’Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Rockville: Serenity, 2016. Print.




About NPHR Blog
This is the blog of the Northwestern Public Health Review journal. The blog and journal are both student-run and contain research articles, opinions, interviews and other content pertaining to public health.
