Problem Statement: Public Health Data Visualization

Technological changes have influenced the amount and the kinds of data that public health agencies collect to monitor and control the spread of communicable diseases. Whereas earlier public health investigations collected limited amounts of data, on the order of tens to hundreds of patients, and very specific data types (usually demographic and exposure data), today’s datasets can comprise millions of individuals and include, but are not limited to, treatment and outcomes, patient demographics, genomic data from the patient or pathogen, geographic contextual data, and contact network data. Theoretically, this larger amount of heterogeneous data can support informed decision making by providing a more complete picture of a population’s general health and wellbeing, in addition to the success of prior public health interventions. Yet these data can be difficult for public health decision makers to use, a group that includes clinicians, nurses, researchers, epidemiologists with various specializations, policy makers, and even government officials, precisely because the data are large and heterogeneous and often require a computer scientist or statistician to analyze them and communicate the findings. The result is that heterogeneous data are infrequently integrated and the rich knowledge within is largely unexplored. Public health decision makers need better tools to help them access insights within their data.

Research Statement

I propose that the visualization of public health data can support decision making for communicable diseases, from high-level policy decisions to lower-level decisions surrounding disease outbreak investigations. Through a series of connected research projects, I aim to demonstrate to what extent and in what ways data visualization can be integrated into public health decision-making around these large and heterogeneous public health datasets. The tangible outcomes of these projects will include methodological and technical contributions, software, and community resources that will be publicly disseminated.

Application Context

To ground my work in practical challenges, I am collaborating with tuberculosis (TB) controllers, individuals who monitor and manage the spread of TB, at the British Columbia Centre for Disease Control (BCCDC). In undertaking this collaboration, I do not set out to construct a fail-safe healthcare application; rather, I set out to collaboratively explore how visualization of data could support decision making.

Tuberculosis (TB) is the deadliest communicable disease caused by a single infectious agent (Mycobacterium tuberculosis). In 2015 alone, there were 10.4 million new cases of symptomatic TB and 1.8 million deaths, and as much as one-third of the world’s population is thought to be infected with a latent, asymptomatic form of the disease. TB is an ancient human disease that has been managed by both early and modern public health systems; it is currently curable but risks becoming resistant to antibiotics in the future. The effort to end TB worldwide has motivated a push toward a digital global strategy to share and integrate data, and to facilitate evidence-based (informed by data and prior research) decision making. It has become clear that the best way to defeat this ancient disease is by combining traditional epidemiologic approaches established in public health practice with modern approaches from computer science and infovis.

Research Projects

Overall, my research is intended to bridge computer science (infovis) and public health by translating and refining methodologies between these two disciplines through the development of a framework for the design and critical appraisal of public health data visualizations. More specifically, I intend to integrate infovis theory and methodology about visualization design and analysis with public health tasks, data, and regulatory constraints. To this end, I will conduct a series of research projects to produce both methodological frameworks and data visualization software and techniques that demonstrate these academic contributions in action.

Broadly, the products of several of my proposed research projects are meant to be operationalized toward the development of data visualization software for public health for communicable disease prevention and control, which I have called EpiCOGS (a portmanteau of Epidemiology Contact Network, Observations, Genomics, data Synthesis).

GEviT Project

The purpose of the GEviT (Genomic Epidemiology Visualization Typology) project is to identify and analyze visualization design patterns that are currently used in genomic epidemiology reports, research papers, and software. To my knowledge, a comprehensive characterization of genomic epidemiology visualizations has not yet been completed, but my observations suggest that it would be a useful resource for the public health research community, whose members primarily develop visualizations on an ad hoc basis and are often unaware of design alternatives.

GEviT was first presented at the Applied Bioinformatics in Public Health Microbiology Conference in 2017.

You can peruse the following content about GEviT:

Synthetic, Simulated, and Sensitive Data (S3) Project

The S3 project seeks to address this knowledge gap by borrowing techniques from other disciplines, including statistics and privacy-by-design research in human-computer interaction, and will assess how these techniques can be applied to the design and evaluation of data visualizations.

I have already conducted a preliminary investigation that demonstrates the value of synthetic and simulated data in my prototype development process. The preliminary research resulted in the publication of a workshop paper at the IEEE Vis 2016 Conference, entitled “On Regulatory and Organizational Constraints in Visualization Design and Evaluation”.

You can peruse the following content about the (S3) project:


Overall, my doctoral research seeks both to create a theoretical framework for building data visualizations in public health contexts and to demonstrate that framework in action. The previously mentioned research projects will provide the theoretical foundations to support the development of a data visualization software tool, which I have called EpiCOGS. The EpiCOGS tool will be developed in collaboration with TB controllers at the BCCDC and will visualize an existing TB retrospective cohort.

Some preliminary work has already begun on EpiCOGS to test the functionality of the R programming language, in which it is implemented, and gauge the interest and receptiveness of TB controllers to data visualizations in general. Importantly, the early challenges that arose during the initial EpiCOGS project helped me to develop

You can peruse the following content about EpiCOGS:

The Big Picture