Understanding Machine Learning for Spatio-Temporal Analytics in Health Risk Prediction and Mapping – An Information-Theoretic Perspective

About

The rapid development of artificial intelligence (AI) in recent years has offered a science-grounded approach to addressing challenges in health risk prediction and mapping. Toward this end, machine learning (ML) methods have been proposed and have shown promising results in disease transmission modeling, prediction, risk assessment, spatio-temporal analytics, and healthcare resource allocation. However, most existing methods share a major limitation: they do not adequately tackle the difficult real-world ML issues in health risk prediction and mapping, such as data heterogeneity and incompleteness, spatio-temporal heterogeneity and multi-scale dependencies of disease dynamics, and resource scarcity. As a result, the real-world impact of these methods remains unsatisfactory. Moreover, given a specific health risk prediction and mapping task and the available data sources, the underlying theoretical ML issues remain largely unexplored: how to select the most appropriate ML model, how to theoretically analyze the model's learning behavior, and how to quantitatively characterize the model's learning capacity.

Building on our ongoing research in AI and ML, as well as our previous experience in AI/ML-enabled disease control and prevention in collaboration with domain experts, this research project aims to take a major step forward by developing a novel ML framework and an information-theoretic approach to understanding, modeling, and analyzing spatio-temporal domain tasks in general, and public health and epidemiology tasks in particular. Specifically, the project will consist of three key milestones:

We will design and demonstrate a novel ML framework that addresses the issues of how to integrate data from heterogeneous sources, how to capture information at multiple spatio-temporal scales, and how to integrate information from different scales for subsequent learning tasks such as health risk prediction.
We will theoretically formulate the designed models and prove that they are able to adequately learn complex multi-scale spatio-temporal dependencies in risk prediction and mapping. We will develop an information-theoretic approach to examining the information-based learning capacities of the proposed models. In so doing, we expect to develop answers to some of the important open questions in ML, e.g., how to determine the appropriate configuration of a designed learning model for a given learning task.
We will validate the designed learning models and the information-theoretic approach by systematically conducting a series of experiments on public health and disease risk modeling, prediction, and mapping, involving both synthetic and real-world datasets. Moreover, we will examine the learning behaviors of the designed models as well as the model configurations derived from the information-theoretic analysis in the real-world context.

This project will enrich interdisciplinary research across AI, ML, public health, and epidemiology. Its outcomes will provide timely methodological and practical foundations for health risk prediction and mapping, strengthen academic and public health collaborations both regionally and internationally, and, most importantly, contribute to the well-being of society.

Required skills

Students should have a solid mathematical background and be familiar with MATLAB, C/C++, Java, or Python.

Principal Investigator

Prof. Jiming Liu

Chair Professor, Department of Computer Science

Co-Investigator

Dr. Yang Liu

Assistant Professor, Department of Computer Science