Multimodal Deep Learning for Medical Data Representations

Led by Sanjay Purushotham, Ph.D.

Sanjay Purushotham is an Assistant Professor in the Department of Information Systems at the University of Maryland, Baltimore County (UMBC). Before joining UMBC, he was a Postdoctoral Scholar Research Associate in the Department of Computer Science and the Integrated Media Systems Center (IMSC) at the University of Southern California (USC), where he was mentored by Prof. Yan Liu and Prof. Cyrus Shahabi. He obtained his Ph.D. in Electrical Engineering from USC under the supervision of Prof. C.-C. Jay Kuo in the Media Communications Lab (MCL). His research interests are in machine learning, data mining, optimization theory, statistics, and computer vision, and their applications to healthcare & bioinformatics, social networks, and multimedia data.


The increasing collection and availability of digital medical data via Electronic Medical Records (EMRs) and wearable sensor devices offer new opportunities to learn rich data-driven representations of health and disease. Recently, a series of works has sought machine learning solutions for learning medical data representations. Deep learning models, i.e., deep neural networks, have emerged as the most promising solution, since they alleviate tedious feature engineering and extraction and achieve state-of-the-art results on many prediction tasks.

While existing deep learning solutions are encouraging, a gap remains before they can be adopted in practical medical applications. First, the quality of medical data is poor: (almost) all medical datasets are noisy and contain many missing values. Second, medical data usually come from different sources, i.e., from multiple modalities. For example, EMRs are characterized by multimodal data such as patients’ medical history, vital signs, diagnoses, medications, monitor readings, and lab test results. These properties make it extremely challenging for machine learning models, and particularly for complex deep models, to discover meaningful representations and make robust predictions. The data from each modality usually exhibit different types of features, correlation structures, missing rates, and other statistical properties. It is therefore important to discover the nonlinear relationships across modalities that may be useful for feature learning and prediction. In contrast, existing successful applications of deep learning, such as computer vision and natural language processing, mostly involve large-scale homogeneous data. There is limited work addressing the multimodality issue in medical datasets, so novel solutions are in great demand.
In this project, the REU student will be guided to work on a novel deep learning framework termed Hierarchical Multimodal Deep Learning Models (HMMDL), which learns shared feature representations from multiple modalities. The key intuition is that HMMDL successively builds shared representation layers from multiple modalities in a systematic, hierarchical fashion to learn from the limited but complementary information present in each modality. HMMDL will be designed to perform feature selection and implicit imputation to handle the high dimensionality and missingness, respectively, of medical datasets. The student will be guided to perform experiments on large medical datasets such as MIMIC and to compare the performance of the proposed model against existing machine learning models.
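To make the general idea concrete, the sketch below shows a toy forward pass of a hierarchical multimodal network in NumPy: each modality gets its own encoder, a simple mean imputation stands in for the implicit imputation HMMDL targets, and shared layers are stacked on top of the concatenated modality representations. All names, dimensions, and data here are illustrative assumptions for this sketch, not part of HMMDL's actual design, and training is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def encode(x, w):
    """One dense layer with ReLU: a stand-in for a per-modality encoder."""
    return relu(x @ w)

# Two synthetic modalities for 4 patients (e.g., vitals and lab tests);
# the feature dimensions and values are made up for illustration.
vitals = rng.normal(size=(4, 6))
labs = rng.normal(size=(4, 10))

# Simulate missingness in the lab modality, then apply a simple
# column-mean imputation (a placeholder for HMMDL's implicit imputation).
labs[1, 3] = np.nan
labs[2, 7] = np.nan
col_means = np.nanmean(labs, axis=0)
labs = np.where(np.isnan(labs), col_means, labs)

# Modality-specific encoders (random, untrained weights).
h_vitals = encode(vitals, rng.normal(size=(6, 8)))
h_labs = encode(labs, rng.normal(size=(10, 8)))

# Hierarchical fusion: concatenate the modality representations and
# build successive shared layers on top of them.
shared1 = encode(np.concatenate([h_vitals, h_labs], axis=1),
                 rng.normal(size=(16, 12)))
shared2 = encode(shared1, rng.normal(size=(12, 4)))

print(shared2.shape)  # one shared representation vector per patient
```

In a full model, the shared representation `shared2` would feed a prediction head (e.g., mortality or diagnosis classification), and the encoder and fusion weights would be learned jointly from data rather than drawn at random.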