Title of Talk: Learning from High Dimensional Partially Observed Temporal Data
Abstract: We will first show how to efficiently approximate a Markov blanket that
consists of multiple dependent variables in order to find a minimal subset of the most
informative variables for predictive modeling in high dimensional data. Our method,
based on a Hilbert-Schmidt criterion in a kernel-induced space, allows removal of
both irrelevant and redundant variables in high dimensional classification and
regression problems. We will then describe how to avoid the data imputation
step when learning from partial observations. For this purpose, we formulate a
convex optimization problem whose objective is to maximize each
instance’s uncertainty margin in its own relevant subspace. Our method has been shown
to outperform the alternatives when a large fraction of values is missing in
high dimensional data. Finally, we will present an extension of our margin-based
feature selection method to high dimensional temporal data, in which a fixed-point
gradient descent method is proposed to optimize the formulated objective function and
learn the optimal feature weights. The experimental results on temporal microarray
data provide evidence that the proposed method can identify more informative
features than the alternatives that flatten the temporal data.
The presented results were obtained in collaboration with Q. Lou while he was a Ph.D.
student in my lab.
Zoran Obradovic is a professor of Computer and Information Sciences and the director of the Center for Data Analytics and Biomedical Informatics at Temple
University in Philadelphia. His data analytics work has been published in more than
260 articles and cited more than 10,000 times (H-index 41 and I10-index 86).
Obradovic is the executive editor of the journal Statistical Analysis and Data
Mining, which is the official publication of the American Statistical Association (ASA),
and is currently an editorial board member at eleven journals. He is the general
co-chair of the 2013 and 2014 SIAM International Conference on Data Mining and was
the program and/or track chair at many data mining and biomedical informatics