Designed as a forum on policy setting for the Data Science initiative at Temple University, this University-wide symposium on data science and data analytics aims to develop the research agenda for data science initiatives by bringing together academic researchers, industry partners, and program directors from funding agencies to discuss priorities for how methodological advances in data science can inform practical applications. The symposium seeks to strengthen both internal collaboration and external partnerships, improve existing data science infrastructure throughout the university, and identify specific research opportunities and directions that can inform industry practice. The overarching goal is to create a leading national institute of excellence in data science by:
  • Defining how Temple can lead in both research and applications across industry, government, and society
  • Integrating capabilities that exist in schools and departments across the university to inform practice
  • Aligning current research with current trends and practical opportunities in industry, government, and society
  • Disseminating important insights about the state of data science, new methodological advances, and novel applications


Temple University, Morgan Hall, 27th Floor, 1601 N. Broad St., Philadelphia, PA 19122 (entrance on street level at the corner of Broad St. and Cecil B. Moore Ave.)

Schedule of Events

11:30 am: Doors Open
12:00 – 1:00 pm: Lunch and Keynote Presentation
1:00 – 1:15 pm: Break
1:15 – 2:15 pm: Internal Presentations
2:15 – 2:30 pm: Break
2:30 – 3:15 pm: NSF Presentation
3:15 – 3:30 pm: Break
3:30 – 5:00 pm: Industry Panel
5:00 – 6:30 pm: Poster Session

Keynote Speaker

Mark Shafer
"Facilitating an Analytics Transformation: The Disney Story"
We’re in a global analytics arms race, where yesterday’s strategic advantage can quickly become tomorrow’s industry standard. To stay competitive, companies must continue to invest and evolve at an ever-increasing rate. In this keynote session, Mark Shafer, Disney Senior Vice President of Revenue and Profit Management, will discuss his 30-year, rags-to-riches analytical journey, including lessons learned, from being on the receiving end of analytics at People Express Airlines to building a science-based analytical team at The Walt Disney Company. During his 20 years at Disney, Mark has led an analytical transformation, from implementing Walt Disney World's first resort revenue management model to leading an internal consulting team of more than 150 employees responsible for supporting analytics across The Walt Disney Company, including Parks and Resorts, Media Networks (ABC, ESPN, Disney Channel, A&E Networks, etc.), and Studio Entertainment (The Walt Disney Studios, Disney Theatrical). Attendees will leave with deep insights and practical advice on how to steer a successful analytics journey at their own companies.

Temple Faculty Presentations

Sergei L. Kosakovsky Pond
"Near real-time tracking of the HIV-1 epidemic using high-throughput molecular sequence analysis"
Following formal undergraduate training in computer science (at Kiev State University, Ukraine), I received a PhD from the interdisciplinary program in Applied Mathematics at the University of Arizona. My theoretical graduate research into statistical methodology for evolutionary analyses of coding sequence alignments found an application in an HIV research group at UCSD, which I joined as a postdoctoral fellow in 2003. Until 2016, I was an associate professor in the Divisions of Infectious Diseases and Biomedical Informatics in the UCSD Department of Medicine. In addition, I am the director of the Bioinformatics and Information Technologies Core at the UCSD Center for AIDS Research. In 2016, I joined the Institute for Genomics and Evolutionary Medicine at Temple University. My research interests include developing models and computational approaches for the comparative analysis of sequence data, especially large and rich data sets from measurably evolving pathogens such as HIV-1, influenza A virus, and hepatitis C virus. My group has published a number of methodological and applied papers applying evolutionary algorithms and machine learning techniques to complex problems in sequence evolution, especially in the context of HIV population history, adaptation to new hosts, transmission, immune escape, and the development of drug resistance.
Zoran Obradovic
"Fusion of Qualitative Knowledge and Big Data for Predictive Analytics in Healthcare"
Abstract: This talk will present an overview of our ongoing projects aimed at facilitating predictive analytics in healthcare. Challenges and proposed solutions will be discussed in relation to structured regression on multilayer networks, joint modeling of positive and negative influences, uncertainty propagation, and effective integration of domain knowledge and big data. The algorithms will be evaluated in applications that include estimating hospitalization cost, predicting admission and mortality rates for high-impact diseases at a large number of hospitals, identifying disease relationships, and discovering gene-disease interactions.
Biography: Zoran Obradovic is an Academician at the Academia Europaea (the Academy of Europe) and a Foreign Academician at the Serbian Academy of Sciences and Arts. He is an L.H. Carnell Professor of Data Analytics at Temple University, a Professor in the Department of Computer and Information Sciences with a secondary appointment in the Department of Statistical Science, and the Director of the Center for Data Analytics and Biomedical Informatics. His research interests include data science and complex network applications in health management and other complex decision support systems. Zoran is the executive editor of the journal Statistical Analysis and Data Mining, the official publication of the American Statistical Association, and an editorial board member of eleven journals. His work has been published in more than 330 articles and cited more than 17,000 times (H-index 50). His research is currently funded by the DARPA THOR, DARPA GRAPHS, DARPA DLT, ONR Data Science, NSF Big Data, and NSF IIS programs. His most recent project, which started in June 2016 and was highlighted by EurekAlert!, the Global Source for Science News, aims to understand why some host organisms are tolerant to pathogenic infection and to uncover which biological mechanisms are responsible for their resilience. This $9.9M multi-institutional, DARPA-funded effort includes investigators from Harvard Medical School, the Mayo Clinic, Temple University, Tufts University, and Boston Children's Hospital.
Cheng Yong Tang
"Methods for Analyzing Large and Complex Data"
Abstract: In this talk, I will briefly review several topics in the current development of statistical methods for analyzing modern large and complex data sets. These include various aspects of methods for analyzing high-dimensional data, where the number of parameters in the regression model can be much larger than the number of observations. Another topic is methods for estimating large covariance matrices that incorporate structural information.
Biography: Dr. Cheng Yong Tang joined the Department of Statistical Science of the Fox School as an Associate Professor in 2014. He received his PhD in Statistics from Iowa State University in 2008. Dr. Tang previously taught at the National University of Singapore, as a tenured professor in the Department of Statistics and Applied Probability, and at the University of Colorado Denver, as an Assistant Professor of Business Analytics. Dr. Tang has published more than twenty research articles, more than ten of them in top statistical journals including the Annals of Statistics, Biometrika, the Journal of the American Statistical Association, and the Journal of the Royal Statistical Society, Series B. His research interests include longitudinal data analysis, high-dimensional data analysis, nonparametric statistical methods, empirical likelihood, financial data analysis, survey data, and missing data analysis. He has been the PI of two NSF grants: one on methods for longitudinal data analysis, supported by the Division of Social and Economic Sciences, and the other on ensemble learning methods with random projections, supported by the BIGDATA program. Dr. Tang has received multiple honors, including the Dean’s Research Honor Roll of the Fox School of Business, the National University of Singapore’s Young Scientist Award and Teaching Excellence Award, the IMS Laha Award, and Iowa State University’s Research Excellence and Teaching Excellence Awards.
Sunil Wattal
"Multi-level predictive analytics and motif discovery across large dynamic spatiotemporal networks and in complex sociotechnical systems: An organizational genetics approach"
Abstract: We are developing a method to predict the emergence of system-level behaviors by analyzing large volumes of digital trace data, using evolutionary social ontology to build a multi-level model of complex socio-technical systems. We use analytical techniques developed in evolutionary biology and systems biology: (1) to characterize a stream of digital trace data from a complex socio-technical system with finite genetic elements; (2) to predict the behavior of socio-technical systems based on the pattern of “behavioral gene” interactions; and (3) to explore the impact of mutational input, gene flow, and recombination in “behavioral genes” on the evolution of socio-technical systems. We test our model on GitHub, one of the largest open source communities, with over 5 million open source software development projects, and on Twitter, one of the largest social media sites, with over 500 million messages per day. We believe our model can be applied to other types of massive digital trace data, including sensor data from the Internet of Things and mobile user data. Our work will also allow scholars in different fields to study the emergence of complex system behaviors through the interaction of low-level events.
Bio: Sunil Wattal is an Associate Professor of Management Information Systems at the Fox School of Business, Temple University. Dr. Wattal’s expertise focuses on the economics of information systems, privacy and security, social media, and crowdfunded marketplaces. His work has been published in top academic journals such as MIS Quarterly, Information Systems Research, Management Science, Journal of Management Information Systems, and IEEE Transactions on Software Engineering. He currently serves as an associate editor (AE) at MIS Quarterly and a special AE at Information Systems Research (ISR). His work has received several best paper awards and nominations and has been funded through grants from NSF and the Kauffman Foundation. Dr. Wattal holds a Bachelor’s in Engineering from Birla Institute of Technology and Science Pilani (India), an MBA from the Indian Institute of Management Calcutta (India), an MS (Industrial Administration) from Carnegie Mellon University, and a PhD from the Tepper School of Business, Carnegie Mellon University.
Zhigen Zhao
"GATE: Group Assisted Testing for Multiple Hypotheses Appearing in Groups"
Abstract: Multiple testing of grouped hypotheses is becoming commonplace in modern large-scale statistical investigations, especially in genomics and brain imaging, as researchers take advantage of the underlying group structure of the hypotheses, either biologically defined or formed using external information on covariates, in an attempt to make more important discoveries. Often, the computational and Big Data challenges of huge-scale multiple testing in these investigations compel investigators to create computationally feasible batches or groups of hypotheses. This paradigm shift in multiple testing, from a single group to multiple groups, has brought new statistical challenges in controlling false discoveries. A long-term objective is to introduce new theories and methodologies that address these challenges. Specifically, we will address the following three questions in the domain of grouped hypothesis testing: Q1. How can we effectively capture the underlying group structure, instead of simply pooling all the hypotheses into a single group, while controlling overall false discoveries across all individual hypotheses? Q2. When discovering significant groups is an important consideration, how can we maintain control over falsely discovered groups while answering Q1? Q3. When discovering significant hypotheses within each group is an important consideration, how can we maintain control over false discoveries within each group while answering Q1 and/or Q2? This talk gives a brief overview of novel Bayesian/empirical Bayes techniques that answer these questions and their applications to real data.
Bio: Zhigen Zhao graduated from Cornell University in 2009. Dr. Zhao's research interests include Bayesian/empirical Bayes statistics, high-dimensional data analysis, multiple comparisons, bioinformatics, and selective confidence intervals. Dr. Zhao has published papers in top-tier journals such as the Journal of the Royal Statistical Society, Series B, and the Journal of the American Statistical Association. Dr. Zhao’s current research is supported by the National Science Foundation.


Moderator: Stephen Tang, PhD

Steve Tang became President and CEO of the University City Science Center, the nation’s oldest and largest urban research park, in 2008. Dr. Tang is the first president in the Science Center’s history not only to have led a company through venture funding and an initial public offering, but also to have served as a senior executive with a large life sciences company as it acquired and integrated smaller start-ups.

In September 2016, U.S. Commerce Secretary Penny Pritzker reappointed Dr. Tang to the National Advisory Council on Innovation and Entrepreneurship (NACIE). Tang will serve as NACIE co‐chair during his two‐year term. NACIE members will offer recommendations for policies and programs designed to make U.S. communities, businesses, and the workforce more globally competitive. Previously, Tang served on the U.S. Department of Commerce’s Innovation Advisory Board.

Dr. Tang also serves on several statewide, regional, and local boards of directors. He chairs the Board of Directors of the Committee of Seventy, the Philadelphia region’s good government advocate, and, along with Pennsylvania Governor Tom Wolf, co-chairs the Team Pennsylvania Foundation, which bridges the gap between government and the private sector. Dr. Tang earned a doctorate in Chemical Engineering from Lehigh University, an M.B.A. from the Wharton School of the University of Pennsylvania, and a B.S. in Chemistry from the College of William and Mary.

Yong Cai
Yong Cai is the director of Advanced Analytics at QuintilesIMS. He has more than 10 years of experience in data science research, developing new methodologies and designing innovative solutions to address complex issues in the healthcare industry. Yong’s main research interests are Bayesian modeling, machine learning, and predictive analytics. He has extensive experience in data-driven applications including multichannel marketing, segmentation, pricing and market access, prelaunch opportunity assessment, experimental design, forecasting, sampling design, projection, and optimization. Yong also publishes and presents at various conferences. His most recent research focuses on relational learning to address the challenges brought by big data. He received the 2015 IMS Health CEO Award for innovation in media channel optimization. Yong holds a PhD in Statistics from the University of Louisiana at Lafayette.
Vladimir Crnojević
Prof. Dr. Vladimir Crnojević is director of the BioSense Institute, an R&D institute for information technologies in biosystems, which was recently ranked as the No. 1 European Center of Excellence within the Spreading Excellence and Widening Participation program. He is also a professor in the computer science department of the University of Novi Sad, Serbia, and an extraordinary professor of applied mathematics at Stellenbosch University, South Africa. He studied Electrical Engineering and received his Ph.D. in image processing from the University of Novi Sad in 2004. Since then, he has been involved in a number of industrial projects related to computer vision, machine learning, and data mining. Vladimir is the founder of the BioSense Institute, which is now a regional leader in the number of EU projects under the FP7 and H2020 programs. BioSense also has strong industrial cooperation and a significant track record in innovation accelerator activities, with around 50 SMEs funded through BioSense programs. Dr. Crnojević has published more than 50 papers in renowned journals and conferences and acts as a reviewer for some of the most eminent journals in the field. His current research interests include machine learning, image processing, remote sensing, and data mining, with applications to agriculture, biosystems, and IoT. Recently, Dr. Crnojević led a team of researchers at the BioSense Institute that took 4th place in the Syngenta Global Crop Challenge 2016, the highest-ranked European team.
Krish Ghosh
Krish Ghosh is the Vice President of Informatics at LabCorp and serves as a member of the Executive Management team reporting to the company's CIO. Prior to this role, Krish was the Vice President of Global Resource Management at Covance, where he reported to the company's CFO. Krish joined Covance in May 2006; LabCorp acquired Covance in February 2015. Krish is responsible for leading Enlighten Health, a business unit at LabCorp whose primary goal is to develop and sell innovative capabilities through Software as a Service (SaaS) to the pharmaceutical industry, using proprietary internal data as well as externally available data. Underlying all the “Apps on the Platform” are highly advanced methods and solutions developed to help clients drive value in the drug development continuum and improve their business performance, best practices, growth, capacity management, expansion efforts, and profitability. Krish’s key accomplishment was to create the vision for, and lead the concept and development of, Xcellerate®, a Covance brand. Based on internal data from Clinical and Central Laboratories and on external (public and subscription) data, Xcellerate® was developed to optimize clinical trial execution, trial forecasting, patient enrollment prediction, and resource demand/supply management for any study across different therapeutic areas/indications, phases, and geographies. In addition, Krish has 13 years of pharmaceutical industry experience and 4 years of experience in university-based academics and consulting in statistics. Prior to joining Covance, he was the Director of Project Planning and Information at Wyeth, where he was responsible for Portfolio Planning and Management, R&D Productivity Model and Steady State Management, Metrics and Industry Benchmarking, and Cost and Resource Management.
Prior to joining Wyeth, he held different positions in Project Planning and Management, Finance, Clinical Strategic Planning and Operations, and Biostatistics at Bristol Myers Squibb. Krish holds a Ph.D. in Statistics and an MBA in Finance from Temple University.
Jianying Hu
Jianying Hu is a Distinguished Research Staff Member and Program Director of the Center for Computational Health at the IBM Thomas J. Watson Research Center. She studied Electrical Engineering at Tsinghua University in China and received her Ph.D. in Computer Science from SUNY Stony Brook in the US in 1993. Prior to joining IBM, she was with Bell Labs (1993 to 2000) and Avaya Labs (2000 to 2002). Dr. Hu’s main research interests include machine learning, data mining, statistical pattern recognition, and signal processing, with applications to healthcare analytics, medical informatics, business analytics, and multimedia content analysis. Most recently, Dr. Hu has been leading a team of researchers at IBM Research developing advanced machine learning and data mining methodologies for deriving data-driven insights to facilitate “learning health systems.” Dr. Hu has published over 100 technical articles and holds 28 patents. She has served as associate editor for IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Image Processing, Pattern Recognition, and the International Journal on Document Analysis and Recognition. She is a fellow of the IEEE (class of 2015) and of the International Association for Pattern Recognition (class of 2010), and she received the Asian American Engineer of the Year award in 2013.
Michael W. Link
“Data Science, Surveys & Enabling Technologies: The New Era of Insight”
Michael W. Link, Ph.D., is President and CEO of Abt SRBI, one of the leading providers of research for government, academic, and commercial clients. He is also a past President (2014–2015) of the American Association for Public Opinion Research (AAPOR). Dr. Link’s research focuses on developing methodologies for confronting the most pressing issues facing measurement and data science, including the use of new technologies such as mobile platforms, social media, and other forms of Big Data for understanding public attitudes and behaviors. Along with several colleagues, he received the AAPOR 2011 Mitofsky Innovator’s Award for his research on address-based sampling. His numerous research articles have appeared in leading scientific journals such as Public Opinion Quarterly, International Journal of Public Opinion Research, and Journal of Official Statistics.
Curtis Smith
"Big Data in the context of the Pharmaceutical Industry"
Curtis Smith is Senior Director, Commercial Innovation, at Janssen Pharmaceuticals, Inc. In that role, he focuses on nurturing emerging capabilities in response to the changing external environment and oversees the development and deployment of critical value-generating analytic capabilities. Curtis has 30 years of experience developing and executing analytic plans to address complex, critical business issues across multiple industries, and extensive expertise in growing and leading Business Analytics functions within organizations. Prior to joining Janssen, he led the analytics function for McKinsey & Company/Henry Rak Consulting Partners. Throughout his career, Curtis has built a track record of translating analytic plans and research into clear business insights and strategic direction, resulting in significant business transformation. Curtis holds a Bachelor of Arts degree in Psychology from Bucknell University and a Master of Arts degree in applied research and evaluation in Psychology from Hofstra University. He is currently a student in the Temple Fox School of Business DBA program, where he is researching the role of predictive analytics in the team decision-making process.
Daniel Sui
"Harnessing the Data Revolution: Perspectives from the U.S. National Science Foundation"
Daniel Sui is an Arts and Sciences Distinguished Professor and Professor of Geography, Public Affairs, Public Health, and Urban/Regional Planning at the Ohio State University (OSU). Since July 2016, he has been on an IPA assignment serving as the Division Director for Social and Economic Sciences (SES), Directorate for Social, Behavioral, and Economic Sciences (SBE), at the U.S. National Science Foundation. Prior to his current appointment, Sui served as Chair of Geography (2011–2015) and as Director of the Center for Urban & Regional Analysis (CURA) (2009–2012) at OSU. Before joining the OSU faculty in July 2009, he was a professor of geography (1993–2009) and holder of the Reta A. Haynes endowed chair (2001–2009) at Texas A&M University. He holds a B.S. (1986) and an M.S. (1989) from Peking University and a Ph.D. (1993) from the University of Georgia. His current research interests include GIScience and cartographic theory, development of smart cities, location-based social media, open/alternative GIS, the Deep Web/Darknet, and legal/ethical issues of using geospatial technology in society. Sui was a 2009 Guggenheim Fellow, the 2006 winner of the Michael Breheny Prize for best paper in Environment and Planning, and the 2014 recipient of the distinguished scholar award from the Association of American Geographers. Sui was also the 2015 Public Policy Scholar in residence at the Woodrow Wilson International Center for Scholars. He served on the U.S. National Mapping Science Committee for two terms (2007–2013) and currently serves as editor-in-chief of GeoJournal.

Poster Session

Presenter: Sudhir Kumar
Authors: Scheinfeldt L, Patel R, Lanham T, Kumar S.
Title: Thousands of adaptive polymorphisms in human proteins and their prevalence in disease.
Abstract: Hundreds of thousands of polymorphic missense mutations have been discovered in the human genome. However, fewer than 100 of these polymorphisms are known to carry signatures of positive selection. If true, adaptation through protein sequence changes would be an extremely rare phenomenon. Through an evolutionary time-series analysis, which uses between-species differences to generate neutral expectations and discovers candidate adaptive polymorphisms (CAPs) based on the discordance between neutral expectations and observed allele frequencies, we have discovered more than 18,000 missense CAPs. Using available genome-wide association data, we have validated more than 2,000 CAPs as being involved in phenotypic traits. We have therefore identified 20 times more bona fide CAPs than were previously known. Our new multidimensional approach integrates inter-species, intra-species, and phenotypic information in one framework to discover adaptations that have shaped human phenotypic variability and the disease landscape.
Presenter: Sudhir Kumar
Authors: Kumar S, Gomez K, Murrillo O, Miura S.
Title: Inferring clonal sequences and phylogenies from personal tumor genome profiles.
Abstract: Tumor progression involves the evolution of clone cells that arise through cell division, mutate, and spread in tumors. Genetic profiling has revealed extensive variation in tumor samples from individual patients; however, existing methods have difficulty estimating clone genotypes and clone frequencies from heterogeneous tumor samples. We show that the clone genotypes present in each tumor sample can be accurately inferred from the analysis of multiple tumor sample profiles with our new CloneFinder method. CloneFinder uses molecular evolutionary principles to deduce clone genotypes at single-nucleotide resolution, along with the frequency of each clone in each sample. It performs better than existing methods and scales to the analysis of a large number of samples. Applying CloneFinder to more than 60 empirical datasets revealed that a vast majority of tumor samples contain at least one ancestral clone, that ancestral clones occur at high frequencies, and that many tumors are a mixture of very early and recently evolved clones. Our method therefore provides new insights into the clonal structure of tumors and their evolution within a patient's lifetime.
Presenter: Yiran Li
Authors: Yiran Li, Heidi Grunwald, Carole A Tucker, and Li Bai
Title: SMART System: Survey and Measurement using Avatar and Robotic Technology
Abstract: Interactive technologies such as humanoid robotic systems and computer avatars have recently emerged in healthcare and education settings as platforms for teaching and training, for care, and for social interaction. The objective of our Survey and Measurement using Avatar and Robotic Technology (SMART) platform is to improve robotic survey methodology and data science-based analytics that support patient-reported outcome (PRO) assessment in individuals with socioemotional or communication limitations that preclude typical survey administration methods such as paper and pencil or computer. The data science aspect of our project involves building a scalable platform architecture and integrating and analyzing a large data set consisting of multiple streams (video, audio, Microsoft Band) of survey responses, survey paradata (e.g., response time, predictable missingness, or patterned responses), environmental data (e.g., ambient light, sound, persons present), behavioral data (e.g., physical proximity to and gaze toward the robot, and facial expression for emotion recognition), and physiologic data (heart rate, motion, galvanic skin response), in order to develop feature detection and analytics that reduce administration burden and improve the scoring reliability of PROs. Physiological sensor data from respondents will be meshed with the real-time video and environmental data from the humanoid robot/avatar and cross-referenced with the testing database to adaptively administer PROs.
Presenter: Avrum Gillespie
Authors: Gillespie A, Fink EL, Traino H, Uversky A, Bass SB, Greener J, Hunt J, Browne T, Hammer H, Reese P, Obradovic Z.
Title: Hemodialysis clinic social networks, sex differences, and renal transplantation.
Abstract: This study describes and characterizes the formation of patient social networks within a new hemodialysis clinic and models the association between social network participation and kidney transplantation. Transplant-eligible women participated in the network less often than men, but women who participated discussed their health more often. Social network analyses showed that patients completed more steps in the transplant workup, and were transplanted, if their network had a higher clustering coefficient and the network members also completed more steps. The hemodialysis clinic patient social network had a net positive effect on completion of transplant steps, and sex differences in network participation may contribute to sex disparities in the kidney transplantation process.
Presenter: Daniele Granata
Authors: Daniele Granata and Vincenzo Carnevale
Title: Accurate Estimation of the Intrinsic Dimension Using Graph Distances: Unraveling the Geometric Complexity of Datasets
Abstract: The collective behavior of a large number of degrees of freedom can often be described by a handful of variables. This observation justifies the use of dimensionality reduction approaches to model complex systems and motivates the search for a small set of relevant “collective” variables. Here, we analyze this issue by focusing on the optimal number of variables needed to capture the salient features of a generic dataset, and we develop a novel estimator for the intrinsic dimension (ID). By approximating geodesics with minimum-distance paths on a graph, we analyze the distribution of pairwise distances around its maximum and exploit its dependence on dimensionality to obtain an ID estimate. We show that the estimator does not depend on the shape of the intrinsic manifold and is highly accurate, even for exceedingly small sample sizes. We apply the method to several relevant datasets from image recognition databases and protein multiple sequence alignments, and discuss possible interpretations of the estimated dimension in light of the correlations among input variables and the information content of the dataset. The code is available online. The results are published in:
  • Granata D, Carnevale V. Accurate estimation of the intrinsic dimension using graph distances: unraveling the geometric complexity of datasets. Scientific Reports. 2016 Aug 11;6:31377. doi:10.1038/srep31377.

Presenter: Vincenzo Carnevale
Authors: Daniele Granata and Vincenzo Carnevale
Title: "Unveiling hierarchical community structures from clustering of correlation patterns: the case of protein evolutionary domains."
Abstract: The combination of constantly growing sequence databases and recent methodological developments in bioinformatics now enables prediction of protein tertiary structure with a remarkable degree of accuracy. These methods exploit the information contained in large multiple sequence alignments, focusing in particular on patterns of pairwise correlations to reconstruct the network of residue-residue contacts. Less explored, yet extremely intriguing, is the functional relevance of such coevolving networks: do they collectively encode well-defined structural dynamics? Here, by combining coevolutionary coupling analysis with a state-of-the-art dimensionality reduction approach, we investigate how dynamical and functional properties can be inferred from sequence alone. We show that the network of pairwise evolutionary couplings exhibits a striking clustering tendency in all the analyzed protein families. These clusters, which we term evolutionary domains, reveal a community structure in the networks of couplings that reflects the structural organization of proteins. In particular, the hierarchy of evolutionary domains identified by the method, which can be defined at different spatial resolutions, parallels the multi-scale nature of protein function. We systematically explore a sequence database of 800 families to benchmark the structural compactness of these subdivisions and their strong correlation with dynamical, quasi-rigid domains. We demonstrate that the algorithm is robust with respect to dataset size, providing consistent results with as few as a few hundred sequences. Finally, we apply the method in a comparative context to the relevant case of the 6TM ion channel family, recapitulating the major functional traits that distinguish three subfamilies. The decomposition tool is available as a software package and web server.
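The abstract describes grouping residues whose coevolutionary couplings follow similar patterns, at several resolutions. As a rough stand-in (the authors' dimensionality reduction step is not reproduced here), the sketch below assumes a precomputed residue-by-residue coupling matrix `C`, compares residues by the correlation distance between their coupling profiles, and cuts a single average-linkage tree at several cluster counts to mimic domains at different resolutions. The function name and the `levels` parameter are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def coupling_domains(C, levels=(2, 4, 8)):
    """Cluster residues by the similarity of their coupling profiles (rows of C).

    Returns {number_of_domains: label array}, one partition per resolution,
    all cut from the same tree so coarser domains nest the finer ones.
    """
    C = np.asarray(C, dtype=float)
    C = (C + C.T) / 2.0  # couplings are treated as symmetric
    # 1 - Pearson correlation between coupling profiles; clip tiny negative round-off
    d = np.clip(pdist(C, metric="correlation"), 0.0, None)
    Z = linkage(d, method="average")
    return {k: fcluster(Z, t=k, criterion="maxclust") for k in levels}
```

Cutting one tree at several sizes, rather than clustering independently at each size, keeps the resulting domains hierarchical, which is the property the abstract emphasizes.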
Presenter: Jeremy Mennis
Authors: Jeremy Mennis and Andrea Ambrus
Title: Correlation between Urban Vegetation and Adolescent Stress in Richmond, Virginia
Abstract: Psychological stress is associated with a host of negative health outcomes, including cardiovascular disease and mental disorders. Evidence indicates that neighborhood conditions can influence stress: living in neighborhoods with high crime and violence can cause stress, whereas exposure to vegetated areas, i.e., ‘greenspace,’ can reduce stress. This study investigates whether exposure to greenspace throughout the activity space of daily life is associated with lower levels of stress among a sample of urban youth in Richmond, Virginia. Georeferenced, momentary data on stress were collected over a two-year period using Global Positioning System (GPS)-enabled Ecological Momentary Assessment (EMA), a technique that delivers surveys via mobile phone to capture in-situ perceptions, behaviors, and social interactions. Greenspace was measured by processing Landsat ETM+ imagery to derive the Normalized Difference Vegetation Index (NDVI). Preliminary results indicate that youth with higher exposure to urban greenspace in their activity space experienced lower levels of stress. This study is the result of joint work with Andrea Ambrus.
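NDVI is computed per pixel from red and near-infrared reflectance as (NIR - Red) / (NIR + Red); for Landsat ETM+ these are bands 3 and 4. A minimal sketch, assuming calibrated reflectance arrays and assuming GPS fixes have already been mapped to raster (row, col) pixels. The exposure helper is hypothetical: it simply averages NDVI over the pixels where a participant's EMA surveys were recorded, which is only one of many possible activity-space exposure measures.

```python
import numpy as np


def ndvi(red, nir):
    """Normalized Difference Vegetation Index; NaN where both bands are zero."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def activity_space_exposure(ndvi_grid, pixels):
    """Mean NDVI over the (row, col) pixels where a participant's surveys were located."""
    rows, cols = zip(*pixels)
    return float(np.nanmean(ndvi_grid[list(rows), list(cols)]))
```

NDVI ranges from -1 to 1, with dense, healthy vegetation giving high positive values, so a higher mean over a participant's locations corresponds to greater greenspace exposure.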
Presenter: Rob Kulathinal
Authors: Rob Kulathinal, Zoran Obradovic, Sunil Wattal, Youngjin Yoo
Title: Multi-level predictive analytics and motif discovery across large dynamic spatiotemporal networks and in complex socio-technical systems: An organizational genetics approach
Abstract: We are developing a method to predict the emergence of system-level behaviors by analyzing large volumes of digital trace data and using evolutionary social ontology to build a multi-level model of complex socio-technical systems. We use analytical techniques developed in evolutionary genetics and systems biology to: (1) characterize a stream of digital trace data from a complex socio-technical system with finite genetic elements, (2) predict the behavior of socio-technical systems based on the pattern of “behavioral gene” interactions, and (3) explore the impact of mutational input, gene flow, and recombination in “behavioral genes” on the evolution of socio-technical systems. We test our model on GitHub, one of the largest open-source communities, with over 5 million software development projects, and on Twitter, one of the largest social media sites, with over 500 million messages per day. We will generalize our model to other types of massive digital trace data, including sensor data from the Internet of Things and mobile user data. Our work will also allow scholars in different fields to study the emergence of complex system behavior through the interaction of low-level events.
Presenter: Jelena Stojanovic
Authors: Jelena Stojanovic, Djordje Gligorijevic and Zoran Obradovic
Title: Modeling Qualitative and Quantitative Knowledge in Health Informatics
Abstract: Data-driven phenotype analyses of Electronic Health Record (EHR) data have recently benefited many areas of clinical practice, uncovering new links in the medical sciences that could affect the well-being of millions of patients. In our studies, EHR data are used to:
  • discover novel relationships between diseases by studying their comorbidities,
  • discover novel disease-gene associations by including domain knowledge from genome-wide association studies,
  • predict important parameters of healthcare quality, and
  • discover latent disease subtypes
Our models outperformed the baseline models on all tasks, indicating strong potential for improving the quality of healthcare. Reported results are published in:
  • Gligorijevic, Dj., Stojanovic, J., Djuric, N., Radosavljevic, V., Grbovic, M., Kulathinal, R., Obradovic, Z. (2016) “Large-Scale Discovery of Disease-Disease and Disease-Gene Associations,” Scientific Reports, Aug. 31, 6:32404, doi: 10.1038/srep32404.
  • Gligorijevic, Dj., Stojanovic, J., Obradovic, Z. (2016) “Disease Types Discovery from a Large Database of Inpatient Records: A Sepsis Study,” Methods, Jul. 28, S1046-2023(16)30232-8, doi: 10.1016/j.ymeth.2016.07.021.
  • Stojanovic, J., Gligorijevic, Dj., Radosavljevic, V., Djuric, N., Grbovic, M., Obradovic, Z. (2016) “Modeling Healthcare Quality via Compact Representations of Electronic Health Records,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, Jul. 14, doi: 10.1109/TCBB.2016.2591523.
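One common way to quantify comorbidity between two diseases is the relative risk RR = C_ij · N / (P_i · P_j), where N is the number of patients, P_i and P_j are the numbers of patients diagnosed with each disease, and C_ij is the number diagnosed with both; RR > 1 means the pair co-occurs more often than expected by chance. The abstract does not say which comorbidity measure the authors use, so this is an illustrative sketch only, assuming each patient record has been reduced to a set of diagnosis codes; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def comorbidity_relative_risk(patients):
    """Relative risk for every disease pair observed together in at least one patient.

    patients: list of sets of diagnosis codes, one set per patient.
    Returns {(code_a, code_b): RR} with code_a < code_b.
    """
    n = len(patients)
    prevalence = Counter()  # P_i: patients diagnosed with disease i
    co = Counter()          # C_ij: patients diagnosed with both i and j
    for dx in patients:
        prevalence.update(dx)
        co.update(combinations(sorted(dx), 2))
    return {(a, b): c * n / (prevalence[a] * prevalence[b]) for (a, b), c in co.items()}
```

Pairs with RR well above 1 are candidate disease-disease links; in practice a significance test or minimum-count threshold would be needed before treating them as novel relationships.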

Presenter: Ivan Stojkovic
Authors: Ivan Stojkovic and Zoran Obradovic
Title: Distance Based Modeling of Interactions in Structured Regression
Abstract: Graphical models, as applied to multi-target prediction problems, commonly use interaction terms to impose structure among the output variables. Often, such structure is based on the assumption that related outputs need to be similar, and interaction terms that force them closer together are adopted. Here we relax that assumption and propose a distance-based feature that can adapt so that related variables have either smaller or larger differences in value. We used a Gaussian Conditional Random Field (GCRF) model and extended its originally proposed interaction potential to include a distance term. The extended model is compared to the baseline in various structured regression setups. An increase in predictive accuracy was observed on both synthetic examples and real-world applications, including challenging tasks from the climate and healthcare domains. Reported results are published in:
  • Stojkovic, I., Jelisavcic, V., Milosavljevic, V., Obradovic, Z. (2016) “Distance Based Modeling of Interactions in Structured Regression,” 25th International Joint Conference on Artificial Intelligence (IJCAI), New York, NY, July 2016, pp. 2032-2038.
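For reference, a common form of the baseline GCRF that the abstract extends scores outputs y with the energy α Σ_i (y_i - R_i)² + β Σ_{i<j} S_ij (y_i - y_j)², where R_i is an unstructured prediction for output i and S_ij a similarity between outputs i and j. Setting the gradient to zero gives the prediction (αI + βL)y = αR, with L = diag(S·1) - S the graph Laplacian. The sketch below implements only this baseline with a single predictor and a single similarity matrix; the paper's distance-based interaction term is not included, and the function name is hypothetical.

```python
import numpy as np


def gcrf_mean(R, S, alpha, beta):
    """Most likely outputs of a basic GCRF (baseline model only, no distance term).

    Energy: alpha * sum_i (y_i - R_i)**2 + beta * sum_{i<j} S_ij * (y_i - y_j)**2.
    Its minimizer solves (alpha * I + beta * L) y = alpha * R, where L is the
    graph Laplacian of S. Requires alpha > 0 so the system is positive definite.
    """
    R = np.asarray(R, dtype=float)
    S = np.asarray(S, dtype=float)
    S = (S + S.T) / 2.0  # the energy assumes symmetric similarities
    L = np.diag(S.sum(axis=1)) - S
    Q = alpha * np.eye(len(R)) + beta * L
    return np.linalg.solve(Q, alpha * R)
```

With β = 0 the prediction reduces to R; increasing β pulls similar outputs together, which is exactly the "related outputs must be similar" assumption the abstract relaxes.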

Presenter: Djordje Gligorijevic
Authors: Djordje Gligorijevic, Jelena Stojanovic and Zoran Obradovic
Title: Uncertainty Propagation in Long-term Structured Regression on Evolving Networks
Abstract: In long-term forecasting it is important to estimate the confidence of predictions, as they are often affected by errors that accumulate over the prediction horizon. To address this problem, this study develops a novel iterative method for Gaussian structured learning models that propagates uncertainty in temporal graphs by modeling noisy inputs. The proposed method is applied to three long-term (up to 8 years ahead) structured regression problems on real-world evolving networks from the health and climate domains. The empirical results and use-case analysis provide evidence that the new approach propagates uncertainty better than published alternatives. Reported results are published in:
  • Gligorijevic, Dj., Stojanovic, J., Obradovic, Z. (2016) “Uncertainty Propagation in Long-term Structured Regression on Evolving Networks,” Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, AZ, February 2016, pp. 1603-1610.
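Why uncertainty grows with the horizon can be seen in the simplest case: a linear one-step model y_{t+1} = a·y_t + b + ε with ε ~ N(0, σ²), applied iteratively to its own predictions. If the input y_t is itself uncertain with variance v_t, the next-step variance is v_{t+1} = a²·v_t + σ², whereas a method that treats predicted inputs as exact would report σ² at every step. This toy scalar model is an illustration only, not the authors' method for Gaussian structured models; the function name is hypothetical.

```python
def iterate_forecast(y0, a, b, noise_var, horizon):
    """Iterate y_{t+1} = a*y_t + b + eps, eps ~ N(0, noise_var), from a known y0.

    Returns (mean, variance) for steps 1..horizon. The recursion
    var_{t+1} = a**2 * var_t + noise_var carries the uncertainty of each
    predicted input forward instead of treating it as exact.
    """
    mean, var = float(y0), 0.0
    path = []
    for _ in range(horizon):
        mean = a * mean + b
        var = a * a * var + noise_var
        path.append((mean, var))
    return path
```

With a = 1 the variance grows linearly with the horizon; with |a| < 1 it levels off at σ² / (1 - a²), so the accumulated error depends on the dynamics as well as on the one-step noise.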

Presenter: Jesse Glass
Authors: Jesse Glass, Peter Jones and Zoran Obradovic
Title: Personalized Dynamic Models for Early Identification of Academically At-Risk Students
Abstract: Our objective is to develop a dynamic framework based on data mining technology for early, accurate, and easy-to-interpret identification of academically at-risk students. The proposed model combines existing cross-sectional data with an array of temporal student behavior data that reflect changes in risk, such as student activities related to financial aid and housing, student ID card swipes, access to university online systems, visits to student support centers, and academic advising. The study focuses on predicting dropout for multiple cohorts of Temple University students.
Presenter: Robert J. Levis
Authors: Fengjian Shi, Rachel Parise, Evan M. Lutton, Servio Ramirez and Robert J. Levis
Title: Discovery of Biomarkers for Traumatic Brain Injury using Laser Electrospray Mass Spectrometry
Abstract: Traumatic brain injury (TBI) is a complex injury involving multiple physiological and biochemical alterations to tissue. The potentially thousands of relevant biomarkers, spread over a volume of thousands of mm³, make spatial analysis of the brain a big data problem. Laser electrospray mass spectrometry (LEMS) is an ambient mass spectrometry technique that merges femtosecond laser vaporization with electrospray ionization and time-of-flight detection, and it has been developed as a new chemical microscopy tool. In this study, LEMS was used to image a TBI mouse brain sample and assess the spatial distribution of biomarkers after trauma. The imaging experiment rastered a 100 µm laser spot over a 4 × 4 mm area of a mild TBI brain tissue section, generating 100 GB of data. Species identified by mass spectra were spatially mapped and compared with corresponding optical images of the brain sample. A present challenge for the method is developing data mining techniques to identify relevant biomarkers for TBI.
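The data-volume figure above can be put in perspective with a quick estimate, assuming the 100 µm spot is rastered with a 100 µm step (no overlap between spots) and that 1 GB = 10^9 bytes; neither assumption is stated in the abstract:

```latex
N_{\text{pixels}} = \left(\frac{4\ \text{mm}}{100\ \mu\text{m}}\right)^{2} = 40^{2} = 1600,
\qquad
\frac{100\ \text{GB}}{1600\ \text{pixels}} = 62.5\ \text{MB per pixel}.
```

Under these assumptions each pixel carries tens of megabytes of spectral data, which is why identifying biomarkers calls for data mining rather than manual inspection.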
Presenter: Eduard Dragut
Title: Streaming Architecture for Continuous Entity Linking in Social Media
Abstract: A large fraction of the ever-growing internet content is found in social media. Users turn to it both to form and to share their opinions about events and people, election preferences, and product and brand recommendations. This situation creates opportunities to add layers of data mining and analysis of users' views on developing events, products, services, or government actions; at the same time, it raises challenges for Entity Linking (EL) in social media. EL is the task of linking an extracted mention to a specific definition of the entity, usually a pointer to a Web page that defines it. The goals of this project are to investigate algorithms that detect, in near real time, the pieces of text in messages that reference entities and the Web pages that describe those entities, and to link entity references to Web pages and across microblog systems, so that a broader, more complete characterization of each entity is generated automatically. The proposed approaches are based on techniques that include incremental, iterative message analysis; smart indexing with live updates to support fast incremental detection of entity references; computationally light soft-clustering of messages to improve entity reference detection; and fast incremental K-partite graph clustering. This work will benefit multiple segments of society that rely on applications using data from microblog systems, such as targeted monitoring of Twitter and Facebook to collect and understand users' opinions about a recent product or world event, data mining for early crisis detection and response, and national security.
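The incremental indexing and mention-detection steps can be illustrated with a toy dictionary-based linker. This is a hypothetical sketch, not the project's algorithm: it assumes whitespace tokenization with no punctuation handling and a greedy longest-match rule over n-grams. Newly added aliases become detectable immediately, loosely mimicking the "live updates" described above. The class and method names and the example URLs are hypothetical.

```python
from collections import defaultdict


class MentionIndex:
    """Toy incremental index from surface forms to candidate entity URLs."""

    def __init__(self, max_ngram=3):
        self.max_ngram = max_ngram
        self.forms = defaultdict(set)  # lowercased surface form -> set of URLs

    def add(self, surface, url):
        """Register an alias; it is usable by the next call to link()."""
        self.forms[surface.lower()].add(url)

    def link(self, message):
        """Greedy longest-match detection of known mentions in one message."""
        tokens = message.lower().split()
        i, found = 0, []
        while i < len(tokens):
            for n in range(min(self.max_ngram, len(tokens) - i), 0, -1):
                span = " ".join(tokens[i:i + n])
                if span in self.forms:  # membership test does not create keys
                    found.append((span, sorted(self.forms[span])))
                    i += n
                    break
            else:
                i += 1  # no known mention starts at this token
        return found
```

For example, after `idx.add("Temple University", "https://example.org/temple-university")` and `idx.add("Temple", "https://example.org/temple")`, the message "Visiting Temple University today" links to the longer, more specific alias. A real system would add disambiguation among multiple candidate URLs, which this sketch leaves out.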

Presenter: Victor H Gutierrez
Authors: Victor H Gutierrez-Velez, Kevin Henry, Daniel Wiese, Ananias Escalante, Deborah Nelson
Title: Understanding social and environmental risks for the transmission of mosquito-borne diseases in urban areas of Southeastern Pennsylvania

Abstract: Mosquito-transmitted diseases commonly associated with tropical areas, such as Zika, West Nile virus, dengue, and chikungunya, are becoming an increasing concern in North America and at other higher latitudes as the global flow of people and goods increases, cities grow, and weather patterns change. Aedes albopictus is a viable vector for the transmission of these diseases. This mosquito has spread into urban areas, contributing to the emergence of atypical disease patterns. This project uses machine-learning methods and parametric statistics, combining ground mosquito sampling points with remotely sensed climatic, land use, and vegetation condition data along with census and behavioral surveys, to understand the social, climatic, and environmental conditions that explain the presence and distribution of Aedes albopictus in urban areas of Southeastern Pennsylvania. Exploratory results suggest that winter temperature preceding the mosquito season has the strongest influence on the incidence of Aedes albopictus. Mosquito presence in urban areas relates positively to a higher density of impervious surfaces and negatively to vegetation presence. A higher density of houses in very good condition relates negatively to mosquito occurrence. Our findings can contribute to implementing early warning and geographic information systems that anticipate and mitigate the incidence of mosquito-borne diseases in the US and abroad.