Data science is an emerging discipline that develops solutions to applied business problems by combining a rigorous approach rooted in statistical and mathematical reasoning with modern computational tools and distributed database solutions to carry out analytics in real time and at massive scale.

Scholars and practitioners at the Data Science Institute develop new lines of scientific inquiry, serve as a brain trust for the Greater Philadelphia Metropolitan Area, and function as a bridge for translating research and ideas into knowledge, products, and best practices for industry, government, and society at large.

Featured Research Projects

The following are just a few of the institute's ongoing projects. For more stories of research impact, sign up to receive On the Verge, the Fox School’s flagship research magazine.

Social Healthcare in Rural Honduras with the Human Nature Lab at Yale University


Despite global progress in many aspects of child health, neonatal mortality remains high in the developing world, accounting for 40% to 50% of deaths among children under the age of 5. Many of these deaths can be prevented through clinical care services and low-cost interventions within family and community settings. Honduras has made considerable progress in improving the health of its population, but it still lags behind much of the Mexican and Central American region.

To help advance reproductive, maternal, neonatal, and child health (RMNCH) outcomes in Honduras, Edoardo Airoldi and his research team, in collaboration with the Human Nature Lab led by Nicholas A. Christakis, created a randomized controlled trial (RCT) that uses social network targeting in Honduras, an offline setting, to implement sustainable community behavioral change. The research provides an approach for targeting randomly selected individuals as a way to influence their friends through interventions aimed at a small segment of the community, and for aligning this method with gold-standard social network data. The method has the potential to initiate positive change in RMNCH outcomes and to affect one hundred percent of the population in Honduras. With billions of dollars spent each year on attempts to change behavior in at-risk populations, this approach gives practitioners a toolkit for creating sustainable and substantive impact in field settings.


Featured Researcher(s): Edoardo Airoldi

Social Media


Social media is a pervasive force in today’s society, generating a multitude of posts each second. Together these posts form big data that can seem daunting to make sense of but can provide important information, especially for companies. The hashtag, now ubiquitous across social media platforms, has become a way to group conversations and gather similar ideas.

Subodha Kumar and his research team analyzed tweet data from over 100 companies that had trademarked a hashtag, studying the level of engagement (such as likes, comments, and retweets) and the language of each tweet (such as emotion, tone, and style). Based on this study, the researchers advanced research in social media and discovered what makes a trademarked hashtag beneficial for a company, helping companies better understand the value of a trademarked hashtag and the factors that should determine whether to invest in one.


Featured Researcher(s): Subodha Kumar

Healthcare Analytics


Each of us has 20,000 genes in our DNA. How each gene interacts with another can provide insight into a person’s health, and there are up to 55 million pairs to consider. That’s a lot of variables. Zhigen Zhao, associate professor of Statistical Science at the Fox School, and his statistician colleagues invent new ways to analyze these millions of data points and create models that identify the associations and patterns that benefit healthcare.

For example, Zhao and his colleagues invented a methodology that can analyze all of these variables in seconds, identifying certain combinations of genes that may help doctors understand the implications of medical issues ranging from heart disease and Alzheimer’s to obesity and alcoholism.


Featured Researcher(s): Zhigen Zhao

Social Network Analytics


The study of social networks is a new but quickly broadening multidisciplinary area involving the social, mathematical, statistical, and computer sciences. Much of the world is structured as a network, from organisms to organizations to economies. The social network perspective approaches a set of phenomena or data by focusing on the relationships among social entities, and it is an important addition to standard social and behavioral research.

Paul Pavlou and his research team have employed social network analysis to further research in technology diffusion, which has been an important challenge for Information Systems research. Their research determines how people are influenced by peers in the diffusion of technology, specifically in cellphone use and adoption, and advances the field by providing a theoretical and empirical model for studying large-scale network data. As social media and online social communities continue to emerge and advance, research models of this kind will enable researchers and practitioners to solve a variety of problems using large-scale data.


Featured Researcher(s): Edoardo Airoldi, Xua Bai