CS Colloquium: Jay Pujara (University of Southern California) - Probabilistic Models for Large, Noisy, and Dynamic Data
Thu, Jan 25, 2018 @ 11:00 AM - 12:20 PM
Conferences, Lectures, & Seminars
Speaker: Jay Pujara, University of Southern California
Talk Title: Probabilistic Models for Large, Noisy, and Dynamic Data
Series: Computer Science Colloquium
Abstract: We inhabit a vast, uncertain, and dynamic universe. To succeed in such an environment, artificial intelligence approaches must handle massive amounts of noisy, changing evidence. My research addresses the problems of building scalable, probabilistic models amenable to online updates. To illustrate the potential of such models, I present my work on knowledge graph identification, which jointly resolves the entities, attributes, and relationships in a knowledge graph by combining statistical NLP signals and semantic constraints. Using probabilistic soft logic, a statistical relational learning framework I helped develop, I demonstrate how knowledge graph identification can scale to millions of uncertain candidate facts and tens of millions of semantic dependencies in real-world data while achieving state-of-the-art performance. My work further extends this scalability by adopting a distributed computing approach, reducing the inference time of knowledge graph identification from two hours to ten minutes. Updating large, collective models like those used for knowledge graphs with new information poses a significant challenge. I develop a regret bound for probabilistic models and use this bound to motivate practical algorithms that support low-regret updates while improving inference time by over 65%. Finally, I highlight several active projects in causal explanation, sustainability, bioinformatics, and mobile analytics that provide a promising foundation for future research.
This lecture satisfies requirements for CSCI 591: Research Colloquium. Please note that, due to limited capacity in OHE 100D, seating will be first come, first served.
Biography: Jay Pujara is a research scientist at the University of Southern California's Information Sciences Institute whose principal areas of research are machine learning, artificial intelligence, and data science. He completed a postdoc at UC Santa Cruz, earned his PhD at the University of Maryland, College Park and received his MS and BS at Carnegie Mellon University. Prior to his PhD, Jay spent six years at Yahoo! working on mail spam detection, user trust, and contextual mail experiences, and he has also worked at Google, LinkedIn and Oracle. Jay is the author of over thirty peer-reviewed publications and has received three best paper awards for his work. He is a recognized authority on knowledge graphs, and has organized the Automatic Knowledge Base Construction (AKBC) and Statistical Relational AI (StaRAI) workshops, has presented tutorials on knowledge graph construction at AAAI and WSDM, and has had his work featured in AI Magazine. For more information, visit https://www.jaypujara.org
Host: Stefan Scherer
Location: Olin Hall of Engineering (OHE) - 100D
Audiences: Everyone Is Invited
Contact: Computer Science Department