CS Colloquium: Dr. Brian Milch (Google) - Combining Probabilistic and Neural Approaches for Text Classification
Tue, Nov 14, 2017 @ 05:00 PM - 06:20 PM
Conferences, Lectures, & Seminars
Speaker: Dr. Brian Milch, Google
Talk Title: Combining Probabilistic and Neural Approaches for Text Classification
Series: CS Colloquium
Abstract: This lecture satisfies requirements for CSCI 591: Research Colloquium.
In the Semantic Signals group at Google Los Angeles, we build classifiers that label text with hundreds of human-defined categories across dozens of languages. Labeled training data is scarce, so we've found it essential to incorporate unsupervised learning methods that take advantage of unlabeled text. One of our tools is a probabilistic topic model that learns discrete "clusters" to explain word co-occurrence patterns in a large corpus, and then identifies the clusters that best explain a new document. Another tool is a neural net that learns embeddings of individual words in a continuous space. I'll discuss how these approaches play complementary roles in our text classification pipeline.
Biography: Brian Milch is a software engineer at Google's Los Angeles office. He received a B.S. in Symbolic Systems from Stanford University in 2000, and a Ph.D. in Computer Science from U.C. Berkeley in 2006. He then spent two years as a post-doctoral researcher at MIT before joining Google in 2008. He has contributed to Google production systems for spelling correction, transliteration, and semantic modeling of text.
Host: Fei Sha
Location: Seeley G. Mudd Building (SGM) - 124
Audiences: Everyone Is Invited
Contact: Computer Science Department