Thu, Oct 15, 2020 @ 11:00 AM - 12:00 PM
Information Sciences Institute
Conferences, Lectures, & Seminars
Speaker: Wei Xu, Georgia Tech
Talk Title: Natural Language Understanding for Noisy Text
Series: NL Seminar
Abstract: In this talk, I will present some of our recent work on understanding the meaning of user-generated text and extracting useful information from it. First, I will discuss the design of neural pairwise ranking models and their application to the semantic analysis of hashtags. Our best ranking model, which incorporates multi-task learning and Gaussian feature vectorization, can segment hashtags into meaningful word sequences (for example, "dtlaartsdistrict" into "DTLA Arts District") with over 95% accuracy. Second, I will highlight the importance of training customized BERT models for noisy text and zero-shot transfer learning. I will present two case studies: (1) BERTOverflow, a model we trained on in-domain data that significantly outperforms off-the-shelf BERT on the new StackOverflow NER corpus; and (2) GigaBERT, a bilingual BERT we developed specifically for English and Arabic, which performs better than Google's multilingual BERT and Facebook's XLM-RoBERTa on cross-lingual information extraction. I will conclude with our new work on annotating data and training automatic models to extract COVID-19-related events from Twitter.
Biography: Wei Xu is an assistant professor in the School of Interactive Computing at the Georgia Institute of Technology. Before joining Georgia Tech, she had been an assistant professor at The Ohio State University since 2016. Her research interests are in natural language processing, machine learning, and social media. Her recent work focuses on language generation, semantics, information extraction, and reading assistive technology. She has received the NSF CRII Award, a Best Paper Award at COLING, the CrowdFlower AI for Everyone Award, and the Criteo Faculty Research Award. She recently served as a senior area chair for ACL 2020 and as an area chair, workshop chair, and publicity chair for EMNLP and NAACL conferences. She has been co-organizing the annual Workshop on Noisy User-generated Text.
Host: Jon May and Mozhdeh Gheini
More Info: https://nlg.isi.edu/nl-seminar/
WebCast Link: https://youtu.be/pr1HGaE5dAE
Audiences: Everyone Is Invited
Contact: Pete Zamar
Event Link: https://nlg.isi.edu/nl-seminar/