Thomas Lord Department of Computer Science: Distinguished Lecture Series feat. Dr. Mohit Bansal
Thu, Apr 18, 2024 @ 02:00 PM - 04:15 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Dr. Mohit Bansal, John R. & Louise S. Parker Distinguished Professor, UNC Chapel Hill
Talk Title: Multimodal Generative LLMs: Unification, Interpretability, Evaluation
Abstract: In this talk, I will present our journey of large-scale multimodal pretrained (generative) models across various modalities (text, images, videos, audio, layouts, etc.) and enhancing their important aspects such as unification (for generalizability, shared knowledge, and efficiency), interpretable programming/planning (for controllability and faithfulness), and evaluation (of fine-grained skills, faithfulness, and social biases). We will start by discussing early cross-modal vision-and-language pretraining models (LXMERT). We will then look at early unified models (VL-T5) to combine several multimodal tasks (such as visual QA, referring expression comprehension, visual entailment, visual commonsense reasoning, captioning, and multimodal translation) by treating all tasks as text generation. We will next look at recent, progressively more unified models (with joint objectives and architecture, as well as newer unified modalities during encoding and decoding) such as textless video-audio transformers (TVLT), vision-text-layout transformers for universal document processing (UDOP), and interactive, interleaved, composable any-to-any text-audio-image-video multimodal generation (CoDi, CoDi-2). Second, we will discuss interpretable and controllable multimodal generation (to improve faithfulness) via LLM-based planning and programming, such as layout-controllable image generation via visual programming (VPGen), consistent multi-scene video generation via LLM-guided planning (VideoDirectorGPT), open-domain, open-platform diagram generation (DiagrammerGPT), and LLM-based adaptive environment generation for training embodied agents (EnvGen). I will conclude with important faithfulness and bias evaluation aspects of multimodal generation models, based on fine-grained skill and social bias evaluation (DALL-Eval), interpretable and explainable visual programs (VPEval), as well as reliable fine-grained evaluation via Davidsonian semantics based scene graphs (DSG).
Please RSVP by Monday, April 15, 2024 (5:00 p.m. PT): https://forms.gle/shymnJc87y5fHFJaA
This lecture satisfies requirements for CSCI 591: Research Colloquium.
Biography: Dr. Mohit Bansal is the John R. & Louise S. Parker Distinguished Professor and the Director of the MURGe-Lab (UNC-NLP Group) in the Computer Science department at UNC Chapel Hill. He received his PhD from UC Berkeley in 2013 and his BTech from IIT Kanpur in 2008. His research expertise is in natural language processing and multimodal machine learning, with a particular focus on multimodal generative models, grounded and embodied semantics, faithful language generation, and interpretable, efficient, and generalizable deep learning. He is a recipient of the IIT Kanpur Young Alumnus Award, DARPA Director's Fellowship, NSF CAREER Award, Google Focused Research Award, Microsoft Investigator Fellowship, Army Young Investigator Award (YIP), DARPA Young Faculty Award (YFA), and outstanding paper awards at ACL, CVPR, EACL, COLING, and CoNLL. He has been a keynote speaker at the AACL 2023, CoNLL 2023, and INLG 2022 conferences. His service roles include EMNLP and CoNLL Program Co-Chair, ACL Executive Committee, ACM Doctoral Dissertation Award Committee, ACL Americas Sponsorship Co-Chair, and Associate/Action Editor for the TACL, CL, IEEE/ACM TASLP, and CSL journals. Webpage: https://www.cs.unc.edu/~mbansal/
Host: USC Thomas Lord Department of Computer Science
More Info: https://forms.gle/shymnJc87y5fHFJaA
Location: Seeley G. Mudd Building (SGM) - 124
Audiences: Everyone Is Invited
Contact: Thomas Lord Department of Computer Science
Event Link: https://forms.gle/shymnJc87y5fHFJaA