-
PhD Dissertation Defense - Zhaoheng Zheng
Thu, Apr 25, 2024 @ 02:00 PM - 04:00 PM
Thomas Lord Department of Computer Science
University Calendar
Title: Incorporating Large-Scale Vision-Language Corpora in Visual Understanding
Committee Members: Ram Nevatia (Chair), Mohammad Soleymani, Keith Jenkins
Date and Time: Thursday, April 25th, 2:00pm - 4:00pm
Abstract: As key mediators of human perception, vision and language corpora act as critical roles in the development of modern Artificial Intelligence (AI). The size of vision-language corpora has scaled up rapidly in recent years, from thousands to billions, enabling the creation of large foundation models. However, as an emerging concept, there are a series of problems yet to be explored.
We start with a study of compositional learning from pre-VLM times to the post-VLM era. We introduce a representation blending approach that creates robust features for compositional image classification and a two-stream architecture that tackles the entanglement in the feature space of the object-attribute detection problem with novel object-attribute pairs. We further design an adaptation approach to leverage CLIP encoders for compositional image classification.
The second part covers a variety of methods built with multimodal transformer models. For image retrieval, we propose a framework that assembles multimodal inputs into sequences with which a multimodal transformer encoder can be fine-tuned. The pre-training of vision-language models (VLMs) is also explored. Specifically, we introduce a fractional intermediate tower that improves the feature expressibility of dual-tower vision-language models. We further design a unified pipeline that allows a VLM to learn from not only vision-language corpora but unimodal visual and linguistic data.
Lastly, we study how to leverage the knowledge of Large Language Models (LLMs) for low-shot image classification, in a data- and computation-efficient way.
Zoom Link: https://usc.zoom.us/j/96814169370?pwd=NkhSYWFKNCsya0lyaUFBVlVDQkI3Zz09Location: Hughes Aircraft Electrical Engineering Center (EEB) - 110
Audiences: Everyone Is Invited
Contact: Zhaoheng Zheng
Event Link: https://usc.zoom.us/j/96814169370?pwd=NkhSYWFKNCsya0lyaUFBVlVDQkI3Zz09