Logo: University of Southern California

Events Calendar


  • PhD Dissertation Defense - Zhaoheng Zheng

    Thu, Apr 25, 2024 @ 02:00 PM - 04:00 PM

    Thomas Lord Department of Computer Science

    University Calendar


    Title: Incorporating Large-Scale Vision-Language Corpora in Visual Understanding  
     
    Committee Members: Ram Nevatia (Chair), Mohammad Soleymani, Keith Jenkins  
     
    Date and Time: Thursday, April 25th, 2:00pm - 4:00pm  
     
    Abstract: As key mediators of human perception, vision and language corpora play critical roles in the development of modern Artificial Intelligence (AI). The size of vision-language corpora has scaled up rapidly in recent years, from thousands to billions, enabling the creation of large foundation models. However, because this is an emerging area, a series of problems has yet to be explored.
    We start with a study of compositional learning from pre-VLM times to the post-VLM era. We introduce a representation blending approach that creates robust features for compositional image classification and a two-stream architecture that tackles the entanglement in the feature space of the object-attribute detection problem with novel object-attribute pairs. We further design an adaptation approach to leverage CLIP encoders for compositional image classification.
    The second part covers a variety of methods built with multimodal transformer models. For image retrieval, we propose a framework that assembles multimodal inputs into sequences with which a multimodal transformer encoder can be fine-tuned. The pre-training of vision-language models (VLMs) is also explored. Specifically, we introduce a fractional intermediate tower that improves the feature expressibility of dual-tower vision-language models. We further design a unified pipeline that allows a VLM to learn not only from vision-language corpora but also from unimodal visual and linguistic data.
    Lastly, we study how to leverage the knowledge of Large Language Models (LLMs) for low-shot image classification, in a data- and computation-efficient way.
     
    Zoom Link: https://usc.zoom.us/j/96814169370?pwd=NkhSYWFKNCsya0lyaUFBVlVDQkI3Zz09

    Location: Hughes Aircraft Electrical Engineering Center (EEB) - 110

    Audiences: Everyone Is Invited

    Contact: Zhaoheng Zheng

    Event Link: https://usc.zoom.us/j/96814169370?pwd=NkhSYWFKNCsya0lyaUFBVlVDQkI3Zz09
