PhD Thesis Proposal - Zhaoheng Zheng
Wed, Nov 30, 2022 @ 08:30 AM - 10:00 AM
Thomas Lord Department of Computer Science
Ph.D. Candidate: Zhaoheng Zheng
Topic: Incorporating Large-Scale Vision-Language Corpora in Visual Understanding
Committee Chair: Prof. Ram Nevatia
Committee Member: Prof. Keith Jenkins
Committee Member: Prof. Jesse Thomason
Committee Member: Prof. Greg Ver Steeg
Committee Member: Prof. Mohammad Soleymani
Abstract: Vision and language are key mediators through which humans interact with the external world and with other members of society. One goal of artificial intelligence (AI) research is to create machines that can perceive the real world through multiple modalities. Previous research has made remarkable progress in building functional visual and linguistic perception systems with the help of deep neural networks. Recently, thanks to the growth of the Internet and social media, large-scale vision-language corpora have become easily accessible, motivating research aimed at creating large-scale Vision-Language Pre-training (VLP) models. Compared with previous methods, VLP models are stronger and more generalizable owing to the scale of their training data. In this thesis, we investigate how to leverage such data to improve existing visual understanding tasks. In particular, in FashionVLP, we propose to fine-tune a pre-trained VLP model for fashion image retrieval. More specifically, we fine-tune the model with customized input sequences containing various vision and language features, achieving significant improvements on multiple benchmarks. Moreover, we take a step further and explore better designs for VLP models to learn from large-scale corpora, resulting in our recent work, the Fractional Intermediate Tower (FIT). FIT enhances the vision-language fusion process inside VLP models by encoding vision features from multiple vision layers before they are passed to the fusion encoder.
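To give a rough sense of the multi-layer fusion idea mentioned in the abstract, the minimal PyTorch sketch below aggregates vision features from several intermediate vision-encoder layers before a text-to-vision cross-attention step. All module names, dimensions, and the averaging scheme are hypothetical placeholders for illustration only; they are not the actual FIT or FashionVLP architecture.

```python
# Illustrative sketch only: fuse text tokens with vision features aggregated
# from multiple intermediate vision layers (hypothetical design, not FIT itself).
import torch
import torch.nn as nn

class MultiLayerVisionAggregator(nn.Module):
    """Projects hidden states from several vision layers and averages them."""
    def __init__(self, dim: int, num_layers: int):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

    def forward(self, layer_feats):  # list of (B, N, dim) tensors
        projected = [p(f) for p, f in zip(self.proj, layer_feats)]
        return torch.stack(projected, dim=0).mean(dim=0)  # (B, N, dim)

class FusionBlock(nn.Module):
    """Text tokens cross-attend to the aggregated vision tokens."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, vision_feats):
        attended, _ = self.cross_attn(text_feats, vision_feats, vision_feats)
        return self.norm(text_feats + attended)

# Toy usage: three intermediate vision layers, one fusion step.
B, N_img, N_txt, D = 2, 49, 16, 256
vision_layers = [torch.randn(B, N_img, D) for _ in range(3)]
text_tokens = torch.randn(B, N_txt, D)

aggregator = MultiLayerVisionAggregator(D, num_layers=3)
fusion = FusionBlock(D)
fused = fusion(text_tokens, aggregator(vision_layers))
print(fused.shape)  # torch.Size([2, 16, 256])
```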
WebCast Link: https://usc.zoom.us/j/95655803815?pwd=d3RrOXNrU2dVVE1sTkZpYXU3NWxEUT09
Audiences: Everyone Is Invited
Contact: Lizsl De Leon