PhD Thesis Proposal - Hayley Song
Wed, Nov 20, 2024 @ 02:15 PM - 03:15 PM
Thomas Lord Department of Computer Science
Title: Riemannian-Geometric Fingerprints of Generative Models
Date: November 20, 2024
Time: 2:15 pm - 3:15 pm
Location: KAP 209
Committee: Laurent Itti, Chair, Emilio Ferrara, Kyler Siegel, Robin Jia, and Willie Neiswanger
Abstract: Recent breakthroughs in generative models (GMs) and their rapid integration have sparked interest in the problem of model attribution and in model fingerprints. For instance, service providers need reliable methods of authenticating their models to protect their IP, while users and law enforcement seek to verify the source of generated content for accountability and trust. In addition, the threat of model collapse is growing as more model-generated data are fed back into sources (e.g., YouTube) that are often harvested for training ("regurgitative training"), heightening the need to differentiate synthetic from human data. Yet a gap remains in our understanding of generative models' fingerprints, which we believe stems from the lack of a formal framework that can define, represent, and analyze the fingerprints in a principled way. To address this gap, we take a geometric approach and propose a new definition of the artifacts and fingerprints of generative models using Riemannian geometry, which allows us to leverage the rich theory of differential geometry. Our new definition generalizes previous work (Song et al., 2024) to non-Euclidean manifolds by learning Riemannian metrics from data and replacing the Euclidean distances and nearest-neighbor search with geodesic distances and a kNN-based Riemannian center of mass. We apply our theory to a new gradient-based algorithm for computing the fingerprints in practice. Results show that it is more effective in distinguishing a large array of generative models, spanning 4 datasets at 2 resolutions (64x64, 256x256), 27 model architectures, and 2 modalities (Vision, Vision-Language). Using our proposed definition can significantly improve performance on model attribution, as well as generalization to unseen datasets, model types, and modalities, suggesting its efficacy in practice.
Location: Kaprielian Hall (KAP) - 209
Audiences: Everyone Is Invited
Contact: Ellecia Williams