Fri, Apr 29, 2022 @ 12:00 PM - 02:00 PM
PhD Candidate: Ritesh Ahuja
Dissertation Committee: Cyrus Shahabi, Bhaskar Krishnamachari, Aleksandra Korolova
Venue: Online, 12 pm - 2 pm
Thesis title: Differentially Private Learned Models for Location Services
The emergence of mobile apps (e.g., location-based services, geosocial networks, ride-sharing) has led to the collection of vast amounts of location data. Publishing aggregate information about users' movements benefits research on traffic optimization, context-aware notifications, and public health (e.g., disease spread). While the benefits provided by location data are indisputable, preserving location privacy is essential, since even aggregate statistics (e.g., in the form of population density maps) can leak details about individual whereabouts. To protect against privacy risks, the data curator may publish a noisy version of the dataset, transformed according to Differential Privacy (DP), the de facto standard for releasing statistical data.
The goal of a DP mechanism is to ensure privacy while keeping the query answers as accurate as possible. Conventional approaches build a DP-compliant representation of a spatial dataset by partitioning the data domain into bins and then publishing a histogram with the noisy count of points that fall within each bin. These solutions fall short of properly capturing the skewness inherent to sparse location datasets and, as a result, yield poor accuracy. Instead, in this work, we propose a paradigm shift towards learned representations of data. We learn powerful machine learning (ML) models that exploit patterns within location datasets to provide more accurate location services. We focus on key location queries that are the building blocks of many processing tasks.
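As a rough illustration of the conventional histogram baseline described above (not of the learned approach proposed in the thesis), the following is a minimal sketch assuming a uniform grid, one location report per individual, and the Laplace mechanism. The function names `dp_histogram` and `range_count` and all parameters are hypothetical choices made for this example.

```python
import numpy as np

def dp_histogram(points, bounds, bins, epsilon, rng=None):
    """Return a noisy 2D count histogram of `points`, an (n, 2) array.

    Assumption: each individual contributes exactly one point, so adding or
    removing one person changes a single bin count by 1 (L1 sensitivity 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    (xmin, xmax), (ymin, ymax) = bounds
    counts, xedges, yedges = np.histogram2d(
        points[:, 0], points[:, 1],
        bins=bins, range=[[xmin, xmax], [ymin, ymax]],
    )
    # Laplace mechanism: noise scale = sensitivity / epsilon = 1 / epsilon.
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=counts.shape)
    return noisy, xedges, yedges

def range_count(noisy, xedges, yedges, x0, x1, y0, y1):
    """Answer a range count query by summing the bins that lie fully inside
    the rectangle [x0, x1] x [y0, y1]; partially covered bins are ignored."""
    xi = (xedges[:-1] >= x0) & (xedges[1:] <= x1)
    yi = (yedges[:-1] >= y0) & (yedges[1:] <= y1)
    return noisy[np.ix_(xi, yi)].sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((1000, 2))  # synthetic points in the unit square
    noisy, xe, ye = dp_histogram(pts, ((0, 1), (0, 1)), bins=8, epsilon=1.0, rng=rng)
    print(range_count(noisy, xe, ye, 0.0, 0.5, 0.0, 0.5))
```

On sparse, skewed data many bins hold few or no points, so the added noise can dominate the true counts. That is the accuracy problem the abstract attributes to this family of methods.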
For population-density maps that support range count queries on snapshot releases, where each individual contributes a single location report, we design a neural database system called Spatial Neural Histograms (SNH). We model spatial data such that density features are preserved even when DP-compliant noise is added. As such, learning can also be used to combat the data modeling errors present in the DP setting. SNH employs a set of neural networks that learn from diverse regions of the dataset and at varying granularities, leading to superior accuracy. More often, however, spatio-temporal density information is required for utility (e.g., in modeling COVID hotspots). As a result, the released statistics must continually capture population counts in small areas for short time periods.
When releasing multiple snapshots, individuals may contribute multiple reports to the same dataset. The ability of an adversary to breach privacy increases significantly, so a shift to user-level privacy is needed. We employ the pattern recognition power of neural networks, specifically Variational Auto-Encoders (VAE), to reduce the noise introduced by DP mechanisms such that accuracy is increased while the privacy requirement is still satisfied. The system, called VAE based Data Release (VDR), enables longitudinal release of location data. In addition, by limiting the number of location reports from any single user, we reduce the noise needed by DP mechanisms while ensuring data utility is not compromised. As a post-processing step, we propose statistical estimators that adjust the density information to account for the fact that it is calculated on a subset of the actual data.
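The idea of limiting each user's reports can be sketched as follows. This is a generic contribution-bounding example under stated assumptions, not the VDR system itself: it keeps at most `k` randomly sampled reports per user and then applies the Laplace mechanism with noise scaled to `k`, since removing one user can change the counts by at most `k` in total. The names `bound_contributions` and `user_level_dp_counts` and the `(user_id, cell_index)` report format are hypothetical, and the thesis's VAE denoising and statistical estimators are not shown.

```python
import numpy as np
from collections import defaultdict

def bound_contributions(reports, k, rng):
    """Keep at most k reports per user.

    `reports` is a list of (user_id, cell_index) pairs (an assumed format).
    Users with more than k reports have k of them sampled uniformly at random.
    """
    by_user = defaultdict(list)
    for user_id, cell in reports:
        by_user[user_id].append(cell)
    kept = []
    for user_id, cells in by_user.items():
        if len(cells) > k:
            idx = rng.choice(len(cells), size=k, replace=False)
            cells = [cells[i] for i in idx]
        kept.extend((user_id, c) for c in cells)
    return kept

def user_level_dp_counts(reports, num_cells, k, epsilon, rng=None):
    """Noisy per-cell counts under user-level DP via contribution bounding."""
    rng = np.random.default_rng() if rng is None else rng
    kept = bound_contributions(reports, k, rng)
    counts = np.zeros(num_cells)
    for _, cell in kept:
        counts[cell] += 1
    # After bounding, one user affects the counts by at most k in L1 norm,
    # so the Laplace noise scale is k / epsilon.
    return counts + rng.laplace(loc=0.0, scale=k / epsilon, size=num_cells)

if __name__ == "__main__":
    reports = [("a", 0)] * 10 + [("b", 1)] * 2
    print(user_level_dp_counts(reports, num_cells=3, k=3, epsilon=1.0,
                               rng=np.random.default_rng(0)))
```

A smaller `k` lowers the noise scale but discards more reports, which is why the resulting counts describe only a subset of the data and need a correction step such as the estimators mentioned above.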
Lastly, recommending the next location for a user to visit is fundamentally more challenging. When considering trajectories exhibiting short and non-repetitive spatial and temporal regularity, capturing user-user correlations requires learning sophisticated ML models that have high dimensionality in the intermediate layers of the neural networks. We propose a technique called Private Location Prediction (PLP). Central to our approach is the use of the skip-gram model and its negative sampling technique. Our work is the first to propose differentially private learning with skip-grams. In addition, we devise data grouping techniques within the skip-gram framework that pool together trajectories from multiple users in order to accelerate learning and improve model accuracy.
Extensive experimental results on real datasets with heterogeneous characteristics show that our proposed approaches (SNH, VDR, and PLP) significantly outperform the state of the art.
WebCast Link: https://usc.zoom.us/j/7125668882
Audiences: Everyone Is Invited
Contact: Lizsl De Leon