Thu, Oct 14, 2021 @ 01:30 PM - 03:00 PM
Thesis Proposal - Ritesh Ahuja
"Differentially Private Model Publishing for Location Services."
Time: 1:30-3:00pm PST, Oct 14 (Thursday)
Committee: Cyrus Shahabi, Aleksandra Korolova, Bhaskar Krishnamachari, Muhammad Naveed, Srivatsan Ravi
Zoom link: https://usc.zoom.us/j/7125668882
Mobile users share their coordinates with service providers (e.g., Google Maps) in exchange for services customized to their location. The service providers analyze the data and publish powerful machine learning models for location search and recommendation. Even though individual location data are not disclosed directly, the model itself retains a significant amount of specific movement detail, which in turn may leak sensitive information about an individual. To preserve individual privacy, one must first sanitize location data, which is commonly done using differential privacy (DP). However, existing solutions fall short of properly capturing the skewness inherent to sparse location datasets, and as a result yield poor accuracy.
In this proposal, we first focus on the Spatial Range Count primitive, which forms the basis for many important applications such as improving POI placement or studying disease spread. We propose a neural histogram system (SNH) that models spatial datasets such that important density and correlation features present in the data are preserved, even when DP-compliant noise is added. SNH employs a set of neural networks that learn from diverse regions of the dataset at varying granularities, leading to superior accuracy. We also devise a framework for effective system parameter tuning on top of public data, which helps practitioners set important system parameters while avoiding privacy leakage.
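For readers unfamiliar with the Spatial Range Count primitive, the following is a minimal sketch of a common baseline: a uniform grid histogram perturbed with Laplace noise. It is not SNH and is not taken from the proposal. The function names, the unit-square domain, the grid size, and the one-point-per-user assumption are all illustrative.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_grid_histogram(points, cells, epsilon, rng):
    """Count points of the unit square on a cells x cells grid, then perturb each cell."""
    grid = [[0.0] * cells for _ in range(cells)]
    for x, y in points:
        i = min(int(x * cells), cells - 1)
        j = min(int(y * cells), cells - 1)
        grid[i][j] += 1.0
    # Assumption: each user contributes exactly one point, so adding or removing
    # a user changes one cell by 1 (L1 sensitivity 1). Laplace noise with scale
    # 1/epsilon per cell then satisfies epsilon-DP.
    return [[c + laplace_noise(1.0 / epsilon, rng) for c in row] for row in grid]

def range_count(noisy, i0, i1, j0, j1):
    """Answer a cell-aligned range count over cells [i0, i1) x [j0, j1)."""
    return sum(noisy[i][j] for i in range(i0, i1) for j in range(j0, j1))

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
noisy = dp_grid_histogram(points, cells=8, epsilon=1.0, rng=rng)
estimate = range_count(noisy, 0, 4, 0, 4)  # noisy count for the lower-left quadrant
```

On sparse, skewed data most cells hold few or no points, so per-cell noise can dominate the true counts. This is the accuracy problem the abstract attributes to existing solutions.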
Finally, we focus on the next-location recommendation task, which is fundamentally more challenging. Learning user-user correlations from trajectory data requires increasing the dimensionality of intermediate layers in the neural network, which, in the context of privacy-preserving learning, increases data sensitivity and requires a large amount of noise to be introduced. We briefly show that specific model architectures and data handling processes during DP-compliant training can significantly boost learning accuracy by tightly controlling the amount of noise required to meet the privacy constraint. We conclude by suggesting ways to learn even richer models that can accurately recommend entire location sequences to a user, as opposed to only the next location.
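To illustrate why sensitivity and noise are tied together in DP-compliant training, here is a minimal sketch of one gradient-perturbation step in the style of the generic DP-SGD recipe (clip each example's gradient, sum, add Gaussian noise). It is an assumption about the general family of methods, not the architecture or data handling proposed in the talk. The names `clip`, `noisy_mean_gradient`, `max_norm`, and `noise_multiplier` are hypothetical, and privacy accounting is omitted.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]

def noisy_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip per-example gradients, sum them, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for k, g in enumerate(clip(grad, max_norm)):
            total[k] += g
    # Clipping bounds each example's contribution (the sensitivity) by max_norm,
    # so the noise standard deviation is set proportional to max_norm.
    sigma = noise_multiplier * max_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
update = noisy_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Because independent noise is added to every coordinate, the total noise norm grows with the number of parameters, which is one way to see why wider intermediate layers make private training harder.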
WebCast Link: https://usc.zoom.us/j/7125668882
Audiences: Everyone Is Invited
Contact: Lizsl De Leon