Fri, Mar 25, 2022 @ 11:00 AM - 12:30 PM
Thomas Lord Department of Computer Science
Time: 11AM - 12:30PM, March 25th, 2022
Committee Members: Salman Avestimehr (Chair), Mahdi Soltanolkotabi, Murali Annavaram, Ram Nevatia, Xiang Ren
Zoom Link: https://usc.zoom.us/my/usc.chaoyanghe
Title: Federated and Distributed Machine Learning at Scale: From Systems to Algorithms to Applications
Federated learning (FL) is a machine learning paradigm in which many clients (e.g., mobile/IoT devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g., a service provider) while keeping the training data decentralized. It has shown great potential for mitigating many of the systemic privacy risks, regulatory restrictions, and communication costs that arise from traditional, centralized machine learning and data science approaches in healthcare, finance, smart cities, autonomous driving, and the Internet of Things. Despite this promise, deploying FL as trustworthy, data-centric AI infrastructure faces many practical challenges from learning algorithms (e.g., data heterogeneity, label deficiency) and distributed systems (e.g., resource constraints, system heterogeneity, security, privacy), and addressing them requires interdisciplinary research in machine learning, distributed systems, and security/privacy. Motivated by these challenges, this thesis focuses on scaling federated and distributed machine learning end-to-end, from systems to algorithms to applications.
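As a rough illustration of the paradigm described above, the following minimal sketch shows one common pattern: each client trains locally on its own data and the server only averages the resulting parameters, weighted by local sample count. This is a generic weighted-averaging scheme on a toy one-parameter model, not code from the thesis or from FedML; the names `local_train`, `federated_round`, and `train`, the learning rate, and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # one (x, y) training pair held privately by a client


def local_train(w: float, data: List[Sample], lr: float = 0.05, steps: int = 5) -> float:
    """Run a few gradient-descent steps for the model y = w * x on one client's data."""
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global: float, clients: List[List[Sample]]) -> float:
    """Server step: average the clients' locally trained weights by sample count.

    Only model parameters cross the client/server boundary; raw samples do not.
    """
    total = sum(len(data) for data in clients)
    return sum(len(data) * local_train(w_global, data) for data in clients) / total


def train(clients: List[List[Sample]], rounds: int = 50, w0: float = 0.0) -> float:
    """Repeat communication rounds starting from the initial weight w0."""
    w = w0
    for _ in range(rounds):
        w = federated_round(w, clients)
    return w


# Hypothetical toy data: three clients whose samples all follow y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
w_final = train(clients)
```

In this toy setting every client's data agrees on the same underlying model, so the averaged weight settles near 2. The data heterogeneity mentioned above is exactly the case where clients disagree and plain averaging becomes harder to analyze.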
In the first part, we focus on the design of distributed systems for federated and distributed machine learning. We propose FedML, a widely adopted open-source library for federated learning, and PipeTransformer, which leverages automated elastic pipelining for efficient distributed training of Transformer models. FedML supports three computing paradigms: on-device training using a federation of edge devices, distributed training in the cloud that supports the exchange of auxiliary information beyond just gradients, and single-machine simulation of a federated learning algorithm. FedML also promotes diverse algorithmic research through a flexible and generic API design and comprehensive reference baseline implementations (optimizers, models, and datasets). In PipeTransformer, we design an adaptive on-the-fly freeze algorithm that gradually identifies and freezes some layers during training, and an elastic pipelining system that dynamically allocates resources to train the remaining active layers. More specifically, PipeTransformer automatically excludes frozen layers from the pipeline, packs active layers into fewer GPUs, and forks more replicas to increase data-parallel width.
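The resource arithmetic behind the last sentence can be sketched as follows. This is only a back-of-the-envelope model, not PipeTransformer's actual scheduling logic: it assumes a hypothetical fixed capacity `layers_per_gpu`, a fixed GPU budget, and that every freed GPU is reused for additional data-parallel replicas. The names `Plan` and `plan_pipeline` are invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Plan:
    active_layers: int      # layers still being trained
    gpus_per_pipeline: int  # pipeline length after packing active layers
    replicas: int           # data-parallel width (number of pipeline copies)


def plan_pipeline(num_layers: int, num_frozen: int,
                  total_gpus: int, layers_per_gpu: int) -> Plan:
    """Pack the active layers into as few GPUs as capacity allows, then
    spend the freed GPUs on extra pipeline replicas."""
    if not 0 <= num_frozen < num_layers:
        raise ValueError("num_frozen must be in [0, num_layers)")
    active = num_layers - num_frozen
    # Frozen layers are excluded, so only active layers need pipeline stages.
    gpus_per_pipeline = math.ceil(active / layers_per_gpu)
    if gpus_per_pipeline > total_gpus:
        raise ValueError("active layers do not fit in the GPU budget")
    replicas = total_gpus // gpus_per_pipeline
    return Plan(active, gpus_per_pipeline, replicas)


# Hypothetical setup: a 24-layer model, 8 GPUs, room for 3 layers per GPU.
plans = [plan_pipeline(24, frozen, total_gpus=8, layers_per_gpu=3)
         for frozen in (0, 12, 18, 21)]
for p in plans:
    print(p)
```

Under these assumptions, freezing more layers shortens each pipeline and widens data parallelism: the pipeline shrinks from 8 GPUs to 4, 2, and 1 while the replica count grows from 1 to 2, 4, and 8.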
In the second part, we propose a series of algorithms that scale up federated learning by overcoming many of the aforementioned constraints: FedGKT, an edge-cloud collaborative training method for resource-constrained clients; FedNAS, a method for automating model design on invisible data via neural architecture search; SpreadGNN, for effective training over a decentralized topology; SSFL, which tackles label deficiency via personalized self-supervision; and LightSecAgg, a lightweight and versatile secure aggregation approach. Most of these algorithms are compatible with one another. In particular, we unify all implementations under the FedML framework. Therefore, under the complex constraints of the real world, orchestrating these algorithms has the potential to greatly enhance the scalability of federated learning.
Finally, we propose the FedML Ecosystem, a family of open research libraries that facilitate federated learning research in diverse application domains: FedNLP (Natural Language Processing), FedCV (Computer Vision), FedGraphNN (Graph Neural Networks), and FedIoT (Internet of Things). Compared with TFF and LEAF, FedNLP and FedCV greatly enrich the diversity of datasets and learning tasks. FedNLP supports various popular task formulations in the NLP domain, such as text classification, sequence tagging, question answering, seq2seq generation, and language modeling. FedCV helps researchers evaluate the three most representative tasks: image classification, image segmentation, and object detection. Moreover, FedGraphNN is the first FL research platform for analyzing graph-structured data using Graph Neural Networks in a distributed computing manner, filling the gap between federated learning and the data mining field. Going beyond traditional AI applications, FedIoT further extends FL to wireless communication (e.g., 5G) and mobile computing (e.g., embedded IoT devices such as Raspberry Pi and smartphones running Android OS).
WebCast Link: https://usc.zoom.us/my/usc.chaoyanghe
Audiences: Everyone Is Invited
Contact: Lizsl De Leon