Thu, Aug 25, 2022 @ 01:00 PM - 03:00 PM
Thomas Lord Department of Computer Science
PhD Candidate: Chi Zhang
Title: Acceleration of Deep Reinforcement Learning: Efficient Algorithms and Hardware Mapping
Zoom Meeting ID: 243 580 6897
Dr. Viktor Prasanna (chair)
Dr. Aiichiro Nakano (CS department)
Dr. Paul Bogdan (Non-CS department)
Despite the recent success of Deep Reinforcement Learning (DRL) in game playing, robotic manipulation, and data center cooling, training DRL agents takes a tremendous amount of time and computational resources. This is because training requires collecting a large amount of data by interacting with the environment and performing a large number of policy updates via Stochastic Gradient Descent (SGD) to converge.
To reduce the amount of data to collect, existing work adopts model-based DRL, which learns a world model from the data collected by interacting with the environment and then uses the world model to generate synthetic data for policy updates. State-of-the-art approaches generate synthetic data by uniformly sampling initial states; this produces a large amount of similar data and makes each policy update less efficient. To accelerate policy updates, state-of-the-art hardware mappings of DRL propose efficient customized hardware designs on FPGA. However, most of this work is only applicable to a specific range of input parameters. To further speed up policy updates when the input batch size of the neural network is large, existing works split the input batch into multiple sub-batches and adopt multiple learners, each processing one sub-batch concurrently. However, the synchronization overhead, including data transfer and gradient averaging, significantly impairs the scalability of existing approaches.
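The baseline model-based setup described above can be sketched as follows. This is a minimal illustration, not the talk's implementation: `world_model_step`, the toy dynamics, and the buffer contents are all placeholders standing in for a learned dynamics model and real collected states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for a learned world model (illustrative toy dynamics).
def world_model_step(state, action):
    """One-step synthetic transition from the (learned) dynamics model."""
    return state + 0.1 * action

# Placeholder replay buffer of real states collected from the environment.
replay_buffer = rng.normal(size=(10_000, 4))  # 10k stored states, dim 4

def uniform_rollouts(num_rollouts, horizon):
    """Baseline: start model rollouts from uniformly sampled initial states."""
    idx = rng.integers(0, len(replay_buffer), size=num_rollouts)
    synthetic = []
    for s in replay_buffer[idx]:
        for _ in range(horizon):
            a = rng.normal(size=s.shape)        # placeholder policy
            s_next = world_model_step(s, a)
            synthetic.append((s, a, s_next))    # synthetic transition
            s = s_next
    return synthetic

data = uniform_rollouts(num_rollouts=32, horizon=5)
print(len(data))  # 160 synthetic transitions
```

Because initial states are drawn uniformly, densely represented regions of the buffer dominate the synthetic data, which is the redundancy problem the abstract points to.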
In this work, we address these limitations by developing efficient algorithms and hardware mappings. First, we propose Maximum Entropy Model Rollouts (MEMR), which generates diverse synthetic data by prioritized sampling of the initial states such that the entropy of the generated synthetic data is maximized. We mathematically derive the maximum entropy sampling criterion under the assumption that the synthetic data distribution is Gaussian, and satisfy this criterion using a Prioritized Replay Buffer. Second, we propose a framework for mapping DRL algorithms with a Prioritized Replay Buffer onto heterogeneous platforms consisting of a multi-core CPU, a GPU, and an FPGA. We develop specific accelerators for each primitive on CPU, FPGA, and GPU. Given a DRL algorithm's input parameters, our design space exploration automatically chooses the optimal mapping of the various primitives based on an analytical performance model.
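The prioritized-sampling idea behind MEMR can be sketched as below. Note the hedging: the actual MEMR priorities come from the derived maximum-entropy criterion under the Gaussian assumption; the priority values here are placeholders, and only the proportional-to-priority sampling mechanism of a Prioritized Replay Buffer is illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy prioritized replay buffer: each stored state carries a priority.
# Real MEMR priorities would come from the maximum-entropy criterion;
# these values are illustrative placeholders.
states = rng.normal(size=(1_000, 4))
priorities = np.abs(rng.normal(size=1_000)) + 1e-6  # strictly positive

def prioritized_sample(batch_size):
    """Sample initial states with probability proportional to priority,
    so under-covered (high-priority) states seed more model rollouts."""
    probs = priorities / priorities.sum()
    idx = rng.choice(len(states), size=batch_size, p=probs, replace=True)
    return states[idx], idx

batch, idx = prioritized_sample(64)
print(batch.shape)  # (64, 4)
```

In practice a Prioritized Replay Buffer implements this sampling with a sum-tree so that both sampling and priority updates cost O(log n) rather than the O(n) normalization shown here.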
Finally, we propose Scalable Policy Optimization (SPO), which improves the scalability of existing multi-learner DRL by reducing the synchronization overhead via local Stochastic Gradient Descent. Our experimental evaluations on widely used benchmark environments show that i) MEMR reduces the number of policy updates required to converge compared with state-of-the-art model-based DRL; ii) our hardware-mapping framework achieves more policy updates per second than other mapping methods; and iii) SPO achieves nearly linear scalability as the number of learners increases.
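The local-SGD mechanism that reduces synchronization overhead can be sketched on a toy objective. This is an assumption-laden illustration, not SPO itself: each of several learners takes k local gradient steps on its own data before a single parameter average, so the all-reduce happens once per k steps instead of once per step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-learner objective: learner i minimizes ||w - target_i||^2.
targets = rng.normal(size=(4, 8))   # 4 learners, 8-dim parameters

def local_sgd_round(w, k, lr=0.1):
    """One communication round of local SGD: every learner takes k local
    gradient steps from the shared parameters, then the results are
    averaged (one all-reduce per k steps, not per step)."""
    local_params = []
    for t in targets:
        w_i = w.copy()
        for _ in range(k):
            grad = 2.0 * (w_i - t)  # gradient of ||w - t||^2
            w_i -= lr * grad
        local_params.append(w_i)
    return np.mean(local_params, axis=0)

w = np.zeros(8)
for _ in range(50):
    w = local_sgd_round(w, k=8)

# On this convex toy problem, w converges to the consensus optimum,
# the mean of the learners' targets.
print(np.allclose(w, targets.mean(axis=0), atol=1e-3))  # True
```

Compared with per-step gradient averaging, this trades slightly staler updates for a k-fold reduction in synchronization frequency, which is the source of the near-linear scaling reported for SPO.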
Audiences: Everyone Is Invited
Contact: Lizsl De Leon