PhD Defense - Haoyu Huang
Thu, Jul 16, 2020 @ 02:00 PM - 04:00 PM
Thomas Lord Department of Computer Science
University Calendar
Ph.D. Defense - Haoyu Huang 7/16 2:00 pm "Nova-LSM: A Distributed, Component-based LSM-tree Data Store"
Ph.D. Candidate: Haoyu Huang
Date: Thursday, July 16, 2020
Time: 2:00 pm - 4:00 pm
Committee: Shahram Ghandeharizadeh (chair), Murali Annavaram, Jyotirmoy V. Deshmukh
Title: Nova-LSM: A Distributed, Component-based LSM-tree Data Store
Zoom: https://usc.zoom.us/j/99943500149
Google Meet (only if there are issues with Zoom): meet.google.com/ruu-jjiu-fbk
Abstract:
The cloud challenges many fundamental assumptions of today's monolithic data stores. It offers a diverse choice of servers with alternative forms of processing capability, storage, memory sizes, and networking hardware. It also offers fast networking between servers and racks, such as RDMA. This motivates a component-based architecture that separates storage from processing for a data store. This architecture complements the classical shared-nothing architecture by allowing nodes to share each other's disk bandwidth. This is particularly useful with a skewed pattern of access to data, because a large file can be scattered across many disks instead of being stored on one disk.
This emerging component-based software architecture constitutes the focus of this dissertation. We present the design, implementation, and evaluation of Nova-LSM as an example of this architecture. Nova-LSM is a component-based design of an LSM-tree using RDMA. Its components implement the following novel concepts. First, they use RDMA to enable nodes of a shared-nothing architecture to share their disk bandwidth and storage. Second, they construct ranges dynamically at runtime to parallelize compaction and boost performance. Third, they scatter blocks of a file across an arbitrary number of disks and use power-of-d to scale. Fourth, the logging component separates availability of log records from their durability. These design decisions provide for an elastic system with well-defined knobs that control its performance and scalability characteristics. We present an implementation of these designs using LevelDB as a starting point. Our evaluation shows Nova-LSM scales and outperforms its monolithic counterpart by several orders of magnitude. This is especially true with workloads that exhibit a skewed pattern of access to data.
WebCast Link: https://usc.zoom.us/j/99943500149
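Illustration of the power-of-d idea mentioned in the abstract: the sketch below assigns each block to the least-loaded of d randomly sampled disks. It is a minimal, generic example and not Nova-LSM's implementation. The function name place_blocks, its parameters, and the use of a per-disk block count as the load metric are assumptions made for illustration.

```python
import random


def place_blocks(num_blocks, num_disks, d, rng):
    """Assign each block to the least-loaded of d randomly sampled disks.

    Load is modeled as the number of blocks already placed on a disk
    (an assumption; a real system might use queue depth or bandwidth).
    """
    load = [0] * num_disks
    placement = []
    for _ in range(num_blocks):
        # Sample d distinct candidate disks, then pick the least loaded one.
        candidates = rng.sample(range(num_disks), d)
        target = min(candidates, key=lambda disk: load[disk])
        load[target] += 1
        placement.append(target)
    return placement, load


if __name__ == "__main__":
    rng = random.Random(0)
    for d in (1, 2):
        _, load = place_blocks(10_000, 10, d, rng)
        # Spread between the most and least loaded disk for this d.
        print(f"d={d}: spread={max(load) - min(load)}")
```

With d = 1 this reduces to purely random placement. Larger values of d typically narrow the gap between the most and least loaded disks, at the cost of probing more disks per block.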
Audiences: Everyone Is Invited
Contact: Lizsl De Leon