CS Colloquium: Seo Jin Park (MIT CSAIL) - Towards Interactive Big Data Processing Through Flash Burst Parallel Systems
Thu, Mar 24, 2022 @ 02:00 PM - 03:00 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Seo Jin Park, MIT CSAIL
Talk Title: Towards Interactive Big Data Processing Through Flash Burst Parallel Systems
Series: CS Colloquium
Abstract: Today, many organizations store big data in the cloud and lease relatively small clusters of instances to run analytics queries, train machine learning models, and more. However, exponential data growth, combined with the slowdown of Moore's law, makes it challenging (if not impossible) to run such big data processing tasks in real time. Most applications run big data workloads on timescales of several minutes or hours and resort to complex, application-specific optimizations to reduce the amount of data processing required for interactive queries. This design pattern hurts developer productivity and restricts the scope of applications that can use big data. Yet a cloud datacenter has many servers, so a natural question is: "Can we borrow thousands of servers briefly to accelerate big data processing enough to be interactive?"
In this talk, I'll share my vision for enabling massively parallel data processing even for very short durations (1-10 ms), which I call "flash bursts." This will empower interactive, real-time applications (e.g., cybersecurity attack defense, self-driving cars, or drones) to use much larger data than before. For this moonshot, I take a two-pronged approach. First, I restructure important big data applications (analytics and DNN training) so that they can run efficiently in a flash-burst fashion. On this prong, the talk will focus on how I efficiently scaled distributed sorting to 100+ servers even for a 1-millisecond time budget. Second, I rebuild various layers of distributed systems to reduce the overheads of flash-burst scaling. On this prong, I will focus on how I removed the overheads of consistent replication.
This lecture satisfies requirements for CSCI 591: Research Colloquium.
Biography: Seo Jin Park is a postdoctoral researcher at MIT CSAIL. He received a Ph.D. in Computer Science from Stanford University in 2019, advised by John Ousterhout. He is broadly interested in distributed systems, with a focus on low-latency systems: scaling low-latency data processing, optimizing consensus protocols (both standard and Byzantine), suppressing tail latency, and building efficient performance debugging tools. His Ph.D. study was supported by a Samsung Scholarship.
Host: Barath Raghavan
Location: Michelson Center for Convergent Bioscience (MCB) - 101
Audiences: By invitation only.
Contact: Assistant to CS chair