CS Colloquium
Tue, Jan 25, 2011 @ 11:00 AM - 12:30 PM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Tudor Dumitras, Senior Research Engineer, Symantec Research Labs
Talk Title: Improving the Dependability of Distributed Systems through AIR Software Upgrades
Abstract: Traditional fault-tolerance approaches concentrate almost entirely on responding to, avoiding, or tolerating unexpected faults or security violations. However, scheduled events, such as software upgrades, account for most system unavailability and often introduce data loss or latent errors. In this talk, I will present two empirical studies of distributed enterprise systems. They identify the leading cause of upgrade failure, which is breaking hidden dependencies, and the leading cause of planned downtime, which is changing database schemas. I will also describe Imago, a system that incorporates end-to-end mechanisms for improving the dependability of large-scale distributed systems that undergo major software upgrades.
The key idea is to isolate the production system from the upgrade operations in order to avoid breaking hidden dependencies. The end-to-end upgrade is an atomic operation, executed online even when performing complex schema and data conversions. Imago harnesses the opportunities provided by emerging technologies, such as cloud computing, to simplify major enterprise-system upgrades and to improve their dependability. This approach separates the functional aspects of the upgrade (e.g., data migration) from the mechanisms for online upgrade (e.g., atomic switchover), enabling an upgrade-as-a-service model.
Biography: Tudor Dumitras is a Senior Research Engineer at Symantec Research Labs (SRL). At SRL, he is building the Worldwide Intelligence Network Environment (WINE), which will enable academic researchers to analyze field data collected by Symantec. The goal of this project is to create a standard benchmark for cloud-security research. Tudor received his Ph.D. from Carnegie Mellon University. His prior research focused on improving the dependability of large-scale distributed systems (addressing operator errors during software upgrades), of enterprise systems (addressing the predictability of fault-tolerant middleware), and of embedded systems (addressing soft errors in networks-on-chip). He received the 2009 John Vlissides Award from ACM SIGPLAN for showing significant promise in applied software research, and the Best Paper Award at ASP-DAC'03. He holds undergraduate degrees from the École Polytechnique in Paris and the "Politehnica" University of Bucharest.
Host: Prof. Ramesh Govindan
Location: Charles Lee Powell Hall (PHE) - 631
Audiences: Everyone Is Invited
Contact: Kanak Agarwal