-
PhD Defense - Weiwei Chen
Thu, Jun 05, 2014 @ 08:30 AM - 10:30 AM
Thomas Lord Department of Computer Science
University Calendar
Title: Workflow Restructuring Techniques for Improving the Performance for Scientific Workflows Executing in Distributed Environments
PhD Candidate: Weiwei Chen
Committee: Ewa Deelman(Chair), Viktor K. Prasanna (External member), Aiichiro Nakano
Time: Jun 5, 8:30am-10:30am
Location: RTH 306
Abstract Scientific workflows are a means of defining and orchestrating large, complex, multi-stage computations that perform data analysis, simulation, visualization, etc. Scientific workflows often involve a large amount of data transfers and computations and require efficient optimization techniques to reduce the runtime of the overall workflow.
Today, with the emergence of large-scale scientific workflows executing in modern distributed environments such as grids and clouds, the optimization of workflow runtime has introduced new challenges that existing optimization methods do not tackle. Traditionally, many existing runtime optimization methods are confined to the task scheduling problem. They do not consider the refinement of workflow structures, system overheads, the occurrence of failures, etc. Refining workflow structures using techniques such as workflow partitioning and task clustering represent a new trend in runtime optimization and can result in significant performance improvement. The runtime improvement of these workflow restructuring methods depends on the ratio of application computation time to the system overheads. Since system overheads in modern distributed systems can be high, the potential benefit of workflow restructuring can be significant.
This thesis argues that workflow restructuring techniques can significantly improve the runtime of scientific workflows executing in modern distributed environments. In particular, we innovate in the area of workflow partitioning and task clustering techniques. Several previous studies also utilize workflow partitioning and task clustering techniques to improve the performance of scientific workflows. However, existing methods are based on the trial-and-error approach and require users' knowledge to tune the workflow performance. For example, many workflow partitioning techniques do not consider constraints on resources used by the workflows such as the data storage. Also, many task clustering methods optimize task granularity at the workflow level without considering data dependencies between tasks. We distinguish our work from other research by modeling a realistic distributed system with overheads and failures and we use real world applications that exhibit load imbalance in their structure and computations. We investigate the key concern of refining workflow structures and propose a series of innovative workflow partitioning and task clustering methods to improve the runtime performance. Simulation-based and real system-based evaluation verifies the effectiveness of our methods.
Location: Ronald Tutor Hall of Engineering (RTH) - 306
Audiences: Everyone Is Invited
Contact: Lizsl De Leon