PhD Defense - Bo Wu
Wed, Oct 21, 2015 @ 10:00 AM - 12:00 PM
Thomas Lord Department of Computer Science
SAL 213
PhD candidate: Bo Wu
Committee:
Craig A. Knoblock (Chair)
Cyrus Shahabi
Daniel O'Leary
Title: Iteratively Learning Data Transformation Programs from Examples
Abstract:
Data transformation is an essential preprocessing step in most data analysis applications. It often requires users to write many trivial, task-dependent programs, which is time-consuming and demands a certain level of programming skill. Recently, programming-by-example (PBE) approaches have enabled users to generate data transformation programs without coding. The user provides examples (input-output pairs), and the PBE system synthesizes programs that are consistent with them.
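As an illustration only (not taken from the thesis), the PBE idea can be sketched in a few lines of Python. The candidate set `CANDIDATES` and the function `synthesize` are hypothetical names chosen for this sketch; a real PBE system searches a far larger space of programs rather than a hand-picked list, and the sketch assumes non-empty input strings.

```python
from typing import Callable, Dict, List, Optional, Tuple

Example = Tuple[str, str]  # (input, expected output)

# Hypothetical, hand-picked candidate programs (assumption for this sketch).
# Inputs are assumed to be non-empty strings.
CANDIDATES: Dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "lower": str.lower,
    "first_token": lambda s: s.split()[0],
    "last_token": lambda s: s.split()[-1],
    "last_comma_first": lambda s: f"{s.split()[-1]}, {s.split()[0]}",
}


def synthesize(examples: List[Example]) -> Optional[str]:
    """Return the name of the first candidate consistent with every example."""
    for name, program in CANDIDATES.items():
        if all(program(inp) == out for inp, out in examples):
            return name
    return None  # no candidate reproduces all the examples


# One input-output pair is enough to single out a program in this tiny space.
chosen = synthesize([("Ada Lovelace", "Lovelace, Ada")])
transformed = CANDIDATES[chosen]("Grace Hopper")
```

Here the user never writes the transformation; the single example selects it, and the selected program is then applied to unseen records.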
However, real-world datasets often contain thousands of records in a variety of formats. To transform these datasets correctly, existing PBE approaches typically require users to provide multiple examples. The time complexity of these approaches grows exponentially with the number of examples and as a high-degree polynomial in the length of the examples, so users working on even moderately complicated datasets must wait a long time for any response from the system. Moreover, existing PBE approaches offer little support for verifying the correctness of the transformed results, which users need in order to decide whether to stop providing examples.
To address these challenges, we propose an approach that generates programs iteratively. It exploits the fact that users typically provide examples over several iterations to refine the programs learned in previous iterations. By collecting and accumulating key information across iterations, our approach avoids redundant computation and generates new transformation programs efficiently. It can also recommend potentially incorrect records for users to examine, which reduces the effort needed to verify the transformation results.
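The following is a minimal sketch of the iterative idea, under stated assumptions; it is not the IPBE implementation. The class `IterativeLearner`, its methods, and the candidate set are hypothetical. The sketch keeps the candidates that survived earlier examples, so each new example only filters that set, and it flags records on which the surviving candidates disagree as worth examining.

```python
from typing import Callable, Dict, List

# Hypothetical candidate programs (assumption); inputs are non-empty strings.
CANDIDATES: Dict[str, Callable[[str], str]] = {
    "identity": lambda s: s,
    "upper": str.upper,
    "first_token": lambda s: s.split()[0],
    "last_token": lambda s: s.split()[-1],
}


class IterativeLearner:
    """Keeps state across iterations instead of restarting the search."""

    def __init__(self, candidates: Dict[str, Callable[[str], str]]) -> None:
        self.survivors = dict(candidates)

    def add_example(self, inp: str, out: str) -> None:
        # Only candidates that survived earlier examples are re-checked.
        self.survivors = {
            name: prog for name, prog in self.survivors.items() if prog(inp) == out
        }

    def suspicious(self, records: List[str]) -> List[str]:
        # A record is flagged when surviving candidates produce different outputs.
        return [
            r for r in records
            if len({prog(r) for prog in self.survivors.values()}) > 1
        ]


records = ["Hopper", "Ada Lovelace", "Alan Turing"]
learner = IterativeLearner(CANDIDATES)

learner.add_example("Hopper", "Hopper")          # iteration 1: still ambiguous
flagged_first = learner.suspicious(records)

learner.add_example("Ada Lovelace", "Lovelace")  # iteration 2: refine
flagged_second = learner.suspicious(records)
```

After the first example, three candidates remain and the two multi-token records are flagged; after the second, only `last_token` remains and nothing is flagged, which is one simple signal that the user could stop providing examples.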
To validate the approach in this thesis, we evaluated IPBE, the implementation of our iterative programming-by-example approach, against several state-of-the-art alternatives on a variety of transformation scenarios. The results show that users of our approach spent less time and achieved higher correctness than users of the alternative approaches.
Location: Henry Salvatori Computer Science Center (SAL) - 213
Audiences: Everyone Is Invited
Contact: Lizsl De Leon