University of Southern California

Events Calendar


  • PhD Defense - Bo Wu

    Wed, Oct 21, 2015 @ 10:00 AM - 12:00 PM

    Thomas Lord Department of Computer Science

    University Calendar


    SAL 213
    PhD candidate: Bo Wu

    Committee:
    Craig A. Knoblock (Chair)
    Cyrus Shahabi
    Daniel O'Leary

    Title: Iteratively Learning Data Transformation Programs from Examples

    Abstract:
    Data transformation is an essential preprocessing step in most data analysis applications. It often requires users to write many trivial, task-dependent programs, which is time-consuming and demands a certain level of programming skill. Recent programming-by-example (PBE) approaches enable users to generate data transformation programs without coding. The user provides examples (input-output pairs), and the PBE approach synthesizes programs that are consistent with those examples.

    However, real-world datasets often contain thousands of records in various formats. To transform these datasets correctly, existing PBE approaches typically require users to provide multiple examples. The time complexity of these approaches grows exponentially with the number of examples and as a high-degree polynomial in the length of the examples, so users working on moderately complicated datasets have to wait a long time to see any response from the system. Moreover, existing PBE approaches offer no support for verifying the correctness of the transformed results, which users need in order to decide whether they can stop providing examples.

    To address these challenges, we propose an approach that generates programs iteratively. It exploits the fact that users often provide multiple examples one iteration at a time to refine the programs learned in previous iterations. By collecting and accumulating key information across iterations, our approach avoids redundant computation and generates new transformation programs efficiently. It can also recommend potentially incorrect records for users to examine, which saves users effort in verifying the correctness of the transformation results.

    To validate the approach in this thesis, we evaluated IPBE, the implementation of our iterative programming-by-example approach, against several state-of-the-art alternatives on various transformation scenarios. The results show that users of our approach spent less time and achieved higher correctness than users of the alternative approaches.

    Location: Henry Salvatori Computer Science Center (SAL) - 213

    Audiences: Everyone Is Invited

    Contact: Lizsl De Leon
