University of Southern California

Events Calendar



Events for September 26, 2014

  • AI SEMINAR - Query-driven approach to entity resolution

    Fri, Sep 26, 2014 @ 11:00 AM - 12:00 PM

    Information Sciences Institute

    Conferences, Lectures, & Seminars


    Speaker: Dmitri V. Kalashnikov, UCI

    Talk Title: Query-driven approach to entity resolution

    Series: AISeminar

    Abstract: The significance of data quality research is motivated by the observation that the effectiveness of data-driven technologies such as decision support tools, data exploration, analysis, and scientific discovery tools is closely tied to the *quality of data* to which such techniques are applied. It is well recognized that the outcome of the analysis is only as good as the data on which the analysis is performed. That is why today organizations spend a substantial percentage of their budgets on cleaning tasks such as removing duplicates, correcting errors, and filling missing values, to improve data quality prior to pushing data through the analysis pipeline.

    Given the critical importance of the problem, many efforts, in both industry and academia, have explored systematic approaches to addressing the cleaning challenges. This talk focuses primarily on the *entity resolution* challenge that arises because objects in the real world are referred to using references or descriptions that are not always unique identifiers of the objects, leading to ambiguity.

    Traditionally, data cleaning is performed as a preprocessing step when creating a data warehouse, before the warehouse is made available for analysis -- an approach that works well under standard settings. Cleaning the entire data warehouse, however, can require a considerable amount of time and significant computing resources. Hence, this approach is often suboptimal for many modern query-driven and Big Data applications that need to analyze only small portions of the entire dataset and produce answers "on the fly" and in real time.

    To address these new cleaning challenges, we have developed a *Query-Driven Approach (QDA)* to data cleaning. QDA exploits the specificity and semantics of the given SQL selection query to significantly reduce the cleaning overhead by resolving only those records that may influence the answer of the query. It computes answers that are equivalent to those obtained by first using a regular cleaning algorithm, and then querying on top of the cleaned data. However, in many cases QDA can compute these answers much more efficiently.

    A key concept driving QDA is that of *vestigiality*. A cleaning step (i.e., a call to the resolve function for a pair of records) is called vestigial (redundant) if QDA can guarantee that it can still compute the correct final answer without knowing the outcome of this resolve. We formalize the concept of vestigiality in the context of a large class of SQL selection queries and develop techniques to identify vestigial cleaning steps. Technical challenges arise since vestigiality, as we will show, depends on several factors, including the specifics of the cleaning function (e.g., the merge function used if two objects are indeed duplicate entities), the predicate associated with the query, and the query answer semantics of what the user expects as the result of the query. We show that determining vestigiality is NP-hard and propose an effective approximate solution to test for vestigiality that performs very well in practice.
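
    The idea of skipping vestigial resolve calls can be illustrated with a toy sketch. It is not the speaker's algorithm: it assumes a single non-negative numeric attribute, a merge function that sums that attribute, a `value >= threshold` selection predicate, and an answer semantics in which one entity may be represented by more than one returned cluster. All names (`query_driven`, `make_oracle`, the sample records) are hypothetical. Under these assumptions a pair is skipped when one of its records already satisfies the predicate, or when even merging its whole candidate component could not reach the threshold.

```python
def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group(ids, edges):
    # Union the given edges and return the clusters as frozensets of ids.
    parent = {i: i for i in ids}
    for a, b in edges:
        parent[find(parent, a)] = find(parent, b)
    clusters = {}
    for i in ids:
        clusters.setdefault(find(parent, i), set()).add(i)
    return [frozenset(c) for c in clusters.values()]


def answer(records, edges, threshold):
    # Assumed merge function: sum of counts. Keep clusters meeting the predicate.
    return {c for c in group(records, edges)
            if sum(records[i] for i in c) >= threshold}


def clean_then_query(records, pairs, resolve, threshold):
    # Baseline: resolve every candidate pair, then query the cleaned data.
    duplicates = [(a, b) for a, b in pairs if resolve(a, b)]
    return answer(records, duplicates, threshold)


def query_driven(records, pairs, resolve, threshold):
    # Upper bound per record: total count of its component in the candidate graph.
    bound = {}
    for component in group(records, pairs):
        total = sum(records[i] for i in component)
        for i in component:
            bound[i] = total
    duplicates = []
    for a, b in pairs:
        already_in = records[a] >= threshold or records[b] >= threshold
        hopeless = bound[a] < threshold
        if already_in or hopeless:
            continue  # treated as vestigial under the stated assumptions
        if resolve(a, b):
            duplicates.append((a, b))
    return answer(records, duplicates, threshold)


def make_oracle(true_duplicates):
    # Stand-in for an expensive resolve function; records every call made.
    calls = []

    def resolve(a, b):
        calls.append((a, b))
        return frozenset((a, b)) in true_duplicates

    return resolve, calls


# Hypothetical data: record id -> citation count, plus candidate duplicate pairs.
records = {"p1": 65, "p2": 10, "p3": 30, "p4": 20, "p5": 5, "p6": 8}
pairs = [("p1", "p2"), ("p3", "p4"), ("p5", "p6"), ("p2", "p3")]
truth = {frozenset(("p1", "p2")), frozenset(("p3", "p4"))}

resolve_full, full_calls = make_oracle(truth)
full = clean_then_query(records, pairs, resolve_full, 45)

resolve_qda, qda_calls = make_oracle(truth)
qda = query_driven(records, pairs, resolve_qda, 45)

print(len(full_calls), len(qda_calls))
```

    In this sketch the query-driven version returns a cluster inside every entity the baseline returns while issuing fewer resolve calls. The two skip rules are only sound because the assumed merge function is monotone; other merge functions, predicates, or answer semantics would need different tests.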

    The comprehensive empirical evaluation of the proposed approach demonstrates its significant advantage in terms of efficiency over traditional techniques for query-driven applications.

    Biography: http://www.ics.uci.edu/~dvk/CV/dvk_bio.txt

    Dmitri V. Kalashnikov is an Associate Adjunct Professor of Computer Science at the University of California, Irvine. He received his PhD degree in Computer Science from Purdue University in 2003. He received his diploma in Applied Mathematics and Computer Science from Moscow State University, Russia in 1999, graduating summa cum laude.

    His general research interests include databases and data mining. Currently, he specializes in the areas of entity resolution & data quality, and real-time situational awareness. In the past, he has also contributed to the areas of spatial, moving-object, and probabilistic databases.

    He has received several scholarships, awards, and honors, including an Intel Fellowship and Intel Scholarship. His work is supported by the NSF, DH&S, and DARPA.

    Host: Greg Ver Steeg

    Webcast: http://webcasterms1.isi.edu/mediasite/Viewer/?peid=dd8c0e0eef1749fdb4bc581af408d8561d

    Location: Information Sciences Institute (ISI) - 1135


    Audiences: Everyone Is Invited

    Contact: Alma Nava / Information Sciences Institute


    This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.

  • NL Seminar - Semantic Parsing at Google

    Fri, Sep 26, 2014 @ 03:00 PM - 04:00 PM

    Information Sciences Institute

    Conferences, Lectures, & Seminars


    Speaker: Bill MacCartney (Google/Stanford)

    Talk Title: Semantic Parsing at Google

    Series: Natural Language Seminar

    Abstract: With the shift from desktop to mobile, and the rise of voice-driven UIs, a growing proportion of the Google query stream is not well-served by conventional keyword-based information retrieval. More and more queries use natural language ("when does walgreens close"), seek answers not found on any web page ("how do i get to work from here"), or demand action rather than information ("text my wife i'm 10 minutes late"). Satisfying such queries requires semantic parsing, that is, mapping the query into a structured, machine-readable representation of meaning. In this talk, I will give an overview of the techniques Google has developed to address the problem of semantic parsing, and discuss some of the challenges that remain. I'll also highlight differences between academia and industry in how the problem is conceived.
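
    As a rough illustration of what "mapping the query into a structured, machine-readable representation" can mean, the toy sketch below turns the three example queries from the abstract into small dictionaries. It is a hand-written, pattern-based stand-in and makes no claim about the techniques covered in the talk; the patterns, intent names, and field names are all hypothetical.

```python
import re

# Hypothetical patterns and frame layouts, for illustration only.
RULES = [
    (re.compile(r"^when does (?P<business>.+) close$"),
     lambda m: {"intent": "get_closing_time", "business": m["business"]}),
    (re.compile(r"^how do i get to (?P<destination>.+) from (?P<origin>.+)$"),
     lambda m: {"intent": "get_directions",
                "destination": m["destination"], "origin": m["origin"]}),
    (re.compile(r"^text (?P<recipient>my \w+) (?P<message>.+)$"),
     lambda m: {"intent": "send_text",
                "recipient": m["recipient"], "message": m["message"]}),
]


def parse(query):
    # Normalize the query, then return the frame of the first matching rule.
    normalized = query.strip().lower()
    for pattern, build in RULES:
        match = pattern.match(normalized)
        if match:
            return build(match)
    return None  # no rule covers this query


for q in ["when does walgreens close",
          "how do i get to work from here",
          "text my wife i'm 10 minutes late"]:
    print(parse(q))
```

    Fixed patterns like these break as soon as the wording varies, which is one reason the problem is hard at the scale the abstract describes.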



    Biography: Bill MacCartney is a Senior Research Scientist at Google, working primarily on semantic parsing. He is also a Consulting Assistant Professor of Computer Science at Stanford. For more info: http://nlp.stanford.edu/~wcmac/

    Host: Aliya Deri and Kevin Knight

    More Info: http://nlg.isi.edu/nl-seminar/

    Location: Information Sciences Institute (ISI) - 11th Flr Conf Rm # 1135, Marina Del Rey

    Audiences: Everyone Is Invited

    Contact: Peter Zamar

    Event Link: http://nlg.isi.edu/nl-seminar/

