University of Southern California

Events Calendar

  • NL Seminar - Fair Comparisons and Fundamental Ideas for Open-Vocabulary Generative Language and Translation Models

    Thu, Aug 12, 2021 @ 11:00 AM - 12:00 PM

    Information Sciences Institute

    Conferences, Lectures, & Seminars

    Speaker: Sabrina Mielke, Johns Hopkins Univ

    Talk Title: Fair Comparisons and Fundamental Ideas for Open-Vocabulary Generative Language and Translation Models

    Series: NL Seminar

    Abstract: REMINDER: Meeting hosts will only admit guests they know to the Zoom meeting, so you are strongly encouraged to sign into Zoom with your USC account. If you are an outside visitor, please inform nlg DASH seminar DASH admin2 AT isi.edu beforehand so that we are aware of your attendance and can let you in.
    How can we fairly compare the performance of generative language and translation models across multiple languages? We will see how to use probabilistic and information-theoretic measures, first to evaluate monolingual open-vocabulary language models by total bits and then, considering the case of Translationese, to ponder the meaning of information and how to use it to compare machine translation models. In both cases, we get a glimpse of which linguistic and non-linguistic factors might make languages easier or harder for models. The last part of the talk will, if time permits, propose some somewhat opinionated guidelines for open-vocabulary language modeling and show work in progress on taxonomizing tokenization methods and the literature around open-vocabulary modeling.

    Biography: Sabrina is a PhD student at Johns Hopkins University and a part-time research intern at HuggingFace, currently researching open-vocabulary language modeling for unit discovery in a variety of typologically diverse languages. While her pre-PhD work focused on formal language theory applied to parsing and translation, during her PhD she has published on morphology, fair language model comparison, stochastic romanization (at Google AI), and metacognition and calibration for chatbots (at Facebook AI Research). She has also co-organized workshops and shared tasks around morphology and typology, and is currently involved in the BigScience summer of large language models workshop.

    Host: Jon May and Mozhdeh Gheini

    More Info: https://nlg.isi.edu/nl-seminar/

    Webcast: https://www.youtube.com/watch?v=zIP8XMCtHuM

    Location: Information Sciences Institute (ISI) - Virtual Only

    Audiences: Everyone Is Invited

    Contact: Pete Zamar

    Event Link: https://nlg.isi.edu/nl-seminar/

