CS Colloquium: Filip Ilievski (Vrije Universiteit Amsterdam) - Identity of Long-tail Entities in Text: A Knowledge Perspective
Fri, Aug 30, 2019 @ 10:00 AM - 11:00 AM
Thomas Lord Department of Computer Science
Conferences, Lectures, & Seminars
Speaker: Filip Ilievski, Vrije Universiteit (VU) Amsterdam
Talk Title: Identity of Long-tail Entities in Text: A Knowledge Perspective
Series: Computer Science Colloquium
Abstract: Entity linking systems are faced with a complex M-to-N mapping between surface forms in text and instances in a knowledge base, caused by the ambiguity of surface forms, the variance of the instances, and their frequency/popularity interplays, well-explained by pragmatic principles such as the Gricean maxims (Grice, 1975). Although current entity linkers report high accuracy scores, in this talk I will describe phenomena that capture large differences in performance between 'head' and 'tail' entities. To improve performance on the tail entities, I will argue that we need to revisit evaluation (part I) and to employ knowledge and reason over it in a more systematic way (part II).
During the first half, I will show how the current evaluation datasets, as well as the metrics employed, obfuscate the difference between head and tail and discourage focus on tail entities. I will propose recommended actions and examples for long-tail-focused evaluation.
In the second half of my talk, I will present our efforts to generate expectations on long-tail entities through building neural profiling machines on top of background knowledge from Wikidata. In addition to an intrinsic evaluation, these profiling techniques are evaluated extrinsically on clustering NIL entities. I will discuss how an extension of this work can be used to capture commonsense knowledge and act as an active component in future reading machines.
This lecture satisfies requirements for CSCI 591: Research Colloquium. Please note that, due to limited capacity in RTH 105, seats will be first come, first served.
Biography: Filip Ilievski is a Postdoctoral Researcher in Natural Language Processing at Vrije Universiteit (VU) Amsterdam, and closely affiliated with the Knowledge Representation and Reasoning group at the same university. His research investigates how systematic and extensive use of knowledge can help machines deal with the 'long tail' (knowledge scarcity and ambiguity) of human communication. To do so, he combines ideas from Information Extraction, Knowledge Graphs, and Machine Learning.
He developed LOTUS (Ilievski et al., 2016a), the largest publicly available index over the Linked Data cloud at the time, which received an award at the Semantics conference in 2016. Later, he collaborated with Prof. Ed Hovy at CMU on building neural generalization models ('profiling machines') over Linked Data knowledge and applying them to cluster long-tail entities. As part of his research on measuring and reducing biases in NLP evaluations, he co-organized a SemEval competition on 'Counting Events and Participants in the Long Tail' in 2018 (Ilievski et al., 2016b; Postma et al., 2018).
Filip Ilievski has authored more than 20 publications on these topics in peer-reviewed international journals and conference proceedings, including COLING, ESWC, and SWJ.
Host: Xiang Ren
Audiences: Everyone Is Invited
Contact: Computer Science Department