Tue, Oct 03, 2017 @ 02:30 PM - 05:00 PM
Ming Hsieh Department of Electrical and Computer Engineering
Luis Marujo is a Research Scientist at Snap Inc. Prior to joining the Snap Research team in 2016, he completed his dual-degree Ph.D. in Language Technologies ('15) at Carnegie Mellon University (CMU) and the Instituto Superior Técnico (IST), Portugal. He also obtained an MSc ('12) in Language Technologies from CMU. He holds an MSc ('09) and a BSc ('07) in Computer Science and Engineering from IST. He received the best poster award at S3MR 2011.
Emojis are very popular ideograms used either to communicate concisely or to complement the information in a text with an emotion or visual concept. Emojis are mainly used on mobile devices due to the availability of intuitive emoji keyboards. They are very popular in social media, but they have not been explored from a Speech Processing point of view. In this work, we investigate mobile-friendly multimodal approaches that use audio, speech, and textual captions to predict emojis in public video Snaps. The key idea of the pipeline is to translate the input signals into text and keywords in order to conduct the analysis in a textual space. Our emoji prediction pipeline includes speech transcription, keyword spotting, and dictionary-based classification.
In addition, we use music detection and language identification to filter the content. Our experimental results indicate that speech transcription, or a list of pre-selected words from keyword spotting, provides information comparable to that found in textual captions for emoji prediction. This is an important result, as it allows us to suggest emojis for video Snaps before the user starts typing a caption. Our initial results also indicate that combining textual captions with speech output can improve emoji recommendation results beyond using textual captions alone.
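As a rough illustration of the dictionary-based classification step described above, the following minimal sketch maps words from a transcript and a caption to emojis and ranks them by vote count. It is not the speaker's implementation: the lexicon `EMOJI_LEXICON`, the helper `tokenize`, and the function `predict_emojis` are hypothetical names chosen for this example, and the transcript is assumed to come from an upstream speech transcription or keyword spotting stage.

```python
import re
from collections import Counter

# Hypothetical keyword-to-emoji dictionary (an assumption for illustration;
# the lexicon used in the talk is not given in this announcement).
EMOJI_LEXICON = {
    "love": "😍",
    "happy": "😀",
    "birthday": "🎂",
    "cake": "🎂",
    "dog": "🐶",
    "puppy": "🐶",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def predict_emojis(texts, lexicon=EMOJI_LEXICON, top_k=3):
    """Rank emojis by how many tokens across all input texts map to them."""
    votes = Counter()
    for text in texts:
        for token in tokenize(text):
            emoji = lexicon.get(token)
            if emoji is not None:
                votes[emoji] += 1
    return [emoji for emoji, _ in votes.most_common(top_k)]


# Assumed outputs of the upstream stages (made-up example strings).
transcript = "happy birthday to my puppy"
caption = "birthday cake for the dog"

print(predict_emojis([transcript]))           # speech only
print(predict_emojis([transcript, caption]))  # speech plus caption
```

Passing only the transcript corresponds to suggesting emojis before a caption is typed; passing both texts corresponds to the combined setting mentioned in the abstract.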
Audiences: Everyone Is Invited
Contact: Benjamin Paul