Logo: University of Southern California

Events Calendar


  • PhD Defense - Prithviraj Banerjee

    Fri, May 02, 2014 @ 10:30 AM - 12:30 PM

    Thomas Lord Department of Computer Science

    University Calendar


    Ph.D. Candidate: Prithviraj Banerjee

    Title: Incorporating Aggregate Feature Statistics in Structured Dynamical Models for Human Activity Recognition

    Date: Friday, May 2nd, 2014
    Time: 10:30 AM
    Location: PHE 223

    Committee:
    Ram Nevatia (Chair)
    Gerard Medioni
    C.-C. Jay Kuo (outside member)

    Abstract:

    Human action recognition in videos is a central problem in computer vision, with numerous applications in video surveillance, data mining, and human-computer interaction. There has been considerable research on classifying pre-segmented videos into a single activity class; however, there has been comparatively less progress on activity detection in unsegmented and unaligned videos containing medium- to long-term complex events. Our objective is to develop efficient algorithms to recognize human activities in monocular videos captured from static cameras in both indoor and outdoor scenarios. Our focus is on detection and classification of complex human events in unsegmented continuous videos, where the top-level event is composed of primitive action components, such as human key-poses or primitive actions. We assume a weakly supervised setting, where only the top-level event labels are provided for each video during training, and the primitive action components are not labeled.

    We require our algorithms to be robust to missing frames, temporary occlusion of body parts, background clutter, and variations in activity styles and durations. Furthermore, our models scale gracefully to complex events containing human-human and human-object interactions, without assuming access to perfect pedestrian or object detection results.

    We have proposed and adopted the design philosophy of combining global statistics of local spatio-temporal features with the high-level structure and constraints provided by dynamic probabilistic graphical models. We present four different algorithms for activity recognition, spanning the feature-classifier hierarchy in terms of their semantic and structure modeling capability. First, we present a novel latent CRF classifier for modeling the local neighborhood structure of spatio-temporal interest point features in terms of code-word co-occurrence statistics, which captures the local temporal dynamics present in the action. In our second work, we present a multiple kernel learning framework to combine human pose estimates generated from a collection of kinematic tree priors, spanning the range of expected pose dynamics in human actions. In our third work, we present a latent CRF model for automatically identifying and inferring the temporal location of key-poses of an activity, and show results on detecting multiple instances of actions in continuous unsegmented videos. Lastly, we propose a novel dynamic multi-state feature pooling algorithm that identifies the discriminative segments of a video and is robust both to arbitrary gaps between state transitions and to significant variations in state durations. We evaluate our models on short-, medium-, and long-term activity datasets, and show state-of-the-art performance on classification, detection, and video streaming tasks.

    Location: PHE 223

    Audiences: Everyone Is Invited

    Contact: Lizsl De Leon
