PhD Defense - Jiaping Zhao
Thu, Oct 20, 2016 @ 11:00 AM - 12:00 PM
Thomas Lord Department of Computer Science
University Calendar
Title: Toward situation awareness: activity and object recognition
Time: Oct. 20 (Thursday), 10am ~ 12pm
Location: HNB 107
PhD Candidate: Jiaping Zhao
Committee:
Laurent Itti (chair)
Aiichiro Nakano
Bartlett Mel
Abstract:
Situation awareness focuses on modeling and understanding the user's environment; it helps the user stay aware of their current situation and anticipate future events. Situation awareness is often divided into three levels: environmental perception, situation understanding, and cognitive assistance. Here, we focus on the second level, "situation understanding": understanding the user's situation by analyzing and interpreting the perceived data.
Nowadays, mobile devices with embedded IMU sensors and cameras are ubiquitous: IMU sensors capture streams of acceleration and angular speed records, while cameras record video streams. The former streams are multivariate time series, while the latter are image sequences. At the current stage, we analyze time series and image frames separately to understand the user's situation: concretely, we infer the user's current activities from time series and recognize objects from images.
First, we address activity recognition from time series. Activity recognition is naturally formulated as a time series classification problem. To this end, we developed several algorithms that address existing problems. First, we introduced a time series segmentation algorithm, which decomposes heterogeneous time series into homogeneous segments. Then we proposed a new sequence alignment algorithm, named shapeDTW, which improves on traditional dynamic time warping (DTW) alignment by taking local temporal shapes into account. To better compare temporal sequences, we proposed learning multiple local distance metrics; the DTW distance measured under the learned metrics, instead of under the default Euclidean metric, performs significantly better for time series classification.
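As a rough illustration of the alignment idea above, the following Python sketch computes a classic DTW distance and a shape-aware variant in which each time point is represented by the raw subsequence around it. This is a minimal sketch under stated assumptions, not the candidate's implementation: the function names, the choice of raw-subsequence descriptor, the `width` parameter, and the restriction to univariate series are all hypothetical choices made for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors.

    a has shape (n, d) and b has shape (m, d); the local distance
    between two points is the Euclidean distance.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    return cost[n, m]


def shape_descriptors(x, width=5):
    """Describe each point of a univariate series by its local neighborhood.

    Assumption: the descriptor is simply the raw subsequence of length
    `width` centered on the point, with the series edge-padded.
    """
    x = np.asarray(x, dtype=float)
    half = width // 2
    padded = np.pad(x, half, mode="edge")
    return np.stack([padded[i:i + width] for i in range(len(x))])


def shape_dtw_distance(x, y, width=5):
    """DTW computed on local shape descriptors instead of single values."""
    return dtw_distance(shape_descriptors(x, width),
                        shape_descriptors(y, width))
```

For plain DTW on a univariate series, reshape it to a column first, e.g. `dtw_distance(x.reshape(-1, 1), y.reshape(-1, 1))`. The shape-aware variant differs only in what is compared at each step: a point is matched to another point when their surrounding shapes are similar, not merely when their values are close.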
Next, we address object recognition from natural images. Although contemporary deep convolutional networks have advanced object recognition substantially, the underlying mechanism is still largely unclear. Here, we attempted to explore the mechanism of object recognition using a large-scale image dataset, iLab20M, which contains 20 million images shot under controlled turntable settings. Compared with the ImageNet dataset, iLab20M is parametric, with detailed pose and lighting information for each image. We showed that this auxiliary information can benefit object recognition. First, we formulated object recognition in a CNN-based multi-task learning framework, designed a specific skip connection pattern, and showed its superiority to single-task learning both theoretically and empirically. Moreover, we introduced a two-stream CNN architecture, which disentangles object identity from its instantiation factors (e.g., pose, lighting) and learns more discriminative identity representations. We showed experimentally that the features learned from iLab20M generalize well to other datasets, including ImageNet and Washington RGB-D.
Audiences: Everyone Is Invited
Contact: Lizsl De Leon