USC - Viterbi School of Engineering

PhD Defense - Jongeun Jun
Mon, Oct 21, 2013 @ 12:00 PM - 01:30 PM
Thomas Lord Department of Computer Science
University Calendar

PhD Candidate: Jongeun Jun

10/21/13
12pm-1:30pm
SAL 222

Committee:

Dennis McLeod (chair)
Cyrus Shahabi
Daniel O'Leary (outside member)

DBSSC: Density-Based Searchspace-limited Subspace Clustering

We propose a mining framework that supports the identification of useful knowledge through data clustering. Motivated by recent advances in microarray technologies, we focus on mining gene expression datasets. In particular, given that genes are often co-expressed under subsets of experimental conditions, we present a novel subspace clustering algorithm. In contrast to previous approaches, our method is based on the observation that the number of subspace clusters is related to the number of maximal subspace clusters to which any gene pair can belong. Discretizing gene expression profiles transforms the similarity between two genes into a sequence of symbols that represents the maximal subspace cluster for the gene pair. This domain transformation (from genes into gene-gene relations) makes the number of possible subspace clusters dependent on the number of genes. Based on the symbolized genes, we present an efficient subspace clustering algorithm that scales linearly with the number of dimensions. In addition, the running time can be drastically reduced by using an inverted index and pruning non-interesting subspaces. Experimental results indicate that the proposed method efficiently identifies co-expressed gene subspace clusters in a yeast cell cycle dataset.
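
To make the gene-pair transformation more concrete, here is a minimal Python sketch. It is not the algorithm presented in the defense. The equal-width binning, the (condition, symbol) signature format, and all function names are illustrative assumptions. The sketch only shows how discretized profiles can yield, for each gene pair, the largest set of conditions on which the two genes agree, and how an inverted index can retrieve the genes that share a given signature.

```python
from itertools import combinations


def discretize(profile, n_bins=3):
    """Map each value to a bin symbol 0..n_bins-1 (equal-width bins; an assumption)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in profile]


def pair_signature(sym_a, sym_b):
    """(condition, symbol) pairs on which two genes agree: the largest
    subspace in which this pair could be clustered together."""
    return tuple((i, a) for i, (a, b) in enumerate(zip(sym_a, sym_b)) if a == b)


def build_inverted_index(symbols):
    """Map each (condition, symbol) to the set of genes carrying that symbol."""
    index = {}
    for gene, syms in symbols.items():
        for cond, sym in enumerate(syms):
            index.setdefault((cond, sym), set()).add(gene)
    return index


def genes_matching(index, signature):
    """Genes that carry every (condition, symbol) in the signature."""
    if not signature:
        return set()
    return set.intersection(*(index.get(key, set()) for key in signature))


# Hypothetical toy data: three genes measured under four conditions.
expr = {
    "g1": [0.0, 5.0, 10.0, 0.0],
    "g2": [1.0, 6.0, 11.0, 11.0],
    "g3": [10.0, 5.0, 0.0, 0.0],
}
symbols = {g: discretize(p) for g, p in expr.items()}
signatures = {
    (g, h): pair_signature(symbols[g], symbols[h])
    for g, h in combinations(sorted(symbols), 2)
}
index = build_inverted_index(symbols)
```

In this toy example, `g1` and `g2` agree on the first three conditions, so their signature covers a three-dimensional subspace. Signatures that cover too few conditions could be pruned before any further search.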

Furthermore, we incorporate a density-based approach into subspace clustering. In density-based clustering, points with high density (i.e., core points) and points with low density (i.e., outlier points) are identified. Non-core, non-outlier points are defined as border points. Since a core point has high density, it is expected to lie well inside a cluster. Thus, core points and their surrounding border points form multiple subspace clusters; that is, core points are likely to belong to multiple subspace clusters. Therefore, instead of performing subspace clustering on the whole dataset, we perform it only on core points, which further reduces running time drastically. Afterwards, border points are used to expand the cluster structure by assigning each one to its most relevant cluster. Experimental results indicate that coupling our subspace clustering with the density-based approach improves running time significantly.
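
The core/border/outlier split described above can be sketched as follows. The announcement does not give the density definition used in the thesis, so this sketch assumes DBSCAN-style rules: a core point has at least `min_pts` other points within distance `eps`, a border point is a non-core point within `eps` of a core point, and everything else is an outlier. The function names and the nearest-core assignment rule are likewise hypothetical.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'outlier' (DBSCAN-style assumption)."""
    n = len(points)
    # Neighbors of i: all other points within distance eps.
    neighbors = [
        [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in neighbors[i]):
            labels.append("border")
        else:
            labels.append("outlier")
    return labels


def nearest_core(points, labels):
    """Map each border point index to the index of its closest core point."""
    cores = [i for i, lab in enumerate(labels) if lab == "core"]
    return {
        i: min(cores, key=lambda c: math.dist(points[i], points[c]))
        for i, lab in enumerate(labels)
        if lab == "border"
    }


# Hypothetical toy data in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
labels = classify_points(points, eps=1.1, min_pts=2)
assignment = nearest_core(points, labels)
```

Under these assumptions, only the points labeled `core` would be passed to the more expensive subspace clustering step. Border points would then be attached to the cluster of their assigned core point, and outliers would be left unclustered.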

Location: Henry Salvatori Computer Science Center (SAL) - 222
Audiences: Everyone Is Invited

Contact: Lizsl De Leon

This event is open to all eligible individuals. USC Viterbi operates all of its activities consistent with the University's Notice of Non-Discrimination. Eligibility is not determined based on race, sex, ethnicity, sexual orientation, or any other prohibited factor.