University of Southern California

Events Calendar


  • PhD Defense - Jongeun Jun

    Mon, Oct 21, 2013 @ 12:00 PM - 01:30 PM

    Thomas Lord Department of Computer Science

    University Calendar


    PhD Candidate: Jongeun Jun

    10/21/13
    12pm-1:30pm
    SAL 222

    Committee:

    Dennis McLeod (chair)
    Cyrus Shahabi
    Daniel O'Leary (outside member)

    DBSSC: Density-Based Searchspace-limited Subspace Clustering

    We propose a mining framework that supports the identification of useful knowledge through data clustering. Motivated by recent advances in microarray technologies, we focus on mining gene expression datasets. In particular, given that genes are often co-expressed under subsets of experimental conditions, we present a novel subspace clustering algorithm. In contrast to previous approaches, our method is based on the observation that the number of subspace clusters is related to the number of maximal subspace clusters to which any gene pair can belong. By discretizing gene expression profiles, we transform the similarity between two genes into a sequence of symbols that represents the maximal subspace cluster for the gene pair. This domain transformation (from genes into gene-gene relations) allows us to make the number of possible subspace clusters dependent on the number of genes. Based on the symbolized genes, we present an efficient subspace clustering algorithm that scales linearly with the number of dimensions. In addition, the running time can be drastically reduced by using an inverted index and pruning non-interesting subspaces. Experimental results indicate that the proposed method efficiently identifies co-expressed gene subspace clusters in a yeast cell cycle dataset.
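
    The idea of symbolizing genes and describing each gene pair by the conditions on which the two genes agree can be illustrated with a small sketch. This is not the DBSSC implementation: the equal-width binning, the function names (discretize, pair_signature, build_inverted_index, candidate_clusters), and the thresholds min_dims and min_genes are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def discretize(profile, n_bins=3):
    """Map each expression value to a bin index (assumed: equal-width bins
    over the profile's own min..max range)."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        return [0] * len(profile)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in profile]


def pair_signature(sym_a, sym_b):
    """Sequence of symbols for a gene pair: the shared symbol where the two
    genes agree, None elsewhere. The non-None positions form the largest
    subspace on which this pair agrees."""
    return tuple(a if a == b else None for a, b in zip(sym_a, sym_b))


def build_inverted_index(symbols):
    """Map (condition, symbol) to the set of genes carrying that symbol."""
    index = defaultdict(set)
    for gene, syms in symbols.items():
        for cond, sym in enumerate(syms):
            index[(cond, sym)].add(gene)
    return index


def candidate_clusters(symbols, min_dims=2, min_genes=2):
    """Collect, for every gene pair, the genes that share the pair's
    signature on all of its agreeing conditions."""
    index = build_inverted_index(symbols)
    clusters = {}
    for g1, g2 in combinations(sorted(symbols), 2):
        sig = pair_signature(symbols[g1], symbols[g2])
        dims = [(c, s) for c, s in enumerate(sig) if s is not None]
        if len(dims) < min_dims:
            continue  # prune subspaces with too few conditions
        members = set.intersection(*(index[d] for d in dims))
        if len(members) >= min_genes:
            clusters[sig] = members
    return clusters
```

    Because candidate subspaces are generated from gene pairs rather than from combinations of conditions, the number of candidates in this sketch is bounded by the number of gene pairs, and the inverted index turns the membership lookup for each candidate into a few set intersections.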

    Furthermore, we incorporate a density-based approach into subspace clustering. In density-based clustering, points with high density (i.e., core points) and points with low density (i.e., outlier points) are identified. Non-core, non-outlier points are defined as border points. Since a core point has high density, it is expected to lie well inside a cluster. Thus, core points and their surrounding border points form multiple subspace clusters. That is, core points are likely to belong to multiple subspace clusters. Therefore, instead of performing subspace clustering on the whole dataset, we perform it only on core points, which further reduces running time drastically. After that, border points are used to expand the cluster structure by assigning each of them to its most relevant cluster. Experimental results indicate that coupling our subspace clustering with the density-based approach improves running time significantly.
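
    The core/border/outlier split and the later border assignment can be sketched as follows. The abstract does not give the exact density definition, so this sketch assumes DBSCAN-style rules: Euclidean distance, a radius eps, and a neighbour threshold min_pts. It also assumes that "most relevant cluster" means the cluster of the nearest core point. All names here are hypothetical.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'outlier' (assumed rules:
    core = at least min_pts other points within eps; border = non-core
    point with a core point within eps; outlier = everything else)."""
    n = len(points)
    neighbours = [
        [j for j in range(n)
         if j != i and math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    labels = {}
    for i in range(n):
        if i in core:
            labels[i] = "core"
        elif any(j in core for j in neighbours[i]):
            labels[i] = "border"
        else:
            labels[i] = "outlier"
    return labels


def assign_border_points(points, labels, core_cluster):
    """Extend a clustering of core points (index -> cluster id) by giving
    each border point the cluster of its nearest core point (assumed
    notion of relevance). Outliers stay unassigned."""
    assignment = dict(core_cluster)
    for i, label in labels.items():
        if label == "border":
            nearest = min(core_cluster,
                          key=lambda c: math.dist(points[i], points[c]))
            assignment[i] = core_cluster[nearest]
    return assignment
```

    In this arrangement the expensive clustering step only ever sees the core points; border points are attached afterwards with one pass over the core set, and outliers are left out entirely.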

    Location: Henry Salvatori Computer Science Center (SAL) - 222

    Audiences: Everyone Is Invited

    Contact: Lizsl De Leon

