Inderjit Dhillon, November 21, 2005, 16:30-17:30, ITB 201

Speaker:	Inderjit S. Dhillon
	Department of Computer Sciences
	University of Texas at Austin

Title: Co-clustering, Matrix Approximations and Bregman Divergences

Abstract:

Many applications in data mining and machine learning require the analysis of large two-dimensional matrices. Depending on the application, these matrices can have very different characteristics: in text mining, co-occurrence matrices are large, sparse, and non-negative, while DNA microarray analysis yields smaller, dense matrices with positive as well as negative entries. In data analysis, it is often desirable (a) to find "low-parameter" matrix approximations, and (b) to "co-cluster" such matrices, i.e., to cluster rows and columns simultaneously.

In this talk, I will discuss a framework that inextricably links a certain class of matrix approximations with co-clustering. The approximation error can be measured using a non-trivial class of distortion measures, called Bregman divergences, that have connections to the exponential family of probability distributions. Our algorithms allow us to handle a wide variety of matrices and distortion measures within a unifying framework, and are able to efficiently construct good co-clusterings and the corresponding low-parameter matrix approximations. I will conclude by presenting experimental results on text and microarray data.
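
As a rough illustration of the ideas in the abstract, and not the speaker's actual algorithm, the sketch below defines a scalar Bregman divergence from a convex generator phi and uses it to measure the error of a co-clustering-based approximation. The function names (bregman, block_mean_approximation, total_divergence), the toy matrix, and the choice of replacing each entry by its block mean are all assumptions made for this example. The cluster labels are given by hand rather than learned.

```python
import math


def bregman(phi, grad_phi, x, y):
    """Scalar Bregman divergence: phi(x) - phi(y) - phi'(y) * (x - y)."""
    return phi(x) - phi(y) - grad_phi(y) * (x - y)


def squared_error(x, y):
    """Bregman divergence generated by phi(z) = z**2, i.e. (x - y)**2."""
    return bregman(lambda z: z * z, lambda z: 2.0 * z, x, y)


def i_divergence(x, y):
    """Bregman divergence generated by phi(z) = z*log(z) - z, for x, y > 0."""
    return bregman(lambda z: z * math.log(z) - z, math.log, x, y)


def block_mean_approximation(matrix, row_labels, col_labels):
    """Replace each entry by the mean of its (row cluster, column cluster) block.

    Assumption for this sketch: the block mean is used as the low-parameter
    approximation; other summary statistics are possible.
    """
    sums, counts = {}, {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return [
        [sums[(row_labels[i], col_labels[j])] / counts[(row_labels[i], col_labels[j])]
         for j in range(len(row))]
        for i, row in enumerate(matrix)
    ]


def total_divergence(matrix, approx, divergence):
    """Sum an element-wise divergence over all matrix entries."""
    return sum(
        divergence(x, y)
        for row, approx_row in zip(matrix, approx)
        for x, y in zip(row, approx_row)
    )


# Toy 3x4 matrix with hand-picked row and column clusters (hypothetical data).
toy_matrix = [
    [1.0, 2.0, 10.0, 11.0],
    [2.0, 1.0, 11.0, 10.0],
    [9.0, 10.0, 1.0, 2.0],
]
toy_row_labels = [0, 0, 1]
toy_col_labels = [0, 0, 1, 1]

toy_approx = block_mean_approximation(toy_matrix, toy_row_labels, toy_col_labels)
print(total_divergence(toy_matrix, toy_approx, squared_error))
print(total_divergence(toy_matrix, toy_approx, i_divergence))
```

Here the approximation is described by only four block means instead of twelve entries, which is the sense in which it is "low-parameter". Swapping squared_error for i_divergence changes the distortion measure without changing the rest of the code.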

McMaster University