Title Data Mining via Spectral Analysis
Speaker Dr. Frank McSherry, Microsoft Research, SVC
Time Friday, December 3, 11:00am
Location VKC 156
Slides in PDF
Frank's Thesis in PDF


Abstract

Much attention has recently been paid to the analysis of data by first casting the data set as a matrix, and then considering its principal eigenvectors. This spectral analysis of data has been applied successfully in many domains; examples include Google's PageRank algorithm, Latent Semantic Analysis, and Kernel PCA. However, the success of spectral analysis is largely empirical, with little analytic understanding of why this approach works so well.

In this talk I will present a general framework for many data mining problems and show how this framework justifies the application of spectral analysis.  Specifically, we will see that the problems of collaborative filtering, web search, and data clustering fall into the framework and give rigorous bounds on the performance of spectral analysis on each problem.  Furthermore, we will see how we can immediately translate ideas and understanding developed through this framework into new algorithms which address problems in graph theory and numerical analysis.

This research represents joint work with Yossi Azar and Amos Fiat at Tel Aviv University, Dimitris Achlioptas at Microsoft Research, and Anna Karlin and Jared Saia at the University of Washington.