| Title | Data Mining via Spectral Analysis |
| Speaker | Dr. Frank McSherry, Microsoft Research, SVC |
| Time | Friday, December 3, 11:00am |
| Location | VKC 156 |
| Slides in PDF | |
| Frank's Thesis in PDF |
Much attention has recently been paid to the analysis of data by first casting the data set as a matrix, and then considering its principal eigenvectors. This spectral analysis of data has been applied successfully in many domains; examples include Google's PageRank algorithm, Latent Semantic Analysis, and Kernel PCA. However, the success of spectral analysis is largely empirical, with little analytic understanding of why this approach works so well.
In this talk I will present a general framework for many data mining problems and show how this framework justifies the application of spectral analysis. Specifically, we will see that the problems of collaborative filtering, web search, and data clustering fall into the framework and give rigorous bounds on the performance of spectral analysis on each problem. Furthermore, we will see how we can immediately translate ideas and understanding developed through this framework into new algorithms which address problems in graph theory and numerical analysis.
This research represents joint work with Yossi Azar and Amos Fiat at Tel Aviv University, Dimitris Achlioptas at Microsoft Research, and Anna Karlin and Jared Saia at the University of Washington.