CS599, Spring 2009: Selected Topics in Machine Learning

Schedule

Date | Topic | Assignments | Presenters
Prior to first class: Prepare a 2-minute talk about your research;
send the talk (in PDF) to the instructor before Jan 11.
Jan 13 and Jan 15: 1. Administrivia; course review; student self-introductions.
2. Structured prediction: conditional random fields (CRFs)
Paper presentation sign-up; project group formation
Readings
[Lafferty01] [Sutton07a]
F. Sha
Jan 20 and Jan 22: Structured prediction: large-margin approach. Readings
[Taskar02] [Collins02]
[Taskar04] [Tsochantaridis05]
M. Black, J. Binney, A. St. Clair, F. Sha
Jan 27 and Jan 29: Structured prediction: other types of CRFs; effect of inexact inference. Readings
[Quattoni07] [McDonald05]
[Sutton08] [Kulesza08] [Finley08]
A. Metallinou, R. Swanson, K. Lehmann, S. Liu, J. John
Feb 3 and Feb 5: 1. Structured prediction: other structured prediction models
2. Latent variable modeling: Isomap, LLE
Readings
[Shi08] [Sutton07b]
[Tenenbaum00] [Roweis00] [Saul03]
A. St. Clair, J. John, C. H. Kuo, C. H. Kuo, H. Vathsangam
Feb 10: Latent variable modeling: graph spectral methods, isometric embeddings. Readings
[Tenenbaum00] [Roweis00] [Saul03]
C. H. Kuo, C. H. Kuo, H. Vathsangam
Feb 17 and Feb 19: Latent variable modeling: diffusion maps, wavelets, and their applications; nonlinear PCA, Gaussian process latent variable models. Readings
[Zhang04] [Belkin03] [Belkin06]
[Donoho03] [Weinberger06] [Coifman05a]
H. Vathsangam, R. Ghosh, D. Gong, R. Ghosh, M. Kalakrishnan, R. Swanson
Feb 20, Feb 24, and Feb 26: Latent variable modeling: diffusion maps, wavelets, and their applications; nonlinear PCA, Gaussian process latent variable models. Readings
[Coifman05b] [Mahadevan06]
[Bishop98] [Tipping99] [Lawrence05]
S. Liu, A. Tsiartas, Y. Hidaka, Y. Liu, A. Tsiartas
Mar 3 and Mar 5: Latent variable modeling: application to computer vision. Readings
[Urtasun06] [Urtasun07] [Kanaujia07]
[Urtasun08] [Lu08]
J. Binney, A. Metallinou, H. Vathsangam, J. Lee, A. Kazemzadeh
Mar 10 and Mar 12: Distance metric learning: supervised; nearest neighbor. Readings
[Xing03] [Globerson06]
[Hinton03] [Goldberger05] [Weinberger06b]
M. Black, C. H. Kuo, G. Girirajan, S. S. Lee, A. Sharma
Mar 17 and Mar 19: Spring break. None
Mar 24 and Mar 26: Distance metric learning: online learning; low-rank. Readings
[Shalev04] [Davis07]
[Torresani07] [Davis08]
J. Das, P. Ghosh, J. Das, G. Girirajan
Mar 31 and Apr 2: 1. Distance metric learning: other metric learning
2. Transfer learning
Readings
[Weinberger08] [Sriperumbudur08] [Chopra05]
[Caruana97] [Evgeniou05]
M. Kalakrishnan, F. Qi, Y. Hidaka, F. Qi, P. Ghosh
Apr 7 and Apr 9: Transfer learning. Readings
[Argyriou07] [Argyriou08] [Jacob08]
[Ando05]
S. S. Lee, D. Gong, D. Gong, M. Gupta
Apr 14 and Apr 16: Transfer learning. Readings
[Daume06] [Blitzer08]
[Raina06] [Raina07] [Liu08]
C. C. Lee, J. Lee, Y. Liu, B. Pan, K. Lehmann
Apr 21 and Apr 23: Transfer learning. Readings
[Yu08] [Daume07] [Bonilla08] [Ahmed08]
M. Gupta, C. C. Lee, A. Sharma, B. Pan
Apr 28: Transfer learning and reinforcement learning: guest lecture by Dr. Matt Taylor on reinforcement learning and transfer learning
Send the project talk slides (in PDF) to the instructor before Apr 24.
Apr 30: Project presentations; course wrap-up
The following sessions are not part of the official course plan:
May 5 and May 7: Special reading sessions: deep architectures. Readings
[Bengio07] [Hinton06] [Bengio07b] [Hinton06b]
May 12: Special reading session: compressed sensing. Readings
[Candes08]

Reading list

Structured Prediction
  • Required
    • [Lafferty01] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proc. 18th International Conf. on Machine Learning. 2001. [ url ]
    • [Sutton07a] C. Sutton and A. McCallum. An Introduction to Conditional Random Fields for Relational Learning. In L. Getoor and B. Taskar, editors, Introduction to Statistical Relational Learning. MIT Press, 2007. [ url ]
    • [Taskar02] B. Taskar, P. Abbeel and D. Koller. Discriminative probabilistic models for relational data. In Proc. of Eighteenth Conference on Uncertainty in Artificial Intelligence. 2002. [ url ]
    • [Collins02] M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 2002. [ url ]
    • [Taskar04] B. Taskar, C. Guestrin and D. Koller. Max-Margin Markov Networks. In Sebastian Thrun and Lawrence Saul and Bernhard Schölkopf, editors, Advances in Neural Information Processing Systems 16. 2004. [ url ]
    • [Tsochantaridis05] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large Margin Methods for Structured and Interdependent Output Variables. Journal of Machine Learning Research, vol. 6, 1453-1484 2005. [ url ]
    • [Quattoni07] A. Quattoni, S. Wang, L-P. Morency, M. Collins, and T. Darrell. Hidden Conditional Random Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2007. [ url ]
    • [McDonald05] R. McDonald, K. Crammer, and F. Pereira. Online Large-Margin Training of Dependency Parsers. 43rd Annual Meeting of the Association for Computational Linguistics. 2005. [ url ]
    • [Sutton08] C. Sutton and A. McCallum. Piecewise Training for Structured Prediction. In submission. 2008. [ url ]
    • [Kulesza08] A. Kulesza and F. Pereira. Structured Learning with Approximate Inference. In J.C. Platt, D. Koller, Y. Singer and S. Roweis, editors, Advances in Neural Information Processing Systems 20, 785--792. 2008. [ url ]
    • [Finley08] T. Finley and T. Joachims. Training Structural SVMs when Exact Inference is Intractable. In Proceedings of the International Conference on Machine Learning. 2008. [ url ]
    • [Shi08] Q. Shi, L. Wang, L. Cheng and A. Smola. Discriminative Human Action Segmentation and Recognition using Semi-Markov Model. In Proceedings of Computer Vision and Pattern Recognition. 2008. [ url ]
    • [Sutton07b] C. Sutton and A. McCallum. Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. In Journal of Machine Learning Research. 2007. [ url ]
  • Optional
Latent variable modeling
  • Required
    • [Tenenbaum00] J. B. Tenenbaum, V. De Silva and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319-2323 2000. [ url ]
    • [Roweis00] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323-2326 2000. [ url ]
    • [Saul03] L. K. Saul and S. T. Roweis. Think globally, fit locally: unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research 4:119-155. 2003. [ url ]
    • [Zhang04] Z. Zhang and H. Zha. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing 26(1):313-338. 2004. [ url ]
    • [Belkin03] M. Belkin and P. Niyogi. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, 15(6): 1373-1396. 2003. [ url ]
    • [Belkin06] M. Belkin and P. Niyogi. Manifold Regularization: a Geometric Framework for Learning from Examples. Journal of Machine Learning Research. 2006. [ url ]
    • [Donoho03] D. L. Donoho and C. Grimes. Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences. 2003. [ url ]
    • [Weinberger06] K. Q. Weinberger and L. K. Saul. Maximum Variance Unfolding: Unsupervised Learning of Image Manifolds by Semidefinite Programming. International Journal of Computer Vision. 2006. [ url ]
    • [Coifman05a] R. R. Coifman, S. Lafon, A.B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. Zucker. Geometric Diffusion as a tool for harmonic analysis and structure definition of data, part I: Diffusion Maps. Proceedings of the National Academy of Sciences 102(21):7426-31. 2005. [ url ]
    • [Coifman05b] R. R. Coifman, S. Lafon, A.B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. Zucker. Geometric Diffusion as a tool for harmonic analysis and structure definition of data, part II: Multiscale methods. Proceedings of the National Academy of Sciences 102(21):7432-37. 2005. [ url ]
    • [Mahadevan06] S. Mahadevan and M. Maggioni. Value Function Approximation using Diffusion Wavelets and Laplacian Eigenfunctions. In Y. Weiss and B. Schölkopf and J. Platt, editors, Advances in Neural Information Processing Systems 18, 843--850. 2006. [ url ]
    • [Bishop98] C. M. Bishop, M. Svensen, and C. K. I. Williams. GTM: the Generative Topographic Mapping. Neural Computation, 10(1): 215-234. 1998. [ url ]
    • [Tipping99] M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61(3): 611-622. 1999. [ url ]
    • [Lawrence05] N. D. Lawrence. Probabilistic non-linear principal component analysis with Gaussian process latent variable models. Journal of Machine Learning Research, 6:1783-1816. 2005. [ url ]
    • [Urtasun06] R. Urtasun, D. J. Fleet and P. Fua. Gaussian Process Dynamical Models for 3D people tracking. Proc. of IEEE Computer Vision and Pattern Recognition. 2006. [ url ]
    • [Urtasun07] R. Urtasun and T. Darrell. Discriminative Gaussian Process Latent Variable Models for Classification. Proc. of International Conference on Machine Learning. 2007. [ url ]
    • [Kanaujia07] A. Kanaujia, C. Sminchisescu, and D. Metaxas. Spectral Latent Variable Models for Perceptual Inference. Proc. of IEEE International Conference on Computer Vision. 2007. [ url ]
    • [Urtasun08] R. Urtasun, D. J. Fleet, A. Geiger, J. Popovic, T. Darrell and N. D. Lawrence. Topologically-Constrained Latent Variable Models. Proc. of International Conference on Machine Learning. 2008. [ url ]
    • [Lu08] Z. Lu, M. Carreira-Perpinan and C. Sminchisescu. People Tracking with the Laplacian Eigenmaps Latent Variable Model. In J.C. Platt, D. Koller, Y. Singer and S. Roweis, editors, Advances in Neural Information Processing Systems 20, 1705--1712. 2008. [ url ]
  • Optional
Metric learning
  • Required
    • [Xing03] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. Advances in Neural Information Processing Systems 15. 2003. [ url ]
    • [Globerson06] A. Globerson and S. Roweis. Metric Learning by Collapsing Classes. In Y. Weiss, B. Schölkopf and J. Platt, editors, Advances in Neural Information Processing Systems 18, 451-458. 2006. [ url ]
    • [Hinton03] G. Hinton and S. Roweis. Stochastic Neighbor Embedding. Advances in Neural Information Processing Systems 15. 2003. [ url ]
    • [Goldberger05] J. Goldberger, S. Roweis, G. Hinton and R. Salakhutdinov. Neighbourhood Components Analysis. In L. K. Saul, Y. Weiss and L. Bottou, editors, Advances in Neural Information Processing Systems 17. 2005. [ url ]
    • [Weinberger06b] K. Weinberger, J. Blitzer and L. Saul. Distance Metric Learning for Large Margin Nearest Neighbor Classification. In Y. Weiss, B. Schölkopf and J. Platt, editors, Advances in Neural Information Processing Systems 18. 2006. [ url ]
    • [Shalev04] S. Shalev-Shwartz, Y. Singer and A. Ng. Online and Batch Learning of Pseudo-Metrics. Proc. of International Conference on Machine Learning. 2004. [ url ]
    • [Davis07] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-Theoretic Metric Learning. Proc. of International Conference on Machine Learning, 209-216. 2007. [ url ]
    • [Torresani07] L. Torresani and K. Lee. Large Margin Component Analysis. In B. Schölkopf, J. Platt and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, 1385-1392. 2007. [ url ]
    • [Davis08] J. Davis and I. S. Dhillon. Structured Metric Learning for High-Dimensional Problems. Proc. of Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008. [ url ]
    • [Weinberger08] K. Weinberger and L. Saul. Fast Solvers and Efficient Implementations for Distance Metric Learning. Proc. of International Conference on Machine Learning. 2008. [ url ]
    • [Sriperumbudur08] B. Sriperumbudur, O. Lang, and G. Lanckriet. Metric Embedding for Kernel Classification Rules. Proc. of International Conference on Machine Learning. 2008. [ url ]
    • [Chopra05] S. Chopra, R. Hadsell and Y. LeCun. Learning a Similarity Metric Discriminatively, with Application to Face Verification. Proc. of Computer Vision and Pattern Recognition Conference. 2005. [ url ]
  • Optional
Transfer learning
  • Required
    • [Caruana97] R. Caruana. Multitask learning. Machine Learning, 1997. [ url ]
    • [Evgeniou05] T. Evgeniou, C. Micchelli and M. Pontil. Learning multiple tasks with kernel methods. J. Machine Learning Research, 6: 615-637, 2005. [ url ]
    • [Argyriou07] A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. In B. Schölkopf and J. Platt and T. Hoffman, editors, Advances in Neural Information Processing Systems 19. 2007. [ url ]
    • [Argyriou08] A. Argyriou, T. Evgeniou, M. Pontil and Y. Ying. A Spectral Regularization Framework for Multi-Task Structure Learning. In J.C. Platt and D. Koller and Y. Singer and S. Roweis, editors, Advances in Neural Information Processing Systems 20. 2008. [ url ]
    • [Jacob08] L. Jacob, F. Bach, J.-P. Vert. Clustered Multi-Task Learning: A Convex Formulation. To appear. 2008. [ url ]
    • [Ando05] R. Ando and T. Zhang. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research 2005. [ url ]
    • [Blitzer06] J. Blitzer, R. McDonald, and F. Pereira. Domain adaptation with structural correspondence learning. Proceedings of the Conference on Empirical Methods in Natural Language Processing. 2006. [ url ]
    • [Daume06] H. Daume III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101-126. 2006. [ url ]
    • [Daume07] H. Daume III. Frustratingly easy domain adaptation. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pages 256-263. 2007. [ url ]
    • [Blitzer08] J. Blitzer, K. Crammer, A. Kulesza, F. Pereira and J. Wortman. Learning Bounds for Domain Adaptation. Advances in Neural Information Processing Systems 20. 2008. [ url ]
    • [Raina06] R. Raina, A. Y. Ng and D. Koller. Constructing informative priors using transfer learning. Proc. of International Conference on Machine Learning. 2006. [ url ]
    • [Raina07] R. Raina, A. Battle, H. Lee, B. Packer and A. Y. Ng. Self-taught learning: Transfer learning from unlabeled data. Proc. of International Conference on Machine Learning. 2007. [ url ]
    • [Liu08] Q. Liu, X. Liao and L. Carin. Semi-Supervised Multitask Learning. Advances in Neural Information Processing Systems 20. 2008. [ url ]
    • [Yu08] K. Yu and W. Chu. Gaussian Process Models for Link Analysis and Transfer Learning. Advances in Neural Information Processing Systems 20. 2008. [ url ]
    • [Bonilla08] E. Bonilla, K. M. Chai, and C. Williams. Multi-task Gaussian Process Prediction. Advances in Neural Information Processing Systems 20. 2008. [ url ]
    • [Ahmed08] A. Ahmed, K. Yu, W. Xu, Y. Gong and E. P. Xing. Training Hierarchical Feed-forward Visual Recognition Models Using Transfer Learning from Pseudo Tasks. Proc. of European Conference on Computer Vision. 2008. [ url ]
  • Optional
Deep Architecture
  • Required
    • [Bengio07] Y. Bengio. Learning deep architectures for AI. Technical Report, U. of Montreal. 2007. [ url ]
    • [Hinton06] G. E. Hinton, S. Osindero and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation. 2006. [ url ]
    • [Bengio07b] Y. Bengio, P. Lamblin, D. Popovici and H. Larochelle. Greedy Layer-Wise Training of Deep Networks. Advances in Neural Information Processing Systems 19. 2007. [ url ]
    • [Hinton06b] G. E. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science. 2006. [ url ]
  • Optional
Compressed sensing
  • Required
    • [Candes08] E. Candes and M. Wakin. An introduction to compressive sampling. IEEE Signal Processing Magazine. 2008. [ url ]
  • Optional
Contact
941 West 37th Place,
Los Angeles, CA 90089
Tel: (213) 740-5924
Fax: (213) 740-7512
Office: RTH 403
Email: feisha@usc.edu