ABSTRACT
Sparse coding---that is, modelling data vectors as sparse linear combinations of basis elements---is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on learning the basis set, also called the dictionary, to adapt it to specific data, an approach that has recently proven very effective for signal reconstruction and classification in the audio and image processing domains. We propose a new online optimization algorithm for dictionary learning, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples. A proof of convergence is presented, along with experiments on natural images demonstrating that the algorithm produces better dictionaries in less time than classical batch algorithms, on both small and large datasets.
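The algorithm the abstract describes alternates two steps per sample: sparse-coding the sample against the current dictionary, then updating the dictionary by block coordinate descent on sufficient statistics accumulated over past samples. The following is a minimal NumPy sketch of that scheme, not the authors' implementation: the sparse-coding step here uses plain ISTA as a stand-in for the LARS-Lasso solver used in the paper, and all function names and parameters (`sparse_code`, `online_dictionary_learning`, `alpha`, `k`) are illustrative.

```python
import numpy as np

def sparse_code(x, D, alpha=0.1, n_iter=50):
    """Approximate Lasso min_a 0.5*||x - D a||^2 + alpha*||a||_1 via ISTA
    (a simple stand-in for the LARS solver used in the paper)."""
    a = np.zeros(D.shape[1])
    L = np.linalg.norm(D, 2) ** 2 + 1e-8  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)          # gradient of the quadratic term
        a = a - grad / L                  # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - alpha / L, 0.0)  # soft-threshold
    return a

def online_dictionary_learning(X, k=8, alpha=0.1, n_epochs=1, seed=0):
    """One-pass online dictionary learning: accumulate the statistics
    A = sum a a^T and B = sum x a^T, then update each dictionary column
    by block coordinate descent, projecting onto the unit ball."""
    rng = np.random.default_rng(seed)
    m, n = X.shape                        # m samples of dimension n
    D = rng.standard_normal((n, k))
    D /= np.linalg.norm(D, axis=0)        # unit-norm initial atoms
    A = np.zeros((k, k))
    B = np.zeros((n, k))
    for _ in range(n_epochs):
        for x in X:
            a = sparse_code(x, D, alpha)  # sparse-coding step
            A += np.outer(a, a)           # update sufficient statistics
            B += np.outer(x, a)
            for j in range(k):            # dictionary update, column by column
                if A[j, j] > 1e-10:
                    u = D[:, j] + (B[:, j] - D @ A[:, j]) / A[j, j]
                    D[:, j] = u / max(np.linalg.norm(u), 1.0)
    return D
```

Because the dictionary update touches only the small `k x k` and `n x k` statistics rather than the full dataset, the per-sample cost is independent of the number of training samples seen so far, which is what lets the method scale to millions of samples.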