Heterogeneous transfer learning for image clustering via the social web

Authors:
Qiang Yang (Hong Kong University of Science and Technology, Kowloon, Hong Kong)
Yuqiang Chen (Shanghai Jiao Tong University, Shanghai, China)
Gui-Rong Xue (Shanghai Jiao Tong University, Shanghai, China)
Wenyuan Dai (Shanghai Jiao Tong University, Shanghai, China)
Yong Yu (Shanghai Jiao Tong University, Shanghai, China)

ACL '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1, August 2009, Pages 1–9

Published: 2 August 2009

ABSTRACT

In this paper, we present a new learning scenario, heterogeneous transfer learning, which improves learning performance when the data lie in different feature spaces and no correspondence between data instances in these spaces is provided. In the past, we have classified Chinese text documents using English training data under the heterogeneous transfer learning framework. Here, we present image clustering as an example to illustrate how unsupervised learning can be improved by transferring knowledge from auxiliary heterogeneous data obtained from the social Web. Image clustering is useful for image sense disambiguation in query-based image search, but its quality is often low because of the image data sparsity problem. We extend PLSA to help transfer knowledge from social Web data, which have mixed feature representations. Experiments on image-object clustering and scene clustering tasks show that our approach to heterogeneous transfer learning, based on the auxiliary data, is effective and promising.
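For readers unfamiliar with PLSA, the following is a minimal sketch of the plain model that the abstract says is extended. It is not the paper's transfer method. It assumes each image has already been reduced to a vector of counts over a small visual vocabulary, and it clusters images by their most probable latent topic. The function names `plsa` and `cluster`, the toy count matrix, and all parameter values are illustrative assumptions.

```python
import random


def plsa(counts, n_topics, n_iter=200, seed=0):
    """Fit plain PLSA by EM on a document-by-word count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(topic | document) and P(word | topic).
    Illustrative only: no auxiliary data or transfer step is modelled.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row]

    # Random positive initialisation; every row is a probability distribution.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                n_dw = counts[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    expected = n_dw * joint[z] / total
                    acc_z_d[d][z] += expected
                    acc_w_z[z][w] += expected
        # M-step: renormalise the expected counts (keep old row if it is empty).
        p_z_d = [normalize(new) if sum(new) > 0 else old
                 for new, old in zip(acc_z_d, p_z_d)]
        p_w_z = [normalize(new) if sum(new) > 0 else old
                 for new, old in zip(acc_w_z, p_w_z)]
    return p_z_d, p_w_z


def cluster(p_z_d):
    """Assign each document to its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in p_z_d]


if __name__ == "__main__":
    # Hypothetical toy data: 4 "images" over a 4-word visual vocabulary.
    toy_counts = [[8, 7, 0, 0], [6, 9, 0, 0], [0, 0, 7, 8], [0, 0, 9, 6]]
    topic_given_doc, word_given_topic = plsa(toy_counts, n_topics=2)
    print(cluster(topic_given_doc))
```

With very few visual words per image, the count matrix is sparse and the estimates of P(word | topic) become unreliable. This is the sparsity problem the abstract proposes to address with auxiliary social Web data.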

Index Terms

Heterogeneous transfer learning for image clustering via the social web
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. Interaction paradigms
      1. Web-based interaction

Published in
ACL '09: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1
August 2009
572 pages
ISBN: 9781932432459
General Chair: Keh-Yih Su (Behavior Design Corp., Taiwan)
Publisher: Association for Computational Linguistics, United States
Qualifiers: research-article
Overall Acceptance Rate: 85 of 443 submissions, 19%