A High Performance CRF Model for Clothes Parsing

Simo-Serra, Edgar; Fidler, Sanja; Moreno-Noguer, Francesc; Urtasun, Raquel

doi:10.1007/978-3-319-16811-1_5

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9005)

Included in the following conference series:

Asian Conference on Computer Vision

Abstract

In this paper we tackle the problem of clothing parsing: our goal is to segment and classify the different garments a person is wearing. We frame the problem as one of inference in a pose-aware Conditional Random Field (CRF) that exploits appearance, figure/ground segmentation, and shape and location priors for each garment, as well as similarities between segments and symmetries between different human body parts. We demonstrate the effectiveness of our approach on the Fashionista dataset [1] and show that we can obtain a significant improvement over the state of the art.
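
As a rough illustration of what inference in a CRF of this kind involves, the sketch below labels a handful of image segments by minimising an energy made of per-segment (unary) costs plus a Potts-style penalty between similar segments that receive different labels. Everything in it is an assumption made for illustration: the names `energy` and `icm`, the three toy labels, and all the numbers are hypothetical, and iterated conditional modes (ICM) is used only because it is short. It is not the model, the features, or the inference procedure of the paper, whose details are not given in this abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical toy labels: 0 = background, 1 = shirt, 2 = trousers.
Unary = List[List[float]]              # unary[i][l]: cost of label l for segment i
Edges = Dict[Tuple[int, int], float]   # (i, j) -> similarity weight between segments


def energy(labels: List[int], unary: Unary, edges: Edges) -> float:
    """Sum of unary costs plus a Potts penalty for similar segments that disagree."""
    total = sum(unary[i][l] for i, l in enumerate(labels))
    total += sum(w for (i, j), w in edges.items() if labels[i] != labels[j])
    return total


def icm(unary: Unary, edges: Edges, max_sweeps: int = 10) -> List[int]:
    """Iterated conditional modes: greedily relabel one segment at a time."""
    n, n_labels = len(unary), len(unary[0])
    # Start from the best label under the unary costs alone.
    labels = [min(range(n_labels), key=lambda l: unary[i][l]) for i in range(n)]
    # Build an adjacency list from the undirected edge dictionary.
    nbrs: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(n)}
    for (i, j), w in edges.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            def local(l: int) -> float:
                # Cost of giving segment i label l with all other labels fixed.
                return unary[i][l] + sum(w for j, w in nbrs[i] if labels[j] != l)
            best = min(range(n_labels), key=local)
            # Only accept strict improvements so the energy never increases.
            if local(best) < local(labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


# Invented costs: segment 2 slightly prefers "trousers" on its own,
# but is strongly similar to segment 1, which clearly looks like "shirt".
unary = [[0.1, 2.0, 2.0],
         [2.0, 0.2, 1.5],
         [2.0, 1.0, 0.9]]
edges = {(0, 1): 0.1, (1, 2): 0.5}

labels = icm(unary, edges)
print(labels, energy(labels, unary, edges))
```

In this toy run the third segment is pulled from "trousers" to "shirt" by its similarity to the second segment, which lowers the total energy relative to labelling each segment independently.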

References

1. Yamaguchi, K., Kiapour, M.H., Ortiz, L.E., Berg, T.L.: Parsing clothing in fashion photographs. In: CVPR (2012)
2. Forbes Magazine: US online retail sales to reach $370B By 2017; €191B in Europe (2013). http://www.forbes.com. Accessed 14 March 2013
3. Bossard, L., Dantone, M., Leistner, C., Wengert, C., Quack, T., Gool, L.V.: Apparel classification with style. In: ACCV (2012)
4. Bourdev, L., Maji, S., Malik, J.: Describing people: a poselet-based approach to attribute classification. In: ICCV (2011)
5. Chen, H., Gallagher, A., Girod, B.: Describing clothing by semantic attributes. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part III. LNCS, vol. 7574, pp. 609–623. Springer, Heidelberg (2012)
6. Gallagher, A.C., Chen, T.: Clothing cosegmentation for recognizing people. In: CVPR (2008)
7. Hasan, B., Hogg, D.: Segmentation using deformable spatial priors with application to clothing. In: BMVC (2010)
8. Jammalamadaka, N., Minocha, A., Singh, D., Jawahar, C.: Parsing clothes in unrestricted images. In: BMVC (2013)
9. Liu, S., Song, Z., Liu, G., Xu, C., Lu, H., Yan, S.: Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In: CVPR (2012)
10. Wang, N., Ai, H.: Who blocks who: simultaneous clothing segmentation for grouping images. In: ICCV (2011)
11. Song, Z., Wang, M., Hua, X.S., Yan, S.: Predicting occupation via human clothing and contexts. In: ICCV (2011)
12. Murillo, A.C., Kwak, I.S., Bourdev, L., Kriegman, D., Belongie, S.: Urban tribes: analyzing group photos from a social perspective. In: CVPR Workshops (2012)
13. Yamaguchi, K., Kiapour, M.H., Berg, T.L.: Paper doll parsing: retrieving similar styles to parse clothing items. In: ICCV (2013)
14. Chen, H., Xu, Z.J., Liu, Z.Q., Zhu, S.C.: Composite templates for cloth modeling and sketching. In: CVPR (2006)
15. Liu, S., Feng, J., Song, Z., Zhang, T., Lu, H., Xu, C., Yan, S.: Hi, magic closet, tell me what to wear! In: Proceedings of the 20th ACM International Conference on Multimedia (2012)
16. Bourdev, L., Malik, J.: Poselets: body part detectors trained using 3D human pose annotations. In: ICCV (2009)
17. Yang, Y., Ramanan, D.: Articulated pose estimation using flexible mixtures of parts. In: CVPR (2011)
18. Dong, J., Chen, Q., Xia, W., Huang, Z., Yan, S.: A deformable mixture parsing model with parselets. In: ICCV (2013)
19. Ladicky, L., Torr, P.H.S., Zisserman, A.: Human pose estimation using a joint pixel-wise and part-wise formulation. In: CVPR (2013)
20. Wang, H., Koller, D.: Multi-level inference by relaxed dual decomposition for human pose segmentation. In: CVPR (2011)
21. Yao, J., Fidler, S., Urtasun, R.: Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In: CVPR (2012)
22. Fidler, S., Sharma, A., Urtasun, R.: A sentence is worth a thousand pixels. In: CVPR (2013)
23. Ladicky, L., Russell, C., Kohli, P., Torr, P.H.S.: Graph cut based inference with co-occurrence statistics. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 239–253. Springer, Heidelberg (2010)
24. Brox, T., Bourdev, L., Maji, S., Malik, J.: Object segmentation by alignment of poselet activations to image contours. In: CVPR (2011)
25. Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. In: PAMI (2011)
26. Carreira, J., Sminchisescu, C.: CPMC: automatic object segmentation using constrained parametric min-cuts. TPAMI 34, 1312–1328 (2012)
27. Carreira, J., Caseiro, R., Batista, J., Sminchisescu, C.: Semantic segmentation with second-order pooling. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VII. LNCS, vol. 7578, pp. 430–443. Springer, Heidelberg (2012)
28. Uijlings, J.R.R., van de Sande, K.E.A., Gevers, T., Smeulders, A.W.M.: Selective search for object recognition. IJCV 104, 154–171 (2013)
29. Schwing, A., Hazan, T., Pollefeys, M., Urtasun, R.: Distributed message passing for large scale graphical models. In: CVPR (2011)
30. Hazan, T., Urtasun, R.: A primal-dual message-passing algorithm for approximated large scale structured prediction. In: NIPS (2010)
31. Schwing, A.G., Hazan, T., Pollefeys, M., Urtasun, R.: Efficient structured prediction with latent variables for general graphical models. In: ICML (2012)
32. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38, 39–41 (1995)
33. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88, 303–338 (2010)
34. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. (IJRR) 32, 1231–1237 (2013)
35. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)
36. Simo-Serra, E., Quattoni, A., Torras, C., Moreno-Noguer, F.: A joint model for 2D and 3D pose estimation from a single image. In: CVPR (2013)

Acknowledgements

This work has been partially funded by the Spanish Ministry of Economy and Competitiveness under projects PAU+ DPI2011-27510 and ERA-Net Chistera project ViSen PCIN-2013-047.

Author information

Authors and Affiliations

IRI (CSIC-UPC), Barcelona, Spain
Edgar Simo-Serra & Francesc Moreno-Noguer
University of Toronto, Toronto, Canada
Sanja Fidler & Raquel Urtasun

Corresponding author

Correspondence to Edgar Simo-Serra.

Editor information

Editors and Affiliations

Technische Universität München, Garching, Bayern, Germany
Daniel Cremers
University of Adelaide, Adelaide, South Australia, Australia
Ian Reid
Keio University, Yokohama, Kanagawa, Japan
Hideo Saito
University of California at Merced, Merced, California, USA
Ming-Hsuan Yang

About this paper

Cite this paper

Simo-Serra, E., Fidler, S., Moreno-Noguer, F., Urtasun, R. (2015). A High Performance CRF Model for Clothes Parsing. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. Lecture Notes in Computer Science, vol 9005. Springer, Cham. https://doi.org/10.1007/978-3-319-16811-1_5

DOI: https://doi.org/10.1007/978-3-319-16811-1_5
Published: 16 April 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16810-4
Online ISBN: 978-3-319-16811-1
eBook Packages: Computer Science, Computer Science (R0)
