Learning Grammars for Architecture-Specific Facade Parsing

Gadde, Raghudeep; Marlet, Renaud; Paragios, Nikos

doi:10.1007/s11263-016-0887-4

Published: 01 March 2016

Volume 117, pages 290–316, (2016)

International Journal of Computer Vision

Raghudeep Gadde¹,
Renaud Marlet¹ &
Nikos Paragios²

Abstract

Parsing facade images requires an optimal handcrafted grammar for a given class of buildings, and such a grammar is usually designed manually by experts. In this paper, we present a novel framework to learn a compact grammar from a set of ground-truth images. To this end, parse trees of ground-truth annotated images are obtained by running existing inference algorithms with a simple, very general grammar. From these parse trees, repeated subtrees are sought and merged together to share derivations and produce a grammar with fewer rules. Furthermore, unsupervised clustering is performed on these rules, so that rules corresponding to the same complex pattern are grouped together, leading to a rich, compact grammar. Experimental validation and comparison with state-of-the-art grammar-based methods on four different datasets show that the learned grammar leads to much faster convergence while producing equally or more accurate parsing results compared to handcrafted grammars as well as grammars learned by other methods. In addition, we release a new dataset of facade images in the Art-deco style and demonstrate the general applicability and potential of the proposed framework.
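
The idea of merging repeated subtrees so that derivations are shared can be illustrated with a small sketch. This is not the authors' algorithm; it only shows exact-match subtree sharing on toy parse trees, and the tree encoding, the labels (`VSplit`, `HSplit`, `wall`, `window`, `door`), and the function name `share_subtrees` are all assumptions made for illustration.

```python
# Toy illustration (assumed encoding, not the paper's method):
# a parse tree is either a terminal label (str) or a pair
# (operator_label, tuple_of_children).

def share_subtrees(trees):
    """Give each distinct subtree one nonterminal; return (roots, rules)."""
    ids = {}    # canonical subtree key -> nonterminal name
    rules = {}  # nonterminal name -> (operator_label, child symbols)

    def visit(node):
        if isinstance(node, str):
            return node  # terminals are kept as they are
        label, children = node
        # Children are rewritten first, so identical subtrees get identical keys.
        key = (label, tuple(visit(child) for child in children))
        if key not in ids:
            name = f"N{len(ids)}"  # hypothetical naming scheme
            ids[key] = name
            rules[name] = key
        return ids[key]

    roots = [visit(tree) for tree in trees]
    return roots, rules


# Two hypothetical facades: stacked floors, each floor a horizontal split.
window_row = ("HSplit", ("wall", "window", "wall"))
door_row = ("HSplit", ("wall", "door", "wall"))
facade_a = ("VSplit", (window_row, window_row, door_row))
facade_b = ("VSplit", (window_row, window_row, window_row))

roots, rules = share_subtrees([facade_a, facade_b])
for name, (label, children) in rules.items():
    print(f"{name} -> {label}({', '.join(children)})")
```

The two trees contain eight internal nodes in total, but the repeated window row is represented by a single shared rule, so only four rules remain. Grouping rules that are similar but not identical, as the clustering step in the abstract describes, is outside the scope of this sketch.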

Notes

https://github.com/raghudeep/ParisArtDecoFacadesDataset/

Acknowledgments

We thank Prof. Nikos Komodakis for providing the code for LP-based clustering. This work was partly carried out in IMAGINE, a joint research project between Ecole des Ponts ParisTech (ENPC) and the Scientific and Technical Centre for Building (CSTB). It was partly supported by ANR project Semapolis ANR-13-CORD-0003 and the European Research Council Starting Grant ERC-STG-259112.

Author information

Authors and Affiliations

Université Paris-Est, LIGM (UMR CNRS 8049), ENPC, 77455, Marne-la-Vallée, France
Raghudeep Gadde & Renaud Marlet
Center for Visual Computing, CentraleSupélec, Inria, Université Paris-Saclay, 92295, Châtenay-Malabry, France
Nikos Paragios

Corresponding author

Correspondence to Raghudeep Gadde.

Additional information

Communicated by Carsten Rother.

About this article

Cite this article

Gadde, R., Marlet, R. & Paragios, N. Learning Grammars for Architecture-Specific Facade Parsing. Int J Comput Vis 117, 290–316 (2016). https://doi.org/10.1007/s11263-016-0887-4

Received: 18 July 2014
Accepted: 06 February 2016
Published: 01 March 2016
Issue Date: May 2016
DOI: https://doi.org/10.1007/s11263-016-0887-4
