Skip to main content
Log in

Learning Grammars for Architecture-Specific Facade Parsing

  • Published:
International Journal of Computer Vision Aims and scope Submit manuscript

Abstract

Parsing facade images requires optimal handcrafted grammar for a given class of buildings. Such a handcrafted grammar is often designed manually by experts. In this paper, we present a novel framework to learn a compact grammar from a set of ground-truth images. To this end, parse trees of ground-truth annotated images are obtained running existing inference algorithms with a simple, very general grammar. From these parse trees, repeated subtrees are sought and merged together to share derivations and produce a grammar with fewer rules. Furthermore, unsupervised clustering is performed on these rules, so that, rules corresponding to the same complex pattern are grouped together leading to a rich compact grammar. Experimental validation and comparison with the state-of-the-art grammar-based methods on four different datasets show that the learned grammar helps in much faster convergence while producing equal or more accurate parsing results compared to handcrafted grammars as well as grammars learned by other methods. Besides, we release a new dataset of facade images following the Art-deco style and demonstrate the general applicability and extreme potential of the proposed framework.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15

Similar content being viewed by others

Notes

  1. https://github.com/raghudeep/ParisArtDecoFacadesDataset/

References

  • Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., & Susstrunk, S. (2012). SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2274–2282.

    Article  Google Scholar 

  • Alegre, F., & Dellaert, F. (2004). A probabilistic approach to the semantic interpretation of building facades. In CIPA international workshop on vision techniques applied to the rehabilitation of city centres (pp. 25–27).

  • Benz, F., & Kötzing, T. (2013). An effective heuristic for the smallest grammar problem. In Proceedings of the 15th annual conference on genetic and evolutionary computation (pp. 487–494). ACM.

  • Berg, A.C., Grabler, F., & Malik, J. (2007). Parsing images of architectural scenes. In IEEE 11th International Conference on Computer Vision, 2007 (ICCV 2007). (pp. 1–8). IEEE

  • Bod, R. (2003). An efficient implementation of a new DOP model. In 10th Conference on European Chapter of the Association for Computational Linguistics (EACL 2003) (Vol 1, pp 19–26).

  • Bod, R. (2006). An all-subtrees approach to unsupervised parsing. In 21st international conference on computational linguistics and 44th annual meeting of the association for computational linguistics (ACL 2006) (pp. 865–872). Association for Computational Linguistics.

  • Carrasco, R. C., Oncina, J., & Calera-Rubio, J. (2001). Stochastic inference of regular tree languages. Machine Learning, 44(1–2), 185–197.

    Article  MATH  Google Scholar 

  • Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Rasala, A., & Sahai, A., et al. (2002). Approximating the smallest grammar: Kolmogorov complexity in natural models. In Proceedings of the thiry-fourth annual ACM symposium on theory of computing (STOC) (pp. 792–801). ACM.

  • Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., et al. (2005). The smallest grammar problem. IEEE Transactions on Information Theory, 51(7), 2554–2576.

    Article  MathSciNet  MATH  Google Scholar 

  • Chi, Y., Muntz, R. R., Nijssen, S., & Kok, J. N. (2005). Frequent subtree mining - an overview. Fundamenta Informaticae, 66(1), 161–198.

    MathSciNet  MATH  Google Scholar 

  • Clark, A. (2010). Distributional learning of some context-free languages with a minimally adequate teacher. In Grammatical Inference: Theoretical Results and Applications (pp. 24–37). Springer.

  • Cohen, A., Schwing, A.G., & Pollefeys, M. (2014). Efficient structured parsing of facades using dynamic programming. In 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.

  • Cohen, S. B., Stratos, K., Collins, M., Foster, D. P., & Ungar, L. (2014). Spectral learning of latent-variable pcfgs: Algorithms and sample complexity. The Journal of Machine Learning Research, 15(1), 2399–2449.

    MathSciNet  MATH  Google Scholar 

  • Cohen, S.B., Stratos, K., Collins, M., Foster, D.P., & Ungar, L.H. (2013). Experiments with spectral learning of latent-variable PCFGs. In Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics (HLT-NAACL 2013) (pp. 148–157).

  • Cohn, T., Blunsom, P., & Goldwater, S. (2010). Inducing tree-substitution grammars. The Journal of Machine Learning Research, 11, 3053–3096.

    MathSciNet  MATH  Google Scholar 

  • Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 603–619.

    Article  Google Scholar 

  • Dai, D., Prasad, M., Schmitt, G., & Van Gool, L. (2012). Learning domain knowledge for façade labelling. In Computer Vision–ECCV 2012 (pp. 710–723). Springer.

  • Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(2), 224–227.

    Article  Google Scholar 

  • De La Higuera, C. (2005). A bibliographical study of grammatical inference. Pattern Recognition, 38(9), 1332–1348.

    Article  Google Scholar 

  • D’Ulizia, A., Ferri, F., & Grifoni, P. (2011). A survey of grammatical inference methods for natural language learning. Artificial Intelligence Review, 36(1), 1–27.

    Article  Google Scholar 

  • Dunn, J. C. (1974). Well-separated clusters and optimal fuzzy partitions. Journal of Cybernetics, 4(1), 95–104.

    Article  MathSciNet  MATH  Google Scholar 

  • Flajolet, P., Sipala, P., & Steyaert, J.M. (1990). Analytic variations on the common subexpression problem. In Proceedings of the 17th international colloquium on automata, languages and programming (pp. 220–234). Springer.

  • Frey, B. J., & Dueck, D. (2007). Clustering by passing messages between data points. Science, 315(5814), 972–976.

    Article  MathSciNet  MATH  Google Scholar 

  • Gould, S. (2012). DARWIN: a framework for machine learning and computer vision research and development. The Journal of Machine Learning Research, 13(1), 3533–3537.

    MathSciNet  MATH  Google Scholar 

  • Grünwald, P. (1996). A minimum description length approach to grammar inference. In Connectionist, statistical, and symbolic approaches to learning for natural language processing, (pp. 203–216). Springer.

  • De la Higuera, C. (2010). Grammatical inference: Learning automata and grammars. New York: Cambridge University Press.

    Book  MATH  Google Scholar 

  • Jampani, V., Gadde, R., & Gehler, P.V. (2015). Efficient facade segmentation using auto-context. In 2015 IEEE Winter Conference on Applications of Computer Vision (WACV) (pp. 1038–1045). IEEE.

  • Johnson, M., Griffiths, T., & Goldwater, S. (2007). Bayesian inference for PCFGs via Markov Chain Monte Carlo. In Human Language Technologies 2007: The conference of the north american chapter of the association for computational linguistics (pp. 139–146).

  • Kass, M., Witkin, A., & Terzopoulos, D. (1988). Snakes: Active contour models. International Journal of Computer Vision, 1(4), 321–331.

    Article  MATH  Google Scholar 

  • Kolmogorov, V., & Zabin, R. (2004). What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2), 147–159.

    Article  Google Scholar 

  • Komodakis, N., Paragios, N., & Tziritas, G. (2009). Clustering via lp-based stabilities. In Advances in neural information processing systems (Vol 21, pp. 865–872).

  • Korč, F., & Förstner, W. (2009). eTRIMS Image Database for interpreting images of man-made scenes. Tech. Rep. TR-IGG-P-2009-01, Dept. of Photogrammetry, University of Bonn. http://www.ipb.uni-bonn.de/projects/etrims_db/

  • Koutsourakis, P., Simon, L., Teboul, O., Tziritas, G., & Paragios, N. (2009). Single view reconstruction using shape grammars for urban environments. In 2009 IEEE 12th international conference on computer vision (pp. 1795–1802). IEEE.

  • Koziński, M., Gadde, R., Zagoruyko S., Marlet, R., & Obozinski, G. (2015). A MRF shape prior for facade parsing with occlusions. In 2015 IEEE conference on computer vision and pattern recognition (CVPR).

  • Koziński, M., & Marlet, R. (2014). Image parsing with graph grammars and markov random fields. In Winter conference on applications of computer vision (WACV 2014).

  • Koziński, M., Obozinski, G., & Marlet, R. (2014). Beyond procedural facade parsing: Bidirectional alignment via linear programming. In 12th asian conference on computer vision (ACCV 2014).

  • Lehman, E., & Shelat, A. (2002). Approximation algorithms for grammar-based compression. In Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms (pp. 205–212). Society for Industrial and Applied Mathematics.

  • Mäkinen, E. (1989). On the subtree isomorphism problem for ordered trees. Information Processing Letters, 32(5), 271–273.

    Article  MathSciNet  MATH  Google Scholar 

  • Manning, C.D. (2011). Part-of-speech tagging from 97% to 100%: Is it time for some linguistics? In: 12th international conference on computational linguistics and intelligent text processing (CICLing 2011) (Vol Part I, pp. 171–189). Springer

  • Martinović, A., Mathias, M., Weissenberg, J., & Van Gool, L. (2012). A three-layered approach to facade parsing. In ECCV 2012 computer vision (pp. 416–429). Springer.

  • Martinovic, A., & Van Gool, L. (2013). Bayesian grammar learning for inverse procedural modeling. In 2013 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 201–208). IEEE.

  • Martinović, A., & Van Gool, L. (2013). Earley parsing for 2D stochastic context free grammars. Tech. Rep. KUL/ESAT/PSI/1301, KU Leuven.

  • Matsuzaki, T., Miyao, Y., & Tsujii, J. (2005). Probabilistic CFG with latent annotations. In 43rd annual meeting on association for computational linguistics (ACL 2005) (pp. 75–82).

  • Miller, P. (1999). Strong generative capacity. Stanford: CSLI Publications.

    MATH  Google Scholar 

  • Müller, P., Wonka, P., Haegler, S., Ulmer, A., & Van Gool, L. (2006). Procedural modeling of buildings. In ACM SIGGRAPH 2006 / ACM transactions on graphics (pp. 614–623).

  • Nevill-Manning, C.G., & Witten, I.H. (1997). Identifying hierarchical structure in sequences: A linear-time algorithm. Journal of Artificial Intelligence Research 67–82

  • Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryigit, G., Kübler, S., et al. (2007). Malt parser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2), 95–135.

    Google Scholar 

  • Ok, D., Kozinski, M., Marlet, R., & Paragios, N. (2012). High-level bottom-up cues for top-down parsing of facade images. In: 2nd Joint 3DIM/3DPVT conference on 3D imaging, modeling, processing, visualization and transmission (3DIMPVT).

  • Osher, S., & Paragios, N. (2003). Geometric level set methods in imaging, vision, and graphics. New York: Springer.

    MATH  Google Scholar 

  • Parisot, S., Duffau, H., Chemouny, S., & Paragios, N. (2011). Graph based spatial position mapping of low-grade gliomas. In Medical image computing and computer-assisted intervention–MICCAI 2011 (pp. 508–515). Springer

  • Parisot, S., Duffau, H., Chemouny, S., & Paragios, N. (2012). Graph-based detection, segmentation & characterization of brain tumors. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 988–995). IEEE.

  • Petrov, S., & Klein, D. (2007). Improved inference for unlexicalized parsing. In Human Language Technologies 2007: The conference of the North American Chapter of the Association for computational linguistics (pp. 404–411). Association for Computational Linguistics.

  • Riemenschneider, H., Krispel, U., Thaller, W., Donoser, M., Havemann, S., Fellner, D., & Bischof, H. (2012). Irregular lattices for complex shape grammar facade parsing. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1640–1647). IEEE.

  • Ripperda, N., & Brenner, C. (2006). Reconstruction of façade structures using a formal grammar and RJMCMC. In Pattern recognition (pp. 750–759). Springer.

  • Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65.

    Article  MATH  Google Scholar 

  • Sakakibara, Y., & Kondo, M. (1999). GA-based learning of context-free grammars using tabular representations. In ICML (Vol 99, pp. 354–360).

  • Si, Z., & Zhu, S. C. (2013). Learning and-or templates for object recognition and detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(9), 2189–2205. doi:10.1109/TPAMI.2013.35.

    Article  Google Scholar 

  • Simon, L., Teboul, O., Koutsourakis, P., & Paragios, N. (2011). Random exploration of the procedural space for single-view 3D modeling of buildings. International Journal of Computer Vision, 93(2), 253–271.

    Article  MathSciNet  MATH  Google Scholar 

  • Simon, L., Teboul, O., Koutsourakis, P., Van Gool, L., & Paragios, N. (2012). Parameter-free/Pareto-driven procedural 3D reconstruction of buildings from ground-level sequences. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 518–525). IEEE.

  • Sutton, R. S., & Barto, A. G. (1998). Introduction to reinforcement learning. Cambridge, MA: MIT Press.

    Google Scholar 

  • Teboul, O. (2011). Shape grammar parsing: Application to image-based modeling. Ph.D. thesis, Ecole Centrale Paris.

  • Teboul, O., Kokkinos, I., Simon, L., Koutsourakis, P., & Paragios, N. (2011). Shape grammar parsing via reinforcement learning. In 2011 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2273–2280). IEEE.

  • Teboul, O., Kokkinos, I., Simon, L., Koutsourakis, P., & Paragios, N. (2013). Parsing facades with shape grammars and reinforcement learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(7), 1744–1756.

    Article  Google Scholar 

  • Teboul, O., Simon, L., Koutsourakis, P., & Paragios, N. (2010). Segmentation of building facades using procedural shape priors. In 2010 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3105–3112). IEEE.

  • Tomita, M. (1991). Parsing 2-dimensional language. In M. Tomita (Ed.), Current issues in parsing technology (Vol. 126, pp. 277–289)., The springer international series in engineering and computer science New York: Springer.

    Chapter  Google Scholar 

  • Tu, K., Pavlovskaia, M., & Zhu, S.C. (2013). Unsupervised structure learning of stochastic and-or grammars. In Advances in neural information processing systems (pp. 1322–1330)

  • Tylecek, R. (2012). The cmp facade database. Tech. rep., CTU–CMP–2012–24, Czech Technical University.

  • Valiente, G. (2002). Algorithms on trees and graphs. Berlin: Springer.

  • Wang, C., Komodakis, N., & Paragios, N. (2013). Markov random field modeling, inference & learning in computer vision & image understanding: A survey. Computer Vision and Image Understanding, 117(11), 1610–1627.

    Article  Google Scholar 

  • Weissenberg, J., Riemenschneider, H., Prasad, M., & Van Gool, L. (2013). Is there a procedural logic to architecture? In 2013 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 185–192). IEEE.

  • Wonka, P., Wimmer, M., Sillion, F., & Ribarsky, W. (2003). Instant architecture. ACM Transactions on Graphics (TOG), 22(3), 669–677.

    Article  Google Scholar 

  • Zaki, M.J. (2002). Efficiently mining frequent trees in a forest. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 71–80). ACM.

Download references

Acknowledgments

We thank Prof. Nikos Komodakis for providing the code for LP-based clustering. This work was partly carried out in IMAGINE, a joint research project between Ecole des Ponts ParisTech (ENPC) and the Scientific and Technical Centre for Building (CSTB). It was partly supported by ANR project Semapolis ANR-13-CORD-0003 and the European Research Council Starting Grant ERC-STG-259112.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Raghudeep Gadde.

Additional information

Communicated by Carsten Rother.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gadde, R., Marlet, R. & Paragios, N. Learning Grammars for Architecture-Specific Facade Parsing. Int J Comput Vis 117, 290–316 (2016). https://doi.org/10.1007/s11263-016-0887-4

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11263-016-0887-4

Keywords

Navigation