Abstract
Automatic detection and reconstruction of buildings have become essential in many remote sensing and computer vision applications. This paper investigates the capability of Convolutional Neural Networks (CNNs) for building detection and roof shape recognition from a single image. The major steps are training dataset generation, model training, image segmentation, building detection and roof shape recognition. First, a CNN is trained to extract urban objects such as trees, roads and buildings. Next, a second trained CNN classifies the roofs into flat, gable and hip shapes. The assessment results indicate the effectiveness of the proposed method, with quality rates of approximately 97% and 92% in the detection and recognition steps, respectively.
Zusammenfassung (Summary)
A CNN-based approach for the automatic detection of buildings and roof types in a single aerial image. The automatic detection and reconstruction of buildings has become indispensable in many remote sensing and computer vision applications. This contribution investigates the capability of Convolutional Neural Networks (CNNs) to detect buildings and roof shapes in a single image. The main steps are the creation of training datasets, model training, image segmentation, and building and roof shape detection. First, a CNN is trained to extract urban objects such as trees, roads and buildings, and the dataset is classified. The roofs are then classified into flat, gable and saddle roofs using the second trained CNN. The results confirm the success of the proposed method, with classification accuracies of approx. 97% and 92% for building detection and roof shape classification, respectively.
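The abstract reports quality rates for the detection and recognition steps without defining the measure at this point. A definition that is common in building-extraction evaluation, and is assumed here rather than taken from the paper, is quality = TP / (TP + FP + FN), reported alongside completeness and correctness. The following Python sketch computes the three scores under that assumption; the function name `detection_scores` and the example counts are hypothetical and are not results from the paper.

```python
def detection_scores(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Completeness, correctness and quality from detection counts.

    tp: detected objects that match a reference object
    fp: detected objects with no matching reference object
    fn: reference objects that were not detected
    Assumed definitions (not confirmed by the abstract):
      completeness = tp / (tp + fn)
      correctness  = tp / (tp + fp)
      quality      = tp / (tp + fp + fn)
    """
    if min(tp, fp, fn) < 0:
        raise ValueError("counts must be non-negative")

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 for an empty denominator instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    return {
        "completeness": ratio(tp, tp + fn),
        "correctness": ratio(tp, tp + fp),
        "quality": ratio(tp, tp + fp + fn),
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
scores = detection_scores(tp=97, fp=2, fn=1)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because quality penalises both false positives and false negatives, it can never exceed either completeness or correctness, which makes it a conservative single-number summary.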
Cite this article
Alidoost, F., Arefi, H. A CNN-Based Approach for Automatic Building Detection and Recognition of Roof Types Using a Single Aerial Image. PFG 86, 235–248 (2018). https://doi.org/10.1007/s41064-018-0060-5
DOI: https://doi.org/10.1007/s41064-018-0060-5