
The NITRDrone Dataset to Address the Challenges for Road Extraction from Aerial Images

Published in: Journal of Signal Processing Systems

Abstract

Recent years have witnessed a dramatic evolution in small-scale remote sensing platforms such as unmanned aerial vehicles (UAVs). Characteristics such as automatic flight control, long flight time, and flexible image acquisition have fueled various computer-vision tasks, providing better efficiency and usefulness than fixed surveillance cameras. However, the number of UAV-based aerial datasets remains low, and comparatively few focus on specific tasks such as image segmentation in constrained scenarios. In this paper, we present a high-resolution UAV-based image dataset, named "NITRDrone", targeting aerial image segmentation tasks, especially the extraction of road networks from aerial images. The images and video sequences in this dataset were captured over different locations of the NITR campus, which spans around 650 acres, and thus cover many diverse scenarios relevant to aerial image analysis. In particular, the dataset is designed to address the existing challenges in UAV-based aerial image segmentation. Extensive experiments with existing state-of-the-art methodologies demonstrate the effectiveness of the proposed dataset for aerial segmentation problems. Among the considered baselines, U-Net performs best with an intersection over union (IoU) of 0.77, followed by DeepLabv3+ with an Xception backbone (IoU: 0.74) and SegNet (IoU: 0.68). We hope the NITRDrone dataset will encourage researchers and boost research and development in the visual analysis of UAV imagery. The NITRDrone dataset is available online at https://github.com/drone-vision/NITRDrone-Dataset.
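As a quick illustration of the reported metric, the sketch below computes IoU for a single binary road-vs-background prediction in PyTorch (the framework referenced in the notes below). This is a minimal example, not the authors' evaluation code; the function name, the 0.5 threshold, and the tensor shapes are assumptions.

```python
import torch

def binary_iou(logits: torch.Tensor, target: torch.Tensor, threshold: float = 0.5) -> float:
    """IoU for one binary (road vs. background) prediction.

    logits: raw model output, shape (H, W)
    target: ground-truth mask, shape (H, W), values in {0, 1}
    """
    pred = torch.sigmoid(logits) > threshold   # boolean predicted road mask
    gt = target.bool()
    intersection = (pred & gt).sum().item()    # overlapping road pixels (TP)
    union = (pred | gt).sum().item()           # TP + FP + FN
    return intersection / union if union > 0 else 1.0  # two empty masks count as a perfect match

# Example usage with random tensors standing in for a model output and a label:
logits = torch.randn(256, 256)
mask = (torch.rand(256, 256) > 0.5).float()
print(f"IoU: {binary_iou(logits, mask):.4f}")
```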


Notes

  1. NITRDrone Dataset: https://github.com/drone-vision/NITRDrone-Dataset

  2. https://pytorch.org/docs/stable/index.html

Abbreviations

AI: Artificial Intelligence
CCE: Categorical Cross-Entropy
CNN: Convolutional Neural Network
FCN: Fully Convolutional Network
FN: False Negative
FP: False Positive
GSD: Ground Sampling Distance
IoT: Internet of Things
IoU: Intersection over Union
MR: Magnetic Resonance
NITR: National Institute of Technology Rourkela
ReLU: Rectified Linear Unit
RS: Remote Sensing
TN: True Negative
TP: True Positive
UAV: Unmanned Aerial Vehicle
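For reference, the IoU scores quoted in the abstract can be written in terms of these per-pixel counts. This is the standard definition of the metric, not a formula reproduced from the paper:

```latex
\mathrm{IoU} \;=\; \frac{|P \cap G|}{|P \cup G|} \;=\; \frac{TP}{TP + FP + FN}
```

where \(P\) is the set of predicted road pixels and \(G\) the set of ground-truth road pixels.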


Acknowledgements

This research was supported by the following projects: (1) "Deep learning applications for computer vision task", funded by NITROAA, with hardware support from Lenovo (P920 workstation), Dell (Precision 7820 workstation), and NVIDIA Corporation (Titan V and Quadro RTX 8000 GPUs); (2) "Applications of Drone Vision using Deep Learning", funded by the Technical Education Quality Improvement Programme (TEQIP-III), National Project Implementation Unit, Government of India; (3) "PRediction of activities and Events by Vision in an Urban Environment" (PRIN 2017 PREVUE), funded by the Italian Ministry of Education, University and Research, under grant 2017N2RK7K.

Author information


Corresponding authors

Correspondence to Aniello Castiglione or Brij Bhooshan Gupta.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Behera, T.K., Bakshi, S., Sa, P.K. et al. The NITRDrone Dataset to Address the Challenges for Road Extraction from Aerial Images. J Sign Process Syst 95, 197–209 (2023). https://doi.org/10.1007/s11265-022-01777-0

