A Lightweight Deep Learning Architecture for Vegetation Segmentation using UAV-captured Aerial Images
Introduction
The world has come a long way since the launch of Sputnik, the first-ever satellite. Available data indicate that the number of satellites operating around the Earth stands at , reshaping our daily lives in various ways [1]. The data, especially image and video data collected from these airborne sensors, are increasingly becoming the backbone of cutting-edge research in the computer vision and remote sensing (RS) domains. These remote-sensing images serve as fuel for applications including vehicle/human/animal tracking [2], forest mapping [3], [4], environmental surveying [5], [6], natural disaster management [7], and so forth. However, satellite-based remote sensing has various disadvantages, such as high cost, long revisit intervals, and poor resolution over tropical regions, which open up space for UAV-based remote sensing.
Recently, UAVs have been widely used for RS tasks because of their cost-effectiveness, easy deployment, and customizable flight planning, replacing satellite-based approaches at smaller scales. In a short period, they have gained popularity and can be seen in various remote sensing applications such as urban mapping, traffic management, wildlife tracking, disaster management, and precision agriculture [8], [9], [10]. Besides, UAV-captured images can be exploited alongside satellite RS images for diversified applications. Typically, UAV image analysis has remained limited to computer vision tasks such as ground object detection and extraction. Nevertheless, to the best of our knowledge, very few works have addressed semantic segmentation using UAV-captured images/videos [11], [12].
Segmentation is a crucial step in image scene understanding and is used in several applications [13], [14], [15]. It is a fundamental task in computer vision that assigns a class label to each pixel in an image. Semantic segmentation of UAV-captured RS images provides solutions for developing autonomous robots capable of understanding semantic descriptions of a scene [16]. Autonomy is essential for a UAV's path planning and navigation, enabling it to avoid collisions and no-fly zones. Beyond these intrinsic tasks, semantic segmentation of UAV-captured images can be used to pursue remote sensing tasks such as urban mapping [17], forest and vegetation surveying [18], assessing damage caused by natural disasters [19], and so forth. However, these small devices produce many very-high-resolution (VHR) images for the applications mentioned above. As dataset size increases, it becomes nearly impossible for classical machine learning methods, which rely on manually identified features, to keep pace [20]. This opens space for new-age technology, such as deep learning, to play its part in managing the growing amount of remote sensing data.
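Concretely, a segmentation network produces a per-class score map for every pixel, and the label map is obtained by a per-pixel argmax. A minimal NumPy sketch of this decoding step (the class count, class names, and image size below are purely illustrative, not taken from the paper):

```python
import numpy as np

def scores_to_label_map(scores: np.ndarray) -> np.ndarray:
    """Convert per-class score maps of shape (C, H, W) into a
    label map of shape (H, W) by taking the argmax over classes."""
    return np.argmax(scores, axis=0)

# Illustrative example: 3 hypothetical classes (e.g., background,
# road, vegetation) on a tiny 4x4 image.
rng = np.random.default_rng(0)
scores = rng.random((3, 4, 4))   # stand-in for network output
labels = scores_to_label_map(scores)
```

Each entry of `labels` is then an integer class index, which is exactly the pixel-wise classification output that a semantic segmentation model is trained to produce.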
Deep learning mechanisms such as CNNs have become a vital tool for remote sensing image/video analysis, addressing tasks such as terrain classification (an image classification problem), traffic management (an object detection problem), and urban and vegetation mapping (a semantic segmentation problem). Typically, deep learning for these remote sensing segmentation tasks has utilized images remotely sensed by satellites. However, deep architectures designed around satellite imagery may not be suitable for UAV-captured VHR images [21], since UAV images contain more spatial and semantic detail than satellite-captured images. The color intensities within various objects may become inhomogeneous, while spatial details such as the width and length of the same objects may vary within and across images, making the task more difficult for UAV imagery. Hence, a robust framework must be developed for critical tasks such as disaster management and autonomous flying.
Some deep baseline architectures proposed for generic scene understanding have been applied to UAV-based semantic segmentation. However, these deeper networks usually suffer from three issues: degradation, overfitting, and vanishing gradients, the last of which causes gradients to disappear in the deeper layers. Hence, improper training may yield erroneous output. These issues are common in deep segmentation architectures such as fully convolutional network-based architectures (FCNs) [22]. Borrowing concepts from fully convolutional networks, various baseline deep CNN architectures, such as U-Net [13], were proposed to perform pixel classification in medical images. U-Net has become one of the most popular frameworks and can be seen addressing many remote sensing segmentation tasks [23]. However, these architectures also suffer related issues, as the middle layers are unable to capture the essential features needed to generate a segmented image at the input resolution. Moreover, because of their large network size, they require a massive amount of computation, making them unsuitable for deployment on IoT edge devices.
To address the above issues, this article proposes a lightweight encoder–decoder architecture that overcomes the problems faced by state-of-the-art methods and retains the vital feature space needed to generate segmentation masks. The proposed framework is lightweight (having fewer trainable parameters) compared to the baseline architectures, enabling deployment on IoT edge computing devices. Hence, it is an excellent option for real-time critical tasks such as autonomous flying and emergency response during natural disaster management. The main contributions of the paper are as follows:
- 1.
A deep CNN architecture is proposed that uses skip connections to maintain gradient propagation throughout the model, eliminating the vanishing gradient problem. Several cues are borrowed from dense architectures, and internal layers are replaced by partially dense modules that help reduce the number of trainable parameters.
- 2.
Unlike standard convolution, the proposed architecture uses depthwise separable convolution, which not only helps the model overcome overfitting but also reduces the number of trainable parameters, making it suitable for deployment on IoT edge devices.
- 3.
A comparative analysis of the proposed architecture against different baseline architectures across various performance metrics is presented to show its superiority.
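The parameter saving from depthwise separable convolution mentioned in the contributions can be seen directly from the standard factorization into a depthwise and a pointwise step. A small sketch (bias terms omitted, and the layer sizes below are illustrative rather than taken from the proposed architecture):

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    # Standard conv: one k x k x c_in kernel per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    # Depthwise step: one k x k filter per input channel,
    # followed by a pointwise (1x1) conv that mixes channels.
    return k * k * c_in + c_in * c_out

# Illustrative layer: 3x3 kernels, 128 input and 256 output channels.
std = standard_conv_params(3, 128, 256)        # 294,912 weights
sep = depthwise_separable_params(3, 128, 256)  # 33,920 weights
ratio = std / sep                              # roughly 8.7x fewer
```

For typical kernel and channel sizes, the separable form needs close to `1/c_out + 1/k^2` of the standard parameter count, which is the mechanism behind the lightweight design claimed above.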
The rest of the manuscript is organized as follows: Section 2 presents the background research in the domain of segmentation and aerial image parsing. The proposed deep architecture is described in Section 3. Section 4 discusses the step-by-step experimental process used to validate the proposed model. The results obtained from the experimentation are compared in detail with related state-of-the-art mechanisms in Section 5. Finally, Section 6 concludes the manuscript and states future directions.
Related work
This section discusses the prior research works on UAV-inspired remote sensing images in detail. It includes the work of UAV-based RS, such as object extraction, classical semantic segmentation, and new-age deep learning-inspired segmentation methodologies.
Proposed architecture
This section explores different stages of the proposed architecture in detail. The proposed framework is based on encoder–decoder end-to-end CNN architecture that can extract the important local and global features focusing on the segmenting objects such as roads and vegetation that comprise most of an aerial image.
Experiments
This section describes each entity and process through which experimentation has been conducted, illustrating the efficacy of the proposed architecture. The results obtained from the experimentation are compared with other state-of-the-art mechanisms, including U-Net [13], FCN-8s [22], FC_DenseNet-103 [44], and SegNet [45]. The experimental flow is presented in Fig. 2.
Results and discussion
This section presents the results obtained for the proposed lightweight segmentation model and considered state-of-the-art frameworks. We have also analyzed and discussed various improvements achieved through the proposed approach over the baseline architectures in terms of quantitative results (with respect to the performance metrics presented in Eqs. (2)–(6)) and qualitative results.
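The quantitative comparison relies on standard pixel-wise metrics (the exact Eqs. (2)–(6) are in the full text and are not reproduced here). As a hedged illustration of one of them, per-class Intersection over Union can be computed from the TP/FP/FN counts listed in the abbreviations:

```python
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """Per-class IoU = TP / (TP + FP + FN); NaN when a class is
    absent from both prediction and ground truth."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else float("nan"))
    return ious

# Tiny illustrative 2x2 label maps with 2 classes.
pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
ious = per_class_iou(pred, gt, num_classes=2)  # [0.5, 2/3]
```

Mean IoU is then the average of the per-class values, which is the usual summary score for comparing segmentation architectures.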
Conclusions
In this article, we have proposed a lightweight convolutional neural network for semantic segmentation of UAV remote sensing images, where depthwise separable convolution modules replace the typical CNN modules. Extensive experimentation has been performed to demonstrate the impact of the proposed architecture on the UAV image dataset. The obtained results reveal that the proposed lightweight architecture produces excellent-quality segmentation maps on UAV remote sensing datasets with fewer trainable parameters than the baseline architectures.
Abbreviations
CCE: Categorical Cross Entropy
CNN: Convolutional Neural Network
FCN: Fully Convolutional Network
FN: False Negative
FP: False Positive
GT: Ground Truth
IoU: Intersection over Union
ReLU: Rectified Linear Unit
RS: Remote Sensing
TN: True Negative
TP: True Positive
UAV: Unmanned Aerial Vehicle
UDD: Urban Drone Dataset
CRediT authorship contribution statement
Tanmay Kumar Behera: Conceptualization, Data curation, Methodology, Formal analysis, Investigation, Resources, Software, Validation, Writing – original draft, Writing – review & editing. Sambit Bakshi: Conceptualization, Project administration, Funding acquisition, Resources, Supervision, Validation, Writing – review & editing. Pankaj Kumar Sa: Conceptualization, Project administration, Supervision, Validation, Writing – review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research work is supported by the following projects:
- a.
Project entitled "Deep learning applications for computer vision task", funded by NITROAA with the support of Lenovo P920 and Dell Precision 7820 workstations, and by NVIDIA Corporation with the support of NVIDIA Titan V and Quadro RTX 8000 GPU cards.
- b.
Project entitled "Applications of Drone Vision using Deep Learning", funded by the Technical Education Quality Improvement Programme (TEQIP-III), National Project
References

- et al., Remote sensing of coastal hydro-environment with portable unmanned aerial vehicles (pUAVs): a state-of-the-art review, J. Hydro-Environ. Res. (2021)
- et al., Multi-stage cascaded deconvolution for depth map and surface normal prediction from single image, Pattern Recognit. Lett. (2019)
- et al., Deep learning-based object detection in low-altitude UAV datasets: A survey, Image Vis. Comput. (2020)
- et al., Deep neural network based date palm tree detection in drone imagery, Comput. Electron. Agric. (2022)
- et al., Growing status observation for oil palm trees using Unmanned Aerial Vehicle (UAV) images, ISPRS J. Photogramm. (2021)
- et al., Vegetation extraction from UAV-based aerial images through deep learning, Comput. Electron. Agric. (2022)
- Satellite statistics (2022)
- et al., Fast vehicle detection in UAV images
- et al., Deep learning based supervised image classification using UAV images for forest areas classification, J. Indian Soc. Remote Sens. (2021)
- et al., Urban forest monitoring based on multiple features at the single tree scale by UAV, Urban For. Urban Green. (2021)
- Applications of unmanned aerial vehicles in Antarctic environmental research, Sci. Rep.
- Applications of drone in disaster management: A scoping review, Sci. Justice
- 6G enabled unmanned aerial vehicle traffic management: A perspective, IEEE Access
- Collection of data with drones in precision agriculture: Analytical model and LoRa case study, IEEE Internet Things J.
- Self-supervised pretraining and controlled augmentation improve rare wildlife recognition in UAV images
- Semantic segmentation of UAV aerial videos using convolutional neural networks
- Deep learning for semantic segmentation of UAV videos
- U-Net: Convolutional networks for biomedical image segmentation
- Segmentation and size estimation of tomatoes from sequences of paired images, EURASIP J. Image Video Process.
- The Cityscapes dataset for semantic urban scene understanding
- Deep-learning-based multispectral satellite image segmentation for water body detection, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.
- Research on a single-tree point cloud segmentation method based on UAV tilt photography and deep learning algorithm, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.
- RescueNet: Joint building segmentation and damage assessment from satellite imagery
- On detecting road regions in a single UAV image, IEEE Trans. Intell. Transp. Syst.
- Fully convolutional networks for semantic segmentation
- Application of UNet fully convolutional neural network to impervious surface segmentation in urban environment from high resolution satellite imagery