A Lightweight Deep Learning Architecture for Vegetation Segmentation using UAV-captured Aerial Images

https://doi.org/10.1016/j.suscom.2022.100841

Highlights

  • Presents a deep learning architecture based on depthwise separable convolution.

  • Discusses the architecture’s efficiency owing to its reduced number of trainable parameters.

  • Explains the architecture’s ability to address vegetation segmentation problems.

  • Highlights the architecture’s use cases in robotics-based solutions for smart agriculture.

Abstract

Unmanned aerial vehicle (UAV)-captured panoptic remote sensing images have great potential to promote robotics-inspired intelligent solutions for land cover mapping, disaster management, smart agriculture through automatic vegetation detection, and real-time environmental surveillance. However, many of these applications require the underlying tasks to execute fast enough for real-time operation. In this regard, this article proposes a lightweight convolutional neural network (CNN) architecture, termed LW-AerialSegNet, that preserves the network’s feed-forward nature while increasing the intermediate layers to gather more of the features crucial for the segmentation task. Moreover, the network combines the concept of densely connected architectures with depth-wise separable convolutions to reduce the number of model parameters, so that the model can be deployed on Internet of Things (IoT) edge devices to perform real-time segmentation. Two UAV-based image segmentation datasets, the NITRDrone dataset and the Urban Drone dataset (UDD), are used to evaluate the proposed architecture. It achieves an intersection over union (IoU) of 82% and 71% on the NITRDrone and UDD datasets, respectively, illustrating its superiority over the considered state-of-the-art mechanisms. The experimental results indicate that the depth-wise separable convolutions significantly reduce the number of trainable parameters, making the model suitable for small-scale edge-computing devices. The proposed architecture can be deployed in real-life settings on a UAV to extract objects such as vegetation and road lines, and hence can be used for mapping urban areas, agricultural lands, etc.

Introduction

The world has come a long way since the launch of the first satellite, Sputnik. Recent data show that the number of satellites operating around the Earth stands at 6542, reforming our daily lives in various ways [1]. The data, especially image and video data, collected from these airborne sensors are increasingly becoming the backbone of cutting-edge research in the computer vision and remote sensing (RS) domains. Generally, these remote sensing images are considered fuel for applications including vehicle/human/animal tracking [2], forest mapping [3], [4], environmental surveying [5], [6], natural disaster management [7], and so forth. However, satellite-based remote sensing has various disadvantages, such as high cost, long revisit times, and poor resolution in tropical regions, which open up space for UAV-inspired remote sensing.

Recently, UAVs have been widely used for RS tasks because of their cost-effectiveness, easy deployment, and customizable flight planning, replacing satellite-based solutions at a smaller scale. In a short period, they have gained popularity and can be seen in various remote sensing applications such as urban mapping, traffic management, wildlife tracking, disaster management, precision agriculture, etc. [8], [9], [10]. Besides, UAV-captured images can be exploited alongside satellite RS images for diversified applications. Typically, however, UAV image analysis has remained limited to computer vision tasks such as ground object detection and extraction. To the best of our knowledge, very few works have addressed semantic segmentation using UAV-captured images/videos [11], [12].

Segmentation is a crucial activity for image scene understanding and can be used in several applications [13], [14], [15]. It is a fundamental task in computer vision that assigns a class label to each pixel in an image. Semantic segmentation of UAV RS images provides solutions for developing autonomous robots capable of understanding a semantic description of the scene [16]. Autonomy is essential for a UAV from the point of view of path planning and navigation, to avoid collisions and no-fly zones. Beyond these intrinsic tasks, semantic segmentation of UAV-captured images can be used to pursue remote sensing tasks such as urban mapping [17], forest and vegetation surveying [18], assessing damage caused by natural disasters [19], and so forth. However, these small devices produce many very-high-resolution (VHR) images for the applications mentioned above. As dataset sizes increase, it becomes nearly impossible for classical machine learning methods, which depend on manually identified features, to keep pace [20]. This opens space for new-age technology such as deep learning to play its part in managing the growing amount of remote sensing data.

Deep learning mechanisms such as CNNs have become vital tools for remote sensing image/video analysis, addressing tasks such as terrain classification (an image classification problem), traffic management (an object detection problem), and urban and vegetation mapping (a semantic segmentation problem). Typically, deep learning has been applied to these remote sensing segmentation tasks using satellite imagery. However, deep architectures designed around satellite images may not be suitable for UAV-captured VHR images [21]. This follows from the fact that UAV images contain more spatial and semantic detail than satellite-captured images: the color intensities within objects of the same class may be inhomogeneous, and spatial details such as the width and length of the same objects may vary within and across images, making the task more difficult for UAV images. Hence, a robust framework must be developed to attend to critical tasks such as disaster management, autonomous flying, etc.

Some deep baseline architectures proposed for typical scene understanding have been used to address UAV semantic segmentation tasks. However, these deeper networks usually suffer from three issues: degradation, overfitting, and vanishing gradients, in which gradients diminish in the deeper layers. Hence, the network may return erroneous output as a result of improper training. These issues are common in deeper segmentation architectures such as fully convolutional network-based architectures (FCNs) [22]. Similarly, borrowing concepts from fully convolutional networks, various baseline deep CNN architectures, such as U-Net [13], were proposed to perform pixel classification in medical images. U-Net has become one of the popular frameworks and can be seen addressing many remote sensing segmentation tasks [23]. However, these architectures suffer from related issues, in which the middle layers fail to capture the features essential for generating a segmented image of the input size. Moreover, because of their large network size, they require a massive amount of computation, making them unsuitable for deployment on IoT edge devices.

To solve the above-mentioned issues, this article proposes a lightweight encoder–decoder-based architecture that overcomes the problems faced by the state-of-the-art methods by retaining the vital feature space needed to generate the segmented masks. The proposed deep framework is lightweight (it has fewer trainable parameters) compared to the baseline architectures, enabling it to be implemented on IoT edge computing devices. Hence, it becomes an excellent option for real-time critical tasks such as autonomous flying to handle emergencies during natural disaster management. The following are the main contributions of the paper:

  • 1.

    A deep CNN architecture is proposed that uses skip connections to maintain gradient propagation throughout the model, eliminating the vanishing gradient problem. Several cues are borrowed from dense architectures, and the internal layers are replaced by partially dense modules that help reduce the trainable parameters.

  • 2.

    Unlike standard convolution, the proposed deep architecture uses depthwise separable convolution, which not only helps the model overcome overfitting but also reduces the number of trainable parameters, making it suitable for deployment on IoT edge devices (a minimal sketch of both mechanisms follows this list).

  • 3.

    A comparative analysis of different baseline architectures with various performance metrics is presented to show the superiority of the proposed architecture.
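
To make contributions 1 and 2 concrete, the following minimal PyTorch sketch (our illustration, not the authors' released code) shows a depthwise separable convolution block with a dense-style concatenation skip, together with the parameter count it saves over a standard convolution:

```python
# Illustrative sketch only: a depthwise separable convolution block with a
# dense-style skip connection, in the spirit of contributions 1 and 2.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Depthwise step: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise step: 1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn(self.pointwise(self.depthwise(x))))
        # Dense-style skip: concatenate input features with the new features,
        # keeping a direct gradient path to earlier layers.
        return torch.cat([x, out], dim=1)

def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

# For k=3, C_in=C_out=64: a standard convolution needs k*k*C_in*C_out = 36,864
# weights, while the separable version needs k*k*C_in + C_in*C_out = 4,672
# conv weights (plus 128 BatchNorm parameters) -- roughly an 8x reduction.
standard = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)
separable = DepthwiseSeparableConv(64, 64)
print(count_params(standard), count_params(separable))  # 36864 4800
```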

The rest of the manuscript is organized as follows: Section 2 presents the background research in the domains of segmentation and aerial image parsing. The proposed deep architecture is described in detail in Section 3. Section 4 discusses the step-by-step process of the experimentation conducted to validate the proposed model. The results obtained from the experimentation are compared in detail with the related state-of-the-art mechanisms in Section 5. Finally, Section 6 concludes the manuscript and states future directions.

Related work

This section discusses prior research on UAV-inspired remote sensing images in detail. It covers UAV-based RS work such as object extraction, classical semantic segmentation, and new-age deep learning-inspired segmentation methodologies.

Proposed architecture

This section explores the different stages of the proposed architecture in detail. The proposed framework is an end-to-end encoder–decoder CNN architecture that extracts the important local and global features, focusing on segmenting objects such as roads and vegetation, which comprise most of an aerial image.
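
The exact layer configuration appears in the full text of Section 3; for orientation only, a toy encoder–decoder of the same general shape, built from depthwise separable blocks with one concatenation skip per scale, might look as follows (the depth and channel widths here are our illustrative assumptions, not the published LW-AerialSegNet layout):

```python
# Toy encoder-decoder skeleton; depth and channel widths are assumptions
# for exposition, not the published LW-AerialSegNet configuration.
import torch
import torch.nn as nn

def ds_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    """Depthwise separable block: 3x3 depthwise + 1x1 pointwise + BN + ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TinyEncoderDecoder(nn.Module):
    """Minimal encoder-decoder with one skip connection per scale."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.enc1 = ds_conv(3, 32)
        self.enc2 = ds_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = ds_conv(64, 128)
        self.up2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = ds_conv(128, 64)   # 128 = 64 (upsampled) + 64 (skip)
        self.up1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = ds_conv(64, 32)    # 64 = 32 (upsampled) + 32 (skip)
        self.head = nn.Conv2d(32, num_classes, 1)  # per-pixel class logits

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s1 = self.enc1(x)                        # full resolution
        s2 = self.enc2(self.pool(s1))            # 1/2 resolution
        b = self.bottleneck(self.pool(s2))       # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), s2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), s1], dim=1))
        return self.head(d1)

logits = TinyEncoderDecoder()(torch.randn(1, 3, 256, 256))
print(logits.shape)  # torch.Size([1, 3, 256, 256])
```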

Experiments

This section describes each entity and process through which the experimentation has been conducted, illustrating the efficacy of the proposed architecture. The results obtained from the experimentation are compared with other state-of-the-art mechanisms, including U-Net [13], FCN-8s [22], FC-DenseNet103 [44], and SegNet [45]. The experimental flow is presented in Fig. 2.

Results and discussion

This section presents the results obtained for the proposed lightweight segmentation model and the considered state-of-the-art frameworks. We also analyze and discuss the improvements achieved by the proposed approach over the baseline architectures, both in quantitative terms (with respect to the performance metrics presented in Eqs. (2)–(6)) and in qualitative terms.
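
The paper's full metric definitions are given in Eqs. (2)–(6); as a reference point for the headline numbers, per-class IoU is conventionally computed from the confusion counts as IoU = TP / (TP + FP + FN). A minimal NumPy sketch of that convention (ours, not the authors' evaluation code):

```python
# Per-class IoU from predicted and ground-truth label maps, following the
# conventional definition IoU_c = TP_c / (TP_c + FP_c + FN_c).
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))   # pixels correctly labeled c
        fp = np.sum((pred == c) & (gt != c))   # pixels wrongly labeled c
        fn = np.sum((pred != c) & (gt == c))   # class-c pixels missed
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else float("nan"))
    return ious

# Example with random 3-class label maps of size 256x256.
rng = np.random.default_rng(0)
pred = rng.integers(0, 3, size=(256, 256))
gt = rng.integers(0, 3, size=(256, 256))
print(per_class_iou(pred, gt, num_classes=3))
```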

Conclusions

In this article, we have proposed a lightweight convolutional neural network for semantic segmentation of UAV remote sensing images, in which depth-wise separable convolution modules replace the typical CNN modules. Extensive experimentation has been performed to demonstrate the impact of the proposed architecture on the UAV image datasets. The obtained results reveal that the proposed lightweight architecture produces excellent-quality segmentation maps on UAV remote sensing datasets with fewer trainable parameters than the baseline architectures.

Abbreviations

CCE: Categorical Cross-Entropy
CNN: Convolutional Neural Network
FCN: Fully Convolutional Network
FN: False Negative
FP: False Positive
GT: Ground Truth
IoU: Intersection over Union
ReLU: Rectified Linear Unit
RS: Remote Sensing
TN: True Negative
TP: True Positive
UAV: Unmanned Aerial Vehicle
UDD: Urban Drone Dataset

CRediT authorship contribution statement

Tanmay Kumar Behera: Conceptualization, Data curation, Methodology, Formal analysis, Investigation, Resources, Software, Validation, Writing – original draft, Writing – review & editing. Sambit Bakshi: Conceptualization, Project administration, Funding acquisition, Resources, Supervision, Validation, Writing – review & editing. Pankaj Kumar Sa: Conceptualization, Project administration, Supervision, Validation, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research work was assisted and supported by the following projects:

  • a.

    Project entitled “Deep learning applications for computer vision task” funded by NITROAA, with the support of Lenovo P920 and Dell Precision 7820 workstations, and by NVIDIA Corporation with the support of NVIDIA Titan V and Quadro RTX 8000 GPU cards.

  • b.

    Project entitled “Applications of Drone Vision using Deep Learning” funded by the Technical Education Quality Improvement Programme (TEQIP-III), National Project Implementation Unit (NPIU).

References

  • Tovar-Sánchez, A., et al., Applications of unmanned aerial vehicles in Antarctic environmental research, Sci. Rep. (2021).

  • Daud, S.M.S.M., et al., Applications of drone in disaster management: A scoping review, Sci. Justice (2022).

  • Shrestha, R., et al., 6G enabled unmanned aerial vehicle traffic management: A perspective, IEEE Access (2021).

  • Caruso, A., et al., Collection of data with drones in precision agriculture: Analytical model and LoRa case study, IEEE Internet Things J. (2021).

  • Zheng, X., et al., Self-supervised pretraining and controlled augmentation improve rare wildlife recognition in UAV images.

  • S.G., et al., Semantic segmentation of UAV aerial videos using convolutional neural networks.

  • Wang, Y., et al., Deep learning for semantic segmentation of UAV videos.

  • Ronneberger, O., et al., U-Net: Convolutional networks for biomedical image segmentation, MICCAI (2015).

  • Verma, U., et al., Segmentation and size estimation of tomatoes from sequences of paired images, EURASIP J. Image Video Process. (2015).

  • Cordts, M., et al., The Cityscapes dataset for semantic urban scene understanding, CVPR (2016).

  • Yuan, K., et al., Deep-learning-based multispectral satellite image segmentation for water body detection, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. (2021).

  • Hu, X., et al., Research on a single-tree point cloud segmentation method based on UAV tilt photography and deep learning algorithm, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. (2020).

  • Gupta, R., et al., RescueNet: Joint building segmentation and damage assessment from satellite imagery.

  • Zhou, H., et al., On detecting road regions in a single UAV image, IEEE Trans. Intell. Transp. Syst. (2017).

  • Long, J., et al., Fully convolutional networks for semantic segmentation, CVPR (2015).

  • McGlinchy, J., et al., Application of UNet fully convolutional neural network to impervious surface segmentation in urban environment from high resolution satellite imagery.
