Abstract

Object datasets used in the construction of object detectors are typically annotated with horizontal or oriented bounding rectangles. The optimality of an annotation is obtained by fulfilling two conditions: (i) the rectangle covers the whole object and (ii) the area of the rectangle is minimal. Building a large-scale object dataset requires annotators with equal manual dexterity to carry out this tedious work. When an object is horizontal, it is easy for the annotator to reach the optimal bounding box within a reasonable time. However, if the object is oriented, the annotator needs additional time to decide whether the object should be annotated with a horizontal rectangle or an oriented one. Moreover, in both cases, the final decision is not based on any objective argument, and the annotation is generally not optimal. In this study, we propose a new method of annotation by rectangles, called robust semi-automatic annotation, which combines speed and robustness. Our method has two phases. The first phase consists in inviting the annotator to click on the most relevant points located on the contour of the object. The second phase is carried out by an algorithm we develop, called RANGE-MBR, which determines, from the points selected on the contour of the object, a rectangle enclosing these points in linear time. The rectangle returned by RANGE-MBR always satisfies optimality condition (i). We prove that optimality condition (ii) is always satisfied for objects with isotropic shapes. For objects with anisotropic shapes, we study optimality condition (ii) by simulations.
We show that the rectangle returned by RANGE-MBR is quasi-optimal for condition (ii) and that its performance increases with dilated objects, which is the case for most objects appearing on images collected by aerial photography.

1. Introduction

The construction of an object detector generally goes through a learning phase, followed by a testing phase, and ends with a tuning phase. Each phase requires an independent annotated object dataset. Annotating an image consists of locating all the objects present in the image and determining their categories. The way to locate an object varies from detector to detector. For example, the Mask R-CNN detector uses segmentation masks to locate objects [1], the CoKe detector uses key points and landmarks to locate objects [2], and the Poly-YOLO detector represents objects using polygons [3]. However, the rectangle is considered to be the simplest and most used polygonal shape for locating or representing an object in many computer vision applications [4]. Locating an object using a rectangle consists of drawing a rectangle surrounding this object. The annotation of an object dataset is performed using free or commercial software designed for a particular annotation choice.

There are two types of annotations with rectangles, depending on the orientations of the objects in the images. To locate objects, the first type uses horizontal bounding rectangles (HBR), while the second type uses oriented bounding rectangles (OBR). Annotation with horizontal rectangles is suitable for natural scenes, where the photographer is usually in front of the object and adjusts their camera so that the objects appear aligned with the horizontal edges of the image. By contrast, for aerial photography, where images are captured by Earth observation satellites or other flying devices, objects often appear in the image with arbitrary orientations.

The optimality of an annotation results in the satisfaction of two conditions: (i) the bounding rectangle must cover the whole of the object and (ii) the area of the bounding rectangle must be minimal. For a horizontal object, it is easy to reach the optimal HBR that satisfies conditions (i) and (ii) mentioned above. On the other hand, for an oriented object, it is difficult to reach the optimal OBR if the annotation method does not take account of optimality conditions (i) and (ii).

In the literature, we distinguish two types of object detectors. The first type includes horizontal object detectors, which detect objects using HBR. The second type includes oriented object detectors, which detect objects using OBR. Any detector of the first type is trained, tested, and tuned on horizontal object datasets (HOD). On the other hand, some oriented object detectors are trained and tuned on HOD, such as OAOD [5] and BBAVectors [6], but others are trained and tuned on oriented object datasets (OOD), such as RoI Transformer [7], R-RoI [8], RRPN [9], R2CNN++ [10], DMPNet [11], FOTS [12], RPN [13], and DDR [14]. However, all of these oriented object detectors must be tested on OOD.

A large-scale HOD (OOD, respectively) is a set made up of a large number of images annotated with HBR (OBR, respectively). Each image contains objects of a wide variety of scales, orientations, and shapes. These objects are divided into classes (or categories) that vary from one dataset to another.

Table 1 (Table 2, respectively) presents the number of classes, instances, images, and year of creation of the most cited HOD (OOD, respectively) in the literature. To our knowledge, DOTA is the largest public Earth vision object detection dataset [27]. It contains objects exhibiting a wide variety of scales, orientations, and shapes. Moreover, images of DOTA are manually annotated by experts in aerial image interpretation.

2. State of the Art

To build a dataset for object detection, we need to gather a large number of images, all annotated with the same technique and organized according to a predetermined set of categories. For each object class, the dataset must include many images corresponding to the various instances of that class. The available annotation techniques can be split into two primary categories: the first groups all manual annotation methods, and the second groups all semi-automatic annotation techniques [28].

2.1. Manual Annotation

There are two main methods of manually annotating objects with bounding boxes, which are used in the construction of most large-scale object datasets. The first one is called the consensus method, and the second one is called the sequential tasks method. For each instance of an object in an image, the first method asks several annotators to draw a rectangle around the object and then defines the position of the object by the rectangle elected by the majority of annotators. To annotate an object by the second method, we need at least three annotators. The first one is asked to draw a rectangle around a single instance of the object. The task of the second annotator is to validate the drawn rectangle. The third one investigates whether additional instances of the object class require annotation. Determining a precise bounding rectangle takes more time and resources than validating an annotation, and thus, the sequential tasks method is more efficient. Annotation quality affects the development of accurate object detectors. The construction of a large-scale object dataset requires expert annotators, significant working time, and remuneration in line with the desired accuracy. As examples of commonly used datasets annotated with the manual methods, we cite PASCAL VOC [18], MS COCO [17], ImageNet [19], DOTA [20], etc.

Although studies on the description of OOD are abundant, very few of them explain how to draw a rectangle enclosing an oriented object. The only annotation method explicitly described in the literature appears in the article [27] by Ding et al. This method consists in drawing an HBR around the object of interest and then adjusting the angle manually by rotating the rectangle with the mouse.

2.2. Semi-Automatic Annotation

The traditional semi-automatic approach of annotating objects with rectangles consists of four phases:
(1) First phase: the images of the object dataset are divided into two subsets of unequal sizes. The smaller subset is annotated with rectangles by the manual method.
(2) Second phase: an object detector is chosen and trained on the images of the subset already annotated.
(3) Third phase: once the detector is ready, it is used to annotate the images of the larger subset with prediction rectangles.
(4) Fourth phase: the annotator validates the objects correctly annotated during the third phase and manually draws the bounding rectangles of the poorly annotated objects.

As this study is limited to locating objects by bounding boxes, we cite the Faster Bounding Box (FBB) method as an example of a semi-automatic annotation method [16]. This method was used to generate the Tampere University of Technology (TUT) Indoor dataset: the larger subset is annotated with prediction rectangles generated by the Faster R-CNN object detector [29], trained on the smaller, manually annotated subset.

2.3. Performance Measure of Detectors Using Rectangles

The evaluation of the effectiveness of the object detector is then performed on the second subset of the annotated dataset. Since we are only dealing here with annotations using bounding boxes, the most used criterion to compare two rectangles is Jaccard's similarity index, also known as the Intersection over Union (IoU). For each image of the testing dataset, the IoU measures the percentage of overlap between the prediction bounding rectangle $B_p$ generated by the detector and the ground truth bounding box $B_{gt}$, as follows:

$$\mathrm{IoU}(B_p, B_{gt}) = \frac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|}.$$

We note that the IoU score lies in the interval $[0, 1]$. The closer this index gets to 1, the better the detection of the object. We say that an object is well detected (or true positive) if the IoU score is greater than or equal to a threshold agreed upon by experts. In most studies, the threshold value is set at 0.5 [30]. The IoU score causes a problem when its value is zero. This value is uninformative since it does not explain how far the prediction bounding rectangle is from the ground truth bounding box. To work around this problem, Rezatofighi et al. [30] suggest replacing the IoU with the Generalized Intersection over Union (GIoU) index. The GIoU between two rectangles $A$ and $B$ is defined as follows:

$$\mathrm{GIoU}(A, B) = \mathrm{IoU}(A, B) - \frac{|C \setminus (A \cup B)|}{|C|},$$

where $C$ is the smallest convex set enclosing both $A$ and $B$.
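For axis-aligned rectangles, the two indices described above are straightforward to compute. The following Python sketch is only an illustration (the paper's experiments used MATLAB); it encodes a rectangle as (x_min, y_min, x_max, y_max) and, as is common in GIoU implementations, takes the smallest enclosing axis-aligned box for C:

```python
def iou(a, b):
    """IoU of two axis-aligned rectangles given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))   # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))   # overlap height
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def giou(a, b):
    """GIoU = IoU - |C \\ (A U B)| / |C|, with C the smallest enclosing box."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    c = ((max(a[2], b[2]) - min(a[0], b[0]))
         * (max(a[3], b[3]) - min(a[1], b[1])))        # enclosing box area
    return inter / union - (c - union) / c
```

For disjoint rectangles the IoU is 0 regardless of their distance, while the GIoU decreases toward -1 as they move apart, which is exactly the defect of the IoU discussed above.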

The exact IoU computation between two HBR is simple, and much software integrates functions to calculate this index. On the other hand, the exact IoU computation between two OBR is not as simple as that between two HBR. Liu et al. (Yao et al., respectively) proposed in Ref. [31] (in Ref. [32], respectively) a heuristic to estimate the IoU between two OBR. Recently, Zaidi has shown in Ref. [28] that these heuristics give reliable results only when the centers of the rectangles are very close and the angle between the rectangles is small. Moreover, he has developed in Ref. [28] an algorithm that calculates the exact IoU value between two OBR, as well as an estimator of the IoU.

The annotation method has a direct bearing on the performance measurement of an object detector. The OOD reserved for the test phase must be carefully annotated so that the IoU computation between the ground truth and prediction rectangles is not biased.

3. Motivation for the Study

The manual or semi-automatic annotation methods described in Section 2 have made it possible to build many large-scale object datasets. These datasets have given rise to very powerful object detectors. However, we cannot ignore the following concerns:
(1) According to Section 2.1, accurate manual annotation is expensive, time-consuming, and requires annotation experts.
(2) In many situations, the annotator can be in front of an object with an ambiguous orientation, as shown in Figure 1. In this case, he will take considerable time to decide whether the object should be annotated with a horizontal rectangle or an oriented rectangle. Moreover, in both cases, the final decision is not based on any objective argument and depends solely on the dexterity of the annotator.
(3) The semi-automatic annotation defined in Section 2.2 does not guarantee the optimality of the annotation rectangle, in the sense that it must have a minimum area and cover the whole of the object.
(4) According to Section 2.3, the use of any approximation method of the IoU, whether to measure the accuracy of an oriented object detector or to compare the performance of two oriented object detectors, could lead to biased results. Indeed, writing the IoU as x/(a + b - x), where x is the overlap area and a and b denote the areas of the prediction and ground truth rectangles, shows that it is strictly increasing in x over the interval [0, min(a, b)]. Therefore, a too large or too small ground truth bounding rectangle directly induces a bias in the computation of the IoU.

The scarcity of annotation methods motivated us to develop a robust semi-automatic annotation method for OOD. Our method is semi-automatic because it consists of a manual step followed by a computer-assisted step. In addition, this method is robust because the bounding rectangle generated by our algorithm is insensitive to the dexterity of the annotator. More precisely, our contributions are as follows:
(i) We develop an algorithm called RANGE-MBR, which determines, from a set S of the most relevant (in the sense given by Definition 1) points picked on the object outline, a rectangle enclosing S and having a quasi-minimal area, in linear time.
(ii) We propose a new approach to simultaneously build HOD and OOD from a large-scale image bank, based on both the RANGE-MBR algorithm and threshold angles.
(iii) We conduct a large experimental study to quantify the performance of the RANGE-MBR algorithm.
(iv) We compare the performance of RANGE-MBR to that of the benchmark algorithm, RC-MBR, which determines the minimum rectangle enclosing S exactly.

This study seems essential to us because it responds to a real need in the development of oriented object detectors. In addition, it allows researchers to build their own oriented object datasets in a rigorous manner.

Definition 1. (relevant point of an object) Any point located on the contour of the object is said to be relevant if it is a local maximum, a local minimum, the rightmost point, the leftmost point, a cusp of the first type, a cusp of the second type, an inflection point, and so on.
The remainder of this manuscript is structured as follows. Section 4 deals with the parametrization of rectangles. Section 5 is devoted to the development of the RANGE-MBR algorithm. We explain in Section 6 how to use the RANGE-MBR (or RC-MBR) algorithm to simultaneously generate horizontal and oriented datasets from a large-scale image bank. In Section 7, we perform various numerical experiments to evaluate the performance of the RANGE-MBR algorithm.

4. Parametrization of Bounding Rectangles

First, we give an overview of the different types of parameterization encountered in the literature. Then, we justify the choice of the parameterization associated with the annotation method that we propose in this study. Our choice of parameterization was discussed in a previous study on the accurate computation of the IoU [28]. However, it seems useful to us to recall this choice to facilitate the reading of the manuscript.

4.1. Parameterization of Horizontal Rectangles

The minimum parameterization of a horizontal rectangle requires four parameters. The PASCAL VOC object dataset [18] annotates the rectangle with the coordinates of two opposite vertices (cf. Figure 2). The MS COCO object dataset [17] annotates the rectangle with its width, its height, and the coordinates of its top-left vertex (cf. Figure 2). The ImageNet object dataset [19] annotates the rectangle with its width, its height, and the coordinates of its center (cf. Figure 2).

4.2. Parameterization of Oriented Rectangles

The minimal parameterization of an oriented rectangle requires five parameters: the coordinates of the center of the rectangle, the lengths of its two sides (cf. Figure 2), and the acute angle between one side and the horizontal line passing through the center. The object datasets HRSC2016 [22] and UCAS-AOD [23] are annotated with this parametrization.

The above angle-based parametrization is not suitable for annotating satellite image datasets. These OOD have the particularity of containing a large number of instances per image, at different scales, and whose parts overlap [20]. A simple solution that overcomes the drawbacks of the angle-based annotation consists in defining a rectangle using the coordinates of its four vertices. To solve the indeterminacy problem linked to the permutations of the vertices, we can sort the vertices clockwise and fix the first vertex according to the following rules [27]:
(i) For objects with a distinguished head and tail (e.g., vehicles and helicopters), the annotator carefully selects the first vertex to indicate the left corner of the instance head.
(ii) For other objects (e.g., tennis courts and bridges), the first vertex is the point at the top left of the instance.

This type of parametrization was used for the annotation of the DOTA object dataset [20].

4.3. Our Choice of Parametrization

We use the notation P1P2P3P4 to denote an oriented rectangle whose vertices are P1, P2, P3, and P4. We assume that these vertices are sorted in counterclockwise order, such that P1 is the left vertex with the smallest vertical coordinate.

To sort the vertex set of the rectangle as described above, we first consider the barycenter of the rectangle. Then, we calculate the polar angle of each vertex relative to the barycenter [28], and we sort the vertices in ascending order of their polar angle; the starting vertex is then selected from the sorted list (cf. Figure 2). For more details on the implementation of this sorting method, we refer the reader to our article [28].
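A minimal Python sketch of this sorting step (for illustration only; the starting-vertex rule of [28] is omitted, and sorting over the angle range (-pi, pi] is our assumption):

```python
import math

def sort_vertices(pts):
    """Sort vertices counterclockwise by their polar angle about the
    barycenter, in ascending order over (-pi, pi]."""
    gx = sum(x for x, _ in pts) / len(pts)
    gy = sum(y for _, y in pts) / len(pts)
    return sorted(pts, key=lambda p: math.atan2(p[1] - gy, p[0] - gx))
```

Applied to the diamond (1,0), (0,1), (-1,0), (0,-1), the function returns the vertices in counterclockwise order starting from the one with polar angle closest to -pi.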

In this investigation, we describe a bounding rectangle by the eight-tuple (x1, y1, x2, y2, x3, y3, x4, y4), where (xi, yi) are the coordinates of the i-th vertex of the sorted rectangle.

This choice of parameterization is ideally suited to the annotation approach that we propose, since it eliminates the need for extra computations to draw rectangles or to locate the images of rectangles under affine transformations. This is one of the reasons why the method is so efficient. In addition, we show in Section 5.2 how to format the outputs of our annotation technique in accordance with the annotation rule of the DOTA dataset.

5. Determination of the Bounding Rectangle

It is natural to annotate the object with the minimum rectangle enclosing the set of relevant points. This problem is formulated as follows:

Problem 1. Given a set of points S, find a rectangle with the smallest area enclosing S. Such a rectangle is called a minimum bounding rectangle of S and is denoted by MBR(S).
The solution of Problem 1 is not unique, as shown in Figure 3. So, MBR(S) denotes any solution of Problem 1.
Freeman and Shapira proved in 1975 that one edge of MBR(S) must be collinear with an edge of the convex hull CH(S) of S. They proposed in Ref. [33] a natural algorithm to find MBR(S), based on sweeping all minimum rectangles enclosing S and having an edge collinear with an edge of CH(S). In 1978, Shamos proposed in Ref. [34] the famous rotating calipers algorithm, which returns all pairs of antipodal vertices of an n-sided convex polygon in O(n) time. In 1983, Toussaint used the rotating calipers technique and developed the RC-MBR algorithm to find MBR(S) [35]. In 2006, Dimitrov et al. proposed in Refs. [36, 37] the PCA-MBR algorithm, which approximates MBR(S) by the minimum bounding rectangle aligned with the eigenvectors of the covariance matrix of CH(S), and they proved an explicit upper bound on the relative error between PCA-MBR and MBR(S). The main drawback of the PCA-MBR algorithm is that it admits an infinite number of solutions if the covariance matrix of CH(S) has a double eigenvalue. To overcome the problem of nonuniqueness inherent in the RC-MBR and PCA-MBR algorithms, we propose the RANGE-MBR method to approximate MBR(S) in linear time.
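As a point of comparison for these exact methods, here is a compact Python sketch of the edge-collinear sweep: compute the convex hull, then, for each hull edge, measure the extent of the hull along the edge direction and its normal. This is the quadratic Freeman-Shapira-style sweep, not Toussaint's rotating-calipers RC-MBR, and the function names are ours:

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_bounding_rect_area(points):
    """Area of a minimum bounding rectangle: one edge of the optimum is
    collinear with a hull edge, so it suffices to test every hull edge."""
    hull = convex_hull(points)
    n = len(hull)
    best = float('inf')
    for i in range(n):
        ex = hull[(i + 1) % n][0] - hull[i][0]
        ey = hull[(i + 1) % n][1] - hull[i][1]
        norm = math.hypot(ex, ey)
        ux, uy = ex / norm, ey / norm          # edge direction
        vx, vy = -uy, ux                       # normal direction
        us = [p[0] * ux + p[1] * uy for p in hull]
        vs = [p[0] * vx + p[1] * vy for p in hull]
        best = min(best, (max(us) - min(us)) * (max(vs) - min(vs)))
    return best
```

For a square rotated by 45 degrees, the axis-aligned bounding box has twice the optimal area, while the sweep recovers the true minimum.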

5.1. The Range-MBR Algorithm

Let be a Cartesian frame of the two-dimensional affine plane. All angles we refer to here are measured counterclockwise from the positive -axis. Let be the line passing through and making an angle with the axis . Let be perpendicular to and passing through . Let and ( and , respectively) be the extreme points of the orthogonal projection of on (, respectively). Let be the minimum bounding rectangle having an edge collinear with . Then, are the intersection points of the two parallel lines to and passing through and , respectively, with the two parallel lines to and passing through and respectively. The area of is given by the product of lengths of the line segments and . From now on, the MBR enclosing and having an edge collinear with the direction making an angle from the positive -axis will be denoted by .

Let and be two unit direction vectors of and , respectively. Then, and are two Cartesian frames of the two-dimensional affine plane. For all , we denote by (, respectively) the coordinates of with respect to the frame (, respectively). Let be the rotation matrix of angle ; then , where is the transpose of .

Let be a random variable with mean and standard deviation . Let be observations on the variable . Let be the order statistics of . We define the range of by . Therefore, the lengths of and are given by:

We denote by any estimation of obtained from . On the basis of numerous works carried out on the estimation of the standard deviation from the sample range ([38–43]), we can assume that there is a real constant such that

Besides, it is well known that
which is an estimate of , where . It follows that there is a real constant such that:

We assume that the coordinates of the set of points are observations of two random variables and . Since the , then combining equations (6) and (3) gives an approximation of , satisfying the following relation:
where . To alleviate notations, we will also use and to designate and . Since , we obtain

Using (8) and some trigonometric identities, we prove that
where and

Combining equations (7), (9), and (10) gives
Thus, the area of MBR is approximately equal to
Therefore, we propose to approximate MBR by MBR such that:

The first derivative of with respect to is given by:
where

The function has a unique critical point such that

Lemma 1. The second derivative of at the critical point has the same sign as .

Proof. The second derivative of with respect to is given by:
Multiplying both sides of relation (17) by yields
Multiplying both sides of relation (14) by yields
Using relations (19) and (18) with (i.e., ), we obtain:
Therefore, has the same sign as . Since , then whatever the set may be. Thus, has the same sign as .
Therefore, the determination of goes through six cases:
Case 1: if , then .
Case 2: if and , then is determined through the eigendecomposition of the empirical covariance matrix of . This case is discussed below.
Case 3: if and , then .
Case 4: if and , then .
Case 5: if and , then .
Case 6: if and , then it is equivalent to and . This case is discussed below.
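Independently of the closed-form case analysis above, the objective minimized by RANGE-MBR, namely the product of the coordinate ranges of the rotated point set, can be explored numerically. The following Python sketch is an illustration only (a grid search, not the paper's constant-time formula):

```python
import math

def range_mbr_angle(points, steps=1801):
    """Grid search for the angle minimizing the product of coordinate ranges
    Ru(theta) * Rv(theta) of the rotated point set."""
    best_theta, best_area = 0.0, float('inf')
    for k in range(steps):
        t = math.pi / 2 * k / (steps - 1)      # angle in [0, pi/2]
        c, s = math.cos(t), math.sin(t)
        us = [x * c + y * s for x, y in points]
        vs = [-x * s + y * c for x, y in points]
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if area < best_area:
            best_theta, best_area = t, area
    return best_theta, best_area
```

For the four corners of a 2-by-1 rectangle rotated by pi/6, the search recovers the rectangle's own area, 2, which is the minimum over all edge-aligned bounding boxes.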

5.1.1. Illustrative Example of Case 6

It is difficult to illustrate all the scenarios, which fall into Case 6. The special case of regular polygons is the one that the annotator may encounter when annotating regular objects such as road signs, buildings, and human faces. Moreover, we have pointed out this case to warn the user to pay attention to regularly shaped objects when using our annotation method.

Proposition 1. If the elements of are the vertices of a regular -sided polygon, then and .

Proof. In the general case, the vertices of a regular polygon are uniformly distributed over a circle with radius and center equal to the barycenter of . Thus, without loss of generality, we can assume that , , and the vertex of is
Consider the complex sequence , where and is the imaginary unit. Then,
Since , then , and
Since , then
Besides, is the real part of
Thus, . Moreover, using , we deduce that
Since is the imaginary part of
then .
Note that in the case where the elements of are the vertices of a regular -sided polygon, Problem 1 has exactly different solutions , . For all , is the image of by the rotation of angle .
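The degeneracy behind Case 6 can be checked numerically: for the vertices of a regular n-gon, the empirical covariance matrix is a scalar matrix, so no direction is preferred. A Python sketch (illustrative; the `phase` argument is an arbitrary rotation of the polygon):

```python
import math

def vertex_covariance(n, r=1.0, phase=0.3):
    """Empirical covariance entries (sxx, syy, sxy) of the vertices
    of a regular n-gon of radius r."""
    pts = [(r * math.cos(2 * math.pi * k / n + phase),
            r * math.sin(2 * math.pi * k / n + phase)) for k in range(n)]
    gx = sum(x for x, _ in pts) / n
    gy = sum(y for _, y in pts) / n
    sxx = sum((x - gx) ** 2 for x, _ in pts) / n
    syy = sum((y - gy) ** 2 for _, y in pts) / n
    sxy = sum((x - gx) * (y - gy) for x, y in pts) / n
    return sxx, syy, sxy
```

Whatever n >= 3 and whatever the phase, the two variances coincide and the covariance vanishes, which is exactly the scalar-matrix situation of Case 6.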

5.1.2. Handling of the Exceptional Cases 2 and 6

Cases 2 and 6 are indeterminate cases for which the minimum of the objective function does not exist. This is why we have devised a trick to get out of the indeterminacy of each case, by returning to the initial definition of the solution.
(i) For Case 2, the application of an affine isometry to the set of points makes it possible to migrate to another case for which the angle is well determined. We then obtain the solution of the initial problem by applying the inverse isometry to the solution of the transformed problem, since isometries preserve areas.
(ii) For Case 6, the use of an extra point belonging to the convex hull of the set allows us to migrate to another case for which the angle is well determined. Another alternative consists in asking the user to click on points of the object's outline until we get out of this state.

Case 2: is negative because of . To get rid of , we just have to follow the same reasoning on a set of points
whose coordinates and are uncorrelated. Let
be the covariance matrix of the elements of , where . Let be an eigendecomposition of , where (the set of 2-orthogonal matrices), and . For all , set and . Then,
where . Since , then the linear map defined by preserves the norm and the dot product. Thus, and MBR is the image of MBR under the map .
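A sketch of this decorrelation trick in Python (illustrative only; the closed-form diagonalizing angle below is a standard 2x2 eigendecomposition and is our choice, not necessarily the paper's implementation):

```python
import math

def decorrelate(points):
    """Rotate a point set into the eigenbasis of its covariance matrix so the
    new coordinates are uncorrelated; the rotation preserves areas, so the MBR
    of the original set is the image of the MBR of the rotated set."""
    n = len(points)
    gx = sum(x for x, _ in points) / n
    gy = sum(y for _, y in points) / n
    a = sum((x - gx) ** 2 for x, _ in points) / n
    c = sum((y - gy) ** 2 for _, y in points) / n
    b = sum((x - gx) * (y - gy) for x, y in points) / n
    phi = 0.5 * math.atan2(2 * b, a - c)     # diagonalizing rotation angle
    cs, sn = math.cos(phi), math.sin(phi)
    rotated = [(x * cs + y * sn, -x * sn + y * cs) for x, y in points]
    return rotated, phi
```

After the rotation, the off-diagonal covariance entry of the new coordinates is zero, so the indeterminacy of Case 2 disappears.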

Proposition 2. If the vectors and are uncorrelated and have different variances , then has a unique minimum at the angle .

Proof. Since and are uncorrelated, then , and and . According to equation (16), the function has a unique critical point . Using Lemma 1, is the maximum of .
Case 6: To get out of the indeterminacy of Case 6, we propose to add to the set an artificial point located in the convex hull of so that the empirical covariance matrix of the new set is different from a scalar matrix. We assume that are the Cartesian coordinates of with respect to , where is the barycenter of . Without any loss of generality, we assume that (, respectively) is the vector of the -coordinates (-coordinates, respectively) of the points of with respect to . We denote by and . Lemma 2 gives the relation between and , respectively.

Lemma 2. If and , then

Proof. We set and . Since , then and . On the one hand, since , we have
On the other hand, since , we have:
Let be the function defined from in the same way that was defined from in (10). Then,
where
Using the identities (31), we have
It follows that
(1) if and only if , where are the lines whose Cartesian equations are given by
(2) if and only if , where and .
In summary, any point defined as a convex combination of , such that , overcomes the indeterminacy problem posed by Case 6. However, the resulting RANGE-MBR depends on the values of and . We tested this technique on a set composed of the vertices of a regular -sided polygon, and we observed that the choice
allows our algorithm to reach MBR whatever the values of and . The generalization of the technique applied to the vertices of a regular polygon requires the calculation of CH , and the determination of the midpoint, , of any two consecutive vertices of CH , such that the product of the coordinates of with respect to is different from 0.

5.2. Annotate the DOTA Dataset Using Range-MBR

To adapt our annotation method to the output format required by the DOTA dataset, it suffices to ask the annotator to make the first click point at the object's head, and to add a binary variable equal to 1 if we can differentiate the head of the object from its tail, and zero otherwise. When we pass the set to the RANGE-MBR algorithm, we save the coordinates of the first point . Next, we determine the rectangle .
(1) If the binary variable equals one, then we determine the vertex of that is closest to , and we permute the vertices of clockwise so that occupies the first position in this permutation.
(2) If the binary variable equals zero, then we determine the top left vertex of , and we permute the vertices of clockwise so that occupies the first position in this permutation.
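These two rules can be sketched as follows (Python for illustration; the function name, the has_head flag, and the reading of "top left" as smallest y, then smallest x, are our assumptions):

```python
def dota_order(vertices, first_click, has_head):
    """Rotate a cyclically ordered 4-vertex list so that the DOTA-style first
    vertex comes first: the vertex nearest the annotator's first click when the
    head is distinguishable, otherwise the top-left vertex."""
    if has_head:
        fx, fy = first_click
        i = min(range(4),
                key=lambda k: (vertices[k][0] - fx) ** 2 + (vertices[k][1] - fy) ** 2)
    else:
        i = min(range(4), key=lambda k: (vertices[k][1], vertices[k][0]))
    return vertices[i:] + vertices[:i]
```

The cyclic order of the rectangle's vertices is preserved; only the starting vertex changes.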

6. Robust Semi-Automatic Annotation

In this section, we provide a new approach for building object detection datasets based on both an MBR algorithm and threshold angles. This method is semi-automatic because it consists of a manual step followed by a computer-assisted step. In addition, this method is robust because the bounding rectangle generated by our algorithm is insensitive to the dexterity of the annotator.
(i) The problem of MBR was dealt with in Section 5. If the user needs an optimal annotation in the sense given by Definition 2, then he calls the RC-MBR algorithm. Otherwise, he calls the RANGE-MBR algorithm for a quasi-optimal annotation. However, optimality and complexity are inversely proportional.
(ii) By default, the threshold angle is equal to zero. It can also be adjusted by experts in object detection, or it can be defined experimentally as the largest angle between the ground truth bounding box and the positive -axis that gives no significant difference between the performance of horizontal and oriented detectors when tested on oriented objects. The experimental determination of the threshold angle requires horizontal and oriented object detectors, as well as already existing oriented object datasets.

6.1. Properties of the Robust Semi-Automatic Annotation Method

By construction, the robust semi-automatic annotation method ensures the following properties:
(1) The bounding rectangle generated by our approach is quasi-optimal in the sense that it covers the whole object, and its area is close to the area of the MBR enclosing the object.
(2) The angle of the rectangle is determined by an algorithm based on some relevant points collected on the contour of the object.
(3) The bounding rectangle is insensitive to annotators provided all relevant points on the object have been selected.
(4) The determination of the bounding rectangle requires elementary operations, where is the number of relevant points.
(5) It allows the user to build simultaneously, from an image bank, two databases: one for horizontal objects and another for oriented objects.

In summary, the robust semi-automatic annotation provides a simple solution to all the drawbacks mentioned in Section 3, which are inherent in the old annotation methods.

Let:
(i) be a threshold angle fixed by the user (by default, ),
(ii) be the set of relevant points selected on the contour of the object,
(iii) H-MBR be the horizontal minimum bounding rectangle enclosing ,
(iv) O-MBR be the minimum bounding rectangle enclosing ,
(v) be the angle of O-MBR ,

then Algorithm 1 and the flowchart shown in Figure 4 summarize the stages of construction of horizontal and oriented datasets from a large-scale image bank.

Input: An image; threshold angle; HOD and OOD datasets
Output: Annotated image assigned to HOD or OOD
(1)for (each object in the image) do
(2)Determine the class of the object
(3)Ask the annotator to click on the most relevant points of the object as defined in Definition 1
(4)Determine the RANGE-MBR enclosing as described in Section 5.1
(5)Set the angle between the positive -axis and
(6)if then
(7)Assign (image, object, , class) to the OOD dataset
(8)else
(9)Determine the horizontal MBR enclosing the relevant points, as shown in Figure 5, by setting
(10)Assign (image, object, , class) to the HOD dataset
(11)end if
(12)end for
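The dispatch step of Algorithm 1 can be sketched as follows (in Python for illustration; the paper's implementation is in MATLAB). Here `min_area_rect` is a hypothetical stand-in for the MBR computation, based on an exhaustive search over convex-hull edge orientations; only the angle-thresholding logic mirrors Algorithm 1.

```python
import math

def convex_hull(pts):
    """Monotone-chain convex hull of a list of (x, y) tuples."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_area_rect(pts):
    """Smallest enclosing rectangle; returns (area, angle in degrees, in [0, 90))."""
    hull = convex_hull(pts)
    best = None
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)
        # Rotate all points so the candidate edge is horizontal, then take the bbox.
        c, s = math.cos(-theta), math.sin(-theta)
        xs = [c*x - s*y for x, y in pts]
        ys = [s*x + c*y for x, y in pts]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        angle = math.degrees(theta) % 90.0
        if best is None or area < best[0]:
            best = (area, angle)
    return best

def dispatch(points, threshold_deg=0.0):
    """Route one annotated object to the oriented (OOD) or horizontal (HOD) set."""
    _, angle = min_area_rect(points)
    return "OOD" if angle > threshold_deg else "HOD"
```

With the default threshold of zero, any non-horizontal rectangle is routed to the oriented dataset, matching the default behavior described above.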

7. Experimental Study

This experimental study was carried out exclusively with MATLAB R2007b. We implemented the RC-MBR, PCA-MBR, and RANGE-MBR algorithms in the MATLAB language, and we wrote a script (see the Appendix), which:
(1) reads the image and then displays it,
(2) asks the annotator to click on the most relevant points P of the object (P is an n × 2 matrix),
(3) determines the rectangle corresponding to RC-MBR(P),
(4) determines the rectangle corresponding to RANGE-MBR(P),
(5) draws the two rectangles, with RANGE-MBR(P) in red and RC-MBR(P) in green.

The images used in Experiments B,…,D are royalty-free and were collected from the Internet. Moreover, the optimality criterion of an annotation is given in Definition 2.

Definition 2 (optimal annotation). A rectangle R enclosing an object is said to be optimal if it fulfills conditions (i) and (ii):
(i) the rectangle R covers the whole object,
(ii) the area of R is minimal.

7.1. Experiment 1

This experiment studies the optimality condition (ii) of the RANGE-MBR algorithm. Equation (28) describes how to generate the coordinates of the vertices of a random n-sided polygon, where each coordinate is obtained by a call to a generator of uniform random numbers on the interval [0, 1]. The parameter λ, called the dilation factor, controls the aspect ratio of the polygon, as shown in Figure 6: the larger λ is compared with 1, the more the polygon is dilated in the direction of the y-axis. Note that such a polygon is not necessarily convex, as shown in Figure 7. This is also the case for any polygon whose vertices are defined by the relevant points of an object.
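Since equation (28) is not reproduced in this excerpt, the following sketch assumes one plausible form of the generator: x-coordinates drawn uniformly on [0, 1], y-coordinates on [0, λ], and vertices sorted by angle around the centroid so that the polygon is simple. The exact construction in (28) may differ.

```python
import math
import random

def random_polygon(n, lam, rand=random.random):
    """Random n-sided polygon dilated by a factor `lam` along the y-axis.
    Sorting the vertices by angle around the centroid yields a simple
    (non-self-intersecting) but not necessarily convex polygon."""
    pts = [(rand(), lam * rand()) for _ in range(n)]
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    pts.sort(key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    return pts
```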

Since:
(1) the optimal annotation criterion that we have chosen results in a rectangle that has minimum area and covers the maximum of the visible parts of the object,
(2) RC-MBR is the fastest algorithm that determines the smallest rectangle enclosing a set of points,

it seems natural to consider this algorithm as a reference in the comparative study that we carried out.

For each value of the pair (n, λ), we generate N random n-sided polygons. For each polygon P_i, i = 1, …, N, we determine the relative error e_i between the areas of RANGE-MBR(P_i) and RC-MBR(P_i), as well as the CPU times t_i and t′_i used by the algorithms RANGE-MBR and RC-MBR to compute RANGE-MBR(P_i) and RC-MBR(P_i), respectively. Finally, for each of the sequences (e_i), (t_i), and (t′_i), we compute its mean and its standard deviation.
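The bookkeeping for one (n, λ) cell of this simulation grid can be sketched as follows; `range_mbr_area` and `rc_mbr_area` are hypothetical callables standing in for the two algorithms, which are not reproduced here.

```python
import statistics
import time

def relative_error(a_range, a_rc):
    """e = (A_RANGE - A_RC) / A_RC; non-negative, since RC-MBR is minimal."""
    return (a_range - a_rc) / a_rc

def benchmark(polygons, range_mbr_area, rc_mbr_area):
    """Relative-error and CPU-time statistics over a batch of polygons."""
    errs, t_range, t_rc = [], [], []
    for poly in polygons:
        t0 = time.perf_counter()
        a1 = range_mbr_area(poly)   # quasi-optimal area
        t1 = time.perf_counter()
        a2 = rc_mbr_area(poly)      # optimal (reference) area
        t2 = time.perf_counter()
        errs.append(relative_error(a1, a2))
        t_range.append(t1 - t0)
        t_rc.append(t2 - t1)
    return {
        "mean_err": statistics.mean(errs),
        "std_err": statistics.pstdev(errs),
        "mean_t_range": statistics.mean(t_range),
        "mean_t_rc": statistics.mean(t_rc),
    }
```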

Figures 8 and 9 represent the mean and the standard deviation of the relative error versus n and λ. We retained the range 4 ≤ n ≤ 20 for n because, in our estimation, it is the one that corresponds most to reality: in practice, objects rarely have more than 20 relevant points or fewer than 4.

We deduce from Figures 8 and 9 that:
(1) the more the polygon is dilated in one direction, the more accurate and precise the RANGE-MBR algorithm is;
(2) the more vertices the polygon has, the more accurate and precise the RANGE-MBR algorithm is.

We ran other simulations with larger values of n and λ and reached the same conclusion. The parameter λ controls the dilation of the polygon in the vertical direction. In fact, we could choose any other direction, since the RANGE-MBR algorithm is not sensitive to the direction of expansion. On the other hand, it is sensitive to the number of vertices of the polygon and to its dilation.

Out of the 680,000 generated polygons, we did not encounter any case departing from the above conclusions. In addition, the RANGE-MBR algorithm is about 9 times faster than the RC-MBR algorithm, and the mean CPU times of both algorithms are independent of n and λ. Since the complexity of the RANGE-MBR algorithm is O(n) and that of the RC-MBR algorithm is O(n log n), the difference in CPU time between the two algorithms is only noticeable when n is large enough. Based on the response time per click given in [44], and considering that the mean CPU time of the RANGE-MBR algorithm is negligible in comparison, the time needed to determine a bounding rectangle is essentially proportional to n, the number of relevant points collected on the contour of the object.

7.2. Experiment 2

We compared RANGE-MBR and PCA-MBR on sets of vertices of regular n-sided polygons. We observe that the relative error of the RANGE-MBR algorithm is always equal to 0, while that of the PCA-MBR algorithm is different from 0 for the values of n reported in Table 3. Although Case 6 does not contain only regular polygons, this experiment shows that the RANGE-MBR method achieves optimality on regular polygons and that, for the other shapes (which fall into Case 6), it offers a better solution than the one obtained by PCA-MBR.
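The idea behind a PCA-based bounding rectangle can be sketched as follows: align the box with the principal axis of the point cloud. This is a generic reconstruction of the PCA-MBR idea, not the authors' implementation; note that on regular polygons the covariance matrix is isotropic, so the principal direction is arbitrary, which is one reason PCA-MBR can miss the optimum there.

```python
import math

def pca_mbr_area(pts):
    """Area of the bounding rectangle aligned with the principal axis of pts."""
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    # Entries of the 2x2 covariance matrix of the point cloud
    sxx = sum((x - cx) ** 2 for x, _ in pts) / n
    syy = sum((y - cy) ** 2 for _, y in pts) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]]
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # Rotate the points so the principal axis is horizontal, then take the bbox.
    c, s = math.cos(-theta), math.sin(-theta)
    xs = [c * x - s * y for x, y in pts]
    ys = [s * x + c * y for x, y in pts]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))
```

On an elongated rectangle the principal axis coincides with the long side, so the PCA-aligned box recovers the minimal area; on near-isotropic shapes it generally does not.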

7.3. Experiment 3

This experiment studies the effect of the dilation factor on the performance of the RANGE-MBR algorithm. Figure 10 represents a basic experiment; the relevant points are colored in yellow. We observe that RANGE-MBR is close to RC-MBR. Although the bird on the left is smaller than the bird on the right, the relative error is smaller for the bird on the right than for the bird on the left. These scores are expected, since the bird on the right has a more elongated shape than the bird on the left, and RANGE-MBR is more effective on dilated objects.

7.4. Experiment 4

This experiment confirms the conclusion obtained in the previous experiment. Figure 11 contains two mangoes: the one on the left is almost circular, while the one on the right is clearly elongated. The relevant points are colored in yellow. The red rectangle corresponds to RANGE-MBR, while the green rectangle corresponds to RC-MBR. This real example agrees with the simulation results shown in Figure 8: the mango on the right is more dilated than the one on the left, which is why the relative error for the circular mango is greater than that for the oval mango.

7.5. Experiment 5

In order to verify the robustness of our annotation method with respect to the annotator, we asked a colleague to click on the relevant points of the two mangoes. Figure 12 illustrates the result of this experiment. The relative errors reported in Figure 12 are larger than those reported in Figure 11. We explain this difference by the number of points used in each experiment: the more points we use, the more the RANGE-MBR algorithm reduces the relative error. This real example agrees with the simulation results shown in Figure 8.

7.6. Experiment 6

This experiment compares our annotation method to the FBB method introduced in Section B. For this purpose, we chose a random image from the TUT indoor dataset and annotated the objects it contains with FBB, RC-MBR, and RANGE-MBR. Note that the FBB annotation uses blue rectangles, the RC-MBR annotation uses green rectangles, the RANGE-MBR annotation uses red rectangles, and the relevant points are marked in yellow.

Based on Figure 13 and Table 4, it can be seen that the annotation by the FBB method is not optimal. Indeed:
(i) for the upper extinguisher and the exit sign, condition (i) is violated;
(ii) for the lower extinguisher, condition (ii) is violated.

In addition, the annotation by the RANGE-MBR method satisfies condition (i), and the relative error between the areas of RANGE-MBR and RC-MBR remains small for the exit sign, the upper extinguisher, and the lower extinguisher. We can conclude that condition (ii) is almost satisfied by the RANGE-MBR method.
(i) The values of the relative error are in agreement with the simulation results presented in Figure 8. Indeed, in terms of dilation, the lower extinguisher is the most dilated, followed by the upper extinguisher, then the exit sign.
(ii) The exit sign and the upper extinguisher have regular shapes, while the lower extinguisher has an irregular shape. We already underlined in Section 1 that the RANGE-MBR method is sensitive to regular forms. This example therefore illustrates the situation of Case 6.

For all these reasons, the annotation of the lower extinguisher by the RANGE-MBR method is the best.

7.7. Experiment 7

This experiment highlights the influence of the annotation method on the computation of the IoU. Figure 14 corresponds to an image from the DOTA dataset, in which the objects of interest are the airplanes. We use the BBAVectors detector [6] to generate the red prediction bounding boxes. The ground-truth bounding boxes used in [6] are colored green, and the blue rectangles correspond to the annotations of the airplanes by the RANGE-MBR method. For each airplane, we compute the Intersection over Union between the green (respectively, blue) rectangle enclosing the airplane and the corresponding red rectangle. The results of this experiment are reported in Table 5. Since the use of minimum ground-truth rectangles best reflects the performance of an object detector, it is reasonable to rely on the results of the second row of Table 5 rather than those of the first row.
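For axis-aligned boxes, the Intersection over Union used in this comparison reduces to the following computation (oriented boxes, as produced by BBAVectors, additionally require a polygon-intersection step, omitted here):

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

Because the RANGE-MBR ground-truth boxes are tighter than loose manual annotations, the same prediction box generally yields a different IoU against them, which is exactly the effect reported in Table 5.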

8. Conclusion

In this study, we provide a novel annotation approach that is both robust and semi-automatic. According to the results of the experimental study, we can assert that:
(1) robust semi-automatic annotation is quasi-optimal in the sense of Definition 2, and its optimality increases with dilated objects, which is the case for most of the objects appearing in images collected by aerial photography;
(2) robust semi-automatic annotation is fast and reliable, in the sense that the bounding rectangle leaves no part of the object uncovered and is insensitive to the dexterity of the annotator;
(3) robust semi-automatic annotation is simple to implement and can easily be incorporated into annotation platforms;
(4) robust semi-automatic annotation is sensitive to objects displaying symmetry with respect to one or more directions; in this case, the relevant points should follow the same symmetry in order for the generated rectangle to be close to the optimal rectangle.

The estimation of the threshold angle, as described in Section 6, deserves further attention: it requires an experimental study using large-scale horizontal and oriented object datasets, as well as state-of-the-art oriented object detectors. Once a good estimate of the threshold angle has been obtained, the robust semi-automatic annotation technique allows the simultaneous construction, from the same image bank, of two datasets: one for horizontal objects and another for oriented objects.

Appendix

A. Script 1: Collect P, draw RC-MBR(P) and RANGE-MBR(P). Here, we assume that we have already implemented the RANGE-MBR and RC-MBR functions in the MATLAB language. The following script:
(1) reads the image and then displays it,
(2) asks the annotator to click on the most relevant points P of the object (P is an n × 2 matrix),
(3) determines the rectangle corresponding to RC-MBR(P),
(4) determines the rectangle corresponding to RANGE-MBR(P),
(5) draws the two rectangles, with RANGE-MBR(P) in red and RC-MBR(P) in green.

Data Availability

The data used to support the findings of this study are included within the article.

Disclosure

A preliminary version [45] of this study is accessible at https://www.researchsquare.com/article/rs-860574/v1. The authors confirm that this study has not been published elsewhere.

Conflicts of Interest

The author declares that there are no conflicts of interest.