Article

Comprehensive Direct Georeferencing of Aerial Images for Unmanned Aerial Systems Applications

by Carlos A. M. Correia 1, Fabio A. A. Andrade 2,3,*, Agnar Sivertsen 2, Ihannah Pinto Guedes 1, Milena Faria Pinto 1, Aline Gesualdi Manhães 1 and Diego Barreto Haddad 1
1 Federal Center of Technological Education of Rio de Janeiro (CEFET/RJ), Rio de Janeiro 20271-110, Brazil
2 NORCE Norwegian Research Centre, 5838 Bergen, Norway
3 Department of Microsystems, Faculty of Technology, Natural Sciences and Maritime Sciences, University of South-Eastern Norway (USN), 3184 Borre, Norway
* Author to whom correspondence should be addressed.
Sensors 2022, 22(2), 604; https://doi.org/10.3390/s22020604
Submission received: 25 November 2021 / Revised: 6 January 2022 / Accepted: 11 January 2022 / Published: 13 January 2022
(This article belongs to the Section Vehicular Sensing)

Abstract:
Optical image sensors are the most common remote sensing data acquisition devices on Unmanned Aerial Systems (UAS). In this context, assigning a location in a geographic frame of reference to the acquired image is a necessary task in the majority of applications. When ground control points are not used, this process is called direct georeferencing. Although it relies on simple mathematical fundamentals, the complete direct georeferencing process involves a large amount of information, such as camera sensor characteristics, mounting measurements, and the attitude and position of the UAS, as well as many rotations and translations between different reference frames, which makes the whole operation considerably complex. Another problem is that manufacturers and software tools may use different reference frames, posing additional difficulty when implementing direct georeferencing. As this information is spread across many sources, researchers may struggle to obtain a complete view of the method; in fact, no paper in the literature explains the process in a comprehensive way. To meet this implicit demand, this paper presents a comprehensive method for direct georeferencing of aerial images acquired by cameras mounted on UAS, where all required information, mathematical operations and implementation steps are explained in detail. Finally, to demonstrate the practical use of the method and to assess its accuracy, both simulated and real flights were performed, in which objects in the acquired images were georeferenced.

1. Introduction

Currently, optical image sensors are ubiquitous in Unmanned Aerial Systems (UAS). These acquisition devices can be RGB, infrared or hyperspectral cameras, for example. There is a wide variety of applications using such sensors, e.g., in the fields of inspection [1], mapping [2], search and rescue [3], tracking [4], border patrol [5], sea surveillance [6], agriculture [7], and even recreation [8].
Traditionally, the georeferencing of aerial images was performed with the use of Ground Control Points (GCP) [9,10], which are marked points on the ground with known geographic locations. However, the need for GCPs imposes a considerable limitation on such operations, because most of the time the operator cannot place and measure the GCPs prior to the mission. For example, in [11], the authors developed a large-scale fast video geo-registration method for forest fire monitoring, using a high-resolution camera mounted on a UAS. The system uses selected terrain points as references, called "landmark ground control points" (LGCPs), whose positions and geographic coordinates are known from official geographic maps and loaded into the system before the obtained images are georeferenced. Although the method may be satisfactory for land monitoring, it is limited in terrains where the mapping is not accurate or not available, as well as in cases where no LGCPs can be chosen, such as deserts or maritime regions.
The process of georeferencing aerial images without the use of GCPs is called Direct Georeferencing. It relies on information about the position and attitude of the UAS and the image sensor and lens characteristics to calculate the position of the imaged object in the geographic frame. This involves several challenges, such as the absence of range information, i.e., the direct distance between the camera and the target.
One may argue that this could simply be solved by equipping the UAS with laser scanners or stereo cameras. However, these sensors are costly for long ranges and would also increase the weight of the UAS payload, reducing its endurance.
In the literature, there are some works that perform direct georeferencing. Hemerly [12] is one of the main references in the field, providing a brief explanation of the direct georeferencing process, which is tested in a real UAS flight; the paper also covers the camera calibration process. Leira et al. [13] implemented a light-weight thermal camera payload with a georeferencing algorithm for small fixed-wing UAS. The georeferencing process is explained and a real flight test is performed, in which eighty images are georeferenced.
In [14], the authors proposed a real-time direct georeferencing system for maritime targets, employing a thermal camera mounted on a UAS. The system is tested on a static target and on a moving vessel, and all steps of the georeferencing process are discussed. The algorithm is later used in [4] to calculate the target's position for further tracking in two of the four UAS-based tracking solutions presented, which were evaluated in real flight tests. In [15], a rapid aerial image georeferencing method for UAS agricultural activities was implemented. Low-cost Inertial Measurement Unit (IMU) sensors, a Global Positioning System (GPS) receiver and sensor fusion techniques were employed, and no GCPs were used. The method obtains the georeferenced position by processing the navigation data and the camera lens distortion model. Results were compared with sixteen positions given by GCPs, and the accuracy was considered satisfactory for agriculture applications.
In [16], the authors implemented a method to estimate the coordinates of a target from the image obtained by a camera mounted on a small fixed-wing UAS, given the UAS position and altitude and the camera pose angles. The georeferencing calculations are similar to those presented in our research, which are detailed and fully explained here. The paper also presents four techniques to reduce the error of the estimated georeferenced position.
As observed in the previously cited works, although the direct georeferencing of images taken by cameras mounted on UAS requires only simple mathematical concepts, all from linear algebra, the entire process involves many other details, resulting in a complex and error-prone operation. A researcher may therefore find it difficult to locate and understand these details, as they are treated separately and scattered across many works in the scientific literature, increasing the risk of small mistakes that lead to large errors in the results. In fact, no paper in the literature explains the process in a comprehensive way.
Therefore, this paper adds to the field by detailing the complete implementation of the direct georeferencing process, clarifying the concepts and the reasons behind the choices. The method is tested both in real and simulated environments, given the significant growth in the use of simulation tools for UAS applications. In addition, the determination of the $z_C$ factor, which is only briefly or imprecisely mentioned, or not mentioned at all, in other works, is completely elucidated.
Another contribution of this paper is the inclusion of platform mounting measurements (lever arms) in the calculations. This is often neglected in the UAS community: since the lever arms are in the range of centimeters, they are usually not considered. However, this can result in significant errors for high-resolution sensors or larger UAS.
Finally, the challenge that this work addresses is the lack of centralized and standardized information about the methods to perform the georeferencing of aerial images taken by cameras mounted on UAS. A good understanding of the methods is important because there are many different frames involved in the process, and the tools provided by sensor and system manufacturers may use different frames and axis orientations. Therefore, it is important to understand the mathematical fundamentals and the implementation steps in order to consciously make the required conversions and adaptations.

2. Materials and Methods

In this section, the definition of direct georeferencing, as well as its applications, is presented. In addition, the most important mathematical tools used in georeferencing, such as rotations and aircraft maneuver angles, are formulated in order to keep the material self-contained. Finally, the implementation steps are described.

2.1. Direct Georeferencing

Georeferencing is the process of assigning to an object or target a location in a geographic frame of reference [17]. In this paper, the objects of interest, or targets, are initially available in an aerial image acquired by a monocular camera mounted on a UAS. The goal is, therefore, to calculate the location of the object in a geographic frame, such as the Universal Transverse Mercator frame or the World Geodetic System 1984 frame. To perform these calculations, information about the camera sensor, the mounting characteristics, and the attitude and position of the UAS in the geographic frame is used, among other data. Since this process does not involve Ground Control Points, it is defined as Direct Georeferencing.

2.2. Mathematical Fundamentals

In this section, the mathematical support used in the direct georeferencing equations is presented. It comprises linear algebra concepts, such as rotation matrices, and orientation angle fundamentals.

2.2.1. Rotation Matrix

Figure 1 shows two different coordinate systems, represented by the orthogonal reference frames $XYZ$ and $X'Y'Z'$. It also presents the angles between the axis $X'$ and each axis of the $XYZ$ frame, which are used in the coordinate transformations between the frames. The other angles that each axis forms with the axes of the other frame are not shown in Figure 1 for didactic purposes.
From Figure 1, it can be seen that the projections of the unit basis vectors of the $X'$, $Y'$ and $Z'$ axes ($\hat{i}'$, $\hat{j}'$ and $\hat{k}'$) onto the $XYZ$ frame are given by the following equations [18]:
$\hat{i}' = \cos A_{X'X}\,\hat{i} + \cos A_{X'Y}\,\hat{j} + \cos A_{X'Z}\,\hat{k},$
$\hat{j}' = \cos A_{Y'X}\,\hat{i} + \cos A_{Y'Y}\,\hat{j} + \cos A_{Y'Z}\,\hat{k},$
$\hat{k}' = \cos A_{Z'X}\,\hat{i} + \cos A_{Z'Y}\,\hat{j} + \cos A_{Z'Z}\,\hat{k},$
where $A_{\text{axis}'\,\text{axis}}$ is the angle between the corresponding axes of the two frames, and $\cos A_{\text{axis}'\,\text{axis}}$ denotes the direction cosine of one axis on the other.
These equations can be written in matrix form as:
$\begin{bmatrix} \hat{i}' \\ \hat{j}' \\ \hat{k}' \end{bmatrix} = \begin{bmatrix} \cos A_{X'X} & \cos A_{X'Y} & \cos A_{X'Z} \\ \cos A_{Y'X} & \cos A_{Y'Y} & \cos A_{Y'Z} \\ \cos A_{Z'X} & \cos A_{Z'Y} & \cos A_{Z'Z} \end{bmatrix} \begin{bmatrix} \hat{i} \\ \hat{j} \\ \hat{k} \end{bmatrix} = R_{XYZ}^{X'Y'Z'} \begin{bmatrix} \hat{i} \\ \hat{j} \\ \hat{k} \end{bmatrix},$
where the matrix $R_{XYZ}^{X'Y'Z'}$ is known as the Direction Cosine Matrix (DCM), or rotation matrix, from the $XYZ$ frame to the $X'Y'Z'$ frame, which is given by:
$R_{XYZ}^{X'Y'Z'} = \begin{bmatrix} \cos A_{X'X} & \cos A_{X'Y} & \cos A_{X'Z} \\ \cos A_{Y'X} & \cos A_{Y'Y} & \cos A_{Y'Z} \\ \cos A_{Z'X} & \cos A_{Z'Y} & \cos A_{Z'Z} \end{bmatrix}.$
Now, let $V$ be a vector in the frame $XYZ$, with components $a$, $b$ and $c$:
$V = a\,\hat{i} + b\,\hat{j} + c\,\hat{k}.$
Therefore, the vector $V'$, which is the representation of the vector $V$ in the frame $X'Y'Z'$, with components $a'$, $b'$ and $c'$, can be obtained by:
$V' = a'\,\hat{i}' + b'\,\hat{j}' + c'\,\hat{k}', \qquad \begin{bmatrix} a' \\ b' \\ c' \end{bmatrix} = \begin{bmatrix} \cos A_{X'X} & \cos A_{X'Y} & \cos A_{X'Z} \\ \cos A_{Y'X} & \cos A_{Y'Y} & \cos A_{Y'Z} \\ \cos A_{Z'X} & \cos A_{Z'Y} & \cos A_{Z'Z} \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix},$
or:
$V' = R_{XYZ}^{X'Y'Z'}\, V.$
To perform the inverse transformation, that is, from the $X'Y'Z'$ frame to the $XYZ$ frame, the rotation matrix $R_{X'Y'Z'}^{XYZ}$ is given by:
$R_{X'Y'Z'}^{XYZ} = \begin{bmatrix} \cos A_{XX'} & \cos A_{XY'} & \cos A_{XZ'} \\ \cos A_{YX'} & \cos A_{YY'} & \cos A_{YZ'} \\ \cos A_{ZX'} & \cos A_{ZY'} & \cos A_{ZZ'} \end{bmatrix}.$
Considering the congruence of the angles between the axes of the two frames, given by:
$A_{XX'} = A_{X'X}; \quad A_{XY'} = A_{Y'X}; \quad A_{XZ'} = A_{Z'X},$
$A_{YX'} = A_{X'Y}; \quad A_{YY'} = A_{Y'Y}; \quad A_{YZ'} = A_{Z'Y},$
$A_{ZX'} = A_{X'Z}; \quad A_{ZY'} = A_{Y'Z}; \quad A_{ZZ'} = A_{Z'Z},$
one may conclude that:
$R_{X'Y'Z'}^{XYZ} = \begin{bmatrix} \cos A_{XX'} & \cos A_{XY'} & \cos A_{XZ'} \\ \cos A_{YX'} & \cos A_{YY'} & \cos A_{YZ'} \\ \cos A_{ZX'} & \cos A_{ZY'} & \cos A_{ZZ'} \end{bmatrix} = \begin{bmatrix} \cos A_{X'X} & \cos A_{Y'X} & \cos A_{Z'X} \\ \cos A_{X'Y} & \cos A_{Y'Y} & \cos A_{Z'Y} \\ \cos A_{X'Z} & \cos A_{Y'Z} & \cos A_{Z'Z} \end{bmatrix} = \left(R_{XYZ}^{X'Y'Z'}\right)^{\mathsf{T}},$
or:
$R_{X'Y'Z'}^{XYZ} = \left(R_{XYZ}^{X'Y'Z'}\right)^{\mathsf{T}}.$
So, given the rotation matrix from one frame, say $XYZ$, to another, say $X'Y'Z'$, the inverse transformation is performed with its transpose:
$V' = R_{XYZ}^{X'Y'Z'}\, V \;\Longrightarrow\; V = R_{X'Y'Z'}^{XYZ}\, V' = \left(R_{XYZ}^{X'Y'Z'}\right)^{\mathsf{T}} V'.$
It can also be demonstrated that, in the specific case of rotation matrices, the following statement is true:
$R_{XYZ}^{X'Y'Z'}\, R_{X'Y'Z'}^{XYZ} = R_{X'Y'Z'}^{XYZ}\, R_{XYZ}^{X'Y'Z'} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I,$
where $I$ canonically denotes the identity matrix.
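As a quick numerical illustration of these properties (a minimal sketch assuming NumPy, not part of the published scripts), the fragment below rotates a vector into a primed frame with a simple DCM and recovers it with the transpose:

    import numpy as np

    # DCM for a rotation of the XYZ frame about its Z axis by 30 degrees
    a = np.radians(30.0)
    R = np.array([[ np.cos(a), np.sin(a), 0.0],
                  [-np.sin(a), np.cos(a), 0.0],
                  [ 0.0,       0.0,       1.0]])

    V = np.array([1.0, 2.0, 3.0])   # vector expressed in the XYZ frame
    V_prime = R @ V                 # same vector expressed in the X'Y'Z' frame

    # The inverse transformation is the transpose, and R is orthogonal
    assert np.allclose(R.T @ V_prime, V)
    assert np.allclose(R @ R.T, np.eye(3))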

2.2.2. Euler Angles Sequence for Aerodynamics: Yaw, Pitch and Roll

A common mathematical method to transform coordinates from one frame to another, that is, to rotate vectors between two frames, is the Euler angle sequence, which consists of choosing a sequence of three angles to project one frame onto another. There are many possible sequences. In aerodynamics, a specific sequence is more convenient because of its relationship with the aircraft movements: yaw, pitch and roll.
The yaw movement is the turn of the nose of the airplane to the left or right, produced by the action of the rudder; in other words, it is a rotation about the axis normal to the plane formed by the wings and the fuselage. The pitch movement consists of turning the nose up or down, and it is generated by the movement of the elevators, located in the tail of the airplane; this is equivalent to a rotation about an axis along the wings. Finally, the roll movement is the rotation about the airplane's longitudinal axis, which runs along the fuselage; this maneuver is provided by the ailerons, placed at the rear of the wings [19]. Figure 2 shows the yaw, pitch and roll airplane maneuvers, with their respective axes.
As seen in Figure 2, the movements are canonically attributed to the $Z$ axis for yaw, the $Y$ axis for pitch and the $X$ axis for roll. Positive rotations follow the right-hand rule, with the thumb aligned with the arrow of the axis.
Figure 3 shows the Euler angle sequence for aerodynamics [18]. Figure 3a shows the first rotation, around the $Z$ axis, by the yaw angle $\psi$, which carries the frame $XYZ$ to the frame $X_1Y_1Z_1$, where the axis $Z$ is congruent to $Z_1$; Figure 3b presents the second rotation, around the $Y_1$ axis, by the pitch angle $\theta$, moving the frame $X_1Y_1Z_1$ to the frame $X_2Y_2Z_2$, where the axis $Y_1$ is congruent to $Y_2$; Figure 3c shows the third and last rotation, around the $X_2$ axis, by the roll angle $\phi$, rotating the $X_2Y_2Z_2$ reference frame to the final $X'Y'Z'$ frame. The notations $\psi$, $\theta$ and $\phi$ for the Euler angles yaw, pitch and roll, respectively, are canonical in the literature and are therefore employed in this work.
The rotation matrix for the Euler sequence of yaw, pitch and roll angles, $R_{YPR}$, is given by [18]:
$R_{YPR} = \begin{bmatrix} \cos\psi\cos\theta & \sin\psi\cos\theta & -\sin\theta \\ \cos\psi\sin\theta\sin\phi - \sin\psi\cos\phi & \sin\psi\sin\theta\sin\phi + \cos\psi\cos\phi & \cos\theta\sin\phi \\ \cos\psi\sin\theta\cos\phi + \sin\psi\sin\phi & \sin\psi\sin\theta\cos\phi - \cos\psi\sin\phi & \cos\theta\cos\phi \end{bmatrix}.$
The Euler angle values are recovered from the direction cosines as:
$\tan\psi = \dfrac{\cos A_{X'Y}}{\cos A_{X'X}},$
$\sin\theta = -\cos A_{X'Z},$
$\tan\phi = \dfrac{\cos A_{Y'Z}}{\cos A_{Z'Z}}.$
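To make these relations concrete, the short Python sketch below (an illustration assuming NumPy; it is not the authors' published code) builds the yaw-pitch-roll DCM from the expression above and recovers the angles from the matrix entries:

    import numpy as np

    def dcm_ypr(psi, theta, phi):
        """Yaw-pitch-roll (ZYX) Direction Cosine Matrix; angles in radians."""
        cps, sps = np.cos(psi), np.sin(psi)
        cth, sth = np.cos(theta), np.sin(theta)
        cph, sph = np.cos(phi), np.sin(phi)
        return np.array([
            [cps*cth,               sps*cth,               -sth],
            [cps*sth*sph - sps*cph, sps*sth*sph + cps*cph,  cth*sph],
            [cps*sth*cph + sps*sph, sps*sth*cph - cps*sph,  cth*cph],
        ])

    def euler_from_dcm(R):
        """Recover yaw, pitch and roll from the DCM (the three relations above)."""
        psi = np.arctan2(R[0, 1], R[0, 0])    # tan(psi) = cos A_{X'Y} / cos A_{X'X}
        theta = np.arcsin(-R[0, 2])           # sin(theta) = -cos A_{X'Z}
        phi = np.arctan2(R[1, 2], R[2, 2])    # tan(phi) = cos A_{Y'Z} / cos A_{Z'Z}
        return psi, theta, phi

    angles = np.radians([30.0, -15.0, 10.0])          # yaw, pitch, roll
    R = dcm_ypr(*angles)
    assert np.allclose(euler_from_dcm(R), angles)     # round trip
    assert np.allclose(R @ R.T, np.eye(3))            # R is a rotation matrix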

2.3. Implementation

This section presents each step of the direct georeferencing process, which basically consists of successive rotations between the reference systems, or frames, involved in the chain.

2.3.1. Image Frame to Camera Frame

Photographic cameras are devices that record images of the three-dimensional world on two-dimensional surfaces, the image planes. Their construction is very simple: a dark box with a small aperture. Photons reflected from an object's surface in the three-dimensional world pass through the aperture and form the image on the opposite surface of the dark box, where an optical sensor may be placed. Figure 4 shows this scheme.
The system can be mathematically modelled by introducing a coordinate reference at the camera aperture, called the Camera Frame (C), where the object position is expressed in the coordinates $(x_C, y_C, z_C)$, and another reference frame, the Image Frame (i), placed on the image plane, or focal plane, where the image position is expressed in the coordinates $(x_i, y_i)$. Figure 5 shows the model.
This model is known as the Pinhole Camera Model. To simplify the mathematical operations, a geometrically equivalent construction is obtained by mirroring the image plane, or projection plane, to the other side of the Camera Frame. This simple procedure allows working only with positive values along the axes. Figure 6 illustrates the scheme.
From Figure 6a, the relation between the coordinates $x_i$, in the Image Frame (i), and $x_C$, in the Camera Frame (C), is obtained, where $f$ is the focal distance:
$\dfrac{x_i}{f} = \dfrac{x_C}{z_C} \;\Longrightarrow\; x_i = f\,\dfrac{x_C}{z_C}.$
Furthermore, from Figure 6b:
$\dfrac{y_i}{f} = \dfrac{y_C}{z_C} \;\Longrightarrow\; y_i = f\,\dfrac{y_C}{z_C}.$
It is important to note that these equations are only valid for rectified images. If the image is not rectified, a rectifying process is needed prior to the use of these equations.
Figure 7 shows the same scheme in a more detailed perspective view. Here, the Image Frame is oriented along the two axes $U$ and $V$:
$u - c_x = x_i,$
$v - c_y = y_i,$
where $c_x$ and $c_y$ are the coordinates of the principal point of the image.
Therefore, combining Equations (25) and (26) with Equations (23) and (24) leads to:
$u = f\,\dfrac{x_C}{z_C} + c_x,$
$v = f\,\dfrac{y_C}{z_C} + c_y.$
In order to allow matrix operations, such as rotations and translations (the rigid transformations), which will be necessary in later calculations, homogeneous coordinates are employed. Therefore, Equations (27) and (28) take the following form:
$z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & c_x \\ 0 & f & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_C \\ y_C \\ z_C \end{bmatrix}.$
The coordinates $u$ and $v$ in the Image Frame (i) are conveniently expressed in pixels, while the object coordinates in the Camera Frame (C) are usually given in meters. Therefore, the focal distance $f$ and the coordinates of the principal point ($c_x$ and $c_y$) must be given in pixels.
The focal distance $f$ is usually specified in millimeters. To convert it to pixels, multiply by the image size in pixels and divide by the sensor size in millimeters:
$f\,[\mathrm{px}] = f\,[\mathrm{mm}] \cdot \dfrac{\text{ImageSize}\,[\mathrm{px}]}{\text{SensorSize}\,[\mathrm{mm}]}.$
In the case of different pixel resolutions for the $x$ and $y$ dimensions, Equation (29) takes the form:
$z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_C \\ y_C \\ z_C \end{bmatrix}.$
Therefore, $f_x$ and $f_y$ in pixels are calculated from the focal length ($f$, in millimeters), the image dimensions (ImageWidth and ImageHeight, in pixels) and the sensor dimensions (SensorWidth and SensorHeight, in millimeters):
$f_x = f\,\dfrac{\text{ImageWidth}}{\text{SensorWidth}},$
$f_y = f\,\dfrac{\text{ImageHeight}}{\text{SensorHeight}}.$
Equation (31) can also be written in the form:
$z_C\,P_i = K\,P_C,$
where $P_i = [u, v, 1]^{\mathsf{T}}$ is the vector of the object's coordinates in the Image Frame, $P_C = [x_C, y_C, z_C]^{\mathsf{T}}$ is the vector of the object's coordinates in the Camera Frame, and $K$ is the intrinsic camera parameter matrix, also referred to as the calibration matrix, as it can be obtained during the camera calibration process:
$K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}.$
Equations (31) and (34) can be rewritten as:
$z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} x_C \\ y_C \\ z_C \end{bmatrix}.$
As the objective is to obtain the object's metric coordinates in the Camera Frame from the object's pixel coordinates in the Image Frame, Equation (36) then becomes:
$\begin{bmatrix} x_C \\ y_C \\ z_C \end{bmatrix} = K^{-1}\, z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}.$
Considering that $z_C$ is unknown, in order to solve the equation it is incorporated into $P_C$, generating the vector $P'_C$:
$P'_C = \begin{bmatrix} \frac{1}{z_C}\,x_C \\ \frac{1}{z_C}\,y_C \\ \frac{1}{z_C}\,z_C \end{bmatrix} = \begin{bmatrix} x'_C \\ y'_C \\ 1 \end{bmatrix} = K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}.$
As will be demonstrated later (Section 2.3.6), $z_C$ is easily determined at the end of the whole process, once the primed coordinates have been propagated to the ENU frame; until then, the process continues using $P'_C$.
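As an illustration of this step, the Python sketch below (assuming NumPy; the sensor numbers are the ones of the simulation case study in Section 3.1 and Table 1, and the variable names are illustrative) converts the focal length from millimeters to pixels, assembles $K$ and back-projects one pixel to $P'_C$:

    import numpy as np

    # Camera parameters from the simulation case study (Section 3.1)
    f_mm = 12.5                              # focal length [mm]
    sensor_width_mm = 8.6                    # sensor width [mm]
    image_width, image_height = 2448, 2048   # [px]

    # Focal length in pixels (square pixels assumed here)
    f_px = f_mm * image_width / sensor_width_mm
    c_x, c_y = image_width / 2, image_height / 2

    # Intrinsic matrix K
    K = np.array([[f_px, 0.0,  c_x],
                  [0.0,  f_px, c_y],
                  [0.0,  0.0,  1.0]])

    # Pixel coordinates of the target in the image frame
    u, v = 1095, 1099
    P_i = np.array([u, v, 1.0])

    # Primed camera-frame coordinates: P'_C = K^{-1} P_i
    P_C_prime = np.linalg.inv(K) @ P_i
    print(P_C_prime)     # [x'_C, y'_C, 1]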

2.3.2. Camera Frame to Gimbal Frame

The next step is the coordinate transformation from the Camera Frame (C) to the Gimbal Frame (G). The Gimbal Frame has its $X$ axis, $X_G$, parallel to the $Z$ axis of the Camera Frame, $Z_C$; its $Y$ axis, $Y_G$, parallel to the $X$ axis of the Camera Frame, $X_C$; and its $Z$ axis, $Z_G$, parallel to the $Y$ axis of the Camera Frame, $Y_C$. Figure 8 shows the Gimbal and Camera frames.
The rotation matrix from the Camera Frame to the Gimbal Frame, $R_C^G$, can be obtained through the projection of the Gimbal Frame onto the Camera Frame, being given by:
$R_C^G = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$
Therefore, the object's primed position in the Gimbal Frame, $P'_G$, is obtained by:
$P'_G = R_C^G\, P'_C,$
or:
$\begin{bmatrix} x'_G \\ y'_G \\ z'_G \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x'_C \\ y'_C \\ 1 \end{bmatrix}.$
In cases where the origins of the two frames do not coincide, as shown in Figure 8, a translation vector from the Camera Frame to the Gimbal Frame ($T_C^G$) must be added. The object's primed position in the Gimbal Frame is then given by:
$P'_G = R_C^G\, P'_C + T_C^G.$

2.3.3. Gimbal Frame to UAS Frame

The next step is the coordinate transformation of the object from the Gimbal Frame to the UAS Frame, which is centered on the UAS center of gravity, as shown in Figure 9.
The transformation requires the Direction Cosine Matrix (DCM) from the Gimbal Frame to the UAS Frame, $R_G^{UAS}$, given by the Euler angles yaw, pitch and roll. The operation is:
$P'_{UAS} = R_G^{UAS}\, P'_G + T_G^{UAS},$
where $T_G^{UAS}$ is the translation vector of the Gimbal Frame in the UAS Frame.
In a more complete form, Equation (43) may be written as:
$P'_{UAS} = R_G^{UAS}\, P'_G + T_G^{UAS} = R_G^{UAS}\left(R_C^G\, P'_C + T_C^G\right) + T_G^{UAS} = R_G^{UAS} R_C^G\, P'_C + R_G^{UAS}\, T_C^G + T_G^{UAS}.$

2.3.4. UAS Frame to NED Frame

The following step is to transform the object's coordinates from the UAS Frame to the NED (North-East-Down) Frame. Figure 10 presents the scheme, also considering a translation vector $T_{UAS}^{NED}$, which is the position of the origin of the UAS Frame in the NED Frame, in case the origins of the two reference systems do not coincide.
To perform the coordinate transformation, the DCM between the frames, $R_{UAS}^{NED}$, must be employed, so the Euler angles yaw, pitch and roll must be known. The primed pixel position in the NED Frame, $P'_{NED}$, is given by:
$P'_{NED} = R_{UAS}^{NED}\, P'_{UAS} + T_{UAS}^{NED}.$
Substituting Equation (44) into Equation (47), in order to obtain a more complete form:
$P'_{NED} = R_{UAS}^{NED}\left(R_G^{UAS} R_C^G\, P'_C + R_G^{UAS}\, T_C^G + T_G^{UAS}\right) + T_{UAS}^{NED} = R_{UAS}^{NED} R_G^{UAS} R_C^G\, P'_C + R_{UAS}^{NED} R_G^{UAS}\, T_C^G + R_{UAS}^{NED}\, T_G^{UAS} + T_{UAS}^{NED}.$

2.3.5. NED Frame to ENU Frame

In many situations, it may be necessary to use the East-North-Up (ENU) reference system, as it is the reference frame employed in the Universal Transverse Mercator (UTM) coordinate system, which is directly associated with the World Geodetic System 1984 (WGS84) used by Global Positioning Systems (GPS), with the well-known latitude, longitude and height coordinates. Figure 11 presents the relation between the NED and ENU frames.
From Figure 11, the rotation matrix from the NED Frame to the ENU Frame, $R_{NED}^{ENU}$, can be obtained:
$R_{NED}^{ENU} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$
Thus, the primed position of the object in the ENU Frame is given by:
$P'_{ENU} = R_{NED}^{ENU}\, P'_{NED} + T_{NED}^{ENU}.$
Considering the previous calculations, a general equation may be obtained by substituting Equation (48) into Equation (50):
$P'_{ENU} = R_{NED}^{ENU} R_{UAS}^{NED} R_G^{UAS} R_C^G\, P'_C + R_{NED}^{ENU} R_{UAS}^{NED} R_G^{UAS}\, T_C^G + R_{NED}^{ENU} R_{UAS}^{NED}\, T_G^{UAS} + R_{NED}^{ENU}\, T_{UAS}^{NED} + T_{NED}^{ENU}.$
To simplify, it can be written as:
$P'_{ENU} = R_C^{ENU}\, P'_C + T_C^{ENU},$
where the matrix $R_C^{ENU}$ is the product of all rotations:
$R_C^{ENU} = R_{NED}^{ENU} R_{UAS}^{NED} R_G^{UAS} R_C^G,$
and the vector $T_C^{ENU}$ is the sum of all translations up to this stage, each rotated by the appropriate DCMs:
$T_C^{ENU} = R_{NED}^{ENU} R_{UAS}^{NED} R_G^{UAS}\, T_C^G + R_{NED}^{ENU} R_{UAS}^{NED}\, T_G^{UAS} + R_{NED}^{ENU}\, T_{UAS}^{NED} + T_{NED}^{ENU}.$

2.3.6. Determination of z C

Equation (38) expresses the relation between $P'_C$ and $P_C$:
$P'_C = \dfrac{1}{z_C}\, P_C.$
Considering Equation (52), the object's (unprimed) position in the ENU Frame is given by:
$P_{ENU} = R_C^{ENU}\, P_C + T_C^{ENU}.$
In order to determine $z_C$, it is necessary to solve the system formed by Equations (52) and (56):
$\begin{cases} P_{ENU} = R_C^{ENU}\, P_C + T_C^{ENU} \\ P'_{ENU} = R_C^{ENU}\, P'_C + T_C^{ENU} \end{cases}$
The second equation of the system in Equation (57) can be rewritten, using Equation (55), in the form:
$z_C\, P'_{ENU} - z_C\, T_C^{ENU} = R_C^{ENU}\, P_C.$
The combination of Equations (58) and (57) leads to:
$P_{ENU} = z_C\, P'_{ENU} - z_C\, T_C^{ENU} + T_C^{ENU}.$
Equation (59) is a matrix equation and can be solved through its $z$ coordinate, because $z_{ENU}$, the target object's $z$ coordinate in the ENU Frame, is known: it is the object's altitude, considering that the origin of the ENU Frame is set at sea level. In cases where the ENU Frame is set at another level, the $z$ coordinate corresponds to the vertical distance (the height) from the origin of the ENU Frame to the object's level. Figure 12 shows the scheme.
The value of $z_{ENU}$ must be known to perform the calculations, and it is reasonable to assume this condition is satisfied, because the region where the UAS operates is known, and so is its altitude. In addition, digital elevation models can be used to estimate the altitude of the target.
Therefore, considering that $z'_{ENU}$ (the $z$ component of $P'_{ENU}$) and $z_{C_{ENU}}$ (the $z$ component of $T_C^{ENU}$) have been determined in Equation (52), $z_C$ is finally given by:
$z_C = \dfrac{z_{ENU} - z_{C_{ENU}}}{z'_{ENU} - z_{C_{ENU}}}.$
Finally, the position of the object in the ENU Frame can be obtained by using Equation (59):
$P_{ENU} = z_C\, P'_{ENU} - z_C\, T_C^{ENU} + T_C^{ENU}.$
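To make the scale recovery concrete, the fragment below is a minimal sketch assuming NumPy, with R_C_ENU, T_C_ENU and P_C_prime already computed as in the previous steps and z_ENU known (e.g., from a digital elevation model); the function name is illustrative:

    import numpy as np

    def georef_scale(P_C_prime, R_C_ENU, T_C_ENU, z_ENU):
        """Recover z_C and the object's ENU position from the primed chain."""
        P_ENU_prime = R_C_ENU @ P_C_prime + T_C_ENU     # primed ENU position
        z_C_ENU = T_C_ENU[2]                            # z component of the total translation
        z_ENU_prime = P_ENU_prime[2]                    # z component of the primed ENU position
        z_C = (z_ENU - z_C_ENU) / (z_ENU_prime - z_C_ENU)
        P_ENU = z_C * P_ENU_prime - z_C * T_C_ENU + T_C_ENU
        return z_C, P_ENU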

2.3.7. Camera Matrix

Very often in the literature, all the calculations demonstrated above are condensed into a single equation [4,12,13,14], of the form:
$z_C \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} R_{ENU}^{C} & T_{ENU}^{C} \end{bmatrix} \begin{bmatrix} x_{ENU} \\ y_{ENU} \\ z_{ENU} \\ 1 \end{bmatrix} = M \begin{bmatrix} x_{ENU} \\ y_{ENU} \\ z_{ENU} \\ 1 \end{bmatrix}.$
According to Hanning [20], Equation (62) is called the pinhole model equation, where $M$ is the camera matrix, which denotes a projective mapping from world coordinates $(x_{ENU}, y_{ENU}, z_{ENU})$ to pixel coordinates $(u, v)$.
In keeping with the goal of this work, which is to provide a complete and detailed exposition of the entire direct georeferencing process, a brief explanation of the meaning of this expression is given here.
The main aspect of this equation is the use of homogeneous coordinates, from projective geometry, in place of Euclidean coordinates. This representation is necessary to allow the calculations to be expressed in matrix form, using a single matrix (the camera matrix $M$), when mapping a three-dimensional point in the real world to a two-dimensional camera image, and vice versa [21].
In the equation, $K$ is the intrinsic parameter matrix (35), $R_{ENU}^{C}$ is the $3 \times 3$ matrix representing the product of all rotations from the ENU frame to the Camera frame, and $T_{ENU}^{C}$ is the $3 \times 1$ vector given by the sum of all rotated translations from the ENU frame to the Camera frame.
According to Hanning [20], in homogeneous coordinates, the extrinsic matrix $\left[\, R_{ENU}^{C} \mid T_{ENU}^{C} \,\right]$ becomes:
$\begin{bmatrix} R_{ENU}^{C} & T_{ENU}^{C} \\ 0_{1\times 3} & 1 \end{bmatrix} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_{1} \\ r_{21} & r_{22} & r_{23} & t_{2} \\ r_{31} & r_{32} & r_{33} & t_{3} \\ 0 & 0 & 0 & 1 \end{bmatrix}.$
The camera matrix $M$ is given by:
$M_{3\times 4} = \begin{bmatrix} K & 0_{3\times 1} \end{bmatrix} \begin{bmatrix} R_{ENU}^{C} & T_{ENU}^{C} \\ 0_{1\times 3} & 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} & t_{1} \\ r_{21} & r_{22} & r_{23} & t_{2} \\ r_{31} & r_{32} & r_{33} & t_{3} \\ 0 & 0 & 0 & 1 \end{bmatrix}.$
As can be seen, the intrinsic matrix $K$ also has to be expressed in homogeneous coordinates. The matrices are completed with zeros and ones in such a way that the result of the calculations is the same before and after the homogeneous coordinate transformation.
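The assembly of $M$ can be illustrated with a few lines of Python (a sketch assuming NumPy; K, R_enu_to_cam and T_enu_to_cam are assumed to have been computed already, and the names are illustrative):

    import numpy as np

    def camera_matrix(K, R_enu_to_cam, T_enu_to_cam):
        """Build the 3x4 camera matrix M = [K | 0] [R | T; 0 1]."""
        K_h = np.hstack([K, np.zeros((3, 1))])                      # [K | 0], 3x4
        extrinsic = np.vstack([
            np.hstack([R_enu_to_cam, T_enu_to_cam.reshape(3, 1)]),  # [R | T], 3x4
            np.array([[0.0, 0.0, 0.0, 1.0]]),                       # homogeneous row
        ])                                                          # 4x4
        return K_h @ extrinsic

    # Usage: z_C * [u, v, 1]^T = M @ [x_ENU, y_ENU, z_ENU, 1]^T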

2.4. Lens Camera Distortion

In this paper, the image used in the direct georeferencing process is assumed to be rectified, meaning that no image distortion is present. Usually, manufacturers of UAS cameras provide the option to directly obtain the rectified image. However, as this subject may be an issue, it is briefly discussed in this section.
The geometry of the perspective or pinhole camera is simple, since the pinhole is assumed to be infinitely small. In reality, the light passes through a lens, which complicates the camera model. The lens can distort the light rays projected onto the 2D sensor, and this geometric image distortion is important to compensate for with regard to georeferencing. A camera with radial distortion is not well described by the pinhole model. Many wide-angle lenses have noticeable radial distortion, which basically means that straight lines in the scene appear as curves in the image. There are two types of radial distortion: (1) barrel distortion and (2) pincushion distortion. There are also other forms of distortion, such as tangential distortion.
By considering only barrel distortion, the distorted coordinates on the sensor plane can be represented as [22]:
$x_{\text{distorted}} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6),$
$y_{\text{distorted}} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6),$
where $r = \sqrt{(x - c_x)^2 + (y - c_y)^2}$, $c_x$ and $c_y$ are the coordinates of the optical centre pixel on the sensor, and $k_1$, $k_2$ and $k_3$ are the radial distortion coefficients, which can be obtained during the camera calibration process. The original raw distorted image can therefore be rectified.
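As an illustration, the sketch below (assuming NumPy; the coefficient values are placeholders, not calibration results from this work) applies the radial distortion model above to one pixel coordinate; inverting this mapping numerically, or using a calibration library such as OpenCV, yields the rectified image:

    import numpy as np

    def apply_radial_distortion(x, y, c_x, c_y, k1, k2, k3):
        """Distorted pixel coordinates from ideal ones (radial model above)."""
        r = np.sqrt((x - c_x) ** 2 + (y - c_y) ** 2)
        factor = 1.0 + k1 * r**2 + k2 * r**4 + k3 * r**6
        return x * factor, y * factor

    # Example with placeholder coefficients (k1 < 0 gives barrel-like behaviour)
    x_d, y_d = apply_radial_distortion(1200.0, 800.0, c_x=1224.0, c_y=1024.0,
                                       k1=-1e-8, k2=0.0, k3=0.0)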

2.5. Software Implementation

In this section, a pseudocode to perform the direct georeferencing and the main functions used in the MATLAB and Python scripts are presented. The structure followed in each script is the same and follows the six steps discussed in Section 2.3. For each implementation, the specific functions employed are explained, as well as any special procedures.

2.5.1. Pseudocode

The steps discussed in the previous section can be summarized in Algorithm 1, which can then be implemented in any chosen programming language.
Algorithm 1 Direct Georeferencing
1: Input $P_i$
2: Define $K$
3: $P'_C \leftarrow \mathrm{inverse}(K)\, P_i$
4: Define $R_C^G$
5: Input $T_C^G$
6: $P'_G \leftarrow R_C^G\, P'_C + T_C^G$
7: Input $\psi_G, \theta_G, \phi_G$
8: $R_G^{UAS} \leftarrow \mathrm{transpose}(\mathrm{DCM}(\psi_G, \theta_G, \phi_G, ZYX))$
9: Input $T_G^{UAS}$
10: $P'_{UAS} \leftarrow R_G^{UAS}\, P'_G + T_G^{UAS}$
11: Input $\psi_{UAS}, \theta_{UAS}, \phi_{UAS}$
12: $R_{UAS}^{NED} \leftarrow \mathrm{transpose}(\mathrm{DCM}(\psi_{UAS}, \theta_{UAS}, \phi_{UAS}, ZYX))$
13: Input $T_{UAS}^{NED}$
14: $P'_{NED} \leftarrow R_{UAS}^{NED}\, P'_{UAS} + T_{UAS}^{NED}$
15: Define $R_{NED}^{ENU}$
16: Input $T_{NED}^{ENU}$
17: $P'_{ENU} \leftarrow R_{NED}^{ENU}\, P'_{NED} + T_{NED}^{ENU}$
18: $T_C^{ENU} \leftarrow T_{NED}^{ENU} + R_{NED}^{ENU}\, T_{UAS}^{NED} + R_{NED}^{ENU} R_{UAS}^{NED}\, T_G^{UAS} + R_{NED}^{ENU} R_{UAS}^{NED} R_G^{UAS}\, T_C^G$
19: $z_{C_{ENU}} \leftarrow T_C^{ENU}[2]$
20: $z'_{ENU} \leftarrow P'_{ENU}[2]$
21: Input $z_{ENU}$
22: $z_C \leftarrow (z_{ENU} - z_{C_{ENU}}) / (z'_{ENU} - z_{C_{ENU}})$
23: $P_{ENU} \leftarrow z_C\, P'_{ENU} - z_C\, T_C^{ENU} + T_C^{ENU}$
24: Output $P_{ENU}$

2.5.2. MATLAB Implementation

The first step is the transformation from the Image Frame to the Camera Frame and, at the beginning of the calculations, the intrinsic parameter matrix data need to be input. The intrinsic parameter matrix K is a 3-by-3 matrix, which in MATLAB has the following format: “K = [fx 0 cx; 0 fy cy; 0 0 1]”.
The pixel position coordinates in the Image Frame, u and v, are the next data required, from which the pixel position vector in the Image Frame, $P_i$, is formed. The MATLAB operator “K\” gives the product of the inverse of the intrinsic matrix K with the term that follows it.
The following operations are matrix multiplications for the rotations and sums for the translations, which are done in MATLAB in a straightforward way.
In the transformation from the Gimbal to the UAS frame, in order to obtain the DCM for the rotation, the Euler angle sequence yaw, pitch and roll must be input. The MATLAB function “angle2dcm(angl1, angl2, angl3)” returns the DCM, where the parameters “angl1”, “angl2” and “angl3” are the aerodynamic Euler angle sequence, in this work the yaw, pitch and roll angles, respectively. It is important to note that the Euler angles yaw, pitch and roll are commonly given in the most external reference frame, which in this case is the UAS Frame. Therefore, the MATLAB function returns the rotation from the UAS to the Gimbal frame. The rotation from the Gimbal to the UAS frame ($R_G^{UAS}$) can be obtained by transposing the rotation matrix from the UAS to the Gimbal frame ($R_{UAS}^{G}$), using the MATLAB function “transpose($R_{UAS}^{G}$)”. The same procedure is applied when obtaining the rotation from the UAS to the NED frame.
The last step of the script consists of finding the target's position in the ENU frame. The coordinate $z_{ENU}$, which is the distance from the target to the origin of the ENU frame along the $z$ axis, needs to be input. It is also necessary to determine the total translation vector, $T_C^{ENU}$, which is given by the sum of all translation vectors, each rotated with the related DCMs. Finally, to calculate $z_C$, $z'_{ENU}$ and $z_{C_{ENU}}$ are needed; in MATLAB, they are obtained as $P'_{ENU}(3)$ and $T_C^{ENU}(3)$, respectively.

2.5.3. Python Implementation

In the Python implementation, NumPy [23] arrays are used for the matrices. As an example, to input a NumPy array for the intrinsic matrix, the following command is used: “K = numpy.array([[f_x, 0, c_x],[0, f_y, c_y],[0, 0, 1]])”.
With the NumPy library, it is easy to get the inverse matrix of the intrinsic matrix by using the “numpy.linalg.inv(K)” command.
The matrix multiplication is done by using the operator “@” that was introduced in Python 3.5, for example: “C = A @ B”.
The NavPy [24] library is used to provide the DCM matrices through the function “navpy.angle2dcm”, whose inputs are the three rotation angles (yaw, pitch and roll). The default rotation sequence is “ZYX”, which is the one used in this work. In addition, the documentation states that the DCM transforms a vector from the locally level coordinate frame (i.e., the NED frame) to the body frame. Therefore, in this work the transpose of the given DCM is used, which transforms the vector from the body frame to the NED frame.
Finally, in the last step, where $z_C$ is calculated, the index 2 is used to access the third element of a NumPy array, for example: “z_T = T_T[2]”.
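For reference, the complete chain of Algorithm 1 can be condensed into a single Python function, as sketched below. This is an illustrative sketch assuming NumPy only (the DCM is built directly from the $R_{YPR}$ expression of Section 2.2.2 instead of NavPy), with illustrative variable names; angles are in radians, translations are length-3 NumPy arrays, and the published scripts in the repository remain the reference implementation.

    import numpy as np

    def dcm_ypr(psi, theta, phi):
        """ZYX (yaw-pitch-roll) DCM, outer frame to body frame (Section 2.2.2)."""
        cps, sps = np.cos(psi), np.sin(psi)
        cth, sth = np.cos(theta), np.sin(theta)
        cph, sph = np.cos(phi), np.sin(phi)
        return np.array([
            [cps*cth,               sps*cth,               -sth],
            [cps*sth*sph - sps*cph, sps*sth*sph + cps*cph,  cth*sph],
            [cps*sth*cph + sps*sph, sps*sth*cph - cps*sph,  cth*cph],
        ])

    def direct_georeference(u, v, K, T_C_G, gimbal_ypr, T_G_UAS, uas_ypr,
                            T_UAS_NED, T_NED_ENU, z_ENU):
        """Direct georeferencing of one pixel following Algorithm 1."""
        # Steps 1-3: image frame -> camera frame (primed coordinates)
        P_C = np.linalg.inv(K) @ np.array([u, v, 1.0])
        # Steps 4-6: camera frame -> gimbal frame
        R_C_G = np.array([[0.0, 0.0, 1.0],
                          [1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]])
        P_G = R_C_G @ P_C + T_C_G
        # Steps 7-10: gimbal frame -> UAS frame (transpose: body -> outer frame)
        R_G_UAS = dcm_ypr(*gimbal_ypr).T
        P_UAS = R_G_UAS @ P_G + T_G_UAS
        # Steps 11-14: UAS frame -> NED frame
        R_UAS_NED = dcm_ypr(*uas_ypr).T
        P_NED = R_UAS_NED @ P_UAS + T_UAS_NED
        # Steps 15-17: NED frame -> ENU frame
        R_NED_ENU = np.array([[0.0, 1.0, 0.0],
                              [1.0, 0.0, 0.0],
                              [0.0, 0.0, -1.0]])
        P_ENU_prime = R_NED_ENU @ P_NED + T_NED_ENU
        # Step 18: total translation from the camera frame to the ENU frame
        T_C_ENU = (T_NED_ENU + R_NED_ENU @ T_UAS_NED
                   + R_NED_ENU @ R_UAS_NED @ T_G_UAS
                   + R_NED_ENU @ R_UAS_NED @ R_G_UAS @ T_C_G)
        # Steps 19-23: scale recovery and final ENU position
        z_C = (z_ENU - T_C_ENU[2]) / (P_ENU_prime[2] - T_C_ENU[2])
        return z_C * P_ENU_prime - z_C * T_C_ENU + T_C_ENU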

3. Results

To evaluate the proposed direct georeferencing method, two cases were explored. First, a target in a picture taken in a simulation environment was georeferenced. Second, a real setup was used to take a picture of a target with a camera mounted on a UAS, which was then georeferenced.

3.1. Simulation Environment Example with UE4 and AirSim

In this case, a simulation environment was created in the Unreal Engine 4 (UE4). The AirSim plugin was used to simulate the UAS camera and flight dynamics. The UAS model in the simulation environment can be seen in Figure 13.
The virtual camera was placed 20 cm below the drone and 30 cm to the front, as illustrated in Figure 14.
A target (white cross) was placed on the ground, in front of the door of a power house (Figure 15). To simulate the Universal Transverse Mercator (UTM) frame, a random location in the simulation environment was chosen to be the origin of the UTM frame. The target’s position in virtual UTM coordinates was then 8.5 m to the east, −8.0 m to the north and 0 m of height, as it is positioned on the ground.
The gimbal's attitude was configured to have a yaw ($\psi_G$) of −90 degrees (−π/2 radians) and a pitch ($\theta_G$) of −60 degrees (−π/3 radians) with respect to the world. The UAS was moved to a place where it could have a good view of the target and the surroundings to take the picture. The position of the drone ($T_{NED}^{ENU}$, in meters) was given by the simulated sensors.
The camera was modeled based on the FLIR Chameleon 3 [25] camera sensor and a lens with a 12.5 mm focal distance. The camera has a resolution of 2448 × 2048 pixels and a sensor width of 8.6 mm [25].
In addition to the pixel resolution, the AirSim plugin models the camera by its horizontal Field of View in degrees. The following equation was used to calculate the horizontal field of view ($FOV_{\text{horizontal}}$, in degrees) from the horizontal sensor size (SensorWidth, in millimeters) and the focal length ($f$, in millimeters):
$FOV_{\text{horizontal}} = 2\arctan\!\left(\dfrac{\text{SensorWidth}}{2f}\right).$
Therefore, for a sensor width of 8.6 mm and a focal length of 12.5 mm, the horizontal field of view is 37.97 degrees.
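This conversion is easy to verify (a minimal check assuming NumPy):

    import numpy as np

    sensor_width_mm, f_mm = 8.6, 12.5
    fov_horizontal = np.degrees(2 * np.arctan(sensor_width_mm / (2 * f_mm)))
    print(round(fov_horizontal, 2))   # 37.97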
The target (white cross) in Figure 15 is located on [ u , v ] = [ 1095 , 1099 ] pixels. The values of all parameters can be seen in Table 1.
The calculated georeferenced position of the target in the ENU frame ($P_{ENU}$) was [8.50283, −7.99841] m, which matches the target's true position when rounded to the second decimal digit, confirming that the georeferencing method is correct.

3.2. Practical Example with ROS

For the practical experiment, a ZED 2 camera [26] was mounted on the UAS pointing straight down without a gimbal. The Robot Operating System (ROS) was used to capture and save the data.
The camera's ROS wrapper provides the topic “image_rect_color”, from which the rectified image can be obtained, and the topic “camera_info”, from which the camera intrinsic matrix K for the rectified image can be obtained.
The camera system is equipped with an IMU and a magnetometer, which are used in this work to calculate the UAS' attitude. These sensors are mounted close to the camera sensor. The translation between the attitude sensors and the camera ($T_G^{UAS}$) and the orientation between the camera and the attitude sensors ($[\psi_G, \theta_G, \phi_G]$) are given by the manufacturer in the ROS wrapper through the topic “left_cam_imu_transform”.
The attitude is given in quaternion format in the topic “imu/data”. As Euler angles are used in this work, a conversion from quaternions to Euler angles is needed. However, in order to proceed with the georeferencing calculations, additional conversions are required, because the frame used in ROS has a different orientation than the UAS frame used in this work (Figure 16), which is the same one used by flight control units. Since the y and z axes of the ROS frame point in the opposite direction of the y and z axes of this work's UAS frame, the pitch angle obtained from the “imu/data” ROS topic needs to be inverted, and the yaw angle needs to be inverted and increased by 90 degrees (π/2 radians).
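A possible implementation of this conversion is sketched below (assuming NumPy; the quaternion-to-Euler formulas are the standard ZYX ones, and passing the roll angle through unchanged is an assumption, as the text only prescribes the pitch and yaw adjustments):

    import numpy as np

    def quaternion_to_euler(w, x, y, z):
        """Standard ZYX (yaw-pitch-roll) Euler angles from a unit quaternion [rad]."""
        yaw = np.arctan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
        pitch = np.arcsin(np.clip(2.0 * (w * y - z * x), -1.0, 1.0))
        roll = np.arctan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
        return yaw, pitch, roll

    def ros_to_uas_frame(yaw, pitch, roll):
        """Adjust ROS attitude angles to this work's UAS frame (see Figure 16)."""
        return -yaw + np.pi / 2, -pitch, roll   # invert yaw and add 90 deg; invert pitch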
The test setup was to take a picture (Figure 17) of the target object, which is a table of 80 × 60 × 85 cm.
The parameters considered for the direct georeferencing can be seen in Table 2. The translation from the camera to the gimbal frame ($T_C^G$) is zero because there is no gimbal installed in this setup. To keep the same implementation steps described in the previous sections, as no gimbal was installed, the gimbal angles ($[\psi_G, \theta_G, \phi_G]$) considered in the calculations are actually the angles of the camera attitude with respect to the IMU (UAS frame). The same is valid for the translation ($T_G^{UAS}$). The attitude of the IMU (UAS frame) is obtained by converting the quaternions from the ROS topic “imu/data” to Euler angles and making the required conversions from the frame used in ROS to the frame used in this work. The NED frame's origin is considered at the same place as the IMU (UAS frame), and the ENU horizontal origin is placed on the UAS. Finally, the vertical ENU origin was placed on the ground; therefore, the height of the UAS (the $z$ component of $T_{NED}^{ENU}$) was obtained from its relative height (height from take-off).
The strategy used to evaluate the method was to georeference the four corners of the table's top shelf. As can be seen in Figure 18, the blue squares show the corners' pixel coordinates and the green squares show the calculated georeferenced coordinates in the ENU frame. It is important to notice that in this case, as the corners of the top shelf are being georeferenced, the height of the object ($z_{ENU}$) is 0.85 m.
The calculated lengths from corner to corner can be seen in Table 3. This strategy was chosen to evaluate the algorithm because there was no way to precisely measure the position of the table during the experiments; therefore, the georeferenced positions presented in Figure 18 are relative to the UAS, as can be observed from the $T_{NED}^{ENU}$ translation vector in Table 2. The result was very satisfactory, as the calculated values match the dimensions of the shelf.
Finally, to evaluate the proposed solution’s performance when georeferencing targets at different heights, the distance between corners that are placed on top of each other was calculated and compared with the measured distance.
The bottom right corner of the top shelf is assumed to be at a height ($z_{ENU}$) of 0.85 m, and the bottom right corner of the bottom shelf at 0.35 m.
The georeferenced coordinates of both corners were calculated (Figure 19) and used to compute the distance between them. The calculated distance was 0.52 m, against a measured distance of 0.50 m. The resulting error of 2 cm is small and may be explained by the shelf's thickness or by deviations when manually selecting the pixels in the image.

4. Discussion

There are many different orientation frames in the process of direct georeferencing aerial images acquired by cameras mounted on UAS. First, the image frame, then the camera, gimbal, UAS (or IMU) and the geographic/world (NED and ENU) frames. In this work, in order to follow the literature and most common applications, two world frames were considered: the NED (North-East-Down); and ENU (East-North-Up) frames.
Although the camera, gimbal, UAS, NED and ENU frames were all presented in this work, not all applications will require the use of all of them. Therefore, it is important to apply the method with the changes needed in each case.
As seen in the practical case study, it is important to understand the formats of the attitude data given by the flight control units or ROS, in order to be able to convert them to the formats used in the direct georeferencing algorithm presented in this work. This is one of the reasons why this work is considered relevant: it explains the implementation steps so that readers can understand the reasons behind the equations and adapt the solution to their own use.
In addition, accurate measurements are essential for accurate direct georeferencing. Therefore, GNSS RTK solutions should be used, together with highly accurate Inertial Measurement Units (IMU), magnetometers and gimbals. Furthermore, platform mounting measurements between the origins of the frames of reference (lever arms) should be included in the calculations as translation vectors. This is often neglected in the UAS community, since the lever arms are short, in the range of centimeters. However, for high-resolution sensors with resolution in the order of millimeters, or for larger UAS or manned aircraft, the lever arms must be accounted for.
Another important aspect to consider is that, depending on the IMU orientation, there may be problems when computing the yaw angle when the camera is pointing down. A possible solution is to change the orientation of the z axis so that it is perpendicular to the world when the UAS is horizontal. A way to eliminate this issue entirely is to use quaternions in all steps of the direct georeferencing implementation, which could be interesting future work.

5. Conclusions

This work presented a comprehensive method for direct georeferencing of aerial images taken by monocular cameras mounted on UAS. To perform the calculations, the method uses data about the camera sensor, the relative position and orientation between the camera, gimbal, UAS and world frames, and position and attitude sensor data. Two case studies were performed to demonstrate the method's use: georeferencing a target in an image taken in a simulation environment, and georeferencing an object from an image taken in a real flight. For the image taken in the simulation environment, the georeferenced target position was precisely calculated. In the practical experiment, where the corners of a table were georeferenced, there were small inaccuracies, which were expected, as real-life setups involve small terrain deviations, sensor noise and other error sources. Therefore, the main goal of this work, which was to present a comprehensive direct georeferencing method and its implementation, was accomplished. This included the mathematical fundamentals and the assessment of the practical challenges explored in the case studies.

Author Contributions

Conceptualization, C.A.M.C. and F.A.A.A.; methodology, C.A.M.C. and F.A.A.A.; software, C.A.M.C., I.P.G. and F.A.A.A.; validation, C.A.M.C., A.S. and F.A.A.A.; formal analysis, F.A.A.A. and A.S.; investigation, C.A.M.C., F.A.A.A. and A.S.; writing—original draft preparation, C.A.M.C., A.S. and F.A.A.A.; writing—review and editing, M.F.P., D.B.H. and A.S.; visualization, D.B.H., M.F.P., F.A.A.A. and I.P.G.; supervision, D.B.H., A.G.M., F.A.A.A. and M.F.P.; project administration, F.A.A.A.; funding acquisition, M.F.P., D.B.H. and A.G.M. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to thank the following federal Brazilian agencies: CEFET-RJ, CAPES (code 001), CNPq and FAPERJ for the support to this work.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The MATLAB and Python codes, as well as the image data used in this work are openly available in https://github.com/professorfabioandrade/georef (accessed on 23 December 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. De Alcantara Andrade, F.A.; Sivertsen, A.H.; Moraes Correia, C.A.; Belbachir, N.; De Sousa, L.C.; Pereira Rufino, V.M.; Petrópolis, E.P.; Rodrigues e Silva, E.; Rinaldi Fortes Henriques, V.H. Virtual Reality Simulation of Autonomous Solar Plants Inspections with Unmanned Aerial Systems. In Proceedings of the 2021 Aerial Robotic Systems Physically Interacting with the Environment (AIRPHARO), Biograd na Moru, Croatia, 4–5 October 2021; pp. 1–8. [Google Scholar] [CrossRef]
  2. Rees, E.V. A Quick Introduction to Drone Photogrammetry. 2019. Available online: https://medium.com/@ericvanrees/a-quick-introduction-to-drone-photogrammetry-381a15478fdc (accessed on 24 December 2021).
  3. Rodin, C.D.; de Lima, L.N.; de Alcantara Andrade, F.A.; Haddad, D.B.; Johansen, T.A.; Storvold, R. Object Classification in Thermal Images using Convolutional Neural Networks for Search and Rescue Missions with Unmanned Aerial Systems. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8. [Google Scholar]
  4. Helgesen, H.H.; Bryne, T.H.; Wilthil, E.F.; Johansen, T.A. Camera-Based Tracking of Floating Objects using Fixed-wing UAVs. J. Intell. Robot. Syst. 2021, 102, 1–16. [Google Scholar] [CrossRef]
  5. Novak, M.U.S. Border Patrol Flew More Drone Missions Last Year Than Ever Before. Available online: https://gizmodo.com/u-s-border-patrol-flew-more-drone-missions-last-year-t-1829323612 (accessed on 24 November 2021).
  6. Sisk, R. Coast Guard Commandant Boosts Plan to Outfit Cutters with ScanEagle Drones. 2019. Available online: https://www.military.com/defensetech/2018/12/10/coast-guard-commandant-boosts-plan-outfit-cutters-scaneagle-drones.html (accessed on 24 November 2021).
  7. Honkavaara, E.; Saari, H.; Kaivosoja, J.; Pölönen, I.; Hakala, T.; Litkey, P.; Mäkynen, J.; Pesonen, L. Processing and assessment of spectrometric, stereoscopic imagery collected using a lightweight UAV spectral camera for precision agriculture. Remote Sens. 2013, 5, 5006–5039. [Google Scholar] [CrossRef] [Green Version]
  8. Luna, T. New technology making drones easier, more affordable. The Boston Globe, 8 December 2015. [Google Scholar]
  9. Anderson, J.D. Aircraft Performance & Design; McGraw-Hill Science Engineering: Berks, UK, 1999. [Google Scholar]
  10. Skaloud, J. Optimizing Georeferencing of Airborne Survey Systems by INS/DGPS; University of Calgary: Calgary, AB, Canada, 1999. [Google Scholar]
  11. Zhou, G.; Li, C.; Cheng, P. Unmanned aerial vehicle (UAV) real-time video registration for forest fire monitoring. In Proceedings of the 2005 IEEE International Geoscience and Remote Sensing Symposium, IGARSS ’05, Seoul, Korea, 29–29 July 2005; pp. 1803–1806. [Google Scholar]
  12. Hemerly, E.M. Automatic georeferencing of images acquired by UAV’s. Int. J. Autom. Comput. 2014, 11, 347–352. [Google Scholar] [CrossRef] [Green Version]
  13. Leira, F.S.; Trnka, K.; Fossen, T.I.; Johansen, T.A. A light-weight thermal camera payload with georeferencing capabilities for small fixed-wing UAVs. In Proceedings of the 2015 International Conference on Unmanned Aircraft Systems (ICUAS), Denver, CO, USA, 9–12 June 2015; pp. 485–494. [Google Scholar]
  14. Helgesen, H.H.; Leira, F.S.; Bryne, T.H.; Albrektsen, S.M.; Johansen, T.A. Real-time georeferencing of thermal images using small fixed-wing UAVs in maritime environments. ISPRS J. Photogramm. Remote Sens. 2019, 154, 84–97. [Google Scholar] [CrossRef]
  15. Xiang, H.; Tian, L. Method for automatic georeferencing aerial remote sensing (RS) images from an unmanned aerial vehicle (UAV) platform. Biosyst. Eng. 2011, 108, 104–113. [Google Scholar] [CrossRef]
  16. Barber, D.B.; Redding, J.; McLain, T.; Beard, R.; Taylor, C.N. Vision-based Target Geo-location using a Fixed-wing Miniature Air Vehicle. J. Intell. Robot. Syst. 2006, 47, 361–382. [Google Scholar] [CrossRef]
  17. Kobayashi, A. International Encyclopedia of Human Geography; Elsevier: Amsterdam, The Netherlands, 2019. [Google Scholar]
  18. Curtis, H.D. Orbital Mechanics for Engineering Students; Butterworth-Heinemann: Oxford, UK, 2013. [Google Scholar]
  19. Talay, T.A. Introduction to the Aerodynamics of Flight; Scientific and Technical Information Office, National Aeronautics and Space Administration (NASA): Washington, DC, USA, 1975; Volume 367. [Google Scholar]
  20. Hanning, T. High Precision Camera Calibration; Springer: New York, NY, USA, 2011. [Google Scholar]
  21. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  22. Li, M.; Lavest, J.M. Some aspects of zoom lens camera calibration. IEEE Trans. Pattern Anal. Mach. Intell. 1996, 18, 1105–1110. [Google Scholar]
  23. Oliphant, T.E. A Guide to NumPy; Trelgol Publishing: New York, NY, USA, 2006; Volume 1, Available online: https://ecs.wgtn.ac.nz/foswiki/pub/Support/ManualPagesAndDocumentation/numpybook.pdf (accessed on 23 December 2021).
  24. NavPy. Welcome to NavPy’s Documentation!— NavPy 0.1 Documentation. Available online: http://navpy.readthedocs.io (accessed on 6 September 2021).
  25. FLIR. Chameleon3 USB3 | Teledyne FLIR. Available online: https://www.flir.com.br/products/chameleon3-usb3/ (accessed on 23 December 2021).
  26. Labs, S. ZED 2. Available online: https://www.stereolabs.com/zed-2/ (accessed on 19 July 2021).
Figure 1. Angles between the $XYZ$ and $X'Y'Z'$ axes.
Figure 2. Airplane yaw, pitch and roll maneuvers.
Figure 3. Euler sequence: (a) Yaw; (b) Pitch; and (c) Roll.
Figure 4. Camera Model.
Figure 5. Pinhole Camera Model: (a) X axis view; and (b) Y axis view.
Figure 6. Equivalent Camera Model: (a) X axis view; and (b) Y axis view.
Figure 7. Equivalent Camera Model, in a more detailed perspective view, with the Image Frame oriented from the top left along the two axes U and V.
Figure 8. Camera and Gimbal frames.
Figure 9. UAS and Gimbal frames.
Figure 10. NED and UAS frames.
Figure 11. ENU and NED frames.
Figure 12. Object's height in the ENU frame ($z_{ENU}$ coordinate).
Figure 13. UAS in simulation environment.
Figure 14. Camera placement on UAS in the simulation application.
Figure 15. Picture acquired in the simulation environment.
Figure 16. ROS and georeferencing algorithm frames.
Figure 17. Aerial image taken by camera mounted on UAS.
Figure 18. Pixel and georeferenced positions of table’s top shelf’s four corners.
Figure 19. Georeferenced positions of bottom right corner of the top shelf and bottom right corner of the bottom shelf.
Table 1. Parameters used for the direct georeferencing of the aerial image acquired in the simulation environment.
Parameter | Value | Note
$P_i$ | [1095, 1099] |
$K$ | [3558.1395, 0, 1224; 0, 3558.1395, 1024; 0, 0, 1] |
$T_C^G$ | [0, 0, 0] | The camera and the gimbal frames have the same origin.
$[\psi_G, \theta_G, \phi_G]$ | [−π/2, −π/3, 0] |
$T_G^{UAS}$ | [0.3, 0, 0.2] |
$[\psi_{UAS}, \theta_{UAS}, \phi_{UAS}]$ | [0, 0, 0] | As the gimbal's attitude is with respect to the world, the UAS attitude is not considered.
$T_{UAS}^{NED}$ | [0, 0, 0] | The UAS and NED frames have the same origin.
$T_{NED}^{ENU}$ | [31.72212, 6.55099, 42.44889] |
$z_{ENU}$ | 0 |
Table 2. Parameters used for the direct georeferencing of the aerial image acquired in a real flight.
Parameter | Value | Note
$K$ | [1055.33422, 0, 1055.33423; 0, 990.06824, 544.24640; 0, 0, 1] |
$T_C^G$ | [0, 0, 0] |
$[\psi_G, \theta_G, \phi_G]$ | [0.00176, 0.00116, 0.00138] |
$T_G^{UAS}$ | [0.002, 0.023, 0.002] |
$[\psi_{UAS}, \theta_{UAS}, \phi_{UAS}]$ | [6.04630, 1.46643, 0.10614] |
$T_{UAS}^{NED}$ | [0, 0, 0] | The UAS and NED frames have the same origin.
$T_{NED}^{ENU}$ | [0, 0, 8.88] |
Table 3. Calculated length between shelf corners.
Corner pair | Length [m]
Top left to top right | 0.80
Top left to bottom left | 0.60
Top right to bottom right | 0.60
Bottom left to bottom right | 0.80
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
