Abstract

Artificial intelligence- (AI-) empowered machines are devised to mimic human actions. In the automotive industry, AI plays a significant role in the development of vehicular technology. AI is combined with mechatronics to help ensure the accurate execution of vehicle functions. Autonomous vehicles obtain scene information from onboard sensors such as laser, radar, lidar, and Global Positioning System (GPS) receivers, as well as from vehicular communication networks. The data obtained are then used by various path planning and control techniques to make the vehicles capable of driving autonomously in complex environments. Autonomous vehicles use state-of-the-art AI algorithms to localize themselves in known and unknown environments. AI algorithms are also exploited for perception, path planning, and motion control. A concise review of the state-of-the-art techniques for improving the performance of autonomous vehicles is presented.

1. Introduction

The world is progressing impressively in technology and automation with every passing day. This progress is leading to smart cities that interconnect Intelligent Home Area Networks (IHAN), Intelligent Industrial Area Networks (IIAN), Intelligent Vehicular Communication Networks (IVCN), and Smart Grids (SG). A key enabler of IVCN is the autonomous vehicle, which acts as an intelligent node in the Internet of Vehicles (IoV) and in Vehicle-to-Everything (V2X), Vehicle-to-Vehicle (V2V), and Vehicle-to-Infrastructure (V2I) communication. People started working on autonomous driving in 1920, and since then, many advancements have been introduced in this domain. However, even with a certain level of intelligence, the technology still needs human support. Current research is focused on making vehicles completely driverless, which means that no human intervention is required. Intelligent vehicles can move around independently using their own decision-making capabilities [1–3].

According to the Society of Automotive Engineers (SAE) [4], automated vehicles are categorized into six levels. The lowest level is level 0, in which the driver is responsible for all decisions, that is, there is no autonomy. The highest level is level 5, in which the vehicle alone is responsible for all driving tasks and decisions (full autonomy). These levels are presented in Figure 1.

Although many companies such as Uber, Google, and Tesla have invested heavily in the advancement of this technology, autonomous driving is still an active research area because of its significant challenges. A good autonomous system is one that can make correct decisions intelligently in real-time scenarios [5–8]. Researchers are still focusing on devising better algorithms for localization, perception, and detection.

The most important questions the autonomous vehicle technology is built upon are as follows:
(1) Where am I at the time?
(2) What is around me?
(3) What is going to happen next?
(4) What should be done?

The first question, “Where am I at the time?”, is the localization problem: the vehicle must be able to localize itself in the current environment. The second question concerns obtaining information about the surroundings, which is the task of perception. Predicting how the environment will evolve, based on what has been perceived, falls under the third question, “What is going to happen next?” Finally, the course of action to be taken by the vehicle is addressed by “What should be done?” All these fundamental questions are answered using different sensors and algorithms that make these cars reliable and safe to drive.

Autonomous vehicles sense the world by using various sensors mounted on the vehicle, as shown in Figure 2. The information received from these sensors is then used to make decisions, such as choosing the safest path to the destination while optimizing the time and distance required to reach it. Completing this task requires cutting-edge solutions for localization, object detection and identification, path planning, and fusion of the data received from different sensors.

With the availability of very powerful computational tools such as graphics processing units (GPUs) and very large amounts of data, a subset of artificial intelligence known as deep learning (DL) has gained enormous popularity for solving these problems and achieving optimal performance [10]. DL algorithms have improved the performance of autonomous vehicles (AVs) in terms of both accuracy and processing speed. In this paper, different AI technologies used in AVs are reviewed. In Section 2, the generic structure of AVs is discussed. Section 3 discusses the state-of-the-art techniques used for perception and localization. In Section 4, techniques used for path planning are discussed, and in Section 5, motion controllers are briefly discussed.

2. Autonomous Vehicle Decision-Making Architecture

Autonomous decision-making is required in AVs to process the observation data received from the sensors mounted on the vehicle. The car’s computer uses these observations to make optimal decisions. These decisions can be computed in two ways: by the integrated perceive-plan-act method or by end-to-end learning methods. In the end-to-end method, the information obtained from the sensors is mapped directly to control outputs without any intermediate steps. An AI-based AV is shown in Figure 3. As Figure 3 shows, each step of the perceive-plan-act method can be implemented either by classical methods with no learning or by the latest AI or DL techniques, whereas the end-to-end method always uses DL techniques. Learning and nonlearning methods can also be combined in various arrangements; for example, an object detector based on deep learning can provide the input to an A* algorithm used for path planning.

The integrated perceive-plan-act method has four components: perception and localization, path planning, behavioral mediation, and motion control. These components are discussed in turn in this paper.

3. Perception and Localization in AVs

Autonomous vehicles must be able to perceive the environment and be able to locate themselves in the environment correctly. This section reviews various techniques for perception and localization implemented in the literature.

3.1. Hardware for Sensing: Cameras or LiDAR

For a better understanding of the surroundings, 3D perception is usually preferred. Images taken by cameras capture only a 2D projection of the environment, so LiDAR sensors are generally used for 3D perception. A LiDAR’s performance is measured by its range, rotation/frame rate, field of view, and resolution. Velodyne LiDAR sensors, for example, provide a 360° field of view. Because autonomous vehicles cannot afford delays in reacting to their surroundings when driving at high speeds, a sensing range of at least 200 m is required.

The debate over cameras versus LiDAR is still a hot topic. For example, Tesla relies on its camera system for environment perception, while Waymo’s vehicle technology is based on LiDAR; Waymo makes use of three LiDAR sensors. Every sensing approach has its own strengths and weaknesses. LiDARs provide very high resolution and accurate environment perception but perform poorly in bad weather. Moreover, LiDAR technology is currently very expensive. Cameras, on the other hand, are cheap, but they have poor depth perception and also perform poorly in bad weather. In addition to LiDAR and cameras, ultrasonic sensors and RADAR are used to enhance the system’s perception capability.

3.2. Understanding the Driving Scene

The environments that autonomous vehicles work in are as follows:
(1) Multiagent
(2) Dynamic
(3) Unknown
(4) Stochastic
(5) Sequential
(6) Partially observable

All these features of the environment make autonomous driving extremely challenging. Cars must be able to handle every possible scenario, detecting all other agents in the environment, such as other vehicles and pedestrians, as well as drivable areas. The task becomes even more challenging in urban areas, where a wide variety of objects appear and occlusions are frequent.

Deep neural networks (DNNs) play a very important role in environment perception. Various DNN algorithms have been proposed for object detection in which objects are represented as 2D regions of interest [12–14]. In other studies, DNNs are used for environment perception based on pixel-wise segmentation of images [15], 3D bounding boxes in LiDAR data [16], and, in some cases, 3D representations of objects in combined LiDAR and camera data [17]. Image data are useful for object identification. However, because an image is a 2D projection of the scene, depth information is lost, which makes estimating the 3D positions of objects from images difficult. The two most popular methods of driving scene understanding are as follows:
(1) Semantic and instance segmentation
(2) Bounding-box object detectors
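The loss of depth in camera images can be made concrete with an ideal pinhole camera model. The sketch below assumes a distortion-free pinhole camera with hypothetical intrinsics (`f`, `cx`, `cy` are illustrative values, not taken from any particular sensor) and shows that two points on the same viewing ray, one 10 m away and one 40 m away, project to exactly the same pixel.

```python
def project(point, f=800.0, cx=640.0, cy=360.0):
    """Project a 3D point (X, Y, Z) in camera coordinates (metres) to a pixel (u, v).

    Assumes an ideal pinhole camera: focal length f in pixels and principal point
    (cx, cy); all three values are hypothetical.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division by Z is where the depth information is discarded.
    return (f * X / Z + cx, f * Y / Z + cy)


# Two points on the same ray through the camera centre, 10 m and 40 m away.
near = (1.0, 0.5, 10.0)
far = (4.0, 2.0, 40.0)
print(project(near), project(far))  # both map to the same pixel
```

Because the division by `Z` maps every point along a ray to the same pixel, recovering 3D positions from a single image requires extra cues, which is one reason LiDAR or combined LiDAR and camera data are used for 3D detection.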

Semantic and instance segmentation are of utmost importance for safe navigation and for understanding the surrounding environment. For this purpose, several studies using efficient deep learning-based frameworks have recently been reported in the literature. FSNeT, a failure detection framework, has been proposed for detecting pixel-level misclassifications in images [18]. In [19], a transformer-based knowledge distillation framework is proposed for efficient semantic segmentation of road driving scenes. A convolutional neural network method using multiscale attention has been proposed for instance segmentation [20].

3.3. Localization

Localization is the task of finding the vehicle’s pose (position and orientation) as it moves through the environment, and it is a fundamental requirement for navigation. It should be noted that some of the latest research in AVs [21, 22] proposes DL-based algorithms that do not need localization and mapping and instead produce end-to-end driving decisions directly from the sensor information. This is termed the behavior reflex approach [22].

GPS is the most commonly used source of localization in autonomous vehicles. GPS data are integrated with data from other sensors to compensate for signal loss during an outage. Various sensor fusion techniques exist in the literature; the most commonly used traditional methods are the Kalman filter, extended Kalman filter, unscented Kalman filter, particle filters, and multimodal Kalman filters [23–26]. A robust cooperative positioning (RCP) scheme [27] that augments GPS with ultra-wideband (UWB) has been proposed to obtain accurate positions.

However, the latest trends focus on visual localization using DL techniques. This method of localization is also called visual odometry (VO). Visual localization is achieved by matching key point landmarks in adjacent video frames. Given the current frame, the key points are fed as input to an n-point mapping algorithm to estimate the vehicle’s pose relative to the previous frame. The accuracy of visual odometry can be enhanced by deep learning algorithms, which can improve the precision of the key point detector. For example, a DNN has been trained to learn key point distractors in monocular VO [28]. The structure of the environment can also be mapped incrementally while computing the camera pose; this approach belongs to SLAM (simultaneous localization and mapping) [29].
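To illustrate how a Kalman filter bridges a GPS outage, here is a minimal one-dimensional sketch. It assumes the vehicle’s position is predicted from a hypothetical wheel-odometry speed and corrected by GPS fixes whenever they are available; the noise variances `q` and `r`, the time step, and the function name `kalman_fuse` are illustrative assumptions, not parameters from [23–26].

```python
def kalman_fuse(odometry_speeds, gps_positions, dt=0.1, q=0.05, r=4.0, x0=0.0, p0=1.0):
    """1D Kalman filter fusing odometry (prediction) with GPS (correction).

    odometry_speeds[k]: speed in m/s from a hypothetical wheel-odometry sensor.
    gps_positions[k]: GPS position in m, or None during a GPS outage.
    q: assumed process-noise variance per step; r: assumed GPS noise variance.
    Returns a list of (position_estimate, variance) after each step.
    """
    x, p = x0, p0
    history = []
    for v, z in zip(odometry_speeds, gps_positions):
        # Predict: dead-reckon with the odometry speed; uncertainty grows by q.
        x = x + v * dt
        p = p + q
        # Correct: if a GPS fix is available, blend it in using the Kalman gain.
        if z is not None:
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1.0 - k) * p
        history.append((x, p))
    return history


# Vehicle moving at 10 m/s: GPS available for 2 s, then a 1 s outage (None).
speeds = [10.0] * 30
gps = [1.0 * (k + 1) for k in range(20)] + [None] * 10
estimates = kalman_fuse(speeds, gps)
```

While GPS is available, the variance stays below the GPS noise variance `r`; during the outage, the filter keeps producing estimates from odometry alone, and its variance grows by `q` every step, which reflects the drift that fusion is meant to contain.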

SLAM is the process of building a map online while simultaneously localizing the vehicle within it. SLAM does not require a priori information about the environment. Because deep learning has brought enormous improvements to image classification and detection, deep learning approaches are being proposed to enhance traditional SLAM algorithms. Although deep learning applications in this field are not yet mature, some studies have proposed replacing classical SLAM blocks with deep learning modules to attain better accuracy and robustness.

To ensure safe navigation, AVs should also be able to predict the motion of objects in the surrounding environment. This 3D motion field of the scene is known as scene flow. Estimating scene flow from LiDAR data is a common approach in the literature, and current research proposes replacing this estimation with DL techniques that learn scene flow automatically.

Despite the progress reported in DL-based localization, classical key point matching techniques still dominate VO (visual odometry), mainly because of their computational efficiency and ease of deployment on embedded devices.
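The classical key point approach can be illustrated with a planar simplification. The sketch below assumes that key points have already been matched between two frames and expressed in metric 2D coordinates (for example, in a bird’s-eye view); it is a least-squares rigid alignment standing in for the n-point mapping step, not the specific algorithm used in the cited works, and `estimate_rigid_2d` is a hypothetical name.

```python
import math


def estimate_rigid_2d(prev_pts, curr_pts):
    """Least-squares rotation and translation mapping prev_pts onto curr_pts.

    prev_pts, curr_pts: equal-length lists of matched (x, y) key points.
    Returns (theta, tx, ty) such that curr ~= R(theta) @ prev + t.
    """
    n = len(prev_pts)
    if n < 2 or n != len(curr_pts):
        raise ValueError("need at least two matched point pairs")
    pcx = sum(p[0] for p in prev_pts) / n
    pcy = sum(p[1] for p in prev_pts) / n
    ccx = sum(c[0] for c in curr_pts) / n
    ccy = sum(c[1] for c in curr_pts) / n
    # Accumulate dot and cross products of the centred point pairs; the optimal
    # rotation angle is atan2(sum of cross products, sum of dot products).
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(prev_pts, curr_pts):
        ax, ay = px - pcx, py - pcy
        bx, by = qx - ccx, qy - ccy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    # The translation moves the rotated previous centroid onto the current centroid.
    c, s = math.cos(theta), math.sin(theta)
    return theta, ccx - (c * pcx - s * pcy), ccy - (s * pcx + c * pcy)


# Synthetic matches: rotate by 0.1 rad and shift by (0.5, -0.2).
prev = [(1.0, 2.0), (4.0, -1.0), (-2.0, 3.0), (0.5, 0.5)]
curr = [(math.cos(0.1) * x - math.sin(0.1) * y + 0.5,
         math.sin(0.1) * x + math.cos(0.1) * y - 0.2) for x, y in prev]
theta, tx, ty = estimate_rigid_2d(prev, curr)
```

The closed-form solution needs only sums over the matches, which hints at why such classical pipelines are cheap enough for embedded devices; real VO additionally has to reject wrong matches (outliers), typically with a robust estimator.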

3.4. Perception

Occupancy maps, also termed occupancy grids (OGs), are frequently used for perception. An occupancy grid represents the environment as a set of cells: the driving space is divided into cells, and the probability that each cell is occupied is calculated. The technique is well established in robotics and is now a viable solution for AVs as well.
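A common way to maintain the per-cell occupancy probability is a Bayesian update in log-odds form. The following sketch assumes a simple inverse sensor model in which a cell hit by a return is occupied with probability 0.7 and a cell the beam passed through is occupied with probability 0.4; these values, the dictionary-based grid, and the function names are hypothetical choices for illustration.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


L_OCC = logit(0.7)    # assumed inverse sensor model: cell produced a return
L_FREE = logit(0.4)   # assumed inverse sensor model: beam passed through the cell
L_PRIOR = logit(0.5)  # = 0.0, i.e. an unknown cell


def update_cell(log_odds, observation):
    """Bayesian log-odds update of one cell for an 'occupied' or 'free' observation."""
    delta = L_OCC if observation == "occupied" else L_FREE
    return log_odds + delta - L_PRIOR


def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


def integrate_scan(grid, observations):
    """grid: dict (row, col) -> log-odds; observations: list of ((row, col), label)."""
    for cell, label in observations:
        grid[cell] = update_cell(grid.get(cell, L_PRIOR), label)
    return grid


# Three scans agree that cell (2, 3) is occupied and cell (2, 2) is free.
grid = {}
for _ in range(3):
    integrate_scan(grid, [((2, 2), "free"), ((2, 3), "occupied")])
```

Working in log-odds turns repeated Bayesian updates into additions, and cells that were never observed keep the prior probability of 0.5.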

DL techniques are used to detect and track dynamic objects, to estimate the occupancy map around the vehicle probabilistically, and to derive the driving scene context. For scene context, deep learning is used to label the environment as highway driving, intercity driving, or a parking area. Deep learning also plays a vital role in OG estimation, where it helps extract from LiDAR data and images the information required to populate the grid cells. A multitask recurrent neural network has been proposed to predict grid maps [30] that provide semantic information, occupancy, velocity estimates, and the drivable area.

4. Path Planning

Once an AV is able to localize itself in the environment, the next step is path planning. Path planning is the ability of an autonomous vehicle to find the optimal path between its start position and its destination (desired location) while considering the vehicle’s kinematic and dynamic models. The path planning process should enable the autonomous vehicle to calculate an optimal, collision-free trajectory that accounts for all obstacles it might encounter in the surrounding environment. As mentioned earlier, autonomous driving is a multiagent problem; therefore, according to [31], the host vehicle must be able to negotiate effectively with all other road users while performing any action, such as taking a turn or changing lanes. Mission planning is defined as the full pursuit of the path generated by path planning.

Path planning also includes mission planning, motion planning, and behavior planning. Every time the vehicle drives, a huge amount of data, also termed big data, is stored on the server, and AVs can use the information in these previously stored data to make correct decisions in the future. Route finding is complicated by the obstacles that cross the vehicle’s path: the AV must be capable of both identifying and avoiding these obstacles, which makes the planning algorithm’s task harder. The AV must know exactly what to do in a specific driving environment or situation. For example, a vehicle driving on the road should follow the sequence of waypoints designed by the planning algorithm, as shown in Figure 4.

Path planning has been studied for many years and is often divided into two categories: global and local path planning. Path planning techniques are commonly divided into four groups: graph search, interpolation, numerical optimization, and sampling. The most common motion planning techniques used in autonomous vehicles are described below. Figures 5, 6, and 7 show the various techniques as presented in the literature.

4.1. Graph Search-Based Planning Techniques

Graph search-based planning techniques work on the basic idea of traversing the state space from a source point A to a goal point B. The state space describes where the objects in the dynamic environment are and is usually represented as a lattice or an occupancy grid. Graph search algorithms visit the states in the occupancy grid and return an optimal or nonoptimal solution if one exists, or no solution if none exists. The most common search algorithms implemented for autonomous vehicle path planning are described below.

4.1.1. Dijkstra Algorithm

Dijkstra’s algorithm is a graph search algorithm that finds the shortest path in a grid or a graph of nodes. It works well for global path planning in both structured and unstructured environments. In [33], the authors describe the basic algorithm and how to implement it, and the algorithm has been implemented in multivehicle simulations in [34]. Despite its advantages, a large number of nodes must be traversed in vast areas, which makes the algorithm slow. Moreover, the algorithm does not use a heuristic function to reduce the search cost. The path obtained is not continuous, so it is not suitable for real-time scenarios. Figure 6 shows different planning algorithms as they are presented in the literature.
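A minimal sketch of Dijkstra’s algorithm on an occupancy grid is shown below. It assumes a 4-connected grid in which 0 marks a free cell and 1 an occupied cell, with a uniform cost of 1 per move; the grid and function name are illustrative.

```python
import heapq


def dijkstra_grid(grid, start, goal):
    """Shortest 4-connected path on an occupancy grid (0 = free, 1 = occupied).

    Returns (cost, path) or (inf, []) if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0.0}
    parent = {start: None}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            # Walk the parent links back to the start to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return d, path[::-1]
        if d > dist[node]:
            continue  # stale queue entry; a shorter route was already found
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1.0  # uniform cost between adjacent cells
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = node
                    heapq.heappush(queue, (nd, (nr, nc)))
    return float("inf"), []


grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
cost, path = dijkstra_grid(grid, (0, 0), (2, 0))
```

The search expands nodes in order of their distance from the start, without any sense of where the goal lies, which is why it visits many nodes in large areas. The returned path is also a sequence of grid cells with sharp corners, matching the continuity limitation noted above.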

4.1.2. A-Star Algorithm

A-star (A*) is an extension of the Dijkstra algorithm that uses a heuristic to guide the search, which preserves optimality (provided the heuristic never overestimates the remaining cost) and speeds up node search, reducing the computation time [35–37]. Its advantage comes from how it weights nodes: the cost of each node combines the cost accumulated from the start with the heuristic estimate of the cost to the goal. It is costly in terms of speed and memory when searching large areas but is very suitable for searching spaces that are largely known to the vehicle beforehand. Various modified versions of A-star are used in mobile applications, such as dynamic A* (D*) and anytime repairing A* (ARA*) [38]. A-star with Voronoi cost functions has been implemented for path planning in unstructured spaces and parking lots [39]. Boss, the winner of the DARPA Urban Challenge, used the algorithm [40]. Despite its advantages, the path found by the A-star algorithm is not continuous. Moreover, finding a suitable heuristic can sometimes be very complex.
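The node weighting described above can be sketched on the same kind of occupancy grid. This minimal example assumes 4-connected moves with unit cost and uses the Manhattan distance as the heuristic, which never overestimates the remaining cost in this setting; the grid and function name are illustrative.

```python
import heapq


def astar_grid(grid, start, goal):
    """A* on a 4-connected occupancy grid (0 = free, 1 = occupied).

    Each queue entry is ordered by f = g + h, where g is the cost from the start
    and h is the Manhattan-distance estimate of the cost to the goal.
    Returns (cost, path) or (None, []) if the goal is unreachable.
    """
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    g = {start: 0}
    parent = {start: None}
    open_heap = [(h(start), 0, start)]
    closed = set()
    while open_heap:
        _, g_node, node = heapq.heappop(open_heap)
        if node in closed:
            continue  # already expanded with an equal or lower cost
        closed.add(node)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return g_node, path[::-1]
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nb
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g_node + 1
                if ng < g.get(nb, float("inf")):
                    g[nb] = ng
                    parent[nb] = node
                    heapq.heappush(open_heap, (ng + h(nb), ng, nb))
    return None, []


grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
cost, path = astar_grid(grid, (0, 0), (2, 0))
```

Setting `h` to return 0 turns this back into Dijkstra’s algorithm, which makes the relationship between the two methods explicit.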

4.1.3. State Lattice Algorithm

The state lattice algorithm uses spatiotemporal lattices that include velocity and time dimensions [41, 42]. Depending on the complexity of the maneuver, the environment is decomposed into a local grid, which makes the algorithm suitable for dynamic environments and local planning. However, the algorithm has to evaluate every feasible solution in its database, which makes it computationally expensive.
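The lattice idea can be illustrated with a deliberately simplified, purely spatial version. The sketch assumes longitudinal stations along the road, a few lateral offsets per station, a hypothetical set of motion primitives (move to the next station while shifting by at most one lateral step), and a cost that penalizes lateral changes and distance from the lane centre. It omits the velocity and time dimensions of a full spatiotemporal lattice, and all names and weights are illustrative.

```python
def lattice_plan(num_stations, offsets, blocked, w_change=1.0, w_center=1.0):
    """Minimum-cost path through a simple spatial lattice, by dynamic programming.

    offsets: lateral positions in m (must include 0.0, the lane centre).
    blocked: set of (station, offset) nodes occupied by obstacles.
    Returns the chosen lateral offset at each station, or [] if none exists.
    """
    INF = float("inf")
    n = len(offsets)
    centre = offsets.index(0.0)
    cost = [[INF] * n for _ in range(num_stations)]
    back = [[None] * n for _ in range(num_stations)]
    cost[0][centre] = 0.0  # start at the lane centre
    for s in range(1, num_stations):
        for j in range(n):
            if (s, offsets[j]) in blocked:
                continue
            # Hypothetical motion primitives: keep the offset or shift by one step.
            for i in (j - 1, j, j + 1):
                if 0 <= i < n and cost[s - 1][i] < INF:
                    c = (cost[s - 1][i]
                         + w_change * abs(offsets[j] - offsets[i])
                         + w_center * abs(offsets[j]))
                    if c < cost[s][j]:
                        cost[s][j], back[s][j] = c, i
    j = min(range(n), key=lambda k: cost[-1][k])
    if cost[-1][j] == INF:
        return []
    plan = []
    for s in range(num_stations - 1, -1, -1):
        plan.append(offsets[j])
        j = back[s][j]
    return plan[::-1]


# Three lateral positions (two lane widths apart), an obstacle at station 3 in the centre.
plan = lattice_plan(6, [-3.5, 0.0, 3.5], blocked={(3, 0.0)})
```

Even in this reduced form, every node at every station is evaluated, which illustrates why evaluating all feasible solutions makes full state lattices computationally expensive.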

4.2. Sampling-Based Planning Techniques

Sampling-based techniques randomly sample the state space or configuration space and search for connectivity within it [46]. They aim to satisfy timing constraints when planning in high-dimensional spaces; however, they generally produce suboptimal solutions. The most commonly used sampling-based techniques are the Rapidly-Exploring Random Tree (RRT) and the Probabilistic Roadmap Method (PRM). Both are probabilistically complete, and RRT is much faster than PRM. RRT is used for online path planning: it executes a random search through the navigation space, which allows it to plan quickly in semistructured spaces. In autonomous vehicles, the MIT team used the algorithm in the DARPA Urban Challenge [47]. However, the resulting path is jerky, noncontinuous, and suboptimal. A modified version of the algorithm, named RRT*, is discussed in [48]; the solution it generates is asymptotically optimal, although at the cost of computational efficiency.
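A minimal sketch of basic RRT in a 2D continuous space follows. It assumes a hypothetical collision checker `is_free`, a small goal bias, and a fixed steering step; for brevity, it checks only the new node for collisions, not the whole edge, which is safe here only because the step is shorter than the obstacle is thick. RRT* (rewiring for optimality) is not shown.

```python
import math
import random


def rrt(start, goal, is_free, bounds, step=0.5, goal_tol=0.5, max_iters=10000, seed=0):
    """Basic RRT: grow a tree from start by random sampling until it reaches goal.

    is_free(p): hypothetical collision checker, True if point p is collision-free.
    bounds: ((xmin, xmax), (ymin, ymax)) sampling region.
    Returns a list of waypoints from start to near the goal, or [] on failure.
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample a random point, occasionally the goal itself (goal bias).
        if rng.random() < 0.05:
            sample = goal
        else:
            sample = (rng.uniform(*bounds[0]), rng.uniform(*bounds[1]))
        # Extend the nearest tree node by at most `step` towards the sample.
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        t = min(1.0, step / d)
        new = (near[0] + t * (sample[0] - near[0]), near[1] + t * (sample[1] - near[1]))
        if not is_free(new):  # simplification: only the new node is checked
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return []


def free(p):
    # Hypothetical obstacle: a 1 m thick wall at 4 <= x <= 5 spanning y <= 7.
    return not (4.0 <= p[0] <= 5.0 and p[1] <= 7.0)


path = rrt((1.0, 1.0), (9.0, 1.0), free, ((0.0, 10.0), (0.0, 10.0)))
```

The returned waypoints zigzag because each extension heads towards a random sample, which is the jerky, suboptimal path quality mentioned above and the reason RRT output is usually smoothed or refined afterwards.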

4.3. Interpolating Curve Planning Techniques

Interpolation is the generation of new data points within the range of a set of known data points (reference points). Interpolating curve planners take previously known waypoints that describe a global roadmap and generate new data points between them. The generated points form a smooth, continuous trajectory that accounts for the dynamic environment in which the AV moves as well as for the AV’s constraints [51]. If an obstacle appears during path execution, the planner generates a new set of data points to avoid it and then continues on the previously planned path. Different techniques are used for curve generation and path smoothing, some of which are reviewed below.

4.3.1. Lines and Circles

Segments of road networks can be represented by interpolating known waypoints with straight lines and circular arcs. This approach is computationally inexpensive and easy to implement, and it yields the shortest path for car-type vehicles [52]. On the downside, the curvature jumps where line and arc segments meet, so the path is jerky and the transitions between segments are uncomfortable. The approach also needs global waypoints.

4.3.2. Clothoid

In this technique, curvature changes linearly with arc length, which is used to build transitions into and out of curves [53]. Such curves are used in road and highway design, and the technique is suitable for local path planning. On the downside, although the generated path is continuous, it is not smooth because the curvature varies only linearly. It is also computationally expensive because the curve is defined by integrals (the Fresnel integrals) that must be evaluated numerically. It also needs global waypoints for path planning.
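The integrals that define a clothoid can be made explicit with a short numerical sketch. It assumes curvature κ(s) = κ0 + c·s, so the heading is θ(s) = κ0·s + c·s²/2, and approximates x(s) = ∫cos θ ds and y(s) = ∫sin θ ds with the midpoint rule; the sample count and example values are illustrative.

```python
import math


def clothoid_points(kappa0, sharpness, length, n=1000):
    """Sample a clothoid with curvature kappa(s) = kappa0 + sharpness * s.

    Heading: theta(s) = kappa0 * s + sharpness * s**2 / 2.
    Position: x(s) = integral of cos(theta), y(s) = integral of sin(theta),
    approximated here with the midpoint rule over n steps.
    """
    ds = length / n
    x = y = 0.0
    pts = [(x, y)]
    for i in range(n):
        s_mid = (i + 0.5) * ds
        theta = kappa0 * s_mid + 0.5 * sharpness * s_mid ** 2
        x += math.cos(theta) * ds
        y += math.sin(theta) * ds
        pts.append((x, y))
    return pts


# Transition from a straight road into a curve of radius 50 m over 50 m of arc.
transition = clothoid_points(kappa0=0.0, sharpness=(1.0 / 50.0) / 50.0, length=50.0)
```

Because there is no elementary closed form for these integrals, every evaluation requires numerical integration (or series approximations), which is the computational cost referred to above.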

4.3.3. Polynomial

Polynomial curves are commonly used to satisfy the constraints at the points being interpolated [54], such as position, heading angle, and curvature. The coefficients of the curve are determined by the constraints at the beginning and end of the segment or by desired values. This method of interpolation is computationally inexpensive and yields comfortable paths. On the downside, for curves of degree four or higher, computing the coefficients becomes difficult.
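As a worked example of determining coefficients from boundary constraints, the sketch below fits a cubic y(x) = a0 + a1·x + a2·x² + a3·x³ on [0, L] to a given lateral position and slope (heading) at both ends, for instance for a lane change. Matching curvature at both ends as well would require a quintic; the lane-change numbers are illustrative.

```python
def cubic_coefficients(y0, dy0, y1, dy1, length):
    """Coefficients (a0, a1, a2, a3) of y(x) = a0 + a1 x + a2 x^2 + a3 x^3 on [0, length]
    that match position and slope at both ends:
    y(0) = y0, y'(0) = dy0, y(length) = y1, y'(length) = dy1.
    """
    L = float(length)
    a0 = y0
    a1 = dy0
    a2 = (3.0 * (y1 - y0) / L - 2.0 * dy0 - dy1) / L
    a3 = (2.0 * (y0 - y1) / L + dy0 + dy1) / (L * L)
    return a0, a1, a2, a3


def evaluate(coeffs, x):
    a0, a1, a2, a3 = coeffs
    return a0 + a1 * x + a2 * x ** 2 + a3 * x ** 3


# Lane change: shift 3.5 m sideways over 40 m, driving parallel to the lane at both ends.
lane_change = cubic_coefficients(0.0, 0.0, 3.5, 0.0, 40.0)
```

With four constraints and four unknowns, the coefficients follow in closed form; at higher degrees (and with more constraints), the corresponding linear system grows, which is where the difficulty noted above arises.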

4.3.4. Bézier

Bézier curves are parametric curves defined by a set of control points and expressed in terms of Bernstein polynomials. Their advantages are low computational cost and intuitive manipulation of the curve through its control points [55]. Curves can also be concatenated continuously, which makes them suitable for comfortable paths. However, as the degree of the curve increases, more control points must be placed and evaluated, which increases computation time. The approach also depends on global waypoints.
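A Bézier curve can be evaluated with de Casteljau’s algorithm, which repeatedly interpolates between neighbouring control points; it produces the same point as the Bernstein-polynomial form. The control points below describe a hypothetical 3.5 m lane change over 40 m.

```python
def bezier_point(control_points, t):
    """Evaluate a Bézier curve at parameter t in [0, 1] with de Casteljau's algorithm.

    Each pass replaces the list of points by the linear interpolations between
    neighbouring points; after (degree) passes a single point remains.
    """
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])
               for a, b in zip(pts, pts[1:])]
    return pts[0]


# Cubic Bézier lane change: leave and rejoin the lanes with zero heading angle.
control = [(0.0, 0.0), (20.0, 0.0), (20.0, 3.5), (40.0, 3.5)]
samples = [bezier_point(control, k / 10) for k in range(11)]
```

Moving one control point reshapes the curve locally and predictably, which is the intuitive manipulation mentioned above; each extra degree adds one control point and one more interpolation pass per evaluation.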

4.3.5. Spline

A spline is a piecewise curve whose segments are polynomials, clothoids, or B-splines. The junction between two subsegments is called a knot, and a high degree of smoothness is enforced at each knot so that the spline pieces join smoothly [56].
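The role of the knots can be shown with a natural cubic spline, one common choice among the spline types listed above. In this minimal sketch, every interval is a cubic, and at each interior knot the neighbouring pieces share value, slope, and curvature; "natural" means the second derivative is zero at both ends. The knot values are illustrative.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs[i], ys[i]).

    xs must be strictly increasing. The second derivatives m[i] at the knots solve a
    tridiagonal system (Thomas algorithm), with m[0] = m[n] = 0.
    """
    n = len(xs) - 1
    if n < 1:
        raise ValueError("need at least two knots")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)
    if n > 1:
        a = [h[i - 1] for i in range(1, n)]                 # sub-diagonal
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]  # diagonal
        c = [h[i] for i in range(1, n)]                     # super-diagonal
        d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
             for i in range(1, n)]
        for k in range(1, n - 1):  # forward elimination
            w = a[k] / b[k - 1]
            b[k] -= w * c[k - 1]
            d[k] -= w * d[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = d[-1] / b[-1]
        for k in range(n - 3, -1, -1):  # back substitution
            sol[k] = (d[k] - c[k] * sol[k + 1]) / b[k]
        m[1:n] = sol

    def spline(x):
        # Locate the interval [xs[i], xs[i+1]] containing x (clamped to the ends).
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        hi, dx0, dx1 = h[i], x - xs[i], xs[i + 1] - x
        return (m[i] * dx1 ** 3 / (6 * hi) + m[i + 1] * dx0 ** 3 / (6 * hi)
                + (ys[i] / hi - m[i] * hi / 6) * dx1
                + (ys[i + 1] / hi - m[i + 1] * hi / 6) * dx0)

    return spline


# Lateral offsets (m) at waypoints every 10 m along the road.
knots_x = [0.0, 10.0, 20.0, 30.0]
knots_y = [0.0, 1.0, 3.0, 3.5]
path = natural_cubic_spline(knots_x, knots_y)
```

Requiring equal second derivatives at every knot couples all pieces through one tridiagonal system, so the curve stays smooth across knots while each piece remains a simple cubic.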

4.4. Numerical Optimization Techniques

In path planning, numerical optimization methods are most often used to smooth already computed paths or trajectories, as in [57]. The most commonly used technique is function optimization, which searches for the real-valued variables that minimize a cost function. Using this technique, a plan can be generated that takes into account the ego vehicle’s limitations, road constraints, and other road users. On the downside, the optimization must be repeated at every motion state, so it has to be terminated within a given time. This planning technique also depends on global waypoints.
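A minimal sketch of function optimization for path smoothing is shown below. It minimizes a quadratic cost that keeps each point close to the original path (weight `w_data`) while penalizing the squared length of each segment (weight `w_smooth`), using plain gradient descent with the endpoints fixed. The cost, weights, and iteration count are illustrative assumptions; a real planner would add vehicle and road constraints as described above.

```python
def smooth_path(path, w_data=0.5, w_smooth=0.1, iters=500):
    """Smooth a polyline by gradient descent on
        J(y) = w_data/2 * sum |y_i - x_i|^2 + w_smooth/2 * sum |y_{i+1} - y_i|^2,
    keeping the first and last points fixed (x is the original path).
    """
    x = [list(p) for p in path]
    y = [list(p) for p in path]
    for _ in range(iters):
        prev = [p[:] for p in y]
        for i in range(1, len(y) - 1):
            for d in range(len(y[i])):
                # Partial derivative of J with respect to coordinate d of point i.
                grad = (w_data * (prev[i][d] - x[i][d])
                        + w_smooth * (2 * prev[i][d] - prev[i - 1][d] - prev[i + 1][d]))
                y[i][d] = prev[i][d] - grad  # unit step size, stable for these weights
    return [tuple(p) for p in y]


# A staircase path such as a grid-based planner might return.
staircase = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]
smoothed = smooth_path(staircase)
```

Stopping after a fixed number of iterations mirrors the need to terminate the optimization within a time budget; the result is simply the best solution found so far.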

4.5. Deep Learning-Based Techniques

Recent research shows increasing interest in applying DL techniques to path planning. The two most discussed DL approaches are imitation learning and planning based on reinforcement learning. The fundamental task of imitation learning (IL) [58] is to imitate the human driver’s behavior: the driver’s behavior is recorded as big data, and a convolutional neural network (CNN) is then trained so that the vehicle learns how to plan by imitation. A closely related approach is inverse reinforcement learning [59, 60], which uses the human driver’s behavior to learn a reward function and then generates driving trajectories that maximize it, just as humans do. Deep reinforcement learning (DRL) is also used to plan paths. In this method, the agent learns driving trajectories in a simulator environment [61], and the real environment model is transformed into a virtual one on the basis of a transfer model. Both methods have their own advantages and disadvantages. IL has the advantage of being trained on real-world data, but because data on corner cases (e.g., driving off the lanes) are rare, the trained network may make errors in unseen scenarios. DRL, on the other hand, performs well in simulation, but its performance degrades in real-world scenarios.

Although the use of deep learning-based techniques for perception, localization, path planning, and control is receiving much attention, it has also raised concerns about transparency and accountability in autonomous vehicles because of the black-box nature of deep neural networks. To build trust in these deep frameworks, explainable AI (xAI) has gained researchers’ interest in recent years. Explanations generated in numerical or textual form, or as heat/saliency maps (visual form), provide insights into the decision-making process of autonomous vehicles.
Various approaches are used to produce these explanations. For example, an imitation learning- (IL-) based agent equipped with an attention model has been proposed [62]; the attention model helps identify which image regions are considered important in the decision-making process.
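Imitation learning in its simplest form, often called behaviour cloning, treats recorded driving as a supervised regression problem. The sketch below is a deliberately reduced stand-in for the CNN-based approach described above: it assumes two hand-picked features (lateral offset and heading error) instead of images, a linear policy instead of a CNN, and synthetic "demonstrations" generated by a hypothetical driver policy. The function name and all numbers are illustrative.

```python
import random


def behaviour_cloning(demos, lr=0.1, epochs=500):
    """Fit a linear policy steering = w . features + b to recorded (features, steering)
    pairs by batch gradient descent on the mean squared error."""
    dim = len(demos[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(demos)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for feats, target in demos:
            err = sum(wi * fi for wi, fi in zip(w, feats)) + b - target
            for k in range(dim):
                gw[k] += 2.0 * err * feats[k] / n
            gb += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


# Synthetic demonstrations from a hypothetical human policy:
# steering = -0.5 * lateral_offset - 1.2 * heading_error.
rng = random.Random(0)
demos = []
for _ in range(200):
    offset, heading = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    demos.append(((offset, heading), -0.5 * offset - 1.2 * heading))
w, b = behaviour_cloning(demos)
```

The learned policy is only as good as the coverage of the demonstrations: inputs far outside the recorded range (the corner cases mentioned above) are extrapolated without any guarantee of correct behaviour.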

5. Motion Controllers/Act

The motion controller is responsible for calculating the longitudinal (throttle and brake) and lateral (steering) control commands. Learning algorithms can be used either as one component of the motion controller or as a complete end-to-end controller that generates the control commands directly from sensory data. Traditional controllers are based on a model with fixed parameters, whereas learning controllers use training data to learn their models over time; the more information gathered, the more accurate the system model becomes. Commonly used learning controllers are iterative learning control (ILC) [63] and model predictive control (MPC) [64]. ILC works efficiently for systems that operate in a repetitive mode, e.g., tracking a defined trajectory in autonomous vehicles. MPC finds appropriate control actions by solving an optimization problem, and it can account for predicted disturbances and uncertainties in the system, leading to optimal solutions. Training data are mostly available in the form of the vehicle’s past states and observations. A CNN can be trained on these data to estimate a dense occupancy grid map, which is then passed to the cost function of MPC to find the optimal trajectory for the vehicle over a finite horizon. Learning controllers gain their greatest advantage by combining model-based control with learning algorithms.

Deep learning-based techniques have gained much importance in the motion control of autonomous vehicles [30, 65]. For example, a visual attention model has been used to train an end-to-end (images to control commands) convolutional neural network model [66]. The attention model identifies the image regions that influence the network’s output, and an attention-based video-to-text model is used to generate textual explanations.
Finally, the controller’s attention map and the explanations are aligned to ground the explanations in the image regions that mattered to the controller. In most existing work on autonomous driving, the three main modules of autonomous vehicles, i.e., sensing, decision-making, and motion control, have been studied separately. However, the power of DNNs can also be exploited to optimize sensing, decision-making, and motion control jointly [67].
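The receding-horizon principle behind MPC, as described earlier in this section, can be illustrated with a small longitudinal example. The sketch assumes a hypothetical point-mass speed model v[k+1] = v[k] + a[k]·dt, a discrete set of admissible accelerations (which acts as the input constraint), a short horizon, and a quadratic cost on speed error and acceleration. It solves each optimization by brute-force enumeration, which is feasible only because the problem is tiny; practical MPC uses numerical solvers.

```python
import itertools


def mpc_speed_control(v0, v_ref, steps, dt=0.5, horizon=3,
                      accels=(-2.0, -1.0, 0.0, 1.0, 2.0), w_accel=0.1):
    """Receding-horizon speed control for the model v[k+1] = v[k] + a[k] * dt.

    At each step, every acceleration sequence over the horizon is simulated with the
    model; the sequence minimising sum((v - ref)^2) + w_accel * sum(a^2) is chosen,
    and only its first acceleration is applied before re-planning.
    v_ref(k) gives the reference speed at time step k.
    """
    v = v0
    speeds = [v]
    for k in range(steps):
        best_cost, best_first = float("inf"), 0.0
        for seq in itertools.product(accels, repeat=horizon):
            cost, vp = 0.0, v
            for j, a in enumerate(seq):
                vp = vp + a * dt  # predicted speed at step k + j + 1
                cost += (vp - v_ref(k + j + 1)) ** 2 + w_accel * a * a
            if cost < best_cost:
                best_cost, best_first = cost, seq[0]
        v = v + best_first * dt  # apply only the first control, then re-plan
        speeds.append(v)
    return speeds


def reference(k):
    # Hypothetical profile: cruise at 10 m/s, then slow to 5 m/s from step 20.
    return 10.0 if k < 20 else 5.0


speeds = mpc_speed_control(0.0, reference, steps=40)
```

Because the predicted horizon already contains the upcoming reference change, the controller starts slowing down a few steps before step 20; this use of predictions is what distinguishes MPC from a purely reactive controller.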

6. Conclusion

The development of intelligent and efficient algorithms for the safe operation of AVs is one of the key issues in vehicle design. This work presents a complete layout of an autonomous vehicle and surveys various state-of-the-art AI algorithms used by AVs to achieve the best possible solutions to the problems of perception, localization, path planning, and motion control. Although the field of AVs is vast and involves a wide variety of challenges, the very difficulty of the problem creates endless research opportunities in this field.

Conflicts of Interest

The authors declare that they have no conflicts of interest.