Article

Practical Evaluation of Lithium-Ion Battery State-of-Charge Estimation Using Time-Series Machine Learning for Electric Vehicles

1 School of Mechanical, Medical and Process Engineering (MMPE), Queensland University of Technology (QUT), Brisbane, QLD 4000, Australia
2 School of Electrical Engineering & Robotics, Queensland University of Technology (QUT), Brisbane, QLD 4000, Australia
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Energies 2023, 16(4), 1628; https://doi.org/10.3390/en16041628
Submission received: 23 December 2022 / Revised: 24 January 2023 / Accepted: 27 January 2023 / Published: 6 February 2023
(This article belongs to the Special Issue Computational Intelligence in Electrical Systems)

Abstract:
This paper presents a practical usability investigation of recurrent neural networks (RNNs) to determine the best-suited machine learning method for estimating electric vehicle (EV) batteries’ state of charge. Using models from multiple published sources and cross-validation testing with several driving scenarios to determine the state of charge of lithium-ion batteries, we assessed their accuracy and drawbacks. Five models were selected from various published state-of-charge estimation models, based on cell types with GRU or LSTM, and optimisers such as stochastic gradient descent, Adam, Nadam, AdaMax, and Robust Adam, with extensions via momentum calculus or an attention layer. Each method was examined by applying training techniques such as a learning rate scheduler or rollback recovery to speed up the fitting, highlighting the implementation specifics. All this was carried out using the TensorFlow framework, and the implementation was performed as closely to the published sources as possible on openly available battery data. The results highlighted an average percentage accuracy of 96.56% for the correct SoC estimation and several drawbacks of the overall implementation, and we propose potential solutions for further improvement. Every implemented model had a similar drawback, which was the poor capturing of the middle area of charge, applying a higher weight to the voltage than the current. The combination of these techniques into a single custom model could result in a better-suited model, further improving the accuracy.

1. Introduction

The market for electric vehicles (EVs) has grown significantly in recent decades [1]. The replacement of fossil-fuel-based engines with electric drivetrains eliminates exhaust emissions, with the potential to significantly reduce the human impact on climate change. To increase the market share and reduce the costs of EVs, the batteries’ cost and longevity must be improved. Extensive battery cycling leads to battery degradation over time (ageing). The development of smarter and more accurate battery management strategies may prolong the service duty cycle. This depends on the system’s ability to estimate the battery’s state of charge at any point in time. An accurate charge calculation avoids overcharging or overdischarging, leading to improved battery service utilisation, a better health estimation, a longer lifespan, a more reliable range prediction, and other benefits [2].
The development of effective methods for state-of-charge (SoC) estimation remains a crucial research focus. Various techniques to estimate the SoC have been developed to enhance battery usage. The ability to determine the state of charge of a battery or a battery system is a required function of an advanced battery management system (BMS). Those techniques can be classified into three primary categories [3,4,5,6]: direct measurement, model-based methods, and computer intelligence or machine learning (ML). Direct measurement methods take readings from the batteries, relying on sensors, such as the open-circuit voltage, internal resistance, or current readings over a set period (i.e., Coulomb counting) [4,5]. Model-based methods recreate the battery behaviour and use the sensor inputs to calculate the results using a predefined model [6]. Computer intelligence techniques enhance such models with additional data. These data-driven calculations aim to improve the model estimation by fitting it to the actual observed behaviour. Examples include fuzzy logic [7], support vector machines [8,9], or neural networks (NN) [10,11,12,13,14,15,16].
While some model-based methods, such as the equivalent circuit model, are simple to implement within a BMS, many cannot correctly capture a battery’s complex, multiple dependencies [6]. Direct measurement estimation is limited to sensor accuracy and is affected by the losses created by Coulombic efficiencies [17], where some portion of the charge is transformed into heat or is affected by uncaptured battery ageing. In contrast, machine learning can establish the relationships in complicated and multidimensional nonlinear systems [8,9,18]. This characteristic shows excellent potential to account for battery losses due to Coulombic efficiency. Some researchers have used support-vector-machine-based methods to estimate the SoC using the voltage, current, and temperature inputs [8,9]. Sensor data were obtained from a driving schedule profile on a battery cycler, and the achieved end-error estimation was less than 6% in [18]. Many attempts to implement different neural networks exist, but the most promising variant for charge estimation is recurrent neural networks (RNNs) [10,11,12,13,14,15,16]. The effectiveness of RNNs in time-series-dependent problems was shown using internal neurons to process the data sequences with varying lengths by Chemali et al. [11].
In the last five years, the RNN approach has found multiple applications in SoC estimation. The earliest approach utilised the regression nature of the battery’s charging, only using stateless models [10,13,14]. Later, some approaches introduced additional parameters to support the NN learning process [12,13,15]. In addition to good convergence, these models can determine critical events, such as the time before complete charge depletion or overcharge. However, their wider application has been limited due to the need for the initial state as an input feature. The most popular approach determines the charge’s value using the recent history at a fixed voltage, current, and temperature in stateless long short-term memory (LSTM) models [11,12,15,16]. This method has the advantage of being independent of the charge or discharge cycles in different periods, as long as the historical samples are in an equally spaced order of time. The most recent attempt to determine the Li-ion battery’s remaining useful life implemented gated recurrent unit (GRU) models [10,13,14,15], where every prediction was independent of the previous prediction, allowing for it to be used at any random point of time, without worrying about whether the battery was initially fully charged or depleted. While this applies to the estimation of regenerative braking, stateful models are more applicable to a critical event time estimation, such as the prediction of the remaining battery life. Focusing specifically on RNN models applied to SoC estimation, Table 1 presents a range of methods that have been developed in recent years.
The earliest attempts to train an RNN model to predict the SoC aimed to fit several cycles of a single battery utilisation dataset at different temperatures [10,13,14,15]. Later, this was used to generalise the battery behaviour to multiple usage scenarios, leading to a higher root-mean-squared error (RMSE) and broader applications, as in Mamo and Wang’s [12] work. This approach led to doubled accuracy errors on the testing data when performed at a different temperature or on an untrained driving profile, as compared with similar testing procedures presented by Song et al. [10] and Mamo and Wang [12]. Doubling the quantity of data by combining several temperatures or profiles also led to insignificantly higher errors, but improved the general capture, as per the stateful models in Song et al. [10], with roughly 0.735%, and the stateless models in Mamo and Wang [12], reporting a 1.2533% error, respectively. These numbers can be explained by the use of a single driving profile; when using the entire available temperature range for training, a portion of hand-picked temperatures can be used to report on the validation and accuracy. Such an approach does not necessarily represent a realistic EV usage scenario, since during a single acceleration event, the battery can go from ambient temperature to the maximum allowed temperature within a few seconds, and its usage depends on the road conditions and the driver’s behaviour. One of the potential ways to improve this capture is to modify the structure of the models, introducing an additional layer of logic, such as attention, as per Mamo and Wang [12], or extra ’dense’ layers, as per Jiao et al. [13], making the model applicable to any driving conditions. 
Another strategy would be to use a variety of statistical or gradient-based optimisers (i.e., adding a momentum algorithm to the stochastic gradient optimisation process [14]) to speed up the training and explore multiple-potential minima, which could achieve the fewest possible errors or identify the model that is most suited to a given scenario. Due to the stochastic nature of ML, it is hard to present any clear winner among the existing optimisers by only judging their complexity, not their average performance over multiple trials.
In most published testing of ML methods applied to SoC, experiments on battery cycling data are conducted on different cell types. The most-used table data for real-time sensors are derived from battery cyclers to validate the efficiencies generated using different current schedules (driving profiles) [10,11,12,14]. Three profiles are most commonly used in the research in this area: the dynamic stress test (DST), which is used for a variable power discharge mode, the aggressive highway drive schedule (US06), and the federal urban driving schedule (FUDS), for nominal driving scenarios [12,14,15]. Unlike some general, simple static discharge processes, which commonly appear in other battery-based tools, driving profiles include some amount of regenerative driving to simulate the actual battery application in an electric vehicle. Differences in these drive-cycle data in the training and testing of machine learning SoC estimation have been highlighted, including in applications focusing on the fitting process of battery discharge [10,12,13,15], capturing the complete charge–discharge cycle [11], multiple combinations at various temperatures or profiles [11,12,14,15], the impact of the quantity of data samples [10], and the cross-validation of all three current profiles against each other [12]. Identifying the best-suited method for a specific condition, such as driving an EV, is a crucial step in machine learning engineering. It requires a carefully defined methodology, which characterises the research conditions as closely as possible, and experimental results from multiple models with applicable techniques and the fewest errors.
By comparing the implementation and results from different sources and comparing testing accuracy and performance against multiple driving conditions at various temperatures, ranging from ambient to maximum possible, it is possible to select the best machine-learning technique which can be directly integrated into an electric vehicle and safely used on both tight city roads and long, high-speed highways.
This paper investigates, implements, and compares extended memory-based RNN models for state-of-charge prediction, together with additional training techniques built on top of them, to select the most suitable and practical method for EV use across combinations of different driving profiles. Each subset contains an implementation derived from various key references, changing either the structure of the models or the learning approach. This should help us to develop a methodology that can further extrapolate offline-trained methods from lab conditions to road-drive tests. These algorithms have been used on all kinds of rechargeable batteries, but this research focuses on only one type of cell, which is openly available for everyone to access. The A123 lithium-ion battery data with three typical driving profiles, obtained from the University of Maryland 2012 cycling experiment [19], acted as training and testing samples. Several spreadsheets of sensory measured values from an experimentally cycled single Li-ion cell were obtained by the University’s Battery Research Group using a battery cycler. Each method is validated on one of these samples (either the DST, US06, or FUDS driving profile) and tested for the robustness and accuracy of its state-of-charge estimation on the other two unseen schedules.
Since there has been no comparison to determine which RNN type or driving profile impacts the state-of-charge estimation for both charge and discharge cycles, this article aims to identify the most viable and optimal method for custom-built electric vehicles. Long overnight charge cycles and regenerative braking burst charges are equally crucial for the SoC percentage in the context of electric vehicles’ battery utilisation with prolonged usage, and they influence the models’ weights and biases. The contribution of this work is the implementation of all these methods in a comparable way, evaluated against real-world drive-cycle scenarios using full charge–discharge cycles over various temperatures and extrapolated to different driving profiles. It determines which method is best for machine-learning SoC prediction, with novelty in the testing and validation procedures: although there have been many publications, none has tested these methods against each other under the same set of testing circumstances. To move forward with either an effective SoC estimation or the development of a new SoC estimation algorithm, the most effective algorithm for real-world scenarios must be determined. This paper provides that evaluation and concludes which approach is the most appropriate. The remaining sections are organised as follows. Details on algorithms and optimisers are given in Section 2, where Section 2.1 covers each GRU and LSTM method, and Section 2.2 breaks down every applied optimisation algorithm. The applied methodology, with details of the training procedure and the selection of hyperparameters, is outlined in Section 3, with data processing covered in Section 3.1. Section 4 gives the results of the implementation and the performance characteristics and presents the critical analysis, while Section 5 concludes the article.

2. Preliminary Algorithmic Evaluation

Building on the variants of ML used for SoC estimation in the literature, Table 1, the models to be investigated in this work are given in Table 2. This provides details of the five different implementations that varied in structure and learning process but underwent the same training, validation, testing, and performance measurement procedures. These five methods were chosen directly from the literature as recent and accurate examples of implementation, containing repeatable information using RNN. They provide a representative cross-section of existing machine learning methods in the published literature on state-of-charge estimation, evolving from relatively simple to more complex implementations and representing the most promising candidates for SoC estimation. An exception was made for Model 5, which was introduced to define a baseline by utilising an earlier optimiser as the simplest variation. Each model was implemented by following the original published versions as faithfully as possible. Any details on the implementation not present in the original published papers were assumed based on ML’s standard methods at the time of writing. This section focuses on providing a detailed overview of each component required to build and train the machine learning model. Each type of model used in this investigation is discussed part by part to provide an overview of its potential strengths and weaknesses. In addition, every optimiser is discussed in terms of its growing complexity to visualise the development over time and pick the most efficient model for the final goal.

2.1. Model Structure and Implementation

The general model structure is summarised in Figure 1, with three feature inputs (voltage, current, and temperature) and a single percentage output (state of charge). Since the output consists of only a single sample, it is defined by a fully connected layer—a dense layer with a single neuron.
Several activation functions for those layers are widely used in machine learning libraries for time-series problems [20]. For the SoC prediction problem, all authors used the same function. They experimentally confirmed that the best option for all hidden layers was the hyperbolic tangent function in Equation (1). The output layer used a sigmoid function as an activation to bound the result between zero and one, indicating the percentage of the charge, given by Equation (2). A dropout layer technique with a 20% cutoff was applied to all hidden layers to prevent early data overfitting over long training periods.
tanh(x) = sinh(x) / cosh(x) = (e^x − e^{−x}) / (e^x + e^{−x})    (1)
σ(x) = 1 / (1 + e^{−x})    (2)
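As a quick numerical sanity check (not part of the original implementation), the two activation functions can be evaluated directly in NumPy, confirming Equation (1) against the library tanh and that the sigmoid output stays within the (0, 1) range required for an SoC fraction:

```python
import numpy as np

def tanh(x):
    # Equation (1): tanh as the ratio of the exponential forms
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def sigmoid(x):
    # Equation (2): logistic function bounding the output to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 101)
assert np.allclose(tanh(x), np.tanh(x))             # matches library tanh
assert np.all((sigmoid(x) > 0) & (sigmoid(x) < 1))  # valid SoC fraction
```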
The efficiency of an RNN in a time-series problem is defined by the ability of the neurons to store memory as an internal state. Over time, the memory of long-passed samples may fade away. This problem, called the vanishing gradient, occurs when the value needed to update the network weights shrinks as it propagates over time [21]. Long-term dependencies are not captured, since layers with a slight gradient do not significantly affect the system due to their insufficient weight change [21,22]. More complicated neuron structures tend to solve this problem.
Two commonly used recurrent neural network memory cells were utilised, the gated recurrent unit (GRU) and long short-term memory (LSTM), with possible extensions implemented by the referenced articles’ authors. The GRU and LSTM models used a stateless approach, providing a fixed number of timestamps for all implemented models. Stateless implementations with non-gradient-calculus-based optimisation algorithms were not used as part of this research because their effectiveness was not proven during preliminary work for this case.

2.1.1. Gated Recurrent Unit (GRU)-Based Models

One of the methods proposed by Cho et al. [23], which improves the behaviour of the neural network, is the gated recurrent unit. Unlike a simple recurrent NN with a single activation function in the cells, the GRU implements different logic to deal with the vanishing gradient, as per Figure 2. In addition to the activation function, it adds two gates related to the input and propagated sequences. The forget gate f_t controls the level of information that has to be ignored. The update gate i_t controls the impact of previous information on the current state. The gates, implemented with the sigmoid of Equation (2), are updated with Equation (3). Both gates depend on the cell input sequence x_t and the memory cell’s output at the last time stamp, h_{t−1}.
f_t = σ(W_f · [h_{t−1}, x_t] + b_f),  i_t = σ(W_i · [h_{t−1}, x_t] + b_i)    (3)
The memory cell output h_t is calculated through the previously chosen activation function tanh in Equation (4). The ∗ stands for element-wise multiplication.
ĥ_t = tanh(W_ĥ · [f_t ∗ h_{t−1}, x_t] + b_ĥ),  h_t = (1 − i_t) ∗ h_{t−1} + i_t ∗ ĥ_t    (4)
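Equations (3) and (4) can be traced with a minimal NumPy sketch of a single GRU step; the weight shapes and random values below are illustrative only, not trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wf, bf, Wi, bi, Wh, bh):
    """One GRU step following Equations (3) and (4); each weight matrix
    acts on the concatenation [h_{t-1}, x_t]."""
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(Wf @ hx + bf)                    # forget gate, Eq. (3)
    i_t = sigmoid(Wi @ hx + bi)                    # update gate, Eq. (3)
    h_hat = np.tanh(Wh @ np.concatenate([f_t * h_prev, x_t]) + bh)
    return (1.0 - i_t) * h_prev + i_t * h_hat      # new hidden state, Eq. (4)

rng = np.random.default_rng(0)
n_h, n_x = 4, 3                                    # hidden size; V, I, T inputs
Wf, Wi, Wh = (rng.normal(size=(n_h, n_h + n_x)) for _ in range(3))
bf = bi = bh = np.zeros(n_h)
h = np.zeros(n_h)
for _ in range(5):                                 # unroll over a short window
    h = gru_step(rng.normal(size=n_x), h, Wf, bf, Wi, bi, Wh, bh)
assert h.shape == (n_h,) and np.all(np.abs(h) < 1.0)
```

Because h_t is a convex combination of h_{t−1} and the tanh-bounded candidate ĥ_t, the hidden state always stays within (−1, 1).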

2.1.2. Long Short-Term Memory (LSTM)-Based Models

The most commonly used time-series machine learning model is the long short-term memory cell [25]. As with the GRU, LSTM models preserve long-term dependencies in extended data sequences, and the LSTM has become the most widely used type of RNN in these applications. Figure 3 summarises the internal cell logic.
Unlike the GRU, this cell utilises three gates instead of two. The update gate is replaced with separate input i_t and output o_t gates, as per Equation (5). All gates utilise the same sigmoid, Equation (2).
f_t = σ(W_f · [h_{t−1}, x_t] + b_f),  i_t = σ(W_i · [h_{t−1}, x_t] + b_i),  o_t = σ(W_o · [h_{t−1}, x_t] + b_o)    (5)
The main difference between the LSTM cell and the GRU lies in the cell state calculation. Using the same tanh activation function, Equation (6) describes how the cell state is updated and propagated. c_t represents the cell state at a given timestamp.
c_t = f_t ∗ c_{t−1} + i_t ∗ tanh(W_c · [h_{t−1}, x_t] + b_c),  h_t = o_t ∗ tanh(c_t)    (6)
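Equations (5) and (6) can likewise be traced step by step; in the sketch below, packing the four weight blocks row-wise into one matrix is a compactness assumption, and the random values are illustrative rather than trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step per Equations (5) and (6). W and b stack the f, i, o
    and cell-candidate blocks row-wise (an assumed packing)."""
    hx = np.concatenate([h_prev, x_t])
    n = h_prev.size
    f_t = sigmoid(W[0:n] @ hx + b[0:n])            # forget gate
    i_t = sigmoid(W[n:2*n] @ hx + b[n:2*n])        # input gate
    o_t = sigmoid(W[2*n:3*n] @ hx + b[2*n:3*n])    # output gate
    c_hat = np.tanh(W[3*n:4*n] @ hx + b[3*n:4*n])  # cell candidate
    c_t = f_t * c_prev + i_t * c_hat               # Eq. (6), cell state
    h_t = o_t * np.tanh(c_t)                       # Eq. (6), cell output
    return h_t, c_t

rng = np.random.default_rng(1)
n_h, n_x = 4, 3
W = rng.normal(size=(4 * n_h, n_h + n_x))
b = np.zeros(4 * n_h)
h, c = np.zeros(n_h), np.zeros(n_h)
for _ in range(5):
    h, c = lstm_step(rng.normal(size=n_x), h, c, W, b)
assert h.shape == (n_h,) and np.all(np.abs(h) < 1.0)
```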

2.1.3. LSTM with Attention Layer

The research conducted by Mamo and Wang [12] was intended to determine weaknesses and improve the LSTM structure by introducing additional techniques to the default layer of the training model. They added an attention layer [26] between the LSTM and fully connected layers to improve accuracy and replaced a traditional gradient optimiser with a probability-based differential evolution. Figure 4 summarises the model structure, and Equations (7) and (8) define the internal logic between hidden layers and output.
The implementation of the attention layer was not provided with the machine learning library. The source code from Winata and Kampman’s research [27] was used instead; it is publicly accessible through its GitHub repository [27]. Details of the optimiser usage and replacement are presented in Section 2.2. In the state-of-charge estimation, the attention layer addresses LSTM shortcomings: it replaces the traditional method of recursively constructing the LSTM depth and is located after the output of the primary layer, just before the model’s dense output layer [12].
u_t = tanh(W h_t + b),  α_t = exp(u_t^T u) / Σ_t exp(u_t^T u),  v_t = α_t h_t    (7)
v = Σ_t (α_t h_t)    (8)
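Equations (7) and (8) amount to a softmax over time steps followed by a weighted sum of the hidden states. A small NumPy sketch (the parameter names and random values are illustrative, not from the referenced implementation):

```python
import numpy as np

def attention(H, W, b, u):
    """Temporal attention over LSTM outputs H (T rows, n columns),
    following Equations (7) and (8). u is the learned context vector."""
    U = np.tanh(H @ W.T + b)        # u_t = tanh(W h_t + b), Eq. (7)
    scores = np.exp(U @ u)          # exp(u_t^T u)
    alpha = scores / scores.sum()   # softmax over the T time steps
    return alpha, alpha @ H         # v = sum_t alpha_t h_t, Eq. (8)

rng = np.random.default_rng(2)
T, n = 10, 4
H = rng.normal(size=(T, n))
alpha, v = attention(H, rng.normal(size=(n, n)), np.zeros(n),
                     rng.normal(size=n))
assert np.isclose(alpha.sum(), 1.0) and v.shape == (n,)
```

The weights α_t sum to one, so the context vector v is a convex combination of the per-timestep hidden states.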

2.2. Optimisers

The optimisation algorithm aims to minimise the difference between the model predictions and actual values, using the mean absolute error (MAE) as a loss function, Table 2. The following section breaks down the methods selected by the chosen authors in order of their growing complexity. Different algorithms utilise several improvements to achieve an optimal result more quickly and avoid overfitting; however, there is no universal best choice. For the state-of-charge estimation, this research attempted multiple algorithms to determine whether there is a best choice for the time-series charge cycling problem. All models shared the same parameter, the learning rate α, which acts as the step size for weight updates.

2.2.1. Classic and Momentum Stochastic Gradient Descent Algorithms

One of the simplest methods to optimise the model is stochastic gradient descent (SGD), Algorithm 1. The SGD optimiser utilises a simple gradient update scaled by the learning rate (Algorithm 1, Line 8). The extension of SGD, which Jiao et al. [13] used, applies a single momentum calculation to the classical SGD (Algorithm 1, Line 9). In the text, this is referred to as stochastic gradient descent with momentum (SGDw/M). It improves performance through a faster convergence speed compared with the classical version.
Algorithm 1 Stochastic gradient descent with momentum optimisation.
1: Number of input samples: N ← length(input data)
2: Size of windows: S ← length(V_{i..n})
3: Input: x_n = [V_{i..n}, I_{i..n}, T_{i..n}], shape X = (N, S, 3)
4: Output: y_n = [SoC_n], shape Y = (N, 1)
5: Define loss function L; get hyperparameters α, β_1
6: while W_t does not converge do
7:   t ← t + 1
8:   g_t ← ∇_W L_t(W_{t−1})   {obtain gradient}
9:   m_t ← β_1 m_{t−1} + (1 − β_1) g_t   {1st moment calculation}
10:  W_t ← W_{t−1} − α m_t   {update parameters}
11: end while
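The loop of Algorithm 1 can be sketched in a few lines of NumPy; the quadratic toy loss here is purely illustrative, standing in for the MAE over the battery data:

```python
import numpy as np

def sgd_momentum(grad, w0, alpha=0.1, beta1=0.9, steps=300):
    """Algorithm 1 sketch: momentum-accumulated gradient steps (Lines 8-10)."""
    w = np.asarray(w0, dtype=float)
    m = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)                        # Line 8: obtain gradient
        m = beta1 * m + (1 - beta1) * g    # Line 9: 1st moment calculation
        w = w - alpha * m                  # Line 10: update parameters
    return w

# Toy convex loss L(w) = ||w - 3||^2 with gradient 2(w - 3)
w_star = sgd_momentum(lambda w: 2.0 * (w - 3.0), np.zeros(2))
assert np.allclose(w_star, 3.0, atol=1e-3)   # converges to the minimiser
```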

2.2.2. Classic and Robust Online Adaptive Moment Estimation

The most commonly used optimiser in time-series prediction is the adaptive moment estimation (Adam) optimiser [28]. Algorithm 2 highlights the steps required to update the model weights and biases, as per the source. In addition to a second β constant used for the second-moment calculation (Algorithm 2, Line 10), the algorithm uses ϵ, referred to as the fuzz factor.
Algorithm 2 Adaptive moment estimation (Adam) optimisation.
1: Number of input samples: N ← length(input data)
2: Size of windows: S ← length(V_{i..n})
3: Input: x_n = [V_{i..n}, I_{i..n}, T_{i..n}], shape X = (N, S, 3)
4: Output: y_n = [SoC_n], shape Y = (N, 1)
5: Define loss function L; get hyperparameters α, β_1, β_2, ϵ
6: while W_t does not converge do
7:   t ← t + 1
8:   g_t ← ∇_W L_t(W_{t−1})   {obtain gradient}
9:   m_t ← β_1 m_{t−1} + (1 − β_1) g_t   {1st moment calculation}
10:  υ_t ← β_2 υ_{t−1} + (1 − β_2) g_t²   {2nd moment calculation}
11:  m̂_t ← m_t / (1 − β_1^t)   {corrected m̂_t}
12:  υ̂_t ← υ_t / (1 − β_2^t)   {corrected υ̂_t}
13:  W_t ← W_{t−1} − α m̂_t / (√υ̂_t + ϵ)   {update parameters}
14: end while
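For comparison with the momentum sketch above, Algorithm 2 in NumPy on the same illustrative toy loss (the hyperparameter values are the commonly used defaults, not tuned for the battery data):

```python
import numpy as np

def adam(grad, w0, alpha=0.02, beta1=0.9, beta2=0.999, eps=1e-8, steps=600):
    """Algorithm 2 sketch: bias-corrected first and second moments."""
    w = np.asarray(w0, dtype=float)
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)                                # Line 8
        m = beta1 * m + (1 - beta1) * g            # Line 9: 1st moment
        v = beta2 * v + (1 - beta2) * g**2         # Line 10: 2nd moment
        m_hat = m / (1 - beta1**t)                 # Line 11: bias correction
        v_hat = v / (1 - beta2**t)                 # Line 12: bias correction
        w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)  # Line 13: update
    return w

w_star = adam(lambda w: 2.0 * (w - 3.0), np.zeros(2))
assert np.allclose(w_star, 3.0, atol=0.1)
```

Note how the second moment normalises the step size, so the effective step near the optimum scales with α rather than with the raw gradient magnitude.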
Javid et al. [15] extended the default Adam algorithm, introducing a robust online version of Adam (RoAdam), Algorithm 3, with a relative prediction error term (Algorithm 3, Line 14) and a third β constant (Algorithm 3, Line 15). Adding the direct influence of the loss function to the gradient update introduces an online calculation on top of the regular Adam correction. The framework library contained no inbuilt implementation of the robust optimiser. Instead, it was implemented from first principles by overwriting the model training procedure.
Algorithm 3 Robust online adaptive moment estimation (RoAdam) optimisation.
1: Number of input samples: N ← length(input data)
2: Size of windows: S ← length(V_{i..n})
3: Input: x_n = [V_{i..n}, I_{i..n}, T_{i..n}], shape X = (N, S, 3)
4: Output: y_n = [SoC_n], shape Y = (N, 1)
5: Define loss function L and initial loss L(W_1) = 1.0; get hyperparameters α, β_1, β_2, β_3, ϵ
6: Initialise: m, υ ← zeroes and d ← ones
7: while W_t does not converge do
8:   t ← t + 1
9:   g_t ← ∇_W L_t(W_{t−1})   {obtain gradient}
10:  m_t ← β_1 m_{t−1} + (1 − β_1) g_t   {1st moment calculation}
11:  υ_t ← β_2 υ_{t−1} + (1 − β_2) g_t²   {2nd moment calculation}
12:  m̂_t ← m_t / (1 − β_1^t)   {corrected m̂_t}
13:  υ̂_t ← υ_t / (1 − β_2^t)   {corrected υ̂_t}
14:  r_t ← L_t(W_{t−1}) / L_t(W_{t−2})   {relative prediction error term of the loss function}
15:  d_t ← β_3 d_{t−1} + (1 − β_3) r_t   {3rd moment calculation}
16:  W_t ← W_{t−1} − α m̂_t / (d_t √υ̂_t + ϵ)   {update parameters}
17: end while
Unlike the initial variables m and υ of the Adam algorithm, which are set as matrices of zeros, the d variable in the RoAdam algorithm is initialised with ones (Algorithm 3, Line 6). In addition, the algorithm depends on the loss calculation during the parameter evaluation, and not every framework supports this with inbuilt functionality. The previous loss result has to be preserved for the next iteration, and the initial loss value must be set above zero to avoid a zero-division error (Algorithm 3, Line 5).
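A first-principles sketch of the RoAdam loop, in the spirit of Algorithm 3, illustrates these initialisation details; the toy loss and hyperparameter values are assumptions for demonstration:

```python
import numpy as np

def roadam(loss, grad, w0, alpha=0.02, b1=0.9, b2=0.999, b3=0.999,
           eps=1e-8, steps=600):
    """Algorithm 3 sketch: Adam plus a relative-loss denominator term.
    d starts at ones and the stored previous loss at 1.0, so the ratio
    on Line 14 never divides by zero."""
    w = np.asarray(w0, dtype=float)
    m, v = np.zeros_like(w), np.zeros_like(w)
    d, loss_prev = 1.0, 1.0                       # Lines 5-6 initialisation
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g                 # 1st moment
        v = b2 * v + (1 - b2) * g**2              # 2nd moment
        m_hat = m / (1 - b1**t)
        v_hat = v / (1 - b2**t)
        loss_t = max(loss(w), eps)                # preserve a positive loss
        r = loss_t / loss_prev                    # Line 14: relative error
        d = b3 * d + (1 - b3) * r                 # Line 15: 3rd moment
        w = w - alpha * m_hat / (d * np.sqrt(v_hat) + eps)  # Line 16
        loss_prev = loss_t
    return w

w_star = roadam(lambda w: float(np.sum((w - 3.0)**2)),
                lambda w: 2.0 * (w - 3.0), np.zeros(2))
assert np.allclose(w_star, 3.0, atol=0.2)
```

When the loss is falling, r < 1 and d shrinks, slightly enlarging the step; when the loss spikes, d grows and damps the update, which is the "robust" behaviour.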

2.2.3. Ensemble Optimisation with Nesterov’s Momentum Adam and AdaMax

The Adam algorithm is the most commonly used optimiser. However, two potential issues motivate the change to a different algorithm: first, the training may not converge [29], and second, the optimal solution is frequently missed at large learning steps [30].
Xiao et al. [14] proposed a novel alternative, combining several optimisers to address these issues. The new ensemble optimisation algorithm was based on the combination of Nesterov’s momentum Adam (Nadam), Algorithm 4 [31] and the AdaMax, Algorithm 5 [28], at certain training points.
The Nadam optimiser (Algorithm 4) extends the Adam optimiser with Nesterov momentum [31]. Algorithm 4, Lines 13 and 14 add calculations to the gradient and parameter updates, which are intended to improve the convergence speed.
Algorithm 4 Nesterov’s adaptive moment estimation (Nadam) optimisation.
1: Number of input samples: N ← length(input data)
2: Size of windows: S ← length(V_{i..n})
3: Input: x_n = [V_{i..n}, I_{i..n}, T_{i..n}], shape X = (N, S, 3)
4: Output: y_n = [SoC_n], shape Y = (N, 1)
5: Define loss function L; get hyperparameters α, β_1, β_2, ϵ
6: while W_t does not converge do
7:   t ← t + 1
8:   g_t ← ∇_W L_t(W_{t−1})   {obtain gradient}
9:   m_t ← β_1 m_{t−1} + (1 − β_1) g_t   {1st moment calculation}
10:  υ_t ← β_2 υ_{t−1} + (1 − β_2) g_t²   {2nd moment calculation}
11:  m̂_t ← m_t / (1 − β_1^t)   {corrected m̂_t}
12:  υ̂_t ← υ_t / (1 − β_2^t)   {corrected υ̂_t}
13:  ĝ_t ← g_t / (1 − ∏_{i=1}^{k} β_1^i)   {corrected ĝ_t}
14:  W_t ← W_{t−1} − α (β_1^{k+1} m̂_t + (1 − β_1^t) ĝ_t) / (√υ̂_t + ϵ)   {update parameters}
15: end while
The second algorithm in the ensemble sequence is AdaMax, Algorithm 5 [28], another modification of Adam. The second-order moment (Algorithm 5, Line 10) is replaced with the infinity norm. As a result, Xiao et al. [14] considered AdaMax to have a stable weight-updating rule and to be suitable for the fine-tuning phase, since its advantage lies in the reduction in gradient fluctuations.
Algorithm 5 Adaptive moment estimation based on the infinity norm (AdaMax).
1: Number of input samples: N ← length(input data)
2: Size of windows: S ← length(V_{i..n})
3: Input: x_n = [V_{i..n}, I_{i..n}, T_{i..n}], shape X = (N, S, 3)
4: Output: y_n = [SoC_n], shape Y = (N, 1)
5: Define loss function L; get hyperparameters α, β_1, β_2, ϵ
6: while W_t does not converge do
7:   t ← t + 1
8:   g_t ← ∇_W L_t(W_{t−1})   {obtain gradient}
9:   m_t ← β_1 m_{t−1} + (1 − β_1) g_t   {1st moment calculation}
10:  υ_t ← max(β_2 υ_{t−1}, |g_t|)   {2nd moment via the infinity norm}
11:  W_t ← W_{t−1} − α m_t / ((1 − β_1^t)(υ_t + ϵ))   {update parameters}
12: end while
Xiao et al. [14] considered separating the training process into two stages: pretraining and fine-tuning. Based on their observations: “The purpose of the pre-training phase is to endow the GRU model with the appropriate parameters to capture inherent features of the training samples. The Nadam algorithm uses adaptive learning rates and approximates the gradient using the Nesterov momentum, thereby ensuring fast convergence of the pre-training process.” [14], p. 54195. The selection of the second algorithm was straightforward: Xiao et al. [14] selected AdaMax for its fast convergence to a more stable value for further parameter adjustment. The proposed ensemble algorithm combined both methods for a single GRU’s training, see Algorithm 6. This algorithm describes the adapted version of the ensemble algorithm, used by the model training procedures, with Nadam for the pretraining and AdaMax for the fine-tuning phases. In Xiao et al.’s [14] work, p_1 and p_2 had the same number of epochs, 100. This scenario used the value of p_2 at the moment the model reached an overfit in the first phase. The learning rate was set to the minimum possible value, as defined by the research literature.
Algorithm 6 Ensemble optimisation training process.
1: Set up model. Split the total number of epochs by 30% into p_1 and p_2, or until the model overfits at p_2
2: Initialise parameters
3: while epoch < p_1 do
4:     if epoch < p_2 then
5:         {pass if already compiled with Nadam}
6:         compile model with Nadam parameters    {pretraining phase}
7:     else
8:         {pass if already compiled with AdaMax}
9:         compile model with AdaMax parameters    {fine-tuning phase}
10:    end if
11:    train for a single epoch
12: end while
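A hedged sketch of Algorithm 6's compile-and-switch pattern using Keras. The model depth, learning rates, and epoch split below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
import tensorflow as tf

def build_model(timesteps=500, features=3):
    """Small GRU regressor; 43 units follows the later hyperparameter study,
    but the single-layer depth here is only illustrative."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(timesteps, features)),
        tf.keras.layers.GRU(43),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # SoC in [0, 1]
    ])

def ensemble_train(model, x, y, total_epochs=10, switch=3):
    """Two-phase schedule: Nadam pretraining, then AdaMax fine-tuning.
    Learning rates are assumptions, not the paper's values."""
    for epoch in range(total_epochs):
        if epoch == 0:       # pretraining phase
            model.compile(optimizer=tf.keras.optimizers.Nadam(1e-3), loss="mse")
        elif epoch == switch:  # fine-tuning phase
            model.compile(optimizer=tf.keras.optimizers.Adamax(1e-4), loss="mse")
        model.fit(x, y, epochs=1, verbose=0)  # train one epoch at a time
    return model
```

Recompiling a Keras model keeps the learned weights but resets the optimiser state, which is what makes this compile-and-switch pattern work for the phase change in Algorithm 6.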

2.3. Dataset Description and Generator

A recurrent neural network is a subclass of NN which has proven effective in weather or stock price forecasting. This method learns by recognising a pattern within a sequential data input, thus predicting the future outcome. Two vectors or matrices define the inputs and outputs of a model. The general description of a single input X is in Equation (9) and an output Y in Equation (10). V , I , and T represent voltage (V), current (A), and temperature (°C), respectively, as input features, and S o C is the fraction of state of charge (between 0 and 1) as the output. All samples are equally time-distributed, and t represents the number of input time steps at a time. Considering the characteristics of a constant current and constant voltage charging, this workaround should not cause any training issues.
$$X_n = \begin{bmatrix} V_0 & V_1 & \cdots & V_t \\ I_0 & I_1 & \cdots & I_t \\ T_0 & T_1 & \cdots & T_t \end{bmatrix} \qquad (9)$$
$$Y_n = \begin{bmatrix} SoC_t \end{bmatrix} \qquad (10)$$
Both stateful and stateless methods rely on the quality and length of the input samples. Chemali et al. [11] researched the impact of the history length of the input samples: the longer the period of input readings, the better the accuracy the model produced and the longer it took to compute. The research results are plotted in Figure 5, showing a square-root-like decay of prediction error as the history size grows. The optimum window size for stateless models obtained by Chemali et al. [11] was 500 samples. Larger matrices led to increased computation time with an insignificant difference in performance; thus, 500 was used in this work.
To generate datasets for training and testing purposes, data were combined in a three-dimensional matrix using windowing techniques, as per Figure 6. These figures provide an example of the stateless model input data visualisation, where the step between each window s was less than the number of input time steps. All stateful models used the same windowing technique to keep data generation simple, with a sample size of 1. The state reset for stateful models occurred at the end of every cycle, allowing for a batching mechanism to be implemented to speed up the training process. For example, 12 discharge process datasets with a similar voltage, current, and state of charge, but different temperatures at a time t, could be treated as a single batch. The statefulness of a model preserved the state at index i to the same index in the next batch [32]. In addition, the normalisation technique according to the mean and standard deviation, based on the entire training data of all three input features, was applied to speed up the fitting process.
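The windowing and normalisation steps described above can be sketched as follows. The function names are hypothetical, and the per-cycle state resets and batching used for stateful models are omitted; labels are taken at the window's last step, matching Equation (10):

```python
import numpy as np

def make_windows(v, i, t, soc, window=500, step=1):
    """Slide a window of length `window` over the aligned V/I/T series.

    Returns X with shape (N, window, 3) and Y with shape (N, 1), where each
    label is the SoC at the window's final time step and `step` is the
    stride between consecutive windows.
    """
    data = np.stack([v, i, t], axis=-1)              # (samples, 3)
    idx = range(0, len(v) - window + 1, step)
    X = np.stack([data[j:j + window] for j in idx])  # (N, window, 3)
    Y = np.array([[soc[j + window - 1]] for j in idx])
    return X, Y

def normalise(x, mean, std):
    """z-score each feature using statistics from the *training* split only."""
    return (x - mean) / std
```

Computing the mean and standard deviation on the training data alone, then reusing them for validation and testing, avoids leaking test statistics into the fitting process.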

3. Evaluation Methodology

The methodology implemented in this work prototyped and deployed neural network models from existing methods to identify the best candidate for integration into an electric vehicle’s battery management system. One of the objectives was to analyse several different RNN models, measure their performance, and determine the most promising direction to further enhance the integration of a neural network model into an accumulator inside an EV.

3.1. Battery Data for Training and Validation

Model training was conducted over a lithium-ion battery’s cycling data obtained by the Battery Research Group of the Center for Advanced Life Cycle Engineering (CALCE) Group at the University of Maryland [19] in 2012. According to the associated paper, the battery cycling was performed with a BT2000 tester machine, manufactured by Arbin Instruments, Texas, USA, and controlled with official Arbin Mits Pro Software (v4.27) [33]. Table 3 highlights selected battery characteristics directly from the datasheet [34].
The battery cycling data for 2 Li-ion cells were stored as Excel spreadsheets over the temperature range 0 °C to 50 °C, in 10 °C steps with a tolerance of around 0.5–1 °C, including an ambient temperature of 25 °C. Each testing cycle contained three profiles, distinguished by their current consumption, emulating a stress test or driving scenarios: the dynamic stress test (DST)—Figure 7a, highway (US06)—Figure 7b, and the federal urban driving schedule (FUDS)—Figure 7c. Each cycle consisted of charge and discharge periods, with sampling rates of 4 Hz and 1 Hz, respectively; charging periods were linearly interpolated to match the common data sampling rate. The range of 20 °C to 50 °C was used as the training and validation dataset, since this was the most common temperature range for the EVs involved in this research. This resulted in ∼58,613 and ∼12,171 samples for training and validation over one profile. Each model from Table 2 was trained independently on each drive cycle profile and then tested against the other two, as per Mamo and Wang [12]. The performance calculation was conducted over two cycles of 30 °C and 40 °C samples for each of the two remaining profiles, leading to a total of 47,022 testing samples.
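As a minimal illustration of the rate alignment described above, differently sampled log segments can be mapped onto a common 1 Hz grid by linear interpolation. The function name is hypothetical, and the exact CALCE file layout and column names are not reproduced here:

```python
import numpy as np

def resample_to_1hz(time_s, values):
    """Linearly interpolate an unevenly or differently sampled log onto a 1 Hz grid.

    time_s: monotonically increasing timestamps in seconds.
    values: readings (e.g., voltage or current) aligned with time_s.
    """
    grid = np.arange(time_s[0], time_s[-1] + 1.0, 1.0)  # one sample per second
    return grid, np.interp(grid, time_s, values)
```

Applying this to each charge segment yields series that line up sample-for-sample with the 1 Hz discharge logs before windowing.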
As in any practical battery usage scenario, the state of charge had to be derived from the CALCE data: the Arbin machine stored the in and out charges as separate arrays, along with the applied current. The SoC value could be calculated from the difference between the charge and discharge capacities in Ah. The resulting trend could be validated with Coulomb counting, using the integral of the consumed and/or produced current I between the initial time t_0 and the end-of-cycle time t_n, divided by the nominal capacity C_N of the batteries converted to ampere-seconds, as per Equation (11).
$$\widehat{SoC} = \frac{\int_{t_0}^{t_n} I(t)\,dt}{C_N} = \frac{\int_{t_0}^{t_n} I(t)\,dt}{2.3 \times 3600} \qquad (11)$$
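Equation (11) amounts to Coulomb counting over the sampled current. A small sketch follows; the sign convention for current (positive on discharge) and the initial SoC are assumptions, while the 2.3 Ah nominal capacity comes from the datasheet cited above:

```python
import numpy as np

def soc_coulomb(current_a, dt_s=1.0, capacity_ah=2.3, soc0=1.0):
    """Coulomb-counting SoC reference per Equation (11).

    current_a: sampled current in amperes, positive for discharge (assumed).
    dt_s: sampling interval in seconds (1 Hz data -> 1.0).
    capacity_ah: nominal capacity; 2.3 Ah for the cell studied here.
    """
    coulombs = np.cumsum(current_a) * dt_s            # discrete integral of I dt
    return soc0 - coulombs / (capacity_ah * 3600.0)   # capacity converted to A*s
```

For example, a constant 2.3 A discharge drains the 2.3 Ah cell from full to empty in exactly one hour of 1 Hz samples.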
The final expected value was rounded to two decimal places in all scenarios to simplify the training and testing processes.

3.2. Model Training, Validation, and Testing Metrics Functions

The evaluation procedures were conducted similarly to Mamo and Wang’s [12] research, using the same training, validation, and testing procedures to determine the best technique and performance. Each training stage was written from first principles using the CUDA-supported TensorFlow 2.3.1 framework’s official documentation, with additional implementation to accommodate method modifications [35]. This allowed flexibility in modifying the training and evaluation procedures to establish objective comparison criteria. A single training iteration consisted of several stages, involving several performance and quality checks, to ensure that training improved until the optimum was achieved before hitting a limit.
Algorithm 7 provides the pseudocode summary of the primary training procedure undertaken by every model. Every training run worked with all three profiles, where the first cycle type was used for training and validation and the other two for testing and performance rating, as illustrated in Figure 8, with five charge/discharge cycles per type. After the model setup, initial parameter definition, and a single-iteration (epoch) run over the entire training dataset, the model’s mean average accuracy was compared with previous results to decide whether retraining was necessary to forestall potential overfitting (Algorithm 7, Line 6). Interchanging different temperatures for validation was avoided to keep the comparison of results simple. If any improvement was observed, the model was saved as a checkpoint for a rollback if needed, before a follow-up evaluation (Algorithm 7, Line 16). Otherwise, the model was rolled back to the previous state and underwent another attempt with the current learning rate halved. Within 3–8 attempts, the models usually recovered and continued learning with an error between 3 and 6% (Algorithm 7, Lines 8 to 12); after approximately 30–50 trials, they started to show obvious overfitting or underfitting with no prospect of recovery. The cycle-type training in Figure 8 demonstrates the one-to-five data split between validation and training, where the 25 °C cycle confirmed the general fitting process and produced an output for later use during performance averaging (Algorithm 7, Line 19). For this investigation, single-cycle verification at 25 °C was the primary assessment criterion, in line with the research goal and the limited quantity of data provided. Under the assumption that a model inside an electric vehicle would undergo constant online learning, there was no situation in which the model had to face completely unpredictable scenarios.
Finally, the model was tested against the two other cycle types, as shown in Figure 8, assessing how well the general state-of-charge cycling was captured under different conditions (Algorithm 7, Line 20). Two cycles at 25 °C and 30 °C were selected as examples of the closest value ranges and most likely idle battery states. The better the results, the more objective the model was under unseen conditions. Once the model reached the optimum state, with no further accuracy improvements and no hyperparameters left to adjust, the training process was stopped and the model underwent the final evaluation against the entire training set with all three profiles (Algorithm 7, Line 22).
Algorithm 7 Training procedure.
1: Set up model. Define optimiser and metrics.
2: Initialise parameters with an initial learning rate of 0.001
3: Set prev_error ← 1
4: while epoch < 100 do
5:     Train model, obtain gradients, and apply optimiser
6:     if error > prev_error then
7:         while attempt < 50 do
8:             Load previous successful model
9:             Reduce learning rate by half
10:            Train model, obtain gradients, and apply optimiser
11:            if error < prev_error then
12:                Update error. Save state. Break the loop.
13:            end if
14:        end while
15:    else
16:        Update error. Save state.
17:    end if
18:    Update learning rate based on the scheduler.
19:    Validate model on the 25 °C cycle.
20:    Test on the two other profiles.
21: end while
22: Record overall results against the entire training datasets.
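The checkpoint-and-rollback loop of Algorithm 7 can be sketched framework-agnostically. Here `train_epoch` and `evaluate` are hypothetical caller-supplied callbacks standing in for the TensorFlow-specific steps, and the epoch and attempt limits mirror the pseudocode:

```python
import copy

def train_with_rollback(model, train_epoch, evaluate, epochs=100, lr=1e-3,
                        max_attempts=50, min_lr=1e-6):
    """Rollback-recovery training loop sketched from Algorithm 7.

    train_epoch(model, lr): runs one training epoch in place.
    evaluate(model): returns the current validation error.
    """
    prev_error, checkpoint = float("inf"), copy.deepcopy(model)
    for _ in range(epochs):
        train_epoch(model, lr)
        error = evaluate(model)
        if error > prev_error:                    # regression: try to recover
            for _ in range(max_attempts):
                model = copy.deepcopy(checkpoint)  # load previous good state
                lr = max(lr / 2.0, min_lr)         # halve the learning rate
                train_epoch(model, lr)
                error = evaluate(model)
                if error < prev_error:
                    break                          # recovered; resume training
        if error < prev_error:
            prev_error = error
            checkpoint = copy.deepcopy(model)      # update error, save state
    return checkpoint, prev_error
```

In the paper's setting the checkpoint would be a saved set of network weights; `copy.deepcopy` is the stand-in here for saving and restoring them.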
The following process was adapted for all error calculations in training, verification, and testing. Figure 9 shows an example of the accuracy evaluation, where the actual state of charge is compared with the model’s prediction. The filled area below the plot captures the absolute error difference between two lines, as per Equation (12). The test procedure was performed on two cycles of each profile but at a different temperature to assess how perceptive the model was in capturing the average and high heat spikes that the battery module might experience. Each result was summarised into a table and reported based on metrics values.
$$ABS_{error} = \sqrt{(Actual - Prediction)^{2}} \qquad (12)$$
Metrics functions acted as user evaluation criteria to assess the performance of the trained model during both the fitting and validation processes. Although some papers relied on different evaluation criteria, the metrics in this research were unified, with the equations provided in Table 4. The mean absolute error (MAE) and root-mean-square error (RMSE) are the two standard metric functions used in almost any machine-learning problem, whereas the coefficient of determination (R2), a measure of a model’s goodness of fit, has been used in several sources; to be as complete and comparable as possible, it was therefore also used as one of the comparison criteria in this article. All metrics were computed over the entire training and testing cycles, which can be interpreted as a meaningful overall quality of the fitting process. Thus, the same criteria were used to compare the models’ efficiencies.
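For reference, the three metrics summarised in Table 4 have the standard definitions, sketched here in NumPy:

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def rmse(y, y_hat):
    """Root-mean-square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

MAE and RMSE are in SoC-fraction units (so 0.01 corresponds to a 1% error), while R2 is dimensionless with a perfect fit at 1.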

3.3. Hyperparameters Selection

The reviewed articles from which the chosen models were selected for testing used constant hyperparameter values for models and optimisers, including the learning rate. However, each selected different values based on experiments and observations, somewhere in the range from 0.001 (the framework default) down to 0.0001, to ensure smooth stepping and minimise the risk of missing the optimum minimum.
$$\text{IF OVERFIT: } \hat{\alpha} = \frac{\hat{\alpha}}{2} \qquad \text{ELSE: } \Delta = \frac{start - stop}{N - 1}, \quad \hat{\alpha} = \alpha - \Delta \times epoch \qquad (13)$$
In this work, a stepping learning rate algorithm given by Equation (13) was used for every optimiser over the training course. The implementation was equivalent to a linspace function, where every epoch’s learning rate α̂ was calculated from start and stop variables indicating the boundaries of rate degradation, with a decrement Δ through a total of N iterations and the known α value. The learning rate was sequentially reduced following a standard stepped scheme; however, an additional adaptive phase was introduced in this work. The training error decreased in subsequent epochs while the stepped scheme was followed. The adaptive scheme was deployed if the learning error increased from one epoch to the next: the rate was halved, and the same epoch was rerun. If the training error remained above that of the previous epoch, the learning rate continued to be reduced to a predetermined minimum. The stepped scheme resumed once the error returned to a convergent value. Figure 10 shows two adaptive phases that were applied and reconverged, and a final adaptive decay that ended the training. Based on the two training tests in Figure 11, with and without this approach, the rollback method was found to be superior to the other adaptive learning rate schemes presented in this work and was employed for all subsequent training.
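The stepped schedule and the adaptive halving can be sketched as follows. The boundary values are illustrative defaults within the 0.001–0.0001 range quoted above, not the paper's exact settings:

```python
def scheduled_lr(epoch, start=1e-3, stop=1e-4, n_epochs=100):
    """Linear (linspace-style) learning-rate decay from `start` to `stop`,
    following Equation (13): alpha_hat = alpha - delta * epoch."""
    delta = (start - stop) / (n_epochs - 1)
    return start - delta * epoch

def adaptive_halve(lr):
    """The adaptive phase: halve the rate when the epoch's error regressed."""
    return lr / 2.0
```

The stepped rate is deterministic per epoch, so after a rollback the schedule can simply resume at the current epoch index once the error reconverges.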
The beta and epsilon optimiser constants did not go through a similar optimisation process, due to a lack of documented training attempts to improve fitting for state-of-charge estimation on lithium-ion batteries. As a result, they were kept constant for all trained models except SGDw/M. Since stochastic gradient descent uses no hyperparameters other than the learning rate and a single momentum constant, its value was set to β1 = 0.8. The learning rate limits of the scheduler and the remaining hyperparameters used in all cases are summarised in Table 5.
The work performed by other authors, in Table 1, used a constant number of layers and neurons to conduct the experiments. Only a few provided their reasoning for the selections or the results of other attempted experiments without changing the technique [12,13]. To create the most similar circumstances for all methods with the least time expenditure, the most promising combinations of layers and neurons were evaluated first. The most promising candidates were then taken through a dozen similar attempts to obtain an average result and produce several selection criteria for later use.

3.4. Results Averaging

The five models used in this research had a stochastic nature due to the use of randomness in the gradient calculation during the learning [36], such as the stochastic gradient descent, which is discussed in Section 2.2. As such, obtaining repeatable results which were worth comparing required the implementation of an averaging method, in which each model undertook the same procedures multiple times. The resulting plots and values provided far more representative statistical criteria, as opposed to random fluctuations.
During preliminary work, this research attempted to train many models of the same type to produce the best-fitting line. The training results showed a significant variance. As such, training was repeated ten times for each dataset to remove the statistical variance from the comparison results, and the average of all ten trained models was used in each result. Figure 12 and Figure 13 show this process, where Figure 12a and Figure 13a show a single test training history and final SoC prediction, Figure 12b and Figure 13b show the spread of 10 similar training sessions, and Figure 12c and Figure 13c show the averaged result. It is apparent that, while some repeats were highly accurate, others had more significant errors, see Figure 12b. The averaging provided a representative, statistically repeatable result.
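The averaging over repeated runs reduces to a per-sample mean, with the standard deviation giving the spread shown in the subfigure-b panels; a trivial sketch:

```python
import numpy as np

def average_predictions(runs):
    """Average the SoC predictions of repeated training runs.

    runs: an (n_runs, n_samples) array-like of per-run predictions. The mean
    suppresses run-to-run stochastic variance; the standard deviation
    quantifies the spread between repeats.
    """
    runs = np.asarray(runs, dtype=float)
    return runs.mean(axis=0), runs.std(axis=0)
```

With ten repeats per model, the reported curves and error metrics are computed from the mean trace rather than any single (possibly lucky or unlucky) training run.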
In the next Section 4, the first tests carried out to determine the optimum hyperparameters are discussed; then, the full evaluation of models against each of the drive cycles is presented. In each case, the train/validation/test procedures and error metrics of Section 3.2 were followed, with the learning rate method in Section 3.3 and an average of 10 training sessions as in Section 3.4 to ensure the best and most accurate representation of each model.

4. Performance and Results

Since it is common for temperatures in an EV’s battery pack to range from an ambient 20 °C to the limit of 60 °C, all temperature ranges were used together to train each model. The training process was conducted on all datasets for a single battery testing profile and validated on a single cycle of unseen 25 °C data (around 20% or less of the entire set) from the other two datasets. This approach led to a lower accuracy than that reported by researchers who trained individually on single temperature ranges, such as Xiao et al. [14]. The following section compares the models trained on each individual profile and then tests them against the entire dataset of all three profiles. All examples were trained using charge and discharge cycles with a predetermined set of hyperparameters. First, the evaluation of hyperparameters was carried out. Then, all models were compared using nine plots per method, outlining their training history, prediction, and generalisation to other profiles per driving schedule. In deployment testing, all the models were found to operate effectively on low-power devices when making predictions for EV applications. Although it is beyond the scope of this paper, in work published elsewhere, the algorithm performed adequately, indicating that real-time prediction, if not training, could run on onboard hardware. The performance is summarised in a table reporting the MAE, RMSE, and R2 of each method for each profile against the entire dataset of each driving schedule.

4.1. Optimisation of Layers and Neurons

Between one and three layers 'L' and incremental combinations of neurons 'N' from 131 to 1572 were used to determine the optimum set of hyperparameters for all models. Values outside these ranges did not produce worthwhile outputs and were therefore omitted. With multiple layers, the number of neurons was distributed evenly per layer and rounded down to the nearest integer: for example, a three-layer model with 1048 total neurons would give each LSTM or GRU layer only 349. Training simple LSTM models with an Adam optimiser on three current profiles, three times each, resulted in a total of 135 trained models, which could be summarised as 15 different hyperparameter sets for comparison. Table 6 reports the average results of the five best models, based on the lowest average training error across all three profiles. The time in seconds is the duration of training over a single epoch. Online training on a low-power device could be considered an essential factor, sacrificing some accuracy; however, this was not the case in this research. The angle of inclination is a line fit to the average training error over time, from the second epoch until the end of training; it can be determined either by visually examining the average training curve of all attempts or through a simple line fit, where the negative or positive reversed tangent of the angle alpha represents convergence or divergence, and the constant C represents the height of the curve, i.e., the average error. As a result, a model with index 10, with three layers and a total of 131 neurons (43 per layer), was selected as the main hyperparameter combination for all follow-up models. This was also the lowest-memory combination, which served as another criterion in favour of this selection.
With a new set of hyperparameters of 43 neurons per three layers, each model was adapted, as well as an equivalent number of floating point operations per second (FLOPs), which used each model to provide a single output; this is reported in Table 7. The impact of the optimiser was not included in the calculation, nor was the complexity of the training process, due to the significant impact of testing hardware. However, the table was sorted in descending order based on the relative training speed over a single sample, considering the optimiser algorithms’ number of operations, as discussed in Section 2.

4.2. Model Results Overview

The results are presented for all tests on five included models. All the results are illustrated in Figure 14, Figure 15, Figure 16, Figure 17 and Figure 18. The total errors for each model are summarised in Table 8. In total, 150 models were produced, recorded, and evaluated to meet all methodology requirements.
Based on an observation of the table, figures, and overall results, it can be concluded that Models 1–4 achieved strong results considering the complexity of the given task and the quantity of input data, with training errors between 1.58% and 3.37%. All four models showed steady or converging training and testing curves on their datasets in the history plots in subfigures a, d, and g for DST, US06, and FUDS, respectively. Model 5 showed the worst results: it was the simplest to implement but, as a result, the least efficient. With average errors ranging from 3.08% to 4.78%, its history curve showed a clear divergence from the testing curve. Due to the minimal training fitting rate, the model could not achieve an accuracy on the training dataset comparable to the other models. Overall, of all the cases that managed to reach the optimum point, the FUDS dataset showed the best results in capturing complex behaviour, with the best achieved by Model 4, utilising the Robust Adam optimiser. In contrast, the DST-based model generalised well to the behaviour of the other datasets: the error variance between training and testing results in the DST case was generally within 0.05–1.31% for US06 and 0.51–1.68% for FUDS, as opposed to FUDS, which was 3.08–4.16% for DST and 1.31–2.45% for US06.
DST-trained Model 1 showed the best testing results, with an average training and testing error of 2.97%. Model 3 showed the second-best results for the same profile, at 3.16%, a 0.19% difference that can be considered negligible given the number of attempts. Model 4 was the third best, sharing implementation specifics with Model 2, with values of 3.60% and 4.07%, respectively. The same trend can be observed in the training and testing plots, where Models 1, 3, 4, and 2 obtained average percentage errors of 3.245, 3.7, 3.45, and 3.995, respectively.

4.3. Observations and Discussion

Model 1 was based on the simplest and oldest research, conducted by Chemali et al. [11] in 2017. The original implementation used the simplest model structure, with a single layer and no complicated cell modifications or optimiser enhancements. While the original work used one to three temperature ranges, the hyperparameter modification here increased the model’s efficiency by 25% across five different environments, although equally good results were not achieved owing to the different methodologies and research goals. Notably, with the complexity of the input data increased 2–3 times, due to charge-and-discharge training across five temperature ranges, the error only doubled, from Chemali et al.’s [11] 0.77–1.6% to Model 1’s 1.95–3.37%.
Model 2 was an attempt to move from the older LSTM to the more recently developed GRU cell type. Both the Adam and Nadam optimisers by themselves ran for half as many epochs on the GRU cell type as on the LSTM cell but achieved similar accuracies. Embedding an additional optimiser with the same learning rate scheduling for pretraining and fine-tuning doubled or tripled the training time. While Xiao et al. [14] achieved above 99.2% R2 in all their cases, Model 2 obtained a 1% lower score after being trained on both charge and discharge. However, across individual temperature cases, the MAE for US06 was the lowest compared with the other profiles, at 0.63%, while FUDS, based on Table 8, achieved only 2.72%. This significant difference is explained by the doubled quantity of data and the use of temperature variance, rather than exploring individual cases. Nevertheless, the history plots in Figure 15 followed the same pattern, indicating that the pretraining and fine-tuning phases behaved similarly. Lowering the learning rate and switching the optimiser led to a much more stable learning curve but did not significantly improve the prediction results.
Similar to Model 1, Mamo and Wang [12] utilised a single temperature range with one or two profiles of DST, US06, or FUDS. Their use of only the discharge cycle did not allow a direct comparison of results or plots. However, their focus was on improving a simple LSTM to an LSTM-with-attention architecture, reporting an error reduction of 0.076–0.204% for three temperature ranges. Similar improvements could be observed between Models 1 and 3 on the testing data, except for DST, based on overall performance that was up to 0.4% better in fitting the other profiles, especially when measured using US06. A similar pattern could be observed between Figure 14 and Figure 16, with 0.04% and 0.14% for US06 and FUDS, respectively. The absolute-error area fill showed no or fewer spikes in the predictions on the validation data. The FUDS model became much smoother after training with the attention layer but still faced difficulties capturing a charge above 40% in the testing scenarios compared to the DST version, as in Figure 16i. Despite the noticeable difference in final results between Model 3 and the equivalent published version, the DST model’s efficiency in capturing other profiles was the same.
Like the second method, Model 4 applied a different optimiser. The additional complexity compared to the other processing methods yielded a better overall accuracy, as per Table 8, similar to what was reported by Javid et al. [15]. In their results, the error between a simple NN and a robust GRU was lowered by 1.06–3.12% for three individual temperature ranges. A similar pattern could be observed between Models 4 and 5, where the training error averages were in the range of 1.89–2.94%, echoing Javid et al.’s [15] comparison of an NN with unscented and robust Kalman filters, and between Models 2 and 4 in this work. All three models showed better training convergence in a shorter time, as per Figure 17. The figures showed improved capture in the isolated case but worse general capture than the other models.
Model 5 implemented the most straightforward case, to compare the efficiency of the standard Adam and Robust Adam optimisers. Even though it was based on the work of Jiao et al. [13], the purpose was to compare results and validate that the methodology could reach the limit of 100 epochs without breaking. Their work was performed on a battery tester at ambient temperature, without reporting the batteries’ internal properties, which left little possibility for comparing errors. Assuming a single temperature range within 25–30 degrees and discharge-only cycles, an error below 0.01% and an R2 above 99% were difficult to match. However, the ML models’ limitations and the areas of the state-of-charge curve that were harder to capture could be explored. Based on the area plots of the SoC prediction’s absolute error for DST and FUDS (Figure 18), the regions above 30% charge, between 80 and 90% discharge, and below 50% discharge created the highest discrepancies in the prediction. Similar areas were observed for the previously mentioned models, with a lower degree of inaccuracy. These behaviours outlined the effect of temperature on the SoC curves, which caused problems when attempting to characterise all models together. Considering the general divergence of the testing curve and the MAE of the training trend, the results approached 1%, as opposed to the other models, which were generally below that limit. Therefore, it could be assumed that all attempts were less likely to overfit the given data.
Overall, the attention layer of Model 3 performed best in capturing complex behaviours such as the FUDS profile. In contrast, DST was suitable for describing other driving behaviours as a universal solution. As per the research question of this work, the DST-based Model 1 acted as the superior model, as it had the lowest testing error and was simple and lightweight. However, the attention layer’s impact on improving the LSTM’s capturing characteristics is worth considering, as it provides room for further improvement. A combination of all four strategies, used in every model, may lead to far better outcomes in both capturing complex behaviours and generalising to other driving profiles. One of the biggest challenges in bringing these results to a conclusion was devising a consistent methodology that appropriately reflected both the accuracy and repeatability of each method. A testing methodology was developed with a predetermined set of hyperparameters: an average of 10 runs was taken for every model, and testing and training were carried out simultaneously on different computational platforms (such as several GPUs and a multicore CPU, through data splitting, to match the GPU model-testing speed), with multiple threads for different cycles. The training and testing of the proposed models then proceeded in a streamlined way across a large number of models, each producing multiple epochs of results and plots. Some of the most time-consuming challenges in this research were the implementation of undocumented methods, such as the attention layer and Robust Adam, in the context of TensorFlow 2.3; the computational expense of high-performance computers conducting 6–12 simultaneous model training sessions; and the storage of results in a ClickHouse SQL database for efficient access, with linked tables used to produce the final averaged numbers.

5. Conclusions

This work presented several implementations of machine learning algorithms for the state-of-charge estimation of A123 lithium-ion batteries. Several recurrent neural network examples were selected from previously published models, based on the most common and promising structures and optimisers. Five models were investigated and implemented, and their performance was measured and cross-evaluated using three drive cycles at five battery temperatures from 20 to 50 °C. Roughly six thousand samples per charge and discharge cycle profile were resampled to a 1 Hz rate and organised into 500-sample 3D arrays consisting of voltage, current, and temperature, with the corresponding charge percentage as the label. While the methods and methodology covered in this article can be applied to other battery chemistries, as covered in the literature, models trained on one chemistry and tested on another, without retraining on the combined batteries, will not produce the expected results by themselves; the evaluation and comparison of the models’ effectiveness across chemistries was beyond this work’s scope. To compare performance adequately across models and account for the stochastic nature of machine learning, a set of hyperparameters was predetermined through trial-and-error evaluation, and results were averaged over multiple attempts. The inclusion of a learning rate scheduler and a rollback technique to justify early stopping increased the training speed and reduced the probability of the model suffering from early overfitting.
After comparing 135 models with different numbers of layers and neurons, the most accurate and lightweight configuration with a reasonable training time was found to be three layers of 43 neurons each. Another 150 combinations (five models for three driving profiles, repeated ten times each) were then processed through the same training, testing, and performance measurement procedures. This led to the conclusion that a DST-trained simple LSTM network with the Adam optimiser was the best model at both fitting its own profile and generalising to the others. The next best model, which almost matched these results and had better self-capturing ability, was the LSTM network with an attention layer: while the attention layer had a significant impact on capturing complex driving profiles such as FUDS, it failed to characterise the other two profiles. Both models were trained for almost the same number of epochs, passing through multiple learning-rate-reduction steps of the scheduler to reach the lowest achievable optimum. Although the error results were commonly about double those of their published equivalents, with triple the quantity of data and the added complexity of fitting both charge and discharge cycles, the error in battery cycle prediction remained below 5%, and the fitted curves accurately described the state-of-charge behaviour, especially at the critical points of full charge and depletion.
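The averaging over ten training attempts that underpins these comparisons reduces to an element-wise mean over the per-epoch error histories of the repeated runs. A minimal sketch, with function and variable names of our own choosing:

```python
import numpy as np

def average_histories(histories):
    """Average per-epoch error histories across repeated training runs.

    histories: (runs, epochs) array-like of e.g. per-epoch MAE values.
    Returns the per-epoch mean and standard deviation, the quantities
    plotted when collapsing all attempts into a single averaged curve.
    """
    h = np.asarray(histories, dtype=float)
    return h.mean(axis=0), h.std(axis=0)

# Ten mock runs of a five-epoch, monotonically improving history.
rng = np.random.default_rng(1)
runs = 5.0 / (1.0 + np.arange(5)) + 0.05 * rng.standard_normal((10, 5))
mean_curve, std_curve = average_histories(runs)
```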
Although most models provided excellent results, they lacked the accuracy of the time series models observed in similar scenarios. The highest error regions occurred around the middle of the charge range, where the voltage of these lithium-ion cells remains near 3.3 V for most of the time. Since the SoC is fundamentally a function of current, this behaviour suggests that the recurrent neural networks place more emphasis on the voltage feature than on the current. Model 1 was the best model for generalising driving behaviour, leaving little room for improvement. In contrast, Model 3, with an extension to its structure, may provide a vital starting point for future iterations of charge prediction models that utilise the output feature as an input, as time series models tend to do in other scenarios. Overall, this work provides a comparative evaluation of several published methods implemented under the same conditions, which had not been achieved to date. These results allow us to establish a methodology that can be used in further research. However, to overcome the weight distribution issue, further research using a four-feature model, in which the SoC acts as one of the input parameters, is needed to improve machine-learning-based state-of-charge estimation.

Author Contributions

Conceptualisation, M.S., S.H., M.B. and D.W.H.; methodology, M.S. and D.W.H.; software, M.S.; validation, M.S., D.W.H. and G.W.; formal analysis, M.S. and D.W.H.; investigation, M.S. and M.B.; resources, M.S. and S.H.; data curation, M.S.; writing—original draft preparation, M.S.; writing—review and editing, M.S., D.W.H. and G.W.; visualisation, M.S.; supervision, D.W.H. and G.W.; project administration, D.W.H.; funding acquisition, D.W.H. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by the Automotive Engineering Graduate Program (AEGP), grant number AEGP000036, from the Australian Government Department of Industry, Science, Energy and Resources, in cooperation with industry partner Prohelion.

Data Availability Statement

The lithium-ion battery cycling data presented in this study are openly available at the Battery Research Group of the Center for Advanced Life Cycle Engineering (CALCE) Group official repository website, which can be accessed using the following link: https://web.calce.umd.edu/batteries/data.htm, in the section on A123 batteries [19] (accessed on 20 March 2020).

Acknowledgments

All model evaluations were performed with the help of the QUT HDR research and technical staff, who arranged access to a high-performance computer (HPC Lyra) for the extensive initial computations. Finally, we gratefully acknowledge the original developers of the machine learning framework TensorFlow, from Google, who wrote detailed guides and documentation for all the tools used throughout this investigation [37].

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
Adam: Adaptive moment estimation
AdaMax: Adaptive moment estimation based on the infinity norm
EV: Electric vehicle
GRU: Gated recurrent unit
Li-Ion: Lithium-ion battery
LSTM: Long short-term memory
MDPI: Multidisciplinary Digital Publishing Institute
Nadam: Nesterov's adaptive moment estimation
RNNs: Recurrent neural networks
RoAdam: Robust online adaptive moment estimation
SoC: State of charge
SGDw/M: Stochastic gradient descent with momentum
TF: TensorFlow

References

1. Sievewright, B. State of Electric Vehicles. 2019. Available online: https://electricvehiclecouncil.com.au/wp-content/uploads/2019/09/State-of-EVs-in-Australia-2019.pdf (accessed on 7 September 2019).
2. Hoffart, F. Proper care extends Li-ion battery life. Power Electron. Technol. 2008, 34, 24–28.
3. Ali, M.U.; Zafar, A.; Nengroo, S.H.; Hussain, S.; Alvi, M.J.; Kim, H.J. Towards a Smarter Battery Management System for Electric Vehicle Applications: A Critical Review of Lithium-Ion Battery State of Charge Estimation. Energies 2019, 12, 446.
4. Ng, K.S.; Moo, C.S.; Chen, Y.P.; Hsieh, Y.C. Enhanced coulomb counting method for estimating state-of-charge and state-of-health of lithium-ion batteries. Appl. Energy 2009, 86, 1506–1511.
5. Yan, J.; Xu, G.; Qian, H.; Xu, Y. Robust State of Charge Estimation for Hybrid Electric Vehicles: Framework and Algorithms. Energies 2010, 3, 1654–1672.
6. Juang, L.W.; Kollmeyer, P.J.; Zhao, R.; Jahns, T.M.; Lorenz, R.D. The impact of DC bias current on the modeling of lithium iron phosphate and lead-acid batteries observed using electrochemical impedance spectroscopy. In Proceedings of the 2014 IEEE Energy Conversion Congress and Exposition (ECCE), Pittsburgh, PA, USA, 14–18 September 2014; pp. 2575–2581.
7. Malkhandi, S. Fuzzy logic-based learning system and estimation of state-of-charge of lead-acid battery. Eng. Appl. Artif. Intell. 2006, 19, 479–485.
8. Hansen, T.; Wang, C.J. Support vector based battery state of charge estimator. J. Power Sources 2005, 141, 351–358.
9. Anton, J.C.A.; Nieto, P.J.G.; de Cos Juez, F.J.; Lasheras, F.S.; Vega, M.G.; Gutierrez, M.N.R. Battery state-of-charge estimator using the SVM technique. Appl. Math. Model. 2013, 37, 6244–6253.
10. Song, Y.; Li, L.; Peng, Y.; Liu, D. Lithium-Ion Battery Remaining Useful Life Prediction Based on GRU-RNN. In Proceedings of the 2018 12th International Conference on Reliability, Maintainability, and Safety (ICRMS), Shanghai, China, 17–19 October 2018; pp. 317–322.
11. Chemali, E.; Kollmeyer, P.J.; Preindl, M.; Ahmed, R.; Emadi, A. Long Short-Term Memory Networks for Accurate State-of-Charge Estimation of Li-ion Batteries. IEEE Trans. Ind. Electron. 2018, 65, 6730–6739.
12. Mamo, T.; Wang, F. Long Short-Term Memory with Attention Mechanism for State of Charge Estimation of Lithium-Ion Batteries. IEEE Access 2020, 8, 94140–94151.
13. Jiao, M.; Wang, D.; Qiu, J. A GRU-RNN based momentum optimized algorithm for SOC estimation. J. Power Sources 2020, 459, 228051.
14. Xiao, B.; Liu, Y.; Xiao, B. Accurate State-of-Charge Estimation Approach for Lithium-Ion Batteries by Gated Recurrent Unit with Ensemble Optimizer. IEEE Access 2019, 7, 54192–54202.
15. Javid, G.; Basset, M.; Abdeslam, D.O. Adaptive Online Gated Recurrent Unit for Lithium-Ion Battery SOC Estimation. In Proceedings of IECON 2020, the 46th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 18–21 October 2020; pp. 3583–3587.
16. Zhang, W.; Li, X.; Li, X. Deep Learning-Based Prognostic Approach for Lithium-ion Batteries with Adaptive Time-Series Prediction and On-Line Validation. Measurement 2020, 164, 108052.
17. Smith, A.J.; Burns, J.C.; Dahn, J.R. A High Precision Study of the Coulombic Efficiency of Li-Ion Batteries. Electrochem. Solid-State Lett. 2010, 13, A177.
18. He, W.; Williard, N.; Chen, C.; Pecht, M. State of charge estimation for Li-ion batteries using neural network modeling and unscented Kalman filter-based error cancellation. Int. J. Electr. Power Energy Syst. 2014, 62, 783–791.
19. CALCE Battery Research Group. 2017. Available online: https://web.calce.umd.edu/batteries/data.htm#A123 (accessed on 20 March 2020).
20. Amidi, A.; Amidi, S. CS 230—Recurrent Neural Networks Cheatsheet. 2018. Available online: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks (accessed on 11 June 2021).
21. Rasifaghihi, N. Predictive Analytics: LSTM, GRU and Bidirectional LSTM in TensorFlow; Towards Data Science, 2020. Available online: https://towardsdatascience.com/predictive-analysis-rnn-lstm-and-gru-to-predict-water-consumption-e6bb3c2b4b02 (accessed on 6 March 2021).
22. Hochreiter, S. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 1998, 6, 107–116.
23. Cho, K.; van Merrienboer, B.; Bahdanau, D.; Bengio, Y. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. arXiv 2014, arXiv:1409.1259.
24. Li, C.; Xiao, F.; Fan, Y. An Approach to State of Charge Estimation of Lithium-Ion Batteries Based on Recurrent Neural Networks with Gated Recurrent Unit. Energies 2019, 12, 1592.
25. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
26. Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489.
27. Winata, G.I.; Kampman, O.P.; Fung, P. Attention-Based LSTM for Psychological Stress Detection from Spoken Language Using Distant Supervision. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 6204–6208.
28. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980.
29. Reddi, S.J.; Kale, S.; Kumar, S. On the Convergence of Adam and Beyond. arXiv 2019, arXiv:1904.09237.
30. Wilson, A.C.; Roelofs, R.; Stern, M.; Srebro, N.; Recht, B. The Marginal Value of Adaptive Gradient Methods in Machine Learning. arXiv 2017, arXiv:1705.08292.
31. Dozat, T. Incorporating Nesterov Momentum into Adam. In Proceedings of the ICLR 2016 Workshop, San Juan, Puerto Rico, 2–4 May 2016; p. 4.
32. Zhu, S.; Chollet, F. Recurrent Neural Networks (RNN) with Keras: Cross-Batch Statefulness; TensorFlow Core, 2020. Available online: https://www.tensorflow.org/guide/keras/rnn#cross-batch_statefulness (accessed on 11 June 2021).
33. Xing, Y.; He, W.; Pecht, M.; Tsui, K.L. State of charge estimation of lithium-ion batteries using the open-circuit voltage at various ambient temperatures. Appl. Energy 2014, 113, 106–115.
34. A123 Systems. Nanophosphate® High Power Lithium Ion Cell ANR26650M1-A. 2011. Available online: https://www.buya123products.com/uploads/vipcase/844c1bd8bdd1190ebb364d572bc1e6e7.pdf (accessed on 30 November 2021).
35. Chollet, F. Writing a Training Loop from Scratch; TensorFlow Core, 2020. Available online: https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch (accessed on 11 June 2021).
36. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; Adaptive Computation and Machine Learning; The MIT Press: Cambridge, MA, USA, 2016.
37. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: tensorflow.org (accessed on 11 June 2021).
Figure 1. Universal structure of RNN for SoC estimation.
Figure 2. Structure of a gated recurrent unit (GRU) cell [24].
Figure 3. Structure of a long short-term memory unit cell [25].
Figure 4. Attention-based architecture.
Figure 5. SoC estimation accuracy of LSTM-RNN with various network depths in time obtained by Chemali et al. [11] in a plot representation.
Figure 6. Data windowing scheme at a 1 Hz sampling rate. For visualisation purposes, the s-step was 250 s, which was different from the actual implementation. The initial index i was kept as close to the beginning of the data as possible, at around zero.
Figure 7. Cell current of three battery testing profiles, emulating a constant-current–constant-voltage charge and regenerative discharge until cells reached top or bottom cutoffs. (a) Dynamic stress test (DST) of 20 repeated subcycles; (b) highway driving schedule (US06) of 12 repeated subcycles; (c) federal urban driving schedule (FUDS) of 6 repeated subcycles.
Figure 8. Three profiles' cross-validated data, split for training, validation, and testing in a simplistic SoC cycle representation under different temperatures.
Figure 9. Accuracy plot demonstration.
Figure 10. Learning rate degradation.
Figure 11. Training process comparison with and without a recovery algorithm. (a) Model training process with no rollbacks; (b) model training process with a rollback if the current error is higher than the previous error.
Figure 12. History results averaging demonstration. (a) Single-model history for training and testing; (b) all 10 attempts' histories for training and testing; (c) average of 10 attempts' histories for training and testing.
Figure 13. State-of-charge results averaging demonstration. (a) Single-model SoC prediction for training; (b) all 10 attempts at SoC prediction on a single plot; (c) average of 10 attempts at SoC prediction for training.
Figure 14. Model 1: stateless LSTM. (a) Average training and testing MAE history; average of 10 attempts. (b) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (c) Testing on two cycles of US06 and FUDS profiles; average of 10 attempts. (d) Average training and testing MAE history; average of 10 attempts. (e) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (f) Testing on two cycles of US06 and FUDS profiles; average of 10 attempts. (g) Average training and testing MAE history; average of 10 attempts. (h) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (i) Testing on two cycles of DST and US06 profiles; average of 10 attempts.
Figure 15. Model 2: stateless GRU with ensemble. (a) Average training and testing MAE history; average of 10 attempts. (b) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (c) Testing on two cycles of US06 and FUDS profiles; average of 10 attempts. (d) Average training and testing MAE history; average of 10 attempts. (e) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (f) Testing on two cycles of US06 and FUDS profiles; average of 10 attempts. (g) Average training and testing MAE history; average of 10 attempts. (h) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (i) Testing on two cycles of DST and US06 profiles; average of 10 attempts.
Figure 16. Model 3: stateless LSTM with attention. (a) Average training and testing MAE history; average of 10 attempts. (b) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (c) Testing on two cycles of US06 and FUDS profiles; average of 10 attempts. (d) Average training and testing MAE history; average of 10 attempts. (e) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (f) Testing on two cycles of DST and FUDS profiles; average of 10 attempts. (g) Average training and testing MAE history; average of 10 attempts. (h) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (i) Testing on two cycles of DST and US06 profiles; average of 10 attempts.
Figure 17. Model 4: stateless GRU with Robust optimiser. (a) Average training and testing MAE history; average of 10 attempts. (b) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (c) Testing on two cycles of US06 and FUDS profiles; average of 10 attempts. (d) Average training and testing MAE history; average of 10 attempts. (e) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (f) Testing on two cycles of DST and FUDS profiles; average of 10 attempts. (g) Average training and testing MAE history; average of 10 attempts. (h) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (i) Testing on two cycles of DST and US06 profiles; average of 10 attempts.
Figure 18. Model 5: stateless LSTM with the simplest optimiser. (a) Average training and testing MAE history; average of 10 attempts with a higher Y-axis. (b) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (c) Testing on two cycles of US06 and FUDS profiles; average of 10 attempts. (d) Average training and testing MAE history; average of 10 attempts. (e) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (f) Testing on two cycles of DST and FUDS profiles; average of 10 attempts. (g) Average training and testing MAE history; average of 10 attempts. (h) Validation on a single cycle of SoC estimation; average of 10 attempts at 25 °C. (i) Testing on two cycles of DST and US06 profiles; average of 10 attempts.
Table 1. Summary of evaluated papers' implementations. The model type (GRU or LSTM) highlights a primary path to structuring a neural network. Statefulness defines the input method: stateless models use a fixed size of input samples per feature, whereas stateful models apply each time sample individually, in batches. Optimisers considered were adaptive moment estimation (Adam), Nesterov adaptive moment estimation (Nadam), stochastic gradient descent (SGD), AdaMax, and differential evolution (DE).
Reference Source | Extension
Song et al. [10] | 4 layers
Chemali et al. [11] | (none)
Mamo and Wang [12] | Attention
Jiao et al. [13] | Momentum
Xiao et al. [14] | Ensemble
Javid et al. [15] | Robust
Zhang et al. [16] | Online
Table 2. Five ML models were tested. Type highlights the RNN structure used in the cells. Optimiser was based on the derivative calculation algorithms only.
# | Type | Optimiser
1 | LSTM | Adam
2 | GRU | Ensemble (Nadam and AdaMax)
3 | LSTM + Attention | Adam
4 | GRU | RoAdam
5 | LSTM | SGDw/M
Table 3. Battery characteristics.
Brand name: A123 Systems (2012)
Cell chemistry: LiFePO4
Cell type: ANR26650M1-A
Battery weight: 70 g ± 2 g
Nominal capacity: 2.3 Ah
Charge/discharge cut-off voltage: 3.6 V / 2.0 V
Table 4. Model's metrics functions.
Mean absolute error: $\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N} \left| SoC_t - \widehat{SoC}_t \right|$
Root-mean-square error: $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N} \left( SoC_t - \widehat{SoC}_t \right)^2}$
$R^2$ ($M_{SoC}$ is the mean SoC): $R^2 = 1 - \frac{\sum_{t=1}^{N} \left( SoC_t - \widehat{SoC}_t \right)^2}{\sum_{t=1}^{N} \left( SoC_t - M_{SoC} \right)^2}$
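The three metrics in Table 4 can be computed directly from the true and estimated SoC series. A minimal NumPy sketch (the function names are our own):

```python
import numpy as np

def mae(soc, soc_hat):
    """Mean absolute error between true and estimated SoC."""
    return float(np.mean(np.abs(np.asarray(soc) - np.asarray(soc_hat))))

def rmse(soc, soc_hat):
    """Root-mean-square error; penalises large deviations more than MAE."""
    return float(np.sqrt(np.mean((np.asarray(soc) - np.asarray(soc_hat)) ** 2)))

def r2(soc, soc_hat):
    """Coefficient of determination against the mean-SoC baseline."""
    soc, soc_hat = np.asarray(soc, dtype=float), np.asarray(soc_hat, dtype=float)
    ss_res = np.sum((soc - soc_hat) ** 2)   # residual sum of squares
    ss_tot = np.sum((soc - soc.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)
```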
Table 5. Optimiser's hyperparameters.
$\alpha$: linear scheduler (0.001–0.0001); $\beta_1 = 0.9$; $\beta_2 = 0.999$; $\beta_3 = 0.999$; $\epsilon = 10^{-8}$
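The linear learning-rate schedule in Table 5, decaying $\alpha$ from 0.001 to 0.0001, can be expressed as a per-epoch function; the total epoch count here is an illustrative assumption.

```python
def linear_lr(epoch, total_epochs=100, lr_start=1e-3, lr_end=1e-4):
    """Linearly interpolate the learning rate from lr_start down to lr_end.

    At epoch 0 the rate is lr_start; by the final epoch it reaches lr_end,
    mirroring the (0.001-0.0001) scheduler row of Table 5, and it is then
    held at lr_end for any further epochs.
    """
    frac = epoch / max(total_epochs - 1, 1)
    return lr_start + (lr_end - lr_start) * min(frac, 1.0)
```

In Keras this could be attached to training via a callback, e.g. `tf.keras.callbacks.LearningRateScheduler(lambda epoch, _lr: linear_lr(epoch))`.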
Table 6. Hyperparameters' selection, sorted by average MAE.
Index | N/L | Size (MB) | Time (s) | Inclination | avg. MAE
10 | 131/3 | 0.17 | 2112.38 | converges | 2.5137
11 | 262/3 | 0.63 | 2304.04 | converges | 2.8515
6 | 262/2 | 0.85 | 1670.61 | converges | 2.8789
5 | 131/2 | 0.22 | 1429.47 | converges | 3.0303
7 | 524/2 | 3.33 | 1990.49 | diverges | 3.0303
Table 7. Number of floating-point operations (FLOPs) for each model, with 3 layers and 131 neurons (43 per layer), and hyperparameters. The optimiser column serves as a reference.
# | Type | FLOPs | Optimiser
2 | GRU | 102,127 | Ensemble (Nadam and AdaMax)
4 | GRU | 102,127 | RoAdam
5 | LSTM | 120,494 | SGDw/M
1 | LSTM | 120,494 | Adam
3 | LSTM + Attention | 207,450 | Adam
Table 8. Accuracy results summary for entire training datasets. Each entry gives MSE (%) / RMSE (%) / R² (%) for the tested profile.
# | Trained | Tested: DST | Tested: US06 | Tested: FUDS
1 | DST | 2.77 / 3.52 / 98.71 | 2.86 / 3.93 / 98.34 | 3.28 / 4.62 / 97.66
1 | US06 | 5.97 / 7.97 / 93.39 | 3.37 / 4.14 / 98.15 | 5.38 / 6.93 / 94.73
1 | FUDS | 5.03 / 7.26 / 94.51 | 4.02 / 6.07 / 96.04 | 1.95 / 2.85 / 99.11
2 | DST | 3.08 / 3.89 / 98.42 | 4.39 / 5.97 / 96.17 | 4.76 / 6.36 / 95.57
2 | US06 | 6.38 / 8.72 / 92.07 | 3.45 / 4.13 / 98.16 | 5.67 / 7.07 / 94.53
2 | FUDS | 6.74 / 8.66 / 92.18 | 4.03 / 5.87 / 96.30 | 2.72 / 3.67 / 98.52
3 | DST | 2.86 / 3.60 / 98.65 | 2.91 / 3.79 / 98.46 | 3.73 / 5.18 / 97.06
3 | US06 | 5.98 / 8.26 / 92.90 | 3.35 / 4.11 / 98.19 | 5.27 / 6.84 / 94.87
3 | FUDS | 5.33 / 7.25 / 94.53 | 3.61 / 5.53 / 96.71 | 1.82 / 2.51 / 99.31
4 | DST | 2.89 / 3.61 / 98.65 | 3.82 / 5.38 / 96.88 | 4.11 / 5.51 / 96.67
4 | US06 | 6.19 / 8.57 / 92.35 | 3.30 / 4.12 / 98.17 | 5.42 / 6.82 / 94.91
4 | FUDS | 5.74 / 7.49 / 94.16 | 4.03 / 5.75 / 96.44 | 1.58 / 2.28 / 99.43
5 | DST | 4.78 / 9.23 / 91.12 | 5.03 / 7.54 / 93.89 | 4.05 / 7.01 / 94.62
5 | US06 | 5.93 / 7.92 / 93.46 | 3.08 / 4.42 / 97.90 | 3.02 / 3.84 / 98.39
5 | FUDS | 7.26 / 10.82 / 87.79 | 5.17 / 8.64 / 91.96 | 4.52 / 6.82 / 94.90
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sadykov, M.; Haines, S.; Broadmeadow, M.; Walker, G.; Holmes, D.W. Practical Evaluation of Lithium-Ion Battery State-of-Charge Estimation Using Time-Series Machine Learning for Electric Vehicles. Energies 2023, 16, 1628. https://doi.org/10.3390/en16041628


