Multiple CNN Variants and Ensemble Learning for Sunspot Group Classification by Magnetic Type


Published 2021 November 24 © 2021. The American Astronomical Society. All rights reserved.
Citation: Rongxin Tang et al. 2021 ApJS 257 38. DOI: 10.3847/1538-4365/ac249f


Abstract

A solar active region is a source of disturbance for the Sun–terrestrial space environment and often causes extreme space weather, such as geomagnetic storms. The main indicator of an active region is sunspots. Certain types of sunspots are related to extreme space weather driven by eruptive events such as coronal mass ejections or solar flares. Thus, the automatic classification of sunspot groups helps to predict solar activity quickly and accurately. This paper presents the automatic classification of a sunspot group data set based on the Mount Wilson classification scheme, which contains continuum and magnetogram images from the Solar Dynamics Observatory's Helioseismic and Magnetic Imager SHARP data from 2010 May 1 to 2017 December 12. After data preprocessing steps such as image cropping and data standardization, the magnetic-type features in the data become more obvious and the amount of data is increased. The processed data are spliced into two frames of single-channel data so that the neural network can perform 3D convolution operations. This paper constructs a variety of convolutional neural networks with different structures and numbers of layers, selects 10 models as representatives, and uses XGBoost, a commonly used ensemble-learning algorithm, to fuse the results of the independent classification models. We find that XGBoost is an effective way to fuse models, as shown by the relatively balanced high scores across the three magnetic types. The accuracy of the ensemble model is above 92%, and the F1 scores for the Alpha, Beta, and Beta-x magnetic types reach 0.95, 0.91, and 0.82, respectively.


1. Introduction

During the continuous exploration of the Sun, many eruptive solar events have been discovered, such as solar flares and coronal mass ejections, which can cause changes in space weather. According to Baker (2002) and Li et al. (2020), severe space weather disturbances cause harm to many systems, such as satellites, power facilities, and ground short-wave radio communication systems, resulting in massive economic losses. Accurate space weather forecasts allow us to prepare in advance and weaken the impact of severe space weather on humans' social, economic, and industrial activities. Solar flares are a kind of intense solar activity that releases large amounts of high-energy radiation, among other phenomena, and they mainly erupt in the atmosphere above sunspots. Relevant studies (Lee et al. 2012; McCloskey et al. 2016; Eren et al. 2017) have shown that solar flares are related to the area, number, McIntosh type, and magnetic-type characteristics of sunspot groups. Therefore, real-time identification of the type of sunspot groups is an effective way to predict flare outbreaks.

Solar physicists have proposed many classification schemes for sunspot groups. Cortie (1901) established a sunspot classification scheme based on the shape of sunspot groups. The Zurich sunspot group classification scheme was later developed by Waldmeier (1954) by improving Cortie's scheme. Unfortunately, the correlation between these early classification schemes and solar activity parameters, i.e., solar flares, was not strong enough for prediction, which motivated the search for new classification schemes that could improve flare prediction. The McIntosh (1990) classification scheme is based on three parameters, through which the correlation between sunspot groups and other solar activities can be better characterized. In addition, the magnetic Mount Wilson sunspot group classification scheme is also commonly used. The classification of sunspot groups in this paper is a simplified version of the Mount Wilson classification scheme.

In recent years, an increasing number of space missions have led to a rapid accumulation of solar activity data. With the development of automatic sunspot recognition technology, an increasing number of studies have tried to apply machine-learning methods to the automatic classification of sunspot magnetic types. Nguyen et al. (2004) used two data-mining tools, WEKA and RSES, and several data-mining algorithms on solar image data obtained by the Michelson Doppler Imager (MDI) instrument of NASA's Solar and Heliospheric Observatory satellite, so that the machine learned seven types of sunspots and improved the Zurich classification scheme. Zharkova et al. (2005) summarized the existing manual and semiautomatic feature recognition techniques for multiple solar features, including sunspots. Colak & Qahwaji (2008) proposed a fast hybrid system that uses active-region data extracted from MDI magnetogram images to automatically detect sunspot groups on the MDI continuum image and perform McIntosh classification of sunspot groups. Abd et al. (2010) used a support-vector machine as an effective classification tool to realize the automatic Zurich classification of sunspot groups on full-disk white-light solar images. Stenning et al. (2012) used image features of sunspot groups extracted by image morphology techniques in an automatic classification scheme. Fang et al. (2019) built three types of convolutional neural networks (CNNs) to identify predefined magnetic types of sunspot groups, where the predefined types are a modification of the Mount Wilson classification scheme. Deep-learning technology has developed rapidly in the field of image recognition and has obvious advantages in image processing and analysis. Since deep learning can automatically learn many different kinds of image features from big data sets, integrating these features can achieve a better classification of sunspot groups in images.
In many projects, artificial intelligence (AI) can approximate or vastly exceed human ability, and many AI data-processing methods have reached the level of real-time processing. For example, in image classification tasks, AI can process hundreds of frames per second, which is significantly faster than humans and often more accurate.

In this paper, an artificial neural network is used as the main AI method. First, artificial neural networks with different structures and different numbers of layers are constructed to classify the sunspot groups, and then extreme gradient boosting (XGBoost) is used to fuse the results of the multiple models.

2. Data

2.1. Data Set

In this paper, the data set of sunspot group magnetic types was provided by the Space Environment Warning and AI Technology Interdisciplinary Innovation Working Group (Fang et al. 2019). The classification scheme of this data set is based on the Mount Wilson sunspot magnetic classification system, which classifies the magnetic characteristics of sunspots according to the rules of the Mount Wilson Observatory in California. The data set provided by Fang et al. spans 2010 May 1 to 2017 December 12 and contains continuum image data, magnetogram image data, and the corresponding magnetic types. The continuum and magnetogram images come from the Spaceweather HMI Active Region Patch (SHARP) data provided by the Solar Dynamics Observatory's Helioseismic and Magnetic Imager (HMI; Pesnell et al. 2012; Scherrer et al. 2012). SHARPs are calculated every 12 minutes; however, in order to ensure sufficient change between neighboring images, the data set samples the SHARP data every 96 minutes (Fang et al. 2019). The continuum image and magnetogram image of a sunspot group at the same time correspond to the same area of the Sun.

According to the statistics of Fang et al. (2019), from 2010 May to 2017 May, three Mount Wilson sunspot magnetic types (i.e., Gamma, Delta, and Gamma–Delta) were not reported in the solar region summary (SRS) text files, and, compared with Alpha and Beta, the Beta–Gamma, Beta–Delta, and Beta–Gamma–Delta types occur very rarely. To avoid an extremely imbalanced data distribution, this data set combines Beta–Gamma, Beta–Delta, and Beta–Gamma–Delta sunspots into a single Beta-x type. Their distributions are shown in Table 1.

Table 1. Simplified Mount Wilson Sunspot Magnetic Classification Scheme in the Data Set

Mount Wilson Sunspot Magnetic Type | Type in the Data Set | Total (2010/5/1–2017/12/12) | Training Set and Validation Set (2010/5/1–2016/1/1) | Test Set (2016/1/2–2017/12/12)
Alpha | Alpha | 5276 | 4709 | 567
Beta | Beta | 7849 | 7353 | 496
Beta–Gamma | Beta-x | 1759 | 2407 (all Beta-x combined) | 109 (all Beta-x combined)
Beta–Delta | Beta-x | 48 | |
Beta–Gamma–Delta | Beta-x | 709 | |

Note. This data set can be obtained through the following link: https://tianchi.aliyun.com/dataset/dataDetail?dataId=74779.


The description of the magnetic types of Mount Wilson sunspots is shown below.

  • (i)  
    Alpha: a unipolar sunspot group.
  • (ii)  
    Beta: a sunspot group having both positive and negative magnetic polarities (bipolar), with a simple and distinct division between the polarities.
  • (iii)  
    Gamma: a complex active region in which the positive and negative polarities are so irregularly distributed that classification as a bipolar group is prevented.
  • (iv)  
    Delta: a qualifier to magnetic classes indicating that umbrae separated by less than 2° within one penumbra have opposite polarities.
  • (v)  
    Beta–Gamma: a sunspot group that is bipolar but sufficiently complex such that no single, continuous line can be drawn between spots of opposite polarities.
  • (vi)  
    Beta–Delta: a sunspot group of general Beta magnetic classification but containing one (or more) Delta spot(s).
  • (vii)  
    Beta–Gamma–Delta: a sunspot group of Beta–Gamma magnetic classification but containing one (or more) Delta spot(s).
  • (viii)  
    Gamma–Delta: a Gamma magnetic classification sunspot group but containing one (or more) Delta spot(s).

2.2. Data Augmentation

It is well known that the performance of deep-learning algorithms improves as the amount of data increases. Sun et al. (2017) used deep-learning technology to classify 300 million images and found that model performance increased logarithmically with the amount of training data. However, the amount of solar observation data is usually very limited for deep learning. On this data set, without data augmentation our models quickly overfit the training data, fail to learn generalizable knowledge from the training set, and perform very poorly on the validation set. Thus, data augmentation is an important technique for solving this problem.

Data augmentation is a regularization technique that is mainly applied to training data. It uses a series of methods, e.g., translation, rotation, and cropping, to randomly change the training data (Figure 1). The detailed transformation amplitude of the data augmentation needs to be designed according to the data for the specific application. Note that applying these simple transformations does not change the label of the input image. Each image obtained through augmentation can be considered a "new" image. In this way, we can continuously provide new training samples to the model so that it can learn more discriminative and generalized features.


Figure 1. Data augmentation techniques for the sunspot group data.


In this paper, two data augmentation techniques are adopted. The first technique is image rotation, which is an effective way to enhance the continuum and magnetogram images in the data set. In Figure 1(a), the upper row shows continuum images, and the lower row shows magnetogram images. We obtain many kinds of image data by random rotation. The second data augmentation technique is image cropping, which is used to extract the features of sunspot groups. The active region of the sunspot groups needs to be further cropped according to the continuum images. The sunspot group region is cropped by comparing the absolute difference between the minimum pixel value along each row and column of the continuum image and a chosen threshold. In this way, we can filter out plenty of noise outside the solar active region in this classification task. As shown in Figure 1(b), the original continuum images on the left can be transformed into the selected main target on the right by this processing technique. Through these operations, the data of the same sunspot group input to the neural network are different each time.
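A minimal sketch of these two augmentation steps is given below; the threshold value and the use of the image maximum as the background estimate are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def random_rotate(cont, mag, rng):
    # Rotate both frames by the same random multiple of 90 degrees,
    # so the continuum/magnetogram pair stays aligned.
    k = int(rng.integers(0, 4))
    return np.rot90(cont, k), np.rot90(mag, k)

def crop_active_region(cont, threshold=0.2):
    # Keep only rows/columns whose darkest pixel deviates from the
    # (bright) background by more than `threshold` (hypothetical value).
    background = cont.max()
    rows = np.where(np.abs(cont.min(axis=1) - background) > threshold)[0]
    cols = np.where(np.abs(cont.min(axis=0) - background) > threshold)[0]
    if rows.size == 0 or cols.size == 0:
        return cont  # nothing above threshold: leave the image as-is
    return cont[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
```

Applying `crop_active_region` to a bright frame containing a dark sunspot patch returns just the bounding box of that patch, which is the behavior described above.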

2.3. Data Standardization

Before applying the neural network, all the continuum and magnetogram images were resized to 64 × 64 or 128 × 128 pixels, and the continuum image data were standardized with a MinMaxScaler according to the following formula:

Equation (1): c_out = (max − c_i) / (max − min)

In Equation (1), c_i represents the original input data of the continuum image; c_out represents the normalized continuum data; and max and min represent the maximum and minimum values of the current sunspot group continuum data. The normalized result is between 0 and 1. This method has the advantage of bringing the background outside the sunspot closer to 0 rather than 1.

The magnetogram image data are standardized according to the following formula:

Equation (2): m_out = HardTanh(m_in / max)

HardTanh is an activation function commonly used in machine learning:

Equation (3): HardTanh(x) = −1 if x < −1; x if −1 ≤ x ≤ 1; 1 if x > 1

where m_in and m_out represent the original and processed pixel values of the magnetogram image data, respectively. The value of max is fixed at 800, identical to that used by Fang et al. (2019). The normalized result is between −1 and 1. This approach has the advantage of preserving the sign (polarity) of the original data, and it makes the image features more obvious.
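The two standardization schemes of Equations (1)–(3) can be sketched as follows; the inverted min–max form for the continuum is inferred from the stated goal of pushing the background toward 0, so treat it as an assumption rather than the paper's exact code.

```python
import numpy as np

def standardize_continuum(c):
    # Inverted min-max scaling (Eq. (1)): bright background -> ~0,
    # dark sunspot pixels -> ~1, output in [0, 1].
    return (c.max() - c) / (c.max() - c.min())

def standardize_magnetogram(m, m_max=800.0):
    # Eqs. (2)-(3): divide by the fixed maximum of 800, then apply
    # HardTanh, i.e. clip to [-1, 1]; the polarity sign is preserved.
    return np.clip(m / m_max, -1.0, 1.0)
```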

According to McCloskey et al. (2016), a sunspot group does not evolve quickly within 24 hr, and NOAA Space Weather Prediction Center SRS data is only released once a day. Considering that the time interval between the two files of this observation data set is only 96 minutes, the data from two consecutive samples of the same sunspot group are very similar. Therefore, we divide the data set according to time instead of simply using random division. We use 1172 files after 2016 January 1 to form the test set, and the remaining data set is divided into five parts according to the time distribution, one of which is used as the validation set. In this way, fivefold cross validation can be used to adjust model parameters and the final model can be evaluated objectively.
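A minimal sketch of this chronological splitting scheme is shown below; the dictionary field name `"time"` is a hypothetical representation of the per-file timestamp.

```python
from datetime import datetime

def chronological_split(samples, test_start, k=5):
    # Sort by observation time, hold out everything on/after
    # `test_start` as the test set, and cut the earlier data into
    # k contiguous folds for cross validation, so that neighboring
    # (very similar) samples never straddle the train/test boundary.
    samples = sorted(samples, key=lambda s: s["time"])
    train_val = [s for s in samples if s["time"] < test_start]
    test_set = [s for s in samples if s["time"] >= test_start]
    n = len(train_val)
    folds = [train_val[i * n // k:(i + 1) * n // k] for i in range(k)]
    return folds, test_set
```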

3. Methods

3.1. Independent Classification Algorithm

In this paper, a neural network based mainly on a CNN is built, with residual network (ResNet) modules or depthwise separable convolution modules added. The classification of sunspot groups is predicted by constructing different types of neural networks with different numbers of layers. The schematic of the structure of the independent classification model is shown in Figure 2(a).


Figure 2. Schematics of the structure of the independent model and its components. In part (a), we use CNN, ResNet, or Depthwise separable convolution one to three times to achieve feature extraction (① ∼ ⑤). The blue and green in the feature map represent the continuum image feature maps and the magnetogram feature maps, and the red represents the combined features of the continuum and magnetogram.


3.1.1. CNNs

LeNet (Lecun et al. 1998), AlexNet (Krizhevsky et al. 2017), and VGG (Simonyan & Zisserman 2014) are several classic CNNs. A complete CNN usually includes the following layers:

1. Convolutional layer. The convolutional layer is what distinguishes a CNN from a plain deep neural network. Convolutional layers make neural networks more effective at extracting information from data such as images: they make it easy to extract local features, which are difficult for humans to construct manually. In practice, the convolution operation in the convolutional layer is implemented as cross correlation, which avoids flipping the convolution kernel without affecting the effectiveness of the layer. The operation of the convolutional layer is shown in Figure 2(b).

2. Nonlinear layer. The nonlinear layer applies a nonlinear activation function. If a neural network does not introduce a nonlinear part, then no matter how many layers it has, it performs only linear operations, and the final result is a linear combination of the inputs; such a network cannot fit nonlinear functions. Because most mappings of interest from input to output are nonlinear, adding nonlinear layers to the neural network is essential. The activation function commonly used in CNNs is the rectified linear unit (ReLU). Its expression is:

Equation (4): ReLU(x) = max(0, x)

where x represents the input of the function. ReLU is simple to implement, but it has the shortcoming that, when x is negative, the derivative is zero, so the corresponding parameters of the neural network cannot be updated. LeakyReLU overcomes this shortcoming. Its expression is:

Equation (5): LeakyReLU(x) = x if x > 0; αx if x ≤ 0

Here, α is a small hyperparameter. In this paper, LeakyReLU is used as the activation function, and α is set to 0.01.

3. Pooling layer. The pooling layer performs downsampling operations to reduce the feature space of the feature map and retain the important information extracted by the convolutional layer. The main pooling operations include maximum pooling, average pooling, and so on. The pooling layer increases the translation invariance of the CNN and improves the generalization ability of the model.

4. Fully connected layer. The fully connected layer connects the feature-map data extracted by all neurons in the previous layer and completes the conversion to labels by integrating all the extracted features. For classification tasks, the network output is finally passed through a softmax function to infer the class probabilities.
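As a quick illustration, the ReLU and LeakyReLU activations of Equations (4) and (5) can be written as:

```python
import numpy as np

def relu(x):
    # Eq. (4): zero for negative inputs, identity for positive ones
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    # Eq. (5): a small slope alpha on the negative side keeps the
    # gradient nonzero so the corresponding weights can still update
    return np.where(x > 0, x, alpha * x)
```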

CNNs combining the above structures can better capture the spatial information of data, so the basic structure of a CNN can often be seen in tasks that process spatial features. For example, the model used by Chen et al. (2019) to complete spatial maps of the total electron content in the ionosphere includes these structures. In this experiment, a neural network whose input is two frames of single-channel image data is designed. A 3D convolution has one more dimension than a 2D convolution and can represent richer information semantically. We choose 3D convolution because we want to treat the input as two images semantically, rather than as two channels of one image. In addition, 3D convolution makes it easy to control whether the continuum and magnetogram feature maps are merged.
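To illustrate how the two single-channel frames can be treated as one volume, here is a minimal NumPy cross-correlation sketch of the 3D operation (a toy single-kernel example, not the paper's actual network code):

```python
import numpy as np

def conv3d_valid(volume, kernel):
    # Minimal single-kernel 3D cross correlation with "valid" padding,
    # the basic operation a 3D convolutional layer applies.
    d, h, w = kernel.shape
    D, H, W = volume.shape
    out = np.empty((D - d + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i + d, j:j + h, k:k + w] * kernel)
    return out

# The continuum and magnetogram frames are stacked as two depth
# slices of a single volume, so a kernel of depth 2 sees both frames.
cont = np.ones((8, 8))
mag = -np.ones((8, 8))
volume = np.stack([cont, mag])            # shape (2, 8, 8)
feat = conv3d_valid(volume, np.ones((2, 3, 3)))
```

A kernel of depth 1 would instead process each frame separately, which is the control over merging mentioned above.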

3.1.2. ResNet

A deeper network normally has a stronger representation ability. But as shown by He et al. (2016), the deeper the network, the more difficult it is to optimize, which leads to the degradation problem: as the number of layers increases, the training loss gradually decreases and then saturates, and when the depth is increased further, the training loss increases instead. Note that this is not overfitting, because during overfitting the training loss keeps decreasing. The ResNet designed by He et al. (2016) alleviates this problem. It mainly comprises residual blocks with shortcut connections, which are equivalent to adding direct connection channels to the network. In such a case the network has a stronger identity-mapping ability, expanded depth, and improved performance. With the direct mapping added in ResNet, the neural network changes from directly fitting H(x) to fitting a residual function F(x) = H(x) − x. Although in theory both can reach an approximate fit, the latter is easier to learn, thus avoiding the degradation problem. The residual block structure is shown in Figure 2(c).
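A minimal numerical sketch of a residual block (a toy two-layer linear block with a shortcut, not the paper's architecture):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # y = F(x) + x: the shortcut adds the input back, so the stacked
    # layers only have to learn the residual F(x) = H(x) - x.
    return relu(x @ w1) @ w2 + x

# With zero weights the block reduces exactly to the identity map,
# which is why very deep stacks of residual blocks remain trainable.
```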

3.1.3. Depthwise Separable Convolution

Depthwise separable convolution is another very effective module in neural networks. Compared with conventional convolutions, it reduces the number of parameters and improves speed to a certain extent. Separable convolution is described in Xception by Chollet et al. (2017) and MobileNet by Howard et al. (2017). Its core idea is to decompose a complete convolution into two steps, namely depthwise convolution and pointwise convolution. In the depthwise convolution step, a filter is applied to a single channel at a time, instead of spanning all M input channels as in a standard convolution. In the pointwise convolution step, a 1 × 1 convolution is applied across the M channels. The depthwise separable structure is shown in Figure 2(d).
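The parameter saving can be made concrete by counting weights (bias terms omitted for simplicity):

```python
def standard_conv_params(k, m, n):
    # n output filters, each spanning all m input channels: k*k*m*n
    return k * k * m * n

def separable_conv_params(k, m, n):
    # depthwise: one k*k filter per input channel (k*k*m), then
    # pointwise: a 1x1 convolution mapping m channels to n (m*n)
    return k * k * m + m * n
```

For a 3 × 3 kernel mapping 64 channels to 128, the separable form needs 8768 weights instead of 73,728, roughly an 8x reduction.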

3.2. Ensemble-learning Fusion Method

Data fusion methods can comprehensively use multiple classification results, and classification based on result fusion is more robust than a single model. For model fusion, compared with a simple voting system, a decision tree can automatically learn the importance of each input. The gradient boosting decision tree (GBDT) is an iterative algorithm that consists of multiple decision trees whose conclusions are added together to produce the final result. XGBoost, designed by Chen & Guestrin (2016), can be regarded as an efficient implementation of a GBDT and is an ensemble-learning algorithm proposed for large-scale training. It introduces several optimizations over plain decision-tree learning: for example, XGBoost adds regularization terms to control tree complexity and applies a Taylor expansion to the objective function in each iteration, which dramatically accelerates model optimization. Compared with the previous method, the entire training process is therefore faster and uses fewer resources. XGBoost frequently appears in top-ranked solutions of competitions such as Kaggle. Therefore, we use XGBoost to fuse the multiple results obtained by the above neural networks. To preserve as many of the features extracted by each model as possible, we do not directly use the final result of each model as input to XGBoost. Instead, we feed XGBoost the data produced just before the softmax classification in each neural network.
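A sketch of the fusion input construction is shown below; the model count and shapes are illustrative, and the actual XGBoost training call is indicated only as a comment since the paper does not list its settings.

```python
import numpy as np

def fusion_features(logits_per_model):
    # Concatenate each model's pre-softmax outputs (n_samples, 3)
    # side by side into one feature matrix for the meta-classifier.
    return np.concatenate(logits_per_model, axis=1)

# 10 base models x 3 classes -> 30 fusion features per sample.
feats = fusion_features([np.zeros((5, 3)) for _ in range(10)])
# The meta-classifier would then be trained roughly as:
#   xgboost.XGBClassifier().fit(feats, labels)
```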

The algorithm flow of the fusion model in this paper is shown in Figure 3.


Figure 3. Algorithm flow of the fusion model.


3.3. Evaluation Metrics

To objectively evaluate the advantages and disadvantages of each independent model and the final ensemble model, it is necessary to establish a unified evaluation standard. There are four basic types of classification target statistics in classification tasks: true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs). These four statistics can be used to construct many useful performance indicators, such as the accuracy, recall, and precision:

Equation (6): Accuracy = (TP + TN) / (TP + TN + FP + FN)

Equation (7): Recall = TP / (TP + FN)

Equation (8): Precision = TP / (TP + FP)

The accuracy is usually used to evaluate the global performance of the model. The recall is the proportion of all positive samples in the test set that are correctly identified as positive. The precision is the proportion of samples identified as positive that are truly positive. To combine the precision and recall into a single value, the F1 score is usually used. The F1 score ranges from 0 to 1, and its mathematical formula is as follows:

Equation (9): F1 = 2 × Precision × Recall / (Precision + Recall)

In the F1 score, 1 represents the best output of the model and 0 the worst. In this experiment, we mainly focus on the F1 scores of the classification results.
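These metrics can be computed directly from the four counts:

```python
def accuracy(tp, tn, fp, fn):
    # Eq. (6): fraction of all samples classified correctly
    return (tp + tn) / (tp + tn + fp + fn)

def precision_recall_f1(tp, fp, fn):
    # Eqs. (7)-(9): precision, recall, and their harmonic mean
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, plugging in the Alpha-class counts from Table 3 (TP = 542, FP = 27, FN = 25) recovers the reported F1 score of about 0.954.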

4. Results and Analysis

4.1. Independent Classification Model Results

After building the deep-learning models, each model is trained for 300 epochs on the training set. The Adam optimizer (Kingma & Ba 2014), an extension of stochastic gradient descent that updates the network weights more effectively, is used to optimize the neural network parameters. The initial learning rate is set to 0.001 and is decayed to half of the previous value every 30 steps. The advantage of this setting is that the relatively large initial learning rate lets the neural network quickly converge toward a local optimum, speeding up training. Afterward, the models are tested on a separate test set that does not temporally intersect the training set. Deeper neural networks are prone to vanishing and exploding gradients during training, which can cause a model to stop updating its parameters after a certain number of epochs or to classify all cases into the same category. To avoid these situations, an early stopping strategy is adopted: when a model overfits or is found to be invalid in cross validation, training is stopped immediately, and the model keeps the best-performing parameters found in cross validation before stopping. Finally, the F1 score of each model on the test set is calculated and used as the final evaluation index. The experimental results are shown in Table 2. The number in the "structure" column indicates the number of layers stacked in the neural network.
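The step-decay learning-rate schedule described above can be sketched as:

```python
def learning_rate(step, base_lr=0.001, decay_every=30):
    # Halve the base rate of 0.001 every 30 steps, as described above.
    return base_lr * 0.5 ** (step // decay_every)
```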

Table 2. Fivefold Cross Validation Results

Model No. | Input Dimensions (pixels) | Structure | F1 Score Alpha | F1 Score Beta | F1 Score Beta-x | Accuracy
1 | 64 × 64 | CNN-5 | 0.8387 | 0.8881 | 0.6928 | 0.8382
2 | 64 × 64 | CNN-10 | 0.7996 | 0.8789 | 0.6357 | 0.8113
3 | 64 × 64 | CNN-15 | 0.8157 | 0.8801 | 0.6693 | 0.8227
4 | 64 × 64 | ResNet-5 | 0.8917 | 0.9066 | 0.7745 | 0.8997
6 | 64 × 64 | ResNet-15 | 0.9115 | 0.8841 | 0.8600 | 0.8911
7 | 64 × 64 | Depthwise-5 | 0.8633 | 0.8906 | 0.6873 | 0.8214
8 | 64 × 64 | Depthwise-10 | 0.8930 | 0.8712 | 0.6295 | 0.8566
9 | 64 × 64 | Depthwise-15 | 0.8494 | 0.8603 | 0.6495 | 0.8090
10 | 64 × 64 | ResNetDepthwise-15 | 0.9091 | 0.8858 | 0.8404 | 0.8707
11 | 128 × 128 | CNN-5 | 0.8604 | 0.8947 | 0.6393 | 0.8869
12 | 128 × 128 | CNN-10 | 0.8245 | 0.8918 | 0.6347 | 0.8406
13 | 128 × 128 | CNN-15 | 0.8849 | 0.8683 | 0.7283 | 0.8480
14 | 128 × 128 | ResNet-5 | 0.9232 | 0.9020 | 0.7361 | 0.8821
15 | 128 × 128 | ResNet-10 | 0.9609 | 0.9117 | 0.7915 | 0.8815
16 | 128 × 128 | ResNet-15 | 0.9385 | 0.9299 | 0.8069 | 0.9119
17 | 128 × 128 | Depthwise-5 | 0.8851 | 0.8624 | 0.7581 | 0.8615
18 | 128 × 128 | Depthwise-10 | 0.8911 | 0.8714 | 0.7354 | 0.8876
19 | 128 × 128 | Depthwise-15 | 0.9157 | 0.8813 | 0.6921 | 0.8610
20 | 128 × 128 | ResNetDepthwise-15 | 0.9413 | 0.9011 | 0.8416 | 0.8954


To obtain the optimal model, 20 single models with different parameters were built. The network architecture is the key distinguishing feature of each model. In our study, the ResNet architectures exhibit excellent performance. The number of layers and the size of the training data are related to performance, but more layers are not necessarily better. It can be seen from the above experimental results that ResNet-10 has the highest F1 score for the Alpha group (0.9609) with an input size of 128 × 128 pixels, ResNet-15 has the highest F1 score for the Beta group (0.9299) with an input size of 128 × 128 pixels, and ResNet-15 has the highest F1 score for the Beta-x group (0.86) with an input size of 64 × 64 pixels. The accuracy of ResNet-15 is the highest (0.9119) with an input size of 128 × 128 pixels.

4.2. Ensemble Model Results

To eliminate the interference of poor models on the model fusion, the top five models under each evaluation score are selected from the above 20 models, giving 10 models in total (model Nos. 4, 5, 6, 10, 14, 15, 16, 18, 19, and 20). Finally, XGBoost is used to merge the output data from these neural networks. The final results are shown in Table 3.

According to the final results, the accuracy of the Alpha group is 95.59%, the accuracy of the Beta group is 90.73%, and the accuracy of the Beta-x group is 82.57%. In general, the final ensemble model achieved good and stable recognition performance.

Table 3. XGBoost Ensemble Model Results

Label | Automatic Recognition Results (Alpha / Beta / Beta-x) | F1 Score | Accuracy
Alpha | 542 / 25 / 0 | 0.9542 | 0.9559
Beta | 26 / 450 / 20 | 0.91 | 0.9073
Beta-x | 1 / 18 / 90 | 0.8219 | 0.8257
Total | 569 / 493 / 110 | ... | 0.9232


5. Discussion

In the paper by Fang et al. (2019), model A uses HMI magnetogram data as input, model B uses HMI continuum image data, and model C uses both magnetogram and continuum images as a two-channel input to the CNN. The accuracy of our model on the Beta class is higher than that of Fang et al., and our overall accuracy, Alpha accuracy, and F1 scores are close to theirs. The comparison results are shown in Table 4.

Table 4. Comparison of Model Results

Model | F1 Score Alpha | F1 Score Beta | F1 Score Beta-x | Accuracy Alpha | Accuracy Beta | Accuracy Beta-x | Accuracy All
Fang-A | 0.924 | 0.8123 | 0.8805 | 0.9575 | 0.7625 | 0.9025 | 0.8742
Fang-B | 0.9657 | 0.9159 | 0.9445 | 0.985 | 0.885 | 0.9575 | 0.9425
Fang-C | 0.9218 | 0.8364 | 0.9001 | 0.9575 | 0.7925 | 0.9125 | 0.8875
Ours | 0.9542 | 0.91 | 0.8219 | 0.9559 | 0.9073 | 0.8257 | 0.9232


The establishment of the data set in this article is based on the research of Fang et al. (2019), but our test set for evaluating model performance is different from theirs. The training and test sets used by Fang et al. are randomly undersampled from each magnetic type in the entire data set. In contrast, we divided the training and test sets in chronological order, and the models were evaluated using fivefold cross validation in chronological order during training. Although the performance scores of our ensemble model are not significantly better than those of Fang et al. (2019), more appropriate validation techniques have been implemented here (i.e., chronological splitting and fivefold cross validation), which provide a more reliable assessment of the model's performance.

Actually, it is not appropriate to divide this data set randomly. Since most sunspot groups do not change much within 96 minutes, random undersampling of the entire data set makes the training set and the test set very similar, because two temporally adjacent samples of the same sunspot group are very likely to be split between the training and test sets. This is especially true for Beta-x, the smallest class: with random undersampling, almost every sample in the test set has a highly similar counterpart in the training set. Under random splitting, Beta, the class with the most samples, has the smallest train/test similarity, which is why the models of Fang et al. (2019) perform worse on Beta than on the other two types. In the early stage of our study, we also used a random splitting scheme, but we soon discovered a serious overfitting problem. Initially, we did not use data augmentation, and complex neural network models would quickly overfit the entire training set; we did not detect this problem on the randomly divided validation set. It was only when we applied the trained model to the latest sunspot group data that we found how easy it is to overfit this data set. Because of this, the correct data partitioning scheme, data augmentation, and cross validation are essential for our model.

The performance of deep-learning algorithms is not only dependent on the number of layers and the size of the training data but is also determined by the extraction ability of the spatial features in different layers. Therefore, to effectively explore the differences among these deep-learning algorithms, the feature extraction results in different layers are shown in Figures 4 and 5, which can also help us to understand their identification processing mechanisms.


Figure 4. ResNet-5 vs. CNN-5 for identifying Beta-x sunspots (panels a1–a5 and b1–b5 show the continuum and magnetogram images processed by different layers of the two algorithms).


Figure 5. ResNet-5 vs. ResNet-15 for identifying Beta-x sunspots (a1 and b1 are similar to a1 and b1 in Figure 4).


Since the structure of Beta-x sunspots is more complicated than that of the other two types, the recognition accuracy for Beta-x is the worst. Among the results in Table 2, ResNet-15 achieved the best recognition accuracy for Beta-x sunspots. In Figure 4, the original input image contains a Beta-x sunspot, but CNN-5 misidentifies it as a Beta sunspot. This failure can be traced to layer (a2) of CNN-5: its output is smoother than that of layer (b2) of ResNet-5. A smoother result means more features are missing, which results from the degradation problem. This kind of gradient problem occurs more readily in the original CNN architecture. As shown in Table 2, CNN-5 has higher accuracy than the deeper CNN-10 and CNN-15; this is the degradation problem described by He et al. (2016). While tuning parameters, we found that if a model does not use the ResNet structure, a deeper network trained with a high learning rate is prone to an abnormal training state in which it outputs constant results for any input. This may be caused by excessively large gradients leading to excessive parameter updates.
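The benefit of residual learning can be illustrated with a toy example (a numpy sketch under our own simplifications, not the implementation used in this paper): with near-zero weights, a plain block collapses its input toward zero, whereas the identity shortcut of a residual block still passes the input through, so a deeper residual network can at worst approximate an identity mapping instead of degrading.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def plain_block(x, w):
    # an ordinary layer: output depends entirely on the learned weights
    return relu(w @ x)

def residual_block(x, w):
    # residual learning: the layer only learns the residual F(x),
    # and the identity shortcut adds the input back (He et al. 2016)
    return relu(w @ x) + x
```

With `w` near zero, `plain_block` returns (near) zero while `residual_block` returns (near) `x`, which is why stacking more residual blocks does not force the degradation seen in the plain CNNs.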

However, the ResNet structure introduces residual learning during training (He et al. 2016). It effectively solves this problem and eliminates the gradient explosion phenomenon in the backpropagation of the ResNet model. It is worth noting that the performance of the ResNet models does not always improve as the number of layers increases (see Table 2, models 4–6 and models 14–16). Replacing the ordinary convolutional layers with depthwise separable convolutional layers (ResNetDepthwise-15) significantly improves the performance for the Beta-x group. When the input size of the training data is increased from 64 × 64 to 128 × 128, the accuracy of most models improves. In addition, replacing traditional convolution with depthwise separable convolution greatly reduces the number of parameters. A smaller model may lead to faster recognition in parallel processing, which is very meaningful for big data processing in solar research.
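The parameter savings of depthwise separable convolution can be checked directly (a sketch ignoring bias terms): a standard convolution needs one k × k × C_in kernel per output channel, while the depthwise separable version needs one k × k filter per input channel plus a 1 × 1 pointwise convolution across channels.

```python
def conv_params(k, c_in, c_out):
    # standard 2D convolution: one k*k*c_in kernel per output channel
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # depthwise: one k*k filter per input channel;
    # pointwise: a 1x1 convolution mixing c_in channels into c_out
    return k * k * c_in + c_in * c_out
```

For example, with 3 × 3 kernels, 64 input channels, and 128 output channels, the standard layer has 73,728 weights while the depthwise separable layer has only 8,768, roughly an 8× reduction per layer.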

The amount of Beta-x training data is very limited and lacks temporal variability. Simply put, the sunspot group type is a dynamic evolution process rather than a static one, yet the present model only recognizes an instantaneous image without considering temporal variation. Thus, when a sunspot group is in transition between two types (Figure 6), it is difficult for the final ensemble model to classify it into the correct type.


Figure 6. An example in which the final ensemble model misclassifies a Beta-x sunspot group as the Beta type.


Figure 6 shows continuum and magnetogram images of a Beta-x sunspot group. Some small sunspots in the red box have reversed magnetic polarity, which is an important feature of the Beta-x type. But the model ignored these small sunspots and classified the group as Beta. This result may be caused by the pooling layers in the models, which increase the translation invariance of local features and make the models insensitive to the local magnetic reversal of the sunspot group.
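The insensitivity introduced by pooling can be demonstrated on a toy patch (an illustrative numpy sketch, not our actual network): a single small reversed-polarity pixel embedded in a strongly positive region disappears entirely after a 2 × 2 max-pooling operation.

```python
import numpy as np

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling on a single-channel image."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# toy magnetogram patch: one reversed-polarity (negative) pixel
# surrounded by strong positive-polarity field
patch = np.array([[ 0.9,  0.8,  0.7,  0.6],
                  [ 0.8, -0.9,  0.7,  0.5],
                  [ 0.6,  0.5,  0.9,  0.8],
                  [ 0.5,  0.4,  0.8,  0.7]])
pooled = max_pool2x2(patch)   # the -0.9 reversal leaves no trace
```

After pooling, every output value is positive, so a downstream classifier sees no evidence of the local polarity reversal that distinguishes Beta-x from Beta.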

6. Conclusion

The goal of this research is to apply multiple deep-learning algorithms to construct a high-performing fusion model for sunspot group classification. Four different neural network architectures with different numbers of layers and input data sizes were applied to a data set of continuum and magnetogram images obtained from 2010 May 1 to 2017 December 12. In total, 20 single models were built, and their recognition performances were tested on the three types of sunspot groups. Then, 10 single models were selected to construct an ensemble model with XGBoost.

The whole model construction process provides several useful references for related research. The main summary is as follows.

Since the structure of Beta-x sunspots is more complicated than those of the other two types, classification models based on a traditional CNN suffer from the degradation problem and fail to recognize Beta-x sunspots. The ResNet model introduces residual learning to solve this problem and can build a qualified model for the Beta-x group. Adding a depthwise separable structure on top of the ResNet model greatly reduces the number of model parameters. A smaller model may yield faster recognition, which is very meaningful for big data processing in solar research.

In our study, the ResNet models exhibit excellent performance. The number of layers and the size of the training data are related to performance, but more layers are not necessarily better. Judging from their performance on the three magnetic types, different models have their own advantages; therefore, we use ensemble learning so that the models complement one another. We found that XGBoost is an effective method for fusing multiple models, as evidenced by the relatively balanced high scores across the three magnetic types. The final ensemble model mainly confuses the Beta and Beta-x categories. A possible reason is that the data input into the network are compressed to a smaller size, so many details are lost. Since the simple structure of Alpha sunspots makes them more distinguishable from the other two types, the model classifies this type more easily; in the final classification results, Alpha and Beta-x sunspots are rarely confused.
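The fusion step can be sketched as stacking: each base model contributes its per-class probabilities as meta-features, and the ensemble learner (XGBoost in our case) is trained on the concatenated matrix. The sketch below only builds the meta-feature matrix with numpy; the shapes and the fitting call in the comment are illustrative, not our production code.

```python
import numpy as np

def stack_probabilities(model_probs):
    """Concatenate each base model's per-class probabilities into one
    meta-feature row per sample; the ensemble learner is trained on this."""
    return np.concatenate(model_probs, axis=1)

# toy shapes: 10 base models, 3 classes (Alpha, Beta, Beta-x), 4 samples
probs = [np.full((4, 3), 1.0 / 3.0) for _ in range(10)]
meta_X = stack_probabilities(probs)   # shape: (4 samples, 10 * 3 features)

# the meta-learner is then fit on these features, e.g. (hypothetical call):
# xgboost.XGBClassifier().fit(meta_X, y_train)
```

The meta-learner can weight each base model differently per class, which is how the fusion achieves relatively balanced scores across the three magnetic types even though individual models favor different types.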

In the early stage of our study, without data augmentation, the model would quickly overfit the training set but perform poorly on the test set. Therefore, data augmentation improves the model's accuracy and helps reduce overfitting. Data augmentation also increases the amount of data and reduces the large quantity of manually labeled data required for deep learning. In machine learning, the more real training samples collected, the better; but when it is impossible to collect more real samples, data augmentation can be used to overcome the limitations of a small data set.
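A typical augmentation step for square image patches can be sketched as random flips and 90° rotations, which rearrange but preserve the pixel values and hence the label-relevant morphology (an illustration; the exact transforms used in our pipeline may differ).

```python
import numpy as np

def augment(image, rng):
    """Randomly flip and rotate a square image patch by multiples of 90 degrees.
    These transforms preserve pixel values, so the magnetic-type label is kept."""
    if rng.random() < 0.5:
        image = np.fliplr(image)   # mirror left-right
    if rng.random() < 0.5:
        image = np.flipud(image)   # mirror top-bottom
    return np.rot90(image, k=int(rng.integers(4)))  # rotate 0/90/180/270 degrees
```

Applying such transforms at training time multiplies the effective number of distinct samples seen by the network without requiring any additional labeled data.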

In future work, we intend to use a larger input data size for training, because inputs of 64 × 64 or 128 × 128 pixels are too small and lose many details, resulting in lower overall accuracy. In addition, the network structures used in this paper are simple and similar; we will try adding an attention mechanism, random-grouping convolution, and other structures to enrich the models. The pooling layer increases the translation invariance of local features, which has a negative effect on this classification task, so we will try replacing the pooling layer with other techniques. Finally, our models infer the magnetic classification of a sunspot group from the continuum and magnetogram images at a single moment and therefore do not consider the change in the magnetic type over time, which is usually small over a few hours. In such a case, it makes sense to refine the classification using data from before or after a given time; thus, we can consider applying long short-term-memory-related algorithms.

The data set was provided by the Space Environment Warning and AI Technology Interdisciplinary Innovation Working Group (Fang et al. 2019). This research was funded by the National Natural Science Foundation of China (grant Nos. 41974183, 41604136, 41674144, 41774195 and 41974195).
