With recent advances in digital technologies, data sets have grown so large that traditional data processing and machine learning techniques can no longer cope with them effectively [1, 2]. Analyzing such complex, high-dimensional, and noise-contaminated data sets is a major challenge, and it is crucial to develop novel algorithms that can summarize, classify, and extract important information from these data and convert it into an understandable form [3,4,5]. To address these problems, deep learning (DL) models have shown outstanding performance over the past decade.

Deep learning has revolutionized the future of artificial intelligence (AI) and has solved many complex problems that stood open in the AI community for years. DL models are deep variants of artificial neural networks (ANNs) with multiple linear or non-linear layers, each connected to the layers below and above it through learnable weights. The ability of DL models to learn hierarchical features from various types of data, e.g., numerical, image, text, and audio data, makes them powerful in solving recognition, regression, semi-supervised, and unsupervised problems [6,7,8].

In recent years, various deep architectures with different learning paradigms have been introduced in quick succession to develop machines that can perform as well as, or even better than, humans in application domains such as medical diagnosis, self-driving cars, natural language and image processing, and predictive forecasting [9]. To showcase some of these recent advances, we have selected 14 papers from the articles accepted by this journal to organize this issue. Focusing on recent developments in DL architectures and their applications, we classify the articles into four categories: (1) deep architectures and convolutional neural networks, (2) incremental learning, (3) recurrent neural networks, and (4) generative models and adversarial examples. In the following, we give a brief summary of each category and then introduce the related articles individually.

1 Category 1: deep architectures and convolutional neural networks

The deep neural network (DNN) [10] is one of the most common DL models; it contains multiple layers of linear and non-linear operations. A DNN extends the standard neural network with multiple hidden layers, which allows the model to learn more complex representations of the input data. The convolutional neural network (CNN), a variant of the DNN, is inspired by the visual cortex of animals [11]. A CNN usually contains three types of layers: convolution, pooling, and fully connected layers. The convolution and pooling layers are placed at the lower levels. The convolution layers generate a set of linear activations, each followed by a non-linear function; in effect, they apply filters that reduce the complexity of the input data [12]. The pooling layers then down-sample the filtered results, reducing the size of the activation maps by mapping them to smaller matrices [13]. Pooling therefore helps mitigate over-fitting by reducing complexity [14]. The fully connected layers are located after the convolution and pooling layers in order to learn more abstract representations of the input data. In the last layer, a loss function, e.g., a soft-max classifier, maps the input data to the corresponding output. CNN-based models have shown outstanding results in image processing and computer vision. This category contains four articles.
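As a concrete illustration of this layer stack, the following minimal PyTorch sketch (our own generic example, not the architecture of any featured paper; the channel counts and the 28×28 input are arbitrary assumptions) chains convolution, non-linearity, pooling, and fully connected layers, with the soft-max folded into the cross-entropy loss:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal CNN: convolution -> non-linearity -> pooling -> fully connected."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolution: linear activations
            nn.ReLU(),                                   # non-linear function
            nn.MaxPool2d(2),                             # pooling: down-samples activation maps
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)  # fully connected layer

    def forward(self, x):
        x = self.features(x)       # lower levels: convolution and pooling
        x = torch.flatten(x, 1)    # flatten the (smaller) activation maps
        return self.classifier(x)  # class scores; soft-max is applied inside the loss

model = SimpleCNN()
scores = model(torch.randn(4, 1, 28, 28))  # a batch of four 28x28 grayscale images
loss = nn.CrossEntropyLoss()(scores, torch.tensor([0, 1, 2, 3]))
```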

The paper “Combination of loss functions for deep text classification”, authored by Hamideh Hajiabadi, Diego Molla-Aliod, Reza Monsefi and Hadi Sadoghi Yazdi, considers ensemble methods at the level of the objective function of a deep neural network. The paper proposes a novel objective function that is a linear combination of single losses and integrates it into a deep neural network, so that the weights of the linear combination are learned by back-propagation during training. The impact of the proposed ensemble loss function is studied on state-of-the-art convolutional neural networks for text classification.
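The core mechanism, a loss whose mixing weights are themselves trainable parameters, can be sketched as follows. This is our own generic rendering of the idea, not the authors' code; the choice of the two component losses and the soft-max parameterization of the weights are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CombinedLoss(nn.Module):
    """Linear combination of single losses whose weights are learned by back-propagation."""
    def __init__(self, num_losses=2):
        super().__init__()
        # One learnable logit per component loss; soft-max keeps the
        # combination weights positive and summing to one.
        self.logits = nn.Parameter(torch.zeros(num_losses))

    def forward(self, scores, targets):
        losses = torch.stack([
            F.cross_entropy(scores, targets),      # component loss 1
            F.multi_margin_loss(scores, targets),  # component loss 2 (hinge-style)
        ])
        weights = torch.softmax(self.logits, dim=0)
        return (weights * losses).sum()

# The optimizer must cover the loss module too, so the mixing weights are trained.
model = nn.Linear(300, 4)  # stand-in for a text classifier producing class scores
criterion = CombinedLoss()
optimizer = torch.optim.Adam(list(model.parameters()) + list(criterion.parameters()))
```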

In the paper “A deep neural network-based recommendation algorithm using user and item basic data”, authored by Jian-Wu Bi, Yang Liu and Zhi-Ping Fan, a new recommendation algorithm based on deep neural networks is proposed. The main idea is to build a regression model for predicting user ratings based on deep neural networks. To this end, a user feature matrix and an item feature matrix are constructed from user data and item data, respectively, using four types of neural network layers: the embedding layer (EL), convolution layer (CL), pooling layer (PL), and fully connected layer (FCL). A user-item feature matrix is then constructed from the two matrices using an FCL, and on this basis a regression model for predicting user ratings is trained to generate the recommendation list.
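A generic sketch of this EL→CL→PL→FCL pipeline is given below; it is our own illustration, and the vocabulary size, channel counts, and feature dimensions are assumptions rather than the paper's settings:

```python
import torch
import torch.nn as nn

class FeatureTower(nn.Module):
    """Embedding -> convolution -> pooling -> fully connected, yielding a feature vector."""
    def __init__(self, vocab_size=10000, embed_dim=32, feat_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)                # EL
        self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3, padding=1)  # CL
        self.pool = nn.AdaptiveMaxPool1d(1)                             # PL
        self.fc = nn.Linear(64, feat_dim)                               # FCL

    def forward(self, ids):                  # ids: (batch, seq_len) token indices
        x = self.embed(ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))
        x = self.pool(x).squeeze(-1)         # (batch, 64)
        return torch.relu(self.fc(x))

class RatingRegressor(nn.Module):
    """Fuses user and item feature vectors with an FCL to predict a rating."""
    def __init__(self):
        super().__init__()
        self.user_tower, self.item_tower = FeatureTower(), FeatureTower()
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, user_ids, item_ids):
        fused = torch.cat([self.user_tower(user_ids), self.item_tower(item_ids)], dim=1)
        return self.head(fused).squeeze(-1)  # predicted rating

ratings = RatingRegressor()(torch.randint(0, 10000, (4, 20)),
                            torch.randint(0, 10000, (4, 20)))
```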

The paper “A discriminative deep association learning for facial expression recognition”, authored by Xing Jin, Wenyun Sun and Zhong Jin, proposes a novel discriminative deep association learning (DDAL) framework for facial expression recognition. In this work, unlabeled data are used together with labeled data to train the DNNs simultaneously in a multi-loss deep network based on association learning. In addition, a discrimination loss is utilized to encourage intra-class clustering and inter-class center separation.

In the paper “A technical view on neural architecture search”, authored by Yi-Qi Hu and Yang Yu, a review of recent advances in neural architecture search (NAS) is provided from a technical point of view. The paper draws a complete picture of NAS for readers, covering the problem definition, basic search frameworks, key techniques toward practical use, and promising future directions.

2 Category 2: incremental learning

Incremental learning refers to the continuous adaptation of a model based on constantly arriving input samples [15,16,17]. Unlike batch machine learning techniques, which must re-execute an iterative training procedure on both old and new samples, incremental learning techniques need to learn only the new samples, without re-learning previously learned ones [18, 19]. Moreover, incremental learning techniques are useful for training the complex structures of DL models when the training samples are provided over time [20, 21]. This category contains two articles.
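The batch-versus-incremental distinction can be made concrete with a small example. The sketch below (our own, unrelated to the two featured papers) uses scikit-learn's SGDClassifier, whose partial_fit method updates the model with each newly arriving batch without revisiting earlier samples:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Each call to partial_fit updates the weights using only the new samples;
# previously learned samples are never revisited.
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])  # all classes must be declared on the first call

rng = np.random.default_rng(0)
for step in range(10):                # batches arriving over time
    X_new = rng.normal(size=(32, 5))  # 32 new samples, 5 features
    y_new = (X_new.sum(axis=1) > 0).astype(int)
    model.partial_fit(X_new, y_new, classes=classes)

print(model.predict(rng.normal(size=(3, 5))))
```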

The paper “Cross-modal learning for material perception using deep extreme learning machine”, authored by Wendong Zheng, Huaping Liu, Bowen Wang and Fuchun Sun, proposes a visual-tactile cross-modal retrieval framework that conveys tactile information about surface materials for perceptual estimation. Tactile information from a new, unknown surface material is used to retrieve perceptually similar surfaces from an available set of surface visual samples. Specifically, a deep cross-modal correlation learning method is developed that incorporates the high-level non-linear representation of the deep extreme learning machine and the class-paired correlation learning of cluster canonical correlation analysis.

The paper “DeepCascade-WR: a cascading deep architecture based on weak results for time series prediction”, authored by Chunyang Zhang, Qun Dai and Gang Song, considers real-world time series prediction (TSP) tasks. In this work, a cascading deep architecture based on weak results (DeepCascade-WR) is established, which possesses the marked capability of deep models for learning feature representations from complex data. DeepCascade-WR has online learning ability and effectively avoids the retraining problem, owing to the properties of the online sequential extreme learning machine (OS-ELM). In addition, DeepCascade-WR naturally inherits valuable virtues of the ELM, including faster training, better generalization ability, and the avoidance of falling into local optima.

3 Category 3: recurrent neural networks

Recurrent neural networks (RNNs) [22] have the deepest structures among DL algorithms and are able to map sequential input data to their outputs [23]. Unlike traditional DNNs, the nodes within each RNN layer are connected to each other; these recurrent connections enable RNNs to memorize information over time from a sequence of data. Long short-term memory (LSTM) [24] and gated recurrent units (GRUs) [25] are two improved RNN models. Although RNNs are powerful, training them on long sequences is difficult due to the vanishing and exploding gradient problems [26]. To address this issue, LSTM and GRU units use gates to decide what information to keep or discard from the previous state. RNN-based models have been widely applied to sequential learning problems. This category contains five articles.
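The gating idea can be illustrated with a short PyTorch sketch (again our own, with arbitrary sizes): the LSTM's internal gates decide what to keep or discard from the previous state, and the final hidden state summarizes the sequence:

```python
import torch
import torch.nn as nn

class SequenceClassifier(nn.Module):
    """LSTM over a sequence; gate units inside the LSTM control what is kept
    or discarded from the previous state, mitigating vanishing gradients."""
    def __init__(self, input_dim=8, hidden_dim=32, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                  # x: (batch, time, input_dim)
        outputs, (h_n, c_n) = self.lstm(x)
        return self.head(h_n[-1])          # classify from the final hidden state

model = SequenceClassifier()
scores = model(torch.randn(4, 50, 8))      # a batch of four 50-step sequences
```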

The paper “DeepSite: bidirectional LSTM and CNN models for predicting DNA–protein binding”, authored by Yongqing Zhang, Shaojie Qiao, Shengjie Ji and Yizhou Li, considers the prediction of DNA–protein binding sites in DNA sequences using DL methods. DeepSite, which combines a bidirectional long short-term memory (BLSTM) network with a CNN, is employed to capture the long-term dependencies between sequence motifs in DNA.

The paper “Single image rain streaks removal: a review and an exploration”, authored by Hong Wang, Qi Xie, Yichen Wu, Qian Zhao and Deyu Meng, provides a detailed review of recent single-image-based rain removal techniques, categorized into early filter-based, conventional prior-based, and recent deep-learning-based approaches. In addition, inspired by the rationality of DL-based methods and the insightful characteristics underlying rain shapes, a coarse-to-fine de-raining network architecture is built that captures rain structures and progressively removes rain streaks from the input image.

The paper “Learning deep hierarchical and temporal recurrent neural networks with residual learning”, authored by Tehseen Zia, Assad Abbas, Usman Habib and Muhammad Sajid Khan, studies deep hierarchical and temporal structures in RNNs. The goal is to show that approximating identity mappings is crucial for optimizing both hierarchical and temporal structures. To this end, a framework called hierarchical and temporal residual RNNs is proposed, which learns RNNs by approximating identity mappings across hierarchical and temporal structures.

The paper “Weighted multi-deep ranking supervised hashing for efficient image retrieval”, authored by Jiayong Li, Wing W. Y. Ng, Xing Tian, Sam Kwong and Hui Wang, focuses on deep hashing networks for large-scale image retrieval. The paper proposes weighted multi-deep ranking supervised hashing (WMDRH), which employs multiple weighted deep hash tables to improve precision and recall without increasing storage. Hash codes are generated using a loss function with two terms: (1) a ranking pairwise loss, which produces discriminative hash codes by penalizing similar image pairs with large Hamming distances (and dissimilar pairs with small ones) more heavily, and (2) a classification loss, which guarantees that the hash codes are effective for category prediction. The multiple hash tables are then integrated by assigning each table a weight according to its mean average precision (MAP) score for image retrieval.

The paper “Pothole detection using location-aware convolutional neural networks”, authored by Hanshen Chen, Minghai Yao and Qinlong Gu, proposes a new method based on location-aware convolutional neural networks to detect potholes in road images. The method consists of two subnetworks: the first employs a high-recall network model to find as many candidate regions as possible, and the second performs classification on the candidate regions on which the network is expected to focus.

4 Category 4: generative models and adversarial examples

Generative models aim to generate new samples with some variation by learning the distribution of the training samples [27]. Variational autoencoders (VAEs) [28] and generative adversarial networks (GANs) [29] are two prominent families of generative models. DL models usually require a large number of labeled samples to learn their parameters, yet obtaining sufficient labeled samples is difficult and expensive in many practical applications. Generative models can be used to alleviate this problem [30], and they can also be applied to recognition, semi-supervised learning, unsupervised feature learning, and denoising tasks. Despite the great success of DL models in solving many real-world problems, they can easily be fooled by adversarial examples [31]. This raises concerns in safety-critical fields such as autonomous vehicles, so it is crucial to study the effects of adversarial examples on the performance of DL models. This category contains three articles.
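As a self-contained illustration of how adversarial examples are crafted, the sketch below implements the classic fast gradient sign method; it is a generic example, not the method of any paper in this category, and the perturbation budget eps is an arbitrary assumption:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """Fast gradient sign method: nudge the input in the direction that
    increases the loss, yielding an adversarial example."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()    # small, worst-case perturbation
    return x_adv.clamp(0, 1).detach()  # keep pixels in a valid range
```

Even with a visually imperceptible eps, such perturbations can flip the predictions of an otherwise accurate classifier.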

In the paper “An adversarial non-volume preserving flow model with Boltzmann priors”, authored by Jian Zhang, Shifei Ding and Weikuan Jia, an adversarial non-volume preserving flow model with Boltzmann priors (ANVP) is proposed for modeling complex high-dimensional densities. ANVP introduces an adversarial regularizer into the loss function that penalizes the model for placing high probability in regions where the training data distribution has low density, which helps it generate sharper images.

The paper “Emotion recognition using multimodal deep learning in multiple psychophysiological signals and video”, authored by Wang Zhongmin, Zhou Xiaoxiao, Wang Wenlang and Liang Chen, proposes a DL-based approach that trains several specialist networks to fuse the features of individual modalities. The approach includes a multimodal deep belief network (MDBN) and two bimodal deep belief networks (BDBNs). The MDBN optimizes and fuses unified psychophysiological features derived from the features of multiple psychophysiological signals; one BDBN focuses on representative visual features among the features of a video stream, and the other BDBN focuses on the high-level multimodal features in the unified features obtained from the two modalities.

The paper “Robustness to adversarial examples can be improved with overfitting”, authored by Oscar Deniz, Noelia Vallez, Jesus Salido and Gloria Bueno, studies the effects of adversarial examples on the performance of DL methods. The paper first argues that the error on adversarial examples is caused by high bias, i.e., by regularization that has locally negative effects, and then supports this idea with experiments in which robustness to adversarial examples is measured with respect to the level of fitting to the training samples.

In summary, this issue presents some recent advances in DL. It includes fourteen articles in four categories: four on deep architectures and convolutional neural networks, two on incremental learning, five on recurrent neural networks, and three on generative models and adversarial examples. We hope the issue provides readers with useful guidelines on recent developments in DL algorithms and applications, and serves as a convenient collection of DL articles for reference.