1 Introduction

Due to the coronavirus pandemic, usage of streaming services such as Netflix increased sharply in the first quarter of 2020. An intrusion detection system for multimedia platforms can prevent such platforms from being attacked. Modern communication infrastructure that includes the IP Multimedia Subsystem (IMS) and Voice over IP (VoIP) suffers from attacks and unknown threats [2]. The multimedia platform provides services to users through the Internet. With the advancement of technology, cyber-attacks evolve rapidly, and businesses face a higher risk to their information security. A multimedia traffic classification scheme for intrusion detection systems is therefore important [24]. Malicious intrusions into networks are increasing every day. They are deliberate, unauthorized, illegal attempts to access, manipulate, or take possession of an information system or network in order to render it unreliable or unusable. Intrusion detection is the process of identifying the events occurring in a system or network, analyzing them for the possible presence of intrusions, and responding to malicious activities. Intrusion detection has become a priority and an important task for information security administrators. A system deployed in a network is vulnerable to various attacks and needs to be protected against them [1].

The Intrusion Detection System (IDS) was originally a software application. However, considering real-time requirements, special equipment was developed to monitor networks for malicious activities or policy violations. The management system of these devices collects malicious activities, violations, security information, and event logs and aggregates them into reports. Some IDSs can respond when an intrusion is detected, so they are classified as intrusion prevention systems (IPSs). An IDS is designed to monitor and analyze network traffic and system events to discover unauthorized access to computers in the network, and it plays a vital role in protecting an organization's security. The aim of an IDS is to protect the system from unauthorized access, so it collects information about a given network environment, removes redundant information, and decides whether each activity is normal or intrusive. Researchers have used various approaches such as data mining, soft computing, machine learning, statistical techniques, Bayesian techniques, artificial neural networks, and evolutionary computing. Network anomaly detection serves the same purpose of deciding whether an activity is normal or intrusive. Machine learning [11, 32, 36] offers popular classifiers such as the naïve Bayesian classifier, the decision tree, and the Support Vector Machine (SVM). The naïve Bayesian classifier minimizes the classification error probability, or the average risk under a given cost. The SVM classifier is a supervised learning algorithm that classifies data in binary form. Although some surveys of cloud-based network intrusion detection exist, research on machine-learning-based intrusion detection applied to multimedia platforms is hardly found. This paper focuses on applying this technology to multimedia platforms.

Supervised learning is given a set of samples with attributes and categories, where the categories are determined in advance; the classifier obtained through learning can then correctly classify newly appearing objects. SVM was proposed in 1964 [4]. The decision boundary of SVM is the maximum-margin hyperplane learned from the training samples [7]. Since the 1990s, SVM has developed rapidly and spawned a series of improved and extended algorithms. It has been used in pattern recognition problems such as facial recognition [3] and text classification [15]. In machine learning, the decision tree is a predictive model representing a mapping between object attributes and object values [14]. In the late 1980s and early 1990s, J. Ross Quinlan developed the decision tree algorithm ID3 and later proposed C4.5 [34, 35].

Reducing the number of features used to train classifiers is difficult, and it is an exciting challenge for researchers. To use machine learning algorithms effectively, preprocessing of the data is essential. Feature selection is one of the most frequent and vital data preprocessing techniques and has become an indispensable component of the machine learning process [9, 20, 26]. Feature selection is the process of selecting relevant features, or a candidate subset of features, where evaluation criteria are used to obtain an optimal feature subset. Finding the optimal feature subset in high-dimensional data is a difficult task [21]. In general, feature selection refers to applying statistical tests to the inputs, given a specified output, to determine which columns are more predictive of the output. Algorithms for measuring feature importance include statistical methods such as Pearson's or Kendall's correlation, mutual information scores, and chi-squared values. In this paper, the research focuses on finding the best features for three classifiers for IDS. The features are ranked by predictive power, and the best features are selected based on their scores on defined metrics. The contribution of this paper is to propose a management system of an Intrusion Prevention System (IPS) that applies this technology to multimedia platforms. The IPS, which includes an IDS, monitors all regular and normal traffic patterns and sends alerts in case of any deviation from the normal pattern.
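As a small illustration of such statistical feature scoring, the sketch below ranks features with scikit-learn's mutual information and chi-squared scorers; the feature matrix, labels, and all names here are synthetic placeholders, not the paper's data.

```python
# Minimal feature-scoring sketch (scikit-learn assumed available); X and y
# stand in for any preprocessed intrusion-detection dataset.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, chi2

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(1000, 5)).astype(float)  # placeholder features
y = (X[:, 0] + X[:, 2] > 9).astype(int)                # placeholder labels

mi_scores = mutual_info_classif(X, y, random_state=0)  # mutual information per feature
chi_scores, _ = chi2(X, y)                             # chi-squared statistic (X must be non-negative)

# Rank features by mutual information, highest first
for i in np.argsort(mi_scores)[::-1]:
    print(f"feature {i}: MI={mi_scores[i]:.3f}, chi2={chi_scores[i]:.1f}")
```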

2 Intrusion detection system for multimedia platform

IP Multimedia Subsystem (IMS) is a multimedia platform developed to provide distinct network services such as voice, data, and video. The idea of IMS [30] is to integrate voice communication and Internet technologies. It comprises the sets of core network functional entities and interfaces used by service providers to provide services based on the Session Initiation Protocol (SIP) [33]. IMS promises multiple services over miscellaneous access networks on a secure and reliable IP-based network.

2.1 SIP flooding attack

The architecture for detecting the SIP flooding attack is shown in Fig. 1. An Intrusion Detection System (IDS) can be implemented with software such as OpenIMSCore to achieve network security by observing abnormal network behavior [10]. The open-source IMS client and OpenIMSCore can probe the packets in the network traffic. When the IDS is placed on workstations, it is known as a host-based IDS.

Fig. 1

Architecture of SIP flooding attack detection
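The detection rule itself is not specified here, so the following is only a minimal sketch, assuming a simple sliding-window rate check on SIP INVITE messages such as a host-based probe behind OpenIMSCore might apply; the window length and threshold are illustrative values, not figures from this work.

```python
# Hypothetical rate-based detector for SIP INVITE floods: count INVITEs per
# source within a sliding window and alarm above a threshold.
from collections import defaultdict, deque

WINDOW_SECONDS = 10          # illustrative assumption
MAX_INVITES_PER_WINDOW = 50  # illustrative assumption

invite_times = defaultdict(deque)  # source IP -> timestamps of recent INVITEs

def on_sip_message(src_ip: str, method: str, now: float) -> bool:
    """Return True (raise alarm) if src_ip exceeds the INVITE rate limit."""
    if method != "INVITE":
        return False
    times = invite_times[src_ip]
    times.append(now)
    # Drop timestamps that have fallen out of the window
    while times and now - times[0] > WINDOW_SECONDS:
        times.popleft()
    return len(times) > MAX_INVITES_PER_WINDOW
```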

2.2 The proposed intrusion prevention system

The Intrusion Prevention System (IPS) [25, 29, 31, 39] extends the Intrusion Detection System (IDS), as shown in Fig. 2(a): it monitors all regular and normal traffic patterns and sends alerts in case of any deviation from the normal pattern. The designed IPS can learn to prevent intrusions from the log file when attack events happen, as shown in Fig. 2(b). Because users are allowed access over the network, packets sent by intruders are merged into the network traffic as input, so it is necessary to monitor all incoming and outgoing traffic. Intelligent intrusion detection uses a trained model to decide whether the behavior of the network traffic is normal. If the trained model detects an abnormal behavior category, the traffic is identified as an attack and an alarm is raised to the respective device's owner. The five labels of the data packets are normal, Denial of Service (DoS), User to Root (U2R), Remote to Local (R2L), and Probe; the packets can be captured and stored in the database.

Fig. 2

Intelligent intrusion detection system and intrusion prevention systems
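As a rough illustration of this decision loop, the sketch below classifies each packet's feature vector, logs it, and raises an alarm for attack labels; the dummy model and all names are hypothetical stand-ins for a trained classifier from Section 3, not the paper's implementation.

```python
# Hypothetical IPS decision loop matching Fig. 2(b): classify, log, alarm.
ATTACK_LABELS = {"DoS", "U2R", "R2L", "Probe"}

class DummyModel:
    """Stand-in for a trained classifier from Section 3."""
    def predict(self, batch):
        return ["DoS" if sum(x) > 10 else "normal" for x in batch]

def handle_packet(features, model, log):
    label = model.predict([features])[0]
    log.append((features, label))                      # store the packet record
    if label in ATTACK_LABELS:
        print(f"ALERT: {label} traffic detected")      # notify the device's owner

log = []
handle_packet([1.0, 2.0, 3.0], DummyModel(), log)      # normal
handle_packet([5.0, 6.0, 7.0], DummyModel(), log)      # raises an alert
```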

2.3 Machine learning technologies

Machine Learning (ML) is used to analyze data sets and construct systems based on them [5, 13, 22, 37, 38]. There are three main types of learning techniques based on labeled data: supervised, unsupervised, and semi-supervised learning. Common machine learning algorithms are the Support Vector Machine (SVM) [8, 23, 28], the naïve Bayes classifier [12, 27], K-nearest neighbors (KNN) [17, 40], artificial neural networks (ANN) [16, 18, 19], deep neural networks (DNN) [6], and so on. Figure 3 shows how to select the best features by permutation feature importance. Permutation feature importance measures the increase in the model's prediction error after the feature's values are permuted, which breaks the relationship between the feature and the true outcome. The model training procedure includes inputting training data, feature extraction, model training, evaluation, and validation. The trained model can then be used to test new input data and decide whether the packet traffic is normal.

Fig. 3

Model training and testing procedure
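A minimal sketch of this procedure, assuming scikit-learn's permutation_importance with a stand-in gradient-boosting model and synthetic data in place of the paper's pipeline:

```python
# Permutation feature importance: permute one feature at a time and measure
# the drop in the model's score on held-out data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)

# Rank features by mean importance, highest first (cf. Tables 3, 6, and 9)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: importance={result.importances_mean[i]:.4f}")
```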

3 Theory of classification

This section explains the theory of the classifiers: the Decision Tree (DT), the Support Vector Machine (SVM), and Naïve Bayes (NB).

3.1 Binary classification tree

A procedure for growing a binary classification tree (BCT) is described. \(R^{d}\) is a d-dimensional space that contains the training data points \(x_{i}=\left(x_{i1}, x_{i2}, \ldots, x_{id}\right), i=1, \ldots, n\). A hyperplane in \(R^{d}\) splits a region \(R_{k-1}\) into two subregions \(R_{k}\) and \(R_{k}^{\prime}\) \((k \geq 1)\). The function \(E(R)\) calculates the fraction of points \(x_{i} \in R\) misclassified by a majority vote in region \(R\). The splitting plane has two parameters \(j\) and \(s\), and the optimal \(j\) and \(s\) minimize \(E\left(R_{k}(j, s)\right)+E\left(R_{k}^{\prime}(j, s)\right)\). The function \(E\) is as follows:

$$ E(R)=\left\{\begin{array}{ll} \frac{N_{0}}{N_{R}}, & \text{if } N_{0} < N_{1} \\ \\ \frac{N_{1}}{N_{R}}, & \text{if } N_{1} \leq N_{0} \end{array}\right. $$
(1)

where \(R_{k}(j, s)=\left\{x_{i} \in R_{k} \mid x_{ij}>s, 1 \leq j \leq d\right\}\) and \(R_{k}^{\prime}(j, s)=\left\{x_{i} \in R_{k} \mid x_{ij} \leq s, 1 \leq j \leq d\right\}\). \(N_{0}\) is the number of points \(x_{i}\) with label 0, \(N_{1}\) is the number of points \(x_{i}\) with label 1, and \(N_{R}\) is the total number of points, \(N_{R} = N_{0} + N_{1}\). The stopping criterion is reached when only one point \(x_{i} \in R\) remains in \(R_{k}\).
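A short sketch of this split search, implementing \(E(R)\) and minimizing \(E(R_{k}(j,s)) + E(R_{k}^{\prime}(j,s))\) by exhaustively scanning features and thresholds; this is a stand-in illustration under the definitions above, not the paper's implementation.

```python
# Exhaustive search for the optimal split (j, s) of a binary classification tree.
import numpy as np

def misclassification(labels: np.ndarray) -> float:
    """E(R): fraction of points misclassified by a majority vote in the region."""
    if labels.size == 0:
        return 0.0
    n1 = labels.sum()                       # count of label-1 points
    return min(n1, labels.size - n1) / labels.size

def best_split(X: np.ndarray, y: np.ndarray):
    """Return (j, s) minimizing E(R_k(j, s)) + E(R'_k(j, s))."""
    best_j, best_s, best_err = None, None, np.inf
    for j in range(X.shape[1]):             # scan every feature dimension
        for s in np.unique(X[:, j]):        # candidate thresholds
            right = X[:, j] > s             # R_k(j, s):  x_ij > s
            err = misclassification(y[right]) + misclassification(y[~right])
            if err < best_err:
                best_j, best_s, best_err = j, s, err
    return best_j, best_s
```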

3.2 Support vector machine classifier

Given a training data set \(D\) with \(n\) points and a corresponding label set \(L\), \(\vec{x}_{i}=\left(x_{i1}, x_{i2}, \ldots, x_{id}\right)\) is a vector in a d-dimensional space \(R^{d}\) with \(x_{i} \in D\), and \(y_{i}\) is a label with \(y_{i} \in L = \{1, -1\}\). Finding the "maximum-margin hyperplane" for the SVM classifier is an optimization problem. The hyperplane is the set of points \(\vec{x} \in R^{d}\) satisfying \(\vec{w} \cdot \vec{x}-b=0\), where \(\vec{w}\) is the normal vector to the hyperplane and the parameter \(\frac{b}{\|\vec{w}\|}\) determines the offset of the hyperplane from the origin along \(\vec{w}\). Two parallel hyperplanes separate the two classes of data; the region bounded by these two hyperplanes is called the "margin", and the maximum-margin hyperplane lies halfway between them:

$$ \vec{\mathbf{w} }\cdot \vec{\mathbf{x}}-b=\left\{\begin{array}{cl} +1, & \text { for } y_{i}=1 \\ \\ -1, & \text { for } y_{i}=-1 \end{array}\right. $$
(2)

where \(\vec{\mathbf{x}}\) denotes an input feature vector. The distance between these two hyperplanes is \(\frac{2}{\|\vec{w}\|}\), so maximizing the distance is equivalent to minimizing \(\|\vec{w}\|\). The optimization problem is formulated as follows:

$$ \underset{\vec{w}, b}{\min} \|\vec{w}\| $$
(3)

Equation (3) is subject to \(y_{i} \left(\vec{\mathbf{w}} \cdot \vec{\mathbf{x}}_{i}-b\right) \geq 1 \text{ for } i=1,2, \ldots, n\).
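A minimal sketch of fitting this maximum-margin classifier with scikit-learn's linear SVC on synthetic data; note that scikit-learn's decision function uses \(\vec{w}\cdot\vec{x} + b'\), so \(b = -b'\) in the notation above.

```python
# Linear SVM on two synthetic Gaussian clusters; recover w, b, and the margin.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)

clf = SVC(kernel="linear").fit(X, y)

w = clf.coef_[0]                      # normal vector of the hyperplane
b = -clf.intercept_[0]                # offset, matching the form w.x - b = 0
margin = 2.0 / np.linalg.norm(w)      # distance between the two margin hyperplanes
print(f"w={w}, b={b:.3f}, margin width={margin:.3f}")
```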

3.3 Naïve Bayesian classifier

The Naive Bayes algorithm is based on Bayes’ theorem. The formula of Bayes’ theorem is as follows:

$$ P(A \mid B)=\frac{P(B \mid A) P(A)}{P(B)} $$
(4)

\(P(A)\) is the prior probability, \(P(B \mid A)\) is the conditional probability (likelihood), and \(P(A \mid B)\) is the posterior probability. In addition, \(P(B)\) is the total probability of \(B\) occurring under the different given conditions, expressed as (5):

$$ P(B)=P\left(B \mid A_{1}\right) P\left(A_{1}\right)+P\left(B \mid A_{2}\right) P\left(A_{2}\right)+\ldots+P\left(B \mid A_{n}\right) P\left(A_{n}\right) $$
(5)
$$ D=\left\{\left( x^{(1)}, y^{(1)}\right),\left( x^{(2)}, y^{(2)}\right), \ldots,\left( x^{(N)}, y^{(N)}\right)\right\} $$
(6)

Equation (6) defines the training set: there are \(N\) samples in \(D\), and each sample has \(n\) features. \(y\) is the class corresponding to \(x\), and there are \(k\) classes. The class to which a given \(x\) belongs is determined as follows. For a given \(x\), by Bayes' theorem:

$$ P\left( C_{k} \mid x\right)=\frac{P\left( x \mid C_{k}\right) P\left( C_{k}\right)}{P(x)} $$
(7)

The naive Bayes classifier assumes that the \(n\) features are independent of each other, as in (8). Substituting (8) into (7) and expanding \(P(x)\) in (7) with the total probability formula yields the naive Bayes model in (9):

$$ P\left(x \mid C_{k}\right)=P\left(x_{1}, x_{2}, \ldots, x_{n} \mid C_{k}\right)=\prod\limits_{i=1}^{n} P\left(x_{i} \mid C_{k}\right) $$
(8)
$$ P\left(C_{k} \mid x\right)=\frac{P\left(C_{k}\right) {\prod}_{i=1}^{n} P\left(x_{i} \mid C_{k}\right)}{{\sum}_{j=1}^{k}\left[P\left(C_{j}\right) {\prod}_{i=1}^{n} P\left(x_{i} \mid C_{j}\right)\right]} $$
(9)
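As a small illustration, the sketch below fits the model in (9) on a stand-in discrete dataset using scikit-learn's CategoricalNB; the data and class rule are placeholders, not the paper's experiment.

```python
# Naive Bayes on discrete features: fit class-conditional probabilities and
# return posteriors P(C_k | x) as in (9).
import numpy as np
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(500, 4))      # 4 discrete features
y = (X[:, 0] == X[:, 1]).astype(int)       # placeholder class rule

clf = CategoricalNB().fit(X, y)
posteriors = clf.predict_proba(X[:5])      # P(C_k | x) for the first samples
print(posteriors)
```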

The symbols in Fig. 4 are explained as follows. True Positive means we predicted the object as positive and it is actually positive. True Negative means we predicted the object as negative and it is actually negative. False Positive means we predicted the object as positive but it is actually negative. False Negative means we predicted the object as negative but it is actually positive.

Fig. 4

Confusion matrix
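These four counts yield the standard evaluation metrics used in Section 4; a small helper is sketched below, where the false alarm rate is the false positive rate.

```python
# Standard metrics derived from the confusion matrix of Fig. 4.
def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # overall correctness
        "precision": tp / (tp + fp),                   # predicted positives that are real
        "recall": tp / (tp + fn),                      # real positives that are caught
        "false_alarm_rate": fp / (fp + tn),            # normal traffic flagged as attack
    }
```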

4 Experimental results and discussion

The database used in this paper is the NSL-KDD test dataset, and each sample in the data set has 41 features. The five labels in the data are normal, Denial of Service (DoS), User to Root (U2R), Remote to Local (R2L), and Probe. All labels except normal indicate different attacks in the dataset, so the NSL-KDD data set is divided into two categories: normal and attack. The main challenge of the intrusion detection model is to achieve maximum accuracy with a minimum false alarm rate (FP). The results are shown in three subsections. In Section 4.1, the decision tree classifier is used for testing and finding the optimal features; in Section 4.2, the SVM classifier; and in Section 4.3, the Bayes classifier.
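A hedged sketch of this binarization, assuming the commonly distributed KDDTest+.txt layout (41 feature columns, then the label, then a difficulty score); the file path is a placeholder.

```python
# Map every NSL-KDD label other than "normal" to the attack class.
import pandas as pd

df = pd.read_csv("KDDTest+.txt", header=None)    # placeholder path, no header row
labels = df.iloc[:, 41]                          # 42nd column holds the label
y = (labels != "normal").astype(int)             # 0 = normal, 1 = attack
print(y.value_counts())
```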

4.1 Decision tree classifier

The Two-Class Boosted Decision Tree is a binary classifier used for testing and finding the optimal features. Table 1 shows the parameter values of the decision tree; the threshold value is set to 0.5. The confusion matrix is shown in Table 2: the True Positive count is 3889, the False Positive count is 40, the False Negative count is 56, and the True Negative count is 5033. The accuracy of the decision tree classifier reaches 98.9%, and the corresponding area under the curve (AUC) value reaches 0.999. The resulting receiver operating characteristic (ROC) curve is shown in Fig. 5. Permutation feature importance scores the features in the decision tree model, as shown in Table 3: the first column is the rank order, the second column is the feature name, and the third column is the feature importance score. The top 10 records are listed in rank order.
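The reported accuracy can be checked directly from Table 2's counts:

```python
# Verify the decision-tree accuracy from the confusion matrix in Table 2.
tp, fp, fn, tn = 3889, 40, 56, 5033
accuracy = (tp + tn) / (tp + fp + fn + tn)   # (3889 + 5033) / 9018
print(f"accuracy = {accuracy:.3f}")          # 0.989, matching the 98.9% above
```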

Table 1 Parameters of decision tree classifier
Table 2 Confusion Matrix of decision tree classifier
Fig. 5

The ROC curve of the decision tree classifier

Table 3 Feature score of the decision tree classifier

4.2 Support vector machine classifier

The Support Vector Machine (SVM) classifier is a binary classifier. Table 4 shows the parameter values of the SVM; the threshold is 0.5. Table 5 is the confusion matrix of the SVM: the True Positive count is 3711, the False Positive count is 231, the False Negative count is 234, and the True Negative count is 4842. The accuracy of the SVM classifier reaches 94.8%, and the corresponding area under the curve (AUC) value reaches 0.985. The resulting receiver operating characteristic (ROC) curve is shown in Fig. 6. Permutation feature importance scores the features in the SVM classifier, as shown in Table 6: the first column is the rank order, the second column is the feature name, and the third column is the feature importance score. The top 10 records are listed in rank order.

Table 4 Parameters of SVM classifier
Table 5 Confusion Matrix of SVM classifier
Fig. 6

The ROC curve of the SVM classifier

Table 6 The feature score of the SVM classifier

4.3 Naïve Bayesian classifier

The Naïve Bayesian classifier is a binary classifier. Table 7 shows the parameter values of the Naïve Bayesian classifier; the threshold is 0.5. Table 8 is the confusion matrix of the Naïve Bayesian classifier: the True Positive count is 3695, the False Positive count is 246, the False Negative count is 250, and the True Negative count is 4827. The accuracy of the Naïve Bayesian classifier reaches 94.4%, and the corresponding area under the curve (AUC) value reaches 0.978. The resulting receiver operating characteristic (ROC) curve is shown in Fig. 7. Permutation feature importance scores the features in the Naïve Bayesian classifier, as shown in Table 9: the first column is the rank order, the second column is the feature name, and the third column is the feature importance score. The top 10 records are listed in rank order.

Table 7 Parameters of Naïve Bayesian classifier
Table 8 Confusion Matrix of Naïve Bayesian classifier
Fig. 7

The ROC curve of the Naïve Bayesian classifier

Table 9 The feature score of the Naïve Bayesian classifier

4.4 Comparison

Machine learning algorithms are developing rapidly; every year new techniques are presented that update the current leading algorithms. It is hard to define the state of the art, since no single algorithm is capable of solving all kinds of ML problems, and the choice of algorithm varies with the constraints of the task. However, in some sense we can list well-performing algorithms with the best results in their suitable use cases, such as SVM, the decision tree, and the naïve Bayesian classifier. For the intrusion detection task, comparisons can be drawn between these classifiers and a deep learning model [6]. Figure 8 shows the ROC curves of the decision tree, SVM, and naïve Bayesian classifiers; their areas under the curve (AUC) are 0.999, 0.985, and 0.978, respectively. Performance comparisons between the classifiers and state-of-the-art methods are listed in Table 10. The decision tree classifier has the best performance.

Fig. 8

The ROC curves of the Decision Tree, SVM and Naïve Bayesian classifier

Table 10 Performance comparison between classifiers and state-of-the-art
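For reference, ROC curves like those in Fig. 8 can be overlaid as in the sketch below; the models and dataset are stand-ins, not the paper's experimental setup.

```python
# Overlay ROC curves for decision tree, SVM, and naive Bayes classifiers.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_curve, auc

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=8, random_state=0),
    "SVM": SVC(kernel="linear", probability=True, random_state=0),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    scores = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    fpr, tpr, _ = roc_curve(y_te, scores)
    plt.plot(fpr, tpr, label=f"{name} (AUC={auc(fpr, tpr):.3f})")

plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```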

5 Conclusion

An intelligent intrusion detection system based on machine learning technology is proposed for the security of the IP Multimedia Subsystem (IMS). To increase the accuracy of the classifiers, it is vital to select the critical features for constructing the intrusion detection system. The decision tree, SVM, and Bayesian classifiers are binary classifiers used to test the NSL-KDD data set. Based on the experimental results, the six critical features affecting the accuracy are "Service", "dst_host_same_srv_rate", "Flag", "Protocol_type", "Dst_host_rerror_rate", and "Count". The intrusion detection accuracies are 98.9%, 94.8%, and 94.4%, respectively, while the accuracy of the deep learning model is 91.5%. The experimental results show that the machine learning classifiers with critical features achieve better accuracy than deep learning. In future work, the effect of the six features will be further verified on other classifiers.