ABSTRACT
Early Peer-to-Peer (P2P) overlay network traffic classification schemes relied on port-based and payload-based inspection. In recent years, researchers have turned to machine learning approaches instead. This paper presents an ensemble learning approach, which combines multiple models to improve prediction accuracy over a single classifier or semi-supervised learning techniques. We first extract statistical characteristics of TCP and UDP flows from network traces to construct a feature set. We then apply feature selection techniques to reduce the number of features required to train the model, and hence the build time. We use the Stacking and Voting ensemble learning techniques to improve prediction accuracy, with base classifiers built using the following Machine Learning (ML) algorithms: Naïve Bayes, Bayesian Network, and decision trees. Meta classifiers further improve classification accuracy to 99.9%. Our experimental results show that Stacking performs better than Voting in identifying P2P traffic.
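The difference between Voting and Stacking can be illustrated with a minimal sketch. This is not the authors' implementation: the three flow features, the threshold rules standing in for base classifiers, the lookup-table meta classifier, and all data values are hypothetical and chosen only to show the mechanics. Voting gives each base classifier one equal vote, while Stacking trains a meta classifier on the base classifiers' outputs, so it can learn which combinations of base predictions are reliable.

```python
from collections import Counter, defaultdict

# Hypothetical per-flow features (not taken from the paper):
# (mean_packet_size_bytes, flow_duration_s, distinct_peer_count)
# All samples and labels below are made up for illustration.
TRAIN = [
    ((900, 120.0, 40), "p2p"),
    ((850, 300.0, 25), "p2p"),
    ((300, 200.0, 30), "p2p"),
    ((1200, 5.0, 2), "other"),
    ((200, 2.0, 1), "other"),
    ((1000, 150.0, 3), "other"),
]

# Three deliberately weak base classifiers, each looking at one feature.
# They are fixed threshold rules (assumed), standing in for trained models.
def by_size(x):
    return "p2p" if x[0] > 500 else "other"

def by_duration(x):
    return "p2p" if x[1] > 60 else "other"

def by_peers(x):
    return "p2p" if x[2] > 10 else "other"

BASE = [by_size, by_duration, by_peers]

def vote(x):
    """Voting: every base classifier casts one vote; the majority label wins."""
    return Counter(clf(x) for clf in BASE).most_common(1)[0][0]

def fit_stacker(data):
    """Stacking: fit a meta classifier on the base classifiers' predictions.

    The meta classifier here is a simple lookup table that maps each
    combination of base predictions to the most frequent true label.
    """
    table = defaultdict(Counter)
    for x, y in data:
        table[tuple(clf(x) for clf in BASE)][y] += 1

    def predict(x):
        key = tuple(clf(x) for clf in BASE)
        if key in table:
            return table[key].most_common(1)[0][0]
        return vote(x)  # unseen combination: fall back to majority voting

    return predict

stacker = fit_stacker(TRAIN)
sample = (1100, 90.0, 4)  # large packets, long-lived, but very few peers
print("voting  :", vote(sample))
print("stacking:", stacker(sample))
```

On this toy sample the two methods disagree: two of the three rules say "p2p", so Voting follows them, whereas the meta classifier has seen that this particular combination of base predictions was labelled "other" in the training data. With trained base classifiers, the meta classifier would normally be fitted on held-out or cross-validated predictions rather than on the same data used to train the base models.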
Index Terms
- P2P traffic classification using ensemble learning