Abstract
Web spambots are becoming more advanced, utilizing techniques that can defeat existing spam detection algorithms. These techniques include performing a series of malicious actions with variable time delays, repeating the same series of malicious actions multiple times, and interleaving legitimate (decoy) and malicious actions. Existing methods based on string pattern matching cannot detect spambots that use these techniques. In response, we define a new problem for detecting spambots that use the aforementioned techniques and propose an efficient algorithm to solve it. Given a dictionary \(\hat{S}\) of temporally annotated sequences modeling spambot actions, each associated with a time window, a long temporally annotated sequence \(T\) modeling a user action log, and parameters \(f\) and \(k\), our problem seeks to detect each sequence in \(\hat{S}\) that occurs in \(T\) at least \(f\) times within its associated time window and with at most \(k\) mismatches. Our algorithm solves the problem exactly, requires linear time and space, and employs advanced data structures and the Kangaroo method to handle the problem efficiently.
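The problem statement above can be made concrete with a small reference sketch. It is not the authors' linear-time algorithm, and several details are our assumptions: each dictionary entry is an action string plus a time window, each action is encoded as a single character, and an occurrence counts when a length-\(m\) window of \(T\) has at most \(k\) mismatches and its first and last timestamps differ by at most the entry's window. Mismatches are counted with the Kangaroo method: each longest-common-extension (LCE) query, answered here by a suffix array, a Kasai LCP array and a sparse-table range-minimum structure, jumps over a whole matching run, so checking one window costs at most \(k+1\) queries. The suffix array is built naively for brevity, and the names `LCE`, `mismatches` and `detect` are hypothetical.

```python
def suffix_array(s: str) -> list:
    # Naive construction (sort suffixes directly); fine for a sketch, not linear time.
    return sorted(range(len(s)), key=lambda i: s[i:])


def rank_and_lcp(s: str, sa: list):
    # Kasai et al.: lcp[r] is the LCP of suffixes sa[r - 1] and sa[r]; lcp[0] = 0.
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return rank, lcp


class LCE:
    """Longest-common-extension queries via suffix array + LCP + sparse table."""

    def __init__(self, s: str):
        self.n = len(s)
        self.rank, lcp = rank_and_lcp(s, suffix_array(s))
        # table[j][i] = min(lcp[i : i + 2**j])
        self.table = [lcp]
        j = 1
        while (1 << j) <= self.n:
            prev, half = self.table[-1], 1 << (j - 1)
            self.table.append(
                [min(prev[i], prev[i + half]) for i in range(self.n - (1 << j) + 1)]
            )
            j += 1

    def query(self, i: int, j: int) -> int:
        # Length of the longest common prefix of s[i:] and s[j:].
        if i == j:
            return self.n - i
        a, b = sorted((self.rank[i], self.rank[j]))
        # For ranks a < b, the LCE is min(lcp[a + 1 .. b]).
        a += 1
        k = (b - a + 1).bit_length() - 1
        return min(self.table[k][a], self.table[k][b - (1 << k) + 1])


def mismatches(lce: LCE, m: int, t_off: int, pos: int, k: int) -> int:
    # Kangaroo method: each LCE query jumps over a matching run; stop after k + 1 mismatches.
    q = miss = 0
    while q < m:
        q += lce.query(q, t_off + pos + q)
        if q >= m:
            break
        miss += 1
        if miss > k:
            break
        q += 1  # step past the mismatching position
    return miss


def detect(dictionary, log, f: int, k: int) -> list:
    # Assumed formats: dictionary = [(action_string, window)], log = [(action_char, timestamp)];
    # patterns are assumed non-empty.
    actions = "".join(a for a, _ in log)
    times = [t for _, t in log]
    found = []
    for pattern, window in dictionary:
        m = len(pattern)
        lce = LCE(pattern + "\x00" + actions)  # "\x00" assumed absent from actions
        hits = 0
        for pos in range(len(actions) - m + 1):
            if times[pos + m - 1] - times[pos] > window:
                continue  # occurrence spans more than the allowed time window
            if mismatches(lce, m, m + 1, pos, k) <= k:
                hits += 1
        if hits >= f:
            found.append(pattern)
    return found
```

On the toy log `[("L", 0), ("P", 2), ("C", 3), ("P", 5), ("L", 40), ("P", 41), ("X", 43), ("P", 44)]`, the call `detect([("LPCP", 10)], log, f=2, k=1)` reports `"LPCP"`: it occurs exactly at time 0 and with one mismatch at time 40, and each occurrence fits within the 10-unit window. Rebuilding the index for every pattern keeps the sketch short; a more efficient version would build a single index over the whole dictionary and the log.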
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Alamro, H., Iliopoulos, C.S., Loukides, G. (2020). Efficiently Detecting Web Spambots in a Temporally Annotated Sequence. In: Barolli, L., Amato, F., Moscato, F., Enokido, T., Takizawa, M. (eds) Advanced Information Networking and Applications. AINA 2020. Advances in Intelligent Systems and Computing, vol 1151. Springer, Cham. https://doi.org/10.1007/978-3-030-44041-1_87
DOI: https://doi.org/10.1007/978-3-030-44041-1_87
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-44040-4
Online ISBN: 978-3-030-44041-1
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)