ABSTRACT
Each month, more attacks are launched with the aim of making web users believe that they are communicating with a trusted entity, for the purpose of stealing account information, logon credentials, and identity information in general. This attack method, known as "phishing," is most commonly initiated by sending out emails with links to spoofed websites that harvest information. We present a method for detecting these attacks, which in its most general form is an application of machine learning to a feature set designed to highlight user-targeted deception in electronic communication. This method is applicable, with slight modification, to the detection of phishing websites or of the emails used to direct victims to these sites. We evaluate this method on a set of approximately 860 phishing emails and 6950 non-phishing emails, and correctly identify over 96% of the phishing emails while misclassifying only on the order of 0.1% of the legitimate emails. We conclude with thoughts on the future of such techniques for identifying deception, particularly with respect to the evolutionary nature of the attacks and the information available.
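As a rough illustration of the approach summarized above, the following Python sketch turns an HTML email body into a small numeric feature vector and computes the two rates quoted in the evaluation. The abstract does not list the features, so the ones used here (links to raw IP addresses, links whose visible text names a different host than the target, and the number of dots in a hostname) are assumptions chosen for illustration; the names `LinkCollector`, `extract_features`, and `rates` are hypothetical, and no particular classifier is implied.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Matches hostnames that are dotted-quad IP addresses (assumed deception cue).
IP_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


class LinkCollector(HTMLParser):
    """Collects (href, visible text) pairs from an HTML email body."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_features(html_body):
    """Return illustrative deception features (assumed, not from the paper)."""
    parser = LinkCollector()
    parser.feed(html_body)
    parser.close()
    hosts = [urlparse(href).hostname or "" for href, _ in parser.links]
    mismatched = 0
    for (href, text), host in zip(parser.links, hosts):
        # If the visible text is itself a URL, compare its host to the target.
        shown = urlparse(text).hostname if text.startswith("http") else None
        if shown and shown != host:
            mismatched += 1
    return {
        "num_links": len(parser.links),
        "ip_links": sum(1 for h in hosts if IP_HOST.match(h)),
        "mismatched_links": mismatched,
        "max_dots": max((h.count(".") for h in hosts), default=0),
    }


def rates(y_true, y_pred):
    """Return (detection rate on phishing, false-positive rate on legitimate).

    Labels are truthy for phishing and falsy for legitimate email.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    pos = sum(1 for t in y_true if t)
    neg = len(y_true) - pos
    return tp / pos, fp / neg
```

In a setup like this, the dictionaries returned by `extract_features` would be fed to a classifier of the reader's choice, and `rates` would be applied to its predictions on held-out email. For scale, the rates reported above correspond to roughly 825 or more of the approximately 860 phishing emails being caught and about 7 of the 6950 legitimate emails being flagged.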
Index Terms
- Learning to detect phishing emails