DOI: 10.1145/1242572.1242660
Article

Learning to detect phishing emails

Published: 08 May 2007

ABSTRACT

Each month, more attacks are launched with the aim of making web users believe that they are communicating with a trusted entity, for the purpose of stealing account information, logon credentials, and identity information in general. This attack method, commonly known as "phishing," is most often initiated by sending out emails with links to spoofed websites that harvest information. We present a method for detecting these attacks, which in its most general form is an application of machine learning on a feature set designed to highlight user-targeted deception in electronic communication. With slight modification, this method is applicable to detecting either phishing websites or the emails used to direct victims to these sites. We evaluate this method on a set of approximately 860 phishing emails and 6,950 non-phishing emails, and correctly identify over 96% of the phishing emails while misclassifying only on the order of 0.1% of the legitimate emails. We conclude with thoughts on the future of such techniques for specifically identifying deception, particularly with respect to the evolutionary nature of the attacks and the information available.
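As a rough illustration of what "a feature set designed to highlight user-targeted deception" can look like, the sketch below turns an HTML email body into a handful of numeric cues. The specific cues (a bare IPv4 address as the link host, visible link text that names a different host than the link target, link and domain counts, dots in the hostname, presence of a `<script>` tag) and the names `LinkCollector`, `extract_features`, and `rates` are illustrative assumptions, not a statement of this paper's exact feature set. The learning step is omitted: the resulting vectors would be handed to whatever classifier one chooses. Only the Python standard library is used.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical cue: a hostname that is a bare dotted-quad IPv4 address.
IP_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for <a> tags and note any <script> tag."""

    def __init__(self):
        super().__init__()
        self.links = []        # list of [href, visible_text]
        self.has_script = False
        self._current = None   # the <a> element currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = [dict(attrs).get("href") or "", ""]
        elif tag == "script":
            self.has_script = True

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def extract_features(html_body):
    """Map an HTML email body to a dict of hypothetical numeric deception cues."""
    parser = LinkCollector()
    parser.feed(html_body)
    parser.close()
    hosts = [urlparse(href).hostname or "" for href, _ in parser.links]

    mismatched = 0
    for (_, text), host in zip(parser.links, hosts):
        shown = urlparse(text.strip()).hostname
        # Visible text is itself a URL, but names a different host than the target.
        if shown and host and shown != host:
            mismatched += 1

    return {
        "num_links": len(parser.links),
        "num_domains": len({h for h in hosts if h}),
        "ip_based_url": int(any(IP_HOST.match(h) for h in hosts)),
        "mismatched_link_text": mismatched,
        "max_dots_in_host": max((h.count(".") for h in hosts), default=0),
        "has_javascript": int(parser.has_script),
    }


def rates(tp, fn, fp, tn):
    """Return (detection rate, false-positive rate) from confusion-matrix counts."""
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    sample = ('<a href="http://192.168.0.1/login">https://www.paypal.com</a>'
              '<script>x()</script>')
    print(extract_features(sample))
```

The `rates` helper expresses the two figures the abstract reports. Applied to those figures, 96% of roughly 860 phishing emails is about 826 caught, and 0.1% of 6,950 legitimate emails is about 7 flagged by mistake.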



Published in: WWW '07: Proceedings of the 16th international conference on World Wide Web, May 2007, 1382 pages
ISBN: 9781595936547
DOI: 10.1145/1242572

              Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
