Article

Learning to detect phishing emails

Authors:
Ian Fette, Carnegie Mellon University
Norman Sadeh, Carnegie Mellon University
Anthony Tomasic, Carnegie Mellon University

WWW '07: Proceedings of the 16th international conference on World Wide Web, May 2007, Pages 649–656. https://doi.org/10.1145/1242572.1242660

Published: 08 May 2007

ABSTRACT

Each month, more attacks are launched with the aim of making web users believe that they are communicating with a trusted entity, for the purpose of stealing account information, logon credentials, and identity information in general. This attack method, commonly known as "phishing," is most often initiated by sending out emails with links to spoofed websites that harvest information. We present a method for detecting these attacks, which in its most general form is an application of machine learning to a feature set designed to highlight user-targeted deception in electronic communication. This method is applicable, with slight modification, to the detection of phishing websites or of the emails used to direct victims to these sites. We evaluate this method on a set of approximately 860 phishing emails and 6950 non-phishing emails, and correctly identify over 96% of the phishing emails while misclassifying only on the order of 0.1% of the legitimate emails. We conclude with thoughts on the future of such techniques for identifying deception, particularly with respect to the evolutionary nature of the attacks and of the information available.
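
To make the reported figures concrete, the following minimal Python sketch computes a detection rate and a false positive rate from confusion counts. Only the corpus sizes (about 860 phishing and 6950 legitimate emails) come from the abstract. The individual counts below are hypothetical values chosen to be consistent with "over 96%" and "on the order of 0.1%"; they are not the paper's actual results, and the function name is an assumption made for illustration.

```python
def detection_rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) from confusion counts."""
    tpr = tp / (tp + fn)  # fraction of phishing emails correctly flagged
    fpr = fp / (fp + tn)  # fraction of legitimate emails wrongly flagged
    return tpr, fpr


# Approximate corpus sizes stated in the abstract.
N_PHISHING = 860
N_LEGITIMATE = 6950

# Hypothetical counts (assumption, NOT taken from the paper), picked only
# so that the resulting rates match the magnitudes the abstract reports.
tp = 830                 # phishing emails flagged as phishing
fn = N_PHISHING - tp     # phishing emails that were missed
fp = 7                   # legitimate emails flagged as phishing
tn = N_LEGITIMATE - fp   # legitimate emails passed through

tpr, fpr = detection_rates(tp, fn, fp, tn)
print(f"detection rate: {tpr:.2%}, false positive rate: {fpr:.2%}")
```

Under these assumed counts, a 0.1% false positive rate corresponds to roughly 7 of the 6950 legitimate emails being misclassified, which shows why the false positive rate matters as much as the detection rate when legitimate mail greatly outnumbers phishing mail.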


Published in
WWW '07: Proceedings of the 16th international conference on World Wide Web
May 2007
1382 pages
ISBN: 9781595936547
DOI: 10.1145/1242572
General Chairs: Carey Williamson (University of Calgary, Canada), Mary Ellen Zurko (IBM, USA)
Program Chairs: Peter Patel-Schneider (Bell Labs Research, USA), Prashant Shenoy (University of Massachusetts at Amherst, USA)
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
email
filtering
learning
phishing
semantic attacks
spam

Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%