Phishtest: Measuring the Impact of Email Headers on the Predictive Accuracy of Machine Learning Techniques.

Tout, Hicham

Notes FAQ Contact Us

Back to results

Direct link

ERIC Number: ED563855

Record Type: Non-Journal

Publication Date: 2013

Pages: 127

Abstractor: As Provided

ISBN: 978-1-3035-8709-2

ISSN: N/A

EISSN: N/A

Phishtest: Measuring the Impact of Email Headers on the Predictive Accuracy of Machine Learning Techniques

Tout, Hicham

ProQuest LLC, Ph.D. Dissertation, Nova Southeastern University

The majority of documented phishing attacks have been carried by email, yet few studies have measured the impact of email headers on the predictive accuracy of machine learning techniques in detecting email phishing attacks. Research has shown that the inclusion of a limited subset of email headers as features in training machine learning algorithms to detect phishing attack did increase the predictive accuracy of these learning algorithms. The same research also recommended further investigation of the impact of including an expanded set of email headers on the predictive accuracy of machine learning algorithms. In addition, research has shown that the cost of misclassifying legitimate emails as phishing attacks--false positives--was far higher than that of misclassifying phishing emails as legitimate--false negatives, while the opposite was true in the case of fraud detection. Consequently, they recommended that cost sensitive measures be taken in order to further improve the weighted predictive accuracy of machine learning algorithms. Motivated by the potentially high impact of the inclusion of email headers on the predictive accuracy of machines learning algorithms and the significance of enabling cost-sensitive measures as part of the learning process, the goal of this research was to quantify the impact of including an extended set of email headers and to investigate the impact of imposing penalty as part of the learning process on the number of false positives. It was believed that if email headers were included and cost-sensitive measures were taken as part of the learning process, than the overall weighted, predictive accuracy of the machine learning algorithm would be improved. The results showed that adding email headers as features did improve the overall predictive accuracy of machine learning algorithms and that cost-sensitive measure taken as part of the learning process did result in lower false positives. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone (800) 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]

Descriptors: Electronic Mail, Predictive Validity, Accuracy, Artificial Intelligence, Computer Security

ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml

Publication Type: Dissertations/Theses - Doctoral Dissertations

Education Level: N/A

Audience: N/A

Language: English

Sponsor: N/A

Authoring Institution: N/A

Grant or Contract Numbers: N/A

Privacy | Copyright | Contact Us | Selection Policy | API