ERIC Number: ED563855
Record Type: Non-Journal
Publication Date: 2013
Pages: 127
Abstractor: As Provided
ISBN: 978-1-3035-8709-2
ISSN: N/A
EISSN: N/A
Phishtest: Measuring the Impact of Email Headers on the Predictive Accuracy of Machine Learning Techniques
Tout, Hicham
ProQuest LLC, Ph.D. Dissertation, Nova Southeastern University
The majority of documented phishing attacks have been carried out by email, yet few studies have measured the impact of email headers on the predictive accuracy of machine learning techniques in detecting email phishing attacks. Research has shown that including a limited subset of email headers as features in training machine learning algorithms to detect phishing attacks did increase the predictive accuracy of these algorithms. The same research also recommended further investigation of the impact of including an expanded set of email headers on the predictive accuracy of machine learning algorithms. In addition, research has shown that the cost of misclassifying legitimate emails as phishing attacks (false positives) was far higher than that of misclassifying phishing emails as legitimate (false negatives), while the opposite was true in the case of fraud detection. Consequently, that research recommended that cost-sensitive measures be taken in order to further improve the weighted predictive accuracy of machine learning algorithms. Motivated by the potentially high impact of the inclusion of email headers on the predictive accuracy of machine learning algorithms and the significance of enabling cost-sensitive measures as part of the learning process, the goal of this research was to quantify the impact of including an extended set of email headers and to investigate the impact of imposing a penalty as part of the learning process on the number of false positives. It was believed that if email headers were included and cost-sensitive measures were taken as part of the learning process, then the overall weighted predictive accuracy of the machine learning algorithm would be improved. The results showed that adding email headers as features did improve the overall predictive accuracy of machine learning algorithms and that cost-sensitive measures taken as part of the learning process did result in fewer false positives.
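The two ideas in the abstract, header-derived features and a cost-weighted accuracy, can be illustrated with a minimal Python sketch. It is not the dissertation's method: the feature names, the function names, and the weight `lam` are hypothetical and chosen only for illustration. The weighted accuracy shown is one common formulation, in which each legitimate email counts `lam` times as much as a phishing email, so false positives reduce the score more than false negatives do.

```python
from email import message_from_string


def header_features(raw_email: str) -> dict:
    """Extract a few header-derived features (hypothetical feature set)."""
    msg = message_from_string(raw_email)
    from_addr = msg.get("From", "")
    reply_to = msg.get("Reply-To", "")
    return {
        # Number of relay hops recorded in the message
        "num_received": len(msg.get_all("Received", [])),
        # Whether a Reply-To header is present at all
        "has_reply_to": int(bool(reply_to)),
        # Whether replies would go somewhere other than the sender
        "reply_to_differs": int(bool(reply_to) and reply_to != from_addr),
        # Whether the message carries a Message-ID header
        "has_message_id": int(msg.get("Message-ID") is not None),
    }


def weighted_accuracy(y_true, y_pred, lam=9.0):
    """Accuracy with legitimate emails weighted lam times (assumed weight).

    Labels: 1 = phishing, 0 = legitimate.
    """
    legit_total = sum(1 for t in y_true if t == 0)
    phish_total = sum(1 for t in y_true if t == 1)
    legit_correct = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    phish_correct = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    denom = lam * legit_total + phish_total
    return (lam * legit_correct + phish_correct) / denom if denom else 0.0


if __name__ == "__main__":
    raw = (
        "From: a@example.com\n"
        "Reply-To: b@example.org\n"
        "Received: from x\n"
        "Received: from y\n"
        "Subject: hi\n"
        "\n"
        "body"
    )
    print(header_features(raw))
    # One false negative versus one false positive on the same four emails
    print(weighted_accuracy([0, 0, 1, 1], [0, 0, 1, 0]))
    print(weighted_accuracy([0, 0, 1, 1], [0, 1, 1, 1]))
```

With an equal number of errors, the prediction containing a false positive scores lower than the one containing a false negative, which is the asymmetry the abstract describes for phishing detection.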
[The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A