Detecting and Analyzing Cybercrime in Text-Based Communication of Cybercriminal Networks through Computational Linguistic and Psycholinguistic Feature Modeling.

Mbaziira, Alex Vincent

ERIC Number: ED579547

Record Type: Non-Journal

Publication Date: 2017

Pages: 115

Abstractor: As Provided

ISBN: 978-0-3552-6206-3

ISSN: N/A

EISSN: N/A

ProQuest LLC, Ph.D. Dissertation, George Mason University

Cybercriminals are increasingly using Internet-based text messaging applications to exploit their victims. Incidents of deceptive text-based cybercrime are rising and include fraud, scams, and favorable and unfavorable fake reviews. In this work, I use a text-based deception detection approach to train models that detect deceptive text-based cybercrime in native and non-native English-speaking cybercriminal networks. My models use both computational linguistic (CL) and psycholinguistic (PL) features to study four types of deceptive text-based cybercrime: fraud, scams, favorable fake reviews, and unfavorable fake reviews. The data come from three web genres: email, websites, and social media. For both native and non-native English-speaking cybercriminal networks, I build 1-dataset non-hybrid models as well as two types of hybrid models: 2-dataset and 3-dataset hybrid models. I use Naive Bayes, Support Vector Machines, and k-Nearest Neighbors to train and test all the models. Each 1-dataset non-hybrid model is trained on data from one web genre and then used to detect and analyze other types of cybercrime in web genres that are not part of the training set. Each 2-dataset hybrid model is trained on data combined from two web genres and then used to detect cybercrime in web genres outside the training set. Finally, the 3-dataset hybrid models are trained on every triplet of datasets drawn from the three web genres and used to detect and analyze cybercrime in the remaining dataset, which was not part of the training set. The models achieve 60% to 80% accuracy on the test datasets, performing best at detecting fraud and unfavorable reviews. The models differ notably from one another in detecting and analyzing scams, in both native and non-native English-speaking cybercriminal networks.
This work can be applied in provider- or user-based filtering tools that identify cybercriminal actors and block or label messages before they reach their intended audience. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
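The abstract's 1-, 2-, and 3-dataset training scheme can be illustrated with a short, hypothetical Python sketch. It is not the dissertation's code. It assumes each dataset is a list of (text, label) pairs (1 = deceptive, 0 = truthful), substitutes lowercase bag-of-words tokens for the CL and PL features, and uses only a minimal multinomial Naive Bayes classifier rather than all three classifiers named in the abstract. The names `hybrid_splits`, `evaluate`, and any dataset keys are illustrative placeholders. A k-dataset model is trained on each combination of k datasets and tested on every dataset left out of training. With four datasets, k = 3 leaves exactly one held-out dataset per combination.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Assumption: plain lowercase whitespace tokens stand in for the
    # dissertation's computational linguistic and psycholinguistic features.
    return text.lower().split()


class MultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.word_counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in tokenize(d))
            self.word_counts[c] = counts
            self.totals[c] = sum(counts.values())
            self.vocab.update(counts)
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def log_score(c):
            # Log prior plus smoothed log likelihood of each token;
            # Counter returns 0 for tokens unseen in class c.
            return self.log_prior[c] + sum(
                math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
                for tok in tokenize(doc)
            )

        return max(self.classes, key=log_score)


def accuracy(model, data):
    # Fraction of (text, label) pairs the model labels correctly.
    return sum(model.predict(text) == label for text, label in data) / len(data)


def hybrid_splits(datasets, k):
    """Yield (train_names, test_name): every k-dataset training combination,
    paired with each dataset that was left out of that combination."""
    names = sorted(datasets)
    for train_names in combinations(names, k):
        for test_name in names:
            if test_name not in train_names:
                yield train_names, test_name


def evaluate(datasets, k):
    # Train one model per k-dataset combination, test on each held-out dataset.
    results = {}
    for train_names, test_name in hybrid_splits(datasets, k):
        pooled = [pair for name in train_names for pair in datasets[name]]
        texts, labels = zip(*pooled)
        model = MultinomialNB().fit(list(texts), list(labels))
        results[(train_names, test_name)] = accuracy(model, datasets[test_name])
    return results
```

With four datasets, `hybrid_splits` yields 12 train/test pairs for k = 1, 12 for k = 2, and 4 for k = 3. Trying an SVM or k-Nearest Neighbors classifier in this sketch would only require replacing `MultinomialNB` with another object that exposes the same `fit` and `predict` methods.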

Descriptors: Crime, Information Security, Models, Networks, Psycholinguistics, Deception, Social Media, Web Sites, Internet, Synchronous Communication, Computational Linguistics, Computer Software, Accuracy, Identification, Native Language, English, English (Second Language), Audiences, Computer Security

ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml

Publication Type: Dissertations/Theses - Doctoral Dissertations

Education Level: N/A

Audience: N/A

Language: English

Sponsor: N/A

Authoring Institution: N/A

Grant or Contract Numbers: N/A
