Peer reviewed
ERIC Number: EJ1071707
Record Type: Journal
Publication Date: 2015-Sep
Pages: 5
Abstractor: As Provided
ISSN: 0007-1013
Classification of Word Levels with Usage Frequency, Expert Opinions and Machine Learning
Sohsah, Gihad N.; Ünal, Muhammed Esad; Güzey, Onur
British Journal of Educational Technology, v46 n5 p1097-1101 Sep 2015
Educational applications for language teaching can use the language levels of words to target students' proficiency levels. This paper and the accompanying data provide a methodology for making language-level predictions, aligned with an educational standard, for all English words. The methodology starts from expert opinions on language levels and extends these opinions to other words using machine learning and data from a large corpus. Common European Framework of Reference for Languages (CEFR) level predictions for about 50,000 words, which can be readily used in educational applications, are also provided. For applications where the cost of misclassification varies, machine learning model parameters and algorithm selection must be adjusted. A large number of expert opinions, taken from a survey of 30 practicing language teachers, are also released and can be used for this adjustment. The overall methodology can be applied to low-resource languages, where CEFR-level classifications may not exist, by adding a comparable survey and corpus. The data are released under a Creative Commons Attribution license to enable free mixing, sharing and even use in commercial applications.
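As a rough illustration of the idea in the abstract (extending expert-assigned levels to unlabeled words from corpus frequency, then adjusting for unequal misclassification costs), the following minimal Python sketch may help. Everything in it is an assumption made for illustration: the seed words, their frequencies and levels are invented rather than taken from the released data, and the nearest-neighbour vote, the `predict` function and the `underestimate_penalty` cost function are hypothetical stand-ins, not the algorithm or parameters used in the paper.

```python
import math
from collections import Counter

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical expert-labeled seed words: (word, frequency per million, level).
# These values are invented for illustration; they are NOT from the released data.
SEED = [
    ("house", 500.0, "A1"), ("water", 450.0, "A1"),
    ("travel", 120.0, "A2"), ("weather", 90.0, "A2"),
    ("opinion", 40.0, "B1"), ("advantage", 30.0, "B1"),
    ("negotiate", 8.0, "B2"), ("reluctant", 6.0, "B2"),
    ("meticulous", 1.5, "C1"), ("ubiquitous", 1.0, "C1"),
    ("obfuscate", 0.2, "C2"), ("perspicacious", 0.05, "C2"),
]


def level_probabilities(freq, k=3):
    """Estimate level probabilities from the k seed words closest in log-frequency."""
    x = math.log10(freq)
    nearest = sorted(SEED, key=lambda s: abs(math.log10(s[1]) - x))[:k]
    counts = Counter(level for _, _, level in nearest)
    return {lvl: counts.get(lvl, 0) / k for lvl in LEVELS}


def zero_one_cost(true, pred):
    """Every mistake costs the same."""
    return 0 if true == pred else 1


def underestimate_penalty(true, pred):
    """Assumed asymmetric cost: rating a word easier than it is costs 3x per level."""
    t, p = LEVELS.index(true), LEVELS.index(pred)
    return 3 * (t - p) if p < t else (p - t)


def predict(freq, cost=zero_one_cost, k=3):
    """Return the level with the lowest expected cost under the estimated probabilities."""
    probs = level_probabilities(freq, k)

    def expected_cost(pred):
        return sum(p * cost(true, pred) for true, p in probs.items())

    return min(LEVELS, key=expected_cost)
```

With this toy data, a word occurring 400 times per million is predicted as A1 under equal costs, but shifts to A2 once underestimating difficulty is penalized more heavily, which is the kind of adjustment the abstract alludes to.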
Wiley-Blackwell. 350 Main Street, Malden, MA 02148. Tel: 800-835-6770; Tel: 781-388-8598; Fax: 781-388-8232.
Publication Type: Journal Articles; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A