Peer reviewed
ERIC Number: EJ1022850
Record Type: Journal
Publication Date: 2014-May
Pages: 15
Abstractor: As Provided
Reference Count: N/A
ISSN: 0007-1013
Population Validity for Educational Data Mining Models: A Case Study in Affect Detection
Ocumpaugh, Jaclyn; Baker, Ryan; Gowda, Sujith; Heffernan, Neil; Heffernan, Cristina
British Journal of Educational Technology, v45 n3 p487-501 May 2014
Information and communication technology (ICT)-enhanced research methods such as educational data mining (EDM) have allowed researchers to effectively model a broad range of constructs pertaining to the student, moving from traditional assessments of knowledge to assessment of engagement, meta-cognition, strategy and affect. The automated detection of these constructs allows EDM researchers to develop intervention strategies that can be implemented either by the software or the teacher. It also allows for secondary analyses of the construct, where the detectors are applied to a data set that is much larger than one that could be analyzed by more traditional methods. However, in many cases, the data used to develop EDM models are collected from students who may not be representative of the broader populations who are likely to use ICT. In order to use EDM models (automated detectors) with new populations, their generalizability must be verified. In this study, we examine whether detectors of affect remain valid when applied to new populations. Models of four educationally relevant affective states were constructed based on data from urban, suburban and rural students using ASSISTments software for middle school mathematics in the Northeastern United States. We found that affect detectors trained on a population drawn primarily from one demographic grouping do not generalize to populations drawn primarily from the other demographic groupings, even though those populations might be considered part of the same national or regional culture. Models constructed using data from all three subpopulations are more applicable to students in those populations than those trained on a single group, but still do not achieve ideal population validity--the ability to generalize across all subgroups. In particular, models generalize better across urban and suburban students than rural students. 
These findings have important implications for data collection efforts, validation techniques, and the design of interventions that are intended to be applied at scale.
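The cross-population check described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the data are synthetic, the single-feature threshold "detector" is a toy stand-in, the per-population shift values are invented, and Cohen's kappa is used here only as an assumed agreement metric. All function names are hypothetical. The sketch trains on one group (or on all three pooled) and evaluates on each group separately.

```python
import random
from itertools import product

POPULATIONS = ["urban", "suburban", "rural"]  # groups named in the abstract


def make_synthetic(population, n, rng):
    """Hypothetical data: one feature whose baseline shifts by population."""
    # Invented shifts, chosen only so that the groups differ from one another.
    shift = {"urban": 0.0, "suburban": 0.5, "rural": 2.0}[population]
    rows = []
    for _ in range(n):
        label = rng.random() < 0.5  # True = affective state present
        x = rng.gauss((1.0 if label else -1.0) + shift, 1.0)
        rows.append((x, label))
    return rows


def fit_threshold(rows):
    """Toy detector: a threshold halfway between the two class means."""
    pos = [x for x, y in rows if y]
    neg = [x for x, y in rows if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def cohen_kappa(truth, pred):
    """Agreement between truth and prediction, corrected for chance."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, pred)) / n
    p_true = sum(truth) / n
    p_pred = sum(pred) / n
    expected = p_true * p_pred + (1 - p_true) * (1 - p_pred)
    return 0.0 if expected == 1.0 else (observed - expected) / (1 - expected)


def cross_population_kappa(seed=0, n=400):
    """Train on each group (and on all groups pooled); test on each group."""
    rng = random.Random(seed)
    train = {p: make_synthetic(p, n, rng) for p in POPULATIONS}
    test = {p: make_synthetic(p, n, rng) for p in POPULATIONS}
    train["all"] = [row for p in POPULATIONS for row in train[p]]
    results = {}
    for src, dst in product(POPULATIONS + ["all"], POPULATIONS):
        threshold = fit_threshold(train[src])
        truth = [y for _, y in test[dst]]
        pred = [x > threshold for x, _ in test[dst]]
        results[(src, dst)] = cohen_kappa(truth, pred)
    return results


if __name__ == "__main__":
    for (src, dst), kappa in cross_population_kappa().items():
        print(f"train={src:<9} test={dst:<9} kappa={kappa:.2f}")
```

Reading the output as a train-by-test grid shows how performance on a held-out group can differ from performance on the training group. The numbers reflect only the invented shifts, not the study's results.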
Wiley-Blackwell. 350 Main Street, Malden, MA 02148. Tel: 800-835-6770; Tel: 781-388-8598; Fax: 781-388-8232
Publication Type: Reports - Research; Journal Articles
Education Level: Middle Schools; Junior High Schools; Secondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A