Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance?.

Beggrow, Elizabeth P.; Ha, Minsu; Nehm, Ross H.; Pearl, Dennis; Boone, William J.

Notes FAQ Contact Us

Peer reviewed

Direct link

ERIC Number: EJ1038260

Record Type: Journal

Publication Date: 2014-Feb

Pages: 23

Abstractor: As Provided

ISBN: N/A

ISSN: ISSN-1059-0145

EISSN: N/A

Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance?

Beggrow, Elizabeth P.; Ha, Minsu; Nehm, Ross H.; Pearl, Dennis; Boone, William J.

Journal of Science Education and Technology, v23 n1 p160-182 Feb 2014

The landscape of science education is being transformed by the new "Framework for Science Education" (National Research Council, "A framework for K-12 science education: practices, crosscutting concepts, and core ideas." The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices--such as explanation, argumentation, and communication--in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students' written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit >1.3), while both oral interview measures and computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than the computer-scored explanation measures (r = 0.63). Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students' normative scientific and naive ideas as accurately as human-scored explanations, and (3) more validly detect understanding than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices highlighted in the new "Framework for Science Education."

Descriptors: Science Education, Educational Assessment, Artificial Intelligence, Open Source Technology, Evolution, Student Evaluation, Science Tests, Multiple Choice Tests, Interviews, Undergraduate Students, Verbal Tests, Item Response Theory, Goodness of Fit, Scoring

Springer. 233 Spring Street, New York, NY 10013. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-348-4505; e-mail: service-ny@springer.com; Web site: http://www.springerlink.com

Publication Type: Journal Articles; Reports - Research

Education Level: Higher Education; Postsecondary Education

Audience: N/A

Language: English

Sponsor: N/A

Authoring Institution: N/A

Grant or Contract Numbers: N/A

Privacy | Copyright | Contact Us | Selection Policy | API