Showing all 9 results
Peer reviewed
von Davier, Matthias; Tyack, Lillian; Khorramdel, Lale – Educational and Psychological Measurement, 2023
Automated scoring of free drawings or images as responses has yet to be used in large-scale assessments of student achievement. In this study, we propose artificial neural networks to classify these types of graphical responses from a TIMSS 2019 item. We compare the classification accuracy of convolutional and feed-forward approaches. Our…
Descriptors: Scoring, Networks, Artificial Intelligence, Elementary Secondary Education
Peer reviewed
Wind, Stefanie A.; Ge, Yuan – Educational and Psychological Measurement, 2021
Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures include one or two ratings for each performance, with overlapping performances across raters or linking sets of multiple-choice items to facilitate model estimation. These incomplete scoring designs present challenges for…
Descriptors: Evaluators, Scoring, Data Collection, Design
Peer reviewed
LaVoie, Noelle; Parker, James; Legree, Peter J.; Ardison, Sharon; Kilcullen, Robert N. – Educational and Psychological Measurement, 2020
Automated scoring based on Latent Semantic Analysis (LSA) has been successfully used to score essays and constrained short answer responses. Scoring tests that capture open-ended, short answer responses poses some challenges for machine learning approaches. We used LSA techniques to score short answer responses to the Consequences Test, a measure…
Descriptors: Semantics, Evaluators, Essays, Scoring
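The LSA-based scoring that the LaVoie et al. entry describes can be illustrated with a minimal sketch. This toy pipeline (the function name `lsa_score`, the bag-of-words vocabulary, and the max-similarity-to-reference scoring rule are all illustrative assumptions, not the authors' implementation) builds a term-document matrix over reference answers plus the examinee response, projects the documents into a truncated latent space via SVD, and scores the response by its cosine similarity to the nearest reference:

```python
import numpy as np

def lsa_score(reference_texts, response, k=2):
    # Toy bag-of-words vocabulary over references plus the response.
    docs = reference_texts + [response]
    vocab = sorted({w for d in docs for w in d.lower().split()})
    # Term-document count matrix (terms as rows, documents as columns).
    X = np.array([[d.lower().split().count(w) for d in docs]
                  for w in vocab], dtype=float)
    # Truncated SVD: keep the top-k latent semantic dimensions.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # documents in latent space
    ref_vecs, resp_vec = doc_vecs[:-1], doc_vecs[-1]

    def cos(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    # Score the response by its closest match among the references.
    return max(cos(r, resp_vec) for r in ref_vecs)
```

A response identical to a reference answer scores near 1.0; unrelated responses score lower. Real LSA scoring systems would train the latent space on a large corpus and calibrate similarities against human ratings rather than using a handful of references as here.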
Peer reviewed
Wang, Jue; Engelhard, George, Jr. – Educational and Psychological Measurement, 2019
The purpose of this study is to explore the use of unfolding models for evaluating the quality of ratings obtained in rater-mediated assessments. Two different judgmental processes can be used to conceptualize ratings: impersonal judgments and personal preferences. Impersonal judgments are typically expected in rater-mediated assessments, and…
Descriptors: Evaluative Thinking, Preferences, Evaluators, Models
Peer reviewed
Wind, Stefanie A.; Guo, Wenjing – Educational and Psychological Measurement, 2019
Rater effects, or raters' tendencies to assign ratings that differ from those the performances warrant, are well documented in rater-mediated assessments across a variety of disciplines. In many real-data studies of rater effects, researchers have reported that raters exhibit more than one effect, such as a…
Descriptors: Evaluators, Bias, Scoring, Data Collection
Peer reviewed
Plake, Barbara S.; Melican, Gerald J. – Educational and Psychological Measurement, 1989
The impact of overall test length and difficulty on expert judgments of item performance using the Nedelsky method was studied. Five university-level instructors predicting the performance of minimally competent candidates on a mathematics examination were fairly consistent in their assessments regardless of the length or difficulty of the test.…
Descriptors: Difficulty Level, Estimation (Mathematics), Evaluators, Higher Education
Peer reviewed
Lunz, Mary E.; And Others – Educational and Psychological Measurement, 1994
In a study involving eight judges, analysis with the FACETS model provides evidence that judges grade differently, whether or not scores correlate well. This outcome suggests that adjustments for differences among judges should be made before student measures are estimated to produce reproducible decisions. (SLD)
Descriptors: Correlation, Decision Making, Evaluation Methods, Evaluators
Peer reviewed
Fehrmann, Melinda L.; And Others – Educational and Psychological Measurement, 1991
Two frame-of-reference rater training approaches were compared for effects on reliability and accuracy of cutoff scores generated by 21 raters using Angoff methods on tests taken by 155 undergraduates. Both approaches result in higher interrater reliability and more accuracy than does a non-frame-of-reference method. (SLD)
Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Higher Education
Peer reviewed
Woehr, David J.; And Others – Educational and Psychological Measurement, 1991
Methods for setting cutoff scores based on criterion performance, normative comparison, and absolute judgment were compared for scores on a multiple-choice psychology examination for 121 undergraduates and 251 undergraduates as a comparison group. All methods fell within the standard error of measurement. Implications of differences for decision…
Descriptors: Comparative Analysis, Concurrent Validity, Content Validity, Cutting Scores