Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…
Descriptors: Interrater Reliability, Models, Observation, Measurement
Nieto, Ricardo; Casabianca, Jodi M. – Journal of Educational Measurement, 2019
Many large-scale assessments are designed to yield two or more scores for an individual by administering multiple sections measuring different but related skills. Multidimensional tests such as these, or more specifically simple structured tests, rely on several multiple-choice and/or constructed-response sections of items to generate multiple…
Descriptors: Tests, Scoring, Responses, Test Items
Palermo, Corey; Bunch, Michael B.; Ridge, Kirk – Journal of Educational Measurement, 2019
Although much attention has been given to rater effects in rater-mediated assessment contexts, little research has examined the overall stability of leniency and severity effects over time. This study examined longitudinal scoring data collected during three consecutive administrations of a large-scale, multi-state summative assessment program.…
Descriptors: Scoring, Interrater Reliability, Measurement, Summative Evaluation
Lane, Suzanne – Journal of Educational Measurement, 2019
Rater-mediated assessments require the evaluation of the accuracy and consistency of the inferences made by the raters to ensure the validity of score interpretations and uses. Modeling rater response processes allows for a better understanding of how raters map their representations of the examinee performance to their representation of the…
Descriptors: Responses, Accuracy, Validity, Interrater Reliability
Wang, Jue; Engelhard, George, Jr. – Journal of Educational Measurement, 2019
Rater-mediated assessments exhibit scoring challenges due to the involvement of human raters. The quality of human ratings largely determines the reliability, validity, and fairness of the assessment process. Our research recommends that the evaluation of ratings should be based on two aspects: a theoretical model of human judgment and an…
Descriptors: Evaluative Thinking, Models, Measurement, Achievement
Wesolowski, Brian C. – Journal of Educational Measurement, 2019
The purpose of this study was to build a Random Forest supervised machine learning model in order to predict musical rater-type classifications based upon a Rasch analysis of raters' differential severity/leniency related to item use. Raw scores (N = 1,704) from 142 raters across nine high school solo and ensemble festivals (grades 9-12) were…
Descriptors: Item Response Theory, Prediction, Classification, Artificial Intelligence
Briggs, Derek C.; Chattergoon, Rajendra; Burkhardt, Amy – Journal of Educational Measurement, 2019
The process of setting and evaluating student learning objectives (SLOs) has become increasingly popular as an example where classroom assessment is intended to fulfill the dual purpose of informing instruction and holding teachers accountable. A concern is that the high-stakes purpose may lead to distortions in the inferences about students…
Descriptors: Student Educational Objectives, Student Evaluation, Teacher Evaluation, Scores
Liu, Bowen; Kennedy, Patrick C.; Seipel, Ben; Carlson, Sarah E.; Biancarosa, Gina; Davison, Mark L. – Journal of Educational Measurement, 2019
This article describes an ongoing project to develop a formative, inferential reading comprehension assessment of causal story comprehension. It has three features to enhance classroom use: equated scale scores for progress monitoring within and across grades, a scale score to distinguish among low-scoring students based on patterns of mistakes,…
Descriptors: Formative Evaluation, Reading Comprehension, Story Reading, Test Construction
Duckor, Brent; Holmberg, Carrie – Journal of Educational Measurement, 2019
A robust body of evidence supports the finding that particular teaching and assessment strategies in the K-12 classroom can improve student achievement. While experts have identified many effective teaching and learning practices in the assessment for learning literature, teachers' knowledge and use of "high leverage" formative…
Descriptors: Formative Evaluation, Beginning Teachers, Science Teachers, Preservice Teachers
Heritage, Margaret; Kingston, Neal M. – Journal of Educational Measurement, 2019
Classroom assessment and large-scale assessment have, for the most part, existed in mutual isolation. Some experts have felt this is for the best and others have been concerned that the schism limits the potential contribution of both forms of assessment. Margaret Heritage has long been a champion of best practices in classroom assessment. Neal…
Descriptors: Measurement, Psychometrics, Context Effect, Classroom Environment
Chen, Yi-Hsin; Senk, Sharon L.; Thompson, Denisse R.; Voogt, Kevin – Journal of Educational Measurement, 2019
The van Hiele theory and van Hiele Geometry Test have been extensively used in mathematics assessments across countries. The purpose of this study is to use classical test theory (CTT) and cognitive diagnostic modeling (CDM) frameworks to examine psychometric properties of the van Hiele Geometry Test and to compare how various classification…
Descriptors: Geometry, Mathematics Tests, Test Theory, Psychometrics
Hopster-den Otter, Dorien; Wools, Saskia; Eggen, Theo J. H. M.; Veldkamp, Bernard P. – Journal of Educational Measurement, 2019
In educational practice, test results are used for several purposes. However, validity research is especially focused on the validity of summative assessment. This article aimed to provide a general framework for validating formative assessment. The authors applied the argument-based approach to validation to the context of formative assessment.…
Descriptors: Formative Evaluation, Test Validity, Scores, Inferences
Keuning, Trynke; van Geel, Marieke; Visscher, Adrie; Fox, Jean-Paul – Journal of Educational Measurement, 2019
Data-based decision making (DBDM) is presumed to improve student performance in elementary schools in all subjects. The majority of studies in which DBDM effects have been evaluated have focused on mathematics. A hierarchical multiple single-subject design was used to measure effects of a 2-year training, in which entire school teams learned how…
Descriptors: Data, Decision Making, Elementary School Students, Mathematics Instruction
Leighton, Jacqueline P. – Journal of Educational Measurement, 2019
If K-12 students are to be fully integrated as active participants in their own learning, understanding how they interpret formative assessment feedback is needed. The objective of this article is to advance three claims about why teachers and assessment scholars/specialists may have little understanding of students' interpretation of formative…
Descriptors: Elementary Secondary Education, Formative Evaluation, Feedback (Response), Student Attitudes
Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019
Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…
Descriptors: Rating Scales, Models, Evaluators, Data Collection