Showing 1 to 15 of 644 results
Peer reviewed
Peabody, Michael R. – Applied Measurement in Education, 2020
The purpose of the current article is to introduce the equating and evaluation methods used in this special issue. Although a comprehensive review of all existing models and methodologies would be impractical given the format, a brief introduction to some of the more popular models will be provided. A brief discussion of the conditions required…
Descriptors: Evaluation Methods, Equated Scores, Sample Size, Item Response Theory
Peer reviewed
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Peer reviewed
O'Neill, Thomas R.; Gregg, Justin L.; Peabody, Michael R. – Applied Measurement in Education, 2020
This study addresses equating issues with varying sample sizes using the Rasch model by examining how sample size affects the stability of item calibrations and person ability estimates. A resampling design was used to create 9 sample size conditions (200, 100, 50, 45, 40, 35, 30, 25, and 20), each replicated 10 times. Items were recalibrated…
Descriptors: Sample Size, Equated Scores, Item Response Theory, Raw Scores
Peer reviewed
Diao, Hongyu; Keller, Lisa – Applied Measurement in Education, 2020
Examinees who attempt the same test multiple times are often referred to as "repeaters." Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a…
Descriptors: Licensing Examinations (Professions), College Entrance Examinations, Repetition, Testing Problems
Peer reviewed
Goodman, Joshua T.; Dallas, Andrew D.; Fan, Fen – Applied Measurement in Education, 2020
Recent research has suggested that re-setting the standard for each administration of a small sample examination, in addition to the high cost, does not adequately maintain similar performance expectations year after year. Small-sample equating methods have shown promise with samples between 20 and 30. For groups that have fewer than 20 students,…
Descriptors: Equated Scores, Sample Size, Sampling, Weighted Scores
Peer reviewed
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
Peer reviewed
Wise, Steven L. – Applied Measurement in Education, 2020
In achievement testing there is typically a practical requirement that the set of items administered should be representative of some target content domain. This is accomplished by establishing test blueprints specifying the content constraints to be followed when selecting the items for a test. Sometimes, however, students give disengaged…
Descriptors: Test Items, Test Content, Achievement Tests, Guessing (Tests)
Peer reviewed
Traynor, Anne; Li, Tingxuan; Zhou, Shuqi – Applied Measurement in Education, 2020
During the development of large-scale school achievement tests, panels of independent subject-matter experts use systematic judgmental methods to rate the correspondence between a given test's items and performance objective statements. The individual experts' ratings may then be used to compute summary indices to quantify the match between a…
Descriptors: Alignment (Education), Achievement Tests, Curriculum, Error of Measurement
Peer reviewed
Abbakumov, Dmitry; Desmet, Piet; Van den Noortgate, Wim – Applied Measurement in Education, 2020
Formative assessments are an important component of massive open online courses (MOOCs), online courses with open access and unlimited student participation. Accurate conclusions on students' proficiency via formative assessments, however, face several challenges: (a) students are typically allowed to make several attempts; and (b) student performance might…
Descriptors: Item Response Theory, Formative Evaluation, Online Courses, Response Style (Tests)
Peer reviewed
Lee, Hansol; Chung, Huy Q.; Zhang, Yu; Abedi, Jamal; Warschauer, Mark – Applied Measurement in Education, 2020
In this article, we present a systematic review of previous empirical studies that conducted formative assessment interventions to improve student learning. Previous meta-analysis research on the overall effects of formative assessment on student learning has been conclusive, but little has been studied on important features of formative…
Descriptors: Student Evaluation, Formative Evaluation, Elementary Secondary Education, Effect Size
Peer reviewed
Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020
The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…
Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level
Peer reviewed
El Masri, Yasmine H.; Andrich, David – Applied Measurement in Education, 2020
In large-scale educational assessments, it is generally required that tests are composed of items that function invariantly across the groups to be compared. Despite efforts to ensure invariance in the item construction phase, for a range of reasons (including the security of items) it is often necessary to account for differential item…
Descriptors: Models, Goodness of Fit, Test Validity, Achievement Tests
Peer reviewed
Wyse, Adam E. – Applied Measurement in Education, 2020
This article compares cut scores from two variations of the Hofstee and Beuk methods, which determine cut scores by resolving inconsistencies in panelists' judgments about cut scores and pass rates, with the Angoff method. The first variation uses responses to the Hofstee and Beuk percentage correct and pass rate questions to calculate cut scores.…
Descriptors: Cutting Scores, Evaluation Methods, Standard Setting (Scoring), Equations (Mathematics)
Peer reviewed
Confrey, Jere; Toutkoushian, Emily; Shah, Meetal – Applied Measurement in Education, 2019
Fully articulating validation arguments in the context of classroom assessment requires connecting evidence from multiple sources and addressing multiple types of validity in a coherent chain of reasoning. This type of validation argument is particularly complex for assessments that function in close proximity to instruction, address the fine…
Descriptors: Test Validity, Item Response Theory, Middle School Students, Mathematics Instruction
Peer reviewed
Ketterlin-Geller, Leanne R.; Perry, Lindsey; Adams, Elizabeth – Applied Measurement in Education, 2019
Despite the call for an argument-based approach to validity over 25 years ago, few examples exist in the published literature. One possible explanation for this outcome is that the complexity of the argument-based approach makes implementation difficult. To counter this claim, we propose that the Assessment Triangle can serve as the overarching…
Descriptors: Validity, Educational Assessment, Models, Screening Tests