NotesFAQContact Us
Collection
Advanced
Search Tips
What Works Clearinghouse Rating
Showing 1 to 15 of 637 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Peer reviewed Peer reviewed
Direct linkDirect link
O'Neill, Thomas R.; Gregg, Justin L.; Peabody, Michael R. – Applied Measurement in Education, 2020
This study addresses equating issues with varying sample sizes using the Rasch model by examining how sample size affects the stability of item calibrations and person ability estimates. A resampling design was used to create 9 sample size conditions (200, 100, 50, 45, 40, 35, 30, 25, and 20), each replicated 10 times. Items were recalibrated…
Descriptors: Sample Size, Equated Scores, Item Response Theory, Raw Scores
Peer reviewed Peer reviewed
Direct linkDirect link
Diao, Hongyu; Keller, Lisa – Applied Measurement in Education, 2020
Examinees who attempt the same test multiple times are often referred to as "repeaters." Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a…
Descriptors: Licensing Examinations (Professions), College Entrance Examinations, Repetition, Testing Problems
Peer reviewed Peer reviewed
Direct linkDirect link
Goodman, Joshua T.; Dallas, Andrew D.; Fan, Fen – Applied Measurement in Education, 2020
Recent research has suggested that re-setting the standard for each administration of a small sample examination, in addition to the high cost, does not adequately maintain similar performance expectations year after year. Small-sample equating methods have shown promise with samples between 20 and 30. For groups that have fewer than 20 students,…
Descriptors: Equated Scores, Sample Size, Sampling, Weighted Scores
Peer reviewed Peer reviewed
Direct linkDirect link
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
Peer reviewed Peer reviewed
Direct linkDirect link
Peabody, Michael R. – Applied Measurement in Education, 2020
The purpose of the current article is to introduce the equating and evaluation methods used in this special issue. Although a comprehensive review of all existing models and methodologies would be impractical given the format, a brief introduction to some of the more popular models will be provided. A brief discussion of the conditions required…
Descriptors: Evaluation Methods, Equated Scores, Sample Size, Item Response Theory
Peer reviewed Peer reviewed
Direct linkDirect link
Confrey, Jere; Toutkoushian, Emily; Shah, Meetal – Applied Measurement in Education, 2019
Fully articulating validation arguments in the context of classroom assessment requires connecting evidence from multiple sources and addressing multiple types of validity in a coherent chain of reasoning. This type of validation argument is particularly complex for assessments that function in close proximity to instruction, address the fine…
Descriptors: Test Validity, Item Response Theory, Middle School Students, Mathematics Instruction
Peer reviewed Peer reviewed
Direct linkDirect link
Ketterlin-Geller, Leanne R.; Perry, Lindsey; Adams, Elizabeth – Applied Measurement in Education, 2019
Despite the call for an argument-based approach to validity over 25 years ago, few examples exist in the published literature. One possible explanation for this outcome is that the complexity of the argument-based approach makes implementation difficult. To counter this claim, we propose that the Assessment Triangle can serve as the overarching…
Descriptors: Validity, Educational Assessment, Models, Screening Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Jacobson, Erik; Svetina, Dubravka – Applied Measurement in Education, 2019
Contingent argument-based approaches to validity require a unique argument for each use, in contrast to more prescriptive approaches that identify the common kinds of validity evidence researchers should consider for every use. In this article, we evaluate our use of an approach that is both prescriptive "and" argument-based to develop a…
Descriptors: Test Validity, Test Items, Test Construction, Test Interpretation
Peer reviewed Peer reviewed
Direct linkDirect link
Carney, Michele; Crawford, Angela; Siebert, Carl; Osguthorpe, Rich; Thiede, Keith – Applied Measurement in Education, 2019
The "Standards for Educational and Psychological Testing" recommend an argument-based approach to validation that involves a clear statement of the intended interpretation and use of test scores, the identification of the underlying assumptions and inferences in that statement--termed the interpretation/use argument, and gathering of…
Descriptors: Inquiry, Test Interpretation, Validity, Scores
Peer reviewed Peer reviewed
Direct linkDirect link
Krupa, Erin Elizabeth; Carney, Michele; Bostic, Jonathan – Applied Measurement in Education, 2019
This article provides a brief introduction to the set of four articles in the special issue. To provide a foundation for the issue, key terms are defined, a brief historical overview of validity is provided, and a description of several different validation approaches used in the issue are explained. Finally, the contribution of the articles to…
Descriptors: Test Items, Program Validation, Test Validity, Mathematics Education
Peer reviewed Peer reviewed
Direct linkDirect link
Cohen, Dale J.; Zhang, Jin; Wothke, Werner – Applied Measurement in Education, 2019
Construct-irrelevant cognitive complexity of some items in the statewide grade-level assessments may impose performance barriers for students with disabilities who are ineligible for alternate assessments based on alternate achievement standards. This has spurred research into whether items can be modified to reduce complexity without affecting…
Descriptors: Test Items, Accessibility (for Disabled), Students with Disabilities, Low Achievement
Peer reviewed Peer reviewed
Direct linkDirect link
Kim, Seonghoon; Kolen, Michael J. – Applied Measurement in Education, 2019
In applications of item response theory (IRT), fixed parameter calibration (FPC) has been used to estimate the item parameters of a new test form on the existing ability scale of an item pool. The present paper presents an application of FPC to multiple examinee groups test data that are linked to the item pool via anchor items, and investigates…
Descriptors: Item Response Theory, Item Banks, Test Items, Computation
Peer reviewed Peer reviewed
Direct linkDirect link
Thompson, W. Jake; Clark, Amy K.; Nash, Brooke – Applied Measurement in Education, 2019
As the use of diagnostic assessment systems transitions from research applications to large-scale assessments for accountability purposes, reliability methods that provide evidence at each level of reporting are needed. The purpose of this paper is to summarize one simulation-based method for estimating and reporting reliability for an…
Descriptors: Test Reliability, Diagnostic Tests, Classification, Computation
Peer reviewed Peer reviewed
Direct linkDirect link
Wise, Steven L. – Applied Measurement in Education, 2019
The identification of rapid guessing is important to promote the validity of achievement test scores, particularly with low-stakes tests. Effective methods for identifying rapid guesses require reliable threshold methods that are also aligned with test taker behavior. Although several common threshold methods are based on rapid guessing response…
Descriptors: Guessing (Tests), Identification, Reaction Time, Reliability
Previous Page | Next Page »
Pages: 1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |  10  |  11  |  ...  |  43