Showing 1 to 15 of 1,237 results
Peer reviewed
Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2018
Reporting confidence intervals with test scores helps test users make important decisions about examinees by providing information about the precision of test scores. Although a variety of estimation procedures based on the binomial error model are available for computing intervals for test scores, these procedures assume that items are randomly…
Descriptors: Weighted Scores, Error of Measurement, Test Use, Decision Making
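For readers unfamiliar with the binomial error model the abstract refers to, the sketch below shows the simplest normal-approximation interval for a number-correct score. It is offered only as background, not as any of the procedures (or the weighted-score extension) the article evaluates; the function name is illustrative.

```python
import math

def binomial_wald_interval(num_correct, num_items, z=1.96):
    """Normal-approximation (Wald) confidence interval for the true
    proportion-correct score under a simple binomial error model.
    Only one of several estimation procedures the abstract alludes to."""
    p_hat = num_correct / num_items
    se = math.sqrt(p_hat * (1 - p_hat) / num_items)
    lower = max(0.0, p_hat - z * se)
    upper = min(1.0, p_hat + z * se)
    return lower, upper

# Example: 32 of 40 items correct -> interval for the examinee's domain score
print(binomial_wald_interval(32, 40))
```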
Peer reviewed
Zwick, Rebecca; Ye, Lei; Isham, Steven – Journal of Educational Measurement, 2018
In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the…
Descriptors: Test Bias, Testing, Test Items, Bayesian Statistics
Peer reviewed
Madison, Matthew J.; Bradshaw, Laine – Journal of Educational Measurement, 2018
The evaluation of intervention effects is an important objective of educational research. One way to evaluate the effectiveness of an intervention is to conduct an experiment that assigns individuals to control and treatment groups. In the context of pretest/posttest designed studies, this is referred to as a control-group pretest/posttest design.…
Descriptors: Intervention, Program Evaluation, Program Effectiveness, Control Groups
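As background for the control-group pretest/posttest design described above, the sketch below shows the familiar gain-score contrast often used to summarize such designs. It is not the authors' analysis; the function name and data are illustrative.

```python
import numpy as np

def gain_score_effect(pre_treat, post_treat, pre_ctrl, post_ctrl):
    """One common summary of a control-group pretest/posttest design:
    the difference in mean gains between the treatment and control groups."""
    gain_treat = np.mean(post_treat) - np.mean(pre_treat)
    gain_ctrl = np.mean(post_ctrl) - np.mean(pre_ctrl)
    return gain_treat - gain_ctrl

# Example with hypothetical scores
print(gain_score_effect(pre_treat=[10, 12, 11], post_treat=[15, 16, 14],
                        pre_ctrl=[10, 11, 12], post_ctrl=[12, 12, 13]))
```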
Peer reviewed
Fay, Derek M.; Levy, Roy; Mehta, Vandhana – Journal of Educational Measurement, 2018
A common practice in educational assessment is to construct multiple forms of an assessment that consists of tasks with similar psychometric properties. This study utilizes a Bayesian multilevel item response model and descriptive graphical representations to evaluate the psychometric similarity of variations of the same task. These approaches for…
Descriptors: Psychometrics, Performance Based Assessment, Bayesian Statistics, Item Response Theory
Peer reviewed
Lin, Chih-Kai; Zhang, Jinming – Journal of Educational Measurement, 2018
Under the generalizability-theory (G-theory) framework, the estimation precision of variance components (VCs) is of significant importance in that they serve as the foundation of estimating reliability. Zhang and Lin advanced the discussion of nonadditivity in data from a theoretical perspective and showed the adverse effects of nonadditivity on…
Descriptors: Generalizability Theory, Reliability, Computation, Statistical Analysis
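As background on the variance components at issue, the sketch below estimates them for a simple one-facet persons-by-items G-study using the standard expected-mean-square equations. It assumes an additive model and does not reproduce the nonadditivity analysis the article builds on; the function name is illustrative.

```python
import numpy as np

def g_study_pxi(scores):
    """ANOVA-based variance-component estimates for a one-facet
    persons-by-items (p x i) design, plus the generalizability coefficient."""
    n_p, n_i = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    item_means = scores.mean(axis=0)

    ss_p = n_i * np.sum((person_means - grand) ** 2)
    ss_i = n_p * np.sum((item_means - grand) ** 2)
    ss_res = np.sum((scores - grand) ** 2) - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    var_res = ms_res                    # person-by-item interaction + error
    var_p = (ms_p - ms_res) / n_i       # person (universe-score) variance
    var_i = (ms_i - ms_res) / n_p       # item variance
    g_coef = var_p / (var_p + var_res / n_i)   # generalizability coefficient
    return {"person": var_p, "item": var_i, "residual": var_res, "Erho2": g_coef}

# Example: 3 persons by 4 dichotomously scored items
scores = np.array([[1, 1, 0, 1],
                   [1, 0, 0, 1],
                   [0, 0, 0, 1]], dtype=float)
print(g_study_pxi(scores))
```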
Peer reviewed
Embretson, Susan E.; Kingston, Neal M. – Journal of Educational Measurement, 2018
The continual supply of new items is crucial to maintaining quality for many tests. Automatic item generation (AIG) has the potential to rapidly increase the number of items that are available. However, the efficiency of AIG will be mitigated if the generated items must be submitted to traditional, time-consuming review processes. In two studies,…
Descriptors: Mathematics Instruction, Mathematics Achievement, Psychometrics, Test Items
Peer reviewed
Lee, Sora; Bolt, Daniel M. – Journal of Educational Measurement, 2018
Both the statistical and interpretational shortcomings of the three-parameter logistic (3PL) model in accommodating guessing effects on multiple-choice items are well documented. We consider the use of a residual heteroscedasticity (RH) model as an alternative, and compare its performance to the 3PL with real test data sets and through simulation…
Descriptors: Statistical Analysis, Models, Guessing (Tests), Multiple Choice Tests
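For reference, the 3PL item response function discussed above has the standard form sketched below; the residual heteroscedasticity (RH) alternative the article proposes is not shown.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) item response function:
    a = discrimination, b = difficulty, c = pseudo-guessing lower asymptote."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# A guessing-prone item: even a very low-ability examinee succeeds about 20% of the time
print(p_correct_3pl(theta=-3.0, a=1.2, b=0.5, c=0.2))
```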
Peer reviewed
Vonkova, Hana; Zamarro, Gema; Hitt, Collin – Journal of Educational Measurement, 2018
Self-reports are an indispensable source of information in education research but they are often affected by heterogeneity in reporting behavior. Failing to correct for this heterogeneity can lead to invalid comparisons across groups. The researchers use the parametric anchoring vignette method to correct for cross-country incomparability of…
Descriptors: Vignettes, Educational Research, Achievement Tests, Foreign Countries
Peer reviewed
Lee, Soo; Suh, Youngsuk – Journal of Educational Measurement, 2018
Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect…
Descriptors: Item Response Theory, Sample Size, Models, Error of Measurement
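As a reminder of the statistic being extended, the sketch below shows Lord's Wald chi-square for a single item in the familiar unidimensional case. The MIRT implementation and the two estimation approaches compared in the article are not reproduced here; the function name is illustrative.

```python
import numpy as np

def lords_wald_statistic(params_ref, params_focal, cov_ref, cov_focal):
    """Lord's Wald chi-square for DIF: compares an item's parameter estimates
    between the reference and focal groups, pooling their sampling covariances."""
    diff = np.asarray(params_ref, dtype=float) - np.asarray(params_focal, dtype=float)
    pooled_cov = np.asarray(cov_ref, dtype=float) + np.asarray(cov_focal, dtype=float)
    return float(diff @ np.linalg.inv(pooled_cov) @ diff)  # ~ chi-square, df = len(diff)

# Example: a 2PL item (a, b) estimated separately in each group
chi_sq = lords_wald_statistic([1.2, 0.3], [1.0, 0.6],
                              [[0.02, 0.0], [0.0, 0.03]],
                              [[0.03, 0.0], [0.0, 0.04]])
print(chi_sq)  # compare to the chi-square critical value with 2 df
```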
Peer reviewed
Luo, Xiao; Kim, Doyoung – Journal of Educational Measurement, 2018
The top-down approach to designing a multistage test is relatively understudied in the literature and underused in research and practice. This study introduced a route-based top-down design approach that directly sets design parameters at the test level and utilizes the advanced automated test assembly algorithm seeking global optimality. The…
Descriptors: Computer Assisted Testing, Test Construction, Decision Making, Simulation
Peer reviewed
Bolt, Daniel M.; Kim, Jee-Seon – Journal of Educational Measurement, 2018
Cognitive diagnosis models (CDMs) typically assume skill attributes with discrete (often binary) levels of skill mastery, making the existence of skill continuity an anticipated form of model misspecification. In this article, misspecification due to skill continuity is argued to be of particular concern for several CDM applications due to the…
Descriptors: Cognitive Measurement, Models, Mastery Learning, Accuracy
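For concreteness, the sketch below uses the DINA model, one widely used CDM with binary skill attributes, to illustrate the kind of item response probability such models assume. The abstract does not say which CDMs the article examines, so this is only an example.

```python
import numpy as np

def dina_p_correct(alpha, q_row, slip, guess):
    """Correct-response probability under the DINA model: an examinee who has
    mastered every skill the item requires succeeds with probability 1 - slip,
    and otherwise succeeds only by guessing."""
    required = q_row == 1
    eta = bool(np.all(alpha[required] == 1))  # mastered all required skills?
    return (1 - slip) if eta else guess

alpha = np.array([1, 0, 1])   # mastery profile over three skills
q_row = np.array([1, 0, 1])   # skills required by this item
print(dina_p_correct(alpha, q_row, slip=0.1, guess=0.2))
```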
Peer reviewed
Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2018
Range restrictions, or raters' tendency to limit their ratings to a subset of available rating scale categories, are well documented in large-scale teacher evaluation systems based on principal observations. When these restrictions occur, the ratings observed during operational teacher evaluations are limited to a subset of the available…
Descriptors: Measurement, Classroom Environment, Observation, Rating Scales
Peer reviewed
Harik, Polina; Clauser, Brian E.; Grabovsky, Irina; Baldwin, Peter; Margolis, Melissa J.; Bucak, Deniz; Jodoin, Michael; Walsh, William; Haist, Steven – Journal of Educational Measurement, 2018
Test administrators are appropriately concerned about the potential for time constraints to impact the validity of score interpretations; psychometric efforts to evaluate the impact of speededness date back more than half a century. The widespread move to computerized test delivery has led to the development of new approaches to evaluating how…
Descriptors: Comparative Analysis, Observation, Medical Education, Licensing Examinations (Professions)
Peer reviewed
Andrich, David; Marais, Ida – Journal of Educational Measurement, 2018
Even though guessing biases difficulty estimates as a function of item difficulty in the dichotomous Rasch model, assessment programs with tests which include multiple-choice items often construct scales using this model. Research has shown that when all items are multiple-choice, this bias can largely be eliminated. However, many assessments have…
Descriptors: Multiple Choice Tests, Test Items, Guessing (Tests), Test Bias
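For reference, the dichotomous Rasch model mentioned above is sketched below; the guessing-related bias the article addresses arises because this response function has no lower asymptote.

```python
import math

def p_correct_rasch(theta, delta):
    """Dichotomous Rasch model: the probability of a correct response depends
    only on the difference between person ability (theta) and item difficulty (delta)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

print(p_correct_rasch(theta=0.5, delta=-0.25))
```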
Peer reviewed
Sinharay, Sandip – Journal of Educational Measurement, 2018
The value-added method of Haberman is arguably one of the most popular methods to evaluate the quality of subscores. The method is based on the classical test theory and deems a subscore to be of added value if the subscore predicts the corresponding true subscore better than does the total score. Sinharay provided an interpretation of the added…
Descriptors: Scores, Value Added Models, Raw Scores, Item Response Theory
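The decision rule at the heart of Haberman's method can be stated very compactly, as in the sketch below; estimating the two PRMSE quantities from data requires the classical-test-theory results the abstract alludes to and is not shown. The function name is illustrative.

```python
def subscore_has_added_value(prmse_subscore, prmse_total):
    """Haberman's decision rule in its simplest form: a subscore has added value
    when it predicts the true subscore better (larger proportional reduction in
    mean squared error, PRMSE) than the total score does. Under classical test
    theory the subscore's PRMSE is its reliability."""
    return prmse_subscore > prmse_total

# Example: the subscore's reliability vs. the PRMSE achieved by the total score
print(subscore_has_added_value(prmse_subscore=0.82, prmse_total=0.76))  # True
```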