Showing 121 to 135 of 1,350 results
Peer reviewed
Peabody, Michael R.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard-setting panels should have the proper qualifications to make the judgments asked…
Descriptors: Standard Setting, Decision Making, Performance Based Assessment, Evaluators
Peer reviewed
Berger, Stéphanie; Verschoor, Angela J.; Eggen, Theo J. H. M.; Moser, Urs – Journal of Educational Measurement, 2019
Calibration of an item bank for computer adaptive testing requires substantial resources. In this study, we investigated whether the efficiency of calibration under the Rasch model could be enhanced by improving the match between item difficulty and student ability. We introduced targeted multistage calibration designs, a design type that…
Descriptors: Simulation, Computer Assisted Testing, Test Items, Difficulty Level
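The efficiency gain that Berger et al. investigate follows directly from the Rasch model's standard form: an item contributes the most Fisher information when its difficulty matches the examinee's ability. A minimal sketch using the textbook Rasch formulas (not the authors' code or design):

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of one Rasch item at ability theta: p * (1 - p)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

# Information peaks when item difficulty equals student ability,
# which is why targeted (multistage) calibration designs can make
# each response more informative for item parameter estimation.
matched = item_information(theta=0.0, b=0.0)     # 0.25, the maximum
mismatched = item_information(theta=0.0, b=2.0)  # roughly 0.105
```

This is why routing students toward items near their ability level, as in a targeted multistage design, can reduce the sample sizes needed for calibration.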
Peer reviewed
Ip, Edward H.; Strachan, Tyler; Fu, Yanyan; Lay, Alexandra; Willse, John T.; Chen, Shyh-Huei; Rutkowski, Leslie; Ackerman, Terry – Journal of Educational Measurement, 2019
Test items must often be broad in scope to be ecologically valid. It is therefore almost inevitable that secondary dimensions are introduced into a test during test development. A cognitive test may require one or more abilities besides the primary ability to correctly respond to an item, in which case a unidimensional test score overestimates the…
Descriptors: Test Items, Test Bias, Test Construction, Scores
Peer reviewed
Ju, Unhee; Falk, Carl F. – Journal of Educational Measurement, 2019
We examined the feasibility and results of a multilevel multidimensional nominal response model (ML-MNRM) for measuring both substantive constructs and extreme response style (ERS) across countries. The ML-MNRM considers within-country clustering while allowing overall item slopes to vary across items and examination of whether certain items were…
Descriptors: Cross Cultural Studies, Self Efficacy, Item Response Theory, Item Analysis
Peer reviewed
Zhang, Xue; Tao, Jian; Wang, Chun; Shi, Ning-Zhong – Journal of Educational Measurement, 2019
Model selection is important in any statistical analysis, and the primary goal is to find the preferred (or most parsimonious) model from a set of candidate models, based on certain criteria, given the data. Several recent publications have employed the deviance information criterion (DIC) for model selection among different forms of multilevel item…
Descriptors: Bayesian Statistics, Item Response Theory, Measurement, Models
Peer reviewed
Wyse, Adam E.; Babcock, Ben – Journal of Educational Measurement, 2019
One common phenomenon in Angoff standard setting is that panelists regress their ratings in toward the middle of the probability scale. This study describes two indices based on taking ratios of standard deviations that can be utilized with a scatterplot of item ratings versus expected probabilities of success to identify whether ratings are…
Descriptors: Item Analysis, Standard Setting, Probability, Feedback (Response)
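The indices Wyse and Babcock describe are built from ratios of standard deviations of item ratings versus expected success probabilities. The sketch below illustrates the underlying idea with hypothetical data; it is not the authors' exact indices:

```python
import statistics

def sd_ratio(ratings, expected_probs):
    """Ratio of the spread of panelist ratings to the spread of
    model-expected probabilities of success. Values well below 1
    suggest ratings regressed toward the middle of the scale."""
    return statistics.stdev(ratings) / statistics.stdev(expected_probs)

expected = [0.15, 0.35, 0.55, 0.75, 0.95]     # expected probabilities
regressed = [0.45, 0.50, 0.55, 0.60, 0.65]    # compressed toward the middle
ratio = sd_ratio(regressed, expected)          # well below 1 here
```

Paired with a scatterplot of ratings against expected probabilities, a low ratio flags panelists whose judgments under-discriminate between easy and hard items.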
Peer reviewed
Wang, Wenyi; Song, Lihong; Chen, Ping; Ding, Shuliang – Journal of Educational Measurement, 2019
Most existing classification accuracy indices for attribute patterns lose effectiveness when response data are absent in diagnostic testing. To address this issue, this article proposes new indices for predicting the correct classification rate of a diagnostic test before administering the test under the deterministic noise input…
Descriptors: Cognitive Tests, Classification, Accuracy, Diagnostic Tests
Peer reviewed
Svetina, Dubravka; Liaw, Yuan-Ling; Rutkowski, Leslie; Rutkowski, David – Journal of Educational Measurement, 2019
This study investigates the effect of several design and administration choices on item exposure and person/item parameter recovery under a multistage test (MST) design. In a simulation study, we examine whether number-correct (NC) or item response theory (IRT) methods are differentially effective at routing students to the correct next stage(s)…
Descriptors: Measurement, Item Analysis, Test Construction, Item Response Theory
Peer reviewed
Wu, Qian; De Laet, Tinne; Janssen, Rianne – Journal of Educational Measurement, 2019
Single-best answers to multiple-choice items are commonly dichotomized into correct and incorrect responses, and modeled using either a dichotomous item response theory (IRT) model or a polytomous one if differences among all response options are to be retained. The current study presents an alternative IRT-based modeling approach to…
Descriptors: Multiple Choice Tests, Item Response Theory, Test Items, Responses
Peer reviewed
Wind, Stefanie A.; Sebok-Syer, Stefanie S. – Journal of Educational Measurement, 2019
When practitioners use modern measurement models to evaluate rating quality, they commonly examine rater fit statistics that summarize how well each rater's ratings fit the expectations of the measurement model. Essentially, this approach involves examining the unexpected ratings that each misfitting rater assigned (i.e., carrying out analyses of…
Descriptors: Measurement, Models, Evaluators, Simulation
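Rater fit statistics of the kind Wind and Sebok-Syer examine are typically mean squares of standardized residuals between observed and model-expected ratings. A generic sketch of an unweighted (outfit-style) mean square, not necessarily the exact statistics studied:

```python
def outfit_mean_square(observed, expected, variance):
    """Unweighted mean-square fit statistic: the average squared
    standardized residual across a rater's ratings. Values near 1
    indicate ratings consistent with the measurement model."""
    z_sq = [(o - e) ** 2 / v for o, e, v in zip(observed, expected, variance)]
    return sum(z_sq) / len(z_sq)

# Ratings that exactly match expectations give 0 (muted/overfitting);
# large surprising deviations push the statistic well above 1 (misfit).
good = outfit_mean_square([3, 2, 4, 3], [3.0, 2.0, 4.0, 3.0], [0.5] * 4)
misfit = outfit_mean_square([4, 1, 4, 1], [2.5] * 4, [1.0] * 4)
```

Examining which specific ratings drive a large mean square is the "analysis of unexpected ratings" the abstract refers to.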
Peer reviewed
Man, Kaiwen; Harring, Jeffrey R.; Sinharay, Sandip – Journal of Educational Measurement, 2019
Data mining methods have drawn considerable attention across diverse scientific fields. However, few applications could be found in the areas of psychological and educational measurement, and particularly pertinent to this article, in test security research. In this study, various data mining methods for detecting cheating behaviors on large-scale…
Descriptors: Information Retrieval, Data Analysis, Identification, Tests
Peer reviewed
Feuerstahler, Leah; Wilson, Mark – Journal of Educational Measurement, 2019
Scores estimated from multidimensional item response theory (IRT) models are not necessarily comparable across dimensions. In this article, the concept of aligned dimensions is formalized in the context of Rasch models, and two methods are described--delta dimensional alignment (DDA) and logistic regression alignment (LRA)--to transform estimated…
Descriptors: Item Response Theory, Models, Scores, Comparative Analysis
Peer reviewed
Zhang, Zhonghua; Zhao, Mingren – Journal of Educational Measurement, 2019
The present study evaluated the multiple imputation method, a procedure that is similar to the one suggested by Li and Lissitz (2004), and compared the performance of this method with that of the bootstrap method and the delta method in obtaining the standard errors for the estimates of the parameter scale transformation coefficients in item…
Descriptors: Item Response Theory, Error Patterns, Item Analysis, Simulation
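The bootstrap comparison in Zhang and Zhao's study can be illustrated for a simple scale transformation coefficient, the mean-sigma slope. The sketch below uses hypothetical item difficulty estimates on two scales; it shows the bootstrap baseline, not the multiple imputation procedure itself:

```python
import random
import statistics

def mean_sigma_slope(b_old, b_new):
    """Mean-sigma estimate of the scale transformation slope."""
    return statistics.stdev(b_new) / statistics.stdev(b_old)

def bootstrap_se(b_old, b_new, n_boot=2000, seed=1):
    """Bootstrap standard error of the slope: resample item pairs
    with replacement and take the SD of the replicated estimates."""
    rng = random.Random(seed)
    n = len(b_old)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bo = [b_old[i] for i in idx]
        if statistics.stdev(bo) == 0:
            continue  # degenerate resample (all the same item); skip
        reps.append(mean_sigma_slope(bo, [b_new[i] for i in idx]))
    return statistics.stdev(reps)

b_old = [-1.2, -0.5, 0.0, 0.4, 0.9, 1.5]  # hypothetical Form A difficulties
b_new = [-1.0, -0.3, 0.1, 0.7, 1.3, 1.9]  # same items on the Form B scale
slope = mean_sigma_slope(b_old, b_new)
se = bootstrap_se(b_old, b_new)
```

The delta method would instead derive the standard error analytically from the estimator's gradient, and multiple imputation from the variability across imputed data sets; all three target the same quantity.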
Peer reviewed
Wolkowitz, Amanda A.; Wright, Keith D. – Journal of Educational Measurement, 2019
This article explores the amount of equating error at a passing score when equating scores from exams with small sample sizes. It focuses on equating with the classical test theory methods of Tucker linear, Levine linear, frequency estimation, and chained equipercentile equating. Both simulation and real data studies were used in the…
Descriptors: Error Patterns, Sample Size, Test Theory, Test Bias
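The classical linear methods Wolkowitz and Wright compare all share the same backbone: mapping a Form X score onto the Form Y scale by matching means and standard deviations. Tucker and Levine methods adjust these moments for group differences via an anchor test; the sketch below shows only the common linear step, with hypothetical scores:

```python
import statistics

def linear_equate(x, scores_x, scores_y):
    """Linear equating: place a Form X score on the Form Y scale by
    matching the two forms' means and standard deviations."""
    mx, my = statistics.mean(scores_x), statistics.mean(scores_y)
    sx, sy = statistics.stdev(scores_x), statistics.stdev(scores_y)
    return my + (sy / sx) * (x - mx)

form_x = [10, 12, 14, 16, 18]  # hypothetical Form X scores
form_y = [12, 14, 16, 18, 20]  # hypothetical Form Y scores
equated = linear_equate(14, form_x, form_y)  # 16.0: mean maps to mean
```

With small samples, the estimated means and standard deviations are noisy, which is precisely the source of the equating error at the passing score that the article quantifies.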
Peer reviewed
Bergner, Yoav; Choi, Ikkyu; Castellano, Katherine E. – Journal of Educational Measurement, 2019
Allowance for multiple chances to answer constructed response questions is a prevalent feature in computer-based homework and exams. We consider the use of item response theory in the estimation of item characteristics and student ability when multiple attempts are allowed but no explicit penalty is deducted for extra tries. This is common…
Descriptors: Models, Item Response Theory, Homework, Computer Assisted Instruction