Showing 1 to 15 of 1,261 results
Peer reviewed
Kim, Kyung Yong – Journal of Educational Measurement, 2020
New items are often evaluated prior to their operational use to obtain item response theory (IRT) item parameter estimates for quality control purposes. Fixed parameter calibration is one linking method that is widely used to estimate parameters for new items and place them on the desired scale. This article provides detailed descriptions of two…
Descriptors: Item Response Theory, Evaluation Methods, Test Items, Simulation
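A minimal sketch of the fixed parameter calibration idea described in the entry above, assuming a 2PL model and simulated data (illustrative only, not the authors' implementation): anchor item parameters are held fixed, provisional abilities are estimated from the anchor items alone, and the new item is then calibrated on that fixed scale.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)

# Simulated data: 1,000 examinees, 20 anchor items with known (fixed)
# 2PL parameters, and one new item to be placed on that scale.
n, n_anchor = 1000, 20
theta_true = rng.normal(0, 1, n)
a_anchor = rng.uniform(0.8, 2.0, n_anchor)
b_anchor = rng.normal(0, 1, n_anchor)
x_anchor = rng.binomial(1, expit(a_anchor * (theta_true[:, None] - b_anchor)))
x_new = rng.binomial(1, expit(1.2 * (theta_true - 0.5)))  # true a=1.2, b=0.5

# Step 1: EAP ability estimates from the anchor items only, holding the
# anchor parameters fixed (this is what pins down the scale).
quad = np.linspace(-4, 4, 81)
prior = np.exp(-0.5 * quad**2)
pq = expit(a_anchor * (quad[:, None] - b_anchor))          # (81, n_anchor)
loglik = x_anchor @ np.log(pq).T + (1 - x_anchor) @ np.log(1 - pq).T
post = np.exp(loglik) * prior
theta_eap = (post @ quad) / post.sum(axis=1)

# Step 2: maximum likelihood for the new item's (a, b), given those thetas.
def negll(par):
    a, b = par
    p = np.clip(expit(a * (theta_eap - b)), 1e-9, 1 - 1e-9)
    return -np.sum(x_new * np.log(p) + (1 - x_new) * np.log(1 - p))

fit = minimize(negll, x0=[1.0, 0.0], method="Nelder-Mead")
print("new item (a, b):", fit.x.round(2))  # near (1.2, 0.5), up to EAP shrinkage
```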
Peer reviewed
Castellano, Katherine E.; McCaffrey, Daniel F. – Journal of Educational Measurement, 2020
The residual gain score has been of historical interest, and its percentile rank has been of interest more recently given its close correspondence to the popular Student Growth Percentile. However, these estimators suffer from low accuracy and systematic bias (bias conditional on prior latent achievement). This article explores three…
Descriptors: Accuracy, Student Evaluation, Measurement Techniques, Evaluation Methods
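As a concrete illustration of the two estimators named in the entry above, a short sketch with simulated scores (all values hypothetical): the residual gain is the part of the current score not predicted by a regression on the prior score, and its percentile rank is the quantity that corresponds closely to the Student Growth Percentile.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical prior-year (x) and current-year (y) test scores
x = rng.normal(50, 10, 500)
y = 0.8 * x + rng.normal(10, 6, 500)

# Residual gain: the part of y not predicted by the regression of y on x
res = stats.linregress(x, y)
residual_gain = y - (res.intercept + res.slope * x)

# Percentile rank of each residual gain
pr = stats.rankdata(residual_gain) / len(residual_gain) * 100
print(pr[:5].round(1))
```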
Peer reviewed
Kim, Hyung Jin; Brennan, Robert L.; Lee, Won-Chan – Journal of Educational Measurement, 2020
In equating, smoothing techniques are frequently used to diminish sampling error. There are typically two types of smoothing: presmoothing and postsmoothing. For polynomial log-linear presmoothing, an optimum smoothing degree can be determined statistically based on the Akaike information criterion or Chi-square difference criterion. For…
Descriptors: Equated Scores, Sampling, Error of Measurement, Statistical Analysis
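A sketch of the presmoothing step described in the entry above, fitting polynomial log-linear (Poisson) models of increasing degree to a score distribution and selecting the degree by AIC via statsmodels. The frequencies are simulated; the Chi-square difference criterion and the postsmoothing step are omitted.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Hypothetical number-correct frequencies on a 40-item test
scores = np.arange(41)
freq = np.bincount(rng.binomial(40, rng.beta(5, 5, 2000)), minlength=41)

# Polynomial log-linear presmoothing: keep the degree with the smallest AIC
best = None
for degree in range(1, 7):
    X = np.vander(scores / 40.0, degree + 1, increasing=True)  # 1, s, s^2, ...
    fit = sm.GLM(freq, X, family=sm.families.Poisson()).fit()
    if best is None or fit.aic < best[1]:
        best = (degree, fit.aic, fit.fittedvalues)

degree, aic, smoothed = best
print(f"chosen degree: {degree}, AIC: {aic:.1f}")
```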
Peer reviewed
Maeda, Hotaka; Zhang, Bo – Journal of Educational Measurement, 2020
When a response pattern does not fit a selected measurement model, one may resort to robust ability estimation. Two popular robust methods are biweight and Huber weight. So far, research on these methods has been quite limited. This article proposes the maximum a posteriori biweight (BMAP) and Huber weight (HMAP) estimation methods. These methods…
Descriptors: Bayesian Statistics, Robustness (Statistics), Computation, Monte Carlo Methods
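A sketch of Huber-weighted ability estimation in the spirit of the HMAP method named in the entry above, assuming a 2PL model and a standard normal prior; the biweight variant would only swap the weight function. The item bank and the aberrant response pattern are invented, and this is not the authors' code.

```python
import numpy as np
from scipy.special import expit

def huber_map_theta(x, a, b, k=1.0, iters=50):
    """Huber-weighted MAP ability estimate under the 2PL with a standard
    normal prior; the biweight variant would only change the weights."""
    theta = 0.0
    for _ in range(iters):
        z = a * (theta - b)                               # logit residuals
        w = np.where(np.abs(z) <= k, 1.0, k / np.abs(z))  # Huber downweighting
        p = expit(z)
        grad = -theta + np.sum(w * a * (x - p))           # prior + weighted score
        info = 1.0 + np.sum(w * a**2 * p * (1 - p))
        theta += grad / info                              # Newton step
    return theta

# Hypothetical item bank and an aberrant pattern: the unexpected success on
# the hardest item is downweighted rather than dragging theta upward.
a = np.array([1.0, 1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0])
b = np.array([-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0])
x = np.array([1, 1, 1, 1, 0, 0, 0, 1])
print("robust theta:", round(huber_map_theta(x, a, b), 2))
```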
Peer reviewed
Wang, Chun; Chen, Ping; Jiang, Shengyu – Journal of Educational Measurement, 2020
Many large-scale educational surveys have moved from linear form design to multistage testing (MST) design. One advantage of MST is that it can provide more accurate latent trait (θ) estimates using fewer items than required by linear tests. However, MST generates incomplete response data by design; hence, questions remain as to how to…
Descriptors: Test Construction, Test Items, Adaptive Testing, Maximum Likelihood Statistics
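To make the MST design concrete, a toy 1-3 panel under the Rasch model: a routing module yields a provisional ability estimate, which selects the second-stage module, and the final estimate uses the complete response string. All difficulties and routing cutpoints are invented for illustration.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(3)

def mle_theta(x, b, iters=30):
    """Rasch MLE of ability, with a small ridge so extreme patterns stay finite."""
    theta = 0.0
    for _ in range(iters):
        p = expit(theta - b)
        theta += (np.sum(x - p) - 0.01 * theta) / (np.sum(p * (1 - p)) + 0.01)
    return theta

# Hypothetical 1-3 panel: one routing module, three second-stage modules
b_route = np.linspace(-1, 1, 10)
modules = {"easy": np.linspace(-2.5, -0.5, 15),
           "medium": np.linspace(-1, 1, 15),
           "hard": np.linspace(0.5, 2.5, 15)}

theta_true = 1.2
x_route = rng.binomial(1, expit(theta_true - b_route))
theta_hat = mle_theta(x_route, b_route)

# Route on the provisional estimate (cutpoints are illustrative)
label = "easy" if theta_hat < -0.5 else "hard" if theta_hat > 0.5 else "medium"
b_stage2 = modules[label]
x_stage2 = rng.binomial(1, expit(theta_true - b_stage2))

theta_final = mle_theta(np.concatenate([x_route, x_stage2]),
                        np.concatenate([b_route, b_stage2]))
print(label, round(theta_final, 2))
```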
Peer reviewed
Castellano, Katherine E.; McCaffrey, Daniel F. – Journal of Educational Measurement, 2020
Testing programs are often interested in using a student growth measure. This article presents analytic derivations of the accuracy of common student growth measures on both the raw scale of the test and the percentile rank scale in terms of the proportional reduction in mean squared error and the squared correlation between the estimator and…
Descriptors: Student Evaluation, Accuracy, Testing, Student Development
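The squared-correlation criterion mentioned in the entry above can be checked by simulation. The sketch below (invented reliabilities and growth model, not the article's derivations) computes it for the raw gain and the residual gain against their latent targets.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000

# Hypothetical latent achievement at two timepoints, plus measurement error
t1 = rng.normal(0, 1, n)
t2 = 0.8 * t1 + rng.normal(0, 0.6, n)
x1 = t1 + rng.normal(0, 0.5, n)     # observed scores, reliability = 0.8
x2 = t2 + rng.normal(0, 0.5, n)

raw_gain = x2 - x1
true_gain = t2 - t1

# Residual gain: part of x2 not predicted by x1, and its latent counterpart
beta = np.cov(x1, x2)[0, 1] / np.var(x1)
resid_gain = x2 - beta * x1
target = t2 - (np.cov(t1, t2)[0, 1] / np.var(t1)) * t1

# Squared correlation of each estimator with its latent target
print("raw gain:", np.corrcoef(raw_gain, true_gain)[0, 1] ** 2)
print("residual gain:", np.corrcoef(resid_gain, target)[0, 1] ** 2)
```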
Peer reviewed
Langenfeld, Thomas; Thomas, Jay; Zhu, Rongchun; Morris, Carrie A. – Journal of Educational Measurement, 2020
An assessment of graphic literacy was developed by articulating and subsequently validating a skills-based cognitive model intended to substantiate the plausibility of score interpretations. Model validation involved the use of multiple sources of evidence derived from large-scale field testing and cognitive lab studies. Data from large-scale field…
Descriptors: Evidence, Scores, Eye Movements, Psychometrics
Peer reviewed
Chen, Chia-Wen; Wang, Wen-Chung; Chiu, Ming Ming; Ro, Sage – Journal of Educational Measurement, 2020
The use of computerized adaptive testing algorithms for ranking items (e.g., college preferences, career choices) involves two major challenges: unacceptably high computation times (selecting from a large item pool with many dimensions) and biased results (enhanced preferences or intensified examinee responses because of repeated statements across…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
Peer reviewed
Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting
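A sketch of the generalizability logic in the entry above: estimate judge and residual variance components from a judges-by-items rating matrix and propagate them into a standard error for the Angoff cut score. Here the item set is treated as fixed, which sidesteps the more complicated role of item variance that the article analyzes; all data and design values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
n_j, n_i = 12, 30   # hypothetical: 12 judges each rate all 30 items

# Hypothetical Angoff judgments: each entry is a judge's estimate of the
# probability that a minimally competent examinee answers the item correctly
ratings = np.clip(0.6
                  + rng.normal(0, 0.05, (n_j, 1))      # judge main effects
                  + rng.normal(0, 0.10, (1, n_i))      # item main effects
                  + rng.normal(0, 0.08, (n_j, n_i)),   # judge-by-item residual
                  0, 1)

# Variance components from the two-way crossed design (single rating per cell)
grand = ratings.mean()
ss_judge = n_i * np.sum((ratings.mean(axis=1) - grand) ** 2)
ss_item = n_j * np.sum((ratings.mean(axis=0) - grand) ** 2)
ss_resid = np.sum((ratings - grand) ** 2) - ss_judge - ss_item

ms_judge = ss_judge / (n_j - 1)
ms_resid = ss_resid / ((n_j - 1) * (n_i - 1))
var_judge = max((ms_judge - ms_resid) / n_i, 0.0)

# SE of the cut score (mean rating times number of items), items fixed
cut = grand * n_i
se_cut = n_i * np.sqrt(var_judge / n_j + ms_resid / (n_j * n_i))
print(f"cut score: {cut:.2f} of {n_i}, SE: {se_cut:.2f}")
```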
Peer reviewed
Kaufmann, Esther; Budescu, David V. – Journal of Educational Measurement, 2020
The literature suggests that simple expert (mathematical) models can improve the quality of decisions, but people are not always eager to accept and endorse such models. We ran three online experiments to test the receptiveness to advice from computerized expert models. Middle- and high-school teachers (N = 435) evaluated student profiles that…
Descriptors: Mathematical Models, Computer Uses in Education, Artificial Intelligence, Expertise
Peer reviewed
Kim, Stella Y.; Lee, Won-Chan – Journal of Educational Measurement, 2020
The current study aims to evaluate the performance of three non-IRT procedures (i.e., normal approximation, Livingston-Lewis, and compound multinomial) for estimating classification indices when the observed score distribution shows atypical patterns: (a) bimodality, (b) structural (i.e., systematic) bumpiness, or (c) structural zeros (i.e., no…
Descriptors: Classification, Accuracy, Scores, Cutting Scores
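For reference, the normal approximation procedure named in the entry above can be sketched in a few lines: treat true and observed scores as bivariate normal with correlation equal to the square root of reliability, and sum the probabilities of the two consistent-classification quadrants. The article's point is precisely that such distributional assumptions can fail for bimodal or bumpy score distributions.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def classification_accuracy(cut_z, reliability):
    """Normal-approximation sketch of classification accuracy at one cut:
    true score T and observed score X are standard bivariate normal with
    correlation sqrt(reliability)."""
    r = np.sqrt(reliability)
    bvn = multivariate_normal(mean=[0, 0], cov=[[1, r], [r, 1]])
    both_below = bvn.cdf([cut_z, cut_z])
    # P(T > c, X > c) = 1 - P(T <= c) - P(X <= c) + P(both <= c)
    both_above = 1 - 2 * norm.cdf(cut_z) + both_below
    return both_below + both_above

print(round(classification_accuracy(cut_z=0.5, reliability=0.85), 3))
```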
Peer reviewed
Fujimoto, Ken A. – Journal of Educational Measurement, 2020
Multilevel bifactor item response theory (IRT) models are commonly used to account for features of the data that are related to the sampling and measurement processes used to gather those data. These models conventionally make assumptions about the portions of the data structure that represent these features. Unfortunately, when data violate these…
Descriptors: Bayesian Statistics, Item Response Theory, Achievement Tests, Secondary School Students
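As context for the entry above, a data-generating sketch of the bifactor IRT structure itself, single-level for brevity: each item loads on one general dimension plus one specific dimension. The multilevel layer and the Bayesian estimation machinery the article studies are omitted, and all loadings are invented.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(8)

# Hypothetical bifactor structure: 12 items, 3 specific factors of 4 items each
n, n_items = 500, 12
specific_of = np.repeat([0, 1, 2], 4)            # item -> specific factor map
a_g = rng.uniform(0.8, 1.6, n_items)             # general-factor loadings
a_s = rng.uniform(0.4, 1.0, n_items)             # specific-factor loadings
d = rng.normal(0, 1, n_items)                    # intercepts

theta_g = rng.normal(0, 1, n)                    # general ability
theta_s = rng.normal(0, 1, (n, 3))               # orthogonal specific factors

# Bifactor 2PL: each item loads on the general factor plus its specific factor
logit = a_g * theta_g[:, None] + a_s * theta_s[:, specific_of] + d
x = rng.binomial(1, expit(logit))
print(x.mean(axis=0).round(2))                   # item p-values
```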
Peer reviewed
Huelmann, Thorben; Debelak, Rudolf; Strobl, Carolin – Journal of Educational Measurement, 2020
This study addresses the topic of how anchoring methods for differential item functioning (DIF) analysis can be used in multigroup scenarios. The direct approach would be to combine anchoring methods developed for two-group scenarios with multigroup DIF-detection methods. Alternatively, multiple tests could be carried out. The results of these…
Descriptors: Test Items, Test Bias, Equated Scores, Item Analysis
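The general shape of an anchor-based multigroup DIF test can be sketched with logistic regression, used here as a generic stand-in rather than any of the specific anchoring methods the study compares: regress the studied item on a DIF-free anchor score, then likelihood-ratio-test the group terms. Data and effect sizes are simulated.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(6)

# Hypothetical data: 3 groups, a DIF-free anchor score, one studied item
n = 900
group = rng.integers(0, 3, n)
theta = rng.normal(0.3 * (group - 1), 1)      # true group mean differences
anchor = theta + rng.normal(0, 0.5, n)        # anchor score (no DIF)
dif = np.where(group == 2, 0.6, 0.0)          # uniform DIF favoring group 2
y = rng.binomial(1, 1 / (1 + np.exp(-(theta - 0.2 + dif))))

# Nested logistic models: anchor only vs. anchor + group dummies.
# A significant LR statistic flags uniform DIF for the studied item.
G = np.column_stack([(group == g).astype(float) for g in (1, 2)])
X0 = sm.add_constant(anchor)
X1 = np.column_stack([X0, G])
ll0 = sm.Logit(y, X0).fit(disp=0).llf
ll1 = sm.Logit(y, X1).fit(disp=0).llf
lr = 2 * (ll1 - ll0)
print(f"LR = {lr:.2f}, p = {chi2.sf(lr, df=2):.4f}")
```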
Peer reviewed
Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…
Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests
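One of the composite-score quantities the entry above evaluates, reliability for a mixed-format test, can be sketched with stratified coefficient alpha, a classical stand-in rather than the article's IRT-based machinery. The MC and FR sections below are simulated.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an examinee-by-item score matrix."""
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum()
                          / items.sum(axis=1).var(ddof=1))

rng = np.random.default_rng(7)
n = 800
theta = rng.normal(0, 1, n)

# Hypothetical mixed-format test: 30 MC items (0/1) and 4 FR items (0-4)
mc = (theta[:, None] + rng.normal(0, 1.2, (n, 30)) > 0).astype(float)
fr = np.clip(np.round(2 + theta[:, None] + rng.normal(0, 1.0, (n, 4))), 0, 4)

# Stratified alpha: weight each section's unreliable variance by its
# section-score variance, relative to the composite-score variance
sections = [mc, fr]
total_var = np.sum([s.sum(axis=1) for s in sections], axis=0).var(ddof=1)
strat_alpha = 1 - sum(s.sum(axis=1).var(ddof=1) * (1 - cronbach_alpha(s))
                      for s in sections) / total_var
print(f"stratified alpha: {strat_alpha:.3f}")
```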
Peer reviewed
Wu, Qian; De Laet, Tinne; Janssen, Rianne – Journal of Educational Measurement, 2019
Single-best answers to multiple-choice items are commonly dichotomized into correct and incorrect responses, and modeled using either a dichotomous item response theory (IRT) model or a polytomous one if differences among all response options are to be retained. The current study presents an alternative IRT-based modeling approach to…
Descriptors: Multiple Choice Tests, Item Response Theory, Test Items, Responses
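The classic way to retain information from all response options is Bock's nominal response model, sketched below as context for the alternative approach the entry above proposes; the item parameters are invented.

```python
import numpy as np

def nominal_response_probs(theta, a, c):
    """Bock's nominal response model: P(option k | theta) is a softmax over
    option-specific slopes a_k and intercepts c_k."""
    z = np.outer(theta, a) + c                       # (n_examinees, n_options)
    ez = np.exp(z - z.max(axis=1, keepdims=True))    # stabilized softmax
    return ez / ez.sum(axis=1, keepdims=True)

# Hypothetical 4-option item: option 3 is keyed correct; slopes and
# intercepts sum to zero for identification
a = np.array([-0.8, -0.3, 1.5, -0.4])
c = np.array([0.2, 0.5, 0.0, -0.7])
theta = np.array([-2.0, 0.0, 2.0])
print(nominal_response_probs(theta, a, c).round(3))
```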