Showing 1 to 15 of 1,278 results
Peer reviewed
Wyse, Adam E.; McBride, James R. – Journal of Educational Measurement, 2021
A key consideration when giving any computerized adaptive test (CAT) is how much adaptation is present when the test is used in practice. This study introduces a new framework to measure the amount of adaptation of Rasch-based CATs based on looking at the differences between the selected item locations (Rasch item difficulty parameters) of the…
Descriptors: Item Response Theory, Computer Assisted Testing, Adaptive Testing, Test Items
Peer reviewed
Henninger, Mirka – Journal of Educational Measurement, 2021
Item Response Theory models with varying thresholds are essential tools to account for unknown types of response tendencies in rating data. However, in order to separate constructs to be measured and response tendencies, specific constraints have to be imposed on varying thresholds and their interrelations. In this article, a multidimensional…
Descriptors: Response Style (Tests), Item Response Theory, Models, Computation
Peer reviewed
Lee, Yi-Hsuan; Haberman, Shelby J. – Journal of Educational Measurement, 2021
For assessments that use different forms in different administrations, equating methods are applied to ensure comparability of scores over time. Ideally, a score scale is well maintained throughout the life of a testing program. In reality, instability of a score scale can result from a variety of causes, some are expected while others may be…
Descriptors: Scores, Regression (Statistics), Demography, Data
Peer reviewed
Wise, Steven L.; Kuhfeld, Megan R. – Journal of Educational Measurement, 2021
There has been a growing research interest in the identification and management of disengaged test taking, which poses a validity threat that is particularly prevalent with low-stakes tests. This study investigated effort-moderated (E-M) scoring, in which item responses classified as rapid guesses are identified and excluded from scoring. Using…
Descriptors: Scoring, Data Use, Response Style (Tests), Guessing (Tests)
Peer reviewed
DeCarlo, Lawrence T.; Zhou, Xiaoliang – Journal of Educational Measurement, 2021
In signal detection rater models for constructed response (CR) scoring, it is assumed that raters discriminate equally well between different latent classes defined by the scoring rubric. An extended model that relaxes this assumption is introduced; the model recognizes that a rater may not discriminate equally well between some of the scoring…
Descriptors: Scoring, Models, Bias, Perception
Peer reviewed
Baldwin, Peter; Yaneva, Victoria; Mee, Janet; Clauser, Brian E.; Ha, Le An – Journal of Educational Measurement, 2021
In this article, it is shown how item text can be represented by (a) 113 features quantifying the text's linguistic characteristics, (b) 16 measures of the extent to which an information-retrieval-based automatic question-answering system finds an item challenging, and (c) through dense word representations (word embeddings). Using a random…
Descriptors: Natural Language Processing, Prediction, Item Response Theory, Reaction Time
Peer reviewed
Kim, Kyung Yong – Journal of Educational Measurement, 2020
New items are often evaluated prior to their operational use to obtain item response theory (IRT) item parameter estimates for quality control purposes. Fixed parameter calibration is one linking method that is widely used to estimate parameters for new items and place them on the desired scale. This article provides detailed descriptions of two…
Descriptors: Item Response Theory, Evaluation Methods, Test Items, Simulation
Peer reviewed
Castellano, Katherine E.; McCaffrey, Daniel F. – Journal of Educational Measurement, 2020
The residual gain score has been of historical interest, and its percentile rank has been of interest more recently given its close correspondence to the popular Student Growth Percentile. However, these estimators suffer from low accuracy and systematic bias (bias conditional on prior latent achievement). This article explores three…
Descriptors: Accuracy, Student Evaluation, Measurement Techniques, Evaluation Methods
Peer reviewed
Kim, Hyung Jin; Brennan, Robert L.; Lee, Won-Chan – Journal of Educational Measurement, 2020
In equating, smoothing techniques are frequently used to diminish sampling error. There are typically two types of smoothing: presmoothing and postsmoothing. For polynomial log-linear presmoothing, an optimum smoothing degree can be determined statistically based on the Akaike information criterion or Chi-square difference criterion. For…
Descriptors: Equated Scores, Sampling, Error of Measurement, Statistical Analysis
Peer reviewed
Maeda, Hotaka; Zhang, Bo – Journal of Educational Measurement, 2020
When a response pattern does not fit a selected measurement model, one may resort to robust ability estimation. Two popular robust methods are biweight and Huber weight. So far, research on these methods has been quite limited. This article proposes the maximum a posteriori biweight (BMAP) and Huber weight (HMAP) estimation methods. These methods…
Descriptors: Bayesian Statistics, Robustness (Statistics), Computation, Monte Carlo Methods
Peer reviewed
Wang, Chun; Chen, Ping; Jiang, Shengyu – Journal of Educational Measurement, 2020
Many large-scale educational surveys have moved from linear form design to multistage testing (MST) design. One advantage of MST is that it can provide more accurate latent trait [theta] estimates using fewer items than required by linear tests. However, MST generates incomplete response data by design; hence, questions remain as to how to…
Descriptors: Test Construction, Test Items, Adaptive Testing, Maximum Likelihood Statistics
Peer reviewed
Castellano, Katherine E.; McCaffrey, Daniel F. – Journal of Educational Measurement, 2020
Testing programs are often interested in using a student growth measure. This article presents analytic derivations of the accuracy of common student growth measures on both the raw scale of the test and the percentile rank scale in terms of the proportional reduction in mean squared error and the squared correlation between the estimator and…
Descriptors: Student Evaluation, Accuracy, Testing, Student Development
Peer reviewed
Langenfeld, Thomas; Thomas, Jay; Zhu, Rongchun; Morris, Carrie A. – Journal of Educational Measurement, 2020
An assessment of graphic literacy was developed by articulating and subsequently validating a skills-based cognitive model intended to substantiate the plausibility of score interpretations. Model validation involved use of multiple sources of evidence derived from large-scale field testing and cognitive labs studies. Data from large-scale field…
Descriptors: Evidence, Scores, Eye Movements, Psychometrics
Peer reviewed
Chen, Chia-Wen; Wang, Wen-Chung; Chiu, Ming Ming; Ro, Sage – Journal of Educational Measurement, 2020
The use of computerized adaptive testing algorithms for ranking items (e.g., college preferences, career choices) involves two major challenges: unacceptably high computation times (selecting from a large item pool with many dimensions) and biased results (enhanced preferences or intensified examinee responses because of repeated statements across…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
Peer reviewed
Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting