| Publication Date | Count |
| --- | --- |
| In 2015 | 6 |
| Since 2014 | 30 |
| Since 2011 (last 5 years) | 105 |
| Since 2006 (last 10 years) | 204 |
| Since 1996 (last 20 years) | 377 |
| Descriptor | Count |
| --- | --- |
| Test Items | 266 |
| Test Construction | 176 |
| Item Response Theory | 173 |
| Test Reliability | 156 |
| Scores | 149 |
| Test Validity | 147 |
| Higher Education | 135 |
| Comparative Analysis | 132 |
| Statistical Analysis | 116 |
| Models | 113 |
| Author | Count |
| --- | --- |
| Linn, Robert L. | 16 |
| Wainer, Howard | 16 |
| van der Linden, Wim J. | 15 |
| Dorans, Neil J. | 14 |
| Kolen, Michael J. | 14 |
| Bridgeman, Brent | 12 |
| Hambleton, Ronald K. | 12 |
| Livingston, Samuel A. | 12 |
| Sinharay, Sandip | 12 |
| Clauser, Brian E. | 10 |
| Education Level | Count |
| --- | --- |
| Elementary Secondary Education | 7 |
| Higher Education | 7 |
| High Schools | 6 |
| Secondary Education | 6 |
| Middle Schools | 4 |
| Postsecondary Education | 4 |
| Grade 8 | 3 |
| Elementary Education | 2 |
| Grade 10 | 1 |
| Grade 4 | 1 |
| Audience | Count |
| --- | --- |
| Researchers | 21 |
| Practitioners | 4 |
| Teachers | 1 |
Showing 61 to 75 of 1,152 results
Cui, Ying; Gierl, Mark J.; Chang, Hua-Hua – Journal of Educational Measurement, 2012
This article introduces procedures for computing classification consistency and accuracy indices, and for drawing asymptotic statistical inferences about them, designed specifically for cognitive diagnostic assessments. The new classification indices can be used as important indicators of the reliability and validity of classification results produced by…
Descriptors: Classification, Accuracy, Cognitive Tests, Diagnostic Tests
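The flavor of test-level index this abstract describes can be illustrated with a small sketch (a simplification, not the article's exact estimators): accuracy as the average posterior probability of each examinee's modal latent class, and consistency as the average probability that two independent classifications of the same examinee agree.

```python
import numpy as np

def classification_indices(posteriors):
    """Test-level classification accuracy and consistency from an
    (examinees x latent classes) matrix of posterior probabilities.

    Accuracy: mean posterior probability of each examinee's modal class.
    Consistency: mean probability that two independent classifications
    of the same examinee agree (sum of squared posteriors per row).
    """
    posteriors = np.asarray(posteriors, dtype=float)
    accuracy = posteriors.max(axis=1).mean()
    consistency = (posteriors ** 2).sum(axis=1).mean()
    return accuracy, consistency

# Three examinees, two latent classes (toy posterior probabilities).
post = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
acc, cons = classification_indices(post)
```

Note that consistency can never exceed accuracy here, since the sum of squared posteriors is bounded by the maximum posterior.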
Jiang, Yanlin; von Davier, Alina A.; Chen, Haiwen – Journal of Educational Measurement, 2012
This article presents a method for evaluating equating results. Within the kernel equating framework, the percent relative error (PRE) for chained equipercentile equating was computed under the nonequivalent groups with anchor test (NEAT) design. The method was applied to two data sets to obtain the PRE, which can be used to measure equating…
Descriptors: Equated Scores, Evaluation, Error of Measurement, Computation
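The percent relative error (PRE) idea can be sketched as a moment comparison: the p-th moment of the equated X scores against the p-th moment of the Y scores. This sample-moment version is only illustrative; kernel equating computes PRE on the fitted discrete score distributions of the target population.

```python
import numpy as np

def percent_relative_error(equated_x, y, order):
    """PRE of a given order: percent difference between the p-th raw
    moment of the equated X scores and that of the Y scores.
    Both inputs are treated as samples from the target population."""
    mx = np.mean(np.asarray(equated_x, dtype=float) ** order)
    my = np.mean(np.asarray(y, dtype=float) ** order)
    return 100.0 * (mx - my) / my

rng = np.random.default_rng(0)
y = rng.normal(50, 10, 5000)
equated = y + rng.normal(0, 0.5, 5000)   # nearly perfect toy equating
pre1 = percent_relative_error(equated, y, 1)   # close to 0 for good equating
```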
Kunina-Habenicht, Olga; Rupp, Andre A.; Wilhelm, Oliver – Journal of Educational Measurement, 2012
Using a complex simulation study we investigated parameter recovery, classification accuracy, and performance of two item-fit statistics for correct and misspecified diagnostic classification models within a log-linear modeling framework. The basic manipulated test design factors included the number of respondents (1,000 vs. 10,000), attributes (3…
Descriptors: Classification, Accuracy, Goodness of Fit, Models
Jiao, Hong; Kamata, Akihito; Wang, Shudong; Jin, Ying – Journal of Educational Measurement, 2012
The applications of item response theory (IRT) models assume local item independence and that examinees are independent of each other. When a representative sample for psychometric analysis is selected using a cluster sampling method in a testlet-based assessment, both local item dependence and local person dependence are likely to be induced.…
Descriptors: Item Response Theory, Test Items, Markov Processes, Monte Carlo Methods
Liu, Jinghua; Dorans, Neil J. – Journal of Educational Measurement, 2012
At times, the same set of test questions is administered under different measurement conditions that might affect the psychometric properties of the test scores enough to warrant different score conversions for the different conditions. We propose a procedure for assessing the practical equivalence of conversions developed for the same set of test…
Descriptors: Measurement, Test Items, Psychometrics
Paek, Insu – Journal of Educational Measurement, 2012
Although logistic regression has become one of the best-known methods for detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under maximum likelihood estimation, do not seem to be consistently distinguished in the DIF literature. This paper provides a clarifying…
Descriptors: Test Bias, Tests, Maximum Likelihood Statistics, Statistical Analysis
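One of the three tests the article distinguishes, the likelihood-ratio test for uniform DIF, can be sketched with a hand-rolled Newton-Raphson logistic fit: compare a model with the matching variable only against one that adds a group indicator (simulated data with no DIF built in; no real item is implied).

```python
import numpy as np

def fit_logit(X, y, iters=50):
    """Logistic regression by Newton-Raphson; returns (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, ll

rng = np.random.default_rng(1)
n = 2000
total = rng.normal(0, 1, n)             # matching variable (e.g., total score)
group = rng.integers(0, 2, n)           # 0 = reference, 1 = focal
logit = -0.2 + 1.0 * total              # item response model with no DIF
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

ones = np.ones(n)
X0 = np.column_stack([ones, total])           # null model: no DIF
X1 = np.column_stack([ones, total, group])    # adds a uniform-DIF term
_, ll0 = fit_logit(X0, y)
_, ll1 = fit_logit(X1, y)
lr_stat = 2 * (ll1 - ll0)   # refer to chi-square, 1 df (3.84 at alpha = .05)
```

The Wald test would instead use the estimated group coefficient and its standard error, and the score test evaluates the gradient of the larger model at the null fit; asymptotically the three agree.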
Mislevy, Robert J.; Zwick, Rebecca – Journal of Educational Measurement, 2012
A new entry in the testing lexicon is through-course summative assessment, a system consisting of components administered periodically during the academic year. As defined in the Race to the Top program, these assessments are intended to yield a yearly summative score for accountability purposes. They must provide for both individual and group…
Descriptors: National Competency Tests, Inferences, Item Response Theory, Summative Evaluation
Mislevy, Jessica L.; Rupp, Andre A.; Harring, Jeffrey R. – Journal of Educational Measurement, 2012
A rapidly expanding arena for item response theory (IRT) is in attitudinal and health-outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although initial investigations of…
Descriptors: Test Items, Investigations, Simulation, Adaptive Testing
Li, Deping; Jiang, Yanlin; von Davier, Alina A. – Journal of Educational Measurement, 2012
This study investigates a sequence of item response theory (IRT) true score equatings based on various scale transformation approaches and evaluates equating accuracy and consistency over time. The results show that the biases and sample variances for the IRT true score equating (both direct and indirect) are quite small (except for the mean/sigma…
Descriptors: True Scores, Equated Scores, Item Response Theory, Accuracy
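One of the scale transformation approaches such a study compares, mean/sigma linking, is simple enough to sketch directly: the slope and intercept that align the anchor items' difficulty estimates across forms (toy numbers, not from the article).

```python
import numpy as np

def mean_sigma_transform(b_new, b_ref):
    """Mean/sigma linking: slope A and intercept B that put the new form's
    IRT difficulty estimates onto the reference scale (b* = A*b + B).
    Discriminations rescale as a* = a / A."""
    b_new = np.asarray(b_new, dtype=float)
    b_ref = np.asarray(b_ref, dtype=float)
    A = b_ref.std() / b_new.std()
    B = b_ref.mean() - A * b_new.mean()
    return A, B

# Anchor-item difficulty estimates on each form (toy values).
b_new = np.array([-1.0, -0.2, 0.3, 1.1])
b_ref = np.array([-0.8, 0.0, 0.5, 1.3])
A, B = mean_sigma_transform(b_new, b_ref)
```

Mean/mean linking would match means of the discriminations instead; characteristic-curve methods (Haebara, Stocking-Lord) match the item or test characteristic curves.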
Zu, Jiyun; Yuan, Ke-Hai – Journal of Educational Measurement, 2012
In the nonequivalent groups with anchor test (NEAT) design, the standard error of linear observed-score equating is commonly estimated by an estimator derived assuming multivariate normality. However, real data are seldom normally distributed, causing this normal estimator to be inconsistent. A general estimator, which does not rely on the…
Descriptors: Sample Size, Equated Scores, Test Items, Error of Measurement
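The linear observed-score equating function whose standard error is at issue here can be sketched in its simplest (single-group) form: match means and standard deviations. The NEAT design additionally uses the anchor test to synthesize a target population, which this illustration omits.

```python
import numpy as np

def linear_equate(x, scores_x, scores_y):
    """Linear observed-score equating: map x onto the Y scale so that the
    equated scores have Y's mean and standard deviation."""
    sx = np.asarray(scores_x, dtype=float)
    sy = np.asarray(scores_y, dtype=float)
    return sy.mean() + (sy.std() / sx.std()) * (np.asarray(x, dtype=float) - sx.mean())

x_scores = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
y_scores = np.array([20.0, 24.0, 28.0, 32.0, 36.0])
eq_mid = linear_equate(14.0, x_scores, y_scores)   # mean of X maps to mean of Y
```

The article's point is about the standard error of this function: the usual delta-method estimator assumes normal scores, whereas a general estimator uses sample fourth moments and stays consistent under nonnormality.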
Chen, Haiwen – Journal of Educational Measurement, 2012
In this article, linear item response theory (IRT) observed-score equating is compared under a generalized kernel equating framework with Levine observed-score equating for nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when…
Descriptors: Tests, Item Response Theory, Equated Scores, Statistical Analysis
Suh, Youngsuk; Cho, Sun-Joo; Wollack, James A. – Journal of Educational Measurement, 2012
In the presence of test speededness, the parameters of item response theory models can be poorly estimated due to conditional dependencies among items, particularly for end-of-test items (i.e., speeded items). This article conducts a systematic comparison of five item calibration procedures--a two-parameter logistic (2PL) model, a…
Descriptors: Response Style (Tests), Timed Tests, Test Items, Item Response Theory
Ranger, Jochen; Kuhn, Jorg-Tobias – Journal of Educational Measurement, 2012
The information matrix can equivalently be determined via the expectation of the Hessian matrix or the expectation of the outer product of the score vector. The identity of these two matrices, however, is only valid in case of a correctly specified model. Therefore, differences between the two versions of the observed information matrix indicate…
Descriptors: Goodness of Fit, Item Response Theory, Models, Matrices
Han, Kyung T. – Journal of Educational Measurement, 2012
Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
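One simple way to trade a little precision for better pool utilization, in the spirit of the balance this abstract describes, is randomesque selection: compute 2PL Fisher information at the current ability estimate and pick at random among the k most informative unused items. This is a generic illustration, not the article's algorithm.

```python
import numpy as np

def select_item(theta, a, b, administered, rng, k=5):
    """Next CAT item under randomesque exposure control: choose uniformly
    among the k highest-information unused items at the current theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    info = a ** 2 * p * (1.0 - p)          # 2PL Fisher information per item
    info = info.copy()
    info[list(administered)] = -np.inf     # mask items already given
    top_k = np.argsort(info)[-k:]          # indices of k most informative items
    return int(rng.choice(top_k))

rng = np.random.default_rng(2)
a = rng.uniform(0.8, 2.0, 100)   # toy 100-item pool: discriminations
b = rng.normal(0, 1, 100)        # difficulties
item = select_item(0.0, a, b, administered={3, 7}, rng=rng)
```

Pure maximum-information selection (k = 1) maximizes precision but overexposes a handful of items; larger k spreads exposure across the pool.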
Puhan, Gautam – Journal of Educational Measurement, 2012
Tucker and chained linear equatings were evaluated in two testing scenarios. In Scenario 1, referred to as rater comparability scoring and equating, the anchor-to-total correlation is often very high for the new form but moderate for the reference form. This may adversely affect the results of Tucker equating, especially if the new and reference…
Descriptors: Testing, Scoring, Equated Scores, Statistical Analysis
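The chained linear method being compared here composes two linear links: X to the anchor V in the new-form group, then V to Y in the reference group. A minimal sketch (toy data, population standard deviations):

```python
import numpy as np

def _lin(z, frm, to):
    """Linear link matching means and standard deviations."""
    frm = np.asarray(frm, dtype=float)
    to = np.asarray(to, dtype=float)
    return to.mean() + (to.std() / frm.std()) * (np.asarray(z, dtype=float) - frm.mean())

def chained_linear(x, x_new, v_new, v_ref, y_ref):
    """Chained linear equating for the NEAT design: link X to anchor V in
    the new-form group, then V to Y in the reference group."""
    return _lin(_lin(x, x_new, v_new), v_ref, y_ref)

# Toy NEAT data: new group's X and anchor V; reference group's V and Y.
x_new = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
v_new = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
v_ref = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
y_ref = np.array([10.0, 11.0, 12.0, 13.0, 14.0])
eq = chained_linear(2.0, x_new, v_new, v_ref, y_ref)
```

Tucker equating instead regresses X and Y on the anchor to estimate synthetic-population moments, which is why a very high anchor-to-total correlation on only one form can distort it, as the abstract notes.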
