| Publication Date | Count |
| --- | --- |
| In 2015 | 6 |
| Since 2014 | 30 |
| Since 2011 (last 5 years) | 105 |
| Since 2006 (last 10 years) | 204 |
| Since 1996 (last 20 years) | 377 |
| Descriptor | Count |
| --- | --- |
| Test Items | 266 |
| Test Construction | 176 |
| Item Response Theory | 173 |
| Test Reliability | 156 |
| Scores | 149 |
| Test Validity | 147 |
| Higher Education | 135 |
| Comparative Analysis | 132 |
| Statistical Analysis | 116 |
| Models | 113 |
| Author | Count |
| --- | --- |
| Linn, Robert L. | 16 |
| Wainer, Howard | 16 |
| van der Linden, Wim J. | 15 |
| Dorans, Neil J. | 14 |
| Kolen, Michael J. | 14 |
| Bridgeman, Brent | 12 |
| Hambleton, Ronald K. | 12 |
| Livingston, Samuel A. | 12 |
| Sinharay, Sandip | 12 |
| Clauser, Brian E. | 10 |
| Education Level | Count |
| --- | --- |
| Elementary Secondary Education | 7 |
| Higher Education | 7 |
| High Schools | 6 |
| Secondary Education | 6 |
| Middle Schools | 4 |
| Postsecondary Education | 4 |
| Grade 8 | 3 |
| Elementary Education | 2 |
| Grade 10 | 1 |
| Grade 4 | 1 |
| Audience | Count |
| --- | --- |
| Researchers | 21 |
| Practitioners | 4 |
| Teachers | 1 |
Showing 91 to 105 of 1,152 results
Wang, Chun; Chang, Hua-Hua; Huebner, Alan – Journal of Educational Measurement, 2011
This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback-Leibler (KL) information index but include additional stochastic components either in the item selection index or in…
Descriptors: Test Items, Adaptive Testing, Computer Assisted Testing, Cognitive Tests
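The posterior-weighted KL index that both of Wang, Chang, and Huebner's methods build on can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the attribute patterns, posterior weights, and item success probabilities are hypothetical, and the restrictive stochastic components the paper adds are omitted.

```python
import math

def pwkl_index(post, p_item, alpha_hat):
    """Posterior-weighted KL information for one candidate item.

    post: dict mapping attribute pattern -> posterior probability
    p_item: dict mapping attribute pattern -> P(correct | pattern)
    alpha_hat: current point estimate of the examinee's pattern
    """
    p0 = p_item[alpha_hat]
    total = 0.0
    for pattern, weight in post.items():
        pc = p_item[pattern]
        # KL divergence between the item's response distributions
        # under the current estimate and under this pattern
        kl = (p0 * math.log(p0 / pc)
              + (1 - p0) * math.log((1 - p0) / (1 - pc)))
        total += weight * kl
    return total

# Toy example: two attribute patterns, two candidate items
post = {"00": 0.3, "11": 0.7}
items = {
    "item_A": {"00": 0.2, "11": 0.8},   # discriminates between patterns
    "item_B": {"00": 0.5, "11": 0.55},  # barely discriminates
}
alpha_hat = "11"  # current most likely pattern
scores = {j: pwkl_index(post, p, alpha_hat) for j, p in items.items()}
best = max(scores, key=scores.get)  # the more discriminating item wins
```

In a CD-CAT loop, the index would be recomputed after each response as the posterior over patterns sharpens.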
Wiberg, Marie; van der Linden, Wim J. – Journal of Educational Measurement, 2011
Two methods of local linear observed-score equating for use with anchor-test and single-group designs are introduced. In an empirical study, the two methods were compared with the current traditional linear methods for observed-score equating. As a criterion, the bias in the equated scores relative to true equating based on Lord's (1980)…
Descriptors: Equated Scores, Statistical Analysis, Comparative Analysis, Statistical Bias
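For context, the traditional linear observed-score equating that serves as the comparison baseline maps a form-X score so the equated scores match form Y's mean and standard deviation. This sketch shows that baseline only, with made-up score vectors; it is not the local linear method the paper introduces.

```python
from statistics import mean, pstdev

def linear_equate(x_scores, y_scores):
    """Return a function mapping a form-X score onto form Y's scale:
    eq(x) = mu_Y + (sigma_Y / sigma_X) * (x - mu_X)."""
    mx, my = mean(x_scores), mean(y_scores)
    sx, sy = pstdev(x_scores), pstdev(y_scores)
    return lambda x: my + (sy / sx) * (x - mx)

# Hypothetical score distributions on two forms
x = [10, 12, 14, 16, 18]
y = [12, 14, 16, 18, 20]
eq = linear_equate(x, y)
# the form-X mean (14) maps to the form-Y mean (16)
```

The local variants studied by Wiberg and van der Linden condition this transformation on additional information rather than applying one global line.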
Rutkowski, Leslie – Journal of Educational Measurement, 2011
Although population modeling methods are well established, little literature appears to exist on the effect of missing background data on subpopulation achievement estimates. Using simulated data that follows typical large-scale assessment designs with known parameters and a number of missing conditions, this paper examines the extent…
Descriptors: Data, Computation, Measurement, Achievement
van der Linden, Wim J.; Diao, Qi – Journal of Educational Measurement, 2011
In automated test assembly (ATA), the methodology of mixed-integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different…
Descriptors: Test Items, Test Format, Test Construction, Item Banks
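The selection half of ATA can be illustrated with a toy version of the optimization: maximize total information at a target ability subject to content constraints. A real application would pose this as a mixed-integer program and call a solver; the brute-force search below is only a sketch over a hypothetical six-item bank.

```python
from itertools import combinations

# Toy item bank: item id -> (information at target ability, is_algebra_item)
bank = {
    1: (0.9, True), 2: (0.7, False), 3: (0.6, True),
    4: (0.5, False), 5: (0.4, True), 6: (0.3, False),
}

def assemble(bank, length, min_algebra):
    """Pick `length` items maximizing total information, subject to a
    content constraint (at least `min_algebra` algebra items).
    An MIP solver handles this at scale; brute force suffices here."""
    best, best_info = None, -1.0
    for combo in combinations(bank, length):
        if sum(1 for i in combo if bank[i][1]) < min_algebra:
            continue  # violates the content constraint
        info = sum(bank[i][0] for i in combo)
        if info > best_info:
            best, best_info = combo, info
    return set(best), best_info

form, info = assemble(bank, length=3, min_algebra=2)
```

The paper's contribution is to extend the same 0/1 decision-variable machinery from selecting the items to laying them out on the printed form.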
Suh, Youngsuk; Bolt, Daniel M. – Journal of Educational Measurement, 2011
In multiple-choice items, differential item functioning (DIF) in the correct response may or may not be caused by differentially functioning distractors. Identifying distractors as causes of DIF can provide valuable information for potential item revision or the design of new test items. In this paper, we examine a two-step approach based on…
Descriptors: Test Items, Test Bias, Multiple Choice Tests, Simulation
Sinharay, Sandip; Haberman, Shelby J. – Journal of Educational Measurement, 2011
Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman (2008b) suggested reporting an augmented subscore that is a linear combination of a subscore and the total score. Sinharay and Haberman (2008) and Sinharay (2010) showed that augmented subscores often lead to more accurate…
Descriptors: Diagnostic Tests, Psychometrics, Testing, Equated Scores
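The augmented subscore idea reduces to a linear combination of the observed subscore and the total score. In Haberman's method the weights are derived from the scores' reliabilities and covariances; the constants below are purely illustrative placeholders.

```python
def augmented_subscore(subscore, total, w_sub, w_total,
                       sub_mean, total_mean):
    """Augment a subscore with information from the total score:
    a linear combination of the two deviations from their means.
    w_sub and w_total stand in for weights that would be estimated
    from reliabilities and covariances in practice."""
    return (sub_mean
            + w_sub * (subscore - sub_mean)
            + w_total * (total - total_mean))

# A candidate slightly above the subscale mean and well above the
# total-score mean gets pulled upward by the total-score information.
aug = augmented_subscore(subscore=12, total=70,
                         w_sub=0.6, w_total=0.1,
                         sub_mean=10, total_mean=60)
```

Because the total score is more reliable than any short subscale, borrowing from it typically shrinks the error in the reported subscore.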
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
Zwick, Rebecca; Himelfarb, Igor – Journal of Educational Measurement, 2011
Research has often found that, when high school grades and SAT scores are used to predict first-year college grade-point average (FGPA) via regression analysis, African-American and Latino students are, on average, predicted to earn higher FGPAs than they actually do. Under various plausible models, this phenomenon can be explained in terms of…
Descriptors: Socioeconomic Status, Grades (Scholastic), Error of Measurement, White Students
Wang, Changjiang; Gierl, Mark J. – Journal of Educational Measurement, 2011
The purpose of this study is to apply the attribute hierarchy method (AHM) to a subset of SAT critical reading items and illustrate how the method can be used to promote cognitive diagnostic inferences. The AHM is a psychometric procedure for classifying examinees' test item responses into a set of attribute mastery patterns associated with…
Descriptors: Reading Comprehension, Test Items, Critical Reading, Protocol Analysis
van der Linden, Wim J.; Jeon, Minjeong; Ferrara, Steve – Journal of Educational Measurement, 2011
According to a popular belief, test takers should trust their initial instinct and retain their initial responses when they have the opportunity to review test items. More than 80 years of empirical research on item review, however, has contradicted this belief and shown minor but consistently positive score gains for test takers who changed…
Descriptors: Test Items, Item Response Theory, Test Wiseness, Beliefs
Bränberg, Kenny; Wiberg, Marie – Journal of Educational Measurement, 2011
This paper examined observed score linear equating in two different data collection designs, the equivalent groups design and the nonequivalent groups design, when information from covariates (i.e., background variables correlated with the test scores) was included. The main purpose of the study was to examine the effect (i.e., bias, variance, and…
Descriptors: Equated Scores, Data Collection, Models, Accuracy
Leckie, George; Baird, Jo-Anne – Journal of Educational Measurement, 2011
This study examined rater effects on essay scoring in an operational monitoring system from England's 2008 national curriculum English writing test for 14-year-olds. We fitted two multilevel models and analyzed: (1) drift in rater severity effects over time; (2) rater central tendency effects; and (3) differences in rater severity and central…
Descriptors: Scoring, Foreign Countries, National Curriculum, Writing Tests
Liu, Jinghua; Sinharay, Sandip; Holland, Paul W.; Curley, Edward; Feigenbaum, Miriam – Journal of Educational Measurement, 2011
This study explores an anchor that is different from the traditional miniature anchor in test score equating. In contrast to a traditional "mini" anchor that has the same spread of item difficulties as the tests to be equated, the studied anchor, referred to as a "midi" anchor (Sinharay & Holland), has a smaller spread of item difficulties than…
Descriptors: Equated Scores, Case Studies, College Entrance Examinations, Test Items
Wang, Wen-Chung; Wu, Shiu-Lien – Journal of Educational Measurement, 2011
Rating scale items have been widely used in educational and psychological tests. These items require people to make subjective judgments, and these subjective judgments usually involve randomness. To account for this randomness, Wang, Wilson, and Shih proposed the random-effect rating scale model in which the threshold parameters are treated as…
Descriptors: Rating Scales, Models, Statistical Analysis, Computation
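The fixed-threshold model that Wang, Wilson, and Shih generalize is Andrich's rating scale model, whose category probabilities can be sketched directly. The parameter values here are arbitrary; the random-effect extension, which treats the thresholds as varying across persons, is not implemented.

```python
import math

def rsm_probs(theta, delta, taus):
    """Category probabilities under Andrich's rating scale model.
    theta: person ability; delta: item location;
    taus: fixed category thresholds (random effects in the extension)."""
    # cumulative logits for categories k = 0..K (category 0 fixed at 0)
    psis = [0.0]
    cum = 0.0
    for tau in taus:
        cum += theta - delta - tau
        psis.append(cum)
    exps = [math.exp(p) for p in psis]
    z = sum(exps)  # normalizing constant
    return [e / z for e in exps]

# A 4-category item (thresholds -1, 0, 1) for a person at theta = 0.5
probs = rsm_probs(theta=0.5, delta=0.0, taus=[-1.0, 0.0, 1.0])
```

Treating the thresholds as random shifts each person's effective category boundaries, which is how the extension absorbs the randomness in subjective judgments.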
Alexeev, Natalia; Templin, Jonathan; Cohen, Allan S. – Journal of Educational Measurement, 2011
Mixture Rasch models have been used to study a number of psychometric issues such as goodness of fit, response strategy differences, strategy shifts, and multidimensionality. Although these models offer the potential for improving understanding of the latent variables being measured, under some conditions overextraction of latent classes may…
Descriptors: Item Response Theory, Models, Psychometrics, Tests