Publication Date
| In 2015 | 6 |
| Since 2014 | 30 |
| Since 2011 (last 5 years) | 105 |
| Since 2006 (last 10 years) | 204 |
| Since 1996 (last 20 years) | 377 |
Descriptor
| Test Items | 266 |
| Test Construction | 176 |
| Item Response Theory | 173 |
| Test Reliability | 156 |
| Scores | 149 |
| Test Validity | 147 |
| Higher Education | 135 |
| Comparative Analysis | 132 |
| Statistical Analysis | 116 |
| Models | 113 |
Author
| Linn, Robert L. | 16 |
| Wainer, Howard | 16 |
| van der Linden, Wim J. | 15 |
| Dorans, Neil J. | 14 |
| Kolen, Michael J. | 14 |
| Bridgeman, Brent | 12 |
| Hambleton, Ronald K. | 12 |
| Livingston, Samuel A. | 12 |
| Sinharay, Sandip | 12 |
| Clauser, Brian E. | 10 |
Education Level
| Elementary Secondary Education | 7 |
| Higher Education | 7 |
| High Schools | 6 |
| Secondary Education | 6 |
| Middle Schools | 4 |
| Postsecondary Education | 4 |
| Grade 8 | 3 |
| Elementary Education | 2 |
| Grade 10 | 1 |
| Grade 4 | 1 |
Audience
| Researchers | 21 |
| Practitioners | 4 |
| Teachers | 1 |
Showing 136 to 150 of 1,152 results
van der Linden, Wim J. – Journal of Educational Measurement, 2009
Two different traditions of response-time (RT) modeling are reviewed: the tradition of distinct models for RTs and responses, and the tradition of model integration in which RTs are incorporated in response models or the other way around. Several conceptual issues underlying both traditions are made explicit and analyzed for their consequences. We…
Descriptors: Test Items, Models, Reaction Time, Measurement
Gierl, Mark J.; Cui, Ying; Zhou, Jiawen – Journal of Educational Measurement, 2009
The attribute hierarchy method (AHM) is a psychometric procedure for classifying examinees' test item responses into a set of structured attribute patterns associated with different components from a cognitive model of task performance. Results from an AHM analysis yield information on examinees' cognitive strengths and weaknesses. Hence, the AHM…
Descriptors: Test Items, True Scores, Psychometrics, Algebra
Puhan, Gautam; Moses, Timothy P.; Grant, Mary C.; McHale, Frederick – Journal of Educational Measurement, 2009
A single-group (SG) equating with nearly equivalent test forms (SiGNET) design was developed by Grant to equate small-volume tests. Under this design, the scored items for the operational form are divided into testlets or mini tests. An additional testlet is created but not scored for the first form. If the scored testlets are testlets 1-6 and the…
Descriptors: Equated Scores, Test Construction, Measurement, Measures (Individuals)
Livingston, Samuel A.; Kim, Sooyeon – Journal of Educational Measurement, 2009
This article suggests a method for estimating a test-score equating relationship from small samples of test takers. The method does not require the estimated equating transformation to be linear. Instead, it constrains the estimated equating curve to pass through two pre-specified end points and a middle point determined from the data. In a…
Descriptors: Measurement, Measurement Techniques, Psychometrics, Sample Size
Meyer, J. Patrick; Setzer, J. Carl – Journal of Educational Measurement, 2009
Recent changes to federal guidelines for the collection of data on race and ethnicity allow respondents to select multiple race categories. Redefining race subgroups in this manner poses problems for research spanning both sets of definitions. NAEP long-term trends have used the single-race subgroup definitions for over thirty years. Little is…
Descriptors: Elementary Secondary Education, Federal Legislation, Simulation, Maximum Likelihood Statistics
Finkelman, Matthew; Nering, Michael L.; Roussos, Louis A. – Journal of Educational Measurement, 2009
In computerized adaptive testing (CAT), ensuring the security of test items is a crucial practical consideration. A common approach to reducing item theft is to define maximum item exposure rates, i.e., to limit the proportion of examinees to whom a given item can be administered. Numerous methods for controlling exposure rates have been proposed…
Descriptors: Test Items, Adaptive Testing, Item Analysis, Item Response Theory
Harik, Polina; Clauser, Brian E.; Grabovsky, Irina; Nungester, Ronald J.; Swanson, Dave; Nandakumar, Ratna – Journal of Educational Measurement, 2009
The present study examined the long-term usefulness of estimated parameters used to adjust the scores from a performance assessment to account for differences in rater stringency. Ratings from four components of the USMLE® Step 2 Clinical Skills Examination data were analyzed. A generalizability-theory framework was used to examine the extent to…
Descriptors: Generalizability Theory, Performance Based Assessment, Performance Tests, Clinical Experience
Randall, Jennifer; Engelhard, George, Jr. – Journal of Educational Measurement, 2009
In this study, we present an approach to questionnaire design within educational research based on Guttman's mapping sentences and Many-Facet Rasch Measurement Theory. We designed a 54-item questionnaire using Guttman's mapping sentences to examine the grading practices of teachers. Each item in the questionnaire represented a unique student…
Descriptors: Student Evaluation, Educational Research, Grades (Scholastic), Public School Teachers
Zimmerman, Donald W. – Journal of Educational Measurement, 2009
This study was an investigation of the relation between the reliability of difference scores, considered as a parameter characterizing a population of examinees, and the reliability estimates obtained from random samples from the population. The parameters in familiar equations for the reliability of difference scores were redefined in such a way…
Descriptors: Computer Simulation, Reliability, Population Groups, Scores
Puhan, Gautam; Moses, Timothy P.; Yu, Lei; Dorans, Neil J. – Journal of Educational Measurement, 2009
This study examined the extent to which log-linear smoothing could improve the accuracy of differential item functioning (DIF) estimates in small samples of examinees. Examinee responses from a certification test were analyzed using White examinees in the reference group and African American examinees in the focal group. Using a simulation…
Descriptors: Test Items, Reference Groups, Testing Programs, Raw Scores
Culpepper, Steven A.; Davenport, Ernest C. – Journal of Educational Measurement, 2009
Previous research notes the importance of understanding racial/ethnic differential prediction of college grades across multiple institutions. Institutional variation in selection indices is especially important given some states' laws governing public institutions' admissions decisions. This paper employed multilevel moderated multiple regression…
Descriptors: Prediction, College Students, Grades (Scholastic), Race
Cui, Zhongmin; Kolen, Michael J. – Journal of Educational Measurement, 2009
This article considers two new smoothing methods in equipercentile equating, the cubic B-spline presmoothing method and the direct presmoothing method. Using a simulation study, these two methods are compared with established methods, the beta-4 method, the polynomial loglinear method, and the cubic spline postsmoothing method, under three sample…
Descriptors: Equated Scores, Methods, Sample Size, Test Content
Muckle, Timothy J.; Karabatsos, George – Journal of Educational Measurement, 2009
It is known that the Rasch model is a special two-level hierarchical generalized linear model (HGLM). This article demonstrates that the many-faceted Rasch model (MFRM) is also a special case of the two-level HGLM, with a random intercept representing examinee ability on a test, and fixed effects for the test items, judges, and possibly other…
Descriptors: Test Items, Item Response Theory, Models, Regression (Statistics)
Yao, Lihua; Boughton, Keith – Journal of Educational Measurement, 2009
Numerous assessments contain a mixture of multiple choice (MC) and constructed response (CR) item types and many have been found to measure more than one trait. Thus, there is a need for multidimensional dichotomous and polytomous item response theory (IRT) modeling solutions, including multidimensional linking software. For example,…
Descriptors: Multiple Choice Tests, Responses, Test Items, Item Response Theory
Moses, Tim; Holland, Paul W. – Journal of Educational Measurement, 2009
In this study, we compared 12 statistical strategies proposed for selecting loglinear models for smoothing univariate test score distributions and for enhancing the stability of equipercentile equating functions. The major focus was on evaluating the effects of the selection strategies on equating function accuracy. Selection strategies' influence…
Descriptors: Equated Scores, Selection, Statistical Analysis, Models

