Assessment of Differential Item Functioning under Cognitive Diagnosis Models: The DINA Model Example
Li, Xiaomin; Wang, Wen-Chung – Journal of Educational Measurement, 2015
The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…
Descriptors: Test Bias, Models, Cognitive Measurement, Evaluation Methods
Hou, Likun; de la Torre, Jimmy; Nandakumar, Ratna – Journal of Educational Measurement, 2014
Analyzing examinees' responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This…
Descriptors: Test Bias, Models, Simulation, Error Patterns
Naumann, Alexander; Hochweber, Jan; Hartig, Johannes – Journal of Educational Measurement, 2014
Students' performance in assessments is commonly attributed to more or less effective teaching. This implies that students' responses are significantly affected by instruction. However, the assumption that outcome measures indeed are instructionally sensitive is scarcely investigated empirically. In the present study, we propose a…
Descriptors: Test Bias, Longitudinal Studies, Hierarchical Linear Modeling, Test Items
Li, Zhushan – Journal of Educational Measurement, 2014
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
Descriptors: Test Bias, Sample Size, Statistical Analysis, Regression (Statistics)
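As an illustration of the logistic regression approach this abstract refers to, the following is a minimal sketch of a likelihood ratio test for DIF. It is not the authors' derivation or code. It compares a reduced model (matching score only) with a full model that adds a group term (uniform DIF) and a score-by-group interaction (nonuniform DIF), giving a 2-degree-of-freedom test. All function names, the simulated data, and the parameter values are hypothetical; the matching variable is simulated as a standardized score rather than taken from a real test.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(X, y, iters=25):
    """Maximum likelihood fit by Newton-Raphson.

    Returns (coefficients, maximized log-likelihood).
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    hess[j][l] += p * (1.0 - p) * xi[j] * xi[l]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        loglik += yi * eta - math.log(1.0 + math.exp(eta))
    return beta, loglik


def lr_dif_test(score, group, y):
    """Likelihood ratio test of uniform plus nonuniform DIF (df = 2)."""
    reduced = [[1.0, s] for s in score]
    full = [[1.0, s, float(g), s * g] for s, g in zip(score, group)]
    _, ll_reduced = fit_logistic(reduced, y)
    beta, ll_full = fit_logistic(full, y)
    lr = max(0.0, 2.0 * (ll_full - ll_reduced))
    # For a chi-square variable with 2 df, P(X > x) = exp(-x / 2).
    p_value = math.exp(-lr / 2.0)
    return lr, p_value, beta


def simulate(n, uniform_dif, seed):
    """Hypothetical data: one item, two groups, standardized matching score."""
    rng = random.Random(seed)
    score, group, y = [], [], []
    for i in range(n):
        g = i % 2  # 0 = reference group, 1 = focal group
        s = rng.gauss(0.0, 1.0)
        eta = 0.2 + 1.0 * s + uniform_dif * g
        prob = 1.0 / (1.0 + math.exp(-eta))
        score.append(s)
        group.append(g)
        y.append(1 if rng.random() < prob else 0)
    return score, group, y


if __name__ == "__main__":
    score, group, y = simulate(2000, uniform_dif=-1.0, seed=1)
    lr, p_value, beta = lr_dif_test(score, group, y)
    print("LR statistic:", round(lr, 2), "p-value:", p_value)
    print("group (uniform DIF) coefficient:", round(beta[2], 3))
```

The closed-form p-value applies only because the full model adds exactly two parameters; testing uniform or nonuniform DIF separately would be a 1-degree-of-freedom test and need a different tail probability.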
Albano, Anthony D. – Journal of Educational Measurement, 2013
In many testing programs it is assumed that the context or position in which an item is administered does not have a differential effect on examinee responses to the item. Violations of this assumption may bias item response theory estimates of item and person parameters. This study examines the potentially biasing effects of item position. A…
Descriptors: Test Items, Item Response Theory, Test Format, Questioning Techniques
Pohl, Steffi – Journal of Educational Measurement, 2013
This article introduces longitudinal multistage testing (lMST), a special form of multistage testing (MST), as a method for adaptive testing in longitudinal large-scale studies. In lMST designs, test forms of different difficulty levels are used, whereas the values on a pretest determine the routing to these test forms. Since lMST allows for…
Descriptors: Adaptive Testing, Longitudinal Studies, Difficulty Level, Comparative Analysis
Jiao, Hong; Wang, Shudong; He, Wei – Journal of Educational Measurement, 2013
This study demonstrated the equivalence between the Rasch testlet model and the three-level one-parameter testlet model and explored the Markov Chain Monte Carlo (MCMC) method for model parameter estimation in WINBUGS. The estimation accuracy from the MCMC method was compared with those from the marginalized maximum likelihood estimation (MMLE)…
Descriptors: Computation, Item Response Theory, Models, Monte Carlo Methods
Debeer, Dries; Janssen, Rianne – Journal of Educational Measurement, 2013
Changing the order of items between alternate test forms to prevent copying and to enhance test security is a common practice in achievement testing. However, these changes in item order may affect item and test characteristics. Several procedures have been proposed for studying these item-order effects. The present study explores the use of…
Descriptors: Item Response Theory, Test Items, Test Format, Models
Paek, Insu – Journal of Educational Measurement, 2012
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
Descriptors: Test Bias, Tests, Maximum Likelihood Statistics, Statistical Analysis
Suh, Youngsuk; Bolt, Daniel M. – Journal of Educational Measurement, 2011
In multiple-choice items, differential item functioning (DIF) in the correct response may or may not be caused by differentially functioning distractors. Identifying distractors as causes of DIF can provide valuable information for potential item revision or the design of new test items. In this paper, we examine a two-step approach based on…
Descriptors: Test Items, Test Bias, Multiple Choice Tests, Simulation
Frederickx, Sofie; Tuerlinckx, Francis; De Boeck, Paul; Magis, David – Journal of Educational Measurement, 2010
In this paper we present a new methodology for detecting differential item functioning (DIF). We introduce a DIF model, called the random item mixture (RIM), that is based on a Rasch model with random item difficulties (besides the common random person abilities). In addition, a mixture model is assumed for the item difficulties such that the…
Descriptors: Test Bias, Models, Test Items, Difficulty Level
French, Brian F.; Finch, W. Holmes – Journal of Educational Measurement, 2010
The purpose of this study was to examine the performance of differential item functioning (DIF) assessment in the presence of a multilevel structure that often underlies data from large-scale testing programs. Analyses were conducted using logistic regression (LR), a popular, flexible, and effective tool for DIF detection. Data were simulated…
Descriptors: Test Bias, Testing Programs, Evaluation, Measurement
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
In this study we examined variations of the nonequivalent groups equating design for tests containing both multiple-choice (MC) and constructed-response (CR) items to determine which design was most effective in producing equivalent scores across the two tests to be equated. Using data from a large-scale exam, this study investigated the use of…
Descriptors: Measures (Individuals), Scoring, Equated Scores, Test Bias
Penfield, Randall D. – Journal of Educational Measurement, 2010
In this article, I address two competing conceptions of differential item functioning (DIF) in polytomously scored items. The first conception, referred to as net DIF, concerns between-group differences in the conditional expected value of the polytomous response variable. The second conception, referred to as global DIF, concerns the conditional…
Descriptors: Test Bias, Test Items, Evaluation Methods, Item Response Theory
Kim, Sooyeon; Walker, Michael E.; McHale, Frederick – Journal of Educational Measurement, 2010
Using data from a large-scale exam, in this study we compared various designs for equating constructed-response (CR) tests to determine which design was most effective in producing equivalent scores across the two tests to be equated. In the context of classical equating methods, four linking designs were examined: (a) an anchor set containing…
Descriptors: Equated Scores, Responses, Tests, Measurement

