ERIC - Search Results

Showing 1 to 15 of 111 results

Detecting Multidimensional DIF in Polytomous Items with IRT Methods and Estimation Approaches

Peer reviewed

Güler Yavuz Temel – Journal of Educational Measurement, 2024

The purpose of this study was to investigate multidimensional DIF with a simple and nonsimple structure in the context of multidimensional Graded Response Model (MGRM). This study examined and compared the performance of the IRT-LR and Wald test using MML-EM and MHRM estimation approaches with different test factors and test structures in…

Descriptors: Computation, Multidimensional Scaling, Item Response Theory, Models

Using Item Scores and Distractors in Person-Fit Assessment

Peer reviewed

Gorney, Kylie; Wollack, James A. – Journal of Educational Measurement, 2023

In order to detect a wide range of aberrant behaviors, it can be useful to incorporate information beyond the dichotomous item scores. In this paper, we extend the $l_z$ and $l_z^*$ person-fit statistics so that unusual behavior in item scores and unusual behavior in item distractors can be used as indicators of aberrance. Through…

Descriptors: Test Items, Scores, Goodness of Fit, Statistics

A Factor Mixture Model for Item Responses and Certainty of Response Indices to Identify Student Knowledge Profiles

Peer reviewed

Chen, Chia-Wen; Andersson, Björn; Zhu, Jinxin – Journal of Educational Measurement, 2023

The certainty of response index (CRI) measures respondents' confidence level when answering an item. In conjunction with the answers to the items, previous studies have used descriptive statistics and arbitrary thresholds to identify student knowledge profiles with the CRIs. Whereas this approach overlooked the measurement error of the observed…

Descriptors: Item Response Theory, Factor Analysis, Psychometrics, Test Items

A Note on Latent Traits Estimates under IRT Models with Missingness

Peer reviewed

Guo, Jinxin; Xu, Xin; Xin, Tao – Journal of Educational Measurement, 2023

Missingness due to not-reached items and omitted items has received much attention in the recent psychometric literature. Such missingness, if not handled properly, would lead to biased parameter estimation, as well as inaccurate inference of examinees, and further erode the validity of the test. This paper reviews some commonly used IRT based…

Descriptors: Psychometrics, Bias, Error of Measurement, Test Validity

Two IRT Characteristic Curve Linking Methods Weighted by Information

Peer reviewed

Wang, Shaojie; Zhang, Minqiang; Lee, Won-Chan; Huang, Feifei; Li, Zonglong; Li, Yixing; Yu, Sufang – Journal of Educational Measurement, 2022

Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods are proposed that take into account parameter estimation errors. The item- (IWCC) and test-information-weighted characteristic curve (TWCC) methods employ weighting…

Descriptors: Item Response Theory, Error of Measurement, Accuracy, Monte Carlo Methods

Standard Errors of Variance Components, Measurement Errors and Generalizability Coefficients for Crossed Designs

Peer reviewed

Almehrizi, Rashid S. – Journal of Educational Measurement, 2021

Estimates of various variance components, universe score variance, measurement error variances, and generalizability coefficients, like all statistics, are subject to sampling variability, particularly in small samples. Such variability is quantified traditionally through estimated standard errors and/or confidence intervals. The paper derived new…

Descriptors: Error of Measurement, Statistics, Design, Generalizability Theory

Performance of Person-Fit Statistics under Model Misspecification

Peer reviewed

Hong, Seong Eun; Monroe, Scott; Falk, Carl F. – Journal of Educational Measurement, 2020

In educational and psychological measurement, a person-fit statistic (PFS) is designed to identify aberrant response patterns. For parametric PFSs, valid inference depends on several assumptions, one of which is that the item response theory (IRT) model is correctly specified. Previous studies have used empirical data sets to explore the effects…

Descriptors: Educational Testing, Psychological Testing, Goodness of Fit, Error of Measurement

Logistic Regression Procedure Using Penalized Maximum Likelihood Estimation for Differential Item Functioning

Peer reviewed

Lee, Sunbok – Journal of Educational Measurement, 2020

In the logistic regression (LR) procedure for differential item functioning (DIF), the parameters of LR have often been estimated using maximum likelihood (ML) estimation. However, ML estimation suffers from finite-sample bias. Furthermore, ML estimation for LR can be substantially biased in the presence of rare event data. The bias of ML…

Descriptors: Regression (Statistics), Test Bias, Maximum Likelihood Statistics, Simulation

A New Statistic for Selecting the Smoothing Parameter for Polynomial Loglinear Equating under the Random Groups Design

Peer reviewed

Liu, Chunyan; Kolen, Michael J. – Journal of Educational Measurement, 2020

Smoothing is designed to yield smoother equating results that can reduce random equating error without introducing very much systematic error. The main objective of this study is to propose a new statistic and to compare its performance to the performance of the Akaike information criterion and likelihood ratio chi-square difference statistics in…

Descriptors: Equated Scores, Statistical Analysis, Error of Measurement, Criteria

Examining the Precision of Cut Scores within a Generalizability Theory Framework: A Closer Look at the Item Effect

Peer reviewed

Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020

An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…

Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting

Classification Consistency and Accuracy with Atypical Score Distributions

Peer reviewed

Kim, Stella Y.; Lee, Won-Chan – Journal of Educational Measurement, 2020

The current study aims to evaluate the performance of three non-IRT procedures (i.e., normal approximation, Livingston-Lewis, and compound multinomial) for estimating classification indices when the observed score distribution shows atypical patterns: (a) bimodality, (b) structural (i.e., systematic) bumpiness, or (c) structural zeros (i.e., no…

Descriptors: Classification, Accuracy, Scores, Cutting Scores

IRT Approaches to Modeling Scores on Mixed-Format Tests

Peer reviewed

Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020

This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…

Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests

Sensitivity of the RMSD for Detecting Item-Level Misfit in Low-Performing Countries

Peer reviewed

Tijmstra, Jesper; Bolsinova, Maria; Liaw, Yuan-Ling; Rutkowski, Leslie; Rutkowski, David – Journal of Educational Measurement, 2020

Although the root-mean squared deviation (RMSD) is a popular statistical measure for evaluating country-specific item-level misfit (i.e., differential item functioning [DIF]) in international large-scale assessment, this paper shows that its sensitivity to detect misfit may depend strongly on the proficiency distribution of the considered…

Descriptors: Test Items, Goodness of Fit, Probability, Accuracy

A New Statistic to Assess Fitness of Cubic-Spline Postsmoothing

Peer reviewed

Kim, Hyung Jin; Brennan, Robert L.; Lee, Won-Chan – Journal of Educational Measurement, 2020

In equating, smoothing techniques are frequently used to diminish sampling error. There are typically two types of smoothing: presmoothing and postsmoothing. For polynomial log-linear presmoothing, an optimum smoothing degree can be determined statistically based on the Akaike information criterion or Chi-square difference criterion. For…

Descriptors: Equated Scores, Sampling, Error of Measurement, Statistical Analysis

Examining Differential Rater Functioning Using a Between-Subgroup Outfit Approach

Peer reviewed

Wind, Stefanie A.; Sebok-Syer, Stefanie S. – Journal of Educational Measurement, 2019

When practitioners use modern measurement models to evaluate rating quality, they commonly examine rater fit statistics that summarize how well each rater's ratings fit the expectations of the measurement model. Essentially, this approach involves examining the unexpected ratings that each misfitting rater assigned (i.e., carrying out analyses of…

Descriptors: Measurement, Models, Evaluators, Simulation
