NotesFAQContact Us
Collection
Advanced
Search Tips
Source
Journal of Educational…1306
What Works Clearinghouse Rating
Showing 1 to 15 of 1,306 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Choe, Edison M.; Han, Kyung T. – Journal of Educational Measurement, 2022
In operational testing, item response theory (IRT) models for dichotomous responses are popular for measuring a single latent construct [theta], such as cognitive ability in a content domain. Estimates of [theta], also called IRT scores or [theta hat], can be computed using estimators based on the likelihood function, such as maximum likelihood…
Descriptors: Scores, Item Response Theory, Test Items, Test Format
Peer reviewed Peer reviewed
Direct linkDirect link
Gorney, Kylie; Wollack, James A. – Journal of Educational Measurement, 2022
Detection methods for item preknowledge are often evaluated in simulation studies where models are used to generate the data. To ensure the reliability of such methods, it is crucial that these models are able to accurately represent situations that are encountered in practice. The purpose of this article is to provide a critical analysis of…
Descriptors: Prior Learning, Simulation, Models, Reaction Time
Peer reviewed Peer reviewed
Direct linkDirect link
Lim, Hwanggyu; Choe, Edison M.; Han, Kyung T. – Journal of Educational Measurement, 2022
Differential item functioning (DIF) of test items should be evaluated using practical methods that can produce accurate and useful results. Among a plethora of DIF detection techniques, we introduce the new "Residual DIF" (RDIF) framework, which stands out for its accessibility without sacrificing efficacy. This framework consists of…
Descriptors: Test Items, Item Response Theory, Identification, Robustness (Statistics)
Peer reviewed Peer reviewed
Direct linkDirect link
Jin, Kuan-Yu; Siu, Wai-Lok; Huang, Xiaoting – Journal of Educational Measurement, 2022
Multiple-choice (MC) items are widely used in educational tests. Distractor analysis, an important procedure for checking the utility of response options within an MC item, can be readily implemented in the framework of item response theory (IRT). Although random guessing is a popular behavior of test-takers when answering MC items, none of the…
Descriptors: Guessing (Tests), Multiple Choice Tests, Item Response Theory, Attention
Peer reviewed Peer reviewed
Direct linkDirect link
Li, Dongmei – Journal of Educational Measurement, 2022
Equating error is usually small relative to the magnitude of measurement error, but it could be one of the major sources of error contributing to mean scores of large groups in educational measurement, such as the year-to-year state mean score fluctuations. Though testing programs may routinely calculate the standard error of equating (SEE), the…
Descriptors: Error Patterns, Educational Testing, Group Testing, Statistical Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Kim, Hyung Jin; Lee, Won-Chan – Journal of Educational Measurement, 2022
Orlando and Thissen (2000) introduced the "S - X[superscript 2]" item-fit index for testing goodness-of-fit with dichotomous item response theory (IRT) models. This study considers and evaluates an alternative approach for computing "S - X[superscript 2]" values and other factors associated with collapsing tables of observed…
Descriptors: Goodness of Fit, Test Items, Item Response Theory, Computation
Peer reviewed Peer reviewed
Direct linkDirect link
Liu, Jinghua; Becker, Kirk – Journal of Educational Measurement, 2022
For any testing programs that administer multiple forms across multiple years, maintaining score comparability via equating is essential. With continuous testing and high-stakes results, especially with less secure online administrations, testing programs must consider the potential for cheating on their exams. This study used empirical and…
Descriptors: Cheating, Item Response Theory, Scores, High Stakes Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022
While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…
Descriptors: Scoring, Testing, Test Items, Test Format
Peer reviewed Peer reviewed
Direct linkDirect link
Jones, Paul; Tong, Ye; Liu, Jinghua; Borglum, Joshua; Primoli, Vince – Journal of Educational Measurement, 2022
This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two TC cohorts (TC1 and TC2) and one OP cohort (OP1) matched on their pool-based scale score distributions. The…
Descriptors: Scores, Credentials, Licensing Examinations (Professions), Computer Assisted Testing
Peer reviewed Peer reviewed
Direct linkDirect link
Puhan, Gautam; Kim, Sooyeon – Journal of Educational Measurement, 2022
As a result of the COVID-19 pandemic, at-home testing has become a popular delivery mode in many testing programs. When programs offer at-home testing to expand their service, the score comparability between test takers testing remotely and those testing in a test center is critical. This article summarizes statistical procedures that could be…
Descriptors: Scores, Scoring, Comparative Analysis, Testing
Peer reviewed Peer reviewed
Direct linkDirect link
Moses, Tim – Journal of Educational Measurement, 2022
One result of recent changes in testing is that previously established linking frameworks may not adequately address challenges in current linking situations. Test linking through equating, concordance, vertical scaling or battery scaling may not represent linkings for the scores of tests developed to measure constructs differently for different…
Descriptors: Measures (Individuals), Educational Assessment, Test Construction, Comparative Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Wyse, Adam E.; McBride, James R. – Journal of Educational Measurement, 2021
A key consideration when giving any computerized adaptive test (CAT) is how much adaptation is present when the test is used in practice. This study introduces a new framework to measure the amount of adaptation of Rasch-based CATs based on looking at the differences between the selected item locations (Rasch item difficulty parameters) of the…
Descriptors: Item Response Theory, Computer Assisted Testing, Adaptive Testing, Test Items
Peer reviewed Peer reviewed
Direct linkDirect link
Henninger, Mirka – Journal of Educational Measurement, 2021
Item Response Theory models with varying thresholds are essential tools to account for unknown types of response tendencies in rating data. However, in order to separate constructs to be measured and response tendencies, specific constraints have to be imposed on varying thresholds and their interrelations. In this article, a multidimensional…
Descriptors: Response Style (Tests), Item Response Theory, Models, Computation
Peer reviewed Peer reviewed
Direct linkDirect link
Lee, Yi-Hsuan; Haberman, Shelby J. – Journal of Educational Measurement, 2021
For assessments that use different forms in different administrations, equating methods are applied to ensure comparability of scores over time. Ideally, a score scale is well maintained throughout the life of a testing program. In reality, instability of a score scale can result from a variety of causes, some are expected while others may be…
Descriptors: Scores, Regression (Statistics), Demography, Data
Peer reviewed Peer reviewed
Direct linkDirect link
Wise, Steven L.; Kuhfeld, Megan R. – Journal of Educational Measurement, 2021
There has been a growing research interest in the identification and management of disengaged test taking, which poses a validity threat that is particularly prevalent with low-stakes tests. This study investigated effort-moderated (E-M) scoring, in which item responses classified as rapid guesses are identified and excluded from scoring. Using…
Descriptors: Scoring, Data Use, Response Style (Tests), Guessing (Tests)
Previous Page | Next Page ยป
Pages: 1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |  10  |  11  |  ...  |  88