Sachse, Karoline A.; Weirich, Sebastian; Mahler, Nicole; Rjosk, Camilla – International Journal of Testing, 2024
In order to ensure content validity by covering a broad range of content domains, the testing times of some educational large-scale assessments last up to a total of two hours or more. Performance decline over the course of taking the test has been extensively documented in the literature. It can occur due to increases in the numbers of: (a)…
Descriptors: Test Wiseness, Test Score Decline, Testing Problems, Foreign Countries
Magarotto Machado, Gisele; Hauck-Filho, Nelson; Lima-Costa, Ariela Raissa; Carvalho, Lucas de Francisco – International Journal of Testing, 2023
In the current study, we used latent profile analysis to investigate the Dimensional Clinical Personality Inventory 2 capacity to discriminate psychopathy traits in a sample of adults. Participants were 628 adults from the general population recruited by convenience. Our latent profile analysis recovered two groups: Psychopathic Tendencies and…
Descriptors: Adults, Psychopathology, Empathy, Comparative Analysis
Karakolidis, Anastasios; O'Leary, Michael; Scully, Darina – International Journal of Testing, 2021
The linguistic complexity of many text-based tests can be a source of construct-irrelevant variance, as test-takers' performance may be affected by factors that are beyond the focus of the assessment itself, such as reading comprehension skills. This experimental study examined the extent to which the use of animated videos, as opposed to written…
Descriptors: Animation, Vignettes, Video Technology, Test Format
Roschmann, Sarina; Witmer, Sara E.; Volker, Martin A. – International Journal of Testing, 2021
Accommodations are commonly provided to address language-related barriers students may experience during testing. Research on the validity of scores from accommodated test administrations remains somewhat inconclusive. The current study investigated item response patterns to understand whether accommodations, as used in practice among English…
Descriptors: Testing Accommodations, English Language Learners, Scores, Item Response Theory
Ramirez, Anely; Koljatic, Mladen; Silva, Monica – International Journal of Testing, 2020
The study addresses the association between coaching practices and university admission test performance in Chile. Estimates of coaching effects are reported for test-takers from the private and public school systems. Our results indicate that coaching is associated with variations in test scores. The estimated magnitude of coaching appears to…
Descriptors: Foreign Countries, College Entrance Examinations, Test Preparation, Coaching (Performance)
Walker, A. Adrienne; Wind, Stefanie A. – International Journal of Testing, 2020
Researchers apply individual person fit analyses as a procedure for checking model-data fit for individual test-takers. When a test-taker "misfits," it means that the inferences from their test score regarding what they know and can do may not be accurate. One problem in applying individual person fit procedures in practice is the…
Descriptors: Test Items, Scores, Achievement, Item Response Theory
Wise, Steven L.; Soland, James; Bo, Yuanchao – International Journal of Testing, 2020
Disengaged test taking tends to be most prevalent with low-stakes tests. This has led to questions about the validity of aggregated scores from large-scale international assessments such as PISA and TIMSS, as previous research has found a meaningful correlation between the mean engagement and mean performance of countries. The current study, using…
Descriptors: Foreign Countries, International Assessment, Achievement Tests, Secondary School Students
Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018
Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…
Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring
Adding Value to Second-Language Listening and Reading Subscores: Using a Score Augmentation Approach
Papageorgiou, Spiros; Choi, Ikkyu – International Journal of Testing, 2018
This study examined whether reporting subscores for groups of items within a test section assessing a second-language modality (specifically reading or listening comprehension) added value from a measurement perspective to the information already provided by the section scores. We analyzed the responses of 116,489 test takers to reading and…
Descriptors: Second Language Learning, Second Language Instruction, English (Second Language), Language Tests
Wu, Amery D.; Chen, Michelle Y.; Stone, Jake E. – International Journal of Testing, 2018
This article investigates how test-takers change their strategies to handle increased test difficulty. An adult sample reported their test-taking strategies immediately after completing the tasks in a reading test. Data were analyzed using structural equation modeling specifying a measurement-invariant, ability-moderated, latent transition…
Descriptors: Test Wiseness, Reading Tests, Reading Comprehension, Difficulty Level
Kajonius, Petri J.; Dåderman, Anna M. – International Journal of Testing, 2017
Previous research has long advocated that emotional and behavioral disorders are related to general personality traits, such as the Five Factor Model (FFM). The addition of section III in the latest "Diagnostic and Statistical Manual of Mental Disorders" (DSM) recommends that extremity in personality traits together with maladaptive…
Descriptors: Personality Problems, Empathy, Personality Traits, Scores
Moshinsky, Avital; Ziegler, David; Gafni, Naomi – International Journal of Testing, 2017
Many medical schools have adopted multiple mini-interviews (MMI) as an advanced selection tool. MMIs are expensive and used to test only a few dozen candidates per day, making it infeasible to develop a different test version for each test administration. Therefore, some items are reused both within and across years. This study investigated the…
Descriptors: Interviews, Medical Schools, Test Validity, Test Reliability
Kunina-Habenicht, Olga; Rupp, André A.; Wilhelm, Oliver – International Journal of Testing, 2017
Diagnostic classification models (DCMs) hold great potential for applications in summative and formative assessment by providing discrete multivariate proficiency scores that yield statistically driven classifications of students. Using data from a newly developed diagnostic arithmetic assessment that was administered to 2032 fourth-grade students…
Descriptors: Grade 4, Foreign Countries, Classification, Mathematics Tests
Rios, Joseph A.; Guo, Hongwen; Mao, Liyang; Liu, Ou Lydia – International Journal of Testing, 2017
When examinees' test-taking motivation is questionable, practitioners must determine whether careless responding is of practical concern and if so, decide on the best approach to filter such responses. As there has been insufficient research on these topics, the objectives of this study were to: a) evaluate the degree of underestimation in the…
Descriptors: Response Style (Tests), Scores, Motivation, Computation
Lee, Yi-Hsuan; Zhang, Jinming – International Journal of Testing, 2017
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Descriptors: Test Bias, Test Reliability, Performance, Scores