Publication Date
| Date Range | Count |
| --- | --- |
| In 2015 | 4 |
| Since 2014 | 20 |
| Since 2011 (last 5 years) | 79 |
| Since 2006 (last 10 years) | 177 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Foreign Countries | 73 |
| Test Items | 48 |
| Comparative Analysis | 40 |
| Scores | 39 |
| Item Response Theory | 38 |
| Test Bias | 35 |
| Measures (Individuals) | 30 |
| Psychometrics | 30 |
| Models | 28 |
| Evaluation Methods | 26 |
Source
| Source | Count |
| --- | --- |
| International Journal of… | 177 |
Author
| Author | Count |
| --- | --- |
| Bartram, Dave | 5 |
| Ercikan, Kadriye | 5 |
| Buckendahl, Chad W. | 4 |
| Oliveri, Maria Elena | 4 |
| Sijtsma, Klaas | 4 |
| Byrne, Barbara M. | 3 |
| Cui, Ying | 3 |
| Finney, Sara J. | 3 |
| Geisinger, Kurt F. | 3 |
| Sireci, Stephen G. | 3 |
Publication Type
| Publication Type | Count |
| --- | --- |
| Journal Articles | 177 |
| Reports - Research | 88 |
| Reports - Evaluative | 52 |
| Reports - Descriptive | 34 |
| Information Analyses | 2 |
| Opinion Papers | 2 |
| Guides - Non-Classroom | 1 |
| Tests/Questionnaires | 1 |
Education Level
| Education Level | Count |
| --- | --- |
| Higher Education | 36 |
| Postsecondary Education | 18 |
| Secondary Education | 14 |
| Elementary Secondary Education | 13 |
| Elementary Education | 12 |
| High Schools | 11 |
| Grade 4 | 7 |
| Grade 8 | 6 |
| Intermediate Grades | 6 |
| Grade 3 | 4 |
Showing 1 to 15 of 177 results
Wei, Hua; Lin, Jie – International Journal of Testing, 2015
Out-of-level testing refers to the practice of assessing a student with a test that is intended for students at a higher or lower grade level. Although the appropriateness of out-of-level testing for accountability purposes has been questioned by educators and policymakers, incorporating out-of-level items in formative assessments for accurate…
Descriptors: Test Items, Computer Assisted Testing, Adaptive Testing, Instructional Program Divisions
Cui, Ying; Mousavi, Amin – International Journal of Testing, 2015
The current study applied the person-fit statistic, l_z, to data from a Canadian provincial achievement test to explore the usefulness of conducting person-fit analysis on large-scale assessments. Item parameter estimates were compared before and after the misfitting student responses, as identified by l_z, were removed. The…
Descriptors: Measurement, Achievement Tests, Comparative Analysis, Test Items
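The l_z statistic named in this abstract is a standard person-fit index (Drasgow, Levine, & Williams, 1985): the log-likelihood of a response pattern, standardized by its model-implied mean and variance. A minimal sketch under a Rasch model (not the authors' code or data; the item difficulties below are illustrative) is:

```python
import math

def rasch_p(theta, b):
    """P(correct) under the Rasch model for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def lz_statistic(responses, theta, difficulties):
    """Standardized log-likelihood person-fit statistic l_z."""
    probs = [rasch_p(theta, b) for b in difficulties]
    # Observed log-likelihood of the scored (0/1) response pattern
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    # Expectation and variance of l0 under the model
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (l0 - expected) / math.sqrt(variance)

# A model-consistent pattern scores near zero or above; a reversed
# (aberrant) pattern yields a large negative l_z and flags misfit.
b = [-2.0, -1.0, 0.0, 1.0, 2.0]
lz_typical = lz_statistic([1, 1, 1, 0, 0], 0.0, b)
lz_aberrant = lz_statistic([0, 0, 1, 1, 1], 0.0, b)
```

Large negative values of l_z indicate response patterns unlikely under the fitted model, which is the screening criterion the study applies before re-estimating item parameters.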
Rindermann, Heiner; Baumeister, Antonia E. E. – International Journal of Testing, 2015
Scholastic tests regard cognitive abilities as domain-specific competences. However, high correlations between competences indicate either high task similarity or a dependence on common factors. The present rating study examined the validity of 12 Programme for International Student Assessment (PISA) and Third or Trends in International…
Descriptors: Test Validity, Test Interpretation, Competence, Reading Tests
Baghaei, Purya; Aryadoust, Vahid – International Journal of Testing, 2015
Research shows that test method can exert a significant impact on test takers' performance and thereby contaminate test scores. We argue that common test method can exert the same effect as common stimuli and violate the conditional independence assumption of item response theory models because, in general, subsets of items which have a…
Descriptors: Test Format, Item Response Theory, Models, Test Items
Almond, Russell G. – International Journal of Testing, 2014
Assessments consisting of only a few extended constructed response items (essays) are not typically equated using anchor test designs, as there are usually too few essay prompts in each form to allow for meaningful equating. This article explores the idea that output from an automated scoring program designed to measure writing fluency (a common…
Descriptors: Automation, Equated Scores, Writing Tests, Essay Tests
Sinharay, Sandip; Haberman, Shelby J. – International Journal of Testing, 2014
Recently there has been an increasing level of interest in subtest scores, or subscores, for their potential diagnostic value. Haberman (2008) suggested a method to determine if a subscore has added value over the total score. Researchers have often been interested in the performance of subgroups--for example, those based on gender or…
Descriptors: Scores, Achievement Tests, Language Tests, English (Second Language)
Jurich, Daniel P.; Bradshaw, Laine P. – International Journal of Testing, 2014
The assessment of higher-education student learning outcomes is an important component in understanding the strengths and weaknesses of academic and general education programs. This study illustrates the application of diagnostic classification models, a burgeoning set of statistical models, in assessing student learning outcomes. To facilitate…
Descriptors: College Outcomes Assessment, Classification, Statistical Analysis, Models
Oliveri, Maria Elena; von Davier, Matthias – International Journal of Testing, 2014
In this article, we investigate the creation of comparable score scales across countries in international assessments. We examine potential improvements to current score scale calibration procedures used in international large-scale assessments. Our approach seeks to improve fairness in scoring international large-scale assessments, which often…
Descriptors: Test Bias, Scores, International Programs, Educational Assessment
Oliveri, María Elena; Ercikan, Kadriye; Zumbo, Bruno D.; Lawless, René – International Journal of Testing, 2014
In this study, we contrast results from two differential item functioning (DIF) approaches (manifest and latent class) by the number of items and sources of items identified as DIF using data from an international reading assessment. The latter approach yielded three latent classes, presenting evidence of heterogeneity in examinee response…
Descriptors: Test Bias, Comparative Analysis, Reading Tests, Effect Size
Fischer, Sebastian; Freund, Philipp Alexander – International Journal of Testing, 2014
The Adaption-Innovation Inventory (AII), originally developed by Kirton (1976), is a widely used self-report instrument for measuring problem-solving styles at work. The present study investigates how scores on the AII are affected by different response styles. Data are collected from a combined sample (N = 738) of students, employees, and…
Descriptors: Measures (Individuals), Scores, Item Response Theory, Response Style (Tests)
Kan, Adnan; Bulut, Okan – International Journal of Testing, 2014
This study investigated whether the linguistic complexity of items leads to gender differential item functioning (DIF) on mathematics assessments. Two forms of a mathematics test were developed. The first form consisted of algebra items based on mathematical expressions, terms, and equations. In the second form, the same items were written as word…
Descriptors: Gender Differences, Test Bias, Difficulty Level, Test Items
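The abstract does not say which DIF detection method the authors used; one common manifest approach for comparing two groups on a dichotomous item is the Mantel-Haenszel procedure, sketched here with hypothetical counts:

```python
import math

def mantel_haenszel_dif(strata):
    """Mantel-Haenszel common odds ratio and ETS delta scale for one item.

    `strata` maps each matching total-score level to a 2x2 table of counts:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata.values():
        t = a + b + c + d
        num += a * d / t  # reference-correct with focal-incorrect
        den += b * c / t  # reference-incorrect with focal-correct
    alpha = num / den                # alpha > 1: item favors the reference group
    delta = -2.35 * math.log(alpha)  # ETS delta scale; negative favors reference
    return alpha, delta

# Hypothetical counts: at every matched score level the item is easier
# for the reference group, so alpha > 1 and delta < 0.
tables = {
    10: (40, 10, 30, 20),
    11: (40, 10, 30, 20),
}
alpha, delta = mantel_haenszel_dif(tables)
```

In practice, examinees are matched on total score before the 2x2 tables are tallied, so a nonzero DIF statistic reflects a group difference beyond overall ability, which is the kind of effect linguistic complexity in word problems could produce.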
Allalouf, Avi – International Journal of Testing, 2014
The Quality Control (QC) Guidelines are intended to increase the efficiency, precision, and accuracy of the scoring, analysis, and reporting process of testing. The QC Guidelines focus on large-scale testing operations where multiple forms of tests are created for use on set dates. However, they may also be used for a wide variety of other testing…
Descriptors: Quality Control, Scoring, Test Theory, Scores
Chu, Man-Wai; Babenko, Oksana; Cui, Ying; Leighton, Jacqueline P. – International Journal of Testing, 2014
The study examines the role that perceptions or impressions of learning environments and assessments play in students' performance on a large-scale standardized test. Hierarchical linear modeling (HLM) was used to test aspects of the Learning Errors and Formative Feedback model to determine how much variation in students' performance was…
Descriptors: Hierarchical Linear Modeling, Secondary School Students, Student Attitudes, Educational Environment
Quaiser-Pohl, Claudia; Neuburger, Sarah; Heil, Martin; Jansen, Petra; Schmelter, Andrea – International Journal of Testing, 2014
This article presents a reanalysis of the data of 862 second and fourth graders collected in two previous studies, focusing on the influence of method (psychometric vs. chronometric) and stimulus type on the gender difference in mental-rotation accuracy. The children had to solve mental-rotation tasks with animal pictures, letters, or cube…
Descriptors: Foreign Countries, Gender Differences, Accuracy, Age Differences
Byrne, Barbara M.; van de Vijver, Fons J. R. – International Journal of Testing, 2014
In cross-cultural research, there is a tendency for researchers to draw inferences at the country level based on individual-level data. Such action implicitly and often mistakenly assumes that both the measuring instrument and its underlying construct(s) are operating equivalently across both levels. Based on responses from 5,482 college students…
Descriptors: Factor Structure, Measures (Individuals), Cross Cultural Studies, Structural Equation Models
