| Publication Date | Results |
|---|---|
| In 2015 | 4 |
| Since 2014 | 20 |
| Since 2011 (last 5 years) | 79 |
| Since 2006 (last 10 years) | 177 |
| Since 1996 (last 20 years) | 278 |
| Descriptor | Results |
|---|---|
| Foreign Countries | 86 |
| Test Items | 61 |
| Item Response Theory | 51 |
| Psychometrics | 50 |
| Comparative Analysis | 47 |
| Scores | 46 |
| Measures (Individuals) | 42 |
| Models | 41 |
| Test Bias | 38 |
| Evaluation Methods | 36 |
| Source | Results |
|---|---|
| International Journal of… | 278 |
| Author | Results |
|---|---|
| Bartram, Dave | 7 |
| Ercikan, Kadriye | 7 |
| Zumbo, Bruno D. | 7 |
| Byrne, Barbara M. | 5 |
| Oakland, Thomas | 5 |
| Sireci, Stephen G. | 5 |
| Buckendahl, Chad W. | 4 |
| Evers, Arne | 4 |
| Gregoire, Jacques | 4 |
| Hambleton, Ronald K. | 4 |
| Education Level | Results |
|---|---|
| Higher Education | 41 |
| Postsecondary Education | 18 |
| Elementary Secondary Education | 15 |
| Secondary Education | 14 |
| Elementary Education | 12 |
| High Schools | 11 |
| Grade 4 | 7 |
| Grade 8 | 6 |
| Intermediate Grades | 6 |
| Grade 3 | 4 |
| Audience | Results |
|---|---|
| Administrators | 1 |
| Counselors | 1 |
| Parents | 1 |
| Teachers | 1 |
Showing 46 to 60 of 278 results
Geisinger, Kurt F. – International Journal of Testing, 2012
This article sets the stage for the description of a variety of approaches to test reviewing worldwide. It describes the importance of test reviewing as a protection of the public and of society and also the benefits of this activity for test users, who must choose measures to use in particular situations with particular clients at a particular…
Descriptors: Test Reviews, Evaluation Methods, Evaluation Criteria, Global Approach
Evers, Arne – International Journal of Testing, 2012
In this article, the characteristics of five test review models are described. The five models are the US review system at the Buros Center for Testing, the German Test Review System of the Committee on Tests, the Brazilian System for the Evaluation of Psychological Tests, the European EFPA Review Model, and the Dutch COTAN Evaluation System for…
Descriptors: Program Evaluation, Test Reviews, Trend Analysis, International Education
Lindley, Patricia A.; Bartram, Dave – International Journal of Testing, 2012
In this article, we present the background to the development of test reviewing by the British Psychological Society (BPS) in the United Kingdom. We also describe the role played by the BPS in the development of the EFPA test review model and its adaptation for use in test reviewing in the United Kingdom. We conclude with a discussion of lessons…
Descriptors: Test Reviews, Professional Associations, Psychology, Global Approach
Kolen, Michael J.; Wang, Tianyou; Lee, Won-Chan – International Journal of Testing, 2012
Composite scores are often formed from test scores on educational achievement test batteries to provide a single index of achievement over two or more content areas or two or more item types on that test. Composite scores are subject to measurement error, and as with scores on individual tests, the amount of error variability typically depends on…
Descriptors: Mathematics Tests, Achievement Tests, College Entrance Examinations, Error of Measurement
Phelps, Richard P. – International Journal of Testing, 2012
This article summarizes research on the effect of testing on student achievement as found in English-language sources, comprising several hundred studies conducted between 1910 and 2010. Among quantitative studies, mean effect sizes range from a moderate d = 0.55 to a fairly large d = 0.88, depending on the way effects…
Descriptors: Feedback (Response), Testing, Academic Achievement, Effect Size
Dodeen, Hamzeh; Abdelfattah, Faisal; Shumrani, Saleh; Hilal, Maher Abu – International Journal of Testing, 2012
This study focused on comparing mathematics teachers' qualifications, practices, and perceptions between Saudi and Taiwanese schools. Data analyzed in this study were the responses of mathematics teachers to the Teacher Background Questionnaire--8th Grade from the Trends in International Mathematics and Science Study (TIMSS) in 2007. The Saudi…
Descriptors: Grade 8, Teacher Background, Mathematics Teachers, Educational Environment
Arce, Alvaro J.; Wang, Ze – International Journal of Testing, 2012
The traditional approach to scale modified-Angoff cut scores transfers the raw cuts to an existing raw-to-scale score conversion table. Under the traditional approach, cut scores and conversion table raw scores are not only seen as interchangeable but also as originating from a common scaling process. In this article, we propose an alternative…
Descriptors: Generalizability Theory, Item Response Theory, Cutting Scores, Scaling
Mucherah, Winnie; Finch, W. Holmes; Keaikitse, Setlhomo – International Journal of Testing, 2012
Understanding adolescent self-concept is of great concern for educators, mental health professionals, and parents, as research consistently demonstrates that low self-concept is related to a number of problem behaviors and poor outcomes. Thus, accurate measurements of self-concept are key, and the validity of such measurements, including the…
Descriptors: Test Bias, Mental Health Workers, Validity, Self Concept Measures
Oliveri, Maria Elena; Olson, Brent F.; Ercikan, Kadriye; Zumbo, Bruno D. – International Journal of Testing, 2012
In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item…
Descriptors: Foreign Students, Test Bias, Speech Communication, Effect Size
Duong, Minh Q.; von Davier, Alina A. – International Journal of Testing, 2012
Test equating is a statistical procedure for adjusting for test form differences in difficulty in a standardized assessment. Equating results are supposed to hold for a specified target population (Kolen & Brennan, 2004; von Davier, Holland, & Thayer, 2004) and to be (relatively) independent of the subpopulations from the target population (see…
Descriptors: Ability Grouping, Difficulty Level, Psychometrics, Statistical Analysis
Gattamorta, Karina A.; Penfield, Randall D.; Myers, Nicholas D. – International Journal of Testing, 2012
Measurement invariance is a common consideration in the evaluation of the validity and fairness of test scores when the tested population contains distinct groups of examinees, such as examinees receiving different forms of a translated test. Measurement invariance in polytomous items has traditionally been evaluated at the item-level,…
Descriptors: Foreign Countries, Psychometrics, Test Bias, Test Items
Gierl, Mark J.; Lai, Hollis – International Journal of Testing, 2012
Automatic item generation represents a relatively new but rapidly evolving research area where cognitive and psychometric theories are used to produce tests that include items generated using computer technology. Automatic item generation requires two steps. First, test development specialists create item models, which are comparable to templates…
Descriptors: Foreign Countries, Psychometrics, Test Construction, Test Items
Kim, Sooyeon; Walker, Michael E.; Larkin, Kevin – International Journal of Testing, 2012
We demonstrate how to assess the potential changes to a test's score scale necessitated by changes to the test specifications when a field study is not feasible. We used a licensure test, which is currently under revision, as an example. We created two research forms from an actual form of the test. One research form was developed with the current…
Descriptors: Equated Scores, Licensing Examinations (Professions), Test Reliability, Construct Validity
Wang, Ning; Stahl, John – International Journal of Testing, 2012
This article discusses the use of the Many-Facets Rasch Model, via the FACETS computer program (Linacre, 2006a), to scale job/practice analysis survey data as well as to combine multiple rating scales into single composite weights representing the tasks' relative importance. Results from the Many-Facets Rasch Model are compared with those…
Descriptors: Job Analysis, Surveys, Rating Scales, Scaling
Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012
Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…
Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement
