Publication Date
| In 2015 | 1 |
| Since 2014 | 6 |
| Since 2011 (last 5 years) | 19 |
| Since 2006 (last 10 years) | 40 |
| Since 1996 (last 20 years) | 47 |
Descriptor
| Comparative Analysis | 47 |
| Foreign Countries | 18 |
| Scores | 16 |
| Test Bias | 13 |
| Item Response Theory | 12 |
| Evaluation Methods | 11 |
| Psychometrics | 11 |
| Test Items | 11 |
| Testing | 11 |
| Measurement | 9 |
Source
| International Journal of… | 47 |
Author
| Ercikan, Kadriye | 4 |
| Oliveri, Maria Elena | 3 |
| Zumbo, Bruno D. | 3 |
| Byrne, Barbara M. | 2 |
| Cascallar, Alicia S. | 2 |
| Dorans, Neil J. | 2 |
| Wang, Ning | 2 |
| Abdelfattah, Faisal | 1 |
| Andersson, Gerhard | 1 |
| Austin, David W. | 1 |
Publication Type
| Journal Articles | 47 |
| Reports - Research | 22 |
| Reports - Evaluative | 16 |
| Reports - Descriptive | 7 |
| Information Analyses | 2 |
Education Level
| Higher Education | 6 |
| Elementary Education | 4 |
| Grade 4 | 4 |
| Secondary Education | 4 |
| High Schools | 3 |
| Adult Education | 2 |
| Grade 8 | 2 |
| Intermediate Grades | 2 |
| Elementary Secondary Education | 1 |
| Grade 2 | 1 |
Showing 1 to 15 of 47 results
Cui, Ying; Mousavi, Amin – International Journal of Testing, 2015
The current study applied the person-fit statistic, l_z, to data from a Canadian provincial achievement test to explore the usefulness of conducting person-fit analysis on large-scale assessments. Item parameter estimates were compared before and after the misfitting student responses, as identified by l_z, were removed. The…
Descriptors: Measurement, Achievement Tests, Comparative Analysis, Test Items
Sinharay, Sandip; Haberman, Shelby J. – International Journal of Testing, 2014
Recently there has been an increasing level of interest in subtest scores, or subscores, for their potential diagnostic value. Haberman (2008) suggested a method to determine if a subscore has added value over the total score. Researchers have often been interested in the performance of subgroups--for example, those based on gender or…
Descriptors: Scores, Achievement Tests, Language Tests, English (Second Language)
Oliveri, Maria Elena; von Davier, Matthias – International Journal of Testing, 2014
In this article, we investigate the creation of comparable score scales across countries in international assessments. We examine potential improvements to current score scale calibration procedures used in international large-scale assessments. Our approach seeks to improve fairness in scoring international large-scale assessments, which often…
Descriptors: Test Bias, Scores, International Programs, Educational Assessment
Oliveri, María Elena; Ercikan, Kadriye; Zumbo, Bruno D.; Lawless, René – International Journal of Testing, 2014
In this study, we contrast results from two differential item functioning (DIF) approaches (manifest and latent class) by the number of items and sources of items identified as DIF using data from an international reading assessment. The latter approach yielded three latent classes, presenting evidence of heterogeneity in examinee response…
Descriptors: Test Bias, Comparative Analysis, Reading Tests, Effect Size
Quaiser-Pohl, Claudia; Neuburger, Sarah; Heil, Martin; Jansen, Petra; Schmelter, Andrea – International Journal of Testing, 2014
This article presents a reanalysis of the data of 862 second and fourth graders collected in two previous studies, focusing on the influence of method (psychometric vs. chronometric) and stimulus type on the gender difference in mental-rotation accuracy. The children had to solve mental-rotation tasks with animal pictures, letters, or cube…
Descriptors: Foreign Countries, Gender Differences, Accuracy, Age Differences
Rios, Joseph A.; Sireci, Stephen G. – International Journal of Testing, 2014
The International Test Commission's "Guidelines for Translating and Adapting Tests" (2010) provide important guidance on developing and evaluating tests for use across languages. These guidelines are widely applauded, but the degree to which they are followed in practice is unknown. The objective of this study was to perform a…
Descriptors: Guidelines, Translation, Adaptive Testing, Second Languages
Fine, Saul – International Journal of Testing, 2013
While psychological tests are used extensively in Israel, the current controls over testing practices in Israel deserve some attention. Specifically, unlike in some European countries and the United States, (a) no specific certifications are offered to Israeli psychologists in the area of testing; (b) Israeli psychologists are not obligated to…
Descriptors: Foreign Countries, Psychological Testing, Psychologists, Attitudes
Talento-Miller, Eileen; Guo, Fanmin; Han, Kyung T. – International Journal of Testing, 2013
When power tests include a time limit, it is important to assess the possibility of speededness for examinees. Past research on differential speededness has examined gender and ethnic subgroups in the United States on paper and pencil tests. When considering the needs of a global audience, research regarding different native language speakers is…
Descriptors: Adaptive Testing, Computer Assisted Testing, English, Scores
Makransky, Guido; Glas, Cees A. W. – International Journal of Testing, 2013
Cognitive ability tests are widely used in organizations around the world because they have high predictive validity in selection contexts. Although these tests typically measure several subdomains, testing is usually carried out for a single subdomain at a time. This can be ineffective when the subdomains assessed are highly correlated. This…
Descriptors: Foreign Countries, Cognitive Ability, Adaptive Testing, Feedback (Response)
Sandilands, Debra; Oliveri, Maria Elena; Zumbo, Bruno D.; Ercikan, Kadriye – International Journal of Testing, 2013
International large-scale assessments of achievement often have a large degree of differential item functioning (DIF) between countries, which can threaten score equivalence and reduce the validity of inferences based on comparisons of group performances. It is important to understand potential sources of DIF to improve the validity of future…
Descriptors: Validity, Measures (Individuals), International Studies, Foreign Countries
Evers, Arne – International Journal of Testing, 2012
In this article, the characteristics of five test review models are described. The five models are the US review system at the Buros Center for Testing, the German Test Review System of the Committee on Tests, the Brazilian System for the Evaluation of Psychological Tests, the European EFPA Review Model, and the Dutch COTAN Evaluation System for…
Descriptors: Program Evaluation, Test Reviews, Trend Analysis, International Education
Dodeen, Hamzeh; Abdelfattah, Faisal; Shumrani, Saleh; Hilal, Maher Abu – International Journal of Testing, 2012
This study focused on comparing mathematics teachers' qualifications, practices, and perceptions between Saudi and Taiwanese schools. Data analyzed in this study were the responses of mathematics teachers to the Teacher Background Questionnaire--8th Grade from the Trends in International Mathematics and Science Study (TIMSS) in 2007. The Saudi…
Descriptors: Grade 8, Teacher Background, Mathematics Teachers, Educational Environment
Mucherah, Winnie; Finch, W. Holmes; Keaikitse, Setlhomo – International Journal of Testing, 2012
Understanding adolescent self-concept is of great concern for educators, mental health professionals, and parents, as research consistently demonstrates that low self-concept is related to a number of problem behaviors and poor outcomes. Thus, accurate measurements of self-concept are key, and the validity of such measurements, including the…
Descriptors: Test Bias, Mental Health Workers, Validity, Self Concept Measures
Oliveri, Maria Elena; Olson, Brent F.; Ercikan, Kadriye; Zumbo, Bruno D. – International Journal of Testing, 2012
In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item…
Descriptors: Foreign Students, Test Bias, Speech Communication, Effect Size
Wang, Ning; Stahl, John – International Journal of Testing, 2012
This article discusses the use of the Many-Facets Rasch Model, via the FACETS computer program (Linacre, 2006a), to scale job/practice analysis survey data as well as to combine multiple rating scales into single composite weights representing the tasks' relative importance. Results from the Many-Facets Rasch Model are compared with those…
Descriptors: Job Analysis, Surveys, Rating Scales, Scaling

