Publication Date
| Period | Results |
| --- | --- |
| In 2015 | 5 |
| Since 2014 | 25 |
| Since 2011 (last 5 years) | 71 |
| Since 2006 (last 10 years) | 170 |
| Since 1996 (last 20 years) | 359 |
Source
| Journal | Results |
| --- | --- |
| Applied Measurement in… | 520 |
Author
| Author | Results |
| --- | --- |
| Hambleton, Ronald K. | 15 |
| Plake, Barbara S. | 9 |
| Shavelson, Richard J. | 9 |
| Sireci, Stephen G. | 9 |
| Ercikan, Kadriye | 8 |
| Engelhard, George, Jr. | 7 |
| Feldt, Leonard S. | 7 |
| Linn, Robert L. | 7 |
| Pomplun, Mark | 7 |
| Wise, Steven L. | 7 |
Education Level
| Level | Results |
| --- | --- |
| Elementary Secondary Education | 30 |
| Grade 8 | 21 |
| High Schools | 21 |
| Higher Education | 21 |
| Secondary Education | 19 |
| Elementary Education | 17 |
| Grade 5 | 16 |
| Middle Schools | 14 |
| Grade 4 | 13 |
| Grade 3 | 12 |
Audience
| Audience | Results |
| --- | --- |
| Researchers | 3 |
| Teachers | 2 |
| Administrators | 1 |
Showing 61 to 75 of 520 results
Rogers, W. Todd; Lin, Jie; Rinaldi, Christia M. – Applied Measurement in Education, 2011
The evidence gathered in the present study supports the use of the simultaneous development of test items for different languages. The simultaneous approach used in the present study involved writing an item in one language (e.g., French) and, before moving to the development of a second item, translating the item into the second language (e.g.,…
Descriptors: Test Items, Item Analysis, Achievement Tests, French
Brennan, Robert L. – Applied Measurement in Education, 2011
Broadly conceived, reliability involves quantifying the consistencies and inconsistencies in observed scores. Generalizability theory, or G theory, is particularly well suited to addressing such matters in that it enables an investigator to quantify and distinguish the sources of inconsistencies in observed scores that arise, or could arise, over…
Descriptors: Generalizability Theory, Test Theory, Test Reliability, Item Response Theory
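Brennan's abstract describes G theory as quantifying the sources of inconsistency in observed scores. As a rough illustration only (not the article's treatment), the sketch below runs a minimal one-facet persons-by-items G study: variance components are estimated from the two-way ANOVA mean squares, and a generalizability coefficient is computed for relative decisions. The function name `g_study` and the toy data are assumptions for illustration.

```python
import numpy as np

def g_study(scores):
    """One-facet crossed p x i G study (persons in rows, items in columns).

    Estimates variance components for persons, items, and the
    residual (person x item interaction confounded with error)
    from expected mean squares."""
    n_p, n_i = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    item_means = scores.mean(axis=0)

    # Sums of squares for the two-way crossed design
    ss_p = n_i * ((person_means - grand) ** 2).sum()
    ss_i = n_p * ((item_means - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    var_res = ms_res                        # interaction + error
    var_p = max((ms_p - ms_res) / n_i, 0.0)  # universe-score variance
    var_i = max((ms_i - ms_res) / n_p, 0.0)  # item-difficulty variance

    # Generalizability coefficient for relative decisions over n_i items
    g_coef = var_p / (var_p + var_res / n_i)
    return var_p, var_i, var_res, g_coef
```

Negative variance estimates are truncated at zero, a common (if debated) convention in G-study software.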
Oliveri, Maria E.; Ercikan, Kadriye – Applied Measurement in Education, 2011
In this study, we examine the degree of construct comparability and possible sources of incomparability of the English and French versions of the Programme for International Student Assessment (PISA) 2003 problem-solving measure administered in Canada. Several approaches were used to examine construct comparability at the test- (examination of…
Descriptors: Foreign Countries, English, French, Tests
Kim, Sooyeon; Livingston, Samuel A.; Lewis, Charles – Applied Measurement in Education, 2011
This article describes a preliminary investigation of an empirical Bayes (EB) procedure for using collateral information to improve equating of scores on test forms taken by small numbers of examinees. Resampling studies were done on two different forms of the same test. In each study, EB and non-EB versions of two equating methods--chained linear…
Descriptors: Sample Size, Equated Scores, Bayesian Statistics, Accuracy
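The Kim, Livingston, and Lewis abstract compares empirical Bayes and non-EB versions of chained linear equating. Their EB procedure is not reproduced here; as background only, the sketch below shows the basic linear equating transformation that chained linear equating composes through an anchor test: a score is mapped onto the new form's scale by matching means and standard deviations. The function name `linear_equate` is an assumption for illustration.

```python
import statistics

def linear_equate(x, scores_x, scores_y):
    """Linear equating of a form-X score onto the form-Y scale:
    l(x) = mu_Y + (sd_Y / sd_X) * (x - mu_X)."""
    mu_x, mu_y = statistics.fmean(scores_x), statistics.fmean(scores_y)
    sd_x, sd_y = statistics.pstdev(scores_x), statistics.pstdev(scores_y)
    return mu_y + (sd_y / sd_x) * (x - mu_x)
```

With small samples, the means and standard deviations above are estimated noisily, which is exactly the problem the article's collateral-information approach targets.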
Van Nijlen, Daniel; Janssen, Rianne – Applied Measurement in Education, 2011
The distinction between quantitative and qualitative differences in mastery is essential when monitoring student progress and is crucial for instructional interventions to deal with learning difficulties. Mixture item response theory (IRT) models can provide a convenient way to make the distinction between quantitative and qualitative differences…
Descriptors: Spelling, Indo European Languages, Vowels, Verbal Tests
Finch, Holmes – Applied Measurement in Education, 2011
Methods of uniform differential item functioning (DIF) detection have been extensively studied in the complete data case. However, less work has been done examining the performance of these methods when missing item responses are present. Research that has been done in this regard appears to indicate that treating missing item responses as…
Descriptors: Test Bias, Data Analysis, Error of Measurement
Leighton, Jacqueline P.; Heffernan, Colleen; Cor, M. Kenneth; Gokiert, Rebecca J.; Cui, Ying – Applied Measurement in Education, 2011
The "Standards for Educational and Psychological Testing" indicate that test instructions, and by extension item objectives, presented to examinees should be sufficiently clear and detailed to help ensure that they respond as developers intend them to respond (Standard 3.20; AERA, APA, & NCME, 1999). The present study investigates the use of…
Descriptors: Test Construction, Validity, Evidence, Science Tests
DeMars, Christine E. – Applied Measurement in Education, 2011
Three types of effect sizes for DIF are described in this exposition: log of the odds-ratio (differences in log-odds), differences in probability-correct, and proportion of variance accounted for. Using these indices involves conceptualizing the degree of DIF in different ways. This integrative review discusses how these measures are impacted in…
Descriptors: Effect Size, Test Bias, Probability, Difficulty Level
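Two of the three DIF effect sizes named in the abstract above, the log odds-ratio and the difference in probability-correct, can be computed directly from the proportions correct in each group. The sketch below is a minimal illustration of those two definitions, not the review's methodology; the function name `dif_effect_sizes` is an assumption.

```python
import math

def dif_effect_sizes(p_ref, p_foc):
    """Two simple DIF effect sizes for one item.

    p_ref, p_foc: proportion correct in the reference and focal groups.
    Returns (difference in probability-correct, log odds-ratio)."""
    delta_p = p_ref - p_foc
    log_or = math.log(p_ref / (1 - p_ref)) - math.log(p_foc / (1 - p_foc))
    return delta_p, log_or
```

The two indices can rank items differently: a fixed log-odds difference translates into a large probability difference near p = 0.5 but a small one at the extremes, which is one reason the review treats them as distinct conceptualizations of DIF.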
Kettler, Ryan J.; Rodriguez, Michael C.; Bolt, Daniel M.; Elliott, Stephen N.; Beddow, Peter A.; Kurz, Alexander – Applied Measurement in Education, 2011
Federal policy on alternate assessment based on modified academic achievement standards (AA-MAS) inspired this research. Specifically, an experimental study was conducted to determine whether tests composed of modified items would have the same level of reliability as tests composed of original items, and whether these modified items helped reduce…
Descriptors: Multiple Choice Tests, Test Items, Alternative Assessment, Test Reliability
Woods, Carol M. – Applied Measurement in Education, 2011
This research introduces, illustrates, and tests a variation of IRT-LR-DIF, called EH-DIF-2, in which the latent density for each group is estimated simultaneously with the item parameters as an empirical histogram (EH). IRT-LR-DIF is used to evaluate the degree to which items have different measurement properties for one group of people versus…
Descriptors: Test Bias, Item Response Theory, Test Items, Measurement
Liu, Ou Lydia – Applied Measurement in Education, 2011
The TOEFL® iBT has increased the length of each reading passage to better approximate academic reading at North American universities, resulting in a reduction in the number of passages on the reading section of the test. One of the concerns brought about by this change is whether the decrease in topic variety increases the likelihood that an…
Descriptors: Language Tests, Reading Tests, English (Second Language), Test Bias
Kim, HeeKyoung; Kolen, Michael J. – Applied Measurement in Education, 2010
Test equating might be affected by including in the equating analyses examinees who have taken the test previously. This study evaluated the effect of including such repeaters on Medical College Admission Test (MCAT) equating using a population invariance approach. Three-parameter logistic (3-PL) item response theory (IRT) true score and…
Descriptors: Repetition, Equated Scores, College Entrance Examinations, Medical Schools
Kane, Michael T.; Mroch, Andrew A. – Applied Measurement in Education, 2010
In evaluating the relationship between two measures across different groups (i.e., in evaluating "differential validity") it is necessary to examine differences in correlation coefficients and in regression lines. Ordinary least squares (OLS) regression is the standard method for fitting lines to data, but its criterion for optimal fit (minimizing…
Descriptors: Least Squares Statistics, Regression (Statistics), Differences, Validity
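The Kane and Mroch abstract turns on the OLS criterion: the fitted line minimizes the sum of squared vertical residuals, which, as they note, has consequences when comparing regression lines across groups. As a reference point only, the sketch below is the standard closed-form OLS fit for one predictor; the function name `ols_fit` is an assumption.

```python
def ols_fit(xs, ys):
    """Ordinary least squares line y = a + b*x.

    Chooses intercept a and slope b to minimize
    sum((y - (a + b*x))**2), using the closed form
    b = S_xy / S_xx, a = mean(y) - b * mean(x)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b
```

Because only vertical (y-direction) errors are penalized, the fitted slope depends on which variable is treated as the criterion, a point that matters in differential-validity comparisons.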
Puhan, Gautam; Sinharay, Sandip; Haberman, Shelby; Larkin, Kevin – Applied Measurement in Education, 2010
Do subscores provide additional information beyond what is provided by the total score? Is there a method that can estimate more trustworthy subscores than observed subscores? To answer the first question, this study evaluated whether the true subscore was more accurately predicted by the observed subscore or total score. To answer the second…
Descriptors: Licensing Examinations (Professions), Scores, Computation, Methods
Randall, Jennifer; Engelhard, George, Jr. – Applied Measurement in Education, 2010
The psychometric properties and multigroup measurement invariance of scores across subgroups, items, and persons on the "Reading for Meaning" items from the Georgia Criterion Referenced Competency Test (CRCT) were assessed in a sample of 778 seventh-grade students. Specifically, we sought to determine the extent to which score-based inferences on…
Descriptors: Testing Accommodations, Test Items, Learning Disabilities, Factor Analysis