Publication Date

| Value | Count |
| --- | --- |
| In 2015 | 5 |
| Since 2014 | 21 |
| Since 2011 (last 5 years) | 54 |
| Since 2006 (last 10 years) | 106 |
| Since 1996 (last 20 years) | 194 |
Descriptor

| Value | Count |
| --- | --- |
| Test Items | 96 |
| Scores | 51 |
| Mathematics Tests | 49 |
| Item Response Theory | 48 |
| Comparative Analysis | 41 |
| Test Construction | 40 |
| High School Students | 35 |
| Higher Education | 34 |
| Difficulty Level | 30 |
| Elementary School Students | 29 |
Source

| Value | Count |
| --- | --- |
| Applied Measurement in… | 273 |
Publication Type

| Value | Count |
| --- | --- |
| Journal Articles | 273 |
| Reports - Research | 273 |
| Speeches/Meeting Papers | 16 |
| Reports - Evaluative | 4 |
| Information Analyses | 2 |
| Collected Works - General | 1 |
Education Level

| Value | Count |
| --- | --- |
| Elementary Secondary Education | 20 |
| Grade 8 | 15 |
| High Schools | 15 |
| Secondary Education | 13 |
| Elementary Education | 12 |
| Grade 5 | 12 |
| Higher Education | 11 |
| Middle Schools | 10 |
| Grade 3 | 9 |
| Grade 4 | 8 |
Audience

| Value | Count |
| --- | --- |
| Researchers | 2 |
| Teachers | 2 |
| Administrators | 1 |
Showing 1 to 15 of 273 results
Phillips, Gary W. – Applied Measurement in Education, 2015
This article proposes that sampling design effects have potentially large, unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for, they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…
Descriptors: State Programs, Sampling, Research Design, Error of Measurement
McClintock, Joseph Clair – Applied Measurement in Education, 2015
Erasure analysis is the study of the pattern or quantity of erasures on multiple-choice paper-and-pencil examinations, undertaken to determine whether erasures were made after testing for the purpose of unfairly increasing students' scores. This study examined the erasure data from over 1.4 million exams taken by more than 600,000 students. Three…
Descriptors: Multiple Choice Tests, Cheating, Methods, Computation
DeMars, Christine – Applied Measurement in Education, 2015
In generalizability theory studies in large-scale testing contexts, sometimes a facet is very sparsely crossed with the object of measurement. For example, when assessments are scored by human raters, it may not be practical to have every rater score all students. Sometimes the scoring is systematically designed such that the raters are…
Descriptors: Educational Assessment, Measurement, Data, Generalizability Theory
Suh, Youngsuk; Talley, Anna E. – Applied Measurement in Education, 2015
This study compared and illustrated four differential distractor functioning (DDF) detection methods for analyzing multiple-choice items. The log-linear approach, two item response theory-model-based approaches with likelihood ratio tests, and the odds ratio approach were compared to examine the congruence among the four DDF detection methods.…
Descriptors: Test Bias, Multiple Choice Tests, Test Items, Methods
Nijlen, Daniel Van; Janssen, Rianne – Applied Measurement in Education, 2015
This study investigates the extent to which contextualized and non-contextualized mathematics test items have a differential impact on examinee effort. Mixture item response theory (IRT) models are applied to two subsets of items from a national assessment of mathematics in the second grade of the pre-vocational track in secondary education in…
Descriptors: Mathematics Tests, Measurement, Item Response Theory, Test Items
Antal, Judit; Proctor, Thomas P.; Melican, Gerald J. – Applied Measurement in Education, 2014
In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
Descriptors: Test Items, Equated Scores, Difficulty Level, Item Response Theory
Roduta Roberts, Mary; Alves, Cecilia B.; Chu, Man-Wai; Thompson, Margaret; Bahry, Louise M.; Gotzmann, Andrea – Applied Measurement in Education, 2014
The purpose of this study was to evaluate the adequacy of three cognitive models, one developed by content experts and two generated from student verbal reports for explaining examinee performance on a grade 3 diagnostic mathematics test. For this study, the items were developed to directly measure the attributes in the cognitive model. The…
Descriptors: Foreign Countries, Mathematics Tests, Cognitive Processes, Models
Wells, Craig S.; Hambleton, Ronald K.; Kirkpatrick, Robert; Meng, Yu – Applied Measurement in Education, 2014
The purpose of the present study was to develop and evaluate two procedures for flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure was based on flagging items that exhibit a meaningful magnitude of IPD, using a critical value defined to represent barely tolerable IPD. The second procedure…
Descriptors: Test Items, Test Bias, Equated Scores, Item Response Theory
Taherbhai, Husein; Seo, Daeryong; O'Malley, Kimberly – Applied Measurement in Education, 2014
English language learners (ELLs) are the fastest growing subgroup in American schools. These students, by a provision in the reauthorization of the Elementary and Secondary Education Act, are to be supported in their quest for language proficiency through the creation of systems that more effectively measure ELLs' progress across years. In…
Descriptors: English Language Learners, Second Language Learning, Achievement Gains, Language Proficiency
Noble, Tracy; Rosebery, Ann; Suarez, Catherine; Warren, Beth; O'Connor, Mary Catherine – Applied Measurement in Education, 2014
English language learners (ELLs) and their teachers, schools, and communities face increasingly high-stakes consequences due to test score gaps between ELLs and non-ELLs. It is essential that the field of educational assessment continue to investigate the meaning of these test score gaps. This article discusses the findings of an exploratory study…
Descriptors: English Language Learners, Evidence, Educational Assessment, Achievement Gap
Ercikan, Kadriye; Roth, Wolff-Michael; Simon, Marielle; Sandilands, Debra; Lyons-Thomas, Juliette – Applied Measurement in Education, 2014
Diversity and heterogeneity among language groups have been well documented. Yet most fairness research that focuses on measurement comparability considers linguistic minority students such as English language learners (ELLs) or Francophone students living in minority contexts in Canada as a single group. Our focus in this research is to examine…
Descriptors: Test Bias, Language Minorities, French Canadians, Measurement
Oliveri, María Elena; Ercikan, Kadriye; Zumbo, Bruno D. – Applied Measurement in Education, 2014
Heterogeneity within English language learners (ELLs) groups has been documented. Previous research on differential item functioning (DIF) analyses suggests that accurate DIF detection rates are reduced greatly when groups are heterogeneous. In this simulation study, we investigated the effects of heterogeneity within linguistic (ELL) groups on…
Descriptors: Test Bias, Accuracy, English Language Learners, Simulation
Michaelides, Michalis P.; Haertel, Edward H. – Applied Measurement in Education, 2014
The standard error of equating quantifies the variability in the estimation of an equating function. Because common items for deriving equated scores are treated as fixed, the only source of variability typically considered arises from the estimation of common-item parameters from responses of samples of examinees. Use of alternative, equally…
Descriptors: Equated Scores, Test Items, Sampling, Statistical Inference
Clauser, Jerome C.; Clauser, Brian E.; Hambleton, Ronald K. – Applied Measurement in Education, 2014
The purpose of the present study was to extend past work with the Angoff method for setting standards by examining judgments at the judge level rather than the panel level. The focus was on investigating the relationship between observed Angoff standard setting judgments and empirical conditional probabilities. This relationship has been used as a…
Descriptors: Standard Setting (Scoring), Validity, Reliability, Correlation
Humphry, Stephen; Heldsinger, Sandra; Andrich, David – Applied Measurement in Education, 2014
One of the best-known methods for setting a benchmark standard on a test is that of Angoff and its modifications. When items are scored dichotomously, judges estimate the probability that a benchmark student will answer each item correctly. As in most methods of standard setting, it is assumed implicitly that the unit of the latent scale of the…
Descriptors: Foreign Countries, Standard Setting (Scoring), Judges, Item Response Theory
