Publication Date
| In 2015 | 0 |
| Since 2014 | 2 |
| Since 2011 (last 5 years) | 6 |
| Since 2006 (last 10 years) | 23 |
| Since 1996 (last 20 years) | 59 |
Descriptor
| Test Construction | 102 |
| Test Items | 36 |
| Scores | 21 |
| Item Response Theory | 16 |
| Evaluation Methods | 14 |
| Multiple Choice Tests | 14 |
| Performance Based Assessment | 14 |
| Test Use | 14 |
| Test Validity | 14 |
| Educational Assessment | 13 |
| More ▼ | |
Source
| Applied Measurement in… | 102 |
Author
| Downing, Steven M. | 5 |
| Haladyna, Thomas M. | 4 |
| Feldt, Leonard S. | 3 |
| Hambleton, Ronald K. | 3 |
| Huff, Kristen | 3 |
| Clauser, Brian E. | 2 |
| Ercikan, Kadriye | 2 |
| Forsyth, Robert A. | 2 |
| Frary, Robert B. | 2 |
| Linn, Robert L. | 2 |
| More ▼ | |
Publication Type
| Journal Articles | 102 |
| Reports - Evaluative | 50 |
| Reports - Research | 40 |
| Reports - Descriptive | 10 |
| Information Analyses | 5 |
| Speeches/Meeting Papers | 4 |
| Guides - Non-Classroom | 1 |
| Historical Materials | 1 |
Education Level
| High Schools | 9 |
| Secondary Education | 8 |
| Elementary Secondary Education | 4 |
| Elementary Education | 2 |
| Grade 3 | 2 |
| Grade 5 | 2 |
| Grade 8 | 2 |
| Higher Education | 2 |
| Junior High Schools | 2 |
| Middle Schools | 2 |
| More ▼ | |
Audience
Showing 1 to 15 of 102 results
Antal, Judit; Proctor, Thomas P.; Melican, Gerald J. – Applied Measurement in Education, 2014
In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…
Descriptors: Test Items, Equated Scores, Difficulty Level, Item Response Theory
Chia, Magda Y. – Applied Measurement in Education, 2014
The Smarter Balanced Assessment Consortium (Smarter Balanced) serves over 19 million primary, middle, and high school students from across 26 states and affiliates (Smarter Balanced, n.d). As one of the two Race to the Top (RTT)-funded assessment consortia, Smarter Balanced is responsible for developing formative, interim, and summative…
Descriptors: State Standards, Academic Standards, Educational Assessment, English Language Learners
Taylor, Melinda Ann; Pastor, Dena A. – Applied Measurement in Education, 2013
Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. It is known that challenges exist with documentation of the reliability of scores for alternate assessments. Typical measures of reliability do little in modeling multiple sources…
Descriptors: Generalizability Theory, Alternative Assessment, Test Reliability, Scores
Boyd, Aimee M.; Dodd, Barbara; Fitzpatrick, Steven – Applied Measurement in Education, 2013
This study compared several exposure control procedures for CAT systems based on the three-parameter logistic testlet response theory model (Wang, Bradlow, & Wainer, 2002) and Masters' (1982) partial credit model when applied to a pool consisting entirely of testlets. The exposure control procedures studied were the modified within 0.10 logits…
Descriptors: Computer Assisted Testing, Item Response Theory, Test Construction, Models
Rogers, W. Todd; Lin, Jie; Rinaldi, Christia M. – Applied Measurement in Education, 2011
The evidence gathered in the present study supports the use of the simultaneous development of test items for different languages. The simultaneous approach used in the present study involved writing an item in one language (e.g., French) and, before moving to the development of a second item, translating the item into the second language (e.g.,…
Descriptors: Test Items, Item Analysis, Achievement Tests, French
Leighton, Jacqueline P.; Heffernan, Colleen; Cor, M. Kenneth; Gokiert, Rebecca J.; Cui, Ying – Applied Measurement in Education, 2011
The "Standards for Educational and Psychological Testing" indicate that test instructions, and by extension item objectives, presented to examinees should be sufficiently clear and detailed to help ensure that they respond as developers intend them to respond (Standard 3.20; AERA, APA, & NCME, 1999). The present study investigates the use of…
Descriptors: Test Construction, Validity, Evidence, Science Tests
Hendrickson, Amy; Huff, Kristen; Luecht, Richard – Applied Measurement in Education, 2010
Evidence-centered assessment design (ECD) explicates a transparent evidentiary argument to warrant the inferences we make from student test performance. This article describes how the vehicles for gathering student evidence--task models and test specifications--are developed. Task models, which are the basis for item development, flow directly…
Descriptors: Evidence, Test Construction, Measurement, Classification
Bejar, Isaac I. – Applied Measurement in Education, 2010
The foregoing articles constitute what I consider a comprehensive and clear description of the redesign process of a major assessment. The articles serve to illustrate the problems that will need to be addressed by large-scale assessments in the twenty-first century. Primary among them is how to organize the development of such assessments to meet…
Descriptors: Advanced Placement Programs, Equivalency Tests, Evidence, Test Construction
Brennan, Robert L. – Applied Measurement in Education, 2010
This paper provides an overview of evidence-centered assessment design (ECD) and some general information about of the Advanced Placement (AP[R]) Program. Then the papers in this special issue are discussed, as they relate to the use of ECD in the revision of various AP tests. This paper concludes with some observations about the need to validate…
Descriptors: Advanced Placement Programs, Equivalency Tests, Evidence, Test Construction
Ewing, Maureen; Packman, Sheryl; Hamen, Cynthia; Thurber, Allison Clark – Applied Measurement in Education, 2010
In the last few years, the Advanced Placement (AP) Program[R] has used evidence-centered assessment design (ECD) to articulate the knowledge, skills, and abilities to be taught in the course and measured on the summative exam for four science courses, three history courses, and six world language courses; its application to calculus and English…
Descriptors: Advanced Placement Programs, Equivalency Tests, Evidence, Test Construction
Plake, Barbara S.; Huff, Kristen; Reshetar, Rosemary – Applied Measurement in Education, 2010
In many large-scale assessment programs, achievement level descriptors (ALDs) provide a critical role in communicating what scores on the assessment mean and in interpreting what examinees know and are able to do based on their test performance. Based on their test performance, examinees are often classified into performance categories. The…
Descriptors: Evidence, Test Construction, Measurement, Standard Setting
Huff, Kristen; Steinberg, Linda; Matts, Thomas – Applied Measurement in Education, 2010
The cornerstone of evidence-centered assessment design (ECD) is an evidentiary argument that requires that each target of measurement (e.g., learning goal) for an assessment be expressed as a "claim" to be made about an examinee that is relevant to the specific purpose and audience(s) for the assessment. The "observable evidence" required to…
Descriptors: Advanced Placement Programs, Equivalency Tests, Evidence, Test Construction
Stone, Clement A.; Ye, Feifei; Zhu, Xiaowen; Lane, Suzanne – Applied Measurement in Education, 2010
Although reliability of subscale scores may be suspect, subscale scores are the most common type of diagnostic information included in student score reports. This research compared methods for augmenting the reliability of subscale scores for an 8th-grade mathematics assessment. Yen's Objective Performance Index, Wainer et al.'s augmented scores,…
Descriptors: Item Response Theory, Case Studies, Reliability, Scores
Hein, Serge F.; Skaggs, Gary E. – Applied Measurement in Education, 2009
Only a small number of qualitative studies have investigated panelists' experiences during standard-setting activities or the thought processes associated with panelists' actions. This qualitative study involved an examination of the experiences of 11 panelists who participated in a prior, one-day standard-setting meeting in which either the…
Descriptors: Focus Groups, Standard Setting, Cutting Scores, Cognitive Processes
Meyers, Jason L.; Miller, G. Edward; Way, Walter D. – Applied Measurement in Education, 2009
In operational testing programs using item response theory (IRT), item parameter invariance is threatened when an item appears in a different location on the live test than it did when it was field tested. This study utilizes data from a large state's assessments to model change in Rasch item difficulty (RID) as a function of item position change,…
Descriptors: Test Items, Test Content, Testing Programs, Simulation

Peer reviewed
Direct link
