Showing 31 to 45 of 319 results
Buschang, Rebecca E.; Chung, Gregory K. W. K.; Delacruz, Girlie C.; Baker, Eva L. – Educational Assessment, 2012
The purpose of this study was to validate inferences about scores of one task designed to measure subject matter knowledge and three tasks designed to measure aspects of pedagogical content knowledge. Evidence for the validity of inferences was based on two expectations. First, if tasks were sensitive to expertise, we would find group differences.…
Descriptors: Algebra, Mathematics Teachers, Teacher Characteristics, Knowledge Base for Teaching
Bell, Courtney A.; Gitomer, Drew H.; McCaffrey, Daniel F.; Hamre, Bridget K.; Pianta, Robert C.; Qi, Yi – Educational Assessment, 2012
This article develops a validity argument approach for use on observation protocols currently used to assess teacher quality for high-stakes personnel and professional development decisions. After defining the teaching quality domain, we articulate an interpretive argument for observation protocols. To illustrate the types of evidence that might…
Descriptors: Teacher Effectiveness, Teacher Evaluation, Observation, Validity
Correnti, Richard; Matsumura, Lindsay Clare; Hamilton, Laura S.; Wang, Elaine – Educational Assessment, 2012
Guided by evidence that teachers contribute to student achievement outcomes, researchers have been reexamining how to study instruction and the classroom opportunities teachers create for students. We describe our experience measuring students' opportunities to develop analytic, text-based writing skills. Utilizing multiple methods of data…
Descriptors: Writing Skills, Skill Development, Educational Opportunities, Educational Quality
Martinez, Jose Felipe; Borko, Hilda; Stecher, Brian; Luskin, Rebecca; Kloser, Matt – Educational Assessment, 2012
We report the results of a pilot validation study of the Quality Assessment in Science Notebook, a portfolio-like instrument for measuring teacher assessment practices in middle school science classrooms. A statewide sample of 42 teachers collected 2 notebooks during the school year, corresponding to science topics taught in the fall and spring.…
Descriptors: Validity, Middle School Teachers, Evaluation Methods, Educational Assessment
Hickson, Stephen; Reed, W. Robert; Sander, Nicholas – Educational Assessment, 2012
This study investigates the degree to which grades based solely on constructed-response (CR) questions differ from grades based solely on multiple-choice (MC) questions. If CR questions are to justify their higher costs, they should produce different grade outcomes than MC questions. We use a data set composed of thousands of observations on…
Descriptors: Grades (Scholastic), Student Evaluation, Multiple Choice Tests, Observation
Sparfeldt, Jorn R.; Kimmel, Rumena; Lowenkamp, Lena; Steingraber, Antje; Rost, Detlef H. – Educational Assessment, 2012
Multiple-choice (MC) reading comprehension test items comprise three components: text passage, questions about the text, and MC answers. The construct validity of this format has been repeatedly criticized. In three between-subjects experiments, fourth graders (N₁ = 230, N₂ = 340, N₃ = 194) worked on three…
Descriptors: Test Items, Reading Comprehension, Construct Validity, Grade 4
Taut, Sandy; Santelices, Maria Veronica; Stecher, Brian – Educational Assessment, 2012
The task of validating a teacher assessment and improvement system is similar whether the system operates in the United States or in another country. Chile has a national teacher evaluation system (NTES) that is standards based, uses multiple instruments, and is intended to serve both formative and summative purposes. For the past 6 years the…
Descriptors: Evidence, Foreign Countries, Teacher Evaluation, Standards
Hill, Heather C.; Charalambous, Charalambos Y.; Blazar, David; McGinn, Daniel; Kraft, Matthew A.; Beisiegel, Mary; Humez, Andrea; Litke, Erica; Lynch, Kathleen – Educational Assessment, 2012
Measurement scholars have recently constructed validity arguments in support of a variety of educational assessments, including classroom observation instruments. In this article, we note that users must examine the robustness of validity arguments to variation in the implementation of these instruments. We illustrate how such an analysis might be…
Descriptors: Validity, Classroom Observation Techniques, Measures (Individuals), Teacher Effectiveness
Huffman, Loreen; Adamopoulos, Anthony; Murdock, Gwendolyn; Cole, AmyKay; McDermid, Robert – Educational Assessment, 2011
Accountability in higher education has increased, with more institutions requiring standardized tests. These tests are high stakes for institutions but low stakes for students, who seldom experience consequences for their performance. This study describes how one psychology department improved students' scores on the Psychology Area…
Descriptors: Student Motivation, Undergraduate Students, Program Evaluation, Standardized Tests
Whittaker, Tiffany A.; Williams, Natasha J.; Dodd, Barbara G. – Educational Assessment, 2011
This study assessed the interpretability of scaled scores based on either number correct (NC) scoring for a paper-and-pencil test or one of two methods of scoring computer-based tests: an item pattern (IP) scoring method and a method based on equated NC scoring. The equated NC scoring method for computer-based tests was proposed as an alternative…
Descriptors: Computer Assisted Testing, Scoring, Test Interpretation, Equated Scores
Cheng, Liying; DeLuca, Christopher – Educational Assessment, 2011
Test-takers' interpretations of validity as related to test constructs and test use have been widely debated in large-scale language assessment. This study contributes further evidence to this debate by examining 59 test-takers' perspectives in writing large-scale English language tests. Participants wrote about their test-taking experiences in…
Descriptors: Language Tests, Test Validity, Test Use, English
Riggan, Matthew; Olah, Leslie Nabors – Educational Assessment, 2011
Promising research on the teaching and learning impact of classroom-embedded formative assessment has spawned interest in a broader array of assessment tools and practices, including interim assessment. Although researchers have begun to explore the impact of interim assessments in the classroom, like other assessment tools and practices, they…
Descriptors: Homework, Student Evaluation, Observation, Formative Evaluation
Taylor, Catherine S.; Lee, Yoonsun – Educational Assessment, 2011
This article presents a study of ethnic Differential Item Functioning (DIF) for 4th-, 7th-, and 10th-grade reading items on a state criterion-referenced achievement test. The tests, administered 1997 to 2001, were composed of multiple-choice and constructed-response items. Item performance by focal groups (i.e., students from Asian/Pacific Island,…
Descriptors: Test Bias, Test Items, Pacific Islanders, American Indians
Kobrin, Jennifer L.; Patterson, Brian F. – Educational Assessment, 2011
Prior research has shown that there is substantial variability in the degree to which the SAT and high school grade point average (HSGPA) predict 1st-year college performance at different institutions. This article demonstrates the usefulness of multilevel modeling as a tool to uncover institutional characteristics that are associated with this…
Descriptors: College Entrance Examinations, Scores, Grade Point Average, High School Students
Wyse, Adam E.; Viger, Steven G. – Educational Assessment, 2011
An important part of test development is ensuring alignment between test forms and content standards. One common way of measuring alignment is the Webb (1997, 2007) alignment procedure. This article investigates (a) how well item writers understand components of the definition of Depth of Knowledge (DOK) from the Webb alignment procedure and (b)…
Descriptors: Test Items, Difficulty Level, Test Construction, Alignment (Education)