Publication Date
| In 2024 | 7 |
| Since 2023 | 23 |
| Since 2020 (last 5 years) | 69 |
| Since 2015 (last 10 years) | 187 |
| Since 2005 (last 20 years) | 443 |
Audience
| Researchers | 28 |
| Practitioners | 2 |
| Policymakers | 1 |
| Students | 1 |
Location
| Turkey | 14 |
| Canada | 10 |
| United States | 10 |
| California | 9 |
| Netherlands | 9 |
| Australia | 6 |
| Germany | 6 |
| South Korea | 6 |
| Iowa | 5 |
| Norway | 5 |
| Turkey (Ankara) | 5 |
Laws, Policies, & Programs
| Individuals with Disabilities… | 2 |
| No Child Left Behind Act 2001 | 1 |
Ure, Abigail C. – ProQuest LLC, 2011
This study investigated how two different rating conditions, the controlled rating condition (CRC) and the uncontrolled rating condition (URC), affected rater behavior and the reliability of a performance assessment (PA) known as the Missionary Teaching Assessment (MTA). The CRC gives raters the capability to manipulate (pause, rewind, fast-forward)…
Descriptors: Teacher Evaluation, Performance Based Assessment, Performance Tests, Generalizability Theory
Lewis, Scott E.; Shaw, Janet L.; Freeman, Kathryn A. – Chemistry Education Research and Practice, 2011
Open-ended assessments, defined as assessments with a large set of possible correct answers, by nature lend themselves to concerns regarding accurate and consistent grading. This article describes one particular open-ended assessment, named Creative Exercises (CE), designed for promoting students' interconnection of concepts in a college general…
Descriptors: Evidence, Concept Mapping, Knowledge Level, Chemistry
Christ, Theodore J.; Riley-Tillman, T. Chris; Chafouleas, Sandra M.; Boice, Christina H. – Educational and Psychological Measurement, 2010
Generalizability theory was used to examine the generalizability and dependability of outcomes from two single-item Direct Behavior Rating (DBR) scales: DBR of actively manipulating and DBR of visually distracted. DBR is a behavioral assessment tool with specific instrumentation and procedures that can be used by a variety of service delivery…
Descriptors: Generalizability Theory, Student Behavior, Data Collection, Student Evaluation
Anderson, Daniel; Alonzo, Julie; Tindal, Gerald – Behavioral Research and Teaching, 2012
In this technical report, we describe the results of a study of mathematics items written to align with the Common Core State Standards (CCSS) in grades 6-8. In each grade, CCSS items were organized into forms, and the reliability of these forms was evaluated along with an experimental form including items aligned with the National Council of…
Descriptors: Curriculum Based Assessment, Mathematics Tests, Academic Standards, State Standards
Lakin, Joni M.; Lai, Emily R. – Educational and Psychological Measurement, 2012
For educators seeking to differentiate instruction, cognitive ability tests sampling multiple content domains, including verbal, quantitative, and nonverbal reasoning, provide superior information about student strengths and weaknesses compared with unidimensional reasoning measures. However, these ability tests have not been fully evaluated with…
Descriptors: Aptitude Tests, Nonverbal Ability, Cognitive Ability, Verbal Ability
Lamprianou, Iasonas; Christie, Thomas – Educational Assessment, Evaluation and Accountability, 2009
Accepting that school-based assessment may have the potential to bring additional reliability to the assessment outcomes of an educational system, this research uses Generalizability Theory to address the question "why school-based assessment is not a universal feature of high-stakes assessment systems?" Three major issues are identified: (a)…
Descriptors: Generalizability Theory, High Stakes Tests, Psychometrics, Evaluation
Huang, Jinyan – TESOL Journal, 2011
Using generalizability theory, this study examined both the rating variability and reliability of English as a second language (ESL) students' writing in two provincial examinations in Canada. This article discusses expected and unexpected similarities and differences related to rating variability and reliability between the two testing programs.…
Descriptors: Foreign Countries, Generalizability Theory, Test Reliability, Testing Programs
Ahn, Soyeon; Ames, Allison J.; Myers, Nicholas D. – Review of Educational Research, 2012
The current review addresses the validity of published meta-analyses in education that determines the credibility and generalizability of study findings using a total of 56 meta-analyses published in education in the 2000s. Our objectives were to evaluate the current meta-analytic practices in education, identify methodological strengths and…
Descriptors: Inferences, Meta Analysis, Educational Practices, Research Methodology
Cook, David A.; Beckman, Thomas J.; Mandrekar, Jayawant N.; Pankratz, V. Shane – Advances in Health Sciences Education, 2010
The mini-CEX is widely used to rate directly observed resident-patient encounters. Although several studies have explored the reliability of mini-CEX scores, the dimensionality of mini-CEX scores is incompletely understood. Objective: Explore the dimensionality of mini-CEX scores through factor analysis and generalizability analysis. Design:…
Descriptors: Graduate Students, Medical Students, Internal Medicine, Rating Scales
Briesch, Amy M.; Chafouleas, Sandra M.; Riley-Tillman, T. Chris – School Psychology Review, 2010
Although substantial attention has been directed toward building the psychometric evidence base for academic assessment methods (e.g., state mastery tests, curriculum-based measurement), similar examination of behavior assessment methods has been comparatively limited, particularly with regard to assessment purposes most desirable within…
Descriptors: Generalizability Theory, Student Behavior, Curriculum Based Assessment, Observation
Guler, Nese; Gelbal, Selahattin – Educational Sciences: Theory and Practice, 2010
In this study, classical test theory and generalizability theory were used to determine the reliability of scores obtained from a measurement tool of mathematics success. Twenty-four open-ended mathematics questions from TIMSS-1999 were administered to 203 students in the spring semester of 2007. The internal consistency of the scores was found to be 0.92. For…
Descriptors: Generalizability Theory, Test Theory, Test Reliability, Interrater Reliability
Kim, Youn-Hee – Applied Linguistics, 2009
The current status of English as an international language has come with challenges to the native speaker norms and raised the relevance of localized varieties in language assessment. This preliminary study investigates whether native English-speaking (NS) and non-native English-speaking (NNS) raters differ in their effect on score reliability in…
Descriptors: Generalizability Theory, Speech Communication, Native Speakers, English (Second Language)
Harik, Polina; Clauser, Brian E.; Grabovsky, Irina; Nungester, Ronald J.; Swanson, Dave; Nandakumar, Ratna – Journal of Educational Measurement, 2009
The present study examined the long-term usefulness of estimated parameters used to adjust the scores from a performance assessment to account for differences in rater stringency. Ratings from four components of the USMLE® Step 2 Clinical Skills Examination data were analyzed. A generalizability-theory framework was used to examine the extent to…
Descriptors: Generalizability Theory, Performance Based Assessment, Performance Tests, Clinical Experience
Harsch, Claudia; Rupp, Andre Alexander – Language Assessment Quarterly, 2011
The "Common European Framework of Reference" (CEFR; Council of Europe, 2001) provides a competency model that is increasingly used as a point of reference to compare language examinations. Nevertheless, aligning examinations to the CEFR proficiency levels remains a challenge. In this article, we propose a new, level-centered approach to…
Descriptors: Language Tests, Writing Tests, Test Construction, Test Items
Leclerc, Bernard-Simon; Dassa, Clement – Canadian Journal of Program Evaluation, 2009
This study examines the usefulness of the Montreal Service Concept framework of service quality measurement, when it was used as a predefined set of codes in content analysis of patients' responses. As well, the study quantifies the interrater agreement of coded data. Two raters independently reviewed each of the responses from a mail survey of…
Descriptors: Interrater Reliability, Content Analysis, Health Services, Mail Surveys

