Peer reviewed. Gierl, Mark J. – Alberta Journal of Educational Research, 1998
Examined the generalizability of written-response scores on the English 30 diploma examination administered to Alberta 12th-grade students. Student scores differed as a function of rater, but this variance component was small across two tasks and two administrations; score generalizability was high using a two-rater system; and scale variability…
Descriptors: Error of Measurement, Foreign Countries, Generalizability Theory, High School Seniors
Peer reviewed. Klein, Stephen P.; Stecher, Brian M.; Shavelson, Richard J.; McCaffrey, Daniel; Ormseth, Tor; Bell, Robert M.; Comfort, Kathy; Othman, Abdul R. – Applied Measurement in Education, 1998
Two studies involving 368 elementary and high school students and 29 readers were conducted to investigate reader consistency, score reliability, and reader time requirements of three hands-on science performance tasks. Holistic scores were as reliable as analytic scores, and there was a high correlation between them after they were disattenuated…
Descriptors: Elementary School Students, Elementary Secondary Education, Hands on Science, High School Students
Peer reviewed. Magin, D. J. – Assessment & Evaluation in Higher Education, 2001
Presents a novel application of analysis of variance (ANOVA) techniques to compare the reliability of multiple peer ratings with single teacher ratings. Uses rating data from two different courses, both involving multiple peer and individual teacher ratings that were used to assess student contributions to group process work. Discusses…
Descriptors: Analysis of Variance, Comparative Analysis, Cooperative Learning, Evaluation Methods
Anderson, Rachel L.; Lyons, John S.; Giles, Debra M.; Price, Judith A.; Estle, George – Journal of Child and Family Studies, 2003
We examined the interrater reliability of the "Child and Adolescent Needs and Strengths-Mental Health" (CANS-MH) scale among researchers and between researchers and clinicians. All children presenting to a treatment facility for either protective or mental health needs were eligible to be included in the study. As part of standard assessment…
Descriptors: Health Needs, Mental Health, Interrater Reliability, Quality Assurance
Bodzin, Alec M.; Beerer, Karen M. – Journal of Elementary Science Education, 2003
The National Science Education Standards recognize that inquiry-based instruction holds significant promise for developing scientifically literate students. The Science Teacher Inquiry Rubric (STIR) was developed based upon the National Science Education Standards' essential features of inquiry instruction (NRC, 2000). A pilot study using a…
Descriptors: Observation, Science Teachers, Science Instruction, Inquiry
ur Rehman, Sajjad; Al-Ansari, Husain; Yousef, Nibal – Education for Information, 2002
Presents the collective judgments of a group of academics from North America, Southeast Asia and the Arabian Gulf region, as well as leading practitioners from the Arabian Gulf region, about the content of graduate degrees in information studies. The participants generally agreed about the content of the curriculum of information studies. The most…
Descriptors: Curriculum Development, Foreign Countries, Curriculum Evaluation, Minimum Competencies
Clark, Jeffrey K.; Ogletree, Roberta J.; McKenzie, James F.; Dennis, Dixie; Chamness, Brenda E. – American Journal of Health Education, 2002
For more than a decade the health education profession has used the seven responsibilities, outlined from the 1978-1988 Role Delineation Project, as the foundation for credentialing, curricular structure in professional preparation programs, and continuing education. The purpose of this study was to investigate the extent to which the seven…
Descriptors: Health Education, Certification, Credentials, Interrater Reliability
Romero, Fernando; Paris, Scott G.; Brem, Sarah K. – Current Issues in Education, 2005
We examined underlying mechanisms for comprehension differences across expository and narrative text while controlling for factors confounded in the extant literature. Fourth grade students (n=32) read both an expository and a narrative text, and completed both a local comprehension assessment, and a global retelling assessment for each text.…
Descriptors: Reading Comprehension, Grade 4, Psycholinguistics, Models
Christie, Christina A.; Azzam, Tarek – New Directions for Evaluation, 2005
The purpose of this issue of "New Directions for Evaluation" is to examine, comparatively, the practical application of theorists' approaches to evaluation by examining four evaluations of the same case. The thought is that when asked to evaluate the same program (holding the case constant), the practical distinctions between theorists' approaches…
Descriptors: Theory Practice Relationship, Interrater Reliability, Meta Analysis, Case Studies
Lustick, David; Sykes, Gary – Education Policy Analysis Archives, 2006
This study investigated the National Board for Professional Teaching Standards' (NBPTS) assessment process in order to identify, quantify, and substantiate learning outcomes from the participants. One hundred and twenty candidates for the Adolescent and Young Adult Science (AYA Science) Certificate were studied over a two-year period using the…
Descriptors: Intervention, National Standards, Young Adults, Program Effectiveness
Brown, William L.; And Others – 1996
This study presents psychometric characteristics of the mathematics problem solving performance assessment used in the Minneapolis Public Schools, focusing on the interrater reliability, scoring reliability, and validity of the assessment. The Minneapolis Math Problem Solving Assessment (MPSA) was established in 1991. Students are asked to solve…
Descriptors: Elementary School Students, Grade 5, Intermediate Grades, Interrater Reliability
Tanner, David E. – 1997
During a period when the reform-minded are very critical of the degree to which testing is actually related to the conditions for which data are employed, authentic assessment offers the opportunity to evaluate learning in settings closely related to the real world. It also allows the evaluator to tailor assessment conditions for individual…
Descriptors: Construct Validity, Content Validity, Elementary Secondary Education, Evaluation Criteria
Aycock, Tim – 1993
To determine trends in reporting test reliability, 88 articles addressing 188 instruments in 1980, 81 articles covering 205 instruments in 1985, and 67 articles assessing 195 instruments in 1990 in the "Journal of Counseling Psychology" were reviewed. Articles were examined for the way in which reliability was discussed and reported, and…
Descriptors: Educational Practices, Educational Research, Estimation (Mathematics), Interrater Reliability
Dolmans, Diana H. J. M.; And Others – 1992
A method is presented for collecting information about the match between students' learning issues in problem-based learning and teachers' objectives. Subjects were 82 second-year medical students at the University of Limburg in Maastricht (Netherlands) in a problem-based curriculum. During a unit on pregnancy, childbirth, and child development,…
Descriptors: Educational Objectives, Evaluators, Foreign Countries, Higher Education
McNamara, T. F.; Adams, R. J. – 1991
A preliminary study is reported of the use of new multifaceted Rasch measurement mechanisms for investigating rater characteristics in language testing. Ratings from four judges of scripts from 50 candidates taking the International English Language Testing System test, a test of English for Academic Purposes, are analyzed. The analysis…
Descriptors: Comparative Analysis, English (Second Language), Foreign Countries, Interrater Reliability

