| Publication Date | Results |
|---|---|
| In 2024 | 48 |
| Since 2023 | 115 |
| Since 2020 (last 5 years) | 383 |
| Since 2015 (last 10 years) | 946 |
| Since 2005 (last 20 years) | 2090 |
| Audience | Results |
|---|---|
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
| Location | Results |
|---|---|
| Australia | 56 |
| Turkey | 52 |
| United Kingdom | 46 |
| Canada | 44 |
| Netherlands | 40 |
| California | 37 |
| China | 34 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Japan | 22 |
| What Works Clearinghouse Rating | Results |
|---|---|
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
[Peer reviewed] Norcini, John J.; And Others – Journal of Educational Measurement, 1987
This study examined whether two variations on the typical Angoff group standard-setting process would produce sufficiently consistent results to recommend their use. The results imply that judgments gathered after an initial traditional group-process session can provide an efficient alternative mechanism for setting cutting scores using the Angoff…
Descriptors: Cutting Scores, Generalizability Theory, Graduate Medical Education, Group Dynamics
[Peer reviewed] Ridley, Charles R. – Journal of Cross-Cultural Psychology, 1986
This study investigated the effects of therapists' observer-client race pairing and client self-disclosure on observers' descriptive and attitudinal ratings of clients. A major implication is that observer race, client race, and client self-disclosure influence clinical decision-making. (Author/LHW)
Descriptors: Clinical Diagnosis, Counselor Client Relationship, Cross Cultural Studies, Ethnic Groups
[Peer reviewed] Berk, Ronald A. – Review of Educational Research, 1986
Thirty-eight methods are presented for either setting standards or adjusting them based on an analysis of classification error rates. A trilevel classification scheme is used to categorize the methods, and 10 criteria of technical adequacy and practicability are proposed to evaluate them. (Author/LMO)
Descriptors: Criterion Referenced Tests, Cutting Scores, Elementary Secondary Education, Error of Measurement
[Peer reviewed] Payne, Beverly Dean – Educational and Psychological Measurement, 1984
The validity of elementary school pupil ratings of the teaching performance of 33 student teachers was examined using the nine competencies of the Teacher Performance Assessment Instruments. Ratings of college supervisors and supervising teachers were criteria for contrast of validity coefficients of student ratings. (Author/BS)
Descriptors: Elementary Education, Elementary School Students, Interrater Reliability, Rating Scales
Campbell, Stephen R. – Online Submission, 2004
This paper charts a cognitive history of the concepts of quantity and quality from three inter-related and inter-dependent perspectives of mathematics, logic, and physics. In so doing, other notions associated with the evolution of these concepts are identified and explicated. It is argued that the concepts of quantity and quality, considered in…
Descriptors: Educational Research, Qualitative Research, Research Methodology, Statistical Analysis
McGinty, Dixie; Neel, John H. – 1996
A new standard-setting approach, called the cognitive components approach, is introduced. Like the Angoff method, the cognitive components method generates minimum pass levels (MPLs) for each item. In both approaches, the item MPLs are summed for each judge, then averaged across judges to yield the standard. In the cognitive components approach,…
Descriptors: Cognitive Processes, Criterion Referenced Tests, Evaluation Methods, Grade 3
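The aggregation step that the McGinty and Neel abstract says both approaches share (sum each judge's item MPLs, then average those sums across judges) can be sketched as below. This is a minimal illustration, assuming each item MPL is already a number such as an Angoff-style judged probability that a minimally competent examinee answers the item correctly; how the cognitive components approach derives its item MPLs is not described in the abstract and is not modeled. The function name and the ratings are hypothetical.

```python
from statistics import mean


def aggregate_standard(mpls_by_judge: list[list[float]]) -> float:
    """Hypothetical helper: sum each judge's item MPLs, then average across judges."""
    if not mpls_by_judge:
        raise ValueError("need at least one judge")
    n_items = len(mpls_by_judge[0])
    for ratings in mpls_by_judge:
        # Every judge must supply an MPL for every item.
        if len(ratings) != n_items:
            raise ValueError("every judge must rate every item")
    # Per-judge cut score: the sum of that judge's item MPLs.
    judge_totals = [sum(ratings) for ratings in mpls_by_judge]
    # The standard: the mean of the per-judge cut scores.
    return mean(judge_totals)


# Hypothetical ratings: 3 judges x 4 items.
ratings = [
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.8, 0.6, 0.7],
    [0.7, 0.6, 0.5, 0.9],
]
print(aggregate_standard(ratings))
```

Because summing and then averaging are both linear, the result equals the sum over items of the judges' mean MPLs; the per-judge totals are kept here because they are also what one would inspect when checking consistency across judges.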
Takala, Sauli – 1998
This paper discusses recent developments in language testing. It begins with a review of the traditional criteria that are applied to all measurement and outlines recent emphases that derive from the expanding range of stakeholders. Drawing on Alderson's seminal work, criteria are presented for evaluating communicative language tests. Developments…
Descriptors: Alternative Assessment, Communicative Competence (Languages), Comparative Analysis, Evaluation Criteria
[Peer reviewed] Angoff, William H. – Applied Measurement in Education, 1988
Suggestions are provided for future research in item bias detection, reduction of essay-reader variation in setting cut-score levels, and limitations of equating theory. (TJH)
Descriptors: College Entrance Examinations, Cutting Scores, Equated Scores, Essay Tests
[Peer reviewed] Tyson, LeaAnn; Silverman, Stephen – Journal of Personnel Evaluation in Education, 1994
Differences in Texas Teacher Appraisal System scores among teacher subgroups over two years were examined for 2,366 teachers, covering scores on individual domains, the sums of scores on the first four domains, and overall summary performance scores, as well as differences among appraisers. Implications for teacher evaluation are discussed. (SLD)
Descriptors: Educational Assessment, Elementary Secondary Education, Evaluation Methods, Evaluators
[Peer reviewed] Gross, Leon J. – Evaluation and the Health Professions, 1994
Whether adequate levels of interrater reliability could be obtained on a national, standardized examination using one examiner per observation was studied with 101 paired candidate observations on an examination for optometry. Results indicate that psychometrically sound judgments can be obtained with one examiner. (SLD)
Descriptors: Educational Assessment, Error of Measurement, Evaluation Methods, Evaluators
[Peer reviewed] Wigglesworth, Gillian – Australian Review of Applied Linguistics, 1994
Multifaceted Rasch analysis was used to determine whether bias was evident in the way a group of raters graded two different versions of an oral interaction test, undertaken by the same candidates. Results indicate that certain raters consistently rated the tape version of the test more harshly while others rated the live one more harshly. (10…
Descriptors: Data Collection, Foreign Countries, Graphs, Interaction Process Analysis
[Peer reviewed] Jaeger, Richard M. – Educational Measurement: Issues and Practice, 1991
Issues concerning the selection of judges for standard setting are discussed. Determining the consistency of judges' recommendations, or their congruity with other expert recommendations, would help in selection. Enough judges must be chosen to allow estimation of recommendations by an entire population of judges. (SLD)
Descriptors: Cutting Scores, Evaluation Methods, Evaluators, Examiners
[Peer reviewed] Reid, Jerry B. – Educational Measurement: Issues and Practice, 1991
Training judges to generate item ratings in standard setting once the reference group has been defined is discussed. It is proposed that sensitivity to the factors that determine difficulty can be improved through training. Three criteria for determining when training is sufficient are offered. (SLD)
Descriptors: Computer Assisted Instruction, Difficulty Level, Evaluators, Interrater Reliability
[Peer reviewed] Elam, Carol L.; Andrykowski, Michael A. – Academic Medicine, 1991
Medical school admission interview ratings for four entering classes (n=356 students) were compared with preadmission academic variables (admission test scores, undergraduate grades), student characteristics (age, gender, residence), and interviewer characteristics (gender, professional background, admission committee membership). Recommendations…
Descriptors: Academic Achievement, Admission Criteria, College Admission, Higher Education
[Peer reviewed] Hughes, I. E.; Large, B. J. – Studies in Higher Education, 1993
A study investigated the consistency of faculty and peer evaluations of the oral communication skills of 44 fourth-year pharmacology students. Substantial agreement between faculty and students was found. Peer evaluations were independent of the evaluating students' own communication skills. In addition, a significant correlation between oral and written…
Descriptors: Communication Skills, Comparative Analysis, Evaluation Methods, Higher Education


