Showing 1 to 15 of 663 results
Peer reviewed
Bostic, Jonathan David; Sondergeld, Toni A.; Matney, Gabriel; Stone, Gregory; Hicks, Tiara – Applied Measurement in Education, 2021
Response process validity evidence provides a window into a respondent's cognitive processing. The purpose of this study is to describe a new data collection tool called a whole-class think aloud (WCTA). This work is performed as part of test development for a series of problem-solving measures to be used in elementary and middle grades. Data from…
Descriptors: Data Collection, Protocol Analysis, Problem Solving, Cognitive Processes
Peer reviewed
Mo, Ya; Carney, Michele; Cavey, Laurie; Totorica, Tatia – Applied Measurement in Education, 2021
There is a need for assessment items that assess complex constructs but can also be efficiently scored for evaluation of teacher education programs. In an effort to measure the construct of teacher attentiveness in an efficient and scalable manner, we are using exemplar responses elicited by constructed-response item prompts to develop…
Descriptors: Protocol Analysis, Test Items, Responses, Mathematics Teachers
Peer reviewed
Leighton, Jacqueline P. – Applied Measurement in Education, 2021
The objective of this paper is to comment on the think-aloud methods presented in the three papers included in this special issue. The commentary offered stems from the author's own psychological investigations of unobservable information processes and the conditions under which the most defensible claims can be advanced. The structure of this…
Descriptors: Protocol Analysis, Data Collection, Test Construction, Test Validity
Peer reviewed
Bonner, Sarah; Chen, Peggy; Jones, Kristi; Milonovich, Brandon – Applied Measurement in Education, 2021
We describe the use of think alouds to examine substantive processes involved in performance on a formative assessment of computational thinking (CT) designed to support self-regulated learning (SRL). Our task design model included three phases of work on a computational thinking problem: forethought, performance, and reflection. The cognitive…
Descriptors: Formative Evaluation, Thinking Skills, Metacognition, Computer Science Education
Peer reviewed
Bostic, Jonathan David – Applied Measurement in Education, 2021
Think alouds are valuable tools for academicians, test developers, and practitioners as they provide a unique window into a respondent's thinking during an assessment. The purpose of this special issue is to highlight novel ways to use think alouds as a means to gather evidence about respondents' thinking. An intended outcome from this special…
Descriptors: Protocol Analysis, Cognitive Processes, Data Collection, STEM Education
Peer reviewed
Bejar, Isaac I.; Li, Chen; McCaffrey, Daniel – Applied Measurement in Education, 2020
We evaluate the feasibility of developing predictive models of rater behavior, that is, "rater-specific" models for predicting the scores produced by a rater under operational conditions. In the present study, the dependent variable is the score assigned to essays by a rater, and the predictors are linguistic attributes of the essays…
Descriptors: Scoring, Essays, Behavior, Predictive Measurement
Peer reviewed
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
Peer reviewed
Glazer, Nancy; Wolfe, Edward W. – Applied Measurement in Education, 2020
This introductory article describes how constructed response scoring is carried out, particularly the rater monitoring processes, and illustrates three potential designs for conducting rater monitoring in an operational scoring project. The introduction also presents a framework for interpreting research conducted by those who study the constructed…
Descriptors: Scoring, Test Format, Responses, Predictor Variables
Peer reviewed
Finn, Bridgid; Arslan, Burcu; Walsh, Matthew – Applied Measurement in Education, 2020
To score an essay response, raters draw on previously trained skills and knowledge about the underlying rubric and score criterion. Cognitive processes such as remembering, forgetting, and skill decay likely influence rater performance. To investigate how forgetting influences scoring, we evaluated raters' scoring accuracy on TOEFL and GRE essays.…
Descriptors: Epistemology, Essay Tests, Evaluators, Cognitive Processes
Peer reviewed
Choi, Ikkyu; Wolfe, Edward W. – Applied Measurement in Education, 2020
Rater training is essential in ensuring the quality of constructed response scoring. Most of the current knowledge about rater training comes from experimental contexts with an emphasis on short-term effects. Few sources are available for empirical evidence on whether and how raters become more accurate as they gain scoring experiences or what…
Descriptors: Scoring, Accuracy, Training, Evaluators
Peer reviewed
Wendler, Cathy; Glazer, Nancy; Bridgeman, Brent – Applied Measurement in Education, 2020
Efficient constructed response (CR) scoring requires both accuracy and speed from human raters. This study was designed to determine if setting scoring rate expectations would encourage raters to score at a faster pace, and if so, if there would be differential effects on scoring accuracy for raters who score at different rates. Three rater groups…
Descriptors: Scoring, Expectation, Accuracy, Time
Peer reviewed
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Peer reviewed
O'Neill, Thomas R.; Gregg, Justin L.; Peabody, Michael R. – Applied Measurement in Education, 2020
This study addresses equating issues with varying sample sizes using the Rasch model by examining how sample size affects the stability of item calibrations and person ability estimates. A resampling design was used to create 9 sample size conditions (200, 100, 50, 45, 40, 35, 30, 25, and 20), each replicated 10 times. Items were recalibrated…
Descriptors: Sample Size, Equated Scores, Item Response Theory, Raw Scores
Peer reviewed
Diao, Hongyu; Keller, Lisa – Applied Measurement in Education, 2020
Examinees who attempt the same test multiple times are often referred to as "repeaters." Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a…
Descriptors: Licensing Examinations (Professions), College Entrance Examinations, Repetition, Testing Problems
Peer reviewed
Goodman, Joshua T.; Dallas, Andrew D.; Fan, Fen – Applied Measurement in Education, 2020
Recent research has suggested that re-setting the standard for each administration of a small sample examination, in addition to the high cost, does not adequately maintain similar performance expectations year after year. Small-sample equating methods have shown promise with samples between 20 and 30. For groups that have fewer than 20 students,…
Descriptors: Equated Scores, Sample Size, Sampling, Weighted Scores