| Publication Date | Results |
| --- | --- |
| In 2015 | 8 |
| Since 2014 | 55 |
| Since 2011 (last 5 years) | 206 |
| Since 2006 (last 10 years) | 509 |
| Since 1996 (last 20 years) | 1047 |
| Descriptor | Results |
| --- | --- |
| Test Validity | 781 |
| Higher Education | 571 |
| Correlation | 536 |
| Factor Analysis | 531 |
| Test Reliability | 481 |
| Factor Structure | 423 |
| Statistical Analysis | 421 |
| Scores | 368 |
| Comparative Analysis | 356 |
| Test Construction | 347 |
| Author | Results |
| --- | --- |
| Michael, William B. | 66 |
| Thompson, Bruce | 26 |
| Krus, David J. | 21 |
| Marcoulides, George A. | 20 |
| Vegelius, Jan | 20 |
| Aiken, Lewis R. | 19 |
| Plake, Barbara S. | 19 |
| Wang, Wen-Chung | 19 |
| Wilcox, Rand R. | 19 |
| Powers, Stephen | 18 |
| Education Level | Results |
| --- | --- |
| Higher Education | 86 |
| Postsecondary Education | 35 |
| Elementary Education | 30 |
| High Schools | 27 |
| Secondary Education | 24 |
| Middle Schools | 17 |
| Elementary Secondary Education | 16 |
| Grade 4 | 14 |
| Grade 3 | 12 |
| Grade 8 | 11 |
| Audience | Results |
| --- | --- |
| Researchers | 4 |
| Practitioners | 3 |
| Students | 1 |
Showing 46 to 60 of 3,486 results
Ye, Meng; Xin, Tao – Educational and Psychological Measurement, 2014
The authors explored the effects of drifting common items on vertical scaling within the higher order framework of item parameter drift (IPD). The results showed that if IPD occurred between a pair of test levels, the scaling performance started to deviate from the ideal state, as indicated by bias of scaling. When there were two items drifting…
Descriptors: Scaling, Test Items, Equated Scores, Achievement Gains
Meyer, J. Patrick; Liu, Xiang; Mashburn, Andrew J. – Educational and Psychological Measurement, 2014
Researchers often use generalizability theory to estimate relative error variance and reliability in teaching observation measures. They also use it to plan future studies and design the best possible measurement procedures. However, designing the best possible measurement procedure comes at a cost, and researchers must stay within their budget…
Descriptors: Reliability, Classroom Observation Techniques, Generalizability Theory, Error of Measurement
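As a rough illustration of the generalizability-theory machinery this abstract draws on, the following sketch estimates variance components and a generalizability coefficient for a fully crossed teachers × raters observation design. All scores, the design, and the two-rater decision study are invented for illustration; they are not from the article.

```python
import numpy as np

# Hypothetical data: 5 teachers each scored by 3 raters (persons x raters,
# fully crossed, one observation per cell).
scores = np.array([
    [4.0, 3.5, 4.5],
    [2.0, 2.5, 2.0],
    [3.0, 3.5, 3.0],
    [5.0, 4.5, 4.0],
    [1.5, 2.0, 2.5],
])
n_p, n_r = scores.shape
grand = scores.mean()
p_means = scores.mean(axis=1)   # per-teacher means
r_means = scores.mean(axis=0)   # per-rater means

# ANOVA mean squares for the two-facet crossed design
ms_p = n_r * np.sum((p_means - grand) ** 2) / (n_p - 1)
ms_r = n_p * np.sum((r_means - grand) ** 2) / (n_r - 1)
resid = scores - p_means[:, None] - r_means[None, :] + grand
ms_pr = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))

# Estimated variance components (negative estimates truncated to zero)
var_pr_e = ms_pr
var_p = max((ms_p - ms_pr) / n_r, 0.0)   # teacher (object of measurement)
var_r = max((ms_r - ms_pr) / n_p, 0.0)   # rater

# Relative error variance and generalizability coefficient for a
# decision study that would use n_prime raters per teacher
n_prime = 2
rel_error = var_pr_e / n_prime
g_coef = var_p / (var_p + rel_error)
print(f"var_p={var_p:.3f}  var_pr_e={var_pr_e:.3f}  G({n_prime} raters)={g_coef:.3f}")
```

Varying `n_prime` is the budget question the abstract raises: each added rater shrinks the relative error variance, at a cost.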
Williams, Ryan T.; Swanlund, Andrew; Miller, Shazia; Konstantopoulos, Spyros; Eno, Jared; van der Ploeg, Arie; Meyers, Coby – Educational and Psychological Measurement, 2014
This study operationalizes four measures of instructional differentiation: one for Grade 2 English language arts (ELA), one for Grade 2 mathematics, one for Grade 5 ELA, and one for Grade 5 mathematics. The study evaluates the measurement properties of each measure in a large field experiment: the Indiana Diagnostic Assessment Tools Study, which…
Descriptors: Individualized Instruction, Grade 2, Grade 5, English Instruction
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung – Educational and Psychological Measurement, 2014
In this study, the simultaneous item bias test (SIBTEST) regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied simultaneously to the logistic regression (LR) method. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Descriptors: Test Bias, Regression (Statistics), Test Items, True Scores
Warner, Janis A.; Koufteros, Xenophon; Verghese, Anto – Educational and Psychological Measurement, 2014
This article introduces a new construct coined as Computer User Learning Aptitude (CULA). To establish construct validity, CULA is embedded in a nomological network that extends the technology acceptance model (TAM). Specifically, CULA is posited to affect perceived usefulness and perceived ease of use, the two underlying TAM constructs.…
Descriptors: Second Language Learning, Language Aptitude, Computer Mediated Communication, Construct Validity
Hayduk, Leslie – Educational and Psychological Measurement, 2014
Researchers using factor analysis tend to dismiss the significant ill fit of factor models by presuming that if their factor model is close-to-fitting, it is probably close to being properly causally specified. Close fit may indeed result from a model being close to properly causally specified, but close-fitting factor models can also be seriously…
Descriptors: Factor Analysis, Goodness of Fit, Factor Structure, Structural Equation Models
Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D. – Educational and Psychological Measurement, 2014
The authors analyze the effectiveness of the R[superscript 2] and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…
Descriptors: Regression (Statistics), Test Bias, Effect Size, Test Items
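The logistic-regression DIF procedure studied here amounts to comparing nested models with and without a group term; the pseudo-R² change and the group coefficient (a log odds ratio) are the effect sizes. A minimal sketch on simulated data, using latent ability as the matching variable instead of an observed total score purely for simplicity (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
group = np.repeat([0, 1], n // 2)      # 0 = reference, 1 = focal group
theta = rng.normal(size=n)             # matching variable (simulated ability)
dif = -0.5 * group                     # uniform DIF against the focal group
y = (rng.random(n) < 1 / (1 + np.exp(-(theta + dif)))).astype(float)

def fit_logit(X, y, iters=25):
    """Newton-Raphson logistic regression; returns (coefficients, log-likelihood)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        beta += np.linalg.solve((X.T * (p * (1 - p))) @ X, X.T @ (y - p))
    p = 1 / (1 + np.exp(-X @ beta))
    return beta, float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

_, ll_null = fit_logit(np.zeros((n, 0)), y)                       # intercept only
_, ll_compact = fit_logit(theta[:, None], y)                      # matching variable
beta_aug, ll_aug = fit_logit(np.column_stack([theta, group]), y)  # + group term

# Effect sizes: Cox-Snell pseudo-R^2 change and the group log odds ratio
r2 = lambda ll: 1 - np.exp(2 / n * (ll_null - ll))
delta_r2 = r2(ll_aug) - r2(ll_compact)
log_odds_ratio = beta_aug[2]
```

The article's question is then how well cutoffs on `delta_r2` and `log_odds_ratio`, alongside the significance test, separate DIF from non-DIF items.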
Jones, W. Paul – Educational and Psychological Measurement, 2014
A study in a university clinic/laboratory investigated adaptive Bayesian scaling as a supplement to interpretation of scores on the Mini-IPIP. A "probability of belonging" in categories of low, medium, or high on each of the Big Five traits was calculated after each item response and continued until all items had been used or until a…
Descriptors: Personality Traits, Personality Measures, Bayesian Statistics, Clinics
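The adaptive Bayesian scaling idea in this abstract is a simple sequential posterior update: start with a prior over low/medium/high trait categories, update after each item response, and stop once one category's "probability of belonging" clears a threshold. A sketch with invented endorsement probabilities and responses (not the Mini-IPIP's actual parameters):

```python
# Illustrative probability of endorsing an item given each trait category
p_endorse = {"low": 0.2, "medium": 0.5, "high": 0.8}
posterior = {c: 1 / 3 for c in p_endorse}   # uniform prior over categories
responses = [1, 1, 0, 1, 1, 1]              # hypothetical 1=endorse, 0=reject
threshold = 0.90                            # stop early once this is reached

for r in responses:
    for c in posterior:
        likelihood = p_endorse[c] if r == 1 else 1 - p_endorse[c]
        posterior[c] *= likelihood
    total = sum(posterior.values())
    posterior = {c: v / total for c, v in posterior.items()}  # renormalize
    if max(posterior.values()) >= threshold:
        break  # confident enough; no need to administer remaining items

category = max(posterior, key=posterior.get)
print(category, posterior)
```

With these six responses the posterior for "high" ends around 0.80, below the 0.90 threshold, so all items are used, mirroring the abstract's "until all items had been used or until a…" stopping rule.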
Paulhus, Delroy L.; Dubois, Patrick J. – Educational and Psychological Measurement, 2014
The overclaiming technique is a novel assessment procedure that uses signal detection analysis to generate indices of knowledge accuracy (OC-accuracy) and self-enhancement (OC-bias). The technique has previously shown robustness over varied knowledge domains as well as low reactivity across administration contexts. Here we compared the OC-accuracy…
Descriptors: Educational Assessment, Knowledge Level, Accuracy, Cognitive Ability
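The signal-detection indices behind the overclaiming technique treat claimed familiarity with real items as hits and claimed familiarity with nonexistent foils as false alarms. A sketch with invented response counts (the d'-style accuracy and criterion-style bias formulas are the standard equal-variance signal-detection ones):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse standard-normal CDF

# Hypothetical respondent: claims familiarity with 16 of 20 real items
# and 4 of 10 foils (items that do not actually exist)
reals_claimed, n_reals = 16, 20
foils_claimed, n_foils = 4, 10

hit_rate = reals_claimed / n_reals        # 0.8
fa_rate = foils_claimed / n_foils         # 0.4

oc_accuracy = z(hit_rate) - z(fa_rate)        # d': knowledge accuracy
oc_bias = -(z(hit_rate) + z(fa_rate)) / 2     # c: overclaiming / self-enhancement
print(f"OC-accuracy={oc_accuracy:.3f}  OC-bias={oc_bias:.3f}")
```

A negative `oc_bias` here reflects a liberal "yes" tendency, i.e., overclaiming; edge rates of 0 or 1 would need a continuity correction before applying `z`.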
Kersting, Nicole B.; Sherin, Bruce L.; Stigler, James W. – Educational and Psychological Measurement, 2014
In this study, we explored the potential for machine scoring of short written responses to the Classroom-Video-Analysis (CVA) assessment, which is designed to measure teachers' usable mathematics teaching knowledge. We created naïve Bayes classifiers for CVA scales assessing three different topic areas and compared computer-generated scores…
Descriptors: Scoring, Automation, Video Technology, Teacher Evaluation
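A naïve Bayes classifier for short written responses, as used in this abstract, can be sketched as a bag-of-words model with Laplace smoothing. The tiny training set and labels below are invented for illustration, not CVA data:

```python
import math
from collections import Counter

# Hypothetical short responses labeled with a human-assigned score category
train = [
    ("teacher asks students to explain their reasoning", "high"),
    ("teacher probes the student misconception about fractions", "high"),
    ("the lesson was fine", "low"),
    ("students worked quietly", "low"),
]
labels = {y for _, y in train}
word_counts = {y: Counter() for y in labels}
doc_counts = Counter(y for _, y in train)
vocab = set()
for text, y in train:
    words = text.split()
    word_counts[y].update(words)
    vocab.update(words)

def score(text):
    """Return the most probable label: argmax log P(y) + sum log P(w|y)."""
    best, best_lp = None, -math.inf
    for y in labels:
        lp = math.log(doc_counts[y] / len(train))
        total = sum(word_counts[y].values())
        for w in text.split():
            # add-one (Laplace) smoothing so unseen words get nonzero probability
            lp += math.log((word_counts[y][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```

Comparing such computer-generated labels against human scores is then a matter of agreement statistics on a held-out set, which is the comparison the study reports.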
Whittaker, Tiffany A.; Chang, Wanchen; Dodd, Barbara G. – Educational and Psychological Measurement, 2013
Whittaker, Chang, and Dodd compared the performance of model selection criteria when selecting among mixed-format IRT models and found that the criteria did not perform adequately when selecting the more parameterized models. It was suggested by M. S. Johnson that the problems when selecting the more parameterized models may be because of the low…
Descriptors: Item Response Theory, Models, Selection Criteria, Accuracy
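The model selection criteria compared in studies like this one are penalized log-likelihoods: a more parameterized IRT model always fits at least as well, so the criteria trade fit against parameter count. A sketch with invented fit statistics for a 20-item test (the 2PL/3PL numbers below are illustrative, not from the article):

```python
import math

def aic(ll, k):
    """Akaike information criterion: -2 log-likelihood + 2 * parameters."""
    return -2 * ll + 2 * k

def bic(ll, k, n):
    """Bayesian information criterion: heavier penalty that grows with sample size."""
    return -2 * ll + k * math.log(n)

# Hypothetical fits for n = 500 examinees on a 20-item test:
# the more parameterized 3PL buys only a small log-likelihood gain
n = 500
ll_2pl, k_2pl = -5230.0, 40   # 2PL: 2 parameters per item
ll_3pl, k_3pl = -5221.0, 60   # 3PL: 3 parameters per item

print("AIC:", aic(ll_2pl, k_2pl), aic(ll_3pl, k_3pl))  # lower is preferred
print("BIC:", bic(ll_2pl, k_2pl, n), bic(ll_3pl, k_3pl, n))
```

With these invented numbers both criteria prefer the simpler 2PL; the article's concern is the reverse error, criteria failing to select a more parameterized model when it is in fact the generating one.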
Keeley, Jared W.; English, Taylor; Irons, Jessica; Henslee, Amber M. – Educational and Psychological Measurement, 2013
Many measurement biases affect student evaluations of instruction (SEIs). However, two have been relatively understudied: halo effects and ceiling/floor effects. This study examined these effects in two ways. To examine the halo effect, using a videotaped lecture, we manipulated specific teacher behaviors to be "good" or "bad"…
Descriptors: Robustness (Statistics), Test Bias, Course Evaluation, Student Evaluation of Teacher Performance
Li, Xueming; Sireci, Stephen G. – Educational and Psychological Measurement, 2013
Validity evidence based on test content is of essential importance in educational testing. One source for such evidence is an alignment study, which helps evaluate the congruence between tested objectives and those specified in the curriculum. However, the results of an alignment study do not always sufficiently capture the degree to which a test…
Descriptors: Content Validity, Multidimensional Scaling, Data Analysis, Educational Testing
Raykov, Tenko; Dimitrov, Dimiter M.; von Eye, Alexander; Marcoulides, George A. – Educational and Psychological Measurement, 2013
A latent variable modeling method for evaluation of interrater agreement is outlined. The procedure is useful for point and interval estimation of the degree of agreement among a given set of judges evaluating a group of targets. In addition, the approach allows one to test for identity in underlying thresholds across raters as well as to identify…
Descriptors: Interrater Reliability, Models, Statistical Analysis, Computation
Kaliski, Pamela K.; Wind, Stefanie A.; Engelhard, George, Jr.; Morgan, Deanna L.; Plake, Barbara S.; Reshetar, Rosemary A. – Educational and Psychological Measurement, 2013
The many-faceted Rasch (MFR) model has been used to evaluate the quality of ratings on constructed response assessments; however, it can also be used to evaluate the quality of judgments from panel-based standard setting procedures. The current study illustrates the use of the MFR model for examining the quality of ratings obtained from a standard…
Descriptors: Item Response Theory, Models, Standard Setting (Scoring), Science Tests

