Publication Date
| Date Range | Results |
| --- | --- |
| In 2015 | 0 |
| Since 2014 | 1 |
| Since 2011 (last 5 years) | 2 |
| Since 2006 (last 10 years) | 4 |
| Since 1996 (last 20 years) | 12 |
Descriptor
| Descriptor | Results |
| --- | --- |
| Test Validity | 147 |
| Test Reliability | 61 |
| Test Construction | 41 |
| Item Analysis | 24 |
| Multiple Choice Tests | 24 |
| Test Items | 24 |
| Higher Education | 23 |
| Achievement Tests | 22 |
| Testing Problems | 18 |
| Test Interpretation | 17 |
Author
| Author | Results |
| --- | --- |
| Hanna, Gerald S. | 3 |
| Wainer, Howard | 3 |
| Bennett, Randy Elliot | 2 |
| Brandenburg, Dale C. | 2 |
| Ebel, Robert L. | 2 |
| Farr, Roger | 2 |
| Fitzpatrick, Anne R. | 2 |
| Hakstian, A. Ralph | 2 |
| Hambleton, Ronald K. | 2 |
| Kansup, Wanlop | 2 |
Audience
| Audience | Results |
| --- | --- |
| Researchers | 5 |
| Practitioners | 2 |
Showing 1 to 15 of 147 results
Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014
Recent guidelines for fair educational testing advise checking the validity of individual test scores with person-fit statistics, but the existing literature gives practitioners little guidance on which statistic to use. An overview of relatively simple existing nonparametric approaches to identify atypical response…
Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. One possible way to improve subscale reliability and validity is to use collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
Pommerich, Mary – Journal of Educational Measurement, 2006
Domain scores have been proposed as a user-friendly way of providing instructional feedback about examinees' skills. Domain performance typically cannot be measured directly; instead, scores must be estimated using available information. Simulation studies suggest that IRT-based methods yield accurate group domain score estimates. Because…
Descriptors: Test Validity, Scores, Simulation, Evaluation Methods
Wise, Steven L.; DeMars, Christine E. – Journal of Educational Measurement, 2006
The validity of inferences based on achievement test scores is dependent on the amount of effort that examinees put forth while taking the test. With low-stakes tests, for which this problem is particularly prevalent, there is a consequent need for psychometric models that can take into account differing levels of examinee effort. This article…
Descriptors: Guessing (Tests), Psychometrics, Inferences, Reaction Time
Enright, Mary K.; Rock, Donald A.; Bennett, Randy Elliot – Journal of Educational Measurement, 1998 (peer reviewed)
Examined alternative-item types and section configurations for improving the discriminant and convergent validity of the Graduate Record Examination (GRE) general test using a computer-based test given to 388 examinees who had taken the GRE previously. Adding new variations of logical meaning appeared to decrease discriminant validity. (SLD)
Descriptors: Admission (School), College Entrance Examinations, College Students, Computer Assisted Testing
Allalouf, Avi; Ben-Shakhar, Gershon – Journal of Educational Measurement, 1998 (peer reviewed)
Examined how coaching affects the predictive validity and fairness of scholastic aptitude tests. A coached (n=271) and uncoached (n=95) group were compared. Comparison revealed that although coaching enhanced scores on the Israeli Psychometric Entrance Test by about 25% of a standard deviation, it did not create a prediction bias or affect…
Descriptors: College Entrance Examinations, High School Students, High Schools, Higher Education
Vispoel, Walter P.; And Others – Journal of Educational Measurement, 1997 (peer reviewed)
Efficiency, precision, and concurrent validity of results from adaptive and fixed-item music listening tests were studied using: (1) 2,200 simulated examinees; (2) 204 live examinees; and (3) 172 live examinees. Results support the usefulness of adaptive tests for measuring skills that require aurally produced items. (SLD)
Descriptors: Adaptive Testing, Adults, College Students, Comparative Analysis
Bennett, Randy Elliot; Sebrechts, Marc M. – Journal of Educational Measurement, 1997 (peer reviewed)
A computer-delivered problem-solving task based on cognitive research literature was developed and its validity for graduate admissions assessment was studied with 107 undergraduates. Use of the test, which asked examinees to sort word-problem stems by prototypes, was supported by the findings. (SLD)
Descriptors: Admission (School), College Entrance Examinations, Computer Assisted Testing, Graduate Study
Sykes, Robert C.; Ito, Kyoko; Fitzpatrick, Anne R.; Ercikan, Kadriye – Journal of Educational Measurement, 1997 (peer reviewed)
The five chapters of this report provide resources that deal with the validity, generalizability, comparability, performance standards, and fairness, equity, and bias of performance assessments. The book is written for experienced educational measurement practitioners, although an extensive familiarity with performance assessment is not required.…
Descriptors: Educational Assessment, Measurement Techniques, Performance Based Assessment, Standards
Bridgeman, Brent; Morgan, Rick; Wang, Ming-mei – Journal of Educational Measurement, 1997 (peer reviewed)
Test results of 915 high school students taking a history examination with a choice of topics show that students were generally able to pick the topic on which they could get the highest score. Implications for fair scoring when topic choice is allowed are discussed. (SLD)
Descriptors: Essay Tests, High School Students, History, Performance Factors
Lane, Suzanne; And Others – Journal of Educational Measurement, 1996 (peer reviewed)
Evidence from test results of 3,604 sixth and seventh graders is provided for the generalizability and validity of the Quantitative Understanding: Amplifying Student Achievement and Reasoning (QUASAR) Cognitive Assessment Instrument, which is designed to measure program outcomes and growth in mathematics. (SLD)
Descriptors: Achievement Tests, Cognitive Processes, Elementary Education, Elementary School Students
Sawyer, Richard – Journal of Educational Measurement, 1996 (peer reviewed)
Decision theory is a useful method for assessing the effectiveness of the components of a course placement system. The effectiveness of placement tests or other variables in identifying underprepared students is described by the conditional probability of success in a standard course. Estimating the conditional probability of success is discussed.…
Descriptors: College Students, Estimation (Mathematics), Higher Education, Mathematical Models
Medley, Donald M.; Quirk, Thomas J. – Journal of Educational Measurement, 1974 (peer reviewed)
Descriptors: Blacks, Comparative Analysis, Culture Fair Tests, Item Analysis
Crehan, Kevin D. – Journal of Educational Measurement, 1974 (peer reviewed)
Various item selection techniques are compared on criterion-referenced reliability and validity. The techniques compared include three nominal criterion-referenced methods, traditional point-biserial selection, teacher selection, and random selection. (Author)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Item Analysis, Item Banks
Haladyna, Thomas Michael – Journal of Educational Measurement, 1974 (peer reviewed)
Classical test construction and analysis procedures are applicable and appropriate for use with criterion referenced tests when samples of both mastery and nonmastery examinees are employed. (Author/BB)
Descriptors: Criterion Referenced Tests, Item Analysis, Mastery Tests, Test Construction