Publication Date
| In 2015 | 0 |
| Since 2014 | 0 |
| Since 2011 (last 5 years) | 0 |
| Since 2006 (last 10 years) | 1 |
| Since 1996 (last 20 years) | 6 |
Descriptor
| Test Reliability | 6 |
| Guessing (Tests) | 5 |
| Multiple Choice Tests | 4 |
| Objective Tests | 4 |
| Test Items | 3 |
| Test Length | 3 |
| Higher Education | 2 |
| Models | 2 |
| Scoring | 2 |
| Difficulty Level | 1 |
Author
| Burton, Richard F. | 6 |
| Miller, David J. | 1 |
Publication Type
| Journal Articles | 6 |
| Reports - Descriptive | 3 |
| Reports - Research | 2 |
| Reports - Evaluative | 1 |
Education Level
| Higher Education | 1 |
Showing all 6 results
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2006
Many academic tests (e.g. short-answer and multiple-choice) sample required knowledge with questions scoring 0 or 1 (dichotomous scoring). Few textbooks give useful guidance on the length of test needed to do this reliably. Posey's binomial error model of 1932 provides the best starting point, but allows neither for heterogeneity of question…
Descriptors: Item Sampling, Tests, Test Length, Test Reliability
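The binomial error model the abstract refers to treats each dichotomously scored question as an independent Bernoulli sample of the examinee's knowledge. A minimal sketch of the implied sampling error (function name and values are illustrative, not from the article):

```python
import math

def binomial_sem(n_items, p_true):
    """Standard deviation of the observed proportion-correct score when
    each of n_items dichotomous (0/1) questions is an independent
    Bernoulli(p_true) sample of the examinee's knowledge."""
    return math.sqrt(p_true * (1 - p_true) / n_items)

# Sampling error shrinks as test length grows, for a fixed true score.
for n in (20, 50, 100):
    print(n, round(binomial_sem(n, 0.6), 3))
```

This is the simplest form of the model; as the abstract notes, it allows neither for heterogeneity of question difficulty nor for guessing.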
Multiple Choice and True/False Tests: Reliability Measures and Some Implications of Negative Marking
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…
Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items
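The abstract's point that reliability coefficients depend on the spread of examinee attainment follows from the standard relation SEM = SD·sqrt(1 − r). A sketch with illustrative numbers (my own, not from the article), showing the same measurement error yielding different reliability coefficients:

```python
# SEM = SD * sqrt(1 - r)  rearranges to  r = 1 - (SEM / SD)**2
def reliability_from_sem(sem, sd):
    """Reliability coefficient implied by a given standard error of
    measurement and a given spread (SD) of examinee scores."""
    return 1 - (sem / sd) ** 2

# Identical SEM, but a wider score spread inflates the coefficient.
sem = 3.0
for sd in (5.0, 8.0, 12.0):
    print(sd, round(reliability_from_sem(sem, sd), 2))
```

This is why, as the abstract argues, a reliability coefficient alone does not allow comparison of tests taken by cohorts of different heterogeneity.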
Peer reviewed
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2001
Item-discrimination indices are numbers calculated from test data that are used in assessing the effectiveness of individual test questions. This article asserts that the indices are so unreliable as to suggest that countless good questions may have been discarded over the years. It considers how the indices, and hence overall test reliability,…
Descriptors: Guessing (Tests), Item Analysis, Test Reliability, Testing Problems
Peer reviewed
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2001
Describes four measures of test unreliability that quantify effects of question selection and guessing, both separately and together--three chosen for immediacy and one for greater mathematical elegance. Quantifies their dependence on test length and number of answer options per question. Concludes that many multiple choice tests are unreliable…
Descriptors: Guessing (Tests), Mathematical Models, Multiple Choice Tests, Objective Tests
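The dependence on the number of answer options that the abstract quantifies can be illustrated with the simplest case, blind guessing: each item is then a Bernoulli trial with success probability 1/c for c options. A hypothetical sketch (names and values mine):

```python
import math

def guessing_sd(n_items, n_options):
    """SD of the score contributed by blind guessing: each item is an
    independent Bernoulli(1 / n_options) success."""
    p = 1 / n_options
    return math.sqrt(n_items * p * (1 - p))

# More options per question means less score noise from guessing.
for c in (2, 4, 5):
    print(c, round(guessing_sd(50, c), 2))
```

True/false tests (c = 2) sit at the noisy extreme, which is consistent with the abstract's conclusion about the unreliability of short tests with few options.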
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2005
Examiners seeking guidance on multiple-choice and true/false tests are likely to encounter various faulty or questionable ideas. Twelve of these are discussed in detail, having to do mainly with the effects on test reliability of test length, guessing and scoring method (i.e. number-right scoring or negative marking). Some misunderstandings could…
Descriptors: Guessing (Tests), Multiple Choice Tests, Objective Tests, Test Reliability
Peer reviewed
Burton, Richard F.; Miller, David J. – Assessment & Evaluation in Higher Education, 1999
Discusses statistical procedures for estimating test unreliability due to guessing in multiple choice and true/false tests. Proposes two new measures of test unreliability: one concerned with resolution of defined levels of knowledge and the other with the probability of examinees being incorrectly ranked. Both models are based on the binomial…
Descriptors: Guessing (Tests), Higher Education, Multiple Choice Tests, Objective Tests