ERIC Number: ED566890
Record Type: Non-Journal
Publication Date: 2016
Evaluating Equity at the Local Level Using Bootstrap Tests. Research Report 2016-4
Kim, YoungKoung; DeCarlo, Lawrence T.
Because of concerns about test security, different test forms are typically used across testing occasions. As a result, equating is necessary to obtain scores from the different forms that can be used interchangeably. To assure the quality of equating, multiple equating methods are often examined, and various equity properties have been used to assess their adequacy. The current study proposes a method that evaluates equity properties using the bootstrap technique, which allows for a statistical test of equity at the local level without making any distributional assumptions. The approach is particularly helpful for high-stakes assessments, where the determination of cut scores takes on great importance and has implications for practice. The study demonstrates the bootstrap tests using a random sample from a pre-2015 PSAT/NMSQT® test administration, in which the scores are used to determine students' eligibility for scholarships. The equity properties for the assessment, in particular at the cut scores at which the scholarships are determined, are examined within an IRT framework. In terms of equating design, the study focuses on the common item nonequivalent groups design. Four equating methods are compared: IRT true score, IRT observed score, frequency estimation (FE), and chained equipercentile (CE). Although the bootstrap test proved useful for assessing equity properties, the results are limited to the specific sample data from the pre-2015 PSAT/NMSQT test administration, in which the ability difference between examinees taking the new and old forms was relatively large due to the difference in the target populations. In addition, the only equating design considered was the common item nonequivalent groups design. Thus, future simulation studies should include different equating designs and different levels of examinee ability differences (between old and new forms). Sensitivity analyses can also be conducted to assess the efficacy of the bootstrap test.
Descriptors: Equated Scores, Evaluation Methods, Sampling, Statistical Inference, High Stakes Tests, Aptitude Tests, College Entrance Examinations, High School Students, Cutting Scores, Item Response Theory, Computation, Comparative Analysis
College Board. 250 Vesey Street, New York, NY 10281. Tel: 212-713-8000; e-mail: email@example.com; Web site: http://research.collegeboard.org
Publication Type: Reports - Research; Numerical/Quantitative Data
Education Level: High Schools; Secondary Education; Higher Education; Postsecondary Education
Authoring Institution: College Board
Identifiers - Assessments and Surveys: National Merit Scholarship Qualifying Test; Preliminary Scholastic Aptitude Test