Showing 1 to 15 of 23 results
Peer reviewed
Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017
Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…
Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment
Peer reviewed
Warne, Russell T.; Doty, Kristine J.; Malbica, Anne Marie; Angeles, Victor R.; Innes, Scott; Hall, Jared; Masterson-Nixon, Kelli – Journal of Psychoeducational Assessment, 2016
"Above-level testing" (also called "above-grade testing," "out-of-level testing," and "off-level testing") is the practice of administering to a child a test that is designed for an examinee population that is older or in a more advanced grade. Above-level testing is frequently used to help educators design…
Descriptors: Test Items, Testing, Academically Gifted, Talent Identification
Peer reviewed
Chubbuck, Kay; Curley, W. Edward; King, Teresa C. – ETS Research Report Series, 2016
This study gathered quantitative and qualitative evidence concerning gender differences in performance by using critical reading material on the "SAT"® test with sports and science content. The fundamental research questions guiding the study were: If sports and science are to be included in a skills test, what kinds of material are…
Descriptors: College Entrance Examinations, Gender Differences, Critical Reading, Reading Tests
Peer reviewed
Engelhard, George, Jr.; Kobrin, Jennifer L.; Wind, Stefanie A. – International Journal of Testing, 2014
The purpose of this study is to explore patterns in model-data fit related to subgroups of test takers from a large-scale writing assessment. Using data from the SAT, a calibration group was randomly selected to represent test takers who reported that English was their best language from the total population of test takers (N = 322,011). A…
Descriptors: College Entrance Examinations, Writing Tests, Goodness of Fit, English
Peer reviewed
Wolkowitz, Amanda A.; Skorupski, William P. – Educational and Psychological Measurement, 2013
When missing values are present in item response data, there are a number of ways one might impute a correct or incorrect response to a multiple-choice item. There are significantly fewer methods for imputing the actual response option an examinee may have provided if he or she had not omitted the item either purposely or accidentally. This…
Descriptors: Multiple Choice Tests, Statistical Analysis, Models, Accuracy
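The Wolkowitz and Skorupski abstract concerns imputing the actual response option an omitting examinee might have chosen, not merely a correct/incorrect score. The sketch below is not the authors' method; it is a minimal illustrative approach, assuming one imputes by sampling an option in proportion to option frequencies among examinees with similar total scores. All data values and the score window are hypothetical.

```python
import random
from collections import Counter

def impute_option(responses, total_scores, target_score, rng, window=2):
    """Impute a missing multiple-choice response by sampling an option
    in proportion to its frequency among examinees whose total scores
    fall within `window` points of the omitting examinee's score.

    responses    : chosen options (e.g. 'A'..'D') for one item
    total_scores : matching list of examinee total scores
    target_score : total score of the examinee who omitted the item
    """
    pool = [r for r, s in zip(responses, total_scores)
            if abs(s - target_score) <= window]
    counts = Counter(pool)
    options, weights = zip(*counts.items())
    return rng.choices(options, weights=weights, k=1)[0]

# Hypothetical toy data: examinees near a total score of 10 mostly chose 'A'.
rng = random.Random(0)
responses = ['A', 'B', 'A', 'C', 'A', 'D', 'B', 'A']
scores = [10, 11, 10, 3, 9, 2, 10, 11]
imputed = impute_option(responses, scores, target_score=10, rng=rng)
```

Only examinees with scores in [8, 12] enter the pool here, so the imputed value is drawn from {'A', 'B'}, weighted toward 'A'.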
Peer reviewed
Moses, Tim; Liu, Jinghua; Tan, Adele; Deng, Weiling; Dorans, Neil J. – ETS Research Report Series, 2013
In this study, differential item functioning (DIF) methods utilizing 14 different matching variables were applied to assess DIF in the constructed-response (CR) items from 6 forms of 3 mixed-format tests. Results suggested that the methods might produce distinct patterns of DIF results for different tests and testing programs, in that the DIF…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Item Analysis
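The Moses et al. abstract does not spell out its DIF procedures, but DIF methods that condition on a matching variable are typified by the Mantel-Haenszel statistic computed across matched-score strata. A minimal sketch follows; the function is a standard textbook formula, and the toy tables are invented, not the study's data.

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio for one item.

    strata: one 2x2 table per matching-score level, each given as
            (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    Values near 1.0 suggest little DIF after matching on ability.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n  # reference-correct x focal-incorrect
        den += b * c / n  # reference-incorrect x focal-correct
    return num / den

# Hypothetical counts for two score strata, nearly equal odds in both groups:
tables = [(40, 10, 38, 12), (20, 30, 18, 32)]
alpha = mantel_haenszel_odds_ratio(tables)
```

In ETS practice this ratio is often reported on the delta scale as MH D-DIF = -2.35 ln(alpha), so alpha near 1 maps to D-DIF near 0.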
Peer reviewed
Santelices, Maria Veronica; Wilson, Mark – Educational and Psychological Measurement, 2012
The relationship between differential item functioning (DIF) and item difficulty on the SAT is such that more difficult items tend to exhibit DIF in favor of the focal group (usually minority groups). These results were reported by Kulick and Hu and by Freedle, and have been enthusiastically discussed in more recent literature. Examining the…
Descriptors: Test Bias, Test Items, Difficulty Level, Statistical Analysis
Peer reviewed
Guo, Hongwen; Liu, Jinghua; Dorans, Neil; Feigenbaum, Miriam – ETS Research Report Series, 2011
Maintaining score stability is crucial for an ongoing testing program that administers several tests per year over many years. One way to stall the drift of the score scale is to use an equating design with multiple links. In this study, we use the operational and experimental SAT® data collected from 44 administrations to investigate the effect…
Descriptors: Equated Scores, College Entrance Examinations, Reliability, Testing Programs
Peer reviewed
Adedoyin, O. O. – Educational Research and Reviews, 2010
This is a quantitative study, which attempted to detect gender-biased test items in the Botswana Junior Certificate Examination in mathematics. To detect gender-biased test items, a random sample of 4,000 students' responses to mathematics Paper 1 of the Botswana Junior Certificate examination was drawn from the 36,000 students who sat for…
Descriptors: Test Items, Foreign Countries, Statistical Analysis, Gender Bias
Liu, Jinghua; Sinharay, Sandip; Holland, Paul W.; Feigenbaum, Miriam; Curley, Edward – Educational Testing Service, 2009
This study explores the use of a different type of anchor, a "midi anchor", that has a smaller spread of item difficulties than the tests to be equated, and then contrasts its use with the use of a "mini anchor". The impact of different anchors on observed-score equating was evaluated and compared with respect to systematic…
Descriptors: Equated Scores, Test Items, Difficulty Level, Error of Measurement
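The mini/midi distinction above turns on how widely the anchor's item difficulties spread relative to the full test. A small sketch of quantifying that spread, using entirely hypothetical difficulty values (e.g. IRT b-parameters):

```python
import statistics

def difficulty_spread(difficulties):
    """Population standard deviation of a set of item difficulties."""
    return statistics.pstdev(difficulties)

# Hypothetical difficulties on the full test to be equated:
test_b = [-2.0, -1.2, -0.5, 0.0, 0.4, 1.1, 1.8, 2.5]
# A "mini" anchor mirrors the full test's spread; a "midi" anchor
# concentrates items of middle difficulty:
mini_anchor_b = [-2.0, -0.5, 0.4, 2.5]
midi_anchor_b = [-0.5, 0.0, 0.4, 1.1]

mini_spread = difficulty_spread(mini_anchor_b)
midi_spread = difficulty_spread(midi_anchor_b)
```

By construction the midi anchor's spread is smaller than both the mini anchor's and the full test's, which is the property the study contrasts.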
Peer reviewed
von Davier, Alina A., Ed.; Liu, Mei, Ed. – ETS Research Report Series, 2006
This report builds on and extends existing research on population invariance to new tests and issues. The authors lay the foundation for a deeper understanding of the use of population invariance measures in a wide variety of practical contexts. The invariance of linear, equipercentile, and IRT equating methods is examined using data from five…
Descriptors: Equated Scores, Statistical Analysis, Data Collection, Test Format
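Population invariance of equating is commonly summarized with root-mean-squared-difference measures that compare subgroup equating functions to the total-group function at each score point. The sketch below follows that general idea only; the linear equating functions and subgroup weights are made-up assumptions, not the report's data.

```python
import math

def rmsd_at_score(x, subgroup_eqs, total_eq, weights):
    """Root mean squared difference between subgroup equating functions
    and the total-group equating function at raw score x.

    subgroup_eqs : callables e_g(x), one per subgroup
    total_eq     : callable e_T(x) for the total group
    weights      : subgroup proportions summing to 1
    Small values indicate population invariance at score x.
    """
    return math.sqrt(sum(w * (e(x) - total_eq(x)) ** 2
                         for e, w in zip(subgroup_eqs, weights)))

# Hypothetical linear equating functions for two subgroups and the total group:
e_total = lambda x: 1.00 * x + 2.0
e_g1 = lambda x: 1.02 * x + 1.8
e_g2 = lambda x: 0.98 * x + 2.2
value = rmsd_at_score(20, [e_g1, e_g2], e_total, [0.5, 0.5])
```

At x = 20 the two subgroup functions differ from the total-group function by +0.2 and -0.2, so the weighted RMSD is 0.2 score points.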
Zhang, Yanling; Dorans, Neil J.; Matthews-López, Joy L. – College Board, 2005
Statistical procedures for detecting differential item functioning (DIF) are often used as an initial step to screen items for construct irrelevant variance. This research applies a DIF dissection method and a two-way classification scheme to SAT Reasoning Test™ verbal section data and explores the effects of deleting sizable DIF items on reported…
Descriptors: Test Bias, Test Items, Statistical Analysis, Classification
Peer reviewed
Freedle, Roy O. – Harvard Educational Review, 2004
I see much to be pleased with in Dorans' interesting response to my article, "Correcting the SAT's Ethnic and Social-Class Bias: A Method for Reestimating SAT Scores." However, I need to deal with several unstated assumptions and errors that underlie his presentation. In the process of enumerating his covert assumptions, I will take up…
Descriptors: Aptitude Tests, Scores, Statistical Analysis, African American Students
Liu, Jinghua; Allspach, Jill R.; Feigenbaum, Miriam; Oh, Hyeon-Joo; Burton, Nancy – College Entrance Examination Board, 2004
This study evaluated whether the addition of a writing section to the SAT Reasoning Test™ (referred to as the SAT® in this study) would impact test-taker performance because of fatigue caused by increased test length. The study also investigated test-takers' subjective feelings of fatigue. Ninety-seven test-takers were randomly assigned to three…
Descriptors: College Entrance Examinations, Writing Skills, Fatigue (Biology), Influences
Liu, Jinghua; Feigenbaum, Miriam; Cook, Linda – College Entrance Examination Board, 2004
This study explored possible configurations of the new SAT® critical reading section without analogy items. The item pool contained items from SAT verbal (SAT-V) sections of 14 previously administered SAT tests, calibrated using the three-parameter logistic IRT model. Multiple versions of several prototypes that do not contain analogy items were…
Descriptors: College Entrance Examinations, Critical Reading, Logical Thinking, Difficulty Level
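The abstract above notes that items were calibrated with the three-parameter logistic (3PL) IRT model. The model's item response function is standard; a minimal sketch, with hypothetical parameter values:

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL IRT model:

        P(theta) = c + (1 - c) / (1 + exp(-D * a * (theta - b)))

    a: discrimination, b: difficulty, c: lower asymptote (pseudo-guessing),
    D: scaling constant (1.7 approximates the normal ogive metric).
    """
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))

# At theta == b the model returns c + (1 - c) / 2:
p = p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2)  # → 0.6
```

The lower asymptote c keeps the predicted probability above chance level even for very low-ability examinees, which is why the 3PL is favored for multiple-choice item pools like the one described.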