ERIC - Search Results

Showing all 14 results

Are Speeded Tests Unfair? Modeling the Impact of Time Limits on the Gender Gap in Mathematics

Peer reviewed

Stoevenbelt, Andrea H.; Wicherts, Jelte M.; Flore, Paulette C.; Phillips, Lorraine A. T.; Pietschnig, Jakob; Verschuere, Bruno; Voracek, Martin; Schwabe, Inga – Educational and Psychological Measurement, 2023

When cognitive and educational tests are administered under time limits, tests may become speeded and this may affect the reliability and validity of the resulting test scores. Prior research has shown that time limits may create or enlarge gender gaps in cognitive and academic testing. On average, women complete fewer items than men when a test…

Descriptors: Timed Tests, Gender Differences, Item Response Theory, Correlation

A Short Note on Optimizing Cost-Generalizability via a Machine-Learning Approach

Peer reviewed

Jiang, Zhehan; Shi, Dexin; Distefano, Christine – Educational and Psychological Measurement, 2021

The costs of an objective structured clinical examination (OSCE) are of concern to health profession educators globally. As OSCEs are usually designed under generalizability theory (G-theory) framework, this article proposes a machine-learning-based approach to optimize the costs, while maintaining the minimum required generalizability…

Descriptors: Artificial Intelligence, Generalizability Theory, Objective Tests, Foreign Countries
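
The abstract does not describe the machine-learning procedure itself. As a point of reference, the classical generalizability-theory decision study that such cost optimization builds on can be sketched for the simplest case, a hypothetical one-facet crossed persons × stations design. Here the generalizability coefficient is Eρ² = σ²_p / (σ²_p + σ²_pi,e / n_i), and the cheapest design is the smallest number of stations n_i that reaches a target coefficient. The variance components, the 0.80 target, the per-station cost, and the function names are all made up for illustration, and this brute-force search is not the authors' method.

```python
def g_coefficient(var_person: float, var_residual: float, n_stations: int) -> float:
    """Generalizability coefficient for a one-facet crossed p x i design (relative decisions)."""
    return var_person / (var_person + var_residual / n_stations)


def cheapest_design(var_person: float, var_residual: float, target: float,
                    cost_per_station: float, max_stations: int = 100):
    """Return (n_stations, coefficient, cost) for the fewest stations meeting the target, or None."""
    for n in range(1, max_stations + 1):
        coef = g_coefficient(var_person, var_residual, n)
        if coef >= target:
            return n, coef, n * cost_per_station
    return None  # target not reachable within max_stations


# Hypothetical variance components from a G-study (not taken from the article)
result = cheapest_design(var_person=0.5, var_residual=1.8, target=0.80, cost_per_station=100.0)
print(result)
```

With these made-up values the search stops at 15 stations, because 14 stations give a coefficient of only about 0.795.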

Survey Satisficing Inflates Reliability and Validity Measures: An Experimental Comparison of College and Amazon Mechanical Turk Samples

Peer reviewed

Hamby, Tyler; Taylor, Wyn – Educational and Psychological Measurement, 2016

This study examined the predictors and psychometric outcomes of survey satisficing, wherein respondents provide quick, "good enough" answers (satisficing) rather than carefully considered answers (optimizing). We administered surveys to university students and respondents--half of whom held college degrees--from a for-pay survey website,…

Descriptors: Surveys, Test Reliability, Test Validity, Comparative Analysis

Improving the Factor Structure of Psychological Scales: The Expanded Format as an Alternative to the Likert Scale Format

Peer reviewed

Zhang, Xijuan; Savalei, Victoria – Educational and Psychological Measurement, 2016

Many psychological scales written in the Likert format include reverse worded (RW) items in order to control acquiescence bias. However, studies have shown that RW items often contaminate the factor structure of the scale by creating one or more method factors. The present study examines an alternative scale format, called the Expanded format,…

Descriptors: Factor Structure, Psychological Testing, Alternative Assessment, Test Items

Developing a Measure of General Academic Ability: An Application of Maximal Reliability and Optimal Linear Combination to High School Students' Scores

Peer reviewed

Dimitrov, Dimiter M.; Raykov, Tenko; AL-Qataee, Abdullah Ali – Educational and Psychological Measurement, 2015

This article is concerned with developing a measure of general academic ability (GAA) for high school graduates who apply to colleges, as well as with the identification of optimal weights of the GAA indicators in a linear combination that yields a composite score with maximal reliability and maximal predictive validity, employing the framework of…

Descriptors: Foreign Countries, Academic Ability, Aptitude Tests, High School Students
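
The abstract stops before the formulas. A minimal sketch, assuming the indicators follow a single-factor congeneric model with loadings λ_i and error variances θ_i (an assumption; the article's own model may differ), uses the standard result that weights proportional to λ_i/θ_i maximize the reliability of a linear composite. The resulting maximal reliability is H = S/(1 + S) with S = Σ λ_i²/θ_i. The loadings and function names are hypothetical, and the predictive-validity weights the article also derives are not covered.

```python
from typing import Sequence


def composite_reliability(weights: Sequence[float], loadings: Sequence[float],
                          errors: Sequence[float]) -> float:
    """Reliability of sum(w_i * X_i) when X_i = loading_i * F + E_i, Var(F) = 1, Var(E_i) = error_i."""
    true_part = sum(w * l for w, l in zip(weights, loadings)) ** 2
    error_part = sum(w * w * e for w, e in zip(weights, errors))
    return true_part / (true_part + error_part)


def maximal_reliability(loadings: Sequence[float], errors: Sequence[float]):
    """Optimal weights (proportional to loading / error variance) and the resulting reliability H."""
    weights = [l / e for l, e in zip(loadings, errors)]
    s = sum(l * l / e for l, e in zip(loadings, errors))
    return weights, s / (1.0 + s)


# Hypothetical standardized loadings; error variances are 1 - loading^2
loadings = [0.8, 0.6, 0.4]
errors = [1 - l * l for l in loadings]
weights, h = maximal_reliability(loadings, errors)
unit = composite_reliability([1.0, 1.0, 1.0], loadings, errors)
print(f"optimal weights: {[round(w, 2) for w in weights]}")
print(f"maximal reliability H = {h:.3f}; unit-weighted reliability = {unit:.3f}")
```

Plugging the optimal weights back into the general composite formula reproduces H, and with these hypothetical loadings H is noticeably higher than the reliability of the simple unweighted sum.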

Multilevel Higher-Order Item Response Theory Models

Peer reviewed

Huang, Hung-Yu; Wang, Wen-Chung – Educational and Psychological Measurement, 2014

In the social sciences, latent traits often have a hierarchical structure, and data can be sampled from multiple levels. Both hierarchical latent traits and multilevel data can occur simultaneously. In this study, we developed a general class of item response theory models to accommodate both hierarchical latent traits and multilevel data. The…

Descriptors: Item Response Theory, Hierarchical Linear Modeling, Computation, Test Reliability

Investigating Halo and Ceiling Effects in Student Evaluations of Instruction

Peer reviewed

Keeley, Jared W.; English, Taylor; Irons, Jessica; Henslee, Amber M. – Educational and Psychological Measurement, 2013

Many measurement biases affect student evaluations of instruction (SEIs). However, two have been relatively understudied: halo effects and ceiling/floor effects. This study examined these effects in two ways. To examine the halo effect, using a videotaped lecture, we manipulated specific teacher behaviors to be "good" or "bad"…

Descriptors: Robustness (Statistics), Test Bias, Course Evaluation, Student Evaluation of Teacher Performance

Immediate Feedback and Opportunity to Revise Answers to Open-Ended Questions

Peer reviewed

Attali, Yigal; Powers, Don – Educational and Psychological Measurement, 2010

Two experiments examine the psychometric effects of providing immediate feedback on the correctness of answers to open-ended questions, and allowing participants to revise their answers following feedback. Participants answering verbal and math questions are able to correct many of their initial incorrect answers, resulting in higher revised…

Descriptors: Feedback (Response), Psychometrics, Test Anxiety, Error Correction

An Investigation of Calculator Use on Employment Tests of Mathematical Ability: Effects on Reliability, Validity, Test Scores, and Speed of Completion

Peer reviewed

Bing, Mark N.; Stewart, Susan M.; Davison, H. Kristl – Educational and Psychological Measurement, 2009

Handheld calculators have been used on the job for more than 30 years, yet the degree to which these devices can affect performance on employment tests of mathematical ability has not been thoroughly examined. This study used a within-subjects research design (N = 167) to investigate the effects of calculator use on test score reliability, test…

Descriptors: Calculators, Mathematics Tests, Occupational Tests, Test Reliability

Measurement of Epistemological Beliefs: Psychometric Properties of the EQEBI Test Scores

Peer reviewed

Ordonez, Xavier G.; Ponsoda, Vicente; Abad, Francisco J.; Romero, Sonia J. – Educational and Psychological Measurement, 2009

This article proposes a new test (called the EQEBI) for the measurement of epistemological beliefs, integrating and extending the Epistemological Questionnaire (EQ) and the Epistemic Beliefs Inventory (EBI). In Study 1, the two tests were translated and applied to a Spanish-speaking sample. A detailed dimensionality exploration, by means of the…

Descriptors: Epistemology, Beliefs, Tests, Spanish Speaking

Setting the Response Time Threshold Parameter to Differentiate Solution Behavior from Rapid-Guessing Behavior

Peer reviewed

Kong, Xiaojing J.; Wise, Steven L.; Bhola, Dennison S. – Educational and Psychological Measurement, 2007

This study compared four methods for setting item response time thresholds to differentiate rapid-guessing behavior from solution behavior. Thresholds were either (a) common for all test items, (b) based on item surface features such as the amount of reading required, (c) based on visually inspecting response time frequency distributions, or (d)…

Descriptors: Test Items, Reaction Time, Timed Tests, Item Response Theory
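
Whichever method sets the thresholds, they are applied the same way: a response faster than its item's threshold is classified as a rapid guess, otherwise as solution behavior. The following is a minimal sketch in which the function names, response times, and threshold values are hypothetical. A single number stands in for method (a), and a per-item list stands in for item-specific thresholds such as those produced by methods (b) and (c).

```python
from typing import List, Sequence, Union


def flag_solution_behavior(times: Sequence[float],
                           thresholds: Union[float, Sequence[float]]) -> List[bool]:
    """True = solution behavior (time >= threshold); False = rapid guess (time < threshold)."""
    if isinstance(thresholds, (int, float)):
        # Method (a): one threshold shared by every item
        thresholds = [float(thresholds)] * len(times)
    if len(thresholds) != len(times):
        raise ValueError("need exactly one threshold per item")
    return [t >= th for t, th in zip(times, thresholds)]


def solution_rate(flags: Sequence[bool]) -> float:
    """Share of an examinee's responses classified as solution behavior."""
    return sum(flags) / len(flags)


# Hypothetical response times (seconds) for one examinee on five items
times = [1.2, 14.5, 8.3, 0.9, 22.0]

common = flag_solution_behavior(times, 3.0)
per_item = flag_solution_behavior(times, [1.0, 20.0, 5.0, 0.5, 10.0])
print(common, solution_rate(common))      # [False, True, True, False, True] 0.6
print(per_item, solution_rate(per_item))  # [True, False, True, True, True] 0.8
```

The same five responses are classified differently under the two threshold schemes, which is why the choice of method matters for any downstream analysis of rapid guessing.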

The Munroe Multicultural Attitude Scale Questionnaire: A New Instrument for Multicultural Studies

Peer reviewed

Munroe, Arnold; Pearson, Carolyn – Educational and Psychological Measurement, 2006

Institutions of higher education want to diversify their learning climates, and many offer courses in multiculturalism, yet these courses still do not meet the needs of attitudinal change. A new instrument was developed, the Munroe Multicultural Attitude Scale Questionnaire (MASQUE), that was theoretically based in Banks's transformative approach,…

Descriptors: Higher Education, Colleges, Data Analysis, Test Reliability

Reliability and Validity Evidence for the Institutional Integration Scale

Peer reviewed

French, Brian F.; Oakes, William – Educational and Psychological Measurement, 2004

The Institutional Integration Scale is claimed to measure five facets of college student academic and social integration. The scale was based on Tinto's model of college student withdrawal. Psychometric properties of the scale were examined based on a sample of 1st-year college students. These results led to item revisions and additions. The scale…

Descriptors: Measures (Individuals), Psychometrics, Social Integration, Test Validity

Impact of the Number of Response Categories and Anchor Labels on Coefficient Alpha and Test-Retest Reliability

Peer reviewed

Weng, Li-Jen – Educational and Psychological Measurement, 2004

A total of 1,247 college students participated in this study on the effect of scale format on the reliability of Likert-type rating scales. The number of response categories ranged from 3 to 9. Anchor labels on the scales were provided for each response option or for the end points only. The results indicated that the scales with few response…

Descriptors: Rating Scales, Test Reliability, Foreign Countries, College Students
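
The two reliability indices compared in this study can be sketched briefly. Coefficient alpha for k items is α = k/(k − 1) · (1 − Σσ²_i / σ²_X), where σ²_i are the item variances and σ²_X is the variance of the total score. Test-retest reliability is the Pearson correlation between total scores from two administrations. The ratings and function names below are hypothetical. The sketch uses sample (n − 1) variances throughout, which leaves alpha unchanged because the common denominator cancels in the ratio.

```python
from statistics import mean, variance
from typing import Sequence


def coefficient_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are items (e.g. Likert ratings)."""
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, used here as a test-retest reliability estimate."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical ratings: 4 respondents x 3 items, first administration
time1 = [[1, 2, 2], [2, 3, 3], [3, 3, 4], [4, 5, 4]]
# Hypothetical total scores for the same respondents at a second administration
time2_totals = [6, 7, 11, 12]

print(f"alpha = {coefficient_alpha(time1):.3f}")
print(f"test-retest r = {pearson_r([sum(r) for r in time1], time2_totals):.3f}")
```

Alpha reflects internal consistency within a single administration, while the test-retest coefficient reflects stability across administrations, so the two can respond differently to changes in the number of response categories or anchor labels.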
