|
|
Pub Date: |
2013-00-00 |
Pub Type(s): |
Journal Articles; Reports - Evaluative |
Peer Reviewed: |
Yes |
|
|
|
Descriptors:
Effect Size; Test Bias; Item Analysis; Statistical Analysis; Sample Size; Research Design; Decision Making; Graphs; Scores
Abstract:
There are numerous statistical procedures for detecting items that function differently across subgroups of examinees that take a test or survey. However, in endeavouring to detect items that may function differentially, selection of the statistical method is only one of many important decisions. In this article, we discuss the important decisions that affect investigations of differential item functioning (DIF) such as choice of method, sample size, effect size criteria, conditioning variable, purification, DIF amplification, DIF cancellation, and research designs for evaluating DIF. Our review highlights the necessity of matching the DIF procedure to the nature of the data analysed, the need to include effect size criteria, the need to consider the direction and balance of items flagged for DIF, and the need to use replication to reduce Type I errors whenever possible. Directions for future research and practice in using DIF to enhance the validity of test scores are provided. (Contains 2 tables, 3 figures, and 1 note.)
|
Author(s): |
Kline, Rex B. |
Source: |
Educational Research and Evaluation, v19 n2-3 p204-222 2013 |
|
Pub Date: |
2013-00-00 |
Pub Type(s): |
Journal Articles; Reports - Research |
Peer Reviewed: |
Yes |
|
|
|
Descriptors:
Factor Analysis; Social Justice; Psychometrics; Test Bias; Group Membership; Structural Equation Models; Culture Fair Tests; Error of Measurement; Statistical Analysis; Scores
Abstract:
Test fairness and test bias are not synonymous concepts. Test bias refers to statistical evidence that the psychometrics or interpretation of test scores depends on group membership, such as gender or race, when such differences are not expected. A test that is grossly biased may be judged to be unfair, but test fairness concerns the broader, more subjective evaluation of assessment outcomes from perspectives of social justice. Thus, the determination of test fairness is not solely a matter of statistics, but statistical evidence is important when evaluating test fairness. This work introduces the use of the structural equation modelling technique of multiple-group confirmatory factor analysis (MGCFA) to evaluate hypotheses of measurement invariance, or whether a set of observed variables measures the same factors with the same precision over different populations. An example of testing for measurement invariance with MGCFA in an actual, downloadable data set is also demonstrated. (Contains 4 tables, 1 figure, and 4 notes.)
|
Author(s): |
Zhou, Mingming |
Source: |
Educational Psychology, v33 n1 p1-13 2013 |
|
Pub Date: |
2013-00-00 |
Pub Type(s): |
Journal Articles; Reports - Research |
Peer Reviewed: |
Yes |
|
|
|
Descriptors:
Academic Achievement; Item Analysis; Undergraduate Students; Goal Orientation; Prediction; Futures (of Society); Pretests Posttests; Scores; Reading Tests; Correlation; Statistical Analysis; Profiles
Abstract:
In this study, undergraduate students provided confidence ratings to predict future performance in answering questions drawn from the text before reading the text, after reading the text, and after rereading the text. Self-reports of achievement goal orientations during reading and posttest scores were also collected. Students' calibration index was the comparison between their predicted posttest performance and their actual posttest performance. Correlational analyses did not reveal any statistically detectable relationships between self-reported goal orientations and monitoring accuracy, except that bias scores were marginally related to goal orientations. Further cluster analyses and analyses of variance (ANOVA) also showed that students' multiple goal profiles failed to clearly differentiate the groups in terms of their calibration accuracy, yet performance-approach goals did distinguish overconfident from underconfident students. Plausible reasons for the finding were provided and implications for future research were also discussed. (Contains 3 tables.)
|
|
|
Pub Date: |
2013-01-00 |
Pub Type(s): |
Journal Articles; Reports - Research |
Peer Reviewed: |
Yes |
|
|
|
Descriptors:
Sexuality; Females; Well Being; Sexual Identity; Homosexuality; Depression (Psychology); Self Esteem; Least Squares Statistics; Interpersonal Attraction; Social Support Groups; Anxiety; Correlation; Scores; Prediction; Rating Scales
Abstract:
Identity-based conceptualizations of sexual orientation may not account adequately for variation in young women's sexuality. Sexual minorities fare worse in psychosocial markers of wellbeing (i.e., depressive symptoms, anxiety, self-esteem, social support) than heterosexual youth; however, it remains unclear whether these health disparities exclusively affect individuals who adopt a sexual minority identity or if they also may be present among heterosexually-identified youth who report same-sex attractions. We examined the relationship between sexual attraction, sexual identity, and psychosocial wellbeing in the female-only subsample (weighted, n = 391) of a national sample of emerging adults (age 18-24). Women in this study rated on a scale from 1 (not at all) to 5 (extremely) their degree of sexual attraction to males and females, respectively. From these scores, women were divided into 4 groups (low female/low male attraction, low female/high male attraction, high female/low male attraction, or high female/high male attraction). We explored the relationship between experiences of attraction, reported sexual identity, and psychosocial outcomes using ordinary least squares regression. The results indicated sexual attraction to be predictive of women's psychosocial wellbeing as much as or more than sexual identity measures. We discuss these findings in terms of the diversity found in young women's sexuality, and how sexual minority status may be experienced by this group.
|
|
|
Pub Date: |
2013-00-00 |
Pub Type(s): |
Journal Articles; Opinion Papers |
Peer Reviewed: |
Yes |
|
|
|
Descriptors:
Evidence; Ethics; Validity; Theories; Test Interpretation; Test Use; Scores; Beliefs
Abstract:
According to Kane (this issue), "the validity of a proposed interpretation or use depends on how well the evidence supports" the claims being made. Because truth and evidence are distinct, this means that the validity of a test score interpretation could be high even though the interpretation is false. As an illustration, we discuss the case of phlogiston measurement as it existed in the 18th century. At face value, Kane's theory would seem to imply that interpretations of phlogiston measurement were valid in the 18th century (because the evidence for them was strong), even though amounts of phlogiston do not exist and hence cannot be measured. We suggest that this neglects an important aspect of validity and suggest various ways in which Kane's theory could meet this challenge.
|
Author(s): |
Kane, Michael T. |
Source: |
Journal of Educational Measurement, v50 n1 p115-122 Spr 2013 |
|
Pub Date: |
2013-00-00 |
Pub Type(s): |
Journal Articles; Opinion Papers |
Peer Reviewed: |
Yes |
|
|
|
Descriptors:
Validity; Test Interpretation; Test Use; Scores; Inferences; Generalization; Test Results; Decision Making; Beliefs; Ethics; Evidence
Abstract:
This response to the comments contains three main sections, each addressing a subset of the comments. In the first section, I will respond to the comments by Brennan, Haertel, and Moss. All of these comments suggest ways in which my presentation could be extended or improved; I generally agree with their suggestions, so my response to their comments is brief. In the second section, I will respond to suggestions by Newton and Sireci that my framework be simplified by employing only one kind of argument, a validity argument, and dropping the interpretation/use argument (IUA); I am sympathetic to their desire for greater simplicity, but I see considerable value in keeping the IUA as a framework for the validation effort and will argue for keeping both the IUA and the validity argument. In the third section, I will respond to Borsboom and Markus, who raise a fundamental objection to my approach to validation, suggesting that I give too much attention to justification and too little to truth as a criterion for validity; I don't accept their proposed conception of validity, and I will indicate why. (Contains 1 note.)
|
Author(s): |
Kane, Michael T. |
Source: |
Journal of Educational Measurement, v50 n1 p1-73 Spr 2013 |
|
Pub Date: |
2013-00-00 |
Pub Type(s): |
Journal Articles; Reports - Evaluative |
Peer Reviewed: |
Yes |
|
|
|
Descriptors:
Test Interpretation; Validity; Scores; Test Use; Test Results; Construct Validity; Content Validity; Generalization; Performance Tests; Item Response Theory; Sampling; Inferences; Reliability; Evidence; Theories
Abstract:
To validate an interpretation or use of test scores is to evaluate the plausibility of the claims based on the scores. An argument-based approach to validation suggests that the claims based on the test scores be outlined as an argument that specifies the inferences and supporting assumptions needed to get from test responses to score-based interpretations and uses. Validation then can be thought of as an evaluation of the coherence and completeness of this interpretation/use argument and of the plausibility of its inferences and assumptions. In outlining the argument-based approach to validation, this paper makes eight general points. First, it is the proposed score interpretations and uses that are validated and not the test or the test scores. Second, the validity of a proposed interpretation or use depends on how well the evidence supports the claims being made. Third, more-ambitious claims require more support than less-ambitious claims. Fourth, more-ambitious claims (e.g., construct interpretations) tend to be more useful than less-ambitious claims, but they are also harder to validate. Fifth, interpretations and uses can change over time in response to new needs and new understandings leading to changes in the evidence needed for validation. Sixth, the evaluation of score uses requires an evaluation of the consequences of the proposed uses; negative consequences can render a score use unacceptable. Seventh, the rejection of a score use does not necessarily invalidate a prior, underlying score interpretation. Eighth, the validation of the score interpretation on which a score use is based does not validate the score use. (Contains 1 figure and 1 note.)
|
|