Monday, June 11, 2007

composite reliability: the reliability of each composite score

Weaknesses of Cronbach's alpha

  • traditional reliability measures (e.g., Cronbach's alpha) are obtained under restrictive assumptions such as parallelism, i.e., equal factor loadings for all indicators and, in some cases, equal error variances
  • reliability represents the portion of the item variance that is systematic
  • Coefficient alpha comes out of the classical test theory model. This model assumes uncorrelated errors of measurement
  • measures of reliability (and, by extension, validity) are population specific
  • Alpha underestimates the reliability of congeneric measures.
  • Cronbach’s α provides the lower-bound estimate for the composite score reliability.
  • alpha equals the classical reliability coefficient rho (alpha = rho) only when the assumptions of parallelism or tau-equivalence are satisfied. Alpha is suspect as an estimate of reliability in the presence of correlated errors.
  • Cronbach’s α may over- or underestimate reliability (Raykov, 1997, 1998a).
  • α is usually interpretable as a lower bound for reliability (with uncorrelated errors).
    Thus, a method is needed that allows the researcher to evaluate the reliability of a composite score more accurately, e.g., methods that yield a standard error for the composite's reliability and a confidence interval for it, representing a range of plausible values of its population counterpart. Raykov and Shrout's (2002) paper suggested a bootstrap method.
  • coefficient alpha can be computed using standard formulae after first obtaining an ML estimate of the covariance matrix
  • Cronbach’s α, its use with multidimensional measures is limited: Only if the error terms are uncorrelated and scale indicators are essentially τ equivalent will Cronbach’s α estimate the reliability of a scale score correctly (Raykov, 1998).
  • underestimation of the true reliability may become serious when the test is not unidimensional. This is a concern when one is interested in estimating the reliability of multidimensional scale scores. For example, one might have a multidimensional scale measuring self-efficacy for collegiate athletes (Czerniack, 2002), composed of three dimensions of self-efficacy (athletic, academic, and social life). One might be interested in measuring overall self-efficacy as the sum of the scores from all three dimensions, as well as in measuring self-efficacy for each dimension separately. Coefficient alpha for the total scale score will underestimate the true reliability because of the multidimensional nature of the scale. When the scale was multidimensional, most of the reliability coefficients, including coefficient alpha, underestimated the true reliability of the scale, and this underestimation was more prominent when the correlation between dimensions was lower. Therefore, it is not appropriate to use coefficient alpha as an estimate of the reliability of a multidimensional composite scale score, unless the correlation between dimensions is high.
    Akihito Kamata, Ahmet Turhan, Eqbal Darandari (2003) Estimating Reliability for
    Multidimensional Composite Scale Scores. Paper presented at the annual meeting of American Educational Research Association, Chicago, April 2003.
  • Alpha is not a "desirable" estimate of reliability of a scale. The main reason that Bollen opposes alpha is that it "makes no allowances for correlated error of measurements, nor does it treat indicators influenced by more than one latent variable" (Bollen, 1989, p.221).
  • Bollen recommends the coefficient of determination (the squared multiple correlation for x_i) as an alternative, good measure of the reliability of a composite scale comprised of a number of manifest variables. Few publications actually use the alternative that Bollen has advocated for measuring reliability.
  • alpha is a lower bound to the reliability of an unweighted scale of N items, so the actual reliability of a set of congeneric measures can be higher than alpha; in other words, alpha provides a conservative estimate of a measure's reliability. Cronbach's alpha yields an unbiased estimate of reliability only if the loadings on the common factor are equal; given uncorrelated errors, it never overestimates the reliability of the composite score. Alpha equals composite score reliability only for essentially tau-equivalent tests, and is an underestimate of composite reliability for more general congeneric tests. With alpha, the researcher is generally left only with a lower bound of (thus possibly not sufficient information about) the reliability coefficient of interest, with no evaluation of the extent of the bias. Bollen (1989, pp. 217-218) discussed an alternative reliability formula that is not directly applicable in empirical research.
  • Alpha isn't really good for much in applications except rough estimation of a linear
    composite's classic reliability or, more generally, its common-factor communality.
  • Coefficient alpha, the most commonly used estimate of internal consistency, is often
    considered a lower bound estimate of reliability, though the extent of its underestimation is not typically known. Many researchers are unaware that coefficient alpha is based on the essentially tau-equivalent measurement model. It is the violation of the assumptions
    required by this measurement model that is often responsible for coefficient alpha’s
    underestimation of reliability.
    The most commonly used measure of internal consistency, coefficient alpha, is
    based on the essentially tau-equivalent measurement model, a measurement model
    that requires a number of assumptions to be met for the estimate to accurately reflect
    the data’s true reliability (Raykov, 1997a). Violation of these assumptions causes coefficient alpha to underestimate the true reliability of the data (Miller, 1995).
    Estimates of reliability within Classical test theory assume that all observed variables measure a single latent true variable. Many researchers erroneously believe that reliability provides a measure of test unidimensionality. In actuality, reliability assumes that unidimensionality exists (Miller, 1995). Failure to meet the assumption of unidimensionality will result in an inaccurate and often misleading estimate of reliability.
    Coefficient alpha underestimates the reliability of test scores when the test violates the assumption of tau-equivalence. Specifically, the larger the violation of tau-equivalence that occurs, the more coefficient alpha underestimates score reliability. Both the present example and previous work (Raykov, 1997b) have demonstrated that the presence of even a single item that is not tau-equivalent to the other items can have a dramatic impact on the accuracy of coefficient alpha; however, the impact that violating the assumption of tau-equivalence can have is also dependent on a number of other factors.
    Tests with a greater number of items are less vulnerable to underestimation when tau-equivalence is violated than tests with only a small number of items (Raykov, 1997b). This is due to the fact that, when a single item violates tau-equivalence, the proportion of true score variance that is congeneric to the other item true scores is smaller when one has a greater number of items than when one has fewer items.
    James M. Graham (2006) Congeneric and (Essentially) Tau-Equivalent Estimates of
    Score Reliability :What They Are and How to Use Them. Educational and Psychological Measurement, 66 (6), 930-944
  • Amos output gives the squared multiple correlations (SMC) for the observed and for the latent variables. The Amos user's guide defines a variable's squared multiple correlation as "the proportion of its variance that is accounted for by its predictors" and notes that it has to be regarded as a lower-bound estimate of the reliability. SMC is different from composite reliability: SMC = 1 - (error variance / total variance). SMC in the conventional model may be biased and somewhat uninterpretable in some situations. Squared multiple correlations of the indicators are a measure of their reliability, analogous to the square of a factor loading in an exploratory factor analysis, i.e., the common variance between the indicator and the factor (here, the latent variable). We want high values of the squared multiple correlations of the items, because they yield higher values of composite
    reliability and AVE.
  • The square multiple correlation (SMC) for the latent variable has the same interpretation
    as in regression analysis. It is a measure of the capability of your model to explain the dependent variable.
  • If you have a low value of SMC for an endogenous variable, your model explains little variance of the dependent variable.
  • Convergent validity also requires that SMCs be equal to or greater than .5, along with pattern coefficients equal to or greater than .7.
  • Reliability (as assessed by Cronbach's alpha) is based on indicator intercorrelations: the higher they are, the higher alpha is. This is why alpha has been assumed to be an index of unidimensionality; in fact it is NOT. A high alpha does not guarantee unidimensionality.
  • Raykov (2001) has demonstrated that Cronbach's alpha can be both an under- or over-estimate of reliability, contingent upon the amount of residual covariance amongst the congeneric indicators. Thus, the more unmodelled residual covariance, the more Cronbach's alpha will be an overestimate of reliability.
  • We must reverse the scoring of that reversed item before assessing reliability with alpha.
  • In SEM terms, the reliability of an indicator is defined as the variance in that indicator that is not accounted for by measurement error. It is commonly represented by the squared multiple correlation coefficient, which ranges from 0 to 1 (Bollen, 1989; Jöreskog & Sörbom, 1993a). However, because these coefficients are standardized, they are not useful for comparing reliability across subpopulations.
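Several of the points above (alpha as a lower bound for congeneric measures) can be checked numerically. The sketch below uses illustrative loadings chosen here for demonstration (not taken from any cited study) and computes both alpha and the composite reliability implied by a one-factor congeneric model:

```python
import numpy as np

# Congeneric model: x_i = lambda_i * T + e_i, with Var(T) = 1.
# Unequal loadings, so the items are congeneric but NOT tau-equivalent.
loadings = np.array([0.9, 0.7, 0.5, 0.3])
errors = np.array([0.19, 0.51, 0.75, 0.91])   # error variances

# Population covariance matrix implied by the model
Sigma = np.outer(loadings, loadings) + np.diag(errors)

k = len(loadings)
total_var = Sigma.sum()

# Cronbach's alpha computed from the covariance matrix
alpha = (k / (k - 1)) * (1 - np.trace(Sigma) / total_var)

# True composite (congeneric) reliability, often called omega
omega = loadings.sum() ** 2 / total_var

print(round(alpha, 3), round(omega, 3))  # 0.677 0.709
assert alpha < omega  # alpha underestimates because loadings are unequal
```

With equal loadings the two values coincide; the more unequal the loadings, the larger the gap.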

Composite reliability

  • composite reliability is a measure of the overall reliability of a collection of heterogeneous
    but similar items
  • individual item reliability (testing the reliability of the items using Cronbach's alpha) vs. composite reliability (of the construct, the latent variable)
  • The factor loadings are simply the correlation of each indicator with the composite
    (construct factor), and the factor correlations are obtained by correlating the composites.
  • To calculate composite reliability for the latent variables: LISREL does not output composite reliability directly; you have to calculate it by hand.
  • SEM approach for reliability analysis: the reliability estimate from the SEM approach tends to be higher than Cronbach’s α. A structural equation model can be used to estimate
    the reliability of a composite consisting of congeneric measures.
  • Composite reliability: a measure of scale reliability that assesses the internal consistency of a measure; see Fornell & Larcker (1981)

composite reliability = (sum of standardized loadings)² / [(sum of standardized loadings)² + sum of indicator measurement errors], where the measurement error for each indicator (the variance due to random measurement error) is 1 minus the square of its loading.

Let A be the standardized loadings for the indicators for a particular latent variable. Let B be the corresponding error terms, where error is 1 minus the reliability of the indicator; the reliability of the indicator is the square of the indicator's standardized loading.
The reliability of a measure is that part containing no purely random error
(Carmines & Zeller, 1979).

composite reliability = [SUM(A)]² / {[SUM(A)]² + SUM(B)}.

  • Example: suppose a construct has three indicators, i1, i2, and i3. Running this construct in AMOS gives standardized regression weights of 0.7, 0.8, and 0.9. The composite reliability is then:
    CR = (sum of standardized loadings)² / [(sum of standardized loadings)² + sum of indicator measurement errors]
    CR = (0.7 + 0.8 + 0.9)² / [(0.7 + 0.8 + 0.9)² + (1 - 0.49) + (1 - 0.64) + (1 - 0.81)]
    CR = 5.76 / (5.76 + 1.06)
    CR = 0.845
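The worked example above can be expressed as a short function; the helper name `composite_reliability` is introduced here for illustration:

```python
# Composite reliability (Fornell & Larcker, 1981) from standardized loadings.
def composite_reliability(loadings):
    num = sum(loadings) ** 2                     # (sum of loadings) squared
    errors = sum(1 - l ** 2 for l in loadings)   # 1 - loading^2 per indicator
    return num / (num + errors)

# The three-indicator example: loadings 0.7, 0.8, 0.9
cr = composite_reliability([0.7, 0.8, 0.9])
print(round(cr, 4))  # 0.8446
```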
  • Average variance extracted (AVE); see Fornell & Larcker (1981).
    The variance extracted estimate measures the amount of variance captured by a construct in relation to the variance due to random measurement error.

variance extracted = sum of squared standardized loadings / (sum of squared standardized loadings + sum of indicator measurement errors), where each indicator's measurement error (the variance due to random measurement error) is 1 minus the square of its loading.

variance extracted = SUM(A²) / [SUM(A²) + SUM(ei)].

  • Example (using the same three loadings):
    AVE = (sum of squared standardized loadings) / (sum of squared standardized loadings + sum of indicator measurement errors)
    AVE = (0.49 + 0.64 + 0.81) / [(0.49 + 0.64 + 0.81) + (1 - 0.49) + (1 - 0.64) + (1 - 0.81)]
    AVE = 1.94 / (1.94 + 1.06)
    AVE = 0.647
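The AVE example can be sketched the same way; note that with standardized loadings the denominator always equals the number of indicators, so AVE is simply the mean squared loading:

```python
# Average variance extracted (Fornell & Larcker, 1981).
def average_variance_extracted(loadings):
    squared = sum(l ** 2 for l in loadings)      # sum of squared loadings
    errors = sum(1 - l ** 2 for l in loadings)   # 1 - loading^2 per indicator
    return squared / (squared + errors)          # denominator = number of items

ave = average_variance_extracted([0.7, 0.8, 0.9])
print(round(ave, 4))  # 0.6467
```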
  • The composite reliability will certainly be higher than alpha, unless your items are tau-equivalent.
  • To test the reliability of the constructs, reviewers suggested we report composite reliability (CR) and average variance extracted (AVE) instead of Cronbach's alpha (internal consistency of measures). Composite reliability is like the reliability of a summated scale, and average variance extracted is the variance in the indicators explained by the common factor (average trait-related variance extracted). AVEs above 0.5 are treated as indications of convergent validity. This use of CR and AVE emanates from the two-step procedure recommended in Anderson and Gerbing's (1988) Psychological Bulletin article.
  • AVE varies from 0 to 1, and it represents the ratio of the total variance that is due to the latent variable. According to Dillon and Goldstein (1984) and Bagozzi (1991), a variance extracted of greater than 0.50 indicates that the validity of both the construct and the individual variables is high.
  • The composite reliability estimates the extent to which a set of latent construct indicators share in their measurement of a construct, whilst the average variance extracted is the amount of common variance among latent construct indicators (Hair et al., 1998).
  • composite reliability is like the reliability of a summated scale and average variance extracted is the variance in the indicators explained by the common factor.
  • In assessing reliability it is worth also reporting composite reliability, average variance extracted and unidimensionality (if it is stated in theory - some consider unidimensionality a validity test though).
  • Construct reliability (i.e., the degree to which the scale indicators reflect an underlying factor) ; the degree to which the scale score reflects one particular factor
  • Composite reliability (i.e., the total amount of true score variance in relation to the total scale score variance); the amount of scale score variance that is accounted for by all underlying factors; Composite reliability thus corresponds to the conventional notion of reliability in terms of classical test theory
  • Both construct reliability and composite reliability can be computed using the pattern coefficients estimated by exploratory or confirmatory factor analyses.
  • Confidence intervals for composite reliability can be calculated by the bootstrap method to represent a range of plausible values of their population counterparts, thus allowing one to test the hypothesis that the reliability coefficient in question is “generated” by a specific population value (e.g., the null hypothesis that composite reliability is smaller than or equal to .80).
    For example, it is possible to test the hypothesis that the reliability estimate obtained belongs to a population in which the reliability is larger than .70 or .80, which Nunnally and Bernstein (1994) regard as the benchmark for acceptable reliability values. The corresponding null hypothesis is that the population value is smaller than or equal to the respective benchmark value of .70 or .80.
    Martin Brunner and Heinz-Martin SÜß (2005) Analyzing the Reliability of Multidimensional Measures: An Example from Intelligence Research, Educational and Psychological Measurement. 65, 227-240
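A bootstrap percentile interval of this kind can be sketched as follows. This is a simplified stand-in, not the cited procedure: standardized loadings are approximated by first-principal-component loadings of simulated data rather than by a fitted SEM, and the data themselves are simulated here purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def cr_from_data(X):
    # Stand-in for an SEM fit: approximate each item's standardized loading
    # by its loading on the first principal component of the correlation matrix.
    R = np.corrcoef(X, rowvar=False)
    vals, vecs = np.linalg.eigh(R)               # eigenvalues in ascending order
    load = np.abs(vecs[:, -1]) * np.sqrt(vals[-1])
    num = load.sum() ** 2
    return num / (num + (1 - load ** 2).sum())

# Simulated congeneric data (hypothetical loadings, n = 500 respondents)
n, lam = 500, np.array([0.9, 0.8, 0.7, 0.6])
T = rng.standard_normal(n)
X = T[:, None] * lam + rng.standard_normal((n, 4)) * np.sqrt(1 - lam ** 2)

# Nonparametric bootstrap: resample rows, recompute CR each time
boots = [cr_from_data(X[rng.integers(0, n, n)]) for _ in range(1000)]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(round(lo, 2), round(hi, 2))  # percentile CI; reject H0: reliability <= .80
                                   # if .80 falls below the lower limit
```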
  • compute three types of reliability: individual item reliability, composite reliability of the overall scale, and the average variance extracted from the subscales
  • The individual item reliability of the subscales is the "squared standardized factor loading"; the overall reliability of the whole scale is the "composite reliability"
  • Composite reliability should be equal to or greater than .7 and AVE should be greater than .5.
  • Composite reliability above the 0.70 threshold and an extracted variance above the 0.50 threshold recommended by Hair et al. (1998).
    Hair, J.F. Jr., Anderson, R.E., Tatham, R.L., & Black W. C. (1998). Multivariate Data
    Analysis (5th ed.). Upper Saddle River, New Jersey: Prentice Hall. Dillon, W., & Goldstein, M. (1984). Multivariate analysis: Methods and applications. New York: Wiley.
    Bagozzi, R. P. (1991). Further thoughts on the validity of measures of elation, gladness, and joy. Journal of Personality and Social Psychology, 61, 98–104.
    AVE values greater than .50 are considered satisfactory in that they indicate that at least
    50% of the variance in a measure is due to the hypothesized underlying trait (eg., Fornell & Larcker, 1981).
  • Tseng et al. (2006) suggested that composite reliability should be greater than 0.6 and that the average variance extracted should be greater than 0.5.
  • The chi-square difference test is simply a way of checking the discriminant validity between two factors.
  • Reliable alpha coefficients exceed 0.7, the minimum cutoff score (Nunnally, 1978; Nunnally and Bernstein, 1994). Composite reliability is also used to check internal consistency; it should be greater than the benchmark of 0.7 to be considered adequate (Fornell and Larcker, 1981).
  • To evaluate discriminant validity, the average variance extracted (AVE) is used. All constructs have an AVE of at least 0.5 (Fornell and Larcker 1981).
  • PLS-Graph reports composite reliability (CR) and average variance extracted (AVE)
    for content validity and discriminant validity.
    Dan J. Kim & Yujong Hwang (2006) A Study of Mobile Internet Usage from Utilitarian and Hedonic User Tendency Perspectives, Proceedings of the Twelfth Americas Conference on Information Systems, Acapulco, Mexico August 04th-06th 2006
  • standard formula for calculating the reliability of a composite from the reliabilities of the subtests
  • You can take into account the unreliability of single indicator-latent variables by fixing the error variance to (1-reliability) x variance of the measured variable.
  • It is possible to have a poor variance extracted, yet have a high construct reliability.
  • A composite is a combination of variables (measurement error and all). Items with lower reliability are required in greater numbers to make a reliable summated scale; items with higher reliability are required in fewer numbers.
  • "Fit" in the chi-square sense and composite reliability have very little connection. Composite reliability assesses the strength of relations, while fit assesses the *pattern* of relations. The composite reliability formula *assumes* that fit is good (so that parameter estimates are interpretable), while fit assessment actually evaluates fit.
  • You may obtain good composite reliability numbers when individual R2's are low.
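The single-indicator correction mentioned above (fixing the error variance to (1 - reliability) x the variance of the measured variable) is a one-line computation; the reliability and variance values here are hypothetical:

```python
# Fix the error variance of a single-indicator latent variable:
# error variance = (1 - reliability) * observed variance of the indicator.
def fixed_error_variance(reliability, observed_variance):
    return (1 - reliability) * observed_variance

# Hypothetical indicator with known reliability .80 and observed variance 2.5
ev = fixed_error_variance(0.80, 2.5)
print(round(ev, 3))  # 0.5 -- entered as a fixed parameter in the SEM
```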

Useful links:

Reading lists:

  • Bacon, D.R., Sauer, P.L., & Young, M. (1995). Composite reliability in structural equations modeling. Educational and Psychological Measurement, 55(3), 394-406
  • Bagozzi, R.P. & Heatherton, T.F. (1994). A general approach to representing multifaceted personality constructs: Application to state self-esteem. Structural Equation Modeling, 1, 35-67.
  • Bagozzi, R.P. (1994) Structural equation models in marketing research: Basic principles. In R.P. Bagozzi (Ed.), Principles of marketing research (pp. 317–385). Oxford: Blackwell.
  • Chen, Jingxian & Nozer D. Singpurwalla (1996), The Notion of "Composite Reliability" and Its Hierarchical Bayes Estimation, Journal of the American Statistical Association, Vol. 91, No. 436. (Dec., 1996), pp. 1474-1484.
  • Cortina, J. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98-104
  • Dillon, W., & Goldstein, M. (1984). Multivariate analysis: Methods and applications.
    New York: Wiley.
  • Enders, C. K. (2004). The impact of missing data on sample reliability estimates: Implications for reliability reporting practices. Educational and Psychological Measurement, 64(3), 419-436
  • Fornell, C., & Larcker, D.F. (1981). Evaluating structural equation models with unobservable variables and measurement error. Journal of Marketing Research, 18(1), 39-50
  • Green, S.B., Lissitz, R.W., & Mulaik, S.A. (1977). Limitations of coefficient alpha as an index of test unidimensionality. Educational and psychological measurement, 37, 827-838.
  • Hair, J. F., Jr., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate data analysis. Englewood Cliffs, NJ: Prentice-Hall.
  • Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent variable systems. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural Equation Modeling: Present and Future -- Festschrift in honor of Karl Jöreskog (pp. 195-216). Lincolnwood, IL: Scientific Software International, Inc.
  • Komaroff, E. (1997). Effect of simultaneous violations of essential tau-equivalence and uncorrelated error on coefficient alpha. Applied Psychological Measurement, 21(4)
  • Linda Crocker and Jim Algina, Introduction to Classical and Modern Test Theory
  • John Campbell, Frank Landy,& Sheldon Zedeck, Measurement theory for the Behavioral Sciences
  • Miller, M. B. (1995). Coefficient alpha: A basic introduction from the perspectives of classical test theory and structural equation modeling. Structural Equation Modeling, 2(3), 255-273.
  • Netemeyer, R.G., Johnston, M.W., & Burton, S. (1990). Analysis of role conflict and role ambiguity in a structural equations framework. Journal of Applied Psychology, 75, 148-157. Netemeyer et al.'s procedure was as follows: with only factors that have low VE (less than .5), they conducted a chi-square difference test between an unconstrained model and a constrained model (with a unity correlation between factors).
  • Nunnally, J.C. Psychometric Theory, (2nd ed.) McGraw-Hill, New York, 1978.
  • Nunnally, J.C., and Bernstein, I.H. Psychometric Theory, (3rd ed.) McGraw-Hill, New York, 1994.
  • Okazaki, Shintaro (2007), Lessons learned from i-mode: What makes consumers click wireless banner ads? Computers in Human Behavior 23 (2007) 1692–1719
  • Raykov, T. (2004). Estimation of maximal reliability: A note on a covariance structure modelling approach. British Journal of Mathematical & Statistical Psychology, 57, 21-27
  • Raykov, T. (1997). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21, 173-184. Raykov (1997) paper shows how to obtain (by using structural equation modeling) an estimator of reliability that does not possess the
    general underestimation property of Cronbach's Alpha. Tenko Raykov describes how to do composite reliability in any program.
  • Raykov, T. (1997). Scale reliability, Cronbach's Coefficient Alpha, and violations of essential tau-equivalence with fixed congeneric components. Multivariate Behavioral
    Research, 32, 329-353.
  • Raykov, Tenko (1998). A method for obtaining standard errors and confidence intervals of composite reliability for congeneric items. Applied Psychological Measurement, 22(4), 369-374. Bootstrap method to obtain standard error and confidence interval for the reliability of composites of congeneric tests
  • Raykov, T. (1998). Coefficient alpha and composite reliability with interrelated non-homogeneous items. Applied Psychological Measurement, 22, 375-385.
  • Raykov, Tenko, (2001) Bias of coefficient alpha for fixed congeneric measures with correlated errors. Applied Psychological Measurement, Vol 25(1), pp. 69-76.
  • Raykov,Tenko & Shrout,Patrick E. (2002) Reliability of Scales With General
    Structure: Point and Interval Estimation Using a Structural Equation Modeling
    Approach. STRUCTURAL EQUATION MODELING, 9(2), 195–212
  • Cronbach, L., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64(3), 391-418.
  • Reuterberg, S.-E., & Gustafsson, J.-E. (1992). Confirmatory factor analysis and reliability: Testing measurement model assumptions. Educational and Psychological Measurement, 52, 795-811
  • Streiner, D.L. (2003a). Being inconsistent about consistency: When coefficient alpha does and doesn’t matter. Journal of Personality Assessment, 80, 217-222.
  • Streiner, D.L. (2003b). Starting at the beginning: An introduction to coefficient alpha and
    internal consistency. Journal of Personality Assessment, 80, 99-103.
  • Werts, C. E., Rock, D. A., Linn, R. L.,&Joreskog, K. G. (1978). A general method of estimating the reliability of a composite. Educational and Psychological Measurement, 38, 933-938.
  • Tseng, W.-T., Dörnyei, Z., & Schmitt, N. (2006). A new approach to assessing strategic learning: The case of self-regulation in vocabulary acquisition. Applied Linguistics, 27(1), 78-102
  • Zimmerman, D.W., Zumbo, B.D., & Lalonde, C. (1993). Coefficient alpha as an estimate of test reliability under violation of two assumptions. Educational and Psychological Measurement, 53, 33-49.

9 comments:

Wahyu said...

Wow! Good Articles..
If researchers read this articles, may be they will think twice to use alpha as psychometric property their instrument measure..

Unknown said...

very interesting I invite readers to know more about Item-response theory
which is much more interesting.

Danny said...

You must be a genius.
I had a very productive evening reading your post. May this comment find you in good health!

Anonymous said...

Hi! You quote 'Czerniack, 2002'. Could I know the entire reference?
Thanks!

Anonymous said...

Thank you for sharing this information!:D

Question:
With regards to examining composite reliability in AMOS, where could find the the values for the measurement error in AMOS output? Does the "Estimate" or the "S.E." heading represent the measurement error?

I would appreciate your help as I am stuck and cant continue with the formula.

Thank you in advance!

Anonymous said...

I have calculated composite reliability using composite calculator from the website. The composite reliability score in my case looks identical with coefficient alpha. Is there an explanation for that. Thanks

Anonymous said...

Thanks for the post - very good read.

Wondering - have you ever seen a case where the composite reliability score was lower than Cronbach's alpha?

Many thanks again

Joki said...

Thank you!

See
http://statwiki.kolobkreations.com/wiki/Main_Page

for the

Stats Tool Package

which automatically calculates CR and others:

http://www.kolobkreations.com/Stats%20Tools%20Package.xlsm