Thursday, April 19, 2007

Assess whole SEM model--chi square and fit index

  • Global model fit tests produced by AMOS: these test statistics are still computed under the assumption of joint multivariate normality. In other words, these values will remain unchanged whether you use bootstrapping or not.
  • a chi square probability value greater than .05 indicates acceptable model fit
  • for a good model fit, we want the chi square to be insignificant
  • Penalty of model complexity--For a given set of data and variables, the goodness of fit of a more complex, highly parameterized model tends to be greater than for simpler models because of the loss of degrees of freedom of the complex model. Thus, a good model fit indicated by fit measures may result from 1) a correctly specified model that adequately represents the sample data or 2) a highly overparameterized model that accounts for the fit of the model in the sample, regardless of whether there is a match between the specified model and the population covariance matrix.
  • The chi square test functions as a statistical method for evaluating models; the fit indexes are more descriptive than statistical. Fit indexes describe and evaluate the residuals that result from fitting a model to the data.
  • Hoyle and Panter (1995) recommend reporting several indexes of overall model fit: unadjusted chi-square, Satorra-Bentler scaled chi-square, GFI, TLI (NNFI), IFI, CFI, RNI
Chi square test
  • The null hypothesis is -- the postulated model holds in the population, i.e., the covariance matrix implied by the model = the population covariance matrix. In contrast to traditional significance testing, the researcher hopes NOT to reject the null hypothesis and so usually prefers a nonsignificant chi-square (such a finding indicates that the predicted model is congruent with the observed data). In practice, only the central chi square distribution is used to test the null hypothesis.
  •  The null hypothesis under test is that the model fits the data, so you hope to find a small, non-significant chi-square value for this test. 
  • Chi-square and p-value-- the higher the probability level (p value) associated with chi square, the better the fit. Amos reports the value of chi-square as CMIN. A significant chi-square indicates lack of satisfactory model fit. For example, at a significance level of .05, if the hypothesized SEM model output shows p=.000, the hypothesized model should be rejected, i.e., the hypothesized model is not adequate. If the p-value of the model chi-square is < .05, the researcher's model is rejected. The smaller the chi-square, the better the model fit. If the probability level of the analysis output is 0.05 or less, the departure of the data from the model is significant at the .05 level. The chi square test offers only a dichotomous decision strategy implied by a statistical decision rule and can't be used to quantify the degree of fit along a continuum with some prespecified boundary.
  • Chi-square Statistics.  The adequacy of fit of a model is a messy issue in structural equation modeling at this time.  One possibility is to use the chi-square statistic.  The chi-square is a function of the differences between the observed covariances and the covariances implied by the model.  
  • The decision rule which might be applied is:  If the chi-square statistic is NOT significant, then the model fits the data adequately.  But if the chi-square statistic IS significant, then the model does not fit the data adequately.  So EFAer, CFAer, and SEMers hope for nonsignificance when measuring goodness-of-fit using the chi-square statistic.
  • Unfortunately, many people feel that the chi-square statistic is a poor measure of overall goodness-of-fit.  The main problem with it is that with large samples, even the smallest deviation of the data from the model being tested will yield a significant chi-square value. Thus, it’s not uncommon to ALWAYS get a significant chi-square.  
  • Suppose the proposed model has a chi square of 12.1, and the statistical table shows that, with the appropriate degrees of freedom, the chi square required to reject the null hypothesis at the 0.01 level is 11.34. Since 12.1 is larger than 11.34, we reject the null hypothesis (H0: the implied correlations and the observed correlations are from the same population and any differences are due to sampling error); thus the model does not fit the data. In general, if the chi square value exceeds the appropriate figure in the statistical tables, the model fails to fit the data. If the proposed model's chi square value is 2.9, the proposed model can't be rejected. This doesn't mean that the proposed model is right; rather, the proposed model has not been shown to be wrong.
  • The analysis result of fit indexes are the same for unstandardized estimates and standardized estimates.
  • Reports of chi square should be accompanied by degrees of freedom, sample size, and p-value. Example, χ2 (48, N=500)= 303.80, p < .001, TLI=.86, CFI=.90; or χ2 (15, N=2232)=10.91, p=.77 (some people recommend this because it provides more accurate information about the p value)
  • The χ2 associated with the model # is significant, χ2 (df, N=2232)=#, p=0.000, which suggests that the model is not consistent with the observed data.
    Nonsignificant— χ2 (15, N=2232)=10.91,p=.77, suggesting that the proposed model is consistent with the observed data
  • If p-value is smaller than .05, we reject the proposed model
    If p-value is higher than .05, we accept the proposed model
  • chi-square statistic is used more as a descriptive index of fit, rather than as a statistical test. Smaller χ2 value indicates better fitting models and an insignificant χ2 is desirable.
  • Chi square is highly sensitive to departures from multivariate normality.
  • χ2 is sensitive to sample size. With a large sample size, the chi-square value will be inflated (statistically significant), and thus might erroneously imply a poor data-to-model fit (Schumacker & Lomax, 2004).
  • With small sample sizes, there may not be enough power to detect the differences between several competing models using the chi square statistic for model selection or evaluation. At larger sample sizes, power is so high that even models with only trivial misspecifications are likely to be rejected. As sample size increases, even very minor misspecifications can lead to poor model fit. Conversely, with small samples, models will tend to be accepted even in the face of considerable misspecification. In large, complex problems (i.e., problems in which there are many variables and degrees of freedom), the observed chi-square will nearly always be statistically significant, even when there is a reasonably good fit to the data. Chi-square test is strongly influenced by sample size. A poor fit based on a small sample size may result in a nonsignificant chi-square, whereas a good fit based on a large sample size will result in a significant chi-square. Thus, most applications of confirmatory factor analysis require a subjective evaluation of whether or not a statistically significant chi-square is small enough to constitute an adequate fit.
  • Relative chi-square, also called normal chi-square, is the chi-square fit index divided by degrees of freedom, in an attempt to make it less dependent on sample size. AMOS lists relative chi-square as CMIN/DF (chi-square/degrees of freedom ratio). Wheaton (1987) advocated CMIN/DF not be used. Ratios in the range of 2 to 1 or 3 to 1 indicate acceptable fit between the hypothetical model and the sample data (Carmines & McIver, 1981). Different researchers have recommended using ratios as low as 2 or as high as 5 to indicate a reasonable fit (Marsh & Hocevar, 1985). A chi-square/df ratio larger than 2 indicates an inadequate fit (Byrne, 1989). Chi-square/df ratio values lower than 2 are widely considered to represent a minimally plausible model (Byrne, 1991, The Maslach Burnout Inventory: validating factorial structure and invariance across intermediate, secondary, and university educators. Multivariate Behavioral Research, 26 (4), 583-605)
  • The smaller the Chi-square, the better the fit of the model. It has been suggested that a Chi-square two or three times as large as the degrees of freedom is acceptable (Carmines & McIver, 1981), but the fit is considered better the closer the Chi-square value is to the degrees of freedom for a model (Thacker, Fields & Tetrick, 1989). In the present sample, it was suggested that a ratio of 5 to 1 was “a useful rule of thumb” (Jackson et al., 1993, p. 755). -- cf. Timothy R. Hinkin (1995), A Review of Scale Development Practices in the Study of Organizations. Journal of Management, Vol. 21, No. 5, 967-988
  • However, the Chi-square test may be misleading. 1) The more complex the model, the more likely a good fit (i.e., the closer the researcher's model is to being just-identified, the more likely good fit will be found). 2) The larger the sample size, the more likely the rejection of the model and the more likely a Type I error (rejecting something true). In very large samples, even tiny differences between the observed model and the perfect-fit model may be found significant. 3) The chi-square fit index is also very sensitive to violations of the assumption of multivariate normality. When this assumption is known to be violated, the researcher may prefer Satorra-Bentler scaled chi-square, which adjusts model chi-square for non-normality.
  • Other Measures.  For this reason, researchers have resorted to examining a collection of goodness-of-fit statistics.  Byrne discusses the RMR and the standardized RMR, SRMR.  The RMR is simply the square root of the mean of the squared differences between the actual variances and covariances and the variances and covariances generated assuming the model is true (the reconstructed variances and covariances).  The smaller the RMR and standardized RMR, the better.
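
The table lookup in the 12.1 versus 11.34 example can also be done by computing the p-value directly. The sketch below is only an illustration, not AMOS output. It uses a closed-form upper-tail formula that is valid for whole-number degrees of freedom, and it assumes df = 3 for that example (the notes do not state the df; 3 is the df whose .01 critical value is about 11.34). The function names and the minimum fit function value F_min = 0.02 are made up for the illustration. The last loop uses the usual maximum likelihood relation chi-square = (N - 1) * F_min to show why the same amount of misfit is nonsignificant in a small sample and significant in a large one.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a central chi-square, integer df only."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{(df-1)/2} h^(j - 1/2) / Gamma(j + 1/2)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += h ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total

def relative_chi2(chi2, df):
    """Relative (normed) chi-square, reported by AMOS as CMIN/DF."""
    return chi2 / df

# Worked example from the notes, ASSUMING df = 3 (not stated in the notes).
for chi2 in (12.1, 2.9):
    p = chi2_sf(chi2, 3)
    verdict = "reject H0" if p < 0.01 else "cannot reject H0"
    print(f"chi2 = {chi2}, df = 3, p = {p:.4f}: {verdict} at the .01 level")

# The two reporting examples from the notes.
for chi2, df in ((303.80, 48), (10.91, 15)):
    print(f"chi2({df}) = {chi2}: CMIN/DF = {relative_chi2(chi2, df):.2f}, p = {chi2_sf(chi2, df):.3g}")

# Sample-size sensitivity: same hypothetical misfit F_min, different N.
f_min, df = 0.02, 15
for n in (200, 5000):
    chi2 = (n - 1) * f_min
    print(f"N = {n}: chi2 = {chi2:.2f}, p = {chi2_sf(chi2, df):.3g}")
```

For non-integer df or production work, a statistics library's chi-square distribution would be the safer choice than this hand-rolled formula.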
AMOS output
  • You should always examine the Notes for Model section of the AMOS output after each AMOS analysis finishes because AMOS will display most errors and warnings in this section of the output. In the output shown above, AMOS reports that the minimum was achieved with no errors or warnings. The chi-square test of absolute model fit is reported, along with its degrees of freedom and probability value. The absence of errors or warnings in this section of the output means that it is safe for you to proceed to the next output section of interest, the Fit Measures output. 
  • Default model, contains the fit statistics for the model you specified in your AMOS Graphics diagram.
  • Saturated model and Independence model, refer to two baseline or comparison models automatically fitted by AMOS as part of every analysis. 
  • The Saturated model contains as many parameter estimates as there are available degrees of freedom or inputs into the analysis. The Saturated model is thus the least restricted model possible that can be fit by AMOS.
  •  By contrast, the Independence model is one of the most restrictive models that can be fit: it contains estimates of the variances of the observed variables only. In other words, the Independence model assumes all relationships between the observed variables are zero.
  • for final report -- Check Default model --- CMIN (chi-square= 76.1018), p (value), DF (degree of freedom)
  • If p= .000, the probability value of the chi-square test is smaller than the .05 level used by convention, you would reject the null hypothesis that the model fits the data. This conclusion is not good news for the researcher who hopes to fit this model to the dataset used in the example.
  • Because the chi-square test of absolute model fit is sensitive to sample size and non-normality in the underlying distribution of the input variables, investigators often turn to various descriptive fit statistics to assess the overall fit of a model to the data. In this framework, a model may be rejected on an absolute basis, yet a researcher may still claim that a given model outperforms some other baseline model by a substantial amount. Put another way, the argument researchers make in this context is that their chosen model is substantially less false than a baseline model, typically the independence model. A model that is parsimonious, and yet performs well in comparison to other models may be of substantive interest.
  • Commonly reported fit statistics are the chi-square (shown above), its degrees of freedom (DF), its probability value (P), the Tucker-Lewis Index (TLI), and the Root Mean Square Error of Approximation (RMSEA) and its lower and upper confidence interval boundaries. There is also a Standardized Root Mean Residual (Standardized RMR) available through the Tools, Macro menu, but it is important to note that this fit index is only available for complete datasets (it will not be printed for databases containing incomplete data)
  • The chi-square test is an absolute test of model fit: If the probability value (P) is below .05, the model is rejected. The other measures of fit are descriptive. Hu and Bentler (1999) recommend RMSEA values below .06 and Tucker-Lewis Index values of .95 or higher. Since the RMSEA for this model is .11 and the Tucker-Lewis Index value is .92, the model does not fit well according to the descriptive measures of fit.
  • It is rare that a model fits well at first. Sometimes model modification is required to obtain a better-fitting model. AMOS allows for the use of modification indices to generate the expected reduction in the overall model fit chi-square for each possible path that can be added to the model. To request modification index output, select the Modification Indices check box in the Output tab in the Analysis Properties window. For how to do this, see https://stat.utexas.edu/images/SSC/Site/AMOS_Tutorial.pdf
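
The degrees of freedom of the Saturated and Independence models follow from simple counting, which the sketch below works through. It assumes a covariance-only analysis (no means or intercepts), and the number of observed variables (6) is made up for illustration. The second function only encodes the two Hu and Bentler (1999) cutoffs quoted above and applies them to the RMSEA of .11 and TLI of .92 from the example; the function names are hypothetical, not AMOS features.

```python
def distinct_moments(p):
    """Number of distinct variances and covariances among p observed variables."""
    return p * (p + 1) // 2

def model_df(p, free_parameters):
    """Degrees of freedom = distinct sample moments minus freely estimated parameters."""
    return distinct_moments(p) - free_parameters

p = 6  # hypothetical number of observed variables

# Saturated model: one parameter per distinct moment, so no df are left over.
saturated_df = model_df(p, distinct_moments(p))

# Independence model: only the p variances are estimated.
independence_df = model_df(p, p)

print(f"distinct moments: {distinct_moments(p)}")
print(f"saturated df: {saturated_df}, independence df: {independence_df}")

def meets_hu_bentler(rmsea, tli):
    """Check the two descriptive cutoffs quoted in the notes (Hu and Bentler, 1999)."""
    return {"RMSEA < .06": rmsea < 0.06, "TLI >= .95": tli >= 0.95}

# Values from the example in the notes: RMSEA = .11, TLI = .92.
print(meets_hu_bentler(0.11, 0.92))
```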
Absolute fit indexes--directly assess how well a priori model reproduces the sample data
  • To address the limitations of the chi-square test, goodness-of-fit indexes are used as adjuncts to the chi-square statistic to assess model fit
  • Model with many variables and small samples may be more inclined to experience degradation in absolute fit indexes than models with many variables and large sample size.
  • RMR(root mean square residual), the smaller the RMR, the better the model. An RMR of zero indicates a perfect fit. The closer the RMR to 0 for a model being tested, the better the model fit. RMR smaller than 0.05 indicates good fit.
  • SRMR (standardized RMR, root mean square residual)-- SRMR <= .05 means good fit. The smaller the SRMR, the better the model fit. SRMR = 0 indicates perfect fit. A value less than .08 is considered good fit. SRMR tends to be lower simply due to larger sample size or more parameters in the model. To get SRMR in AMOS, select Analyze, Calculate Estimates as usual. Then Select Plugins, Standardized RMR: this brings up a blank Standardized RMR dialog. Then re-select Analyze, Calculate Estimates, and the Standardized RMR dialog will display SRMR.
  • GFI should be equal to or greater than .90 to indicate good fit. GFI is less than or equal to 1. A value of 1 indicates a perfect fit. GFI tends to be larger as sample size increases. GFI > 0.95 indicates good fit. The GFI index is roughly analogous to the multiple R square in multiple regression in that it represents the overall amount of the covariation among the observed variables that can be accounted for by the hypothesized model.
  • AGFI (adjusted GFI), AGFI adjusts the GFI for degree of freedom, resulting in lower values for models with more parameters. AGFI should also be at least .90, close to 1 indicates good fit. AGFI may underestimate fit for small sample sizes. AGFI's use has been declining and it is no longer considered a preferred measure of goodness of fit. AGFI > 0.9 indicates good fit.
  • CI (centrality index)--CI should be .90 or higher to accept the model.
  • CAK
  • CK (single sample cross-validation index)
  • MCI (centrality index)
  • CN
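
The RMR and SRMR entries above can be made concrete with a small calculation. This is a sketch under stated assumptions, not what AMOS does internally: it uses one common definition in which the squared residuals are averaged over the p(p+1)/2 distinct elements of the covariance matrix, and for SRMR each residual is first divided by the product of the two sample standard deviations (other programs standardize slightly differently). The two 3x3 matrices are made-up numbers.

```python
import math

def rmr(S, Sigma):
    """Root mean square residual over the lower triangle, diagonal included."""
    p = len(S)
    squared = [(S[i][j] - Sigma[i][j]) ** 2 for i in range(p) for j in range(i + 1)]
    return math.sqrt(sum(squared) / len(squared))

def srmr(S, Sigma):
    """Standardized RMR: residuals divided by the sample standard deviations."""
    p = len(S)
    squared = []
    for i in range(p):
        for j in range(i + 1):
            scale = math.sqrt(S[i][i] * S[j][j])
            squared.append(((S[i][j] - Sigma[i][j]) / scale) ** 2)
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical sample covariance matrix (S) and model-implied matrix (Sigma).
# They differ only in the covariance between variables 1 and 2 (1.2 vs 1.0).
S = [[4.0, 1.2, 0.8],
     [1.2, 1.0, 0.3],
     [0.8, 0.3, 2.25]]
Sigma = [[4.0, 1.0, 0.8],
         [1.0, 1.0, 0.3],
         [0.8, 0.3, 2.25]]

print(f"RMR  = {rmr(S, Sigma):.4f}")
print(f"SRMR = {srmr(S, Sigma):.4f}")
print(f"perfect fit: RMR = {rmr(S, S):.1f}, SRMR = {srmr(S, S):.1f}")
```

Because RMR depends on the scale of the variables, the same residual looks larger or smaller depending on the units; SRMR removes that dependence, which is why cutoffs such as .05 or .08 are easier to interpret for SRMR.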
Incremental fit index (comparative fit index)-- measures the proportionate improvement in fit by comparing a target model with a more restricted, nested baseline model. A null model in which all the observed variables are uncorrelated is the most typically used baseline model
Baseline Comparisons-- comparing the given model with an alternative model
  • CFI (comparative fit index), close to 1 indicates a very good fit, > 0.9 or close to 0.95 indicates good fit; by convention, CFI should be equal to or greater than .90 to accept the model. CFI is independent of sample size. CFI is more appropriate than NFI in finite samples. NFI behaves erratically across ML and GLS, whereas CFI behaves consistently across the two estimation methods. CFI is recommended for routine use. Gerbing and Anderson (1993) recommended RNI and CFI, DELTA2 (IFI). When the sample size is small, both the CFI and TLI decrease as we increase the number of variables in the model.
  • RNI, RNI is recommended for routine use. RNI is generally preferred over TLI. RNI> 0.95 indicates good fit.
  • BBI (Bentler-Bonett index), should be greater than .9 to consider fit good.
  • IFI (incremental fit index,also known as DELTA2), IFI should be equal to or greater than .90 to accept the model. IFI value close to 1 indicates good fit. IFI can be greater than 1.0 under certain circumstances. IFI is not recommended for routine use.
  • NFI (normed fit index, also known as the Bentler-Bonett normed fit index,DELTA1), 1 = perfect fit. NFI values above .95 are good, between .90 and .95 acceptable, and below .90 indicates a need to respecify the model. NFI greater than or equal to 0.9 indicates acceptable model fit. NFI less than 0.9 can usually be improved substantially. Some authors have used the more liberal cutoff of .80. NFI may underestimate fit for small samples. NFI does not reflect parsimony: the more parameters in the model, the larger the NFI coefficient, which is why NNFI (TLI) below is now preferred (NNFI incorporates a correction for model complexity, whereas the NFI does not). NFI depends on sample size, values of the NFI will be higher for larger sample sizes. NFI behaves erratically across estimation methods under conditions of small sample size. NFI is not a good indicator for evaluating model fit when the sample size is small.
    NFI suggested relatively poorer model fit as missing data increased, with the bias generally more pronounced when data were MAR than when they were MCAR. Whereas NFI is still widely used, it is typically not among the recommended indices in recent reviews. Marsh et al., (1988) recommended against using NFI and in favor of TLI, because NFI, not TLI, is sensitive to sample size. When the sample size is small, both the CFI and TLI decrease as we increase the number of variables in the model.
  • NNFI (non-normed fit index, also called the Bentler-Bonett non-normed fit index, the Tucker-Lewis index, TLI, RHO2), NNFI is similar to NFI, but penalizes for model complexity. NNFI is not guaranteed to vary from 0 to 1. It is one of the fit indexes less affected by sample size. NNFI close to 1 indicates a good fit. TLI greater than or equal to 0.9 indicates acceptable model fit. By convention, NNFI values below .90 indicate a need to respecify the model. TLI less than 0.9 can usually be improved substantially. Some authors have used the more liberal cutoff of .80 since TLI tends to run lower than GFI. However, more recently, Hu and Bentler (1999) have suggested NNFI >= .95 as the cutoff for a good model fit. TLI is not associated with sample size. NNFI is recommended for routine use. NNFI is a more useful index than NFI. Hu and Bentler (1998, 1999) support the continued use of TLI because TLI is relatively insensitive to sample size; is sensitive to model misspecifications; is relatively insensitive to violations of assumptions of multivariate normality; and is relatively insensitive to estimation method (maximum likelihood vs alternative methods). RNI is generally preferred over TLI.
  • NTLI, NTLI is recommended for routine use.
  • RFI (relative fit index, RHO1) is not guaranteed to vary from 0 to 1. RFI close to 1 indicates a good fit. Neither the NFI nor the RFI are recommended for routine use.
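
All of the baseline-comparison indexes above are functions of just four numbers: the chi-square and df of the target model and of the baseline (independence) model. The sketch below uses the standard textbook formulas; the function name and the chi-square values are hypothetical, chosen only to show the arithmetic. The second call borrows χ2(15) = 10.91 from the reporting example (the baseline values are still made up) to show how IFI and TLI can exceed 1 when the model chi-square is smaller than its df, while CFI is capped at 1.

```python
def incremental_fit(chi2_m, df_m, chi2_b, df_b):
    """Baseline-comparison indexes from target (m) and baseline (b) chi-square and df."""
    nfi = 1 - chi2_m / chi2_b                                    # DELTA1
    rfi = 1 - (chi2_m / df_m) / (chi2_b / df_b)                  # RHO1
    ifi = (chi2_b - chi2_m) / (chi2_b - df_m)                    # DELTA2
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1)  # RHO2, NNFI
    rni = 1 - (chi2_m - df_m) / (chi2_b - df_b)
    # CFI is RNI truncated to the 0-1 range.
    denom = max(chi2_m - df_m, chi2_b - df_b, 0)
    cfi = 1.0 if denom == 0 else 1 - max(chi2_m - df_m, 0) / denom
    return {"NFI": nfi, "RFI": rfi, "IFI": ifi, "TLI": tli, "RNI": rni, "CFI": cfi}

# Hypothetical target model and independence model.
r = incremental_fit(chi2_m=80.0, df_m=40, chi2_b=840.0, df_b=55)
for name, value in r.items():
    print(f"{name} = {value:.3f}")

# Model chi-square below its df: non-normed indexes can go above 1.
r2 = incremental_fit(chi2_m=10.91, df_m=15, chi2_b=840.0, df_b=55)
print({name: round(value, 3) for name, value in r2.items()})
```

Note how NFI has no df term for the target model, which is why it cannot penalize complexity, whereas TLI and RFI work with chi-square/df ratios.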
Parsimony-Adjusted Measures-- measures penalize for lack of parsimony.
  • PRATIO (parsimony ratio)
  • RMSEA (root mean square error of approximation), there is good model fit if RMSEA is less than or equal to .05. There is adequate fit if RMSEA is less than or equal to .08. More recently, Hu and Bentler (1999) have suggested RMSEA <= .06 as the cutoff for a good model fit. RMSEA is a popular measure of fit. Less than .05 indicates good fit, =0.0 indicates exact fit, from .08 to .10 indicates mediocre fit, greater than .10 indicates poor fit. RMSEA is judged by a value of .05 or less as an indication of a good fit. A value of .08 or less is indicative of a “reasonable” error of approximation such that a model should not be used if it has an RMSEA greater than .1. Hu and Bentler (1995) suggested values below .06 indicate good fit. The RMSEA values are classified into four categories: close fit (.00–.05), fair fit (.05–.08), mediocre fit (.08–.10), and poor fit (over .10). RMSEA smaller than 0.05 indicates good fit. RMSEA tends to improve as we add variables to the model, especially with larger sample size. One limitation of RMSEA is that it ignores the complexity of the model. The lack of fit of the hypothesized model to the population is known as the error of approximation. The RMSEA is a standardized measure of error of approximation. RMSEA value of .05 or less indicates a close approximation, values of up to .08 suggests a reasonable fit of the model in the population.
  • PCLOSE tests the null hypothesis that RMSEA is no greater than .05. If PCLOSE is less than .05, we reject the null hypothesis and conclude that the computed RMSEA is greater than .05, indicating lack of a close fit.
  • PGFI (parsimony goodness of fit index)
  • PNFI (parsimony normed fit index),There is no commonly agreed-upon cutoff value for an acceptable model.
  • PCFI (parsimony comparative fit index),There is no commonly agreed-upon cutoff value for an acceptable model.
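
As a rough check on the RMSEA categories above, the point estimate can be computed by hand from chi-square, df, and N. The sketch below uses the common single-group formula sqrt(max(χ2 - df, 0) / (df (N - 1))); some programs divide by N instead of N - 1, so treat the last decimal as approximate. The inputs are the two reporting examples from the chi-square notes. The parsimony measures are shown as PRATIO = df of the model / df of the independence model, with PNFI = PRATIO × NFI and PCFI = PRATIO × CFI; the df and index values fed to them are hypothetical.

```python
import math

def rmsea(chi2, df, n):
    """Point estimate of RMSEA for a single-group model with sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def rmsea_category(value):
    """Four-category classification quoted in the notes."""
    if value <= 0.05:
        return "close fit"
    if value <= 0.08:
        return "fair fit"
    if value <= 0.10:
        return "mediocre fit"
    return "poor fit"

# The two reporting examples from the chi-square notes.
for chi2, df, n in ((303.80, 48, 500), (10.91, 15, 2232)):
    value = rmsea(chi2, df, n)
    print(f"chi2({df}, N={n}) = {chi2}: RMSEA = {value:.3f} ({rmsea_category(value)})")

def pratio(df_model, df_independence):
    """Parsimony ratio: share of the independence model's df kept by the model."""
    return df_model / df_independence

# Hypothetical df, NFI and CFI values, only to show the adjustment.
pr = pratio(40, 55)
pnfi = pr * 0.905
pcfi = pr * 0.949
print(f"PRATIO = {pr:.3f}, PNFI = {pnfi:.3f}, PCFI = {pcfi:.3f}")
```

When the chi-square is smaller than its df, as in the second example, the estimate is truncated at zero rather than going negative.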
Absolute fit indexes--directly assess how well a priori model reproduces the sample data
Information criterion indexes, goodness of fit measures based on information theory (do not have cutoffs like .90 or .95. Rather they are used in comparing models, with the lower value representing the better fit.)
  • CAK
  • CK
  • MCI (McDonald's centrality index)
  • CN (Hoelter's critical N)
  • AIC (Akaike Information Criterion, single sample cross-validation index), the lower the AIC measure, the better the fit.
  • AIC0, AMOS Specification Search tool by default rescales AIC so when comparing models, the lowest AIC coefficient is 0. For the remaining models, AIC0 <= 2, no credible evidence the model should be ruled out; 2 - 4, weak evidence the model should be ruled out; 4 - 7, definite evidence; 7 - 10 strong evidence; > 10, very strong evidence the model should be ruled out.
  • CAIC (Consistent AIC),the lower the CAIC measure, the better the fit.
  • BCC (Browne-Cudeck criterion, also called the Cudeck & Browne single sample cross-validation index) It should be close to .9 to consider fit good. BCC penalizes for model complexity (lack of parsimony) more than AIC.
  • ECVI (Expected cross-validation index, single sample cross-validation index), in its usual variant is equivalent to BCC, and is useful for comparing non-nested models; lower ECVI is better fit. ECVI can be used to compare non-nested models and allows the determination of which model will cross-validate best in another sample of the same size and similarly selected. Choose the model that has the lowest ECVI.
  • MECVI,a variant on BCC,except for a scale factor, MECVI is identical to BCC
  • BIC (Bayesian Information Criterion, also known as Akaike's Bayesian Information Criterion (ABIC) and the Schwarz Bayesian Criterion (SBC)). Compared to AIC, BCC, or CAIC, BIC more strongly favors parsimonious models with fewer parameters. BIC is recommended when sample size is large or the number of parameters in the model is small. Recently, however, the limitations of BIC have been highlighted.
  • BIC0, the AMOS Specification Search tool by default rescales BIC so when comparing models, the lowest BIC coefficient is 0. For the remaining models, the Raftery (1995) interpretation is: BIC0 <= 2, weak evidence the model should be ruled out; 2 - 6, positive evidence the model should be ruled out; 6 - 10, strong evidence; > 10, very strong evidence the model should be ruled out.
  • BICp. BIC can be rescaled so Akaike weights/Bayes factors sum to 1.0. In AMOS Specification Search, this is done in a checkbox under Options, Current Results tab. BICp values represent estimated posterior probabilities if the models have equal prior probabilities. Thus if BICp = .60 for a model, it is the correct model with a probability of 60%. The sum of BICp values for all models will sum to 100%, meaning 100% probability the correct model is one of them, a trivial result but one which points out the underlying assumption that proper specification of the model is one of the default models in the set. Put another way, "correct model" in this context means "most correct of the alternatives."
  • BICL. BIC can be rescaled so Akaike weights/Bayes factors have a maximum of 1.0. In AMOS Specification Search, this is done in a checkbox under Options, Current Results tab. BICL values of .05 or greater in magnitude may be considered the most probable models in "Occam's window," a model-filtering criterion advanced by Madigan and Raftery (1994).
  • Quantile or Q-Plots
  • IES (Interaction effect size),IES is a measure of the magnitude of an interaction effect (the effect of adding an interaction term to the model). In OLS regression this would be the incremental change in R-squared from adding the interaction term to the equation. In SEM, IES is an analogous criterion based on chi-square goodness of fit. Recall that the smaller the chi-square, the better the model fit. IES is the percent chi-square is reduced (toward better fit) by adding the interaction variable to the model.
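
The rescaled criteria above (AIC0, BIC0, BICp, BICL) are easier to follow with numbers. The sketch below computes AIC as chi-square plus twice the number of free parameters, rescales a set of values so the smallest is 0, and labels each AIC0 with the evidence bands listed above. For BICp and BICL it ASSUMES the usual conversion exp(-BIC0/2), normalized to sum to 1 (BICp) or to have a maximum of 1 (BICL); check the AMOS documentation before relying on that. The three candidate models and all their values are made up.

```python
import math

def aic(chi2, n_free_parameters):
    """AIC as chi-square plus twice the number of free parameters."""
    return chi2 + 2 * n_free_parameters

def rescale_to_zero(values):
    """Subtract the minimum so the best model gets 0 (as for AIC0 and BIC0)."""
    best = min(values)
    return [v - best for v in values]

def aic0_evidence(delta):
    """Evidence bands for AIC0 as listed in the notes."""
    if delta <= 2:
        return "no credible evidence against"
    if delta <= 4:
        return "weak evidence against"
    if delta <= 7:
        return "definite evidence against"
    if delta <= 10:
        return "strong evidence against"
    return "very strong evidence against"

def bic_weights(bic_values):
    """ASSUMED conversion: exp(-BIC0 / 2), scaled to sum to 1 (BICp) or max 1 (BICL)."""
    raw = [math.exp(-d / 2) for d in rescale_to_zero(bic_values)]
    bicp = [r / sum(raw) for r in raw]
    bicl = [r / max(raw) for r in raw]
    return bicp, bicl

# Three hypothetical candidate models: (chi-square, free parameters, BIC).
models = {"A": (80.0, 26, 210.0), "B": (79.5, 28, 213.5), "C": (92.0, 29, 228.0)}

aic_values = [aic(chi2, q) for chi2, q, _ in models.values()]
aic0 = rescale_to_zero(aic_values)
bicp, bicl = bic_weights([bic for _, _, bic in models.values()])

for name, a0, wp, wl in zip(models, aic0, bicp, bicl):
    print(f"model {name}: AIC0 = {a0:.1f} ({aic0_evidence(a0)}), BICp = {wp:.3f}, BICL = {wl:.3f}")
```

As the BICp entry notes, the weights only rank the models in the set against each other; they say nothing about whether any of them is correctly specified.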
residual as a measure of overall fit
  • A residual is the difference between the sample covariance matrix (S) and the model-implied (reproduced) covariance matrix ( ∑ ). Standardized residuals are residuals that have been standardized to have a mean of zero and a standard deviation of one, making them easier to interpret. Standardized residuals larger than absolute value 2.0 are considered to be suggestive of a lack of fit.

21 comments:

clive said...

Thanks for a great site. As a newish (and lazy) user of SEM I've been looking for ages for a clear and concise overview of model fit indices.

Imam Saleh said...

can I cite your site on my paper?
;-)

Amandeep said...

Hi, I am impressed by the range and depth of your writing. I came across your blog searching for AMOS help. A doctoral student, I am testing a model with attitudes and behaviours and in need of guidance. If its possible please share your email address so that I can write in greater detail. Thanks.

Anonymous said...

I was reading the AMOS help to perform the SRMR, but it was impossible. You gave me the solution! Thank you!

pete said...

Your discussion on SRMR is briliant! even many authors of Amos textbooks dont know that!

Pooja said...

O its awesome I have been out of place til the time I read this.

Pooja
(IITD)

quest said...

If i got a model with RMSEA= 0.1 but not possible to get this less, i tried to refit the model but can only do that with error correlation, what would you suggest

being said...

to quest -- check other indexes
to other people -- you have the freedom to cite, but I am not responsible for any possible mistakes on this blog. This blog is just my study note.

Anonymous said...

Thank you for the information. I have been searching for how to get the SRMR on AMOS with no luck until I read your post.

Masood Kalyar said...

You did really an excellent and awsome work. I was looking for a comprehensive article on model fitness and after the struggle of several days i finally explored this site and read the article contains invaluable infomation at a single page. great.....!!! Keep it on buddy

malka said...

Hi thnks its really very informative....bt I hve an issue cmin of my data is .096 so he to resolve ds I used maximum likelihood estimation. Plz do reply n can u quote any article refering if cmin can b till 1 to 5...

Andreas said...

Great site. I needed one text written by Marsh for my thesis.

Anonymous said...

my chi sq value is 20, my sample size contain 120 company with eight factors each is chi sq value is ok or not

hubshank said...

Awesome. I had not noticed that That 1999 paper by Hu and Bentler suggests an RMSEA p-value 0.06 as cutoff point for good fit! My research fall between 0.05 and 0.06. This makes arguing a bit easier ;)

Marek Skorsepa said...

Is it possible to create a list of references you used in this article? Thank you.

raman said...


Hello Sir,

I am trying to implement SEM in my project work. The value of CFI is 0.487 and RMR is =0.046. the value of CFI is very less what does it indicate the model is rejcted? What should i do. Kindly guide

Anonymous said...

Are all fit indices considered standardized? Or only the two where noted above?

Jeanette said...

This page is, by far, the most helpful, understandable, and comprehensive online resource for CFA that I have found (and I have spent several hours searching)! Thank you so much for helping an all-but-dissertation (ABD) doctoral psychology student finish the dissertation!! Rock on!

ultraanizado said...

Absolutely fantastic Webseite, danke. I just have one question. My RMSEA is .055 but the PCLOSE is .181 so since the PCLOSE is not bigger than 0.5 that would mean that despite having the RMSEA in that "fair fitting" result, is not valid?

Josh M. said...

Hooper et al 2008 SEM Fit, and Ho 2006 Structural Equation Modeling are two great resources for understanding model fit indices.

Quest- check your other indices. If the model looks like it doesn't fit the data, tinkering away at it might not be the best option. You might have to revamp your model.

Chi square value (CMIN) alone is not enough to determine fit. Hell, the Chi Square test isnt even enough. Look at the Relative Chi Square statistic (X^2/df) and see if its in the range of 2-5 (from what I remember). The X^2 test gets screwey with sample size, so too large or too small of N will skew the result of the test drastically. I think the magic number is somewhere around N = 200.

Hubshank- Your RMSEA value looks good, but check your other indices to determine fit. There are cutoffs for most (if not all) of the standardized fit indexes. The cutoffs are more like guidelines, but they're something to cite.

Ultraanizando- Your RMSEA value looks good, but check its confidence intervals. You want the upper bound 90% CI to be less than .08 and the lower bound to be very close to 0. None of these cutoffs are set in stone or the absolute rule which leads me to an important point that most people here seem to miss:

These indices should not be used in isolation. Just because you have one index that signifies bad fit (even horrible fit) does not mean you have a bad model. You have to triangulate between the indices knowing each other's limitations. If you have N = 100000 you probably don't want to put too much stock in the outcome of the x^2 significance test. The key, once again, is to look at many of the indices knowing each of their limitations and see if the ones that should be looking good are looking good. There are more issues than just model fit. You don't want to mess with something to deviate heavily from theory just to fit the model better. Most of that stuff is in the articles I cited.

Josh M. said...

Also check your assumptions of SEM: random sampling, multivariate normal distribution, linear correlation between each endogenous and each exogenous variable being included in the model, and independence of observations. If your variables are not linearly related or normally distributed then problems can arise and SEM may not work well.