Monday, April 30, 2007

penalty of model complexity

  • Penalty of model complexity
    For a given set of data and variables, the goodness of fit of a more complex, highly parameterized model tends to be greater than that of simpler models because the complex model gives up degrees of freedom. Thus, a good model fit indicated by fit measures may result from 1) a correctly specified model that adequately represents the sample data or 2) a highly overparameterized model that accounts for the fit of the model in the sample, regardless of whether there is a match between the specified model and the population covariance matrix (Hu & Bentler, 1995).

Sunday, April 29, 2007

missing data, one paragraph in dissertation

  • Missing data
    Data values are assumed to be missing at random (MAR): the probability that data are missing on X1 may depend on the value of X2, but does not depend on the value of X1 when X2 is held constant. Under MAR, the missing data mechanism is ignorable, so there is no need to model it as part of the estimation process (Allison, 2002). Instead of using ad hoc approaches to handle missing data (listwise deletion, pairwise deletion, and single imputation), which have no theoretical justification, this research uses full information maximum likelihood (FIML, direct ML) estimation as implemented in Amos 6.0. FIML is a theory-based approach based on the direct maximization of the likelihood of the observed data; it yields a low rate of convergence failures and provides estimates that are efficient, consistent, and asymptotically unbiased (Allison, 2003; Arbuckle, 1996; Arbuckle, 2005; Byrne, 2001; Enders, 2001a, 2001b; Enders & Bandalos, 2001; Newman, 2003; Raykov, 2005; Savalei & Bentler, 2005; Schafer & Graham, 2002; Wiggins & Sacker, 2002).
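
Written as a formula, the MAR condition described above can be sketched as follows. The indicator R_1 (equal to 1 when X1 is missing) is notation assumed here for illustration; it does not appear in the paragraph itself.

```latex
\Pr(R_1 = 1 \mid X_1, X_2) = \Pr(R_1 = 1 \mid X_2)
```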

ordered categorical data

  • ML has been recommended for use with ordered categorical data when item-level characteristics are approximately normal (skewness and kurtosis ranging from -1 to +1) (Muthen & Kaplan, 1985); if not, then we should use weighted least squares (WLS; with polychoric correlation input), not maximum likelihood (ML; with Pearson product-moment input).
  • Categorization increases the kurtosis of the variables.
  • Reliability increases as a function of the number of response categories until the number of categories reaches five or seven, at which point reliability increases level off.
  • When categorical data show small skewness and kurtosis values (in the range from -1.5 to +1.5), normal theory can be used (Randall et al., 2004).
  • if both variables are continuous--Pearson correlation
  • if both variables are ordinal--polychoric correlation
  • if both variables are dichotomous--tetrachoric correlation
  • if one variable is ordinal and the other is continuous--a polyserial correlation
  • use polychoric/polyserial correlations matrix as input, use weighted least squares (WLS) estimation method
  • First use PRELIS to analyze a matrix of polychoric correlations and, from this analysis, produce an estimate of the asymptotic (large-sample) covariance matrix of the estimated sample variances and covariances. The estimated covariance matrix from PRELIS is then analyzed in LISREL using the generally weighted least squares method of estimation.
  • When multivariate normality is not met, the appropriate estimation technique is weighted least squares (WLS), not ML. Calculations of WLS are based on the polychoric correlation matrix rather than the covariance matrix.
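
The skewness/kurtosis screening rule in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the moment-based formulas below are one common definition (statistical packages often apply small-sample corrections), the function names are hypothetical, and the Likert responses are made-up data.

```python
from statistics import fmean


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def suggest_estimator(items, limit=1.0):
    """Return 'ML' if every item has |skewness| and |kurtosis| <= limit, else 'WLS'."""
    for values in items:
        if abs(skewness(values)) > limit or abs(excess_kurtosis(values)) > limit:
            return "WLS"
    return "ML"


# Hypothetical 5-point Likert responses, for illustration only.
symmetric_item = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
skewed_item = [1, 1, 1, 1, 1, 1, 1, 1, 2, 5]

print(suggest_estimator([symmetric_item]))               # ML
print(suggest_estimator([symmetric_item, skewed_item]))  # WLS
```

The default `limit=1.0` follows the -1 to +1 range quoted above; passing `limit=1.5` would apply the looser range also mentioned in the list.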

degrees of freedom in SEM

  • The degrees of freedom for a LISREL problem is the difference between the number of elements in the covariance matrix and the number of parameters to be estimated.
  • Problems based on a large number of variables will tend to have large chi-square values and many degrees of freedom, so taking a ratio of the two provides a more meaningful summary.
  • the smaller the chi-square value, the better the fit
  • adding more paths--better fit (lower chi-square value)
  • For a given problem (i.e., a given covariance matrix) when more parameters are estimated, the chi-square will tend to be smaller (better fit), and there will be fewer degrees of freedom.
  • because the degrees of freedom for a LISREL problem do not reflect the sample size, the chi-square/degree of freedom ratio is as dependent on sample size as the chi-square statistic itself.
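
As a small worked example of the counting rule above, the sketch below computes the degrees of freedom and the chi-square/df ratio. It assumes only the covariance matrix is analyzed (no mean structure), and the variable count, parameter count, and chi-square value are hypothetical.

```python
def sem_degrees_of_freedom(n_observed, n_free_parameters):
    """Distinct variances and covariances, p(p + 1)/2, minus free parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_free_parameters


def chi_square_ratio(chi_square, df):
    """Chi-square divided by degrees of freedom; undefined when df <= 0."""
    if df <= 0:
        raise ValueError("ratio is undefined when the model has no degrees of freedom")
    return chi_square / df


# Hypothetical model: 11 observed variables, 25 free parameters, chi-square = 82.0.
df = sem_degrees_of_freedom(11, 25)
ratio = chi_square_ratio(82.0, df)
print(df)     # 41  (66 distinct moments - 25 parameters)
print(ratio)  # 2.0
```

Freeing one more path in this hypothetical model would lower df to 40, which is the trade-off described in the list: more estimated parameters, smaller chi-square, fewer degrees of freedom.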

EFA vs CFA

  • EFA is typically conducted with correlation matrices, which makes comparing the parameters across samples problematic, whereas CFA is performed on covariance
    matrices.
  • Factor rotation is irrelevant in CFA because the a priori model on which CFA is based already specifies simple structure.
  • Simple structure in CFA--no measured variable is allowed to function as an indicator for more than one factor
  • CFA provides a chi-square test and goodness-of-fit indicators of the ability of the same factor solution to fit data from different samples; no such test is available for EFA.
  • CFA allows the researcher to formulate a specific model and test the invariance of specific parameters in the factor solution, whereas the researcher has relatively little control over the model to be tested in EFA.
  • Unlike the common factor analytic model, which requires that all measured variables load on all latent factors, the present factor loadings are restricted so that each measured variable loads only on the latent factor that it is hypothesized to represent. Note that in CFA it is not necessary to require that each variable load on only one factor in order for the model to be identified.

Reading list

  • James Stevens (2002) Exploratory and confirmatory factor analysis, in Applied Multivariate Statistics for the Social Sciences. 4th ed. Lawrence Erlbaum.


Friday, April 27, 2007

ML robustness to violation of multivariate normality

  • maximum likelihood (ML) is one of the most frequently used methods for model evaluation. The ML method is based on the assumption that data are continuous
    and normally distributed, an assumption which is frequently violated
    especially when categorical data are analyzed. Furthermore, ML cannot provide
    a reliable inference when the number of variables in an analysis becomes
    excessively large.
  • Robustness of normal theory estimators to violations of normality: in the presence of nonnormality, parameter estimates are typically unbiased, but values of the chi-square test statistic and other fit indexes are adversely affected, and standard errors become attenuated. Under coarse categorization, chi-square values are typically found to be inflated when only two response categories are used, but this bias decreases with increasing numbers of categories. This bias is exacerbated for situations in which the distributions of the categorized variables are nonnormal, with opposite skew producing the worst results.
  • The measurement parameter, structural disturbance, and coefficient estimates produced by the maximum likelihood (ML) estimation approach are robust to departures from normality (Bollen, 1989; Chou, Bentler, & Satorra, 1991). These estimates include the λs (factor loadings for indicators of exogenous variables, λx, and of endogenous variables, λy), θs (measurement errors for indicators of exogenous variables, θδ, and of endogenous variables, θε), φs (covariances among the latent exogenous variables ξ), ψs (covariances among the structural disturbances ζ), γs (causal paths, i.e., structural parameters relating a latent exogenous variable to a latent endogenous variable), and βs (causal paths, i.e., structural parameters relating one latent endogenous variable to another). However, the chi-square and the standard errors used for significance tests from ML may not be robust to departures from normality (Bollen, 1989).

Unidimensional measurement

  • unidimensional measurement, or homogeneous measurement, or congeneric measurement
  • correlated error is not allowed in unidimensional measurement

online research discussion group

Asymptotic (large-sample) distribution-free (ADF) estimation method

  • ADF requires a very large sample size

Tuesday, April 24, 2007

common method variance

  • Common method variance, variance that is attributed to the measurement method rather than the constructs of interest, may cause systematic measurement error and further bias the estimates of the true relationship among theoretical constructs. Method variance can either inflate or deflate observed relationships between constructs, thus leading to both Type I and Type II errors.
  • The variance of every measured variable can be partitioned into three components--trait variance, method variance (systematic error), and error variance (random error of measurement, nonsystematic influences on measured variables)
  • Total variance = true variance + systematic variance (common method variance) + random variance (measurement error)
  • method variance is referred to as systematic bias
  • In general, independent variables should be less influenced by common method bias

Harman single-factor test

  • Checks whether your data have common method bias, which is indicated when most of the variance in your EFA is captured by the first factor.
  • factoring all indicators in the study to see if a single common factor emerges, indicative of common method variance
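
A rough way to see what the single-factor check looks at is to compute the share of total variance taken by the largest eigenvalue of the indicator correlation matrix. The sketch below is a principal-component approximation, not a full EFA, and makes several assumptions: the function names are hypothetical, the correlation matrix is made up, and the eigenvalue is found by plain power iteration to avoid external libraries.

```python
import math


def largest_eigenvalue(matrix, iterations=200):
    """Dominant eigenvalue of a symmetric matrix by power iteration.

    The start vector (1, 2, ..., n) is an arbitrary choice; in contrived cases
    it could be orthogonal to the dominant eigenvector and give a wrong answer.
    """
    n = len(matrix)
    vector = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            raise ValueError("power iteration collapsed to the zero vector")
        vector = [x / norm for x in product]
        # Rayleigh quotient of the normalized vector estimates the eigenvalue.
        eigenvalue = sum(
            vector[i] * sum(matrix[i][j] * vector[j] for j in range(n))
            for i in range(n)
        )
    return eigenvalue


def first_factor_share(correlation_matrix):
    """Proportion of total variance carried by the first principal component."""
    return largest_eigenvalue(correlation_matrix) / len(correlation_matrix)


# Hypothetical correlation matrix for three indicators.
r = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
share = first_factor_share(r)
print(round(share, 3))  # 0.667
print(share > 0.5)      # True: the first component captures most of the variance
```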

a split sample design

  • Half of your sample responds to the independent variable measures and the other half to the dependent variable measures. The trick here is that the group now becomes the unit of analysis (and thus your sample size is drastically reduced).

A latent common method variable into the SEM model

  • If you want to incorporate a "method" factor which is linked to indicators via factor loadings, you must assume that this factor is uncorrelated with the substantive factors.
  • Check for underidentification, both of the model and of individual parameters. This is, more often than not, the cause of such problems: something that *must* be freed up for identification is fixed, or something that *cannot* be freed is left free.
  • Billiet & McClendon (2000) suggest a procedure to identify a method effect if you have positive and negative items, and they should not be recoded into the same direction. The method effect would be a single factor with loadings of 1 and the variance should be free. The method factor should load on all items in your model that are observed with the same method. Billiet, J., & McClendon, M. J. (2000). Modeling acquiescence in measurement models for two balanced sets of items. Structural Equation Modeling, 7(4), 608–628.
  • Structural equation modeling with a first-order latent variable, “method effect”, added to test the significance of the method effect. The latent variable “method effect” loads on the indicators of both exogenous and endogenous latent variables to control for the portion of variance in the indicators that is attributable to obtaining the measures from the same source, i.e., systematic variance; thus any shared variance due to the method effect is controlled when assessing the significance of the structural paths. Total variance of each indicator = true variance (due to theoretical constructs) + systematic measurement error (common method variance) + random measurement error (unique variance) (Bagozzi & Phillips, 1982; Doty & Glick, 1998). Assuming the common method has a consistent effect across the items, i.e., the responses to all items are equally affected by the common method, all the loadings from the common method latent variable on the indicators are fixed at 1. The common method latent variable is uncorrelated with the three substantive latent variables (e.g., Billiet & McClendon, 2000). To obtain an overall test of the significance of the method effects, M4 represents the model without method effects, obtained by fixing the 11 factor-loading paths to zero. (OR: to set the scale for the latent common method variance (CMV) variable, the variance of the CMV in M4 is fixed at 1.) M5 represents a model with the common method effect. The impact of the method-effect latent variable was allowed to be systematic for each of the indicators of the substantive constructs. The variance of the CMV in M5 is a free parameter.
  • Nested SEM model comparisons via sequential chi-square difference tests are used to test for the presence of common method variance. In model 4, the paths from the method effect to different indicators of substantive constructs are free parameters, resulting in a model that takes common method variance into account. In model 3, these paths are fixed to zero, resulting in a model without method effect. Model 3 and model 4 are compared through a chi-square difference test to test the null hypothesis that no method effect exists. If a common method effect exists, model 4 should fit the data significantly better than model 3. We found a significant difference between model 3 and model 4; thus, common method variance is present in the study.
  • In addition, to assess the impact of common method biases on structural parameter estimates, a further comparison of the standardized parameter estimates when common method variance is and is not controlled for is made (Bettencourt, Gwinner, & Meuter, 2001; Carlson & Kacmar, 2000; Carlson & Perrewe, 1999; Conger, Kanungo, & Menon, 2000; Elangovan & Xie, 1999; Facteau, Dobbins, Russell, Ladd, & Kudisch, 1995; MacKenzie, Podsakoff, & Fetter, 1991, 1993; MacKenzie, Podsakoff, & Paine, 1999; Moorman & Blakely, 1995; Netemeyer, Boles, McKee, & McMurrian, 1997; Podsakoff, MacKenzie, Lee, & Podsakoff, 2003; Podsakoff, MacKenzie, Moorman, & Fetter, 1990; Podsakoff & Organ, 1986; Williams & Anderson, 1994).
  • When analyzing M4, a Heywood case was encountered: e8 has a negative variance estimate (-.227). Nonpositive error variance estimates are the most common type of improper solution and are frequently encountered (cf. Dillon, Kumar, & Mulani, 1987). To test whether the negative variance was a result of random sampling error rather than model misspecification, the maximum likelihood estimate of the asymptotic standard error of the negative error variance was used to form a 95% confidence interval. If the 95% confidence interval includes zero, the negative variance is not significantly different from zero, and we can conclude that it is due to sampling error. The 95% confidence interval ranges from -1.078 to 0.624 and includes zero. Moreover, the magnitude of the estimated standard error of the e8 variance is roughly the same as the estimated standard errors of the other error variances, and the model provides a reasonable fit, which further suggests that the Heywood case can be attributed to sampling variation. A negative error variance estimate can occur due to sampling fluctuations around a positive, near-zero error variance in the population. To deal with the negative variance, the e8 variance parameter was constrained to an arbitrarily small positive number (0.005) in model 5, which is consistent with the commonsense belief that virtually all empirical data contain some random error, and model 5 was re-estimated (Bentler & Chou, 1987; Chen, Bollen, Paxton, Curran, & Kirby, 2001; Dillon, Kumar, & Mulani, 1987; Gerbing & Anderson, 1987; Van Driel, 1978). Chi-square test statistics and some overall goodness-of-fit indices are largely unaffected by improper solutions (Chen et al., 2001; Gerbing & Anderson, 1987).
    If the confidence interval includes zero, this indicates that the population variance is positive but near zero and that the negative estimate is due to chance.
  • Ottosson, T. (1997). Motivation for orienteering: An exploratory analysis using confirmatory factor analytic techniques. Scandinavian Journal of Psychology, 38(2), 111-120. --- a general factor on which all 20 items had loadings which were constrained to be equal. The interesting thing was that introduction of this general factor caused the general motivation-for-orienteering factor to vanish. This allowed the conclusion that the general factor found in the 14-item model should not be interpreted as reflecting a general attitude towards orienteering, but that it rather is an effect of item format.
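
Two calculations described in the list above, the chi-square difference test between nested models and the confidence interval around a negative error variance, can be sketched as below. Several things are assumptions made for illustration: the chi-square values and degrees of freedom of the two models are hypothetical (the notes do not report them), the df difference of 11 assumes the 11 method loadings are the only parameters that differ, and the standard error of 0.434 is back-calculated from the reported interval rather than taken from the output. The chi-square tail probability uses the closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-y) * sum_{i=0}^{k-1} y**i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df = 2k + 1: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{k} y**(i - 0.5) / Gamma(i + 0.5)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi_square_difference(chi_restricted, df_restricted, chi_full, df_full):
    """Difference test for nested models; returns (difference, df difference, p-value)."""
    diff = chi_restricted - chi_full
    df_diff = df_restricted - df_full
    return diff, df_diff, chi_square_sf(diff, df_diff)


def wald_interval(estimate, standard_error, z=1.96):
    """Normal-theory confidence interval: estimate +/- z * SE."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical fit statistics: model without method paths vs. model with them.
diff, df_diff, p_value = chi_square_difference(120.5, 52, 80.3, 41)
print(df_diff, p_value < 0.05)  # 11 True

# e8 variance estimate from the notes; the SE is inferred, not reported.
lower, upper = wald_interval(-0.227, 0.434)
print(round(lower, 3), round(upper, 3))  # -1.078 0.624
print(lower < 0.0 < upper)               # True: the interval includes zero
```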

MTMM

  • Mount, M. K., & Scullen, S. E. (2001). Multisource feedback ratings: What do they really measure? In M. London (Ed.), How people evaluate others in organizations (pp. 155-176). Mahwah, NJ: Lawrence Erlbaum.
  • Campbell and O'Connell have shown that method effects can be highly complicated. (Campbell, D. T., & O'Connell, E. J. (1982). Methods as diluting trait relationships rather than adding irrelevant systematic variance. In D. Brinberg and L. Kidder (Eds.), New directions for methodology of social and behavioral science: Forms of validity in research (pp. 93-111). San Francisco: Jossey-Bass.)
  • Bollen, K., & Medrano, J. D. (1998). Who are the Spaniards? Nationalism and identification in Spain. Social Forces, 77(2), 587-622.

CTCU--correlated-trait-correlated-uniqueness model

  • The standard CTCU model explicitly incorporates the traits, but not the method factors. However, the EFFECTS of the method factors are included in the model through the correlated uniquenesses of the relevant indicators.
  • In a standard CTCU model the traits are explicitly modeled.
  • The expression "correlated uniqueness" gives the impression that the method effects are not defined as common factors in the CTCU model.
  • Since only the effects of the method factors are modeled (i.e., as error covariances), their structure is NOT explicitly stated; the structure of the method factors is not incorporated in the model. There is NO assumption that the method factor is the same for all measures employing the SAME method. Rather, the CTCU model assumes that each observed indicator has a unique method effect, and that the degree of covariation between measures using the same method suggests the extent to which a common method factor is plausible.
  • The formal modeling of the latent structure of method factors in an ordinary MTMM analysis may be too restrictive in some instances, and result in Heywood cases, poor fit, or non-convergence. I believe that this is why Marsh developed the CTCU model in the first place.
  • If you believe there are three method factors, and that these method factors are correlated, the CTCU model doesn't help you test that proposition. The model won't enable you to assess whether one factor underpins one of the sets of method items or whether there are x method factors for that set of items. It also won't enable you to assess whether the 3 method factors are correlated (although the modification index may suggest this).
  • The CTCU model may well be useful when researchers (for some reason) are happy to accept the limitations it places on the interpretation of the structure of either the traits or the method factors. So, by moving toward CU-type models we are probably giving something up, since less is being posited.

Reading list

  • Lance, C. E., Noble, C. L., & Scullen, S. E. (2002). A critique of the correlated trait-correlated method and correlated uniqueness models for multitrait-multimethod data. Psychological Methods, 7(2), 228-244.

Disconfirmability of the (SEM) model

  • To be objective, models must be disconfirmable. Just-identified models can't be disconfirmed by tests of lack of fit because they will always perfectly reproduce the data. This doesn't mean that there is no place for just-identified models in hypothesis testing, because they can be used as alternative models against which more constrained models are compared in tests of fit. Once a just-identified model is specified, a more constrained, disconfirmable model may be constructed from it by introducing overidentifying constraints into the model. Introducing overidentifying constraints results in fewer parameters to estimate than data points available from which to estimate them. Thus, estimates of parameters can be inconsistent if estimable from different aspects of the data, and this will indicate inappropriateness of the constraints. The residual data, representing the difference between the data and the reproduced data based on the model, contain the evidence for disconfirming the model. Mulaik, S. A., & James, L. R. (1995). Objectivity and reasoning in science and structural equation modeling. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 118-137).

Measurement error

  • Biemer, P. P., Groves, R. M., Lyberg, L. E., Mathiowetz, N. A., & Sudman, S. (Eds.). (1991).
    Measurement Errors in Surveys. New York: John Wiley & Sons, Inc.

Monday, April 23, 2007

Level of measurement

  • Some statistical techniques are "robust", so that treating ordinal variables as though they were interval variables doesn't affect results. In practice, this relaxed approach is frequently adopted with scales and other ordinal variables with a large number of values (de Vaus, 2002)

Validity

  • There is no conclusive way of establishing validity
  • validity is traditionally tested using a number of methods including criterion, construct, content, convergent, and discriminant methods.

robust statistics

  • robust statistics is a body of knowledge, partly formalized into a "theory of robustness", relating to deviations from idealized assumptions in statistics. Robust statistics is aimed at yielding reliable results in cases where classical assumptions like normality, independence or linearity are violated.

Exploratory factor analysis

  • EFA in practice is usually performed on a correlation matrix, mainly because measurements in social and behavioral sciences are of arbitrary scales.
    One advantage of using the correlation matrix is that many rule of thumb criteria for judging the significance of factor loadings or the number of factors exist.
  • For EFA with several interesting rotations, there is no need to worry about the arbitrariness of the measurement scales if inference is based on z-scores. In practice, a rule of thumb such as factor loading > .3 for simple structure may be easy to use with a sample correlation matrix.
  • Routinely used classical procedures for determining the number of factors and the
    significance of factor loadings are very fragile in the presence of outliers.

Outlier

  • If the variable is approximately normally distributed, then z scores around 3 in absolute value should be considered as potential outliers.
  • obtain the z score for each variable in SPSS--data--split file--analyze all cases, do not create groups--descriptives--save standardized values as variables
  • Outliers should not necessarily be regarded as bad. It has been argued that outliers can provide some of the most interesting cases for further study.
  • outliers create a nonnormal sample, but not all nonnormal samples contain outliers
  • With nonnormal data, the sample covariance matrix is no longer the most efficient estimate of the population covariance matrix. If the nonnormality is created by outliers, analysis based on the sample covariance matrix can be misleading to a greater or lesser degree, depending on the influence of the outliers.
  • There are two ways to deal with outliers. One is to identify the influential cases through
    some analytical procedure and make a subjective decision whether to keep them or remove them. Another way is to use a robust approach. Whether they are outliers or just influential cases, their effect will be automatically downweighted through this approach.
    The influence of outliers in a robust method is generally much smaller.
  • When there are extreme values (outliers), we should use the median as the measure of central tendency, not the mean, because the median is unaffected by outliers. The median is therefore a robust measure of central tendency.
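
The z-score screening rule and the mean-versus-median point in the list above can be illustrated with a short sketch. The data are hypothetical, the function name is made up for this example, and the cutoff of 3 follows the "around 3 in absolute value" guideline; the sample standard deviation (divisor n - 1) is used.

```python
from statistics import fmean, median, stdev


def flag_outliers(values, cutoff=3.0):
    """Return the values whose absolute z score is at least `cutoff`."""
    mean = fmean(values)
    sd = stdev(values)  # sample standard deviation, divisor n - 1
    return [v for v in values if abs((v - mean) / sd) >= cutoff]


# Hypothetical scores: 19 values near 10 plus one extreme value.
scores = [8, 9, 9, 10, 10, 10, 10, 11, 11, 12,
          8, 9, 10, 10, 11, 12, 9, 10, 11, 50]

print(flag_outliers(scores))  # [50]
print(fmean(scores))          # 12.0  (pulled upward by the outlier)
print(median(scores))         # 10.0  (unchanged by the outlier)
```

Dropping the value 50 returns the mean to 10.0 while the median stays at 10, which is the sense in which the median is robust.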

Sunday, April 22, 2007

multilevel modeling

Latent class analysis

forecast

data source

Latent growth curve model

  • Duncan, T. E., Duncan, S. C., & Strycker, L. A. (2006). An introduction to latent variable growth curve modeling: Concepts, issues, and applications (2nd ed.). Lawrence Erlbaum.

Maximum likelihood method for missing data

  • ML is a model-based estimation procedure.
  • Direct maximum likelihood (FIML) and the expectation maximization (EM) algorithm can be used to obtain ML parameter estimates for structural equation models with missing data.
  • Three approaches to ML estimation of these parameters when some data are missing: factoring the likelihood, the EM algorithm, and direct ML.
  • EM Algorithm-- many SEM analysts have used the means and covariance matrix produced by the EM algorithm as input to SEM software. However, this two-step approach is less than ideal, for two reasons. When the SEM to be estimated is just-identified (i.e., the model implies no restrictions on the covariance matrix), then the resulting parameter estimates are true ML estimates. But in the more usual case when
    the SEM is overidentified, the resulting estimates are not true ML estimates and are generally less efficient (although the loss of efficiency is likely to be small). Moreover,
    the standard errors reported by SEM software using this two-step method will not be consistent estimates of the true standard errors.
  • Direct ML-- also called “raw” ML (because it requires raw data as input) or “full information” ML. Direct ML solves the problems that arise in the two-step EM method. Arbuckle (1996)
    proposed the use of direct ML for general missing data patterns and implemented the method in the Amos program. Since then, other SEM programs have also introduced direct ML for missing data, including LISREL, Mplus, and Mx. In Amos, the default is to use direct ML whenever the data set has missing data. Direct ML appears to be the best method for handling missing data for most SEM applications.
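
For reference, the quantity that direct ML maximizes is usually written as a sum of casewise log-likelihoods under multivariate normality. The notation below is an assumption introduced for this sketch: x_i is the vector of values actually observed for case i, μ_i and Σ_i are the model-implied mean vector and covariance matrix restricted to those observed variables, and K_i is a constant that depends only on how many variables case i has observed.

```latex
\log L_i = K_i - \tfrac{1}{2}\log\lvert\Sigma_i\rvert
         - \tfrac{1}{2}\,(x_i - \mu_i)^{\top}\Sigma_i^{-1}(x_i - \mu_i),
\qquad
\log L = \sum_{i=1}^{N} \log L_i
```

Because each case contributes only through the variables it has observed, no case is deleted and no value is imputed.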

Missing data

  • A distinction is sometimes drawn between missing values on independent variables (predictors) and missing values on dependent variables (outcomes), but the two do not fundamentally differ. It is true that, under certain assumptions, missing values on a dependent variable may be efficiently handled by a very simple method such as case deletion, whereas good missing-data procedures for independent variables can be more difficult to implement.
    We caution our readers not to believe general statements such as, “Missing values on a dependent variable can be safely ignored,” because such statements are imprecise and generally false.

Ad Hoc approach

  • Ad hoc approach--list-wise deletion, pairwise deletion, single imputation
  • Older methods--list-wise deletion (complete-case analysis), pairwise deletion (available-case analysis), weighting, averaging the items, single imputation (e.g., imputing the mean).
  • Single imputation methods--- traditional ideas that can be viewed as precursors of MI. These are methods by which each missing value in the data set is replaced by a plausible value and then analyses are carried out using the usual statistical techniques, assuming that one has information on all variables for all individuals. For example, in mean substitution, all the missing values for a variable are replaced by the mean value of that variable. In regression substitution, all the missing values of a variable are replaced by the predicted value of that variable from a regression analysis based only on the complete cases. Other single imputation methods include stochastic regression imputation and hot-deck imputation.
  • ML and MI techniques generally outperform ad hoc approaches in producing accurate parameter estimates.
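
Mean substitution, as described in the list above, is easy to sketch, and doing so shows one reason it is considered ad hoc: filling every gap with the same value shrinks the variance. The scores and the function name below are hypothetical.

```python
from statistics import fmean, variance


def mean_substitute(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = fmean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical scores with two missing values.
scores = [2.0, 4.0, None, 6.0, None, 8.0]
observed = [v for v in scores if v is not None]
completed = mean_substitute(scores)

print(completed)                      # [2.0, 4.0, 5.0, 6.0, 5.0, 8.0]
print(round(variance(observed), 3))   # 6.667
print(variance(completed))            # 4.0
```

The mean is preserved (5.0 in both cases), but the sample variance drops from about 6.67 to 4.0 because the imputed cases add nothing to the sum of squares while increasing the divisor.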

  • Likelihood-Based Estimation Procedures: two types---
    1) EM-type algorithms (use sufficient statistics to estimate a complete data matrix, which can then serve as input for a latent variable model); 2) FIML (direct parameter estimation). FIML uses direct ML estimation approach.
  • Because the EM algorithm only produces correlation and mean parameters that must subsequently serve as input for the structural equation model, this technique is considered an indirect ML procedure, in contrast with the FIML approach, which can estimate latent variable models directly from raw data.
  • ML (maximum likelihood) and MI (multiple imputation) are now becoming standard because of implementations in free and commercial software. ML and MI under the MAR assumption represent the practical state of the art for missing data.
  • mean substitution—replacing each missing value for a variable with the average of the observed values—may accurately predict missing data but distort estimated variances and correlations.
  • The Distribution of Missingness-- In modern missing-data procedures missingness is regarded as a probabilistic phenomenon. We treat R, the set of indicators of which values are missing, as a set of random variables having a joint probability distribution. We may not have to specify a particular distribution for R, but we must agree that it has a distribution. In the statistical literature, the distribution of R is sometimes called the response mechanism or missingness mechanism. We refer to the distribution of R as the distribution of missingness or the probabilities of missingness.
  • Longitudinal modeling by ML can be a highly efficient way to use the available data.
  • The advantage of MI over maximum likelihood estimation (and Bayesian estimation) is that it is computationally much simpler for most practical situations. The maximum likelihood estimation method is problem specific and may require totally different
    computational procedures to integrate out the missing data for different models applied to the same data set. By contrast, in MI the same imputed data sets may be used for different types of analyses by different users using any popular statistical software, without any need for these users to worry about addressing the missing data problem.
  • The principal advantage of MI is that it can be used in almost any situation, whereas ML is much more restricted in its applications. For example, if you do not have access to a program that does direct ML, you can still use MI to handle the missing data, or if you want to estimate your SEM using some method other than ML, such as two-stage least squares (Bollen, 1995), then MI would be a good choice.
  • Weakness of multiple imputation-- First, because random variation is deliberately introduced into the imputation process, MI does not yield a determinate result: Every time you use it, you get different answers. Second, unlike ML, there are many different ways to implement MI for a particular application. Deciding among the various approaches can be a daunting task, especially for the novice user.
  • How to differentiate MCAR, MAR, and MNAR:
    Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147–177.
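
Although each imputed data set gives a slightly different answer, MI analyses are normally summarized by pooling the results across imputations with Rubin's rules. The notes above do not spell out the pooling step, so the sketch below is an added illustration: the function name, the five estimates, and their standard errors are all hypothetical.

```python
from statistics import fmean, variance


def pool_estimates(estimates, standard_errors):
    """Combine m imputed-data results with Rubin's rules.

    Returns the pooled estimate and its pooled standard error, where
    total variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = fmean(estimates)
    within = fmean(se ** 2 for se in standard_errors)
    between = variance(estimates)  # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical path coefficient estimated in five imputed data sets.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
standard_errors = [0.10, 0.10, 0.10, 0.10, 0.10]

pooled, pooled_se = pool_estimates(estimates, standard_errors)
print(round(pooled, 3))     # 0.5
print(round(pooled_se, 4))  # 0.1058
```

The pooled standard error is larger than any single-imputation standard error because it adds the between-imputation variability, which is the extra uncertainty due to the missing data.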

post hoc approach vs theoretical approach to missing data

  • Missing data are usually dealt with by listwise or pairwise deletion methods which aim to fix up the data so they can be analyzed by methods designed for complete data. This kind of approach is ad-hoc and has little theoretical justification.
  • Theory-based approach to the treatment of missing data under the assumption of multivariate normality, based on the direct maximization of the likelihood of the observed data, has long been known. The theoretical advantages of this full-information method are widely recognized, and its applicability in principle to structural equation modeling has been noted.

Report parameter estimates in SEM

  • All parameter estimates, including error variances and variances of latent variables, should be reported.
  • Either standard errors of the estimates or critical ratios (i.e., estimate/standard error), or notation that indicates the p-value associated with the estimates (*p < .05, **p < .01), should be presented. The latter is least preferred because it discloses the least about the parameter estimates. When unstandardized estimates are presented, either standard errors or critical ratios should be presented.
  • The critical value of the test statistic (usually ±1.96) should be noted explicitly prior to presentation and interpretation of estimates.
  • Figures and tables should indicate clearly whether unstandardized or standardized estimates are presented.

Remedies for multivariate nonnormality in SEM

Nonnormality influences chi-square goodness-of-fit statistics, parameter estimates, and standard errors
Estimation-based remedies
  • ADF--does not assume multivariate normality of the measured variables. The ADF estimator produces asymptotically (large-sample) unbiased estimates of the chi-square goodness-of-fit test, parameter estimates, and standard errors. With more than 20 to 25 measured variables, implementation of ADF becomes impractical, even given modern high-speed computers. ADF requires a large sample size and can be trusted only at the largest (>5000) sample sizes. Hoyle and Panter (1995) recommend against ADF in favor of distribution-based adjustments to the results of ML estimation.
  • Satorra-Bentler scaled chi-square statistic--the scaled statistic corrects (rescales) the ML chi-square statistic. However, it has a tendency to overreject true models at smaller sample sizes.
  • Robust standard errors--robust standard errors adjust the standard errors for the degree of multivariate kurtosis.
  • Bootstrapping--the bootstrap distribution will follow a noncentral chi-square distribution, rather than the usual central chi-square distribution specified by statistical theory.
Reexpression of variables
  • Item parcels--item parcels are composite variables whose distributions usually approximate a normal distribution more closely than the original items do. Fewer parameters need to be estimated in the measurement model. However, parcels may obscure the fact that more than one factor may underlie any given item parcel. The use of too few parcels as indicators of a construct yields a less stringent test of the proposed structure of the confirmatory factor model. Identification problems are also more likely to occur if too few item parcels are used per factor, i.e., fewer than 3.
  • Transformation of variables--power function, logarithmic, square root. Transformed data should be examined for univariate and multivariate normality and compared with the original data to see the improvement in normality.

Saturday, April 21, 2007

distributional assumption of different estimation methods in SEM

  • Maximum likelihood (ML) and generalized least squares (GLS) assume that the measured variables are continuous and have a multivariate normal distribution. When the data are multivariate normally distributed and the sample size is large enough, the ML and GLS methods are preferred. However, their test statistics may not have the chi-square distribution if the distributional assumption, multivariate normality, is false. As data become increasingly nonnormal, ML and GLS estimators produce chi-square values that become too large. A scaled statistic modifies the standard test statistic so that it is more closely chi-square distributed.
  • Nonnormality leads to modest underestimation of fit indexes such as NFI, TLI, and CFI.
  • Nonnormality leads to moderate to severe underestimation of standard errors of parameter estimates.
  • When the sample size is large, the SEM model is correct, and the distributional assumptions are satisfied, the standard errors derived from the parameter estimates and the fitting function are good estimates. However, when these conditions do not hold, ML and GLS standard errors may be substantially off the mark. Thus, robust standard errors need to be derived.
  • The scaled test statistic and robust standard errors yield satisfactory results regardless of the distribution of the variables, i.e., even when data are not multivariate normal. EQS provides both of them.

report data in SEM

  • Correlation matrix with standard deviations of the observed variables, rounded to three decimal places (Hoyle & Panter, 1995) -- the most generous report
  • Covariance matrix + means of the observed variables. Even if your SEM doesn't concern means, the means of all observed variables should be reported in the written summary so that other researchers can re-run the SEM
  • To get the covariance matrix in Amos: view -- text output -- sample moments (sample covariance matrix)
  • Correlation matrix + standard deviations of all observed variables + (means)
  • It is highly recommended to provide means along with the covariance matrix, or with the correlation matrix + standard deviations
  • When data are not interval-level data, the data need to be preprocessed so that a tetrachoric matrix (for dichotomous data) or polychoric matrix (for ordered-categorical variables) is estimated and analyzed in SEM.
  • When additional assumptions are made about the measurement level of the raw data in the model, such assumptions should be explicitly reported.
  • Information about univariate and multivariate distribution should be presented and interpreted.

Thursday, April 19, 2007

Modification index

  • The modification index is a lower-bound estimate of the expected chi-square decrease that would result when a particular parameter is left unconstrained (making it a free parameter, or adding it as an extra path). Joreskog suggested that a modification index should be at least five before the researcher considers modifying the hypothesized model.
  • MI values lower than 5 indicate little appreciable improvement in fit.
  • Correlated errors of measurement are among the most problematic types of post hoc modifications because they are rarely theoretically justified and are unlikely to replicate. The need for correlated measurement error is an indication that the factor model has been unable to account for all the covariation among the variables. This may occur if, for example, more factors are needed or if method variance is present.
  • http://www2.chass.ncsu.edu/garson/pa765/structur.htm#extract

LISREL

  • Modification index (MI)--the estimate of the decrease in chi-square value that would result if a given parameter were to be added to the model. MIs are available for all parameters that were constrained to be zero in the original model. MIs are accompanied by the expected parameter change (EPC).
  • EPC--the value a given parameter would have if it were added to the model

EQS

  • Lagrange multiplier (LM) test--the EQS counterpart of the MI in LISREL
  • Wald test--values of the Wald test represent the amount by which the overall chi-square value will increase if a parameter were to be dropped from the model

Assess individual parameters in SEM

  • The test statistic is the critical ratio (c.r.), the parameter estimate divided by its standard error. The c.r. operates as a z-statistic; the null hypothesis is that the parameter equals zero.
  • Researchers hope to reject the null hypothesis. Based on a level of .05, the absolute value of the c.r. needs to be larger than 1.96 in order to reject the null hypothesis. Rejecting the null hypothesis (i.e., that the parameter = 0) means that the parameter estimate is significantly different from zero, so the parameter is worth keeping in the SEM model.
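
The critical-ratio rule above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this note, and the p-value assumes the c.r. really behaves as a standard normal z-statistic.

```python
import math


def critical_ratio(estimate, standard_error):
    """Critical ratio: parameter estimate divided by its standard error."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return estimate / standard_error


def two_sided_p(z):
    """Two-sided p-value of a z-statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def is_significant(estimate, standard_error, critical_value=1.96):
    """True when |c.r.| exceeds the critical value (1.96 for alpha = .05)."""
    return abs(critical_ratio(estimate, standard_error)) > critical_value


# Hypothetical estimate and standard error, for illustration only.
cr = critical_ratio(0.42, 0.15)
print(round(cr, 2), is_significant(0.42, 0.15), round(two_sided_p(cr), 4))
```

A negative estimate is handled the same way, because the rule compares the absolute value of the c.r. with 1.96.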

Assess whole SEM model--chi square and fit index

  • The global model fit tests produced by AMOS are still computed under the assumption of joint multivariate normality. In other words, these values will remain unchanged whether you use bootstrapping or not.
  • A chi-square probability value greater than .05 indicates acceptable model fit.
  • Penalty of model complexity--For a given set of data and variables, the goodness of fit of a more complex, highly parameterized model tends to be greater than for simpler models because of the loss of degrees of freedom of the complex model. Thus, a good model fit indicated by fit measures may result from 1) a correctly specified model that adequately represents the sample data or 2) a highly overparameterized model that accounts for the fit of the model in the sample, regardless of whether there is a match between the specified model and the population covariance matrix.
  • The chi-square test functions as a statistical method for evaluating models, whereas the fit indexes are more descriptive than statistical. Fit indexes describe and evaluate the residuals that result from fitting a model to the data.
  • Hoyle and Panter (1995) recommend the following indexes of overall model fit: unadjusted chi-square, Satorra-Bentler scaled chi-square, GFI, TLI (NNFI), IFI, CFI, RNI

Chi square test

  • The null hypothesis is that the postulated model holds in the population, i.e., the model-implied covariance matrix equals the population covariance matrix. In contrast to traditional significance testing, the researcher hopes NOT to reject the null hypothesis and therefore prefers a nonsignificant chi-square (such a finding indicates that the predicted model is congruent with the observed data). In practice, only the central chi-square distribution is used to test the null hypothesis.
  • Chi-square and p-value--the higher the probability level (p-value) associated with chi-square, the better the fit; the smaller the chi-square, the better the model fit. Amos reports the value of chi-square as CMIN. A significant chi-square indicates lack of satisfactory model fit. For example, based on a level of .05, if the output for the hypothesized SEM model shows p = .000, the hypothesized model should be rejected, i.e., it is not adequate. In general, if the p-value of the model chi-square is .05 or less, the departure of the data from the model is significant at the .05 level and the researcher's model is rejected. The chi-square test offers only a dichotomous decision strategy implied by a statistical decision rule and can't be used to quantify the degree of fit along a continuum with some prespecified boundary.
  • Suppose the proposed model has a chi-square of 12.1 and that, with the appropriate degrees of freedom, the statistical table gives 11.34 as the chi-square required to reject the null hypothesis at the 0.01 level. Because 12.1 is larger than 11.34, we reject the null hypothesis (H0: the implied correlations and the observed correlations are from the same population and any differences are due to sampling error); thus the model does not fit the data. In general, if the chi-square value exceeds the appropriate figure in the statistical tables, the model fails to fit the data. If the proposed model's chi-square value is 2.9, the proposed model can't be rejected; this doesn't mean that the proposed model is right, only that it has not been shown to be wrong.
  • The analysis result of fit indexes are the same for unstandardized estimates and standardized estimates.
  • Reports of chi-square should be accompanied by degrees of freedom, sample size, and p-value. Examples: χ2 (48, N=500) = 303.80, p < .001, TLI = .86, CFI = .90; or χ2 (15, N=2232) = 10.91, p = .77 (some people recommend this because it provides more accurate information about the p-value)
  • Significant: the χ2 associated with the model # is significant, χ2 (df, N=2232) = #, p = 0.000, which suggests that the model is not consistent with the observed data.
    Nonsignificant: χ2 (15, N=2232) = 10.91, p = .77, suggesting that the proposed model is consistent with the observed data.
  • If the p-value is smaller than .05, we reject the proposed model.
    If the p-value is higher than .05, we fail to reject (i.e., retain) the proposed model.
  • The chi-square statistic is used more as a descriptive index of fit than as a statistical test. Smaller χ2 values indicate better-fitting models, and a nonsignificant χ2 is desirable.
  • Chi square is highly sensitive to departures from multivariate normality.
  • χ2 is sensitive to sample size. With a large sample size, the chi-square value will be inflated (statistically significant), and thus might erroneously imply a poor data-to-model fit (Schumacker & Lomax, 2004).
  • With small sample sizes, there may not be enough power to detect the differences between several competing models using the chi-square statistic for model selection or evaluation. At larger sample sizes, power is so high that even models with only trivial misspecifications are likely to be rejected. As sample size increases, even very minor misspecifications can lead to poor model fit. Conversely, with small samples, models will tend to be accepted even in the face of considerable misspecification. In large, complex problems (i.e., problems in which there are many variables and degrees of freedom), the observed chi-square will nearly always be statistically significant, even when there is a reasonably good fit to the data. The chi-square test is strongly influenced by sample size. A poor fit based on a small sample size may result in a nonsignificant chi-square, whereas a good fit based on a large sample size will result in a significant chi-square. Thus, most applications of confirmatory factor analysis require a subjective evaluation of whether or not a statistically significant chi-square is small enough to constitute an adequate fit.
  • Relative chi-square, also called normal or normed chi-square, is the chi-square fit index divided by degrees of freedom, in an attempt to make it less dependent on sample size. AMOS lists relative chi-square as CMIN/DF (chi-square/degrees of freedom ratio). Wheaton (1987) advocated that CMIN/DF not be used. Ratios in the range of 2 to 1 or 3 to 1 indicate acceptable fit between the hypothetical model and the sample data (Carmines & McIver, 1981). Different researchers have recommended using ratios as low as 2 or as high as 5 to indicate a reasonable fit (Marsh & Hocevar, 1985). A chi-square/df ratio larger than 2 indicates an inadequate fit (Byrne, 1989). Chi-square/df ratio values lower than 2 are widely considered to represent a minimally plausible model (Byrne, 1991, The Maslach Burnout Inventory: validating factorial structure and invariance across intermediate, secondary, and university educators. Multivariate Behavioral Research, 26 (4), 583-605)
  • The smaller the chi-square, the better the fit of the model. It has been suggested that a chi-square two or three times as large as the degrees of freedom is acceptable (Carmines & McIver, 1981), but the fit is considered better the closer the chi-square value is to the degrees of freedom for a model (Thacker, Fields & Tetrick, 1989). In the present sample, it was suggested that a ratio of 5 to 1 was “a useful rule of thumb” (Jackson et al., 1993, p. 755). -- cf. Timothy R. Hinkin (1995) A Review of Scale Development Practices in the Study of Organizations. Journal of Management, Vol. 21, No. 5, 967-988
  • However, the chi-square test may be misleading. 1) The more complex the model, the more likely a good fit (i.e., the closer the researcher's model is to being just-identified, the more likely good fit will be found). 2) The larger the sample size, the more likely the rejection of the model and the more likely a Type I error (rejecting something true). In very large samples, even tiny differences between the observed model and the perfect-fit model may be found significant. 3) The chi-square fit index is also very sensitive to violations of the assumption of multivariate normality. When this assumption is known to be violated, the researcher may prefer the Satorra-Bentler scaled chi-square, which adjusts the model chi-square for non-normality.
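
The p-value and the relative chi-square discussed in this section can be checked by hand with a short Python sketch. The function names are made up for this note. The p-value uses the closed-form upper tail of a central chi-square with integer degrees of freedom, so it needs no statistical library.

```python
import math


def chi_square_p_value(chi_square, df):
    """Upper-tail probability of a central chi-square with integer df."""
    if df < 1 or chi_square < 0:
        raise ValueError("need df >= 1 and chi_square >= 0")
    half = chi_square / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_k (x/2)^(k-1/2) / Gamma(k+1/2)
    term = math.sqrt(half) / math.gamma(1.5)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def relative_chi_square(chi_square, df):
    """CMIN/DF: model chi-square divided by its degrees of freedom."""
    if df <= 0:
        raise ValueError("df must be positive")
    return chi_square / df


# The two reporting examples used in these notes.
for chi2, df in [(303.80, 48), (10.91, 15)]:
    p = chi_square_p_value(chi2, df)
    verdict = "reject the model" if p < 0.05 else "fail to reject the model"
    print(chi2, df, round(relative_chi_square(chi2, df), 2), verdict)
```

Under these formulas the first example is rejected and has a CMIN/DF well above 5, while the second is retained and has a CMIN/DF below 1.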

Absolute fit indexes--directly assess how well a priori model reproduces the sample data

  • To address the limitations of the chi-square test, goodness-of-fit indexes are used as adjuncts to the chi-square statistic to assess model fit.
  • Models with many variables and small samples may be more inclined to experience degradation in absolute fit indexes than models with many variables and large sample sizes.
  • RMR (root mean square residual), the smaller the RMR, the better the model. An RMR of zero indicates a perfect fit. The closer the RMR is to 0 for a model being tested, the better the model fit. RMR smaller than 0.05 indicates good fit.
  • SRMR (standardized RMR, standardized root mean square residual)--SRMR <= .05 means good fit. The smaller the SRMR, the better the model fit. SRMR = 0 indicates perfect fit. A value less than .08 is considered good fit. SRMR tends to be lower simply due to larger sample size or more parameters in the model. To get SRMR in AMOS, select Analyze, Calculate Estimates as usual. Then select Plugins, Standardized RMR: this brings up a blank Standardized RMR dialog. Then re-select Analyze, Calculate Estimates, and the Standardized RMR dialog will display SRMR.
  • GFI should be equal to or greater than .90 to indicate good fit. GFI is less than or equal to 1. A value of 1 indicates a perfect fit. GFI tends to be larger as sample size increases. GFI > 0.95 indicates good fit. The GFI index is roughly analogous to the multiple R square in multiple regression in that it represents the overall amount of the covariation among the observed variables that can be accounted for by the hypothesized model.
  • AGFI (adjusted GFI), AGFI adjusts the GFI for degrees of freedom, resulting in lower values for models with more parameters. AGFI should also be at least .90; close to 1 indicates good fit. AGFI may underestimate fit for small sample sizes. AGFI's use has been declining and it is no longer considered a preferred measure of goodness of fit. AGFI > 0.9 indicates good fit.
  • CI (centrality index)--CI should be .90 or higher to accept the model.
  • CAK
  • CK (single sample cross-validation index)
  • MCI (McDonald's centrality index)
  • CN (Hoelter's critical N)
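
To make RMR and SRMR concrete, here is a minimal Python sketch that computes both from a sample covariance matrix and a model-implied covariance matrix. It assumes one common definition (the root mean square over the p(p+1)/2 unique elements, with SRMR scaling each residual by the sample standard deviations). Programs differ in the details, so the values need not match AMOS exactly. The function names and the 2 x 2 matrices are hypothetical.

```python
import math


def rmr(sample_cov, implied_cov):
    """Root mean square residual over the unique (lower-triangle) elements."""
    p = len(sample_cov)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):
            total += (sample_cov[i][j] - implied_cov[i][j]) ** 2
    return math.sqrt(total / (p * (p + 1) / 2))


def srmr(sample_cov, implied_cov):
    """Standardized RMR: each residual is divided by the product of the
    two sample standard deviations before squaring and averaging."""
    p = len(sample_cov)
    total = 0.0
    for i in range(p):
        for j in range(i + 1):
            scale = math.sqrt(sample_cov[i][i] * sample_cov[j][j])
            total += ((sample_cov[i][j] - implied_cov[i][j]) / scale) ** 2
    return math.sqrt(total / (p * (p + 1) / 2))


# Hypothetical matrices: the model reproduces the variances exactly
# but misses the covariance by 0.4.
S = [[4.0, 2.0], [2.0, 4.0]]
Sigma = [[4.0, 1.6], [1.6, 4.0]]
print(round(rmr(S, Sigma), 4), round(srmr(S, Sigma), 4))
```

The example shows why SRMR is easier to interpret: RMR depends on the scale of the variables, whereas SRMR is expressed in correlation units.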

Incremental fit index (comparative fit index)-- measures the proportionate improvement in fit by comparing a target model with a more restricted, nested baseline model. A null model in which all the observed variables are uncorrelated is the most typically used baseline model

Baseline Comparisons-- comparing the given model with an alternative model

  • CFI (comparative fit index), close to 1 indicates a very good fit, > 0.9 or close to 0.95 indicates good fit; by convention, CFI should be equal to or greater than .90 to accept the model. CFI is independent of sample size. CFI is more appropriate than NFI in finite samples. NFI behaves erratically across ML and GLS, whereas CFI behaves consistently across the two estimation methods. CFI is recommended for routine use. Gerbing and Anderson (1993) recommended RNI, CFI, and DELTA2 (IFI). When the sample size is small, both the CFI and TLI decrease as we increase the number of variables in the model.
  • RNI, RNI is recommended for routine use. RNI is generally preferred over TLI. RNI> 0.95 indicates good fit.
  • BBI (Bentler-Bonett index), should be greater than .9 to consider fit good.
  • IFI (incremental fit index,also known as DELTA2), IFI should be equal to or greater than .90 to accept the model. IFI value close to 1 indicates good fit. IFI can be greater than 1.0 under certain circumstances. IFI is not recommended for routine use.
  • NFI (normed fit index, also known as the Bentler-Bonett normed fit index,DELTA1), 1 = perfect fit. NFI values above .95 are good, between .90 and .95 acceptable, and below .90 indicates a need to respecify the model. NFI greater than or equal to 0.9 indicates acceptable model fit. NFI less than 0.9 can usually be improved substantially. Some authors have used the more liberal cutoff of .80. NFI may underestimate fit for small samples. NFI does not reflect parsimony: the more parameters in the model, the larger the NFI coefficient, which is why NNFI (TLI) below is now preferred (NNFI incorporates a correction for model complexity, whereas the NFI does not). NFI depends on sample size, values of the NFI will be higher for larger sample sizes. NFI behaves erratically across estimation methods under conditions of small sample size. NFI is not a good indicator for evaluating model fit when the sample size is small.
    NFI suggested relatively poorer model fit as missing data increased, with the bias generally more pronounced when data were MAR than when they were MCAR. Whereas NFI is still widely used, it is typically not among the recommended indices in recent reviews. Marsh et al., (1988) recommended against using NFI and in favor of TLI, because NFI, not TLI, is sensitive to sample size. When the sample size is small, both the CFI and TLI decrease as we increase the number of variables in the model.
  • NNFI (non-normed fit index, also called the Bentler-Bonett non-normed fit index, the Tucker-Lewis index, TLI, RHO2), NNFI is similar to NFI, but penalizes for model complexity. NNFI is not guaranteed to vary from 0 to 1. It is one of the fit indexes less affected by sample size. NNFI close to 1 indicates a good fit. TLI greater than or equal to 0.9 indicates acceptable model fit. By convention, NNFI values below .90 indicate a need to respecify the model. TLI less than 0.9 can usually be improved substantially. Some authors have used the more liberal cutoff of .80 since TLI tends to run lower than GFI. However, more recently, Hu and Bentler (1999) have suggested NNFI >= .95 as the cutoff for a good model fit. TLI is not associated with sample size. NNFI is recommended for routine use. NNFI is a more useful index than NFI. Hu and Bentler (1998, 1999) support the continued use of TLI because TLI is relatively insensitive to sample size, is sensitive to model misspecifications, is relatively insensitive to violations of the assumption of multivariate normality, and is relatively insensitive to estimation method (maximum likelihood vs alternative methods). RNI is generally preferred over TLI.
  • NTLI, NTLI is recommended for routine use.
  • RFI (relative fit index, RHO1) is not guaranteed to vary from 0 to 1. RFI close to 1 indicates a good fit. Neither the NFI nor the RFI are recommended for routine use.
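
The baseline-comparison indexes above are all simple functions of the chi-square and degrees of freedom of the target model and of the baseline (null) model. The sketch below uses the standard textbook formulas. The function name and the baseline values in the example are hypothetical, and individual programs may differ in small details.

```python
def incremental_fit_indexes(chi_m, df_m, chi_b, df_b):
    """NFI, RFI, IFI, TLI (NNFI), RNI and CFI from the target model
    (chi_m, df_m) and the baseline model (chi_b, df_b)."""
    ratio_m = chi_m / df_m
    ratio_b = chi_b / df_b
    noncentrality_m = chi_m - df_m
    noncentrality_b = chi_b - df_b
    # CFI truncates the noncentrality estimates so it stays within 0..1.
    d_m = max(noncentrality_m, 0.0)
    d_b = max(noncentrality_b, noncentrality_m, 0.0)
    return {
        "NFI": (chi_b - chi_m) / chi_b,
        "RFI": 1.0 - ratio_m / ratio_b,
        "IFI": (chi_b - chi_m) / (chi_b - df_m),
        "TLI": (ratio_b - ratio_m) / (ratio_b - 1.0),
        "RNI": (noncentrality_b - noncentrality_m) / noncentrality_b,
        "CFI": 1.0 if d_b == 0 else 1.0 - d_m / d_b,
    }


# Target model from the reporting example (chi-square 10.91, df 15);
# the baseline chi-square of 500 on 21 df is a made-up value.
for name, value in incremental_fit_indexes(10.91, 15, 500.0, 21).items():
    print(name, round(value, 3))
```

Because the target chi-square is smaller than its degrees of freedom here, TLI, IFI, and RNI exceed 1 while CFI is capped at 1, which illustrates the remark that some of these indexes are not guaranteed to vary from 0 to 1.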

Parsimony-Adjusted Measures-- measures penalize for lack of parsimony.

  • PRATIO (parsimony ratio)
  • RMSEA (root mean square error of approximation) is a popular measure of fit. The lack of fit of the hypothesized model to the population is known as the error of approximation, and the RMSEA is a standardized measure of it. RMSEA = 0.0 indicates exact fit. By convention, there is good (close) fit if RMSEA is less than or equal to .05 and adequate fit if RMSEA is less than or equal to .08, which is indicative of a “reasonable” error of approximation; a model should not be used if it has an RMSEA greater than .10. The RMSEA values are often classified into four categories: close fit (.00–.05), fair fit (.05–.08), mediocre fit (.08–.10), and poor fit (over .10). More recently, Hu and Bentler (1999) have suggested RMSEA <= .06 as the cutoff for a good model fit (Hu and Bentler, 1995, likewise suggested that values below .06 indicate good fit). RMSEA tends to improve as we add variables to the model, especially with larger sample size. One limitation of RMSEA is that it ignores the complexity of the model.
  • PCLOSE tests the null hypothesis that RMSEA is no greater than .05. If PCLOSE is less than .05, we reject the null hypothesis and conclude that the computed RMSEA is greater than .05, indicating lack of a close fit.
  • PGFI (parsimony goodness of fit index)
  • PNFI (parsimony normed fit index),There is no commonly agreed-upon cutoff value for an acceptable model.
  • PCFI (parsimony comparative fit index),There is no commonly agreed-upon cutoff value for an acceptable model.
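
RMSEA can be reproduced by hand from the model chi-square, its degrees of freedom, and the sample size. The sketch below assumes the common point-estimate formula sqrt(max(chi-square - df, 0) / (df * (N - 1))). Some programs divide by N rather than N - 1, so small differences from software output are possible. The function names are made up, and the labels follow the four categories listed above.

```python
import math


def rmsea(chi_square, df, n):
    """Point estimate of RMSEA from chi-square, df and sample size N."""
    if df <= 0 or n <= 1:
        raise ValueError("need df > 0 and N > 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


def classify_rmsea(value):
    """Map an RMSEA value onto the four-category convention."""
    if value <= 0.05:
        return "close fit"
    if value <= 0.08:
        return "fair fit"
    if value <= 0.10:
        return "mediocre fit"
    return "poor fit"


# The two reporting examples used in these notes.
for chi2, df, n in [(303.80, 48, 500), (10.91, 15, 2232)]:
    value = rmsea(chi2, df, n)
    print(chi2, df, n, round(value, 3), classify_rmsea(value))
```

When the chi-square is smaller than its degrees of freedom, the estimate is truncated at zero, which corresponds to exact fit.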

Information criterion indexes, goodness of fit measures based on information theory (they do not have cutoffs like .90 or .95. Rather they are used in comparing models, with the lower value representing the better fit.)

  • CAK
  • CK
  • MCI (McDonald's centrality index)
  • CN (Hoelter's critical N)
  • AIC (Akaike Information Criterion, single sample cross-validation index), the lower the AIC measure, the better the fit.
  • AIC0, AMOS Specification Search tool by default rescales AIC so when comparing models, the lowest AIC coefficient is 0. For the remaining models, AIC0 <= 2, no credible evidence the model should be ruled out; 2 - 4, weak evidence the model should be ruled out; 4 - 7, definite evidence; 7 - 10 strong evidence; > 10, very strong evidence the model should be ruled out.
  • CAIC (Consistent AIC),the lower the CAIC measure, the better the fit.
  • BCC (Browne-Cudeck criterion, also called the Cudeck & Browne single sample cross-validation index). As with the other information criteria, lower values indicate better fit. BCC penalizes for model complexity (lack of parsimony) more than AIC.
  • ECVI (expected cross-validation index, a single sample cross-validation index), in its usual variant is equivalent to BCC, and is useful for comparing non-nested models; lower ECVI means better fit. ECVI allows the determination of which model will cross-validate best in another sample of the same size and similarly selected. Choose the model that has the lowest ECVI.
  • MECVI, a variant on BCC; except for a scale factor, MECVI is identical to BCC.
  • BIC (Bayesian Information Criterion, also known as Akaike's Bayesian Information Criterion (ABIC) and the Schwarz Bayesian Criterion (SBC)). Compared to AIC, BCC, or CAIC, BIC more strongly favors parsimonious models with fewer parameters. BIC is recommended when sample size is large or the number of parameters in the model is small. Recently, however, the limitations of BIC have been highlighted.
  • BIC0, the AMOS Specification Search tool by default rescales BIC so when comparing models, the lowest BIC coefficient is 0. For the remaining models, the Raftery (1995) interpretation is: BIC0 <= 2, weak evidence the model should be ruled out; 2 - 6, positive evidence the model should be ruled out; 6 - 10, strong evidence; > 10, very strong evidence the model should be ruled out.
  • BICp. BIC can be rescaled so Akaike weights/Bayes factors sum to 1.0. In AMOS Specification Search, this is done in a checkbox under Options, Current Results tab. BICp values represent estimated posterior probabilities if the models have equal prior probabilities. Thus if BICp = .60 for a model, it is the correct model with a probability of 60%. The sum of BICp values for all models will sum to 100%, meaning 100% probability the correct model is one of them, a trivial result but one which points out the underlying assumption that proper specification of the model is one of the default models in the set. Put another way, "correct model" in this context means "most correct of the alternatives."
  • BICL. BIC can be rescaled so Akaike weights/Bayes factors have a maximum of 1.0. In AMOS Specification Search, this is done in a checkbox under Options, Current Results tab. BICL values of .05 or greater in magnitude may be considered the most probable models in "Occam's window," a model-filtering criterion advanced by Madigan and Raftery (1994).
  • Quantile or Q-Plots
  • IES (Interaction effect size),IES is a measure of the magnitude of an interaction effect (the effect of adding an interaction term to the model). In OLS regression this would be the incremental change in R-squared from adding the interaction term to the equation. In SEM, IES is an analogous criterion based on chi-square goodness of fit. Recall that the smaller the chi-square, the better the model fit. IES is the percent chi-square is reduced (toward better fit) by adding the interaction variable to the model.
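
The rescalings described above (AIC0, BIC0, BICp, BICL) are easy to mimic outside AMOS. The sketch below is an approximation under stated assumptions: it subtracts the smallest value to get the "0" versions, and it assumes that the weight of each model is exp(-BIC0 / 2), the usual Bayes-factor approximation, normalized to sum to 1 for BICp and left with a maximum of 1 for BICL. The function names and the three BIC values are hypothetical.

```python
import math


def rescale_to_zero(criteria):
    """Subtract the smallest value so the best model scores 0 (AIC0, BIC0)."""
    best = min(criteria.values())
    return {name: value - best for name, value in criteria.items()}


def bic_weights(bic):
    """Return (BICp, BICL) computed from raw BIC values.

    Assumption: the weight of a model is exp(-BIC0 / 2).
    BICL keeps that scale (maximum 1.0); BICp normalizes it to sum to 1.
    """
    bic0 = rescale_to_zero(bic)
    bic_l = {name: math.exp(-0.5 * delta) for name, delta in bic0.items()}
    total = sum(bic_l.values())
    bic_p = {name: weight / total for name, weight in bic_l.items()}
    return bic_p, bic_l


# Hypothetical BIC values for three competing models.
bic = {"model A": 100.0, "model B": 102.0, "model C": 110.0}
bic_p, bic_l = bic_weights(bic)
for name in bic:
    print(name, round(bic_p[name], 3), round(bic_l[name], 3))
```

Models whose BICL falls below .05 would be outside "Occam's window" in the sense described above.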

residual as a measure of overall fit

  • A residual is the difference between an element of the sample covariance matrix (S) and the corresponding element of the model-implied covariance matrix ( ∑ ). Standardized residuals are residuals that have been standardized to have a mean of zero and a standard deviation of one, making them easier to interpret. Standardized residuals larger than 2.0 in absolute value are considered to be suggestive of a lack of fit.

Correlated error terms

  • Correlated error terms in measurement models represent the hypothesis that the unique variances of the associated indicators overlap; that is, they measure something in common other than the latent constructs that are represented in the model. Another way to describe an error correlation is as an unanalyzed association, which means that the specific nature of the shared "something" is unknown. Some possibilities include a common method of measurement (e.g., two indicators are self-report scales that are both susceptible to some response set) or measurement of a construct that is not directly represented in the model (i.e., the two indicators are each multidimensional in that they assess more than one factor). I suppose even another possibility for items concerns very similar wording, as per your examples. Given a theoretical or practical reason (e.g., two indicators share the measurement method), the inclusion of correlated measurement error terms may test an interesting hypothesis. One must be careful, though, because adding error correlations increases the complexity of the model, which itself may improve the model's fit. Of course, adding bunches of error correlations until the model is saturated will result in perfect fit, so error correlations should not be specified solely on ad hoc bases. Also, the addition of error correlations to some models may result in non-identification, which is likely to lead to estimation problems. Some references are listed below. Note that some of these concern the evaluation of multitrait-multimethod data for which the specification of correlated error terms is one way to estimate common method effects.
  • Correlated within-factor measurement errors might imply a number of things. They could imply the presence of another common factor, or direct causal relations among indicators (e.g., a response bias set where one item partly "causes" the response to the next item in a survey). In any event, any model modifications based on finding statistically reliable measurement error covariances should be done carefully and considerately. If you have good a priori reasons to do it, then go ahead. The fairly recent debate between Les and Stan pertained to the errors of reciprocally causal variables. Stan and some other people were arguing that they "must" be correlated, I think just due to the basis of some quirks in the math/path tracing rules. Les was arguing that if the correlation's true population value is zero, then there is no reason why the sample estimate of the correlation would necessarily be nonzero, and demonstrated this in a simulation.
  • Freeing all of the correlations between the disturbances may end up with an underidentified model, because some of those 0 correlations are necessary for identification. It would be better to fix them all to zero, and then do a specification search if you get a significant chi-square, to see if you can't free some of the correlations between disturbances having large modification indices (or significant chi-squares for Lagrange multiplier tests, as in EQS). However, as you go about freeing up parameters, do it one at a time, reanalyzing the whole model after each time you free one of the parameters. You can leave free anything that improved the fit considerably, as you go along, but each new parameter freed should be freed one-at-a-time. You also should coordinate this process with theory, justifying doing so by some substantive rationale for the pair of variables in question. Don't go on and on, if you reach a point where you can't provide good a priori substantive reasons for freeing further parameters. If the resulting model is a good approximation, you can report that also as a basis for further theorizing and research in the area in question. What you are doing is rejecting hypotheses about the 0 correlations between the disturbances. You need to report what was rejected.
  • Do you have highly similar items that are adjacent to each other? This is fairly common in scale development. If you have nearly identical items that means that the errors (i.e., what's left over after the common variance is taken out) will be correlated. A pair or trio of items may cluster together more tightly than other items on the same latent. Setting error covariance to zero (i.e., the default) probably isn't a good idea in this case.

SEM Fit index

construct reliability, variance extracted, SMC in SEM

  • Construct reliability, computed from the factor loadings, should by convention be at least .70. Let $sl_i$ be the standardized loadings for the indicators of a particular latent variable. Let $e_i$ be the corresponding error terms, where error is 1 minus the reliability of the indicator, and that reliability is the square of the indicator's standardized loading, so $e_i = 1 - sl_i^2$. Then $\text{reliability} = \left(\sum_i sl_i\right)^2 / \left[\left(\sum_i sl_i\right)^2 + \sum_i e_i\right]$.
  • Variance extracted, by convention, should be at least .50. Its formula is a variation on construct reliability: $\text{variance extracted} = \sum_i sl_i^2 / \left[\sum_i sl_i^2 + \sum_i e_i\right]$.
  • R-squared, the squared multiple correlation. There is one R-squared or squared multiple correlation (SMC) for each endogenous variable in the model. It is the percent variance explained in that variable. In Amos, enter $smc in the command area to obtain squared multiple correlations. In the AMOS Analysis Properties dialog box check squared multiple correlation if in the graphical mode.
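A minimal sketch of the two formulas above, in case it helps to check hand calculations. The three loadings are made-up values, and the function names are my own, not anything from Amos.

```python
def construct_reliability(loadings):
    """(sum of loadings)^2 / [(sum of loadings)^2 + sum of errors]."""
    errors = [1 - l ** 2 for l in loadings]   # e_i = 1 - sl_i^2
    total = sum(loadings)
    return total ** 2 / (total ** 2 + sum(errors))


def variance_extracted(loadings):
    """sum of squared loadings / [sum of squared loadings + sum of errors]."""
    squared = [l ** 2 for l in loadings]
    errors = [1 - sq for sq in squared]
    return sum(squared) / (sum(squared) + sum(errors))


# Hypothetical standardized loadings for one latent variable
loadings = [0.8, 0.7, 0.6]
cr = construct_reliability(loadings)
ave = variance_extracted(loadings)
print(round(cr, 3), round(ave, 3))   # about 0.745 and 0.497
```

With these made-up loadings the construct clears the .70 reliability convention but falls just short of the .50 variance-extracted convention, which shows that the two criteria can disagree. Because e_i = 1 - sl_i^2, the denominator of variance extracted is simply the number of indicators.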

specification search in Amos

suppression

Standardized coefficients greater than 1

Not Positive Definite Matrices

  • N.p.d. also happens when you have high collinearity. With many factors, you can have this problem even when individual correlations are not very big. With fewer factors, though, to get an n.p.d. message due to collinearity, you will need individual factor correlations near 1.0 in absolute value. These are easier to spot in the standardized parameter estimates. If you have created one factor as a composite of other factors, then you will certainly get this message, but in that case it isn't an "error" and shouldn't be a surprise.
  • http://www2.gsu.edu/~mkteer/npdmatri.html

Wednesday, April 18, 2007

websites with reading list for SEM

formative indicators

  • Formative indicators are not reflections of the underlying construct; they cause it. Therefore, internal reliability is not examined for them; what is examined instead are the nomological and content aspects of construct validity (Diamantopoulos and Winklhofer, 2001; MacKenzie, Podsakoff, & Jarvis, 2005).
  • Bollen, K., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110(2), 305-314.
  • MacCallum, R. C., & Browne, M. W. (1993). The use of causal indicators in covariance structure models: Some practical issues. Psychological Bulletin, 114(3), 533-541.
  • Jarvis, C. B., MacKenzie, S. B., & Podsakoff, P. M. (2003). A critical review of construct indicators and measurement model misspecification in marketing and consumer research. Journal of Consumer Research, 30, 199-218.
  • Diamantopoulos, A., & Winklhofer, H. M. (2001). Index construction with formative indicators: An alternative to scale development. Journal of Marketing Research, 38(2), 269-277.
  • Law, K. S., & Wong, C.-S. (1999). Multidimensional constructs in structural equation analysis: An illustration using the job perception and job satisfaction constructs. Journal of Management, 25(2), 143-160.
  • Law, K. S., Wong, C.-S., & Mobley, W. H. (1998). Toward a taxonomy of multidimensional constructs. Academy of Management Review, 23(4), 741-755.
  • MacKenzie, S. B., Podsakoff, P. M., & Jarvis, C. B. (2005). The problem of measurement model misspecification in behavioral and organizational research and some recommended solutions. Journal of Applied Psychology, 90(4), 710-730.
  • Yi, M. Y., & Davis, F. D. (2003). Developing and validating an observational learning model of computer software training and skill acquisition. Information Systems Research, 14(2), 146-169.

Tuesday, April 17, 2007

Latent interaction, moderation in SEM

  • http://www2.kuas.edu.tw/prof/fred/vpls/moderating.htm
  • Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
  • Arnold, H.J. & Evans, M.G. (1979). Testing multiplicative models does not require ratio scales. Organizational Behavior and Human Performance, 24, 41-59.
  • Cortina, J.M. (1993). Interaction, Nonlinearity, and Multicollinearity - Implications for Multiple-Regression. Journal of Management, 19, 915-922.
  • Jaccard, J. & Wan, C. K.(1995). Measurement error in the analysis of interaction effects between continuous predictors using multiple-regression - multiple indicator and structural equation approaches. Psychological Bulletin, 117, 348-357.
  • Evans, M.G. (1985). A Monte-Carlo study of the effects of correlated method variance in moderated multiple regression analysis. Organizational Behavior and Human Decision Processes, 36, 305-323.
  • Evans, M.G. (1991). The problem of analyzing multiplicative composites: Interactions revisited. American Psychologist, 46, 6-15.
  • Evans, M.G. (1991). On the use of moderated regression. Canadian Psychology, 32, 116-119.
  • Jaccard, J. & Wan, C. K. (1996). LISREL Approaches to Interaction Effects in Multiple Regression. Quantitative Applications in the Social Sciences, Vol. 114, Thousand Oaks, Calif.: Sage Publications. ISBN: 0-8039-7179-6.
  • Jöreskog, K. G., & Yang, F. (1996). Nonlinear structural equation models: The Kenny-Judd model with interaction effects. In G. A. Marcoulides and R. E. Schumacker (Eds.), Advanced structural equation modeling (pp. 57-88). Mahwah, NJ: Lawrence Erlbaum.
  • Kanetkar, V., Evans, M.G., Everell, S.A., Irvine, D., & Millman, Z. (1995). The effect of scale changes on meta-analysis of multiplicative and main effects models. Educational & Psychological Measurement, 55, 206-224.
  • Kenny, D. A., & Judd, C. M. (1984). Estimating the nonlinear and interactive effects of latent variables. Psychological Bulletin, 96, 201-210.
  • Li, F.Z., Harmer, P., Duncan, T.E., Duncan, S.C., Acock, A., & Boles S. (1998). Approaches to testing interaction effects using structural equation modeling methodology. Multivariate Behavioral Research, 33, 1-39.
  • McClelland, G.H., & Judd, C.M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114, 376-390.
  • Ping, R.A. (1995). A parsimonious estimating technique for interaction and quadratic latent variables. Journal of Marketing Research, 32, 336-347.
  • Ping, R.A. (1996). Improving the detection of interactions in selling and sales management research. Journal of Personal Selling & Sales Management, 16(1), 53-64.
  • Ping, R.A. (1996). Latent variable interaction and quadratic effect estimation: A two-step technique using structural equation analysis. Psychological Bulletin, 119, 166-175.
  • Ping, R.A. (1996). Estimating latent variable interactions and quadratics: The state of this art. Journal of Management, 22, 163-183.
  • Ping, R.A. (1996). Latent variable regression: A technique for estimating interaction and quadratic coefficients. Multivariate Behavioral Research, 31, 95-120.
  • Rosnow & Rosenthal (1995). Some things you learn aren't so: Cohen's paradox, Asch's paradigm, and the interpretation of the interaction. Psychological Science, 6, 3-9.

Tetrad

  • Tetrad is a program designed to find models among sets of variables. Such a program may help you to avoid endlessly adding and deleting predictors. You'll find a link to the Tetrad site (free beta versions can be downloaded from the Web) by clicking on "SEM links" or just scrolling down the page: http://www.gsu.edu/~mkteer/

Monday, April 16, 2007

Multicollinearity

  • check correlation matrix of the latent variables,
  • Multicollinearity increases the standard errors of the coefficients of the collinear variables.
  • Only *extreme* multicollinearity will produce an input covariance matrix which is not positive definite. The usual symptoms of multicollinearity are inflated standard errors (or reduced t-values) and strange parameter estimates. For example, you might have two variables X1 and X2 predicting Y, with all correlations positive, yet the parameter estimate for X1 is very large and positive while the parameter estimate for X2 is very large and negative.
  • Multicollinearity in multiple regression
  • Kaplan, David (1994), "Estimator Conditioning Diagnostics for Covariance Structure Models," Sociological Methods & Research, 23 (November), 200-29.
  • Leahy, Kent (2000), "Multicollinearity: When the Solution is the Problem," in Data Mining Cookbook, Olivia Parr Rud, Ed. New York: Wiley.
  • Multicollinearity in regression models is an unacceptably high level of intercorrelation among the independents, such that the effects of the independents cannot be separated. Under multicollinearity, estimates are unbiased but assessments of the relative strength of the explanatory variables and their joint effect are unreliable. (That is, beta weights and R-squares cannot be interpreted reliably even though predicted values are still the best estimate using the given independents.) As a rule of thumb, intercorrelation among the independents above .80 signals a possible problem. Likewise, high multicollinearity is signalled when a high R-squared and significant F tests of the model occur in combination with non-significant t-tests of coefficients. That is, whereas perfect multicollinearity leads to infinite standard errors and indeterminate coefficients, the more common situation of high multicollinearity leads to large variances and covariances, large confidence intervals, and non-significant coefficients. Power is low: the chance of Type II errors is high (thinking you do not have a relationship when in fact one exists, i.e., failing to reject the null hypothesis that the coefficients are not different from zero). R-square is high. The coefficients and their standard errors will be sensitive to changes in just a few observations.
  • Tolerance is defined as 1 - R-squared, where R-squared is the squared multiple correlation of a given independent regressed on all other independent variables. If the tolerance value is less than some cutoff value, usually .20, the independent should be dropped from the analysis due to multicollinearity. This is better than just using simple r > .80, since tolerance looks at the independent variable in relation to all other independents and thus takes linear combinations of the other independents into account as well as simple correlations. In SPSS 13, select Analyze, Regression, Linear; click Statistics; check Collinearity diagnostics.
  • Variance inflation factor (VIF). The VIF may be used in lieu of tolerance, as VIF is simply the reciprocal of tolerance. A common rule of thumb is that VIF > 4.0 indicates a multicollinearity problem; some authors use the more lenient cutoff of VIF >= 5. VIF is reported by the same SPSS Collinearity diagnostics option.
  • Condition indices. Discussed more extensively in the section on regression, condition indices over 15 indicate possible multicollinearity problems and over 30 indicate serious multicollinearity problems. In SPSS 13, select Analyze, Regression, linear; click Statistics; check Collinearity diagnostics.
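To see how tolerance and VIF relate to the correlations among the independents, here is a rough sketch in plain Python. It relies on the standard result that the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. The correlation matrix is invented for illustration, the helper names are my own, and condition indices are left out because they need an eigenvalue routine.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting, for a small square matrix."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tolerance(corr):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inv = invert(corr)
    vifs = [inv[j][j] for j in range(len(corr))]
    tolerances = [1 / v for v in vifs]     # tolerance = 1 - R^2 = 1 / VIF
    return vifs, tolerances


# Hypothetical correlations among three independents: X1 and X2 overlap heavily
corr = [[1.0, 0.9, 0.3],
        [0.9, 1.0, 0.3],
        [0.3, 0.3, 1.0]]
vifs, tolerances = vif_and_tolerance(corr)
flagged = [t < 0.20 for t in tolerances]   # the .20 cutoff mentioned above
for name, v, t, f in zip(["X1", "X2", "X3"], vifs, tolerances, flagged):
    print(name, round(v, 2), round(t, 3), "check" if f else "ok")
```

In this made-up example X1 and X2 fall below the .20 tolerance cutoff while X3 does not, even though X3 correlates .30 with both.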

Multicollinearity in Structural Equation Models (SEM):

  • Standardized regression weights: Since all the latent variables in a SEM model have been assigned a metric of 1, all the standardized regression weights should be within the range of plus or minus 1. When there is a multicollinearity problem, a weight close to 1 indicates the two variables are close to being identical. When these two nearly identical latent variables are then used as causes of a third latent variable, the SEM method will have difficulty computing separate regression weights for the two paths from the nearly-equal variables to the third variable. As a result it may well come up with one standardized regression weight greater than +1 and one weight less than -1 for these two paths.
  • Standard errors of the unstandardized regression weights: Likewise, when there are two nearly identical latent variables, and these two are used as causes of a third latent variable, the difficulty in computing separate regression weights may well be reflected in much larger standard errors for these paths than for other paths in the model, reflecting high multicollinearity of the two nearly identical variables.
  • Covariances of the parameter estimates: Likewise, the same difficulty in computing separate regression weights may well be reflected in high covariances of the parameter estimates for these paths, much higher than the covariances of parameter estimates for other paths in the model.
  • Variance estimates: Another effect of the same multicollinearity syndrome may be negative variance estimates. In the example above of two nearly-identical latent variables causing a third latent variable, the variance estimate of this third variable may be negative.
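The "one weight above +1, one below -1" symptom can be reproduced with the textbook two-predictor formulas for standardized weights. This is only a sketch with observed variables and invented correlations (all positive, with the two predictors correlated .95), not an SEM estimation.

```python
def standardized_weights(r12, r1y, r2y):
    """Standardized regression weights for Y on two correlated predictors."""
    denom = 1 - r12 ** 2
    b1 = (r1y - r12 * r2y) / denom
    b2 = (r2y - r12 * r1y) / denom
    r_squared = b1 * r1y + b2 * r2y
    return b1, b2, r_squared


# Hypothetical correlations: X1 and X2 are nearly identical
b1, b2, r2 = standardized_weights(r12=0.95, r1y=0.70, r2y=0.50)
print(round(b1, 2), round(b2, 2), round(r2, 2))   # roughly 2.31, -1.69, 0.77
```

Both predictors correlate positively with Y, yet the weights have opposite signs and both exceed 1 in absolute value, while R-squared stays below 1. The correlation matrix used here is still positive definite, which fits the earlier point that only extreme multicollinearity produces an n.p.d. matrix.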

Useful links:

Reading list:

  • Latent variable multicollinearity: Grewal, Rajdeep, Joseph A. Cote, and Hans Baumgartner (2004), "Multicollinearity and measurement error in structural equation models: implications for theory testing," Marketing Science, 23(4), 519-529.
  • Kaplan, D. (1994). Estimator Conditioning Diagnostics for Covariance Structure Models. Sociological Methods & Research, Vol. 23, No. 2, 200-229
  • Marsh, Dowson, Pietsch, & Walker (2004) "Why multicollinearity matters: a reexamination of relations between self-efficacy, self-concept, and achievement" in Journal of Educational Psychology

multivariate normality and outliers

  • Multivariate normality is when each variable under consideration is normally distributed with respect to each other variable. Multivariate normal distributions take the form of symmetric three-dimensional bells when the x axis is the values of a given variable, the y axis is the count for each value of the x variable, and the z axis is the values of any other variable under consideration. Structural equation modeling and certain other procedures assume multivariate normality.
  • Mardia's statistic is a test for multivariate normality. Based on functions of skewness and kurtosis, Mardia's PK should be less than 3 for the assumption of multivariate normality to be considered met. PRELIS (companion software for LISREL) outputs PK. SPSS does not yet support Mardia's PK.
  • Multivariate normal distribution of the indicators: Each indicator should be normally distributed for each value of each other indicator. Even small departures from multivariate normality can lead to large differences in the chi-square test, undermining its utility. In general, violation of this assumption inflates chi-square but under certain circumstances may deflate it. Use of ordinal or dichotomous measurement is a cause of violation of multivariate normality. Note: Multivariate normality is required by maximum likelihood estimation (MLE), which is the dominant method in SEM for estimating structure (path) coefficients. Specifically, MLE requires normally distributed endogenous variables.
    The Bollen-Stine bootstrap and the Satorra-Bentler adjusted chi-square are used for inference of exact structural fit when there is reason to think there is lack of multivariate normality or other distributional misspecification. The Satorra-Bentler statistic is an adjustment which penalizes chi-square for the amount of kurtosis in the data; that is, it attempts to correct for the bias introduced when data are markedly non-normal in distribution. As of 2006, this statistic was only available in the EQS model-fitting program, not AMOS. Other non-MLE methods of estimation exist, some (like ADF) not requiring the assumption of multivariate normality. See also Bollen (1989).
    In general, simulation studies (Kline, 1998: 209) suggest that under conditions of severe non-normality of data, SEM parameter estimates (ex., path estimates) are still fairly accurate but corresponding significance coefficients are too high. Chi-square values, for instance, are inflated. Recall for the chi-square test of goodness of fit of the model as a whole, the chi-square value should not be significant if there is a good model fit: the higher the chi-square, the more the difference of the model-estimated and actual covariance matrices, hence the worse the model fit. Inflated chi-square could lead researchers to think their models were more in need of modification than they actually were. Lack of multivariate normality usually inflates the chi-square statistic such that the overall chi-square fit statistic for the model as a whole is biased toward Type I error (rejecting a model which should not be rejected). The same bias also occurs for other indexes of fit beside model chi-square. Violation of multivariate normality also tends to deflate (underestimate) standard errors moderately to severely. These smaller-than-they-should-be standard errors mean that regression paths and factor/error covariances are found to be statistically significant more often than they should be. Many if not most SEM studies in the literature fail to concern themselves with this assumption in spite of its importance.
  • Testing for normality and using transforms to normalize data. How do you test for normality and outliers in AMOS? See http://www2.chass.ncsu.edu/garson/pa765/structur.htm#normtest For example, the multivariate kurtosis value of 3.102 is Mardia's coefficient. Values of 1.96 or less mean there is non-significant kurtosis. Values > 1.96 mean there is significant kurtosis, which means significant non-normality. The higher the Mahalanobis d-squared distance for a case, the more improbably far it is from the solution centroid under assumptions of normality.
  • Note, however, SEM is still unbiased and efficient in the absence of multivariate normality if residuals are multivariate normally distributed with means of 0 and have constant variance across the independents, and the residuals are not correlated with each other or with the independents. PRELIS, a statistical package which tests for multivariate normality, accompanies LISREL and provides a chi-square test of multivariate normality.
  • OUTLIERS can radically alter the outcome of analysis and are also violations of normality. Outliers arise from four different causes, requiring different courses of action:
    Errors of data entry: proofread your data for out-of-range entries and other errors.
    Not defining missing values: check in SPSS or other statpack to make sure don't know, not home, and other missing values are not being treated as real values.
    Unintended sampling: eliminate non-population members from the sample (ex., eliminate unintentionally sampled out-of-town house guests from a sample for the population of city residents).
    True non-normal distribution: For a true non-normal distribution with extreme values, the researcher may transform the data to pull in outlier values or may choose to analyse extreme cases separately.
    Simple outliers are cases with extreme values with respect to a single variable. It is common to define outliers as cases which are more than plus or minus three standard deviations from the mean of the variable.
    Multivariate outliers are cases with extreme values with respect to multiple variables. Multivariate outliers are operationally defined as cases which have a Cook's Distance greater than some cutoff (some use a cutoff of 1; some use 4/[n - p], where p is the number of parameters in the model; some use 4/[n - k - 1], where n is the number of cases and k is the number of independents.) Leverage is another related way of defining multivariate outliers, with outliers defined as having a leverage value greater than some cutoff (some use .5; others use 2p/n, where p is the number of parameters including the intercept). Mahalanobis distance is a third and very common measure for multivariate outliers. Cases with the highest Mahalanobis D-square values are the most likely candidates to be considered outliers and should be examined.
    In SPSS 13, select Analyze, Regression, Linear; click the Save button; check Cook's, Mahalanobis, and/or leverage values. http://www2.chass.ncsu.edu/garson/pa765/assumpt.htm
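For a feel of what Mahalanobis D-square and Mardia's multivariate kurtosis measure, here is a small sketch for the two-variable case only, where the distance has a closed form in terms of z-scores and the correlation r. The data are invented (the last case is deliberately off the trend), the function names are mine, and the normalized kurtosis uses the usual large-sample approximation, which should not be trusted with only ten cases.

```python
import math


def mahalanobis_d2(xs, ys, ddof=1):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - ddof)
    syy = sum((y - my) ** 2 for y in ys) / (n - ddof)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - ddof)
    r = sxy / math.sqrt(sxx * syy)
    d2 = []
    for x, y in zip(xs, ys):
        zx = (x - mx) / math.sqrt(sxx)
        zy = (y - my) / math.sqrt(syy)
        # Bivariate closed form of d' S^-1 d
        d2.append((zx * zx - 2 * r * zx * zy + zy * zy) / (1 - r * r))
    return d2


def mardia_kurtosis(xs, ys):
    """Mardia's multivariate kurtosis b2p and its normalized value (p = 2)."""
    n, p = len(xs), 2
    d2 = mahalanobis_d2(xs, ys, ddof=0)        # ML covariance (divide by n)
    b2p = sum(d * d for d in d2) / n
    z = (b2p - p * (p + 2)) / math.sqrt(8 * p * (p + 2) / n)
    return b2p, z


# Hypothetical data: case 10 has a high x but a low y
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.2, 8.9, 2.0]
d2 = mahalanobis_d2(xs, ys)
worst = d2.index(max(d2)) + 1                  # 1-based case number
b2p, z = mardia_kurtosis(xs, ys)
print("case with largest D-square:", worst)
print("Mardia b2p:", round(b2p, 3), "normalized:", round(z, 3))
```

Note that case 10 is not extreme on x or y alone by the three-standard-deviation rule; it stands out only when the two variables are considered together, which is the point of a multivariate measure.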

Normality and handling non-normality

http://www2.chass.ncsu.edu/garson/pa765/assumpt.htm

http://www.utexas.edu/its/rc/answers/amos/amos7.html

  • A normal distribution is assumed by many statistical procedures. Various transformations are used to correct non-normally distributed data. Correlation, least-squares regression, factor analysis, and related linear techniques are relatively robust against non-extreme deviations from normality provided errors are not severely asymmetric. Severe asymmetry might arise due to strong outliers. Log-linear analysis, logistic regression, and related techniques using maximum likelihood estimation are even more robust against moderate departures from normality. Monte Carlo simulations show the t-test is robust against moderate violations of normality.
  • Skewness is the tilt (or lack of it) in a distribution. The more common type is right skew, where the long, thin tail points to the right. Less common is left skew, where the long, thin tail points left. A common rule-of-thumb test for normality is to run descriptive statistics to get skewness and kurtosis, then divide these by their standard errors. The resulting skew ratio should be within the +2 to -2 range when the data are normally distributed. Some authors use +1 to -1 as a more stringent criterion when normality is critical. Negative skew means the tail extends to the left; positive skew means it extends to the right. In SPSS 13, one of the places skew is reported is under Analyze, Descriptive Statistics, Descriptives; click Options; select skew.
  • Kurtosis is the peakedness of a distribution. A common rule-of-thumb test for normality is to run descriptive statistics to get skewness and kurtosis, then divide these by their standard errors. The kurtosis ratio also should be within the +2 to -2 range when the data are normally distributed (a few authors use the more lenient +3 to -3, while other authors use +1 to -1 as a more stringent criterion when normality is critical). Positive kurtosis indicates too many cases in the tails of the distribution (heavy tails with a sharp peak). Negative kurtosis indicates too few cases in the tails (light tails with a flat top). Note that the origin in computing kurtosis is 3 and a few statistical packages center on 3, but the foregoing discussion assumes that 3 has been subtracted to center on 0, as is done in SPSS and LISREL. The version with the normal distribution centered at 0 is Fisher kurtosis, while the version centered at 3 is Pearson kurtosis. SPSS uses Fisher kurtosis. Various transformations are used to correct kurtosis: cube roots and sine transforms may correct negative kurtosis. In SPSS 13, one of the places kurtosis is reported is under Analyze, Descriptive Statistics, Descriptives; click Options; select kurtosis.
  • The Shapiro-Wilk W test is a formal test of normality offered in the SPSS EXAMINE module or the SAS UNIVARIATE procedure. This is the standard test for normality. W may be thought of as the correlation between given data and their corresponding normal scores, with W = 1 when the given data are perfectly normal in distribution. When W is significantly smaller than 1, the assumption of normality is not met. That is, a significant W statistic causes the researcher to reject the assumption that the distribution is normal. Shapiro-Wilk W is recommended for small and medium samples up to n = 2000. For larger samples, the Kolmogorov-Smirnov test is recommended by SAS and others. In SPSS 13, the Shapiro-Wilk test is found under Analyze, Descriptive Statistics, Explore; select Both or Plots in the Display group; click Plots and select at least one plot.
  • The Kolmogorov-Smirnov D test, or K-S Lilliefors test, is an alternative test of normality for large samples, available in SPSS EXAMINE and SAS UNIVARIATE. This is sometimes called the Lilliefors test, as a correction to K-S developed by Lilliefors is now normally applied. SPSS (as of Version 9), for instance, automatically applies the Lilliefors correction to the K-S test for normality in the EXAMINE module (but not in the NONPAR module). This test, with the Lilliefors correction, is preferred to the chi-square goodness-of-fit test when data are interval or near-interval. When applied without the Lilliefors correction, K-S is very conservative: that is, there is a reduced likelihood of a finding of non-normality. Note the K-S test can test goodness-of-fit against any theoretical distribution, not just the normal distribution. Be aware that when sample size is large, even unimportant deviations from normality may be technically significant by this and other tests. For this reason it is recommended to use other bases of judgment, such as frequency distributions and stem-and-leaf plots. The Kolmogorov-Smirnov test in SPSS 13 is found under Analyze, Descriptive Statistics, Explore; select Both or Plots in the Display group; click Plots and select at least one plot.
  • Prof. Lawrence T. DeCarlo has written an SPSS macro called "normtest" which computes several indices of univariate/multivariate normality. It can be downloaded from the author's web page: http://www.columbia.edu/~ld208/ A version of the "normtest" macro translated into the Stata programming language can be freely downloaded from: http://ideas.uqam.ca/ideas/data/Softwares/bocbocodeS413101.html
  • Boxplot tests of the normality assumption: The SPSS boxplot output option (see the EXAMINE command in SPSS Base) produces charts in which the Y axis is the interval dependent and categories of the independent are arrayed on the X axis. Inside the graph, for each X category, will be a rectangle indicating the spread of the dependent's values for that category. If these rectangles are roughly at the same Y elevation for all categories, this indicates little difference among groups. Within each rectangle is a horizontal dark line, indicating the median. If most of the rectangle is on one side or the other of the median line, this indicates the dependent is skewed (not normal) for that group (category). Further out than the rectangle are the "whiskers," which mark the smallest and largest observations which are not outliers (outliers being defined as observations more than 1.5 inter-quartile ranges [IQRs = box lengths] beyond the 1st or 3rd quartile). Note you can display boxplots for two factors (two independents) together by selecting Clustered Boxplots from the Boxplot item on the SPSS Graphs menu.
  • Graphical methods.
    A histogram of a variable shows rough normality, and a histogram of residuals, if normally distributed, is often taken as evidence of normality of all the variables.
    A graph of empirical by theoretical cumulative distribution functions (cdf's) simply shows the empirical distribution as, say, a dotted line, and the hypothetical distribution, say the normal curve, as a solid line.
    A P-P plot is found in SPSS 13 under Graphs, P-P plots. One may test if the distribution of a given variable is normal (or beta, chi-square, exponential, gamma, half-normal, Laplace, Logistic, Lognormal, Pareto, Student's t, Weibull, or uniform). The P-P plot plots a variable's cumulative proportions against the cumulative proportions of the test distribution. The straighter the line formed by the P-P plot, the more the variable's distribution conforms to the selected test distribution (ex., normal). Options within this SPSS procedure allow data transforms first (natural log, standardization of values, difference, and seasonal difference).
    A quantile-by-quantile or Q-Q plot forms a 45-degree line when the observed values are in conformity with the hypothetical distribution. Q-Q plots plot the quantiles of a variable's distribution against the quantiles of the test distribution. From the SPSS menu, select Graphs, Q-Q. The SPSS dialog box supports testing the following distributions: beta, chi-square, exponential, gamma, half-normal, Laplace, Logistic, Lognormal, normal, Pareto, Student's t, Weibull, and uniform.
  • Resampling is a way of doing significance testing while avoiding parametric assumptions like multivariate normality. The assumption of multivariate normality is violated when dichotomous, dummy, and other discrete variables are used. In such situations, where significance testing is appropriate, researchers may use a resampling method (bootstrapping), which uses brute computer power to establish confidence intervals for any statistical procedure, based not on assumptions such as multivariate normal distributions but rather on repeated samples from the researcher's own data. That is, rather than use generic distribution tables to compute approximate p probability values, resampling generates a unique sampling distribution based on the actual data at hand and uses experimental rather than analytic methods. Unlike approximation with generic distribution tables, resampling yields unbiased estimates because it is based on unbiased samples of all possible outcomes in the data being studied.
  • How to interpret Amos output when the bootstrap method is used: http://www2.chass.ncsu.edu/garson/pa765/structur.htm#normtest
  • Normalizing Transformations. Various transformations are used to correct skew. Transformations should make theoretical sense. Transforms in SPSS: Select Transform - Compute - Target Variable (input a new variable name) - Numeric Expression (input transform formula)
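Two of the ideas in this section are easy to try by hand: the skewness and kurtosis rule of thumb (statistic divided by its standard error) and a percentile bootstrap confidence interval. The sketch below uses the usual bias-corrected sample formulas with Fisher (excess) kurtosis; I am assuming these match what SPSS reports, which I have not verified. The data and the function names are made up for illustration.

```python
import math
import random
import statistics


def skew_kurtosis_ratios(data):
    """Return (skewness / SE, kurtosis / SE) for a single variable."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)                 # sample SD (n - 1 denominator)
    z = [(x - mean) / sd for x in data]
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    # Fisher (excess) kurtosis: a normal distribution scores 0
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any statistic of one sample."""
    rng = random.Random(seed)                   # fixed seed for repeatability
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical right-skewed scores
scores = [1, 1, 1, 2, 2, 3, 4, 8, 20]
skew_ratio, kurt_ratio = skew_kurtosis_ratios(scores)
low, high = bootstrap_ci(scores, statistics.mean)
print("skew ratio:", round(skew_ratio, 2), "kurtosis ratio:", round(kurt_ratio, 2))
print("bootstrap 95% CI for the mean:", round(low, 2), "to", round(high, 2))
```

A ratio outside the +2 to -2 range would count as non-normal by the rule of thumb above. The bootstrap interval is built only from resamples of the observed scores, so it never reaches outside the range of the data.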