for dealing with incomplete data is to eliminate from the analysis any observation for which some data value is missing. For example, if a person fails to report his income, you would eliminate that person from your study and proceed with a conventional analysis based on complete data but with a reduced sample size. This method is unsatisfactory inasmuch as it requires discarding the information contained in the responses that the person did give because of the responses that he did not give. If missing values are common, this method may require discarding the bulk of a sample.
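As a rough illustration of how much data this method can discard, here is a minimal sketch in plain Python. The records, the field names, and the helper `listwise_delete` are made up for the example; `None` stands for a missing response.

```python
# Hypothetical survey records; None marks a missing response.
records = [
    {"age": 34, "income": 52000},
    {"age": 41, "income": None},    # income not reported
    {"age": None, "income": 61000}, # age not reported
    {"age": 29, "income": 38000},
]


def listwise_delete(rows):
    """Keep only the rows in which every value is observed."""
    return [row for row in rows if all(v is not None for v in row.values())]


complete = listwise_delete(records)
n_dropped = len(records) - len(complete)
print(f"kept {len(complete)} of {len(records)} rows, dropped {n_dropped}")
```

Note that the two dropped rows each contained one valid response, which is exactly the information this method throws away.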
2. pairwise deletion
Another standard approach, in analyses that depend on sample moments, is to calculate each sample moment separately, excluding an observation from the calculation only when it is missing a value that is needed for the computation of that particular moment. For example, in calculating the sample mean income, you would exclude only persons whose incomes you do not know. Similarly, in computing the sample covariance between age and income, you would exclude an observation only if age is missing or if income is missing. This approach to missing data is sometimes called pairwise deletion.
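The age and income example can be sketched in plain Python as follows. The data and the helper names `pairwise_mean` and `pairwise_cov` are invented for illustration; each row is an `(age, income)` pair and `None` marks a missing value.

```python
from statistics import mean

# Hypothetical (age, income) pairs; None marks a missing value.
records = [(34, 52000), (41, None), (None, 61000), (29, 38000), (50, 70000)]


def pairwise_mean(rows, i):
    """Mean of column i, excluding only rows where column i is missing."""
    return mean([r[i] for r in rows if r[i] is not None])


def pairwise_cov(rows, i, j):
    """Sample covariance of columns i and j, using rows where both are observed."""
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    mi = mean([a for a, _ in pairs])
    mj = mean([b for _, b in pairs])
    return sum((a - mi) * (b - mj) for a, b in pairs) / (len(pairs) - 1)


mean_age = pairwise_mean(records, 0)        # uses 4 rows
mean_income = pairwise_mean(records, 1)     # uses 4 rows (a different 4)
cov_age_income = pairwise_cov(records, 0, 1)  # uses only 3 rows
print(mean_age, mean_income, cov_age_income)
```

Each moment here is computed from a different subset of the sample, which is the defining feature of the approach.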
3. data imputation
- A third approach is data imputation, replacing the missing values with some kind of guess, and then proceeding with a conventional analysis appropriate for complete data. For example, you might compute the mean income of the persons who reported their income, and then attribute that income to all persons who did not report their income (mean imputation). Beale and Little (1975) discuss methods for data imputation, which are implemented in many statistical packages.
- Multiple Imputation (MI). Just like the old-fashioned imputation methods, Multiple Imputation fills in estimates for the missing data. But to capture the uncertainty in those estimates, MI estimates the values multiple times. Because it uses an imputation method with error built in, the multiple estimates should be similar, but not identical. The result is multiple data sets with identical values for all of the non-missing values and slightly different values for the imputed values in each data set. The statistical analysis of interest, such as ANOVA or logistic regression, is performed separately on each data set, and the results are then combined. Because of the variation in the imputed values, there should also be variation in the parameter estimates, leading to appropriate estimates of standard errors and appropriate p-values. Multiple Imputation is available in SAS, S-Plus, R, and now SPSS 17.0 (but you need the Missing Values Analysis add-on module).
- EM algorithm. EM also imputes the unobserved data, for example using Graham's EM program (freeware).
- The Missing Value Analysis module in SPSS can do EM.
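The difference between mean imputation and multiple imputation can be sketched in plain Python. This is a deliberately simplified illustration, not what any of the packages above do: the incomes are invented, the imputation model is just a normal draw around the observed mean (a proper MI procedure would also draw the model parameters), and the results are combined with Rubin's rules, a standard pooling method that these notes do not name explicitly.

```python
import random
from statistics import mean, stdev, variance

# Hypothetical incomes; None marks a missing value.
incomes = [52000, None, 61000, 38000, None, 70000, 45000, 58000]
observed = [x for x in incomes if x is not None]

# Mean imputation: every missing value gets the same guess.
mean_imputed = [x if x is not None else mean(observed) for x in incomes]


def impute_once(values, rng):
    """Fill each missing value with a random draw (assumed normal model)."""
    obs = [x for x in values if x is not None]
    mu, sd = mean(obs), stdev(obs)
    return [x if x is not None else rng.gauss(mu, sd) for x in values]


def pool(estimates, variances):
    """Combine per-data-set results with Rubin's rules."""
    m = len(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # variation across imputed data sets
    return mean(estimates), within + (1 + 1 / m) * between


rng = random.Random(0)  # fixed seed so the sketch is repeatable
m = 20                  # number of imputed data sets
estimates, variances = [], []
for _ in range(m):
    completed = impute_once(incomes, rng)
    estimates.append(mean(completed))                       # analysis of interest
    variances.append(variance(completed) / len(completed))  # its squared std. error

pooled_mean, pooled_var = pool(estimates, variances)
print(pooled_mean, pooled_var ** 0.5)
```

The pooled variance is larger than the average within-data-set variance because it adds the between-imputation component; that extra term is how the uncertainty about the imputed values reaches the standard errors.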
Amos does not use any of the above methods.
- An alternative is to analyze the full, incomplete data set using maximum likelihood estimation. This method does not impute any data, but rather uses each case's available data to compute maximum likelihood estimates. The maximum likelihood estimate of a parameter is the value of the parameter under which the observed data are most likely.
- Amos does not use imputations. Instead, the model is estimated by full information maximum likelihood (FIML) from the observed portion of the data.
- When data are missing, we can factor the likelihood function. The likelihood is computed separately for those cases with complete data on some variables and those with complete data on all variables. These two likelihoods are then maximized together to find the estimates. Like multiple imputation, this method gives unbiased parameter estimates and standard errors. One advantage is that it does not require the careful selection of variables used to impute values that Multiple Imputation requires. It is, however, limited to linear models.
- Analysis of the full, incomplete data set using maximum likelihood estimation is available in AMOS. AMOS is a structural equation modeling package, but it can run multiple linear regression models. AMOS is easy to use and is now integrated into SPSS, but it will not produce residual plots, influence statistics, and other typical output from regression packages.
- Maximum likelihood estimation does not have to "replace" the missing values. For instance, Mx and Amos assume that the data are missing at random (MAR), and then compute the likelihood of the parameter values given the observed data of each case.
- Unlike many other methods, Amos's full information maximum likelihood (FIML) estimation uses all the information in the observed data.
- Even in the presence of missing data, it computes maximum likelihood estimates (Anderson, 1957).
- Amos assumes that data values that are missing are missing at random. It is not always easy to know whether this assumption is valid or what it means in practice (Rubin, 1976). On the other hand, if the missing at random condition is satisfied, Amos provides estimates that are efficient and consistent.
- From the menus, choose View → Analysis Properties. In the Analysis Properties dialog box, click the Estimation tab. Select Estimate means and intercepts (a check mark appears next to it). Maximum likelihood estimation with missing values works only when you estimate means and intercepts, so you have to estimate them even if you are not interested in the estimates.
- Computing some fit measures requires fitting the saturated and independence models in addition to your model. This is never a problem with complete data, but fitting these models can require extensive computation when there are missing values. The saturated model is especially problematic. In addition, some missing data value patterns can make it impossible in principle to fit the saturated model even if it is possible to fit your model.
- With incomplete data, Amos Graphics tries to fit the saturated and independence models in addition to your model. If Amos fails to fit the independence model, then fit measures that depend on the fit of the independence model, such as CFI, cannot be computed. If Amos cannot fit the saturated model, the usual chi-square statistic cannot be computed.
- If Amos succeeds in fitting both the saturated and the independence models, all fit measures, including the chi-square statistic, are reported.
- Check example 17
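The idea behind FIML described in the notes above, that each case contributes to the likelihood using only the variables it has, can be sketched for two jointly normal variables. This is a hand-written illustration in plain Python and not Amos's implementation; the function names, the data, and the candidate parameter values are all made up. It only evaluates the log-likelihood; an optimizer would then search for the means and covariances that maximize it.

```python
import math


def normal_logpdf(x, mu, var):
    """Log density of a univariate normal."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def bivariate_logpdf(x, y, mu, cov):
    """Log density of a bivariate normal with 2x2 covariance `cov`."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x - mu[0], y - mu[1]
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)


def fiml_loglik(rows, mu, cov):
    """Sum casewise log-likelihoods, each based on the observed variables only."""
    total = 0.0
    for x, y in rows:
        if x is not None and y is not None:
            total += bivariate_logpdf(x, y, mu, cov)     # complete case
        elif x is not None:
            total += normal_logpdf(x, mu[0], cov[0][0])  # only x observed
        elif y is not None:
            total += normal_logpdf(y, mu[1], cov[1][1])  # only y observed
        # a case with nothing observed contributes nothing
    return total


# Hypothetical standardized scores; None marks a missing value.
rows = [(1.0, 2.0), (0.5, None), (None, 1.5), (0.8, 1.1), (None, None)]

# Compare two candidate parameter sets; the larger value fits better.
ll_a = fiml_loglik(rows, (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
ll_b = fiml_loglik(rows, (0.8, 1.5), ((1.0, 0.3), (0.3, 1.0)))
print(ll_a, ll_b)
```

No value is imputed anywhere in this calculation: incomplete cases are kept, and they simply contribute a lower-dimensional density.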