LOG-LINEAR ANALYSIS
Also called multiway frequency analysis (MFA), log-linear analysis is a special case of the generalized linear model (GLM, a family which also includes regression and ANOVA models), developed to better handle dichotomous and categorical variables. It is a method of analyzing the distribution of cases in a table when all the variables of interest are categorical. Usually there is no "dependent variable" as in regression, though the special case of logit log-linear analysis, discussed below, can handle dependent variables. Ordinarily, however, what is predicted is not a variable but the distribution of values in the table formed by the categorical variables. The table is not limited to the usual two-way table but may be of any order (any number of categorical variables).
Thus log-linear analysis deals with the association of categorical or grouped variables, examining all levels of possible main and interaction effects and comparing the saturated model with reduced models. The primary purpose is to find the most parsimonious model which can account for the cell frequencies in the table being analyzed. While log-linear analysis is a non-dependent procedure for accounting for the distribution of cases in a crosstabulation of categorical variables, it is closely related to such dependent procedures as logit, logistic, probit, and tobit regression.
Log-linear analysis is different from logistic regression in three ways:
1. The expected distribution of the categorical variables is Poisson, not binomial or multinomial.
2. The link function is the natural log of the dependent variable, not the logit of the dependent as in logistic regression. (A logit is the natural log of the odds, which is the probability the dependent equals a given value [usually 1, indicating an event has occurred or a trait is present] divided by the probability it does not.)
3. Predictions are estimates of the cell counts in a contingency table, not the logit of y. That is, the cell count is the dependent variable in log-linear analysis.
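The contrast between the two link functions can be illustrated with a minimal sketch. The function names and the numeric values below are hypothetical, chosen only for illustration; they do not come from any statistical package.

```python
import math

def odds(p):
    """Odds of an event: probability it occurs divided by probability it does not."""
    return p / (1.0 - p)

def logit(p):
    """Logit link used in logistic regression: the natural log of the odds."""
    return math.log(odds(p))

def log_link(expected_count):
    """Log link used in log-linear analysis: the natural log of a cell count."""
    return math.log(expected_count)

# Hypothetical values, for illustration only.
p_event = 0.8        # probability that the dependent equals 1
cell_count = 25.0    # expected frequency in one cell of a table

print("odds:", odds(p_event))
print("logit (logistic regression scale):", logit(p_event))
print("log count (log-linear scale):", log_link(cell_count))
```

In both cases it is the transformed quantity, not the raw probability or count, that is modeled as a linear function of the predictors.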
Log-linear methods also differ from multiple regression by substituting maximum likelihood estimation of a link function of the dependent for regression's use of least squares estimation of the raw dependent variable itself. The link function transforms the dependent variable and it is this transform, not the raw variable, which is linearly related to the predictor side of the model.
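As an illustration of the link function at work, the saturated log-linear model for a two-way table can be written as below. The notation is an assumption made for this sketch: A and B stand for the row and column variables, and m_ij is the expected count in cell (i, j).

```latex
% Saturated model for a two-way table of variables A (rows) and B (columns):
% the log of the expected cell count is linear in the effect parameters.
\ln m_{ij} = \lambda + \lambda_i^{A} + \lambda_j^{B} + \lambda_{ij}^{AB}

% Independence model: the same equation with the interaction term dropped.
\ln m_{ij} = \lambda + \lambda_i^{A} + \lambda_j^{B}
```

Here the natural log of the expected count, not the count itself, is the linear function of the main and interaction effects.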
There are several possible purposes for undertaking log-linear modeling. The primary one is to determine the most parsimonious model which is not significantly different from the saturated model, which is a model that fully but trivially accounts for the cell frequencies of a table. Log-linear analysis is also used to determine if variables are related, to predict the expected frequencies (table cell values) of a dependent variable, to understand the relative importance of different independent variables in predicting a dependent, and to confirm models using a goodness of fit test (the likelihood ratio). Residual analysis can also determine where the model is working best and worst. Often researchers will use hierarchical log-linear analysis (in SPSS, the Model Selection option under Log-linear) for exploratory modeling, then use general log-linear analysis for confirmatory modeling.
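The comparison of a reduced model with the saturated model can be sketched for the simplest case, a two-way table tested against the independence model. This is a minimal illustration, not SPSS or SAS output: the table values are hypothetical, and the closed-form expected frequencies used here apply to the independence model only (more complex models generally require iterative fitting).

```python
import math

def independence_expected(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def likelihood_ratio(observed, expected):
    """Likelihood ratio chi-square: G2 = 2 * sum(O * ln(O / E)) over all cells."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:  # a cell with zero observed count contributes nothing
                total += o * math.log(o / e)
    return 2.0 * total

# Hypothetical 2 x 2 table of observed frequencies, for illustration only.
observed = [[30, 10],
            [20, 40]]

expected = independence_expected(observed)
g2 = likelihood_ratio(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("expected counts:", expected)
print("G2 =", round(g2, 3), "on", df, "df")

# The saturated model reproduces the observed counts exactly, so its G2 is zero.
print("saturated G2 =", likelihood_ratio(observed, observed))
```

A large likelihood ratio relative to its degrees of freedom indicates that the reduced model differs significantly from the saturated model and so fits the table poorly. A small, non-significant value supports the more parsimonious model.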
SPSS supports these related procedures, among others:
The full content is now available from Statistical Associates Publishers.
Below is the table of contents.

Log-linear Analysis
Table of Contents

Overview
Key Concepts and Terms
  Types of log-linear analysis
    General log-linear analysis
    Hierarchical log-linear analysis
  Types of variables
    Factors
    Covariates
    Cell structure variables/cell weight variables
    Contrast variables
  Types of models
    Saturated models and effects
    Parsimonious models
    The complete independence model
    The one factor independence model
    The conditional independence model
    The homogenous association model
    The symmetry model
    The conditional symmetry model
General log-linear modeling: SPSS user interface
  The "Model" button
  The "Options" button
  The "Save" button
General log-linear analysis compared to crosstabulation (SPSS)
  Log-linear effects as categorical control variables in crosstabulation
  General log-linear analysis of the crosstab example
Goodness of fit in log-linear analysis
  Types of goodness of fit measures
    Likelihood ratio
    Pearson chi-square
    Factor list warning
  A simple goodness of fit example
General log-linear analysis using SPSS
  Overview
  Example
  The saturated model
  The independence model
  Model dropping the highest level of interaction
  The conditional independence model
General log-linear analysis using SAS
  Example
  SAS syntax
  SAS output for the saturated model
  SAS output for the independence model
  SAS output for the homogenous association model
  SAS output for the conditional independence model
Residual analysis
  Overview
  Residuals depend on the model
  Residuals of the most parsimonious model
  Adjusted residuals plots
  Normal probability (Q-Q) plots
  Deviance residual plots
  Normal probability (Q-Q) plots for deviance
Parameter estimates and odds ratios
  Overview
  Parameter estimates
  Standardized parameter estimates (Z scores)
  Model equations in log-linear analysis
  Predicted frequencies
  Odds ratios
    Example
Hierarchical log-linear analysis
  Overview
  The SPSS user interface for hierarchical linear modeling
    The initial "Model Selection Loglinear Analysis" dialog
    The "Model" button dialog
    The "Options" button dialog
  Statistical output for hierarchical log-linear analysis in SPSS
    The "Cell Counts and Residuals" table
    The "Step Summary" table
    The "Goodness of Fit Tests" table
    The "Parameter Estimates" table
    "Tests of K-Way and Higher-Order Effects" table
    The "Partial Associations" table
Ordinal log-linear models
  Overview
  Linear-by-linear association models
    Linear-by-linear modeling in SPSS
    Example
    Data setup
    Statistical output for the linear-by-linear ordinal model
  Row-effects models
    Overview
    Data setup
    Statistical output for the row-effects ordinal model
  Column-effects models
Logit log-linear models and logit regression
  Overview
  Example
  The SPSS user interface for logit log-linear analysis
    The main logit log-linear user interface
    The "Model" button dialog
    The "Options" button dialog
    The "Save" button dialog
  Logit log-linear statistical output in SPSS
    Model
    The "Goodness-of-fit Tests" table
    The "Analysis of Dispersion" and "Measure of Association" tables
    The "Parameter Estimates" table
    The "Cell Counts and Residuals" table
Conditional logit regression models
  Matched pairs or panel data
  Conditional logit regression in SPSS
  Choice models
  Statistical output for conditional logit regression in SPSS
Assumptions of log-linear models
  Not assumed
  Well-populated tables
  Small models with few variables
  Adequate sample size
  No zero cells
  No important outliers
  Normally distributed residuals
  No binned interval-level data
  Evenly distributed categories
  Independence
  Data distribution assumptions
  Appropriate dispersion
  Absence of endogenous regressors
Frequently Asked Questions
  Why not just use regression with dichotomous dependents?
  Why not just use crosstabulation and ordinal measures of association rather than ordinal log-linear analysis?
  What computer packages implement log-linear analysis?
  What are second-order and partial odds ratios?
  What are structural zeros and sampling zeros in the SPSS "Data Information" table?
  Since logit and probit generally lead to the same statistical conclusions, when is one better than the other?
  Do I really need to do multinomial logit (multinomial logistic regression) or multinomial probit? Could I just apply M different logit or probit models for a variable with M levels?
  What if my variables are multiple-response type?
  Explain "partial odds".
  Explain coding in saturated vs. nonsaturated models.
  What is log-linear analysis with latent variables?
Bibliography

Pagecount: 103