Statistical Associates Publishers
Canonical Correlation: 10 Worst Pitfalls and Mistakes
- Inappropriate observed variables.
The dependent and covariate sets of measured variables should each contain variables that intercorrelate. Correlating arbitrarily composed sets will yield arbitrary results.
- Interpreting only the first canonical correlation.
The dependent and covariate sets of variables may be related along more than one significant dimension. The canonical correlation for the first dimension is always the largest and might be the only significant one, but it is quite possible that later dimensions will be significant as well.
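As a minimal sketch of why more than one dimension can matter, the code below computes both canonical correlations for the smallest many-to-many case, two variables in each set. It assumes the canonical correlations are the square roots of the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx. The function name and the correlation values are hypothetical, chosen only for illustration. Real analyses would normally use a statistics package.

```python
from math import sqrt

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def canonical_correlations_2x2(rxx, ryy, rxy):
    """Both canonical correlations for two sets of two variables each.

    rxx, ryy: within-set correlation matrices; rxy: between-set correlations.
    """
    ryx = [[rxy[j][i] for j in range(2)] for i in range(2)]  # transpose of rxy
    m = matmul2(matmul2(inv2(rxx), rxy), matmul2(inv2(ryy), ryx))
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = sqrt(max(trace * trace - 4 * det, 0.0))
    eigenvalues = [(trace + root) / 2, (trace - root) / 2]
    # Clamp tiny negative values caused by floating-point rounding.
    return [sqrt(max(ev, 0.0)) for ev in eigenvalues]

# Hypothetical correlation matrices, for illustration only.
rxx = [[1.0, 0.5], [0.5, 1.0]]
ryy = [[1.0, 0.4], [0.4, 1.0]]
rxy = [[0.6, 0.3], [0.3, 0.5]]
rc1, rc2 = canonical_correlations_2x2(rxx, ryy, rxy)
print(f"first: {rc1:.3f}, second: {rc2:.3f}")
```

The second value is never larger than the first, but it need not be negligible. Each dimension should be tested for significance rather than assuming that only the first one matters.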
- Having only one significant variable in a set.
Canonical correlation is intended for many-to-many relationships. If a set has only one significant measured variable, canonical correlation is not appropriate.
- Violating linearity.
Canonical correlation is a member of the general linear model family and assumes linear relationships. However, nonlinear canonical correlation is available.
- Treating ordinal data as interval in level.
This is the same violation as is common in multiple linear regression. Nominal and ordinal variables are often best treated using nonlinear canonical correlation.
- Failing to undertake redundancy analysis.
The redundancy coefficient, which measures the percent of variance in one set of measured variables that may be predicted by the canonical variable of the other set, should be reported along with the canonical correlation. This is recommended because the canonical variates may correlate highly even though each variate extracts only a small proportion of the variance from its own set of original variables. Redundancy analysis assesses the magnitude of relationships.
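The following minimal sketch illustrates the point with made-up numbers. It assumes the common formulation in which the redundancy coefficient for a set is the mean squared structure coefficient of that set on its own canonical variate, multiplied by the squared canonical correlation. The function name and all values are hypothetical.

```python
def redundancy(structure_coefficients, canonical_correlation):
    """Share of a set's variance predictable from the other set's variate.

    structure_coefficients: correlations of each original variable in the set
    with that set's own canonical variate.
    """
    n = len(structure_coefficients)
    variance_extracted = sum(r * r for r in structure_coefficients) / n
    return variance_extracted * canonical_correlation ** 2

# Hypothetical values: a high canonical correlation, but weak loadings.
rc = 0.90
loadings_y = [0.30, 0.20, 0.10]
rd = redundancy(loadings_y, rc)
print(f"canonical correlation = {rc:.2f}, redundancy = {rd:.4f}")
```

Here the canonical correlation is 0.90, yet the redundancy is under 4 percent, because the variate captures less than 5 percent of the variance in its own set.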
- Using canonical weights to interpret and label canonical dimensions.
Canonical structure coefficients should be used along with canonical weights. Of the two, the former is primary for imputing labels to dimensions.
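A minimal sketch of why weights alone can mislead: when two variables in a set are highly correlated, one of them may receive a near-zero weight and still correlate strongly with the canonical variate. The data, the weights, and the function names below are hypothetical. The weights are simply assumed rather than estimated, and the structure coefficient is taken to be the Pearson correlation between each variable and the variate scores.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def standardize(values):
    """Convert values to z-scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / sd for x in values]

def structure_coefficients(columns, weights):
    """Correlate each variable with the variate formed from the weights."""
    z = [standardize(col) for col in columns]
    n_obs = len(z[0])
    variate = [sum(w * col[i] for w, col in zip(weights, z))
               for i in range(n_obs)]
    return [pearson(col, variate) for col in z]

# Hypothetical data: x2 is nearly a copy of x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]
weights = [0.95, 0.05]  # assumed standardized canonical weights

structure = structure_coefficients([x1, x2], weights)
for name, w, s in zip(["x1", "x2"], weights, structure):
    print(f"{name}: weight = {w:.2f}, structure coefficient = {s:.3f}")
```

Judged by its weight, x2 looks irrelevant to the dimension. Its structure coefficient shows that it is almost as closely tied to the variate as x1.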
- Not assessing the model using the canonical variate adequacy coefficient.
The canonical variate adequacy coefficient is the average of all the squared structure coefficients for one set of variables with respect to a given canonical variable. It is a measure of how well a given canonical variable represents the original variance in that set of original variables.
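The definition above can be sketched directly. The structure matrix below is hypothetical, with one row per original variable and one column per canonical variable. The function name is likewise an illustrative assumption.

```python
def adequacy_by_variate(structure):
    """Average squared structure coefficient for each canonical variable.

    structure: list of rows, one per original variable in the set;
    each row holds that variable's structure coefficients, one per variate.
    """
    n_variables = len(structure)
    n_variates = len(structure[0])
    return [sum(row[k] ** 2 for row in structure) / n_variables
            for k in range(n_variates)]

# Hypothetical structure coefficients: three variables, two variates.
structure_x = [
    [0.90, 0.10],
    [0.80, -0.30],
    [0.70, 0.60],
]
adequacy = adequacy_by_variate(structure_x)
for k, value in enumerate(adequacy, start=1):
    print(f"variate {k}: adequacy = {value:.3f}")
```

With these values the first canonical variable represents about 65 percent of the variance in its set and the second about 15 percent.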
- Inadequate sample size.
Canonical correlation requires a large sample to limit the risk of Type II error (false negatives) and is not recommended for small samples. Also, when categorical variables are used, as in nonlinear canonical correlation, the usual rule of thumb for categorical variable procedures applies: among the cells formed by the categorical variables, no cell should have a count of 0 and 80% of cells should have a count greater than 5.
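The cell-count rule of thumb can be checked mechanically. The sketch below is one way to do so for a cross-tabulation stored as a list of rows. The function name and the example tables are hypothetical.

```python
def check_cell_counts(table, min_count=5, min_share=0.80):
    """Apply the rule of thumb: no empty cells, 80% of cells above 5.

    Returns (passes, no_empty_cells, share_of_cells_above_min_count).
    """
    cells = [count for row in table for count in row]
    no_empty_cells = all(count > 0 for count in cells)
    share_above = sum(count > min_count for count in cells) / len(cells)
    passes = no_empty_cells and share_above >= min_share
    return passes, no_empty_cells, share_above

# Hypothetical cross-tabulations of two categorical variables.
adequate = [[12, 8, 6], [7, 3, 9]]
sparse = [[12, 0], [7, 9]]

print(check_cell_counts(adequate))
print(check_cell_counts(sparse))
```

The first table passes, since no cell is empty and five of its six cells exceed 5. The second fails because it contains an empty cell.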
- Not meeting the assumptions of multiple linear regression in linear canonical correlation, including multivariate normality, linearity, homoscedasticity, and data independence. Other assumptions apply to nonlinear canonical correlation.
Our book, listed below, enumerates many assumptions of canonical correlation, clearly listed in the "Assumptions" section.
Want to learn more about all this and much more?
"GLM Multivariate, MANOVA, & Canonical Correlation" on Amazon, Kindle format. After purchase, send your receipt to sa.publishers@gmail.com to also receive a free PDF version if you wish.
"GLM Multivariate, MANOVA, & Canonical Correlation" Preview, PDF format
"GLM Multivariate, MANOVA, & Canonical Correlation" Information and table of contents
"Statistical Associates Library" of 50 Statistics E-books on Amazon, no-password .PDF format