Chapter Five
PREDICTIVE ANALYSIS
Descriptive analysis of correlation matrices gives way to predictive analysis and the formulation of inferences at the point where questions about the internal validity of the table give way to questions about its external validity in relation to a larger, hypothetical class of points.
In this regard we may distinguish among several ways of construing that larger class of points: as "normally distributed", as "random", as "multiple" or as hypothetical.
In general, any significant correlational matrix exhibits a minimal hypothetical pattern or structure which is representative of a larger universe of patterning.
Any correlation comprises a number of hidden variables and interrelationships; of these variables and implicit interrelationships, only a subset can be considered deterministic in a causal sense.
The order implicit in any correlational matrix can be said to comprise the following hypothetical variables:
1. direct, deterministic, linear values.
2. indirect, nonlinear, interdependent values.
3. random or nondeterministic values.
Thus, for any given correlation or set of correlations, writing d for the direct, deterministic values, di for the indirect, interdependent values and nd for the nondeterministic values, d * di + nd = 1; equivalently, d * di = 1 - nd and nd = 1 - (d * di), where the product d * di equals the correlation coefficient.
We may also assume a reflexive non-isomorphism between deterministic and indeterministic values, such that the nondeterministic values do not necessarily equal one minus the correlation; instead, 1 = (d * di + ud) + nd.
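The arithmetic of these two relations can be sketched in Python. This is a minimal illustration under stated assumptions: the observed coefficient r stands in for the product d * di, the extra term ud defaults to zero, and r is taken to be non-negative, since the relations as written do not say how a negative sign enters. The function name nondeterministic_share is hypothetical.

```python
def nondeterministic_share(r, ud=0.0):
    """Return nd from 1 = (d * di + ud) + nd, taking d * di to equal r.

    With ud = 0 this reduces to nd = 1 - (d * di).
    Assumes 0 <= r <= 1 (an illustrative restriction, not from the text).
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("this sketch assumes 0 <= r <= 1")
    return 1.0 - (r + ud)


r = 0.6
# Several (d, di) pairs share the same product, so r alone cannot
# identify d and di separately.
for d in (1.0, 0.75, 0.6):
    di = r / d
    print(f"d = {d:.2f}, di = {di:.2f}, d * di = {d * di:.2f}")

print(f"nd = {nondeterministic_share(r):.2f}")
print(f"nd with ud = 0.1: {nondeterministic_share(r, ud=0.1):.2f}")
```

The loop makes one consequence of the decomposition concrete: because only the product d * di is observed, the two components cannot be separated from the coefficient alone.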
If we transform the correlational values of a matrix so that the origin represents a perfect absolute correlation, then any correlation can be represented as a vector from the origin. The precise location of the vector's endpoint on the abscissa and ordinate cannot be known, because the vector may meet a circle around the origin at any of an infinite number of points. The vector's magnitude, however, can be read from the correlation coefficient as √(x² + y²), where x and y are the coordinates of its endpoint. The correlation thus gives the strength of the vector, but not its direction.
Negative values will also fall into different quadrants from the positive values, so the general direction of a correlation's vector is determined by the sign of the correlation.
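The point that a coefficient fixes a vector's length but not its endpoint can be illustrated by sampling candidate endpoints. This sketch rests on two illustrative assumptions that the text does not specify: the vector length is taken as |r|, and the sign of r is mapped to a quadrant by a hypothetical convention (non-negative r in the first quadrant, negative r in the second). The function candidate_endpoints is likewise hypothetical.

```python
import math
import random


def candidate_endpoints(r, n=5, seed=0):
    """Sample n possible endpoints (x, y) for the vector of correlation r.

    Illustrative assumptions: the vector length is |r|; angles fall between
    0 and pi/2 for r >= 0 and between pi/2 and pi for r < 0.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    rng = random.Random(seed)
    low = 0.0 if r >= 0 else math.pi / 2
    points = []
    for _ in range(n):
        theta = rng.uniform(low, low + math.pi / 2)
        points.append((abs(r) * math.cos(theta), abs(r) * math.sin(theta)))
    return points


for x, y in candidate_endpoints(-0.7):
    # Each endpoint differs, but every one has length sqrt(x**2 + y**2) = |r|.
    print(f"x = {x:+.3f}, y = {y:+.3f}, length = {math.hypot(x, y):.3f}")
```

Every printed endpoint has the same length, which is all the coefficient determines; the angle within the quadrant is left free.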
Predictive cross correlational analysis involves several possible levels of inference and deduction.
First, we wish to evaluate the distribution of values in the matrix to determine the likelihood that such a distribution could have arisen by chance.
Second, we wish to compare the values of two or more matrices to determine whether they differ significantly in pattern or share a similar underlying structure.
Third, several forms of regression analysis can be applied to the values of the data table and the correlational tables, provided we assume that the values of the same field do not differ fundamentally in magnitude or scale of measurement.
Fourth, we wish to infer a reasonable underlying structure for one or more correlation tables, given that they are significantly nonrandom distributions and that some form of regression relationship has been shown to be possible for the patterning. Because the underlying inferential correlational structure is always held to be a simpler model than the manifest matrix, we ask what possible variables may systematically account for one or more values of the manifest matrix. Here we may assume a minimal parsimony of the underlying structure, such that the minimum number of structural variables consistently accounts for the maximum number of manifest values.
Fifth, we wish to use our understanding of the correlational matrix to interrelate other matrices and to systematically alter certain values of the original data or of a matrix to produce expected outcomes. Because the underlying correlational structure is held to be only a part of a larger, unknown system of information, the comparison and interrelation of two or more matrices in terms of their hypothetical correlational structures allows us to construct a more elaborate model of a larger context of information.
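The first and fourth of these steps can be sketched with two standard techniques, named here plainly as substitutes rather than as the method of this text: a permutation test of whether the correlations could have arisen by chance, and a count of eigenvalues greater than 1 (the Kaiser criterion) as one rough measure of how many structural variables are needed. The data, the test statistic (mean absolute off-diagonal correlation) and all function names are hypothetical choices for illustration, using NumPy.

```python
import numpy as np


def mean_abs_offdiag(corr):
    """Mean absolute value of the off-diagonal entries of a correlation matrix."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return np.abs(corr[mask]).mean()


def permutation_p_value(data, n_perm=999, seed=0):
    """Estimate how often chance alone yields correlations this strong.

    Shuffling each column independently breaks any relationship between
    columns while keeping each column's own values. The p-value is the share
    of shuffled tables whose statistic reaches the observed one (+1 corrected).
    """
    rng = np.random.default_rng(seed)
    observed = mean_abs_offdiag(np.corrcoef(data, rowvar=False))
    count = 0
    for _ in range(n_perm):
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        if mean_abs_offdiag(np.corrcoef(shuffled, rowvar=False)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


def kaiser_count(corr):
    """Number of eigenvalues of the correlation matrix greater than 1."""
    return int(np.sum(np.linalg.eigvalsh(corr) > 1.0))


# Hypothetical data: two latent variables drive six observed fields.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
loadings = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
data = latent @ loadings.T + 0.5 * rng.normal(size=(200, 6))

print(permutation_p_value(data, n_perm=199))
print(kaiser_count(np.corrcoef(data, rowvar=False)))
```

The Kaiser count is only a heuristic; other criteria for the number of underlying variables exist, and the choice among them is a judgement rather than a fixed rule.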
The topical value of an entry in a correlation matrix is a function of the set of correlations of that value with every other value within the matrix.
Changing a single value in the original data table alters the entire matrix of correlational values; it can be said that the values of the data are evenly distributed within a correlational matrix. This offers the possibility of exploiting a correlational matrix or table to create a distributed processing system, in which a "search" for values can be contained within the space of the correlational matrix.
A bimodal correlation matrix is characterized by high similar values in both tails and high opposite values in the central region. When squared, this results in clustering in the diagonal quadrants of positive and negative groups of values. A unimodal correlation matrix has only one cluster of significant positive values, located in a tail. We can thus describe different patterns of clustering within correlational matrices. Step subclusters appear to create a more complex pattern of clustering, usually in larger matrices, and "islands" (small clusters within large matrices) represent an even more complex pattern, especially if they occur with a meaningful periodicity.
This patterning within correlational matrices may not become apparent until the original data sets have been regrouped in some order.
A bi-valent matrix is one that contains significant clustering or arrangement of values of both signs. A mono-valent matrix is one whose values are entirely or mostly of one sign. An "ambivalent" matrix is one in which both signs appear, but without any apparent order.
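One simple way to regroup variables so that a hidden pattern appears is to sort them by their loadings on the leading eigenvector of the correlation matrix. The sketch below uses that technique, named here as one option among several (hierarchical clustering is another); the data, the scrambled column order and the name spectral_order are hypothetical.

```python
import numpy as np


def spectral_order(corr):
    """Order variables by their loadings on the eigenvector with the
    largest eigenvalue."""
    _, eigenvectors = np.linalg.eigh(corr)  # eigenvalues come back ascending
    leading = eigenvectors[:, -1]
    return np.argsort(leading)


# Hypothetical bi-valent structure: variables 0-2 move together, 3-5 move
# together, and the two groups are negatively correlated. The columns are
# then scrambled so the pattern is hidden.
rng = np.random.default_rng(1)
factor = rng.normal(size=300)
signs = np.array([1, 1, 1, -1, -1, -1], dtype=float)
data = np.outer(factor, signs) + 0.6 * rng.normal(size=(300, 6))
scrambled = data[:, [4, 0, 5, 2, 3, 1]]

corr = np.corrcoef(scrambled, rowvar=False)
order = spectral_order(corr)
reordered = corr[np.ix_(order, order)]
print(np.round(reordered, 2))
```

Because the sign of an eigenvector is arbitrary, either group may come first, but after reordering the two positive blocks sit on the diagonal and the negative values fill the off-diagonal blocks in either case.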
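The sign-based part of these definitions can be checked mechanically. In this sketch, "significant" is replaced by a hypothetical absolute cut-off (threshold) and "mostly one sign" by a hypothetical share (mostly); both values are illustrative assumptions, as is the function name sign_profile. It only separates mono-valent matrices from those showing both signs, because telling a bi-valent matrix from an ambivalent one depends on whether the signs are ordered, which requires inspecting or reordering the matrix.

```python
import numpy as np


def sign_profile(corr, threshold=0.3, mostly=0.9):
    """Classify a correlation matrix by the signs of its strong entries.

    Only entries above the diagonal with |r| >= threshold are counted.
    Returns "mono-valent" when at least `mostly` of them share a sign,
    otherwise "both signs" (bi-valent or ambivalent, to be judged separately).
    """
    upper = corr[np.triu_indices_from(corr, k=1)]
    strong = upper[np.abs(upper) >= threshold]
    if strong.size == 0:
        return "no strong correlations"
    positive_share = np.mean(strong > 0)
    if max(positive_share, 1 - positive_share) >= mostly:
        return "mono-valent"
    return "both signs"


mono = np.array([[1, .8, .6], [.8, 1, .7], [.6, .7, 1]])
bi = np.array([[1, .8, -.7, -.6],
               [.8, 1, -.6, -.7],
               [-.7, -.6, 1, .8],
               [-.6, -.7, .8, 1]])
print(sign_profile(mono))
print(sign_profile(bi))
```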
Controversy exists over how strictly the X variable in linear regression should be interpreted. A strict functional (predictive) model assumes that X is always an exactly controlled value. Less restrictive structural (descriptive) linear regression models do not assume that X is predetermined; X can represent practically any variate value. Correlation rests on the premise that neither X nor Y is determined, and is thus the least restrictive model: it measures only the strength of the relationship or interdependency, but neither the direction nor the actual dependency of the relationship between X and Y. While the data used in linear regression can be used directly to determine the correlation coefficient, the correlation coefficient alone cannot be used to determine the regression line.
Thus we may imagine an undetermined area of the X-Y plane, anywhere within which the data points of a correlation may be found to lie; the direction and precise location of these points cannot be known, only the relative size and linearity of the area, as represented by the strength of the correlation. We may then imagine a smaller, semi-determined subset of this space on which we can plot model II regression lines. Within this subsection of the total correlational area, we may imagine an even smaller subset of points that represent the actual line-fit plot of a linear regression equation given the precise values of X. Working backward, indirectly, through model II regression techniques, we may approximate a functional linear regression model on the basis of a correlation.
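The last claim can be demonstrated numerically: rescaling and shifting y leaves r unchanged but changes the least-squares line, so r cannot determine the line on its own. The data and the helper ols_fit are hypothetical; the identity used to rebuild the slope, slope = r * s_y / s_x, is the standard least-squares result.

```python
import numpy as np


def ols_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept


rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.8 * x + 0.6 * rng.normal(size=100)

# Rescaling and shifting y leaves r unchanged but changes the regression line.
y_rescaled = 10.0 * y + 5.0

for label, yy in (("y", y), ("10y + 5", y_rescaled)):
    r = np.corrcoef(x, yy)[0, 1]
    slope, intercept = ols_fit(x, yy)
    # The slope can be rebuilt from r only with the standard deviations too.
    rebuilt = r * yy.std(ddof=1) / x.std(ddof=1)
    print(f"{label}: r = {r:.3f}, slope = {slope:.3f}, "
          f"r*sy/sx = {rebuilt:.3f}, intercept = {intercept:.3f}")
```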
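One common model II technique is the reduced major axis (also called the standardized major axis), whose slope is sign(r) * s_y / s_x. The sketch below, with hypothetical data and helper names, shows how it relates to the model I line: the least-squares slope equals |r| times the reduced-major-axis slope, so with the means, the standard deviations and r in hand, one line can be recovered from the other. This is offered as one possible reading of working backward from a correlation, not as the procedure this text has in mind.

```python
import numpy as np


def rma_fit(x, y):
    """Reduced major axis line, a common model II regression method.

    Slope = sign(r) * s_y / s_x; the line passes through the two means.
    """
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    return slope, y.mean() - slope * x.mean()


def ols_slope(x, y):
    """Model I (ordinary least-squares) slope of y on x."""
    return np.corrcoef(x, y)[0, 1] * y.std(ddof=1) / x.std(ddof=1)


rng = np.random.default_rng(3)
x = rng.normal(size=150)
y = -1.5 * x + rng.normal(size=150)

r = np.corrcoef(x, y)[0, 1]
rma_slope, rma_intercept = rma_fit(x, y)
print(f"r = {r:.3f}")
print(f"model II (RMA) slope = {rma_slope:.3f}")
print(f"model I (OLS) slope  = {ols_slope(x, y):.3f}")
# The two slopes differ by the factor |r|: OLS slope = |r| * RMA slope.
```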
Frequently, it is desirable to compare and intercorrelate different correlational matrices, or even different subsections of the same correlational matrix. This can be done in a number of ways. Gross descriptors such as the total relative average or the total absolute average may be used to compare matrices of any size. As long as the topical dimensions are the same or are theoretically united, matrices of the same size can be rearranged and intercorrelated more directly. Sometimes two equal-sized matrices are generated from the rows and columns of the same table; these two matrices can be intercorrelated to determine the degree of relationship between the two sets of variables. Their relative and absolute descriptors can also be compared, and this comparison can give insight into the nature of the functional relationship between the dimensions represented in the table.
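The descriptors and the direct intercorrelation of equal-sized matrices might be sketched as follows. The text does not define the total relative and total absolute averages, so the sketch assumes they are the mean signed and mean absolute values above the diagonal; the table and all function names are hypothetical. Two matrices are intercorrelated by correlating their matching upper-triangle entries.

```python
import numpy as np


def upper(corr):
    """Entries above the main diagonal, as a flat vector."""
    return corr[np.triu_indices_from(corr, k=1)]


def relative_average(corr):
    """Assumed reading of 'total relative average': mean signed value."""
    return upper(corr).mean()


def absolute_average(corr):
    """Assumed reading of 'total absolute average': mean absolute value."""
    return np.abs(upper(corr)).mean()


def intercorrelate(corr_a, corr_b):
    """Pearson correlation between matching upper-triangle entries of two
    equal-sized matrices whose variables are in the same order."""
    if corr_a.shape != corr_b.shape:
        raise ValueError("matrices must have the same size")
    return np.corrcoef(upper(corr_a), upper(corr_b))[0, 1]


# Hypothetical square data table (8 x 8); a shared trend down the rows
# gives the columns some common structure.
rng = np.random.default_rng(7)
table = rng.normal(size=(8, 8)) + np.outer(np.arange(8), np.ones(8)) * 0.3
column_corr = np.corrcoef(table, rowvar=False)  # relations among columns
row_corr = np.corrcoef(table)                   # relations among rows

for name, corr in (("columns", column_corr), ("rows", row_corr)):
    print(f"{name}: relative = {relative_average(corr):.3f}, "
          f"absolute = {absolute_average(corr):.3f}")
print(f"intercorrelation: {intercorrelate(column_corr, row_corr):.3f}")
```

The entries of a correlation matrix are not independent observations, so an ordinary p-value for this intercorrelation would be misleading; a permutation approach such as the Mantel test is the usual remedy.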
A correlational matrix must be squared before any linear analysis can be conducted upon its values.
Blanket Copyright, Hugh M. Lewis, © 2005. Use of this text governed by fair use policy--permission to make copies of this text is granted for purposes of research and non-profit instruction only.
Last Updated: 04/19/05