Introduction

CROSS-CORRELATIONAL ANALYSIS

by Hugh M. Lewis

 

Cross-correlational analysis represents a species of quantitative analysis linked to a specific form of data. Cross-correlational analysis involves tabular data arranged in arrays of rows and columns. This type of data is common in social scientific and psycho-cultural research, and should be considered to be a natural form of knowledge representation of information, especially of certain categories and classes of data. This type of data is referred to as "paradigmatic" in the sense that it is theoretically unified and underlying this paradigmatic presumption is the implication that the data contained within the table is somehow inter-linked in a theoretic unity to some partial and imperfect degree.

Cross-correlational analysis therefore involves the systematic, quantitative elucidation of what may be referred to as underlying correlational structures that are implicit to the organization and patterning of data in such matrix tables, and that are held to be possibly "meta-paradigmatic" in the sense that this structural patterning may be repeatable across alternative matrices and may encompass and serve to explain a number of different alternative data-tables drawn from the same host population and their implicitly related paradigms.

Cross-correlational analysis involves descriptive statistics, but is itself not strictly statistical, in the sense that the basic presupposition of randomness underlying statistical testing does not underlie cross-correlational analysis in the same way. Cross-correlational analysis picks up where conventional statistics leaves off, with the presupposition of underlying structure influencing the patterning of the data distribution. Cross-correlational analysis is therefore not a means of testing hypotheses in a statistical manner, but a means of exploring the data in alternative ways that render hypothesis formulation more available and explicit in the first place.

The underlying correlational structures of data tables are always (a) implicit and always (b) hypothetical. Thus they are never available for direct determination, but must always be inferred in a probabilistic manner. Because they are always hypothetical, they are subject to the conditional constraints of statistical description and statements of probability. Hence they are always relative and nonabsolute in a mathematical sense, though the mathematical procedures that are involved in their analysis are true and correct.

The type of solution that cross-correlational analysis represents for complex informational systems is therefore only partial and semi-deterministic. It does not represent a total or net solution to the problem of underlying order in the sense of resolving a von Neumann-type bottleneck by reducing the "information explosion" algorithmically induced by a search for a simplifying solution to complexity problems. But by being linked to tabular data that represents supposedly real relations upon a theoretical level, cross-correlational analysis offers the possibility of partially determining significant underlying structures that are empirically based, and upon which discrimination tables and other forms of inference-drawing frameworks can be subsequently constructed. These processes may prove to be heuristically very useful to the construction of expert systems and other forms of A.I. programming such as genetic algorithms and neural networks.

My interest in cross-correlational analysis arose out of a need to deal effectively with a tremendous amount of diverse kinds of response data from multiple, partially overlapping samples, many of which in a strict conventional sense did not meet the criteria of randomness, and yet that showed without a doubt a realistic patterning of relationship and determination.

Cross-correlational analysis arose out of a need to coordinate and establish systematic patterns of difference and similarity between complex distributions of data--in ways that might otherwise be non-obvious, counter-intuitive or not directly ascertainable through conventional forms of analysis and representation. It also arose from the need for a basic descriptive system for complex data sets derived from realistic conditions of elicitation that rendered the presuppositions of statistics impossible--a form of analytical description or descriptive representation which precedes and in part preconditions explanatory inference in the framework of such inherent complexity of data.

Cross-correlational analysis provides a context for the integration of quantitative and qualitative forms of information, and for the inductive construction, operationalization and validation of theory and topical domains that are otherwise nonquantitative and only qualitatively represented. To the extent to which such a form of analysis is ultimately tethered to a specific set or range of data sets that are empirically rooted and measurable, such analysis can be claimed to be empirically consistent and representative of actual, if hidden, relationships of patterning in the data.

The possibility of cross-correlational analysis rests upon the following premises:

1. A strong correlation may represent a functional relationship, but a weak correlation represents a probable lack of a direct functional relationship.

1 a. A stronger correlation is more likely to represent such a relationship than a weaker one.

1 b. In a complex and large correlational matrix, even low correlational values may be significant indicators of a functional relationship.

2. Correlational sets of a matrix may be grouped and described statistically in a meaningful way--they form a curve the characteristics of which (mode, median, mean, etc.) may be evaluated.

3. A correlational matrix can be reorganized in different ways, resulting in different descriptions of the resulting tables--these values can be compared (i.e. rows versus columns) and evaluated for their relative fitness and theoretical import.

4. The rows and columns of a correlational matrix represent hypothetical dimensions of analysis that can be topically characterized.

5. In a complex and large correlational matrix, even low correlational values may be significant indicators of a functional relationship.

6. Statistical description in Correlational Analysis assumes a special data type, that I refer to as a "cardinal" form of data, that allows itself to be construed alternately as either parametric or non-parametric, or discrete or continuous, depending on the circumstances of its definition. This alternate data type is, furthermore, a relative and non-absolute form of data.
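Premises 2 and 3 can be illustrated with a minimal sketch. It assumes Python with NumPy, neither of which belongs to the original study, and an invented table of 30 respondents scored on 5 traits. The helper names offdiag_values and describe are hypothetical. The sketch treats the off-diagonal correlations as an ordinary distribution and then summarizes each trait's row of the matrix separately.

```python
import numpy as np

def offdiag_values(r):
    """Return the correlations above the diagonal of a square matrix."""
    r = np.asarray(r, dtype=float)
    rows, cols = np.triu_indices(r.shape[0], k=1)
    return r[rows, cols]

def describe(values):
    """Basic descriptive statistics for a set of correlation values."""
    values = np.asarray(values, dtype=float)
    return {
        "n": int(values.size),
        "mean": float(values.mean()),
        "median": float(np.median(values)),
        "std": float(values.std(ddof=1)),
        "min": float(values.min()),
        "max": float(values.max()),
    }

# Hypothetical data: 30 respondents (rows) scored on 5 traits (columns).
rng = np.random.default_rng(0)
data = rng.normal(size=(30, 5))
data[:, 1] += data[:, 0]  # build in one related pair of traits

r = np.corrcoef(data, rowvar=False)    # 5 x 5 first-order matrix
summary = describe(offdiag_values(r))  # the matrix described as a distribution

# Mean correlation of each trait with the other four (diagonal excluded).
row_means = (r.sum(axis=1) - 1.0) / (r.shape[0] - 1)
```

Comparing row_means across traits is one simple way of evaluating rows of the same table against one another. For a rectangular table, the same summary could be taken over columns as well.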

A correlational matrix forms an uneven landscape of values that can be topographically mapped in detail. It is the pattern of this landscape that yields insight into the functional organization of the data. This patterning may be carried over into second and subsequent order cross-correlational matrices.

Underlying cross-correlational analysis are numerous suggestions of structure and functional relationships between alternate sets of data. The central hypothesis always to be tested in cross-correlational analysis is whether or not there might be some form of predictive "structure" underlying correlational patterns within different matrices, and if so, then how might these implicit correlational structures be further elucidated and evaluated. Evidence for such correlational structures suggests that not only will strong correlations occur within a matrix, but these correlations will co-occur and even reoccur, and will tend to cluster in meaningful and repeatable ways in numerous alternate data sets.

At this point, cross-correlational analysis resembles cluster analysis, and a form of cluster analysis as representation is a spin-off of correlational analysis. But cross-correlational analysis extends beyond mere clustering of the data to make systematic assertions and to draw inferences about the underlying structure that yields a dynamic and partially predictive model--one that can be tested and deliberately manipulated in subsequent experiments.

The close relationship between correlation and linear regression suggests that linear relationships may underlie correlations in significant ways. There is a tendency in large correlational matrices for data sets or scores to form "natural" clusters or statistically uneven distributions which strongly imply such underlying linear structure. It is the search for, representation and analysis of and the inferences drawn from such distributions and clustering of data into natural sets that drives cross-correlational analysis as a fruitful form of research.

Cross-correlational analysis constitutes an "hypothetical-inductive" form of exploratory, empirical research, that is bolstered at points by deductive hypothesis testing of derived inferences. The theoretical implications of cross-correlational analysis will be explored more thoroughly in the final chapter.

The standard Pearson product moment correlation coefficient contains a great deal of information about a distributed set of data. That correlation matrices are grouped by column-row headings entails a higher level patterning between coefficients, which represent an astounding level of complexity. The comparison between similar or different matrices, or of parts of a matrix with one another, makes possible a level of analysis that is intuitively interesting, and yet which is difficult to represent graphically in any simple, straight-forward way, or that is also difficult to demonstrate mathematically or statistically.

A correlation may be calculated for any equal-sized set of points, whether an actual relationship exists between the points or not. Correlations may be indicative of only spurious or quite superficial relationships of otherwise quite different sets of points. Apples and oranges may have high correlations on measures of roundness and diameter. Thus, unrelated sets of points may by chance happen to have a high correlation, and very closely related sets of points may by chance have a very low and apparently insignificant correlation.

Thus, strong correlations of apparently related data sets may actually belie the indirect influence of a third or more unknown set of variables, or else may just be a random occurrence between two data distributions between which there is little or no other functional relationship at all. On the other hand, correlation coefficients that are seemingly low and insignificant may actually belie an important functional relationship between two sets of data points.

Two main considerations drive the use of correlation coefficients in the analysis of data--that sets of data are united theoretically or hypothetically in a non-random way, and that random sets of data will tend, on average, towards low correlation. When sets of data are linked theoretically, then even minor correlations may have significance. When especially large, naturally occurring sets of data tend toward unusually high correlation, it is probably indicative of some kind of structural, non-random relationship between these points.
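The claim that random sets of data tend on average toward low correlation, and that chance high correlations are mostly a hazard of small sets, can be checked by simulation. The following is only a sketch under stated assumptions: Python with NumPy, normally distributed random points, and a hypothetical helper random_pair_correlations. None of these come from the original study.

```python
import numpy as np

def random_pair_correlations(n_points, n_pairs, seed=0):
    """Pearson r for n_pairs pairs of unrelated random sets of n_points each."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_pairs, n_points))
    y = rng.normal(size=(n_pairs, n_points))
    xc = x - x.mean(axis=1, keepdims=True)  # center each set
    yc = y - y.mean(axis=1, keepdims=True)
    numerator = (xc * yc).sum(axis=1)
    denominator = np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))
    return numerator / denominator

small = random_pair_correlations(n_points=5, n_pairs=2000)    # tiny sets
large = random_pair_correlations(n_points=100, n_pairs=2000)  # larger sets

# Share of unrelated pairs that nonetheless show |r| above 0.8.
share_small = float(np.mean(np.abs(small) > 0.8))
share_large = float(np.mean(np.abs(large) > 0.8))
```

In both cases the average correlation sits near zero, but the small sets produce strong correlations by chance far more often than the large ones. This is why unusually high correlation in large sets is the more telling indication of structure.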

The second problem is that though a high correlation may be indicative of a functional, deterministic relationship, the precise nature of this relationship is neither obvious nor directly available. Thus a great deal of correlational analysis has to do with the systematic elucidation of the underlying functional structure of a given correlational matrix, if one can be said to probably exist.

A correlational matrix represents a set of relative values of interrelationship between points that are hypothetically united under related topical domains. Any correlational matrix contains a great deal of information--more than the superficial landscape of its topography--that is available to analysis. Correlational matrices are therefore under-utilized for their potential analytical value. That correlational matrices are hypothetically united under supposedly related domains subsumed by the row and column headings entails that there is a hypothetical and implicit internal structure to the table which allows for a sophisticated and systematic reading of the data, and also that allows for the possibility of the systematic comparison and interrelation of different matrices that can be held to be theoretically or paradigmatically united.

It is important to construe correlational analysis in the kind of framework in which it arose, and that is the systematic comparison and description of psycho-cultural data that was gathered and organized on the presupposition of "cultural consensus"--that cultural patterning is shared by culture bearers. This sharing is somehow fundamental to the understanding of this patterning and its daily reiteration and reinforcement, and that sharing is empirically available for analysis and is partially indicative of the influence of culture. Now sharing may be incidental, may belie a lot of diversity within a cultural orientation, and may be the by-product of rather complex reasons. Cross-correlational analysis appears to be especially suitable for the analytical description of these patterns, and for the hypothetical elucidation of the underlying "structures" which may be held theoretically to account for such patterns.

Elucidating the patterning of complex psycho-cultural data represents the basis of cross-correlational analysis, but as a technique it can be systematically extended to embrace a more realistic description of naturally occurring "cosmographical" systems, and we can see that it may represent a fundamental aspect of information theory. To the extent that "culture" is ordered as a naturally occurring system, we can see that the cross-correlational elucidation of its patterning is but the beginning of many possible applications for this form of analysis in human social life.

A correlational matrix contains a field of information about the interrelationships between the "things" labeled in the column/row headings. It is a central thesis of this study that correlations represented in such matrices may be treated for heuristic purposes like any distributed set of data points--at an ordinal level as relative expressions of "distance" between the things they associate, such that the normal sets of descriptive/predictive statistics that are applied to any other data points, may be applied to this correlational data in a similar manner.

Thus we may compute averages, z-scores, or linear regression from these points--at the same time we can employ chi-square analysis and perform non-parametric correlational analysis upon the table or related tables. These statistical computations are virtually identical to those done on normal sorts of data, except that they come to have special heuristic importance--they allow us to make statements about the relationships between the "things" rather than about the things themselves. Thus what cross-correlational statistics describes is fundamentally different from what formal descriptive statistics describes--its measures are made on relational measures, rather than on "points" or isolatable "entities" actually occurring in reality.
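As a small worked illustration of treating correlations as data points in their own right, the sketch below computes z-scores over the correlations of a single matrix and flags the pairs that stand out. It assumes Python with NumPy. The four labels, the matrix values and the cutoff of 1.5 standard deviations are all invented for the example and carry no authority.

```python
import numpy as np

labels = ["A", "B", "C", "D"]  # hypothetical row/column headings
r = np.array([
    [1.00, 0.80, 0.10, 0.05],
    [0.80, 1.00, 0.15, 0.00],
    [0.10, 0.15, 1.00, 0.20],
    [0.05, 0.00, 0.20, 1.00],
])

# Take each pair once (upper triangle), leaving out the diagonal of 1.0s.
i, j = np.triu_indices(len(labels), k=1)
values = r[i, j]

# z-score of each correlation relative to the other correlations.
z = (values - values.mean()) / values.std(ddof=1)

# Pairs whose correlation lies well above the matrix average.
cutoff = 1.5  # arbitrary threshold chosen for illustration
salient = [
    (labels[a], labels[b], round(float(v), 2))
    for a, b, v, score in zip(i, j, values, z)
    if score > cutoff
]
# salient == [("A", "B", 0.8)]
```

The z-score here says nothing about A or B themselves. It describes how the A-B relationship stands relative to the other relationships in the table.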

Furthermore, a correlational table represents a "closed sample" with known data-points and specific degrees of freedom. The cross-correlational statements we can make in this regard remain the same no matter what the size of the original sample, or the size of the host population that was sampled. The correlational matrix always has the same basic primary structure regardless of the dimensional size of the matrix or arrangement of data within it. Thus each matrix contains a set of hypothetically related values that can be arranged within the same dimensional space, and the structural patterning of which can therefore be compared. This aspect of correlational matrices and cross-correlational analysis is fixed and in a sense "absolute." It therefore provides a common anchor point by which very different correlational structures can be systematically compared and evaluated.

We can therefore calculate the p-value of the cross-correlational description of the data according to original sample or population sizes. If known, it is a value that is to some extent the measure of significance or independence of the derivative cross-correlational structures in relation to the original data from which they were derived.

If several correlational matrices were computed from different samples drawn from the same host population, we would then be able to evaluate the degree to which the sampling and correlational matrices resemble the actual relational characteristics of the host population by the extent to which the cross-correlational patterns are significantly different or similar to one another.

Correlational matrices contain information about hypothetical "correlational structures" which are held to be inherent to the actual patterning of the host population or original data set. Correlational objects are not objects in the conventional sense but relational representations of the dimensional space within which all the related correlational values are theoretically contained. They are statements of hypothetical spaces in which functional relations are held to occur with a predictable frequency pattern. Correlational structures therefore are fundamentally spatial or "areal" structures, rather than being point values. Such dimensional space has significance in the sense that tightly defined areas represent "peaks" of a complex landscape. These peaks sit on the surface of the table like icebergs floating on the surface of the ocean, disguising and yet indicating a mass of hidden information below the surface.

Correlational values thus contain information about the probable relation between any two or more sets of points within this space, the likelihood that these points are coincidental, simultaneous, or functionally interdependent. Thus movement within the correlational space represents a continuous projection and modulation of the structure through time, and problems of temporal patterning can be represented through the use and analysis of cross-correlational analysis--revealing linear relationships between alternative or related data structures.

A correlational table (an inverted matrix) then comes to also comprise a special kind of discrimination table and hypothetical search space--we are interested in deriving inference trees from such tables based upon the relative saliencies and "peaks" of probability values within the structure. This allows us to speak of precedence of changing structural patterns of the underlying structure, a pattern of change that is to some extent predictable and partially determined by the structure.

No single correlational matrix contains enough information by which to completely analyze the underlying structure from which it is drawn. This is why cross-correlational analysis proceeds with the descriptive and comparative analysis of two or more matrices that are theoretically related. The process of cross-correlation allows us to elucidate more information about the underlying correlational structure than would otherwise be possible from the analysis of only one matrix.

Not every correlation of a matrix is necessarily significant or represents part of the underlying structure. There will occur within any matrix a clustering of correlational values and it is this natural clustering that is most indicative of the underlying pattern within the matrix--two clusters of values may have high positive values within each, and yet have a high negative correlation between the two clusters themselves. Any such clustering will tend to encompass only a few of the total possible number of values represented within such a matrix, though there may be indirect correlational clusters that interconnect these with other value sets.

In general, a correlational matrix that exhibits some meaningful ordering or clustering, is indicative of an underlying correlational structure, such that rearrangement or alteration of values will produce predictable patterns of clustering in the resulting matrix. If such an underlying structure indeed exists within a matrix, then this structure will tend to be consistent and to manifest itself in varying but stable forms in different matrices or arrangements drawn from data of the same original host population.
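The two-cluster case described above, with positive correlations within each cluster and negative correlations between them, can be sketched as follows. The sketch assumes Python with NumPy and invented data driven by one hidden factor. Splitting the variables by the sign of the leading eigenvector is a convenient stand-in technique chosen for this example. It is not offered as the procedure used in the study, and the helper name sign_clusters is hypothetical.

```python
import numpy as np

def sign_clusters(r):
    """Split variables into two groups by the sign of the leading eigenvector."""
    eigenvalues, eigenvectors = np.linalg.eigh(r)  # eigenvalues in ascending order
    leading = eigenvectors[:, -1]                  # vector for the largest eigenvalue
    positive = [int(k) for k in np.flatnonzero(leading >= 0)]
    negative = [int(k) for k in np.flatnonzero(leading < 0)]
    return positive, negative

# Hypothetical data: six variables driven by one hidden factor, with
# alternating direction, so the clusters are interleaved in the raw table.
rng = np.random.default_rng(1)
factor = rng.normal(size=300)
direction = np.array([1, -1, 1, -1, 1, -1])
data = factor[:, None] * direction + 0.5 * rng.normal(size=(300, 6))

r = np.corrcoef(data, rowvar=False)

# Rearrange rows and columns so that each cluster forms a contiguous block.
group_a, group_b = sign_clusters(r)
order = group_a + group_b
r_sorted = r[np.ix_(order, order)]
```

After the rearrangement, the two diagonal blocks of r_sorted hold the positive within-cluster values and the off-diagonal block holds the negative between-cluster values. Which cluster comes first is arbitrary, since the sign of an eigenvector is arbitrary.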

It is possible that humans think, at least in part and upon a very basic level, in cross-correlational patterns by which we derive estimates of similarity/difference of successive or alternate structures of meaning. We may therefore refer to cross-correlational "networks" and chaining activity of the human mind, which may be something more specific or more than metaphorical compared to "analogical chaining." This suggests that our minds may actually be hard-wired at some basic level to automatically perform a kind of complex analysis that is involved in cross-correlation, and that symbolic templates may exist, constructed and precipitated from experience and more or less available within our memory, which in effect constitute cross-correlational transform operators. Such structural patterning of the brain may in fact represent mental "devices" by which we organize and analyze our realities at a basic level of pattern recognition, intuitive preunderstanding and rational inference production. Because such processing would be mostly done automatically, we are rarely reflexively aware of exercising this mode of thinking.

The logic which guides this mode of thinking may also be different from the more conventional understanding of logic--it may utilize a more flexible and informal form of abductive reasoning structure which permits fallacies of the affirming-the-consequent type (the converse of modus ponens), deriving an antecedent from a consequent, if such derivations are rooted inductively in the experience of past occurrences or in conventional preunderstandings or "common knowledge" or "common sense."

This type of reasoning is more befitting correlational structures in which simultaneity of co-occurrence is always present, and, being a less restrictive nonstandard form, it therefore permits more flexibility in the inferential interpretation of the correlational structure than would otherwise be possible.

The nature of the data in cross-correlational analysis takes a particular form--it consists of what I refer to as X number of parallel data sets which are aligned such that for each data point in any given set, there is at least a one-to-one correspondence with a subset of data points from all the other matched sets. Such data sets are common in psycho-cultural research where individuals or subsamples are being compared along a suite of the same sets of traits, or where multiple dimensions occur, each represented by a range of similar data. This permits the analysis of correlational data in both depth-wise "parallel" fashion, and breadth-wise or "cross-wise" fashion, a form of dual analysis that opens the possibility of cross-correlational analysis in several different directions.

Secondly, the data, being thus constrained, while hindering our ability to use data that does not meet the requirements (e.g. unequal sized data sets), at the same time allows us a degree of control over the data that we would otherwise not have. All correlational data tables of whatever dimension X are X squared in size, and this permits us to conduct cross-correlational analysis upon the matrix.

Cross-correlational analysis represents an experimental and exploratory form of data analysis based upon the manipulation, comparison and inter-correlation of different correlational matrices that are derived from the correlation coefficients which measure the degree to which equal sized sets of variables move together, regardless of their relative magnitude. In short, cross-correlational analysis takes two or more correlation matrices of equal dimensionality, and rearranges the data in order to construct a correlational matrix based upon the aligned first order matrices. Alternatively, it takes different, equal sized sections of the same correlational matrix, and compares these by a second order correlation matrix.
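The construction of a second order matrix from aligned first order matrices can be sketched in a few lines. This is one possible reading of the procedure, not a definitive implementation. It assumes Python with NumPy, and the function names (upper_triangle, second_order_matrix, sample_with_structure) and the simulated samples are all hypothetical. Each first order matrix is reduced to its above-diagonal values in a fixed order, and these aligned vectors are then correlated with one another.

```python
import numpy as np

def upper_triangle(r):
    """Above-diagonal values of a square matrix, always in the same order."""
    r = np.asarray(r, dtype=float)
    rows, cols = np.triu_indices(r.shape[0], k=1)
    return r[rows, cols]

def second_order_matrix(first_order_matrices):
    """Correlate equal-sized first order matrices with one another."""
    shapes = {np.asarray(m).shape for m in first_order_matrices}
    if len(shapes) != 1:
        raise ValueError("all first order matrices must have the same dimensions")
    aligned = np.vstack([upper_triangle(m) for m in first_order_matrices])
    return np.corrcoef(aligned)  # one row and one column per input matrix

def sample_with_structure(rng, n, loadings):
    """Hypothetical sample: variables tied to a hidden factor by their loadings."""
    factor = rng.normal(size=n)
    noise = rng.normal(size=(n, len(loadings)))
    return factor[:, None] * np.asarray(loadings, dtype=float) + 0.5 * noise

rng = np.random.default_rng(2)
structure_a = [1, 1, 1, 0, 0, 0]  # first three variables move together
structure_b = [0, 0, 0, 1, 1, 1]  # last three variables move together

matrices = [
    np.corrcoef(sample_with_structure(rng, 200, structure_a), rowvar=False),
    np.corrcoef(sample_with_structure(rng, 200, structure_a), rowvar=False),
    np.corrcoef(sample_with_structure(rng, 200, structure_b), rowvar=False),
]
second = second_order_matrix(matrices)
```

In this invented case, the two samples that share a structure correlate strongly at the second order, while the sample built on a different structure does not. It is this contrast that the comparison of theoretically related matrices is meant to expose.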

Sometimes third and even fourth order matrices can be constructed out of previous intercorrelation matrices, especially if such analysis is strongly motivated by theory and the occurrence of a significant patterning of the correlation.

The first and second order correlation matrix can be reconstructed in different ways, or analyzed in alternative ways, enabling different kinds of intercorrelational matrices to be derived. For instance, rows and columns of a first order matrix, if equal in number, may be compared in a subsequent table based upon their intercorrelation--they may be rank-ordered and compared in this manner in an alternative form of correlational matrix.
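A comparison of the rows, or of the columns, of a single first order table, with and without rank-ordering, might be sketched as follows. The sketch assumes Python with NumPy, and the table values and the function names rank_rows and profile_correlations are invented for illustration. The ranking shown breaks ties by position, which is a simplification. The values below contain no ties within any row or column.

```python
import numpy as np

def rank_rows(table):
    """Rank the values within each row (1 = smallest); ties broken by position."""
    table = np.asarray(table, dtype=float)
    return table.argsort(axis=1).argsort(axis=1) + 1

def profile_correlations(table, by="rows", ranked=False):
    """Intercorrelate the rows (or columns) of a first order table."""
    table = np.asarray(table, dtype=float)
    if by == "columns":
        table = table.T
    if ranked:
        table = rank_rows(table)  # rank-order each profile before correlating
    return np.corrcoef(table)

# Hypothetical first order table: 4 row items by 4 column items.
first_order = np.array([
    [0.70, 0.60, 0.10, 0.05],
    [0.65, 0.75, 0.20, 0.10],
    [0.10, 0.15, 0.80, 0.60],
    [0.05, 0.20, 0.55, 0.70],
])

rows_pearson = profile_correlations(first_order, by="rows")
rows_ranked = profile_correlations(first_order, by="rows", ranked=True)
cols_ranked = profile_correlations(first_order, by="columns", ranked=True)
```

Here the first two rows have similar profiles and correlate highly with each other, while the first and last rows rank the columns in exactly opposite order, so that their rank-order correlation is -1.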

Techniques of cross-correlation lead to different ways of representing data and the relationships between sets of data that enable us to draw inferences systematically and to statistically evaluate certain kinds of inferences about such data, as if the data were nominal, ordinal or rank order in type. It enables us to see patterns of interrelationship between sets of data that would not otherwise be apparent, and to infer from such patterns the likelihood and degree to which a functional relationship may cohere between the data points.

In a sense, the information contained within an intercorrelational matrix is fundamentally different from that of the usual correlational matrix, as it demonstrates the degree to which each part is related to and representative of the whole set of variables--information about the total set of relationships is therefore evenly distributed between all the different points contained within such a matrix. This information itself can be rank-ordered or arranged in meaningful ways that allow us to make inferences about the data. It is this underlying distribution of relational information that ultimately permits structural analysis of compared matrices to proceed--where gross unevenness occurs or recurs in the redistribution of values, especially in repeated alternate instances, where none would otherwise be expected, then such unevenness points to a semi-deterministic patterning between the original dimensions being compared.

Any set of points may be correlated with any other set of points, whether an actual relationship exists between such points or not. The only stipulation of different sets of data in their correlation is that they be of the same magnitude in terms of equal number of degrees of freedom. Though anything may be correlated with anything else, it is also the case that when and if a functional relationship exists between two or more sets of data, then definite patterns of correlation will be apparent and will tend to be statistically significant, and will take on certain characteristic patterns that will not be evident among unrelated and basically random sets of data. Thus, techniques of intercorrelation can be systematically exploited for the discovery of hidden functional relationships between different kinds of data, and for providing insight into the causal direction and theoretical significance of such relationships.

Correlation and intercorrelation matrices can also be converted into discrimination tables or histograms, or used in different forms of data analysis and representation, such as multidimensional scaling, cluster analysis or factor analysis.

Cross-correlational analysis can be time-consuming and therefore costly in research resources. The number of possible intercorrelational matrices that can be constructed and derived grows exponentially with the size and number of sets of data. It can yield a great deal of information at a level of complexity that is frequently difficult to decipher or interpret theoretically, or which may belie a fundamental spuriousness or triviality of real relationship. To some extent these drawbacks can be offset by the introduction of systematic means of cross-correlational search and analysis, especially by means of computer, and by the close coordination of such analysis with theoretical interpretation and inference derivation, and, finally, by the construction of an integrated data-base or computer program that can take over and exhaustively conduct such search analysis on raw data.

But the reward of conducting such analysis and search is not to be measured in terms of its cost or complexity, but in terms of its productivity of theoretical insights and inferences, and in providing a systematic means by which to construct a sophisticated rule-based inference engine derived from the patterning of the data. These advantages far outweigh the disadvantages. It is simply wrong to discard cross-correlational analysis on the basis of its fundamental lack of statistical determinacy, a lack that is all too often presumed away in other forms of data analysis, as it is a method that is far too natural and productive of insight.

 


Blanket Copyright, Hugh M. Lewis, © 2005. Use of this text governed by fair use policy--permission to make copies of this text is granted for purposes of research and non-profit instruction only.

Last Updated: 04/19/05