Chapter Four
DATA TYPES AND TABLES
Cross-correlational analysis always works with data that is organized in a particular manner--it can be any kind of numerical, quantitative data, of any magnitude or variance, as long as it is tabular in organization. The data table constitutes the empirical foundation of cross-correlational analysis--all analysis begins and ends with the distribution of data in the table. The data-table always has N number of columns and K number of rows. When a correlation matrix is constructed, correlation coefficients are calculated between all possible binary combinations of either rows or columns--if the calculation is horizontal, then the dimension of the matrix (X) will be the number of rows and its depth (D) will be the number of columns. If the calculations are vertical in direction, then the dimensions of the matrix (X) will be the number of columns and its depth (D) the number of rows.
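The two directions of calculation described above can be sketched in a few lines of Python. This is only an illustration: the small table, the function names `pearson` and `correlation_matrix`, and the `by` parameter are assumptions introduced here, not part of the original method.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(table, by="rows"):
    """Correlate every pair of rows (by="rows") or of columns (by="columns")."""
    series = table if by == "rows" else [list(col) for col in zip(*table)]
    return [[pearson(a, b) for b in series] for a in series]

# Hypothetical data table with K = 3 rows and N = 4 columns.
table = [
    [1, 2, 3, 5],
    [2, 4, 5, 9],
    [9, 7, 4, 1],
]

row_matrix = correlation_matrix(table, by="rows")     # 3 x 3 matrix, depth 4
col_matrix = correlation_matrix(table, by="columns")  # 4 x 4 matrix, depth 3
```

In the horizontal case each coefficient is computed over the four column values of a pair of rows; in the vertical case it is computed over the three row values of a pair of columns.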
The data in tabular form is arranged in even numbers of meaningful units. The data is distributed across a field N * K in size--such a data table comes to constitute a frequency field in which each cell contains a value specific to that location in the table.
Theoretical significance has to do with the question of the degree to which actual scores and correlations represent the hypothetical problem, and to what extent their structure might represent a partial "solution" to the problem. Correlational search may yield insights into structural relationships between dimensions or sets of data that are theoretically highly productive--at the same time correlational matrices may yield results which are contradictory to theoretical predictions.
Frequency Distributions, Data Types and the Logic of Analysis
Four data types are conventionally distinguished in statistics--nominal, ordinal, interval and ratio scale data.
Conventional caution teaches utilization of the most basic form of data-type possible. A frequency table is normally considered to be of a nominal scale. It is scored on "presence or absence" (1 or 0) of a nominally defined trait. The cumulative frequencies are then added up and calculated.
It is not that the Pearson product moment correlation coefficient cannot be run upon frequency distribution data--indeed it can be and often is. It is that it treats what is in fact a form of nominally defined data as if it were interval type data--in fact this is an entirely permissible and quite common type of "error" to make. The "type" of data used is a function of how the data is to be treated, and the net kinds of inferences which we can then make from such data. Though blood pressure, height, and weight are actually interval scale data, people can be nominally sorted into arbitrary classes of high/intermediate/low categories, and then nominal type statistics can be run on them. Alternatively, we might order blood pressures into ranks according to a histogram of natural frequencies of occurrence, and then this data can be treated ordinally.
The fact of the matter is that frequency data can be used parametrically in interval type analysis, and it frequently is in large scale statistical surveys. The "misuse" of such data does not invalidate the resulting probabilities--it only adds a cautionary proviso to such use. We must not presume, because we have a frequency distribution, that the data it represents are of a continuous as opposed to a fundamentally discrete or discontinuous nature--this is the line of demarcation separating truly "quantitative" forms of data and analysis from those held to be fundamentally "qualitative" in form. In other words, because a given frequency distribution might suggest a solid linear relationship between the sets of variates, this doesn't mean that in real life a line can simply be drawn connecting the points and thence extended outward in a predictive fashion. It doesn't mean that we can have 2.35 X boys, Y colors, Z people, for every 1.34 M, N shades or Y-type people. But this fiction is one that is inherent to the use of statistics--we cannot actually have 1.5 cars or 6.7 people in a household simply because we have an average of 1.5 cars or 6.7 people in our survey, and the fact that averages of frequency distributions always turn up on an interval scale does not mean we throw out averages or confine their use only to interval-scale data. And this doesn't mean that there might not be a significant correlation, measured parametrically but interpreted nonparametrically, between the relative frequency of cars and people per household. A correlation of .91 between cars and people does not mean that for every person there will be exactly .91 cars. It is important to keep in mind that naturally occurring data always form "fuzzy sets" and the "type" of data composing any one set is more a function of our categories/constructs and means of measurement or determination of scale, than it is anything inherent to the data itself.
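The point that the same measurements can be treated at different scales may be easier to see in a small sketch. The readings and the two cut points below are invented for illustration, and simple ranking stands in for the histogram-based ordering mentioned above.

```python
# Hypothetical systolic blood pressure readings (interval-scale measurements).
readings = [104, 118, 121, 135, 142, 160]

def nominal_class(value, low_cut=110, high_cut=140):
    """Sort a reading into an arbitrary low/intermediate/high class.

    The cut points are assumptions chosen only for this example.
    """
    if value < low_cut:
        return "low"
    if value >= high_cut:
        return "high"
    return "intermediate"

# Nominal treatment: count membership in each arbitrary class.
classes = [nominal_class(r) for r in readings]
counts = {label: classes.count(label) for label in ("low", "intermediate", "high")}

# Ordinal treatment: replace each reading by its rank (1 = lowest).
ordered = sorted(readings)
ranks = [ordered.index(r) + 1 for r in readings]

# Interval treatment: ordinary arithmetic on the raw values.
mean = sum(readings) / len(readings)
```

The data are the same in all three cases; only the treatment, and therefore the kind of inference available, changes.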
Thus, when I use a counter to determine the number of cars and motorcycles which pass a certain point at a certain time on a road, I am treating a form of frequency data which is nominally defined as if it were a relatively continuous variable with an equal interval of measurement between each of the equal "clicks" of my hand counter. The same thing is being done with a blood pressure gauge or a ruler.
Income and income distribution is a common example, as are population statistics in general. In fact, practically any kind of statistical procedure or descriptive parameter can be used upon a frequency distribution, regardless of its ultimate qualitative/quantitative character. We can average frequency distributions, perform t-tests, or calculate z-scores. Coin-flips, sex ratios and blood types, commonly used to demonstrate a common pattern of binomial distribution, are yet other examples of naturally occurring frequency distributions which can ultimately be used, if done correctly, at any level of data-scale.
One resolution of the interval-frequency paradox is to merely convert the frequency data into percentages--percentages offering a more "interval-like" (even ratio-like) form than the actual tallies themselves, albeit on a relative rather than an absolute scale.
In this regard, it makes a considerable difference whether the categories can be said to be "naturally occurring" or whether they can be said to be arbitrary or merely a by-product of our research design. The sheer magnitude of the sample sizes might affect the probability levels and confidence of the outcome, but it does not in and of itself alter the intrinsic nature of the data or its uses.
We should not be unnecessarily restrictive in the assignment of a categorical typology to the various kinds of data that are encountered in research--the heuristic value of pursuing analysis upon discontinuous variables as if they were in fact continuous variables clearly outweighs the considerations of correctness or of ignoring the limitations of data. Much more is gained than lost if we choose not to ignore a .9 correlation based upon a large sample of data, simply because by definition it was non-interval in character.
Cardinal Data-Types & Decimal Scales
Significance in cross-correlational analysis rides on a strong assertion of the validity and real world relevancy of a fifth, in-between form of data-type, namely the cardinal, versus an ordinal, scale of data. What is unusual about this data type is that it spans the gulf between parametric and nonparametric forms of data, and thus can be used alternately to represent either form of data. Cardinal data type is most commonly found as a frequency or percentage data type. An equal interval of strength or "distance" between numbers is assumed, as is the inherent indivisibility of these distances.
A criticism of conventional wisdom in descriptive statistics is that the dividing line between the usual data types is often not clear in real world situations. While parametric statistics can be treated nonparametrically, non-parametric statistics usually cannot be treated as if parametric in orientation, except if we make a presupposition of cardinality of number and decimality of scale.
Relative versus absolute data/samples.
The second critical question regarding data-types concerns our ability to "reuse" correlation scores as if they were basic data to be intercorrelated with other data. This is a more complicated problem, as it is not exactly clear what "data type" a value such as a "dimensionless" correlation coefficient is--relatively speaking it is a ratio type scale. As was mentioned at the outset, anything may be correlated with anything else. The fact that apples might in terms of roundness or size be correlated with oranges does not make them the same. When second order or subsequent correlations are based on correlations which are themselves specious if not completely spurious, we may end up with high positive correlations between "dimensional relations" which in fact do not exist upon a lower "ground" level of analysis.
Subsequent cross-correlation appears to transform the original data in largely unknown ways. It may well be that a certain degree of dispersion intrinsic to the original distribution becomes factored out upon subsequent re-correlations, such that the total range of variation of the distribution is reduced. It appears that the clustering apparent in the first matrix becomes more pronounced with subsequent iterations of correlation, and that the in-between values become lost. This is somewhat akin to deriving average means from original sample means--each generation shrinks the overall range and variability of the original distribution.
It can be said that at each level of remove from the ground table, the implications of the values become more complex and the real meanings less clear, such that soon a point of no return is reached, just after which there is a limit of diminishing returns for the amount of time and effort entailed in such analysis. It is usually for these reasons that cross-correlational analysis is rarely carried past the second or third level of remove from the ground table, and at most, the fourth.
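Re-correlation at successive levels of remove from the ground table can be sketched as follows. The small ground table and the helper names are assumptions made for illustration; the sketch only shows the mechanics of treating a correlation matrix as a new data table, and makes no claim about how the values will behave for any particular data set.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlate_rows(table):
    """Correlate every row of a table with every other row."""
    return [[pearson(a, b) for b in table] for a in table]

def off_diagonal_range(matrix):
    """Smallest and largest coefficient, ignoring the diagonal of 1s."""
    values = [v for i, row in enumerate(matrix)
              for j, v in enumerate(row) if i != j]
    return min(values), max(values)

# Hypothetical ground table of frequencies (4 dimensions, depth 5).
ground = [
    [16, 7, 3, 2, 2],
    [11, 14, 7, 7, 7],
    [5, 9, 18, 8, 2],
    [3, 6, 6, 12, 9],
]

first_order = correlate_rows(ground)        # first level of remove
second_order = correlate_rows(first_order)  # correlations of correlations

print(off_diagonal_range(first_order))
print(off_diagonal_range(second_order))
```

Comparing the two printed ranges, and the clustering of the values between them, is one simple way to inspect what each further iteration does to a given table.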
In these considerations we can bring up another major distinction, that between absolute and relative forms of data and between absolute and relative forms of manipulation or techniques of analysis of such data. In general, a relative form is that which holds true for the particular context in which it is encountered and used--for instance, a percentage value or possibly a correlation. Such values have no clear significance outside of the framework in which they are immediately used. There are no absolute standards of value by which we may take their measure. Relative values do not travel beyond the bounds of their immediate relevance. Recognizing this, we can also recognize relative forms of analysis, or at least the relativistic use of forms of analysis, which work for the values at hand, but which cannot be applied in a general sense or in the same way to all values encountered.
In this sense, cross-correlational analysis is clearly a species of relative data analysis and manipulation. It is bound to the dimensionless correlation coefficient, to the X sized matrix or table of all other correlational values in which it is bound, and to the relative framework of meaning the interrelationships of these values might have.
Some fudging in working parametrically with nonparametric frequency values is permitted if the samples are relatively large (the size a sample needs to be is never really stated, although significance itself doesn't directly depend upon sample size) and randomly (at least fairly randomly) distributed. Large random samples are held to approximate the normally distributed curve--samples that are too small may represent skewed curves that are biased.
The difficulty in working with correlational matrices is that the original data samples are frequently small and not quite unbiased. This presents a dilemma, as such correlation matrices should be "misleading and confounding", but often aren't--in fact, significant distributions appear repeatedly in even quite small samples, and these distributions appear to be quite stable, at least as population estimators. Part of this can be attributed to the nature of the samples and the derived data--psycho-cultural data presumes a degree of distinctive patterns of sharing which permit significant patterns to appear in even small samples.
The data in this sense are neither unbiased nor independent--but this weakness lends strength to the inferential power of these estimators. Our original population is not by definition large and theoretically open, though ultimately it may be. It is by definition small and finite--closed. We are interested in describing statistically a "system" of relations which is thought to be quite consistent and regular in patterning.
The other part of this dilemma can be attributed, I believe, to the inherent aspect of cross-correlational analysis as a relative technique of analysis. It is the "dimensionlessness" of correlational values which confers upon them a degree of flexibility and power as general purpose estimators which they would not otherwise have. The fact that a correlation can be obtained from virtually anything does not render the correlation intrinsically meaningless. Within the context of its use it means that correlation coefficients can and frequently do serve as stable, unbiased estimators of complex relationships between dimensions based upon comparatively small samples. Part of this strength lies in the closed and complete structure of the correlation matrix, in which every dimension is correlated with every other dimension. What unites the correlational matrix within a relative, but significant framework, is the theory and the paradigm which led to its creation in the first place.
Data Organization
It is important that the data table is arranged (i.e. the order of the rows and columns) so as to best illustrate the design of the paradigm and the patterning of the greatest frequency in each relevant row and/or column. Sometimes this patterning is not as straightforward as merely sorting by ascending or descending values. In a table in which there is relative flatness or lack of saliency across the entire field, then organization based upon frequency values becomes irrelevant, and organization should be based upon the topical values of the dimensions.
This prearrangement of the data is important, and it does not affect the nature of the resulting correlational matrices unless the integrity of individual rows or columns is violated and values of individual cells of different rows or columns are then juxtaposed. This prearrangement simplifies the task of analyzing the resulting correlational matrices, and if done well, will render more obvious the patterning that is to be found in the values of the field.
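The claim that rearrangement is harmless as long as rows and columns move as whole units can be checked with a small sketch. The table, the column names and the helper functions are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def column(table, j):
    """Extract column j of a table stored as a list of rows."""
    return [row[j] for row in table]

# Hypothetical table with three named columns.
names = ["a", "b", "c"]
table = [
    [4, 1, 7],
    [2, 3, 5],
    [9, 2, 1],
    [5, 5, 3],
]

# Rearrangement that keeps each column intact: only the column order changes.
new_order = [2, 0, 1]
reordered = [[row[j] for j in new_order] for row in table]
reordered_names = [names[j] for j in new_order]

r_before = pearson(column(table, names.index("a")),
                   column(table, names.index("c")))
r_after = pearson(column(reordered, reordered_names.index("a")),
                  column(reordered, reordered_names.index("c")))

# Violating column integrity: the cells of "a" are re-sorted on their own,
# so they no longer line up with the cells of "c" from the same rows.
scrambled_a = sorted(column(table, names.index("a")))
r_scrambled = pearson(scrambled_a, column(table, names.index("c")))
```

Here `r_before` and `r_after` are the same coefficient, merely located in a different cell of the matrix, while `r_scrambled` is a different value because cells from different rows have been juxtaposed.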
The following example is based upon two rank orderings of 12 colors from most to least favorite by a total of 44 Chinese men and women, arranged according to the highest frequency from first to last rank order, which yields a square (12 x 12) frequency table of twelve colors in 12 individual rank orders. This example will be used throughout this discussion of correlational analysis to illustrate the use of such analysis.
| rank | red | violet | pink | orange | blue | purple | green | yellow | grey | brown | white | black |
| ONE | 16 | 11 | 5 | 3 | 0 | 12 | 6 | 5 | 1 | 2 | 4 | 2 |
| TWO | 7 | 14 | 9 | 6 | 5 | 11 | 3 | 9 | 3 | 0 | 2 | 4 |
| THREE | 3 | 7 | 18 | 6 | 4 | 13 | 9 | 7 | 2 | 1 | 3 | 1 |
| FOUR | 2 | 7 | 8 | 12 | 3 | 9 | 8 | 4 | 5 | 4 | 3 | 3 |
| FIVE | 2 | 7 | 2 | 9 | 12 | 5 | 10 | 5 | 6 | 4 | 3 | 4 |
| SIX | 4 | 3 | 7 | 11 | 7 | 8 | 7 | 5 | 4 | 5 | 5 | 2 |
| SEVEN | 2 | 6 | 4 | 6 | 11 | 1 | 6 | 7 | 5 | 9 | 4 | 3 |
| EIGHT | 5 | 3 | 7 | 11 | 8 | 1 | 6 | 10 | 5 | 4 | 6 | 2 |
| NINE | 7 | 4 | 4 | 4 | 9 | 3 | 4 | 4 | 15 | 9 | 3 | 2 |
| TEN | 12 | 1 | 3 | 1 | 4 | 2 | 2 | 4 | 6 | 19 | 7 | 5 |
| ELEVEN | 4 | 2 | 1 | 2 | 4 | 2 | 4 | 5 | 10 | 4 | 18 | 12 |
| TWELVE | 4 | 3 | 1 | 3 | 1 | 1 | 3 | 6 | 4 | 6 | 8 | 28 |
Before moving on to a detailed consideration of correlational analysis, it is worthwhile to reconsider the original 12 color data table above in order to understand what is lost in correlational analysis and some of the things which such analysis cannot do by itself. The data was elicited with the theoretical expectation that certain non-random color-rank frequencies would emerge which cannot be explained by chance, and which are held to be epiphenomenally indicative of an underlying psycho-cultural pattern of sharing by the informants. This distribution can be tested for significance both in terms of color and rank order position. In fact, significant differences of frequency distribution emerged not only at a gross cultural level, but at a subcultural level between subsamples (adult men, adult women, girls and boys).
| rank | red | violet | pink | orange | blue | purple | green | yellow | grey | brown | white | black |
| ONE | 0.24 | 0.16 | 0.07 | 0.04 | 0 | 0.18 | 0.09 | 0.07 | 0 | 0.03 | 0.1 | 0 |
| TWO | 0.1 | 0.19 | 0.12 | 0.08 | 0.1 | 0.15 | 0.04 | 0.12 | 0 | 0 | 0 | 0.1 |
| THREE | 0.04 | 0.09 | 0.24 | 0.08 | 0.1 | 0.18 | 0.12 | 0.09 | 0 | 0.01 | 0 | 0 |
| FOUR | 0.03 | 0.1 | 0.12 | 0.18 | 0 | 0.13 | 0.12 | 0.06 | 0.1 | 0.06 | 0 | 0 |
| FIVE | 0.03 | 0.1 | 0.03 | 0.13 | 0.2 | 0.07 | 0.14 | 0.07 | 0.1 | 0.06 | 0 | 0.1 |
| SIX | 0.06 | 0.04 | 0.1 | 0.16 | 0.1 | 0.12 | 0.1 | 0.07 | 0.1 | 0.07 | 0.1 | 0 |
| SEVEN | 0.03 | 0.09 | 0.06 | 0.09 | 0.2 | 0.02 | 0.09 | 0.11 | 0.1 | 0.14 | 0.1 | 0 |
| EIGHT | 0.07 | 0.04 | 0.1 | 0.16 | 0.1 | 0.01 | 0.09 | 0.15 | 0.1 | 0.06 | 0.1 | 0 |
| NINE | 0.1 | 0.06 | 0.06 | 0.06 | 0.1 | 0.04 | 0.06 | 0.06 | 0.2 | 0.13 | 0 | 0 |
| TEN | 0.18 | 0.02 | 0.05 | 0.02 | 0.1 | 0.03 | 0.03 | 0.06 | 0.1 | 0.29 | 0.1 | 0.1 |
| ELEVEN | 0.06 | 0.03 | 0.01 | 0.03 | 0.1 | 0.03 | 0.06 | 0.07 | 0.1 | 0.06 | 0.3 | 0.2 |
| TWELVE | 0.06 | 0.04 | 0.01 | 0.04 | 0 | 0.01 | 0.04 | 0.09 | 0.1 | 0.09 | 0.1 | 0.4 |
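The text does not say how the table of decimal values above was derived. One reading that agrees with the first rank row is that each frequency is divided by the total of its rank row, as in the following sketch; this is an inference from the numbers, not a stated procedure. Under that reading the blue, grey, white and black columns appear to have been printed to one decimal place only.

```python
colors = ["red", "violet", "pink", "orange", "blue", "purple",
          "green", "yellow", "grey", "brown", "white", "black"]

# Frequencies for rank ONE, copied from the original frequency table.
rank_one = [16, 11, 5, 3, 0, 12, 6, 5, 1, 2, 4, 2]

# Assumption: the denominator is the row total, not the number of informants.
total = sum(rank_one)
proportions = [f / total for f in rank_one]

for color, p in zip(colors, proportions):
    print(f"{color:>7}: {p:.2f}")
```

Expressed this way, each rank row sums to 1 and rows with different totals become directly comparable, on a relative rather than an absolute scale.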
The z-scores were computed for each tabular cell. The following table represents the z-scores, assuming equal probability of color choice at all levels. It is immediately apparent that black in the last place is the most significant and least likely color-rank, followed by brown in the tenth place, white in the eleventh place, violet in second place, pink in the third place, red in the first place, grey in the ninth place, red in the tenth place, purple in the third place, purple in the first place, and orange in the fourth place. The significance and sign of each score were translated into the level of probability of chance occurrence upon a normal curve.
| rank | red | violet | pink | orange | blue | purple | green | yellow | grey | brown | white | black |
| ONE | 4.6 | 2.39 | -0.3 | -1.1 | -2 | 2.84 | 0.18 | -0.3 | -2 | -1.6 | -1 | -2 |
| TWO | 0.39 | 3.35 | 1.24 | 0 | 0 | 2.08 | -1.3 | 1.24 | -1 | -2.6 | -2 | -1 |
| THREE | -1.3 | 0.35 | 4.98 | -0.1 | -1 | 2.87 | 1.19 | 0.35 | -2 | -2.2 | -1 | -2 |
| FOUR | -1.6 | 0.59 | 1.02 | 2.78 | -1 | 1.46 | 1.02 | -0.7 | 0 | -0.7 | -1 | -1 |
| FIVE | -1.6 | 0.54 | -1.6 | 1.42 | 2.7 | -0.3 | 1.85 | -0.3 | 0.1 | -0.8 | -1 | -1 |
| SIX | -0.7 | -1.2 | 0.59 | 2.34 | 0.6 | 1.02 | 0.59 | -0.3 | -1 | -0.3 | 0 | -2 |
| SEVEN | -1.5 | 0.3 | -0.6 | 0.3 | 2.6 | -2 | 0.3 | 0.75 | 0 | 1.66 | -1 | -1 |
| EIGHT | -0.3 | -1.2 | 0.59 | 2.34 | 1 | -2 | 0.15 | 1.9 | 0 | -0.7 | 0.1 | -2 |
| NINE | 0.59 | -0.7 | -0.7 | -0.7 | 1.5 | -1.2 | -0.7 | -0.7 | 4.1 | 1.46 | -1 | -2 |
| TEN | 2.89 | -2 | -1.1 | -2 | -1 | -1.6 | -1.6 | -0.7 | 0.2 | 6.01 | 0.7 | 0 |
| ELEVEN | -0.7 | -1.6 | -2 | -1.6 | -1 | -1.6 | -0.7 | -0.3 | 1.9 | -0.7 | 5.4 | 2.8 |
| TWELVE | -0.7 | -1.2 | -2 | -1.2 | -2 | -2 | -1.2 | 0.15 | -1 | 0.15 | 1 | 9.8 |
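The formula behind the z-scores is not given in the text. A binomial z-score with chance probability 1/12 and the rank-row total as the number of trials reproduces the leading values of the first row (4.6, 2.39, 2.84), so the sketch below adopts that reading as an assumption; the function name `binomial_z` is introduced here for illustration.

```python
from math import sqrt

def binomial_z(frequency, row_total, n_colors=12):
    """z-score of an observed frequency under equal chance for every color.

    Assumes a binomial model with p = 1 / n_colors and row_total trials.
    """
    p = 1.0 / n_colors
    expected = row_total * p
    sd = sqrt(row_total * p * (1.0 - p))
    return (frequency - expected) / sd

# Frequencies for rank ONE, copied from the original frequency table.
rank_one = [16, 11, 5, 3, 0, 12, 6, 5, 1, 2, 4, 2]
row_total = sum(rank_one)

z_scores = [binomial_z(f, row_total) for f in rank_one]
print([round(z, 2) for z in z_scores])
```

A positive score marks a color chosen at that rank more often than equal chance would predict, a negative score one chosen less often.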
The following table represents the cut-off values under the normal curve, adjusted for the asymmetry of sign (1 - value), representing the significant likelihoods (confidence values) of these colors being chosen at each rank order position. These scores represent the significance of a given frequency value given equal likelihood of color choice--red, violet and purple are highly significant (non-chance) first choice colors, and black has a .98 chance of not being chosen as a first choice color.
| rank | red | violet | pink | orange | blue | purple | green | yellow | grey | brown | white | black |
| ONE | 1 | .99 | .48 | .14 | .02 | 1 | .57 | .38 | .2 | .05 | .16 | .02 |
| TWO | .65 | 1 | .89 | .5 | .5 | .98 | .1 | .89 | .16 | 0 | .02 | .16 |
| THREE | .1 | .64 | 1 | .46 | .16 | 1 | .88 | .64 | .02 | .01 | .16 | .02 |
| FOUR | .04 | .72 | .85 | .98 | .16 | .93 | .85 | .24 | .5 | .24 | .16 | .16 |
| FIVE | .04 | .71 | .05 | .92 | .98 | .38 | .97 | .38 | .54 | .21 | .16 | .16 |
| SIX | .24 | .11 | .72 | .99 | .73 | .85 | .72 | .38 | .16 | .38 | .5 | .02 |
| SEVEN | .07 | .62 | .27 | .62 | 1 | .02 | .62 | .75 | .5 | .95 | .16 | .16 |
| EIGHT | .38 | .11 | .72 | .99 | .84 | .02 | .56 | .97 | .5 | .24 | .54 | .02 |
| NINE | .72 | .24 | .23 | .24 | .93 | .11 | .24 | .24 | 1 | .93 | .16 | .02 |
| TEN | 1 | .02 | .14 | .02 | .16 | .05 | .05 | .24 | .98 | 1 | .76 | .5 |
| ELEVEN | .24 | .05 | .02 | .05 | .16 | .05 | .76 | .38 | .97 | .24 | 1 | .99 |
| TWELVE | .24 | .11 | .02 | .11 | .02 | .02 | .11 | .6 | .16 | .56 | .84 | 1 |

| rank | red | violet | pink | orange | blue | purple | green | yellow | grey | brown | white | black |
| ONE | 0 | 0.01 | 0.52 | 0.86 | 0.98 | 0 | 0.43 | 0.62 | 0.8 | 0.95 | 0.84 | 0.98 |
| TWO | 0.35 | 0 | 0.11 | 0.5 | 0.5 | 0.02 | 0.9 | 0.11 | 0.84 | 1 | 0.98 | 0.84 |
| THREE | 0.9 | 0.36 | 0 | 0.54 | 0.84 | 0 | 0.12 | 0.36 | 0.98 | 0.99 | 0.84 | 0.98 |
| FOUR | 0.96 | 0.28 | 0.15 | 0.02 | 0.84 | 0.07 | 0.15 | 0.76 | 0.5 | 0.76 | 0.84 | 0.84 |
| FIVE | 0.96 | 0.29 | 0.95 | 0.08 | 0.02 | 0.62 | 0.03 | 0.62 | 0.46 | 0.79 | 0.84 | 0.84 |
| SIX | 0.76 | 0.89 | 0.28 | 0.01 | 0.27 | 0.15 | 0.28 | 0.62 | 0.84 | 0.62 | 0.5 | 0.98 |
| SEVEN | 0.93 | 0.38 | 0.73 | 0.38 | 0 | 0.98 | 0.38 | 0.25 | 0.5 | 0.05 | 0.84 | 0.84 |
| EIGHT | 0.62 | 0.89 | 0.28 | 0.01 | 0.16 | 0.98 | 0.44 | 0.03 | 0.5 | 0.76 | 0.46 | 0.98 |
| NINE | 0.28 | 0.76 | 0.77 | 0.76 | 0.07 | 0.89 | 0.76 | 0.76 | 0 | 0.07 | 0.84 | 0.98 |
| TEN | 0 | 0.98 | 0.86 | 0.98 | 0.84 | 0.95 | 0.95 | 0.76 | 0.02 | 0 | 0.24 | 0.5 |
| ELEVEN | 0.76 | 0.95 | 0.98 | 0.95 | 0.84 | 0.95 | 0.24 | 0.62 | 0.03 | 0.76 | 0 | 0.01 |
| TWELVE | 0.76 | 0.89 | 0.98 | 0.89 | 0.98 | 0.98 | 0.89 | 0.4 | 0.84 | 0.44 | 0.16 | 0 |
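The translation from z-scores to the two tables above looks like the cumulative area under the standard normal curve and its complement (1 - value). The sketch below computes both for a few first-rank z-scores taken from the z-score table; the function name `normal_cdf` and the choice of cells are assumptions for illustration, and not every printed cell agrees with this reading.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# z-scores for rank ONE, copied from the z-score table.
z_rank_one = {"red": 4.6, "violet": 2.39, "orange": -1.1, "black": -2.0}

# First table: cumulative probability at each z-score.
lower = {color: normal_cdf(z) for color, z in z_rank_one.items()}

# Second table: the complement, 1 - value.
upper = {color: 1.0 - p for color, p in lower.items()}

for color in z_rank_one:
    print(f"{color:>7}: {lower[color]:.2f}  {upper[color]:.2f}")
```

Read this way, the .98 against black at rank ONE in the second table is simply the complement of the .02 in the first.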
The following conversion of the original frequency distribution to P-values on a normal, two-tail curve highlights the patterning of the data in the form of a discrimination table. We can say that, according to this table, red, violet and purple as first choice colors are of equal probability, and blue, orange, grey, brown, white and black of equal improbability--only green, pink and yellow are of intermediate probability. While this table does mask some of the finer differences between cells, it does serve to highlight a basic underlying structural patterning in the data, one from which we might derive basic rules governing the rank order choices of colors by Chinese, and from which we might apply an entropy function by which to build a decision tree based upon the discrimination table.
| rank | red | violet | pink | orange | blue | purple | green | yellow | grey | brown | white | black |
| ONE | 1 | 1 | 0.05 | 0 | 0 | 1 | 0.89 | 0.05 | 0 | 0 | 0 | 0 |
| TWO | 0.99 | 1 | 1 | 0.41 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| THREE | 0 | 0.99 | 1 | 0.33 | 0 | 1 | 1 | 0.99 | 0 | 0 | 0 | 0 |
| FOUR | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0.03 | 0 | 0 | 0 |
| FIVE | 0 | 1 | 0 | 1 | 1 | 0.02 | 1 | 0.02 | 0.76 | 0 | 0 | 0 |
| SIX | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0.03 | 0 | 0.03 | 0.03 | 0 |
| SEVEN | 0 | 0.98 | 0 | 0.98 | 1 | 0 | 0.98 | 1 | 0.16 | 1 | 0 | 0 |
| EIGHT | 0.03 | 0 | 1 | 1 | 1 | 0 | 0.83 | 1 | 0.03 | 0 | 0.83 | 0 |
| NINE | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| TEN | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.93 | 1 | 1 | 0.07 |
| ELEVEN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.03 | 1 | 0 | 1 | 1 |
| TWELVE | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.83 | 0 | 0.83 | 1 | 1 |
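The entropy function mentioned in connection with the discrimination table is not specified. As a loose illustration only, the sketch below assumes Shannon entropy, the measure commonly used to choose splits when building a decision tree, and applies it to two rank rows of the original frequency table.

```python
from math import log2

def shannon_entropy(frequencies):
    """Shannon entropy, in bits, of a frequency distribution."""
    total = sum(frequencies)
    probabilities = [f / total for f in frequencies if f > 0]
    return -sum(p * log2(p) for p in probabilities)

# Rank rows copied from the original frequency table.
rank_one = [16, 11, 5, 3, 0, 12, 6, 5, 1, 2, 4, 2]
rank_twelve = [4, 3, 1, 3, 1, 1, 3, 6, 4, 6, 8, 28]

# Reference point: twelve equally likely colors give the maximum, log2(12).
maximum = log2(12)

print(round(shannon_entropy(rank_one), 2), round(maximum, 2))
print(round(shannon_entropy(rank_twelve), 2), round(maximum, 2))
```

A row whose entropy falls well below the maximum is one in which a few colors dominate, which is the kind of row a decision tree would find most useful to split on.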
It also appears that a larger sample from a larger universe would show a juxtapositioning of certain color-rank positions--red, violet, pink and purple are especially relevant, and in the table above purple which occupies a sixth rank position actually usually is first or second in the frequency distribution. Repeated samples and similar results of other independent tasks strongly support the contention of the predictable occurrence of a nonrandom structural patterning in terms of these color choices.
The reasons for the systematic differences or actual underlying structure of this patterning are not understood, or directly explained by the response pattern which is evident. In a sense, the frequency distribution constitutes a kind of discrimination network which is partially culturally and subculturally relative. The saliency or actual likelihood of this distribution is lost in the correlational matrix, but the pattern of sharing or percentage of difference/similarity between samples can be compared. It is possible that these two forms of analysis can be effectively and productively integrated and recombined at a subsequent level of analysis.
Cross-correlational analysis should be understood in the light of its realistic productivity of new insights into covert structures underlying data, and in terms of its basic limitations. It is but one more of an arsenal of methodological techniques now available for the analysis of data--its real strength lies in being used effectively in conjunction with these other methods, rather than solely as a stand-alone procedure.
It is hoped that through this exposition of cross-correlational analysis, this methodology may be further developed in terms of overcoming its limits, extending the repertory of analysis available to it, and in terms of the interpretations and theoretical underpinnings available to it.
Again, we can divide this question into the matter of theoretical or topical significance and of statistical or quantitative significance, on the one hand, and, on the other, the question of the internal versus external significance of the matrix as a hypothetical model of reality. In brief, external questions of significance have to do with the validity of the given correlational distribution as representative of a larger hypothetical universe of actual or possible relationships. It can be said that the given sample of correlation coefficients is a subset of a larger hypothetical set.
Blanket Copyright, Hugh M. Lewis, © 2005. Use of this text governed by fair use policy--permission to make copies of this text is granted for purposes of research and non-profit instruction only.
Last Updated: 04/19/05