Chapter Six
MAPPING AND MODELING
in Cross-Correlational Analysis
Multidimensional scaling (MDS) proved inadequate to the task of representing complex correlational matrices. Representing each topical domain as a single point in a complex space was misleading about the true patterning of the correlations. Topical domains seem instead to occupy overlapping areas of space, and it is the dimensions and shape of these areas in relation to one another that models should aim to depict, not the one-point-per-variable optimum on which MDS is based. The problem in applying MDS to correlational matrices is threefold: the x and y coordinates for any given correlation are not fixed; the coordinates implied by different correlations are not necessarily the same or equivalent; and the correlations themselves do not represent straightforward distances between points in real space, but distance measures whose magnitude varies with each correlation in a hypothetical dimensional space. Thus a correlation matrix is represented in a complex space in which a topical domain is not a single point but a combination of alternative points arranged nonlinearly in abstract space. Such a representation approaches extreme complexity, and it has no single solution; alternative solutions are available.
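One way to see why a single-point solution is generally unavailable is to count constraints. The sketch below is an illustration added here, not part of the original analysis, and its function names are hypothetical. It assumes a flat two-dimensional map in which shifting or rotating the whole configuration makes no difference, and it compares the number of pairwise correlations among n topical domains with the number of coordinates that n points leave free.

```python
def pair_count(n):
    """Number of distinct pairwise correlations among n variables."""
    return n * (n - 1) // 2


def free_coordinates(n, dims=2):
    """Coordinates left free for n points once rigid motions are removed.

    Assumes a flat map of `dims` dimensions: dims * n raw coordinates,
    minus `dims` translations and dims * (dims - 1) / 2 rotations.
    """
    return max(dims * n - dims - dims * (dims - 1) // 2, 0)


# Compare the two counts for three, four, and twelve variables.
for n in (3, 4, 12):
    print(n, pair_count(n), free_coordinates(n))
```

For three variables the counts match (3 and 3), which is why a triangle can usually be drawn. From four variables on, the correlations outnumber the free coordinates (6 against 5), and for the twelve colors discussed later in this chapter the gap is 66 against 21.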
The relationship of black to red, for instance, does not necessarily share the same space as the relationship of red to pink, or black to white, or white to pink. The relationship of black to pink or of black to red may condition the relationship of pink to red, but not necessarily. Thus there are no definite points for black, red, or pink that satisfy all the mutual distance requirements of each of these colors to the others, as would be represented by a simple triangle with each color at a vertex. What appears to exist is a range of possible alternative points for each color, with the relationships between the sets generally satisfying the entire structure. This makes simple, straightforward modeling of correlational matrices in space complex and difficult to achieve; whatever models are forthcoming are only approximations of the actual structure of interrelationship.
The difficulty of computing sets of x-y coordinates that are consistent across all the correlations is in part a question of whether a linear, nonlinear, or nonfunctional relationship can be presumed between different points. All three kinds of relationship may well occur within the same correlational structure.
There is a close relationship between cluster analysis and the representation of correlational matrices. A modified form of cluster-correlational analysis is considered here as a means of representing alternative, hypothetical correlational structures. Because all correlational structures exist hypothetically in the same space, multiple structures can be represented in that space simultaneously and their overlapping structures compared.
Several alternatives exist for the graphical representation of correlational matrices. We may first graph the point values of the matrix as representations of the degree of relationship between the different dimensions. We may alternatively graph the different dimensions as if they are point or areal values in space. Alternatively we may create different kinds of frequency histograms or bar charts by which to depict different arrangements of the data. Graphing depends upon determining the appropriate X-Y coordinates.
The primary difficulty in the graphical representation of cross-correlational structures lies in determining or precisely representing the topical domain as a single point, or even as a set of related points of fixed size. The data in a correlation matrix are inherently relational: point values are not directly available, and hence they shift with the relationship within which they are measured. At some point in the analysis, a summary statistic that represents the particular topical domain must be introduced somewhat arbitrarily, almost as a blind leap of faith.
The second difficulty is related to the first and involves the extreme complexity of locating the x and y coordinates that would fix such a hypothetical point in graphic space. Such a solution might be mathematically impossible. We have the net distance value, but we cannot calculate an x or a y value that would simultaneously satisfy all the different correlations in which that point is involved. For this reason it is better to think of a correlation as represented by a bounded area within which a given point has a certain likelihood of occurring, and of a structure of correlation as a set of such areas, differently shaped and overlapping.
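The difficulty can be made concrete for the smallest case, three points. What follows is a minimal sketch rather than the procedure used in this chapter: `place_three` is a hypothetical helper, and the distances passed to it are invented for illustration. It fixes the first point at the origin and the second on the x-axis, then reports whether any x-y pair exists for the third.

```python
import math


def place_three(d_ab, d_ac, d_bc):
    """Try to place points A, B, C in the plane from their mutual distances.

    A is fixed at the origin and B on the positive x-axis. Returns the three
    (x, y) pairs, or None when no planar placement satisfies all distances.
    """
    if d_ab <= 0:
        return None
    # x-coordinate of C from the law of cosines
    x = (d_ac ** 2 + d_ab ** 2 - d_bc ** 2) / (2 * d_ab)
    y_squared = d_ac ** 2 - x ** 2
    if y_squared < 0:
        return None  # the triangle inequality is violated
    return (0.0, 0.0), (d_ab, 0.0), (x, math.sqrt(y_squared))


print(place_three(3.0, 4.0, 5.0))  # invented distances that fit a triangle
print(place_three(1.0, 1.0, 3.0))  # invented distances that fit no triangle
```

Once a fourth point is added, two of its distances already determine its position up to a mirror image, so the third distance must happen to agree. Independently measured correlations will seldom oblige, which is one reason to think in terms of areas rather than points.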
Representation will be considered at several levels of analysis. The first level is the mapping of the landscape of the original frequency diagram.
The second level is the mapping of any two sets of subsamples of converted values along the ordinate and abscissa such that proximity to the origin represents strength of correlation.
The third level consists of attempting to plot and orient the correlations of an entire matrix in relative space and then to determine optimum x-y functional equations for the original relationships.
The fourth level consists of mapping cross-correlational matrices in the same space.
Alternative forms of Cluster Analysis
Cluster analysis consists of estimating the net value of all possible combinations of values, and then selecting the combinations that give the highest net values with the fewest possible members.
A more direct form of cluster analysis of correlational matrices, which can be performed at any level or generation, is to group highly correlated values and then to discover the highly correlated relationships between these groupings. This allows a partial representation of the clustering of a matrix in two-dimensional space, a question that is discussed more thoroughly in the next chapter.
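The two steps just described, grouping highly correlated values and then looking for the strongest relationships between groups, can be sketched as follows. This is one possible reading, not necessarily the procedure intended here: it uses a union-find pass, which amounts to single-linkage grouping at a fixed cutoff. The function names and the cutoff of 0.6 are assumptions. The figures are color correlations as printed later in this chapter, except that the two values given there only as "above .7" are entered as 0.70.

```python
def group_by_threshold(pairs, threshold):
    """Merge variables into groups whenever |r| for a pair reaches the threshold."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (a, b), r in pairs.items():
        root_a, root_b = find(a), find(b)  # also registers both variables
        if abs(r) >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for v in list(parent):
        groups.setdefault(find(v), set()).add(v)
    # Largest groups first, alphabetical within and between equal sizes
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


def strongest_link(pairs, group_a, group_b):
    """Return the (pair, r) with the largest |r| joining two groups, or None."""
    links = [(p, r) for p, r in pairs.items()
             if (p[0] in group_a and p[1] in group_b)
             or (p[0] in group_b and p[1] in group_a)]
    return max(links, key=lambda item: abs(item[1]), default=None)


color_pairs = {
    ("purple", "violet"): 0.70,  # stand-in: the text says only "above .7"
    ("purple", "pink"): 0.70,    # stand-in: the text says only "above .7"
    ("violet", "pink"): 0.38,
    ("brown", "purple"): 0.599,
    ("brown", "pink"): 0.443,
    ("purple", "white"): 0.477,
    ("pink", "white"): 0.471,
    ("brown", "white"): 0.131,
    ("white", "grey"): 0.307,
    ("brown", "grey"): 0.373,
    ("pink", "grey"): -0.44,
    ("violet", "grey"): -0.47,
}

groups = group_by_threshold(color_pairs, 0.6)
print(groups)
print(strongest_link(color_pairs, groups[0], ["brown"]))
```

At this cutoff violet, purple, and pink form one group and the remaining colors stand alone; the strongest printed link from that group to brown runs through purple. A lower cutoff would merge more colors, so the choice of cutoff is itself one of the somewhat arbitrary decisions the analysis requires.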
A manual form of cluster analysis involves the construction of a table based upon the highest absolute correlational values. In the color matrix, the correlations between purple and violet and between purple and pink are above .7. The correlation between violet and purple is ?. Because there are three interrelated colors, the correlation between violet and pink is sought to complete the combination; it is only .38.
This suggests that, because correlations measure relative proximity, purple is situated somewhere between violet and pink.
violet purple pink
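The inference that purple lies between violet and pink can be checked with a little arithmetic. The sketch below is illustrative only. It assumes that a correlation r is converted to a distance as 1 - r, which is one common convention and not one the text commits to, and it enters the two correlations given only as "above .7" as 0.7. The function names are hypothetical.

```python
def distance(r):
    """Convert a correlation to a dissimilarity; 1 - r is an assumed convention."""
    return 1.0 - r


def betweenness_slack(r_left_mid, r_mid_right, r_left_right):
    """Return d(left, mid) + d(mid, right) - d(left, right).

    Close to zero: mid lies roughly on the line between the outer two.
    Clearly positive: the three form an ordinary triangle.
    Negative: no exact placement exists, even on a line.
    """
    return distance(r_left_mid) + distance(r_mid_right) - distance(r_left_right)


# violet-purple and purple-pink are stand-ins (0.7); violet-pink is .38 as printed.
slack = betweenness_slack(0.7, 0.7, 0.38)
print(round(slack, 3))
```

With these stand-in values the slack comes out at about -0.02, so purple sits almost exactly between violet and pink, and the slightly negative sign means the three distances cannot be drawn exactly even on a line. Because the two larger correlations are known only to exceed .7, the true slack under this convention would be somewhat more negative.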
The next two highest correlations are between green and orange and between violet and brown:
green orange
violet brown
Because violet belongs to the first cluster, the relationships between brown and purple (.599) and between brown and pink (.443) are sought. Because the relationships of pink, purple, and violet to brown are all negative, the distance is multiplied by two and the relationship is expressed orthogonally across a central axis:
(violet purple pink)
brown
The next three relationships are:
(violet white), (purple grey), (red orange)
Relationships of white with our main cluster are sought: purple-white (.477), pink-white (.471), and then brown-white (.131); then those of grey: pink-grey (-.44), violet-grey (-.47), brown-grey (.373), and white-grey (.307); and, because orange is shared with the second cluster above, green-red (-.46). The result is one main cluster, a second main cluster, and one sub-cluster (violet-purple-pink):
violet-purple-pink
brown
grey white
green orange
red
The relationship between these two main clusters has not yet been determined. The next two clusters are green-brown (-.47) and green-black (-.45). Then we look for the relationship between brown and black (.065), leaving the configuration:
green
brown black
Because green and brown come from the two main clusters, we now have a linkage between the two clusters that we can exploit to resolve the entire configuration. We look for the relationships between green and all the other colors of the first main cluster; then the relationships between brown and all the other points of the second main cluster; and then the relationships between black and all the remaining points of both clusters. This yields green-grey (-.28), green-white (-.38), green-purple (.41), green-violet (.202), and green-pink (.428), along with brown-red (.251) and brown-orange (-.397). Finally we have black-grey (.019), black-white (.499), black-purple (-.43), black-violet (-.31), black-pink (-.49), black-red (-.14), and black-orange (-.41).
violet-purple-pink
orange green
brown red
black
grey white
We are missing one color, blue. The next pair in the rank order is red-blue (-.49). So we look for all the other blues: blue-violet (-.13), purple-blue (-.37), pink-blue (-.12), orange-blue (.386), green-blue (.339), brown-blue (.14), grey-blue (.398), white-blue (-.25), and black-blue (-.4). It is important to find all the remaining color combinations between the two main clusters: orange-white, orange-pink, orange-purple, orange-violet,
violet-purple-pink
orange green
grey blue
brown red
black
grey white
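The placement of blue rests on ranking its correlations by strength and noting their signs. The sketch below shows that step in isolation, using blue's correlations as printed above. It is an illustration under assumptions: the function name and the split into "near" and "opposed" lists are hypothetical conveniences, not terms used in this chapter.

```python
def rank_by_strength(correlations):
    """Sort (name, r) items by absolute correlation, strongest first."""
    return sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)


# Correlations of blue with the other colors, as printed in the text.
blue = {
    "red": -0.49, "violet": -0.13, "purple": -0.37, "pink": -0.12,
    "orange": 0.386, "green": 0.339, "brown": 0.14, "grey": 0.398,
    "white": -0.25, "black": -0.4,
}

ranked = rank_by_strength(blue)
near = [name for name, r in ranked if r > 0]      # positively correlated colors
opposed = [name for name, r in ranked if r < 0]   # negatively correlated colors

print(ranked[0])
print(near)
print(opposed)
```

On this ranking the strongest single relationship is the negative one with red, while grey, orange, and green are blue's closest positive neighbors, which is the information used to seat blue in the configuration.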
Only one color remains, and that is yellow. The next color relation is yellow-brown (-.43). Yellow is then plotted with all the other colors: yellow-red (-.17), violet-yellow (.307), purple-yellow (.016), pink-yellow (.37), orange-yellow (.257), green-yellow (-.02), yellow-blue (.154), yellow-grey (-.39), yellow-white (-.14), and yellow-black (-.06).
violet-purple-pink
yellow orange green
blue
grey
brown red
black
white
Blanket Copyright, Hugh M. Lewis, © 2005. Use of this text governed by fair use policy--permission to make copies of this text is granted for purposes of research and non-profit instruction only.
Last Updated: 04/19/05