Title: | Correlational Class Analysis |
---|---|
Description: | Perform a correlational class analysis of the data, resulting in a partition of the data into separate modules. |
Authors: | Andrei Boutyline |
Maintainer: | Andrei Boutyline <[email protected]> |
License: | GPL-2 |
Version: | 0.2.1 |
Built: | 2024-11-22 04:21:12 UTC |
Source: | https://github.com/cran/corclass |
This package implements the Correlational Class Analysis (CCA) methodology described by Boutyline (2017). A correlational class analysis of a survey dataset partitions the population into separate modules. This is done in four steps:
1. Create a matrix G of absolute row correlations. This is the network adjacency matrix.
2. Set statistically insignificant correlations to 0 to reduce noise.
3. Use igraph's leading.eigenvector.community to partition this network into modules.
4. Return an object describing the resulting class assignments (as well as the separate data frames describing the individual modules).
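Steps 1 and 2 can be sketched in base R as follows. This is only an illustration, not the package's implementation: sketch_adjacency is a hypothetical helper, the significance test shown (a two-sided t-test on each Pearson correlation with ncol - 2 degrees of freedom) is an assumption about how "insignificant" is judged, and the input is assumed to have no 0-variance rows.

```r
# Hypothetical helper for illustration; not a function exported by corclass.
sketch_adjacency <- function(dtf, alpha = 0.01) {
  x <- as.matrix(dtf)
  n_vars <- ncol(x)

  # Step 1: absolute correlations between rows (respondents), not columns.
  g <- abs(cor(t(x)))
  diag(g) <- 0  # no self-ties in the network

  # Step 2 (assumed test): t statistic of each correlation, compared with
  # the two-sided critical value at level alpha. Warnings are suppressed
  # because correlations of exactly 1 divide by zero and yield Inf.
  tval <- suppressWarnings(g * sqrt((n_vars - 2) / (1 - g^2)))
  cutoff <- qt(1 - alpha / 2, df = n_vars - 2)
  g[which(tval < cutoff)] <- 0
  g
}

# Two pairs of rows; each pair is a linear transformation of one vector.
v1 <- 1:10
v2 <- rep(c(1, -1), 5)
x <- rbind(2 * v1 + 1, -v1 + 3, 3 * v2, -v2)
g <- sketch_adjacency(x)
```

In this toy input the within-pair ties survive the filter while the weak cross-pair ties are set to 0. The filtered matrix could then be turned into a weighted undirected graph and partitioned with igraph, as step 3 describes.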
CCA substantially improves on the accuracy of the Relational Class Analysis (RCA) algorithm proposed by Goldberg (2011). See Boutyline (2017) for details.
The main function is cca. plot.cca plots the modules produced by cca. Sample data can be accessed via data(cca.example).
Written and maintained by Andrei Boutyline, [email protected].
Boutyline, Andrei. 2017. "Improving the Measurement of Shared Cultural Schemas with Correlational Class Analysis: Theory and Method." Sociological Science 4:353-93. https://www.sociologicalscience.com/articles-v4-15-353/
This package makes heavy use of igraph.
The CCA algorithm is an improvement on RCA; see https://cran.r-project.org/package=RCA
data(cca.example)
res1 <- cca(cca.example)
plot(res1, 1)
Perform a correlational class analysis of the data, resulting in a partition of the data into separate modules. This consists of four steps:
1. Create a matrix G of absolute row correlations. This is the network adjacency matrix.
2. Set statistically insignificant correlations to 0 to reduce noise.
3. Use igraph's leading.eigenvector.community to partition this network into modules.
4. Return an object describing the resulting class assignments (as well as the separate data frames describing the individual modules).
cca(dtf, filter.significance = TRUE, filter.value = 0.01, zero.action = c("drop", "ownclass"), verbose = TRUE)
dtf | The data frame containing the variables of interest. |
filter.significance | Significance filtering sets "insignificant" ties to 0 to decrease noise and increase stability. Simulation results show that this greatly increases accuracy in many settings. Set filter.significance = FALSE to disable this. |
filter.value | Minimum significance cutoff. Absolute row correlations below this value will be set to 0. |
zero.action | What to do with 0-variance rows before partitioning the graph. If zero.action is "drop", CCA drops rows with 0 variance from the analyses (default). If zero.action is "ownclass", the correlations between 0-variance rows and all other rows are set to 0, and the correlations between all pairs of 0-variance rows are set to 1. This effectively creates a "zero class". |
verbose | Whether to print details of what CCA is doing to the screen. |
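The two zero.action settings can be pictured with a small base R sketch. The helpers below are hypothetical and are not part of corclass; they only mirror the behaviour described in the table above, applied to an already computed matrix of absolute row correlations.

```r
# Hypothetical helpers for illustration; not functions exported by corclass.

# Flag rows whose values are all identical (0 variance).
find_zero_variance_rows <- function(x) {
  apply(as.matrix(x), 1, var) == 0
}

# Apply the documented zero.action behaviour to a row correlation matrix g.
apply_zero_action <- function(g, zero_rows, zero.action = c("drop", "ownclass")) {
  zero.action <- match.arg(zero.action)
  if (zero.action == "drop") {
    # Remove 0-variance rows from the matrix entirely.
    return(g[!zero_rows, !zero_rows, drop = FALSE])
  }
  # "ownclass": cut ties between 0-variance rows and everything else ...
  g[zero_rows, ] <- 0
  g[, zero_rows] <- 0
  # ... then tie the 0-variance rows to one another.
  g[zero_rows, zero_rows] <- 1
  g
}

x <- rbind(c(1, 2, 3, 4), c(5, 5, 5, 5), c(4, 3, 2, 1), c(2, 2, 2, 2))
zero_rows <- find_zero_variance_rows(x)
g <- matrix(0.5, nrow = 4, ncol = 4)  # placeholder correlations
dropped <- apply_zero_action(g, zero_rows, "drop")
own <- apply_zero_action(g, zero_rows, "ownclass")
```

With "ownclass", rows 2 and 4 end up connected only to each other, which is what produces the separate "zero class" after partitioning.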
membership | The class assignments produced by CCA. |
cormat | The row correlation matrix that was partitioned by CCA. It has a "dtf" attribute which holds the dataframe. Note that, if 0-variance rows were dropped, they will be excluded from the dataframe as well as the correlation matrix. The "zeros" attribute holds the indexes of the dropped rows. |
modules | For convenience, the dataframe is separated into the modules found by the algorithm. A separate dataframe for each module i can be found in modules[[i]]$dtf. The matrix of column correlations is in modules[[i]]$cormat. modules[[i]]$degenerate indicates whether this module contains undefined. Note that these modules can be plotted via the S3 plot method. See example below. |
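As a rough illustration of how the returned fields fit together, the sketch below tabulates module sizes and degeneracy flags. summarize_modules is a hypothetical helper, not part of corclass, and it is exercised here on a hand-built stand-in list that only mimics the documented structure (membership, modules[[i]]$dtf, modules[[i]]$degenerate).

```r
# Hypothetical helper for illustration; relies only on the documented fields.
summarize_modules <- function(res) {
  data.frame(
    module = seq_along(res$modules),
    n_rows = vapply(res$modules, function(m) nrow(m$dtf), integer(1)),
    degenerate = vapply(res$modules, function(m) isTRUE(m$degenerate), logical(1))
  )
}

# Stand-in object mimicking the documented structure (not real cca output).
mock <- list(
  membership = c(1, 1, 2),
  modules = list(
    list(dtf = data.frame(a = 1:2), degenerate = FALSE),
    list(dtf = data.frame(a = 3), degenerate = TRUE)
  )
)
s <- summarize_modules(mock)
```

With the package installed, the same helper should apply to a real result, for example summarize_modules(cca(cca.example)).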
Andrei Boutyline, [email protected].
data(cca.example)
res1 <- cca(cca.example)
# with igraph 0.7, this should find 3 classes of sizes 218 391 144.
plot(res1, 1) # plot them
plot(res1, 2)
plot(res1, 3)
print(round(res1$modules[[1]]$cormat, 2)) # examine the correlation matrix for the 1st module
print(summary(res1$modules[[1]]$dtf)) # look at its variable ranges
plot(res1, 1, bw = TRUE) # Plot it again in a more journal-friendly format.
# now let's try setting the filter value too high
res2 <- cca(cca.example, filter.value = 0.001)
# With igraph 0.7, the above now finds 17 classes
# of sizes 216 1 1 371 1 1 1 1 1 1 1 1 11 141 1 1 2
# The small isolate classes can either be dropped manually, or by increasing filter.value
A randomly generated sample dataset for correlational class analysis, created using the approach described in Boutyline (2017). rownames(cca.example) contains the true schematic class membership for each row. Every row belonging to a given schematic class was created from noisy linear transformations of the same vector.
data(cca.example)
The format is:
 num [1:754, 1:10] 4 7 4 7 4 4 7 3 9 8 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:754] "1" "1" "1" "1" ...
  ..$ : NULL
data(cca.example)
res1 <- cca(cca.example)
Plot a CCA-produced module as a network diagram. The network nodes are survey variables (columns), and the ties are their correlations. Red (or dashed) ties represent negative correlations. This is a convenience function wrapping igraph's graphing functionality. Writing to a file is done via the Cairo package.
## S3 method for class 'cca'
plot(x, module.index, cutoff = 0.05, LAYOUT = igraph::layout.kamada.kawai, drop.neg.ties.for.layout = TRUE, bw = FALSE, main = NULL, file = NULL, ...)
x | The cca object returned by cca. |
module.index | Index of module to plot, between 1 and length(x$modules). |
cutoff | Minimum absolute column correlation to plot. |
LAYOUT | If this is a function, it is assumed to be one of the layout routines from igraph (or something that returns data in the same format). Otherwise, it is assumed to be the layout returned by such a function. |
drop.neg.ties.for.layout | Whether to drop negative ties for the purpose of layout. This may be necessary because some layout algorithms do not work if negative ties are present. Note that the negative ties are only dropped for the purposes of layout. They will still be included in the actual plot. |
bw | Whether to print in color for screen viewing (FALSE), or in b&w with dashed lines for negative ties for a journal manuscript (TRUE). |
main | Caption at the top of the graph. If NULL, the module number is used as the caption. |
file | If a filename is provided, the graph is saved as a pdf with that filename. Note that this requires the Cairo package. |
... | Unused. |
If the LAYOUT parameter is a layout function, then the return value is the static layout generated by this function (this allows the exact same layout to be reproduced in the future; see the example below). Otherwise, it is the same static layout that was passed to plot.cca.
Andrei Boutyline, [email protected]
data(cca.example)
res1 <- cca(cca.example)
# with igraph 0.7, this should find 3 classes of sizes 218 391 144.
plot(res1, 1) # plot the first module
plot(res1, 2) # plot the second module
plot(res1, 3) # plot the third module
plot(res1, 1, bw = TRUE) # check out first module in black and white
plot(res1, 1, LAYOUT = igraph::layout.fruchterman.reingold) # try a different layout algorithm
# example of saving a fixed layout
layout1 <- plot(res1, 1) # try out a layout ...
layout1 <- plot(res1, 1) # ... try again
layout1 <- plot(res1, 1) # ... until one looks good
# Now plot the result with the chosen layout. To save image to disk,
# replace NULL below with file name (e.g., file = "module1.pdf")
plot(res1, 1, LAYOUT = layout1, file = NULL)
Prints a concise description of CCA results, including module membership counts. Reports if any of the modules are degenerate.
## S3 method for class 'cca'
print(x, ...)
x | The cca object returned by cca. |
... | Unused. |
Andrei Boutyline, [email protected]
data(cca.example)
res1 <- cca(cca.example)
print(res1)