An R package to mark functional modules in the network analysis using data from pathwayCommons
by Sara J
Setup GitHub Actions (Status: In progress; Branch: None; PR: None)
Add methods for module size parametrization (Status: In progress; Branch: None; PR: None)
Fix cgdsr package issue (Status: Complete; Branch: None; PR: None)
Meeting with supervisors (Status: Complete; Branch: None; PR: None)
R package creation crash course (Status: In progress; Branch: None; PR: None)
To fix the cgdsr package issue, I implemented a temporary solution in the netboxr vignette: I removed all cgdsr-related code and replaced it with a table containing the identical cgdsr data (“vignette_data.txt”), which I stored in the “inst” directory.
The code I removed (except for the genes <- c("EGFR", "TP53", "ACTB", "GAPDH") line, which I kept) can be found below:
library(cgdsr)
mycgds <- CGDS("http://www.cbioportal.org/")

# Find available studies, case lists, and genetic profiles
studies <- getCancerStudies(mycgds)
caselists <- getCaseLists(mycgds, "gbm_tcga_pub")
geneticProfiles <- getGeneticProfiles(mycgds, "gbm_tcga_pub")

genes <- c("EGFR", "TP53", "ACTB", "GAPDH")
results <- sapply(genes, function(gene) {
  geneticProfiles <- c("gbm_tcga_pub_cna_consensus", "gbm_tcga_pub_mutations")
  caseList <- "gbm_tcga_pub_cnaseq"
  dat <- getProfileData(mycgds, genes = gene, geneticProfiles = geneticProfiles, caseList = caseList)
  head(dat)
  [...]
})
The added code can be found below:
results <- sapply(genes, function(gene) {
  # Read the stored copy of the cgdsr data shipped with the package
  dat <- read.table(system.file("vignette_data.txt", package = "netboxr"),
                    header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)
  [...]
})
In addition, I added the file name to .Rbuildignore and removed ‘cgdsr’ from the DESCRIPTION file.
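Since system.file() returns an empty string when a file is missing from the installed package (for example, if it was excluded at build time), it may be worth guarding the read with an explicit check. The following is a minimal sketch under that assumption; the helper name read_packaged_table is hypothetical and not part of netboxr.

```r
# Hypothetical helper (not part of netboxr): read a tab-separated table
# that ships with an installed package, failing loudly if it is missing.
read_packaged_table <- function(file, package) {
  path <- system.file(file, package = package)
  # system.file() returns "" when the file (or package) cannot be found
  if (!nzchar(path)) {
    stop(sprintf("File '%s' not found in installed package '%s'", file, package))
  }
  read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
}

# Usage, assuming netboxr is installed and ships the file:
# dat <- read_packaged_table("vignette_data.txt", package = "netboxr")
```

Without such a check, read.table() would be called with an empty path and fail with a less informative error.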
To add module parametrization features to netboxr, I made updates to the geneConnector function as follows:
geneConnector <- function(geneList, networkGraph, directed = FALSE, pValueAdj = "BH", pValueCutoff = 0.05,
                          communityMethod = "ebc", resolutionParam = 1, weightsInput = NULL,
                          keepIsolatedNodes = FALSE) {
  [...]
  if (communityMethod == "louvain") {
    message(sprintf("Detecting modules using \"Louvain\" method\n"))
    community <- cluster_louvain(graphOutput, weights = weightsInput, resolution = resolutionParam)
    moduleMembership <- membership(community)
  }
  if (communityMethod == "leiden") {
    message(sprintf("Detecting modules using \"Leiden\" method\n"))
    community <- cluster_leiden(graphOutput, weights = weightsInput, resolution = resolutionParam)
    moduleMembership <- membership(community)
  }
  [...]
}
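To try the two new branches outside of netboxr, the dispatch can be sketched as a small standalone function. This is only an illustration: detect_modules is a hypothetical name, the Zachary karate club graph stands in for the netboxr network, and the resolution argument name for cluster_leiden is assumed to be accepted by the installed igraph version (older releases call it resolution_parameter and rely on R's partial argument matching).

```r
library(igraph)

# Hypothetical standalone helper, not the netboxr implementation
detect_modules <- function(graph, method = c("louvain", "leiden"),
                           resolution = 1, weights = NULL) {
  # match.arg() rejects unsupported method names early
  method <- match.arg(method)
  community <- switch(method,
    louvain = cluster_louvain(graph, weights = weights, resolution = resolution),
    leiden  = cluster_leiden(graph, weights = weights, resolution = resolution)
  )
  membership(community)
}

g <- make_graph("Zachary")  # toy graph used only for illustration
moduleMembership <- detect_modules(g, method = "louvain", resolution = 2)
table(moduleMembership)
```

Using match.arg() and switch() keeps the list of supported methods in one place, so an unsupported method name produces an error instead of silently leaving moduleMembership undefined.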
I’ve also made the following additions to the roxygen documentation in that same function:
#' @param communityMethod string for community detection method c("ebc","lec", "leiden", "louvain")
#'   as described in the details section (default = "ebc")
[...]
#' @param resolutionParam numeric value that determines community size, where higher resolutions
#'   lead to more, smaller communities (default = 1)
#' @param weightsInput numeric vector for edge weights (default = NULL)
To test whether the function still works, and whether changing the parameters results in the expected outcome (i.e. different community sizes), I reran the example in the vignette using the new community methods and different resolutionParam values as follows:
results <- geneConnector(geneList = geneList, networkGraph = graphReduced, directed = FALSE, pValueAdj = "BH",
                         pValueCutoff = threshold, communityMethod = "leiden", resolutionParam = 2,
                         keepIsolatedNodes = FALSE)
The graphs I obtained can be found below:
[Figures: networks obtained with the Louvain method at resolution = 1, 2, 3, 5, 10, 20, 100, and 500]
Using the following R code:
table(results$moduleMembership$membership)
I was able to observe that the number of communities does indeed change as the resolution increases (even when this is not apparent in the graphs), indicating that the parameter works as expected. The tables of communities (top row) and number of members (bottom row) for the networks generated using resolution values 1, 2, 3, and 5 can be found in the figures below. The maximum number of communities for this network appears to be 72.
[Figures: community membership tables for the Louvain method at resolution = 1, 2, 3, and 5]
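The same kind of check can be reproduced on a small toy graph by sweeping the resolution and counting communities. This is a minimal sketch that calls igraph's cluster_louvain directly instead of geneConnector, and uses the Zachary karate club graph as a stand-in for the netboxr network, so the counts will differ from those reported here.

```r
library(igraph)

g <- make_graph("Zachary")  # toy graph, not the netboxr network
set.seed(1)                 # Louvain visits vertices in random order

resolutions <- c(1, 2, 3, 5)
nCommunities <- sapply(resolutions, function(r) {
  community <- cluster_louvain(g, resolution = r)
  # Number of distinct community labels at this resolution
  length(unique(membership(community)))
})
names(nCommunities) <- resolutions
nCommunities
```

The number of communities can never exceed the number of vertices, which is one way to sanity-check an apparent maximum.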
The Leiden method, however, consistently outputs a network with 72 communities regardless of which resolution parameter is passed to the geneConnector function. I’m still trying to figure out what may be going wrong, and will most likely use the cgdsr data to test this clustering method further.
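One thing that may be worth checking is the objective function: igraph's cluster_leiden defaults to objective_function = "CPM" rather than modularity, and the two respond very differently to the same resolution value. Whether this explains the behaviour seen in netboxr is an assumption that has not been verified here. The sketch below compares the two settings on a toy graph; the resolution argument name is assumed to be accepted by the installed igraph version.

```r
library(igraph)

g <- make_graph("Zachary")  # toy graph, not the netboxr network
set.seed(1)

# Default objective function (CPM)
leidenCpm <- cluster_leiden(g, resolution = 1)

# Modularity objective, closer to what cluster_louvain optimizes
leidenMod <- cluster_leiden(g, objective_function = "modularity", resolution = 1)

# Number of communities under each setting, next to the vertex count
c(cpm = length(leidenCpm), modularity = length(leidenMod), vertices = vcount(g))
```

If the CPM count matches the number of vertices, every vertex has been placed in its own community, and a much smaller resolution or the modularity objective may be needed.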
I have not yet tested whether the “weights” parameter works as expected, but will do so later. Lastly, it is important to note that the changes to the netboxr code described above are currently implemented only in my forked repository, and will only be merged once the GitHub Actions bugs are resolved.
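A possible starting point for testing the weights parameter is to compare an unweighted and a weighted run on the same graph. This is a sketch under stated assumptions: it calls cluster_louvain directly instead of geneConnector, uses a toy graph, and the weighting scheme (up-weighting every edge that touches vertex 1) is arbitrary and chosen only for illustration.

```r
library(igraph)

g <- make_graph("Zachary")  # toy graph, not the netboxr network

# Arbitrary illustrative weights: edges touching vertex 1 get weight 10
edgeEnds <- ends(g, E(g), names = FALSE)
weightsInput <- ifelse(edgeEnds[, 1] == 1 | edgeEnds[, 2] == 1, 10, 1)

set.seed(1)
unweighted <- cluster_louvain(g, weights = NULL)
weighted   <- cluster_louvain(g, weights = weightsInput)

# Community sizes for each run
table(membership(unweighted))
table(membership(weighted))

# Normalized mutual information between the two partitions (1 = identical)
compare(unweighted, weighted, method = "nmi")
```

A value below 1 would suggest that the weights are influencing the clustering, although Louvain's randomness means repeated runs should be compared before drawing conclusions.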
tags: gsoc