NetBoxR Updates GSoC 2022

An R package to mark functional modules in the network analysis using data from pathwayCommons

View the Project on GitHub sajo25/netboxr

13 June 2022

Week One | GitHub Actions and module parametrization

by Sara J

Tasks

  1. Set up GitHub Actions (Status: In progress; Branch: None; PR: None)

  2. Add methods for module size parametrization (Status: In progress; Branch: None; PR: None)

  3. Fix cgdsr package issue (Status: Complete; Branch: None; PR: None)

  4. Meeting with supervisors (Status: Complete; Branch: None; PR: None)

  5. R package creation crash course (Status: In progress; Branch: None; PR: None)

Progress report

To fix the cgdsr package issue, I implemented a temporary solution in the netboxr vignette: I removed all cgdsr-related code and added a table containing the identical cgdsr data (“vignette_data.txt”), which I stored in the “inst” directory.

The code I removed (with the exception of the genes <- c("EGFR", "TP53", "ACTB", "GAPDH") line) can be found below:

library(cgdsr)
mycgds <- CGDS("http://www.cbioportal.org/")
# Find available studies, caselists, and geneticProfiles
studies <- getCancerStudies(mycgds)
caselists <- getCaseLists(mycgds,'gbm_tcga_pub')
geneticProfiles <- getGeneticProfiles(mycgds,'gbm_tcga_pub')
genes <- c("EGFR", "TP53", "ACTB", "GAPDH")
results <- sapply(genes, function(gene) {
  geneticProfiles <- c("gbm_tcga_pub_cna_consensus", "gbm_tcga_pub_mutations")
  caseList <- "gbm_tcga_pub_cnaseq"
  dat <- getProfileData(mycgds, genes=gene, geneticProfiles=geneticProfiles, caseList=caseList)
  head(dat)
  [...]
})

The added code can be found below:

results <- sapply(genes, function(gene) {
  dat <- read.table(system.file("vignette_data.txt", package = "netboxr"),
                    header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)
  [...]
})

In addition to this, I added the name of the file to .Rbuildignore, and removed ‘cgdsr’ from the DESCRIPTION file.
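The snapshot approach described above can be sketched in isolation. This is a minimal illustration only: the data frame `toyData` and its values are made up, and a temporary file stands in for the package's “inst” directory. Inside the installed package, the path would instead come from system.file("vignette_data.txt", package = "netboxr"), as in the vignette code.

```r
# Hypothetical stand-in for the table originally downloaded with cgdsr;
# the column names and values are invented for illustration only.
toyData <- data.frame(
  gene = c("EGFR", "TP53"),
  alteration = c(1L, 0L),
  stringsAsFactors = FALSE
)

# Write the snapshot once as a tab-separated file
# (a temporary file stands in for the inst/ directory here).
snapshotPath <- tempfile(fileext = ".txt")
write.table(toyData, snapshotPath, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back with the same arguments used in the vignette.
dat <- read.table(snapshotPath, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(dat)
```

Reading a stored table keeps the vignette independent of the remote service, at the cost of the data no longer updating automatically.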

To add module parametrization features to netboxr, I made updates to the geneConnector function as follows:

geneConnector <- function(geneList, networkGraph, directed = FALSE, pValueAdj = "BH",
                          pValueCutoff = 0.05, communityMethod = "ebc", resolutionParam = 1,
                          weightsInput = NULL, keepIsolatedNodes = FALSE) {
  [...]

  if (communityMethod == "louvain") {
    message(sprintf("Detecting modules using \"Louvain\" method\n"))
    community <- cluster_louvain(graphOutput, weights = weightsInput, resolution = resolutionParam)
    moduleMembership <- membership(community)
  }

  if (communityMethod == "leiden") {
    message(sprintf("Detecting modules using \"Leiden\" method\n"))
    community <- cluster_leiden(graphOutput, weights = weightsInput, resolution = resolutionParam)
    moduleMembership <- membership(community)
  }
  [...]
}
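The method selection above can also be tried outside netboxr. The following is a minimal sketch, assuming a hypothetical helper `detectModules` (not part of netboxr) and igraph's bundled Zachary karate club graph as a toy network. It covers only the two new methods and uses match.arg() so that a misspelled method name fails early; whether geneConnector already validates the method name elsewhere is not shown in the excerpt.

```r
library(igraph)

# Hypothetical helper (not part of netboxr): choose a community detection
# method by name and return the membership vector.
detectModules <- function(graph, communityMethod = c("louvain", "leiden"),
                          resolutionParam = 1, weightsInput = NULL) {
  # match.arg() stops with an error if the name is not one of the choices
  communityMethod <- match.arg(communityMethod)
  community <- switch(communityMethod,
    louvain = cluster_louvain(graph, weights = weightsInput,
                              resolution = resolutionParam),
    leiden = cluster_leiden(graph, weights = weightsInput,
                            resolution = resolutionParam)
  )
  membership(community)
}

# Toy network shipped with igraph; a stand-in for the netboxr graph.
toyGraph <- make_graph("Zachary")
moduleMembership <- detectModules(toyGraph, communityMethod = "louvain")

# Community labels and the number of members in each
table(moduleMembership)
```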

I’ve also made the following additions to the roxygen documentation in that same function:

#' @param communityMethod string for community detection method c("ebc", "lec", "leiden", "louvain")
#'   as described in the details section (default = "ebc") [...]
#' @param resolutionParam numeric value that determines community size, where higher resolutions
#'   lead to more, smaller communities (default = 1)
#' @param weightsInput numeric vector for edge weights (default = NULL)

To test whether the function still works, and whether changing the parameters results in the expected outcome (i.e. different community sizes), I reran the example in the vignette using the new community methods and different resolutionParam values as follows:

results <- geneConnector(geneList = geneList, networkGraph = graphReduced, directed = FALSE,
                         pValueAdj = "BH", pValueCutoff = threshold, communityMethod = "leiden",
                         resolutionParam = 2, keepIsolatedNodes = FALSE)

The graphs I obtained can be found below:

[Figures: network graphs for the Louvain method at resolution = 1, 2, 3, 5, 10, 20, 100, and 500]

Using the following R code:

table(results$moduleMembership$membership)

I was able to confirm that the number of communities does change as the resolution increases (even when this is not apparent in the graphs), which indicates that the parameter behaves as expected. The figures below show the tables of communities (top row) and number of members per community (bottom row) for the networks generated with resolution values 1, 2, 3, and 5. The maximum number of communities for this network appears to be 72.

[Figures: community membership tables for the Louvain method at resolution = 1, 2, 3, and 5]
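The same check can be run as a sweep over several resolution values. The sketch below is an illustration on a toy network (igraph's bundled Zachary karate club graph), not the netboxr graph, so the counts it produces are not comparable to the ones reported here. It calls cluster_louvain() directly rather than going through geneConnector.

```r
library(igraph)

# Toy network bundled with igraph; a stand-in for the netboxr graph.
toyGraph <- make_graph("Zachary")

resolutions <- c(1, 2, 3, 5, 10, 20, 100, 500)

# Louvain visits nodes in random order, so fix the seed for repeatability.
set.seed(1)

# For each resolution, run Louvain and count the distinct communities.
nCommunities <- sapply(resolutions, function(r) {
  community <- cluster_louvain(toyGraph, resolution = r)
  length(unique(membership(community)))
})

# One row per resolution, with the number of communities found
data.frame(resolution = resolutions, communities = nCommunities)
```

The number of communities can never exceed the number of nodes, which is one way a plateau such as the 72 seen above could arise.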

The Leiden method consistently outputs a network with 72 communities, regardless of which resolution parameter is passed to the geneConnector function. I’m still trying to figure out what is going wrong, and will most likely use the cgdsr data to test this clustering method further.
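One possible explanation, offered as a hypothesis to check rather than a diagnosis: igraph's cluster_leiden() defaults to objective_function = "CPM" (Constant Potts Model), whereas cluster_louvain() optimises modularity. Under CPM with unweighted edges, resolution values above 1 make every merge unfavourable, so each node stays in its own community. That would match a constant count of 72 if the network happens to have 72 nodes, which is an assumption here. The sketch below compares the two objective functions on a toy graph, not on the netboxr network.

```r
library(igraph)

# Toy network bundled with igraph; a stand-in for the netboxr graph.
toyGraph <- make_graph("Zachary")

# Small helper for this sketch: number of distinct communities in a result.
countCommunities <- function(community) length(unique(membership(community)))

set.seed(1)

# Default objective ("CPM"): at resolution = 2 every merge lowers the quality,
# so nodes are expected to remain singletons.
cpmCount <- countCommunities(cluster_leiden(toyGraph, resolution = 2))

# Modularity objective: the resolution then plays the same role as in Louvain.
modCount <- countCommunities(
  cluster_leiden(toyGraph, objective_function = "modularity", resolution = 2)
)

c(nodes = vcount(toyGraph), cpm = cpmCount, modularity = modCount)
```

If this turns out to be the cause, passing objective_function = "modularity" inside the "leiden" branch would be one option to try.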

I have not yet tested whether the “weights” parameter works as expected, but will do so later. Lastly, note that the changes to the netboxr code described above currently exist only in my forked repository, and will be merged only once the GitHub Actions bugs are resolved.

tags: gsoc