Software - MonoClaD

 

 

The following program performs semi-supervised class discovery, as described in the paper by Steinfeld et al.

 

Usage:

  1. Verify that Java JRE 6 is installed (if it is not, install it from here).
  2. Download the software from here.
  3. Unzip the file.
  4. Update the parameters file.
  5. Run the software (MonoClaD.bat).

 

 

Parameters:

 

matFileName – The name of the gene expression matrix file. The file should be a tab-delimited text file in which each row is a gene and each column is a sample. The first line should contain the sample names and the first column should contain the gene names. The upper-left entry should contain the name of the dataset and will be ignored. To use the output of the software as input to GOrilla for GO enrichment analysis, gene names should be given as gene symbols or as RefSeq, UniGene, Uniprot or Ensembl identifiers. Missing values should be represented by #. An example file can be found here.
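
As an illustration of the layout described above, the following Python sketch builds the text of a small expression matrix file. It is not part of MonoClaD; the function name format_expression_matrix, the dataset name and the values are hypothetical, and the only rules it encodes are those stated above (tab-delimited, header line with sample names, gene name in the first column, # for missing values).

```python
import csv
import io

MISSING = "#"  # missing-value marker described in the parameter text


def format_expression_matrix(dataset_name, sample_names, rows):
    """Return tab-delimited text: a header line, then one line per gene.

    rows maps gene name -> list of values, one per sample
    (None marks a missing value). Hypothetical helper, for illustration only.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # Upper-left entry is the dataset name; the rest of the line is sample names.
    writer.writerow([dataset_name] + list(sample_names))
    for gene, values in rows.items():
        if len(values) != len(sample_names):
            raise ValueError(f"{gene}: expected {len(sample_names)} values")
        writer.writerow([gene] + [MISSING if v is None else v for v in values])
    return out.getvalue()


text = format_expression_matrix(
    "demo_dataset",
    ["S1", "S2", "S3"],
    {"TP53": [1.2, None, 0.7], "BRCA1": [0.4, 2.1, 1.9]},
)
print(text, end="")
```

The returned text can be written to a file whose name is then given as matFileName in the parameters file.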

 

qttveFileName – The name of the quantitative phenotype vector file. The file should contain one value for each sample, in the same order as the columns of the expression data matrix. The first column should contain the name of the quantitative phenotype. Missing values should be represented by #. An example file can be found here.
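
The ordering requirement can be sketched in Python as follows. This is an illustration only, and it rests on an assumption: that the file is a single tab-delimited line holding the phenotype name followed by one value per sample. The text above does not spell out the exact layout, so check the example file before relying on it. The function name format_phenotype_vector and the sample data are hypothetical.

```python
MISSING = "#"  # missing-value marker described in the parameter text


def format_phenotype_vector(phenotype_name, sample_names, values_by_sample):
    """Return one tab-delimited line: phenotype name, then one value per sample.

    Values are emitted in the order of sample_names, which should be the
    column order of the expression matrix. Samples with no value get '#'.
    ASSUMPTION: single-row layout; hypothetical helper, for illustration only.
    """
    fields = [phenotype_name]
    for sample in sample_names:
        value = values_by_sample.get(sample)
        fields.append(MISSING if value is None else str(value))
    return "\t".join(fields) + "\n"


# The dictionary order does not matter; the matrix column order decides.
line = format_phenotype_vector("age", ["S1", "S2", "S3"], {"S3": 61, "S1": 54})
print(line, end="")
```

Driving the output from the matrix's sample list, rather than from the phenotype table, keeps the two files aligned even when some samples lack a measurement.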

 

setFileName (Optional) – The name of the gene set file. The first row should contain the name of the set, and each following row should contain the name of one gene in the set selected for enrichment. Gene names should be compatible with the names in the gene expression matrix. If this parameter is not in the parameters file, overabundance is used as the score instead of set enrichment. An example file can be found here.
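
A minimal Python sketch of preparing such a set file is shown below. It is not part of MonoClaD, and the function name format_gene_set is hypothetical. It also reports set genes that do not appear in the expression matrix; treating "compatible" as "present in the matrix" is an assumption made here for illustration, not a documented requirement.

```python
def format_gene_set(set_name, set_genes, matrix_genes):
    """Return (text, unmatched) for a gene set file.

    text has the set name on the first row, then one gene name per row.
    unmatched lists set genes not found among matrix_genes, so naming
    mismatches (e.g. symbols vs. RefSeq IDs) can be spotted early.
    Hypothetical helper, for illustration only.
    """
    known = set(matrix_genes)
    unmatched = [gene for gene in set_genes if gene not in known]
    text = "\n".join([set_name] + list(set_genes)) + "\n"
    return text, unmatched


text, unmatched = format_gene_set("my_set", ["TP53", "MYC"], ["TP53", "BRCA1"])
print(text, end="")
if unmatched:
    print("Not in expression matrix:", ", ".join(unmatched))
```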

 

The other parameters and their defaults are explained inside the parameters file.

 

 

Output:

Running the application generates the following files:

 

Out.txt – The results of the analysis. It reports the thresholds that determine the best partition, together with the enrichment p-value or the overabundance score.

 

DiffExps.txt – The list of all genes, sorted by their level of differential expression with respect to the above partition, along with their TNoM scores, TNoM p-values, t-test p-values and directions.

 

GOrilla.txt – A list of all genes, sorted by their level of differential expression. If the gene names in the expression matrix are RefSeq or Uniprot identifiers, this list can be copied "as is" into GOrilla to view GO enrichment.

 

Log.txt – The application's log file, used for debugging.