GSEABoT (Gene Set Enrichment Analysis Based on Taxonomy) is a web-based tool that performs gene set enrichment analysis (GSEA)[1,2] on tissue or cell groups defined by the 'SHOGoiN Cell Taxonomy'. Given a query consisting of 'species', 'gene expression dataset', 'gene assignment resource', 'tissue/cell groups', and 'functional gene set', GSEABoT first runs a differentially expressed gene (DEG) analysis of the input tissue/cell groups and then performs GSEA to show the statistical significance of the input functional gene set for those groups. Although several enrichment analysis tools exist, GSEABoT is the only one based on 'SHOGoiN Cell ID', a well-defined tissue/cell grouping. The following is a usage example of GSEABoT.
STEP 0: Select species
Before setting any parameters, select the species of interest from the tabs at the top. In the current system, H. sapiens and M. musculus are available.
STEP 1: Select gene expression dataset
GSEABoT can handle 2,919 Affymetrix microarray data and 1,832 single-cell RNA-seq data for H. sapiens, as well as 3,596 single-cell RNA-seq data for M. musculus. The subsequent DEG analysis and GSEA are performed on the gene expression dataset selected from the pull-down list. If you check the 'Transcription factor genes' radio button, the selected dataset is restricted to transcription-associated genes, which are selected according to the GO term "GO:0006351" and its descendant terms.
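The restriction to a GO term and its descendants can be pictured as a graph traversal followed by a filter. The sketch below is only an illustration, not GSEABoT's implementation: the function `with_descendants`, the `TOY:` term IDs, the parent-to-child map, and the gene annotations are all hypothetical placeholders, and only "GO:0006351" comes from the text above.

```python
from collections import deque


def with_descendants(root, children):
    """Return the root term plus every term reachable through child links."""
    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in seen:  # a term can have several parents
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical toy hierarchy: the TOY:* IDs are placeholders, not real GO terms.
toy_children = {
    "GO:0006351": ["TOY:0000001", "TOY:0000002"],
    "TOY:0000001": ["TOY:0000003"],
    "TOY:0000002": ["TOY:0000003"],
}

# Hypothetical gene-to-term annotations.
toy_annotations = {
    "geneA": {"TOY:0000003"},
    "geneB": {"TOY:0000009"},
    "geneC": {"GO:0006351"},
}

terms = with_descendants("GO:0006351", toy_children)
# Keep a gene if at least one of its annotations falls inside the term set.
selected_genes = sorted(g for g, t in toy_annotations.items() if t & terms)
print(selected_genes)
```

Here `geneB` is dropped because its only annotation lies outside the subtree rooted at "GO:0006351".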
STEP 2: Select gene assignment resource
In this step, the user selects which gene assignment resource defines the genes of the gene expression dataset selected in STEP 1. When "Ensembl" is selected, genes are represented by Ensembl Gene IDs. In the current system, "Ensembl" is available for "scRNA-seq" and "NCBI Gene" for "microarray".
STEP 3: Select tissue/cell groups
Two tissue or cell groups of interest are selected from the pull-down lists. The tissues/cells available depend on the gene expression dataset selected in STEP 1. Note that the tissue/cell grouping is based on SHOGoiN Cell IDs. Users can optionally upload their own gene expression data. The uploaded expression data must be a tab- or space-delimited table (rows: genes, columns: samples), and each value must be a raw read count.
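Before uploading your own expression data, it may help to check the table locally. The following is a minimal sketch, not part of GSEABoT: the function name `check_count_table` is hypothetical, and it assumes (beyond what is stated above) that the first row holds sample names and the first column holds gene IDs. The sample names `S1` and `S2` are illustrative.

```python
import re


def check_count_table(text):
    """Parse a tab- or space-delimited table (rows: genes, columns: samples).

    Assumes a header row of sample names and gene IDs in the first column.
    Raises ValueError if a value is not a non-negative integer.
    """
    rows = [re.split(r"[ \t]+", ln.strip())
            for ln in text.splitlines() if ln.strip()]
    if len(rows) < 2:
        raise ValueError("need a header row and at least one gene row")
    header, body = rows[0], rows[1:]
    width = len(body[0])
    if len(header) == width:        # header has a label above the gene column
        samples = header[1:]
    elif len(header) == width - 1:  # header lists sample names only
        samples = header
    else:
        raise ValueError("header does not match the number of columns")
    counts = {}
    for fields in body:
        if len(fields) != width:
            raise ValueError(f"{fields[0]}: wrong number of columns")
        gene, values = fields[0], fields[1:]
        if not all(re.fullmatch(r"[0-9]+", v) for v in values):
            raise ValueError(f"{gene}: values must be raw read counts")
        counts[gene] = [int(v) for v in values]
    return samples, counts


example = "gene\tS1\tS2\nENSG00000141510\t12\t0\nENSG00000012048\t7\t31\n"
samples, counts = check_count_table(example)
print(samples, counts)
```

A table containing normalized values such as `1.5` is rejected by this check, since only raw read counts are accepted.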
STEP 4: Upload your functional gene set
Functional gene sets of interest are uploaded in the gene matrix transposed (GMT) file format, which is a tab-delimited format. In a GMT file, each row describes one gene set: the first column is the gene set name, the second column is a short description of the gene set, and the third and subsequent columns are gene IDs. Rows may contain different numbers of genes. The type of the input gene IDs is specified by checking a radio button. If no file is uploaded, GSEA is skipped (i.e., only the DEG analysis is performed).
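The GMT layout described above can be read with a few lines of code. This is a minimal sketch for checking a file before upload, not GSEABoT's own parser: the function name `parse_gmt` and the set names and descriptions in the example are hypothetical, and the gene IDs are shown as NCBI Gene IDs purely for illustration.

```python
def parse_gmt(text):
    """Parse GMT text into {set name: (description, [gene IDs])}."""
    gene_sets = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {line_no}: expected name, description, and at least one gene ID"
            )
        name, description = fields[0], fields[1]
        genes = [g for g in fields[2:] if g]  # drop empty trailing cells
        gene_sets[name] = (description, genes)
    return gene_sets


# Two hypothetical gene sets with different numbers of genes.
example_gmt = (
    "SET_A\tfirst example set\t7157\t672\t675\n"
    "SET_B\tsecond example set\t7157\t1956\n"
)
gene_sets = parse_gmt(example_gmt)
for name, (description, genes) in gene_sets.items():
    print(name, description, len(genes))
```

Because each row is split independently, rows of different lengths need no padding.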
STEP 5: Click "Submit"
After completing all steps of the input form, click the "Submit" button. GSEABoT will then start the DEG analysis and GSEA according to the user's input.
- A. Subramanian et al., "Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles", PNAS, 102(43), 15545--15550, 2005.
- S.-Y. Kim and D.J. Volsky, "PAGE: Parametric Analysis of Gene Set Enrichment", BMC Bioinformatics, 6:144, 2005.
- M. Ashburner et al., "Gene Ontology: tool for the unification of biology", Nature Genetics, 25(1), 25--29, 2000.