Choose dataset
Analysis steps available
QC - filter by mitochondrial content
Identify highly-variable genes
Principal Component Analysis (PCA)
tSNE / UMAP
Clustering (Louvain)
Find marker genes
Compare genes / clusters
About
The workbench for single cell RNAseq (scRNAseq) is designed to give
biologists meaningful access to single cell data, even with limited informatics training.
You begin by selecting a dataset for analysis; the workbench then offers analysis tools
that follow several standard pre-processing steps.
Start by choosing a dataset on the left.
Imported analysis selected
You have selected an analysis bundled with the dataset itself, usually created by the dataset
author outside of this workbench. You can use the workbench to perform the actions below
that are supported by the analyses the authors have uploaded, but all other analysis steps will
be disabled until you create an entirely new analysis.
Compare genes / clusters
The method dropdown menu lists the statistical tests available for your comparison. The
“t-test overestimated variance” option may help when clusters contain few cells and
variance is difficult to estimate directly. For robust clusters, choose “t-test”, which
assumes normally distributed data. The “Wilcoxon-Rank-Sum” option is a non-parametric
test and may be helpful when a dataset includes a few genes with very high expression
(outliers) or when the data distribution is not known. Multiple testing correction can be
performed with either the Benjamini-Hochberg or the Bonferroni method; Bonferroni is the
more conservative of the two.
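To illustrate why Bonferroni is the more conservative correction, here is a minimal sketch in plain Python. The function names and the example p-values are hypothetical and chosen only for illustration; they are not the workbench's own implementation.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for five genes (illustration only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
bonf = bonferroni(p_values)
fdr = benjamini_hochberg(p_values)

for p, b, f in zip(p_values, bonf, fdr):
    print(f"p={p:<6} Bonferroni={b:.4f} BH={f:.4f}")
```

For every gene the Benjamini-Hochberg value is less than or equal to the Bonferroni value, so fewer genes pass a given cutoff under Bonferroni.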
Name
log2FC
P-value
FDR
No data to display
Find marker genes
This option will show you the marker genes within each group of cells as defined by the clustering method used.
You can change how many genes are shown for each cluster by adjusting N.
Please either rerun Louvain clustering with labels (in the previous step) or click on the "Compare genes / clusters" toggle to proceed.
Top ranked genes per cluster (click to select genes of interest)
Marker gene visualization
Select desired marker genes in the table above and/or type gene symbols (separated by commas) in the field below to visualize
Unique marker genes selected in table: 0
Unique marker genes manually entered: 0
Total unique genes selected: 0
Labeled tSNE
Enter a gene of interest to see its tSNE colored both by expression and cluster / cell type.
Unable to load this image. Perhaps that gene is not found in this dataset?
Clustering (Louvain)
Please click on the "Find marker genes" toggle to proceed.
Group
Num Cells
Markers
New label
Louvain clustering is used to find the most likely groups of associated cells within a network.
Here the groups are color-coded. The number of neighbors affects the smallest possible
size of a cluster. For example, if you are interested in groups that all contain more than 20 cells
(based on the gene coloring in the initial PCA), you can try 6, 10 or 15 neighbors.
However, if this is a smaller dataset of regular RNA-seq with only biological triplicates,
starting with two neighbors makes more sense, because the smallest ‘natural’
group should be 3 replicates. Similarly, if some of the populations in a single cell dataset
are very small, a low value such as 3 neighbors could be a useful approach. Keep in mind that
the smaller the number of neighbors, the larger the number of clusters.
The resolution determines how granular the clustering will be. It is set to 1.3 by default. To decrease
the resolution, lower the value (to 1, for example); to increase the resolution, raise it.
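The effect of the neighbor count can be sketched with a toy example. The code below is not Louvain clustering and not the workbench's code; it only builds a k-nearest-neighbor graph over made-up one-dimensional points and counts the connected groups, as a rough stand-in for how fewer neighbors tend to give more, smaller groups. The function name `knn_components` and the sample coordinates are hypothetical.

```python
def knn_components(points, k):
    """Count connected components of a symmetric k-nearest-neighbor graph."""
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per group

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Rank every other point by its distance to point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: abs(points[i] - points[j]))
        # Link point i to its k nearest neighbors.
        for j in others[:k]:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(n)})


# Four made-up triplicates; the last two lie close to each other.
points = [0.0, 0.1, 0.2,
          10.0, 10.1, 10.2,
          30.0, 30.1, 30.2,
          31.0, 31.1, 31.2]

for k in (2, 3, 6):
    print(k, "neighbors ->", knn_components(points, k), "groups")
```

With two neighbors each triplicate stays a separate group; raising the neighbor count merges nearby groups, which mirrors the triplicate example in the text.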
tSNE / UMAP
Couldn't find this gene in the dataset:
Please click on the "Clustering (Louvain)" toggle to proceed.
This non-linear dimensionality reduction tool is used to visualize cells that are similar to
each other. The number of principal components to include depends on the result of the PCA step, as listed
above. You can vary the number of principal components used and see how it changes the data display. The
default recommendation is to look for the point on the PCA curve at which additional components add
minimal variation.
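One simple way to make "minimal added variation" concrete is to stop at the first component whose share of the variance falls below a chosen threshold. The sketch below assumes you have the per-component variance ratios as a list; the function name `choose_n_pcs`, the threshold `min_gain` and the example values are hypothetical, and other elbow heuristics are equally valid.

```python
def choose_n_pcs(variance_ratio, min_gain=0.02):
    """Count leading PCs that each explain at least `min_gain` of the variance."""
    n = 0
    for ratio in variance_ratio:
        if ratio < min_gain:
            break  # every later component adds even less
        n += 1
    return n


# Made-up variance ratios for the first eight components.
variance_ratio = [0.30, 0.18, 0.11, 0.06, 0.015, 0.012, 0.010, 0.009]
n_pcs = choose_n_pcs(variance_ratio)
print("Suggested number of principal components:", n_pcs)
```

The result is only a starting point; it is still worth trying a few values around it and checking how the plot changes.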
Principal Component Analysis (PCA)
Please click on the "tSNE / UMAP" toggle to proceed.
Principal component analysis identifies the groups of genes that contribute most
to the variability in gene expression in the dataset. For example, PC1 will have
the largest effect in dividing the dataset into subtypes of cells or samples. Each principal
component is composed of groups of “related” genes. Visualizing your principal component
graph and knowing which genes contribute to it (listed in the table) is important for the
next step, because the number of principal components included will affect your tSNE
(t-distributed stochastic neighbor embedding) plot.
Identify highly-variable genes
Please click "Save these genes", then click on the "Principal Component Analysis (PCA)" toggle to proceed.
Next we will use dimensionality reduction methods to look for structure
within the data. Before doing that, we want to filter the large gene list down to those
genes that are more likely to represent the biologically important variability between
cells. Several parameters can be adjusted here to set the sensitivity
versus stringency of which genes are included, and you may find that trying different
parameters helps you identify new features of the dataset. Briefly, the
x-axis represents the average expression of each gene across the dataset. The y-axis is a
measure called dispersion, which indicates the variance of that gene across the dataset.
The workbench limits the maximum number of highly variable genes to 2,000.
Increasing the Min mean raises the minimum expression value a gene must have to be
considered highly variable. We suggest that you change the parameters and observe the
plot to guide your selection. When you have completed this step, click
‘Save these genes’.
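The selection logic described here can be sketched in a few lines. This is a simplified illustration, not the workbench's implementation: it assumes dispersion is computed as variance divided by mean, and the function name `select_variable_genes`, the `min_disp` parameter and the tiny expression table are hypothetical.

```python
from statistics import mean, variance


def select_variable_genes(expr, min_mean=0.1, min_disp=0.5, max_genes=2000):
    """Return gene names passing the mean and dispersion cutoffs, most dispersed first."""
    dispersion = {}
    for gene, values in expr.items():
        m = mean(values)
        # Skip genes that are not expressed or fall below the Min mean cutoff.
        if m == 0 or m < min_mean:
            continue
        disp = variance(values) / m  # assumed definition: variance over mean
        if disp >= min_disp:
            dispersion[gene] = disp
    ranked = sorted(dispersion, key=dispersion.get, reverse=True)
    return ranked[:max_genes]  # cap the list, as the workbench does at 2,000


# Made-up expression values for three genes across six cells.
expr = {
    "GeneA": [0, 0, 9, 10, 0, 11],    # on in some cells, off in others
    "GeneB": [5, 5, 6, 5, 5, 6],      # steady expression everywhere
    "GeneC": [0, 0, 0, 0, 0, 0.1],    # barely expressed
}
selected = select_variable_genes(expr)
print(selected)
```

Raising `min_mean` removes lowly expressed genes such as GeneC before dispersion is even considered, which is the effect of increasing the Min mean setting.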
QC - filter by mitochondrial content
Filtered shape: genes x obs
No mitochondrial genes with this prefix were found. This could be real, or it could be
because the prefix is case-sensitive. Common options are mt-, Mt- or MT-. (This should
be handled for you automatically in a later release.)
Please click "Save these genes", then click on the "Identify highly-variable genes" toggle to proceed.
Click the plot button to see plots of (a) the number of genes in each cell; (b) the number of read counts
per cell; and (c) the percent mitochondrial content. The general recommendation is to keep
mitochondrial content below 5% (a fraction of 0.05) to focus on living cells. Based on the data in these
plots you may wish to change some of the criteria in your previous step. Click ‘Save these
genes’ before moving to the next step.
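The mitochondrial filter and the case-sensitive prefix can be illustrated with a small sketch. It assumes the cutoff is expressed as a fraction of each cell's counts, so 0.05 corresponds to 5%; the function name `mito_fraction`, the gene names and the counts are hypothetical and not taken from the workbench.

```python
def mito_fraction(counts, prefix="mt-"):
    """Fraction of a cell's counts from genes whose name starts with `prefix`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # str.startswith is case-sensitive, so "mt-" will not match "MT-".
    mito = sum(c for gene, c in counts.items() if gene.startswith(prefix))
    return mito / total


# Made-up counts per gene for two cells.
cells = {
    "cell_1": {"mt-Co1": 2, "Actb": 60, "Gapdh": 38},    # 2% mitochondrial
    "cell_2": {"mt-Co1": 30, "Actb": 40, "Gapdh": 30},   # 30% mitochondrial
}

# Keep cells below the assumed cutoff of 0.05 (5%).
kept = [name for name, counts in cells.items() if mito_fraction(counts) < 0.05]
print(kept)

# With the wrong capitalization no mitochondrial genes are found at all.
print(mito_fraction(cells["cell_2"], prefix="MT-"))
```

The last line shows why a mismatched prefix produces the "no mitochondrial genes" message: every cell then appears to have zero mitochondrial content, and nothing is filtered.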
Dataset:
Initial shape:
Filtered shape:
Apply filters as desired.
Please click on the "QC - filter by mitochondrial content" toggle to proceed.