FAQ

How can I use Single Cell Explorer in a core genomics lab?

After you install Single Cell Explorer, MongoDB, a Python notebook environment, and a pipeline such as Cell Ranger, you have an operational, streamlined system for your lab. You can run your preferred pipelines to generate count data first, or you can use our example code to call Cell Ranger pipelines to process FASTQ files (you need to import our library scpipeline to do that). Then you can use our example code (PBMC10k) to run a Seurat-like analysis using the scanpy package. Finally, you can save the results to the database. The results will be automatically posted on the Single Cell Explorer website you host. Scientists can then review this initial analysis to check whether the preliminary results (most likely produced with default settings) meet their needs.

Can I integrate Python Notebook with Single Cell Explorer?

Yes. The data analysis section demonstrates how to use Jupyter Notebook to communicate with the Single Cell Explorer system. You can find more about why Jupyter is the computational notebook of choice for data scientists here. R users can use the HTTP API to retrieve data from Single Cell Explorer and run statistical analyses with packages such as edgeR or DESeq2. We also published an approach for converting Seurat results into CSV files, followed by Python scripts that load them into the database.
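
As a rough illustration of the CSV-to-database step mentioned above, the following minimal sketch turns rows of a CSV file into MongoDB-style documents. It is not the published script: the column names (barcode, cluster, UMAP_1, UMAP_2), the sample name, and the document layout are all hypothetical and do not reflect the actual Single Cell Explorer schema.

```python
import csv
import io

# Hypothetical CSV exported from a Seurat object; column names are illustrative only.
SEURAT_CSV = """barcode,cluster,UMAP_1,UMAP_2
AAACCTGAGAAGGCCT-1,0,-3.21,5.87
AAACCTGAGACAGACC-1,2,7.45,-1.02
"""


def rows_to_documents(csv_text, sample_name):
    """Convert CSV rows into dictionaries that could be stored as MongoDB documents."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        documents.append({
            "sample": sample_name,            # hypothetical field name
            "barcode": row["barcode"],
            "cluster": int(row["cluster"]),
            "umap": [float(row["UMAP_1"]), float(row["UMAP_2"])],
        })
    return documents


docs = rows_to_documents(SEURAT_CSV, "pbmc_demo")
print(docs[0])

# With pymongo installed and a MongoDB server running, the documents could then be
# stored with something like (database and collection names are assumptions):
#   from pymongo import MongoClient
#   MongoClient()["some_db"]["some_collection"].insert_many(docs)
```

Building plain dictionaries first keeps the conversion testable without a running database; the actual loading scripts may organize the data differently.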

Does Single Cell Explorer support data from different species?

The current application is able to host data from different species. In addition to human blood and tissue samples, we added non-standard model organisms such as Drosophila in our software demo website.

Why Django, not R Shiny?

Django is a high-level Python Web framework that enables rapid development and clean, pragmatic design. With tools including Django ORM, Middlewares, Authentication, HTTP libraries, Multi-site support, i18n, Django Admin, and a template engine, it removes the hassle in Web development, database integration, and content management. We chose Django to make the development journey easier for the future.
While Shiny is particularly good for fast prototyping and fairly easy to use, it has limitations. The R stack is weak at retrieving data from and saving data to a database, and at multipage support. Better concurrent user support requires Shiny Server Pro. It is also hard to add functionality that is not already provided by the package or by Dash.

How can I manage the dataset?

To manage the data content on your single-cell data site, you can use the admin interface, a powerful part of Django.

What is the key difference between Single Cell Explorer and Cellxgene software?

Cellxgene is an interactive data explorer for inspecting single-cell transcriptomics datasets that are already well curated. This serves the purpose of the Chan Zuckerberg Initiative (Human Cell Atlas). Single Cell Explorer was built for scientists to label, annotate, and share their findings, so that experimental scientists can participate in data mining more proactively. With Cellxgene, the time required to load h5ad files for a particular dataset, or to map data to Cellxgene, makes it difficult to serve as a data portal for many datasets with concurrent users. Single Cell Explorer was built for general-purpose use (multiple datasets, multiple species, flexibility for integration).

Function	Cellxgene	Single Cell Explorer
Multiple datasets	Not yet; needs to load an h5ad file each time	Data portal hosts studies with multiple samples
Manual labeling and annotation	No	Yes
Gene search and visualization	Yes	Yes
Dataset registration	No	Data content management
API and re-analysis	No	Yes

What is the hardware requirement for using Single Cell Explorer+MongoDB?

Memory: A Linux machine with at least 8 GB of memory is needed to host Single Cell Explorer and MongoDB, which now uses WiredTiger as its default storage engine. MongoDB uses both the WiredTiger internal cache and the filesystem cache. The default internal cache size is the larger of 50% of (RAM - 1 GB) or 256 MB. On a system with a total of 8 GB of RAM, the WiredTiger cache will use 3.5 GB of RAM (0.5 * (8 GB - 1 GB) = 3.5 GB). The filesystem cache is used by the OS to reduce disk I/O. For the filesystem cache, MongoDB automatically uses all free memory that is not used by the WiredTiger cache or by other processes. Therefore, it is better not to install Notebook, RStudio, or pipelines on the same machine that hosts MongoDB. Since the filesystem cache allows compressed data to stay in memory, we recommend adding more memory to ensure performance. If vertical scaling is not possible, you can scale horizontally by adding servers as needed. This can lower the overall cost, but the trade-off is increased complexity in infrastructure and maintenance for the deployment. You can learn more about MongoDB Atlas from their official website.

Disk space: It depends on the amount of data or the number of samples you want to add to the database. For example, a sample with 3000 cells (10x Genomics technology) requires about 1.5 GB of disk space (for both counts and normalized counts) when stored in MongoDB.
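
The WiredTiger cache rule described above can be checked with a few lines of Python. This is only a sketch of the stated formula (the larger of 50% of (RAM - 1 GB) or 256 MB); the function name is made up for illustration and is not part of MongoDB or Single Cell Explorer.

```python
def wiredtiger_default_cache_gb(total_ram_gb: float) -> float:
    """Default WiredTiger internal cache size, in GB, for a host with the given RAM.

    Implements the rule quoted in the text: the larger of
    50% of (RAM - 1 GB) or 256 MB.
    """
    half_of_ram_minus_one = 0.5 * (total_ram_gb - 1.0)
    floor_gb = 256 / 1024  # 256 MB expressed in GB
    return max(half_of_ram_minus_one, floor_gb)


# Print the default cache size for a few example machine sizes.
for ram_gb in (1, 8, 16, 32):
    cache_gb = wiredtiger_default_cache_gb(ram_gb)
    print(f"{ram_gb:>2} GB RAM -> {cache_gb:.2f} GB WiredTiger cache")
```

For the 8 GB machine used as the example in the text, the function returns 3.5, matching the calculation above. Whatever RAM remains after the internal cache and other processes is what the OS can use for the filesystem cache.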

Is there any way to make installation and setup easier?

Please follow our installation tutorial step by step. If you want to use MongoDB to host your dataset, we recommend that you learn basic MongoDB operations. A Docker solution does not make the task easier at this moment. If you run MongoDB in a container (e.g. Docker), you need to set storage.wiredTiger.engineConfig.cacheSizeGB if the container does not have access to all of the RAM available in the system. A conda installation method is not currently available.
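
To make the container case concrete, the sketch below prints a mongod configuration fragment that sets storage.wiredTiger.engineConfig.cacheSizeGB from a container memory limit. Applying the default rule (the larger of 50% of (memory - 1 GB) or 256 MB) to the container limit is an assumption made here for illustration, as are the function name and the 4 GB limit; choose a value that suits your own deployment.

```python
def mongod_cache_config(container_memory_gb: float) -> str:
    """Return a YAML fragment for mongod.conf that caps the WiredTiger cache.

    Assumption: the default sizing rule is applied to the container's
    memory limit instead of the host's total RAM.
    """
    cache_gb = max(0.5 * (container_memory_gb - 1.0), 0.25)
    return (
        "storage:\n"
        "  wiredTiger:\n"
        "    engineConfig:\n"
        f"      cacheSizeGB: {cache_gb:g}\n"
    )


# Example: a container limited to 4 GB of memory (hypothetical value).
print(mongod_cache_config(4))
```

The printed fragment can be merged into the configuration file that the containerized mongod reads at startup.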