2021 Review of paid bioinformatics SaaS

latch.ai
Mar 14, 2021

The goal of this post is to elucidate three things:

  1. What paid SaaS is used in bioinformatics?
  2. What do these tools/platforms do?
  3. How much do they cost?

Important caveat: Many SaaS tools meant for broader applications (e.g. Excel, Dropbox, AWS) are used within bioinformatics. While this post does not include these, there’s good evidence that they account for a considerable chunk of SaaS spending in biology and life sciences.

Definition: “paid SaaS” in this article refers to any paid software tool, platform, or infrastructure used in bioinformatics, in both academic research and industry applications. Open-source tools are listed at the bottom.

Sources: This post draws on primary and secondary sources:

  1. 20 interviews with bioinformaticians
  2. Online research, reports, reviews, and Reddit (citations below)

What paid SaaS is used in biology?

TL;DR: Mostly laboratory information management systems (LIMS), data management, cloud infrastructure, and workflow automation tools.

Listed in no particular order.

LIMS

Benchling

  • A cloud-based informatics platform for life sciences research and development. Strong data management & sharing.
  • Pricing: Premium plan is $20,000/year. Free for individuals.

Caliber

  • Cloud-based laboratory information management system that is designed for pharmaceutical companies
  • Pricing: $49/month

LabGuru

  • Web-based platform to record and manage laboratory data in one place. Scientists can design experiments and workflows with an electronic lab notebook (ELN), capture structured and unstructured data, manage projects, and share their work in one user interface.
  • Pricing: $10/month per user

Many more LIMS are used, but they primarily offer clinical LIMS management and target big pharma, so their pricing is consultation-based and can run to thousands of dollars per licensed user. More examples are

Bioinformatics Workflows

QuiltData (YC)

  • Quilt solves the reproducibility & versioning crisis for teams working with huge, varied datasets. Quilt transforms data into versioned, reusable datasets so that teams in ML and computational biology can iterate faster, reduce errors, and deploy smarter models. The whole platform is really a Python API, web catalog, and backend stack to manage data in S3. Used by the Allen Institute, Netguru, Ribon Therapeutics, Celsius Therapeutics, and more.
  • Pricing: $1000/month for unlimited data & unlimited users.

Geneious

  • They kill sequence analysis & MCB. Geneious makes intuitive genomics tools for Sanger, NGS, and long-read sequence analysis, including pairwise & multiple alignments, de novo assembly, mapping, expression analysis, variant calling, NGS visualization, automatic annotation, and phylogenetic tree building.
  • They also offer cloning & primer design for molecular biology, as well as simplified data management from many different file formats to implement into a shared, browsable, and versioned database.
  • Pricing: Academic is $3,800/year per 10 seats; Industry is $13,500/year per 10 seats. Personal plans at $200/$450/$1,500/year.

SnapGene

  • SnapGene solves the problem of confusing, irreproducible, and scattered molecular cloning. They simply offer better cloning simulations for molecular biology. SnapGene lets you design, simulate, and test cloning procedures in a simplified visualization of your cloning task. It also automates documentation, so you can see and share every sequence edit and cloning procedure that led to your final plasmid.
  • Pricing: For academic teams, $1395/year for 10 seats. For industry, $11,950/year for 10 seats. For students, $149/year for 1 seat.
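The Geneious and SnapGene team plans above are quoted per 10 seats, which makes them awkward to compare with single-seat plans. The short Python sketch below splits each quoted annual price evenly across its seats. The prices are copied from this post; `TEAM_PLANS` and `per_seat_cost` are hypothetical names for illustration, and the even split assumes every seat is actually used.

```python
# Annual team-plan prices quoted in this post (USD/year, 10 seats each).
# TEAM_PLANS and per_seat_cost are hypothetical names, used only for illustration.
TEAM_PLANS = {
    ("Geneious", "academic"): 3800,
    ("Geneious", "industry"): 13500,
    ("SnapGene", "academic"): 1395,
    ("SnapGene", "industry"): 11950,
}
SEATS_PER_PLAN = 10


def per_seat_cost(annual_price: float, seats: int = SEATS_PER_PLAN) -> float:
    """Split a team plan's annual price evenly across its seats (assumes all seats are used)."""
    if seats <= 0:
        raise ValueError("seats must be positive")
    return annual_price / seats


if __name__ == "__main__":
    # Print one line per product/tier with the per-seat yearly cost.
    for (product, tier), price in TEAM_PLANS.items():
        print(f"{product:9} {tier:9} ${per_seat_cost(price):,.2f}/seat/year")
```

Run as a script, it prints $380.00 and $1,350.00 per seat per year for Geneious (academic, industry) and $139.50 and $1,195.00 for SnapGene. By this arithmetic, an academic SnapGene team seat comes out slightly cheaper than the $149/year student seat.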

DNAStar

  • Offers molecular biology, protein analysis, genomic analysis, and transcriptomics software for the life sciences. They seem to have robust solutions, with ~1,300 universities, 500 biotech/pharma companies, and 80,000 citations of their tooling. Nothing completely novel that we haven’t seen before (annotation, cloning, prediction, visualization), but it seems to be an “all in one” kind of solution.
  • Pricing: Tiered. For academics, ranges from $250/year to $2,400/year. For industry, ranges from $600/year to $3,999/year.

Sentieon

  • Though its UI/UX is outdated, Sentieon’s strength is improving speed and reducing cost. The company maintains a wedge in genome sequencing, molecular diagnostics, pharmaceutical, and biotech companies with its ultra-efficient variant calling and alignment tools. According to Reddit users, the big benefit they offer is a “vastly faster” genome analysis toolkit, since they make it easy to run multithreaded sequence analysis in the cloud. (source)
  • Pricing: You must contact them to get pricing (boo)

DNAStack

  • End-to-end software solution for simplified genome sequencing workflows. Partnered with Sentieon, they shifted focus last year to developing genomics-based diagnostics and treatments for COVID-19. Looks like they have about 15 employees.
  • Pricing: You have to request a demo to get pricing (boo)

FlowJo

  • The de facto solution for flow cytometry analysis. They are now trying to diversify into single-cell RNA sequencing analysis. They built SeqGeq, their proprietary scRNA-seq tool, which appears to have substantial industry use.
  • Pricing: FlowJo is $2695 with academic discount. SeqGeq is $750/year.

USearch

  • Cited by 14,300 papers, USearch is a high-throughput sequence search and clustering analysis tool. Over 60,000 users have registered to try it, with UC Berkeley and many other universities being customers.
  • Pricing: $1485 USD for technical support & a single license for use.

BioRender

  • Used by Roche, Genentech, Stanford, Cambridge, Princeton, and NIH, and recommended on Reddit, BioRender is a scientific figure design tool. It enables editable and customizable scientific visualizations. Canva for science.
  • Pricing: $35/month per individual, $99/month per lab, custom for institution

Schrödinger Maestro

  • Computational drug discovery platform used by biopharma, based on physics-based prediction and machine learning and aimed at accelerating the discovery and optimization of chemical matter in silico. Strengths are molecular exploration, faster lead discovery, and property prediction. Used by over 1,200 academic research institutions.
  • Pricing: Annual contract value sits at over $100,000.

PetaGene

  • Decreases the size of genomic data, reducing storage costs and data transfer times by up to 90% while maintaining data integrity. They do this with multi-headed Linux software they call PetaSuite, which losslessly compresses BAM and FASTQ files for massive storage savings.
  • Pricing: $160 per TB of data compression
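To put PetaGene’s numbers side by side, here is a rough Python estimate of the compression fee and the remaining storage footprint for a given volume of BAM/FASTQ data. It assumes the $160 price is charged per TB of input data and treats the 90% reduction as a best-case ceiling rather than a typical result; `estimate` and `CompressionEstimate` are hypothetical names, and real compression ratios depend on the data.

```python
from dataclasses import dataclass

# Assumption: the quoted $160/TB is charged per TB of *input* (uncompressed) data.
PRICE_PER_TB_USD = 160.0
# "Up to 90%" is treated as a best-case ceiling, not a guaranteed ratio.
MAX_REDUCTION = 0.90


@dataclass
class CompressionEstimate:
    fee_usd: float        # one-off compression fee under the pricing assumption above
    compressed_tb: float  # storage footprint left after compression
    saved_tb: float       # storage freed by compression


def estimate(raw_tb: float, reduction: float = MAX_REDUCTION) -> CompressionEstimate:
    """Rough fee and footprint estimate for raw_tb terabytes of BAM/FASTQ data."""
    if raw_tb < 0:
        raise ValueError("raw_tb must be non-negative")
    if not 0.0 <= reduction <= MAX_REDUCTION:
        raise ValueError("reduction must be between 0 and the quoted 90% ceiling")
    compressed = raw_tb * (1.0 - reduction)
    return CompressionEstimate(
        fee_usd=raw_tb * PRICE_PER_TB_USD,
        compressed_tb=compressed,
        saved_tb=raw_tb - compressed,
    )
```

For example, `estimate(100)` gives a $16,000 fee and roughly 10 TB left on disk at the best-case 90% reduction, while `estimate(100, reduction=0.5)` leaves 50 TB.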

Rosalind Bio

  • Impressively designed software for gene expression analysis, NanoString analysis, single-cell analysis, ChIP-seq, ATAC-seq, and knowledge graph representations. Probably the most beautiful, best-designed UI I’ve seen. Used by NYU, Cornell, UCSD, Astellas, and more.
  • Pricing: From $499/year to $10,000/year

Apeer

  • Apeer does no-code workflow automation for biologists. Allows you to train deep learning algorithms for image segmentation. Then use these deep learning models to automate segmentation, processing, and operations. They also have insights to extract info from segmented regions, including data evaluation, analysis, and plotting.
  • Pricing: Free version + premium “Coming soon” with subscription price TBD

What unpaid SaaS is used in biology?

TL;DR: Hundreds of tools are available, but the highly cited ones with reliable results in research come out on top.

Galaxy

  • Web platform for data intensive biomedical research

AutoDock

  • Molecular modeling & ligand docking software

LakeFS

  • Transforms object storage into a Git-like repository; 833 stars on GitHub

BioDock.ai

  • Automated pipeline for phylogenomic analysis. Overcomes bottlenecks of large-space protein phylogenetic analysis. High-throughput, high-quality, and reproducible results. Used to derive phylogenetic information from metagenomic data sets.

Nextflow.io

  • Make scalable & reproducible scientific workflows with existing software containers and data-driven computational pipelines

KBase

  • Developed for bench biologists and bioinformaticians, KBase is a software and data science platform designed to meet the grand challenge of systems biology: predicting and designing biological function.

CellxGene

  • Interactive, performant explorer for single-cell transcriptomics data

Seurat

  • Multimodal single-cell genomics analysis. Most of the bioinformaticians we’ve met have used Seurat in their single-cell workflows.

ScanPy

  • Scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing.

Human Cell Atlas

  • The HCA Data Portal offers a community-generated, multi-omic, open dataset of 4.5 million cells

Allen Cell Explorer

  • 3D visualization tool for cells

Many many more…
