Public omics data is a valuable resource for drug research and development, especially for validating insights and training AI/ML models. The problem is that it is scattered across different repositories and comes in various formats (usually raw matrices), units, and metadata standards. As a result, scientists spend a lot of time and effort processing, standardizing, and harmonizing the data before they can actually analyze it, which can be a real roadblock.
Pythiomics was built with these challenges in mind. Developed and curated by Pythia Biosciences, it is a unified, one-stop multi-omics database that brings together data from different omics types and repositories. Pythiomics undergoes rigorous quality control, combining manual validation with AI-assisted methods for cell type prediction and standardization. Additionally, Pythiomics is easy to interactively explore through the C-DIAM Multi-Omics Studio platform, so researchers can dive in and start leveraging the data right away.
Pythiomics Quick Stats
As of January 2025, Pythiomics hosts over 10,000 omics datasets covering Bulk RNA-seq, Single-cell RNA-seq, Proteomics, and Spatial Transcriptomics, drawn from leading public databases. The single-cell database alone includes more than 104 million cells, spanning hundreds of disease subtypes. And with new omics types and repositories being added regularly, Pythiomics continues to grow.
How Does Pythiomics Transform Public Data Exploration?
Multiple Omics Types in the Same Place Enable Easier Cross-omics Insight Validation
Pythiomics solves the problem of fragmented data by offering a centralized database that hosts thousands of datasets of different omics types. Whether you are looking for bulk RNA-seq, single-cell RNA-seq, proteomics, or spatial transcriptomics data, everything is available and ready for exploration under the same directory. No more jumping between different databases—Pythiomics consolidates valuable resources, so you can focus on research without the hassle.
Currently, Pythiomics includes data from multiple omics types, such as Bulk RNA-seq, Single-cell RNA-seq, Proteomics, and Spatial Transcriptomics, sourced from different repositories. New omics types and repositories are continually being added.
Rigorous Harmonization, Standardization, and Cell Type Labeling for Seamless Data Integration and AI/ML Model Training
A key strength of Pythiomics is its harmonized metadata. The metadata across datasets undergoes meticulous manual curation and classification, with AI-driven approaches and algorithms supporting harmonization and cell type prediction. This creates a standardized database that simplifies data integration and enables meta-analysis across studies. It also provides reliable and unified training sets for cutting-edge AI/ML models.
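To make the idea of metadata harmonization concrete, here is a minimal Python sketch of one small piece of it: mapping free-text tissue labels onto a controlled vocabulary and flagging anything unrecognized for manual curation. This is an illustration under stated assumptions, not the Pythiomics pipeline. The synonym table, the field name "tissue", and the function names are all hypothetical.

```python
# Hypothetical synonym table (an assumption for illustration only).
# A real curation effort would typically map labels to an ontology.
TISSUE_SYNONYMS = {
    "colon": "colon",
    "large intestine": "colon",
    "colorectal": "colon",
    "pbmc": "blood",
    "peripheral blood": "blood",
    "whole blood": "blood",
}


def harmonize_tissue(raw_label):
    """Return a controlled-vocabulary term, or None if the label is unknown."""
    # Normalize case, underscores, and repeated whitespace before lookup.
    key = " ".join(raw_label.strip().lower().replace("_", " ").split())
    return TISSUE_SYNONYMS.get(key)


def harmonize_records(records):
    """Split records into harmonized rows and rows needing manual review."""
    harmonized, needs_review = [], []
    for record in records:
        term = harmonize_tissue(record.get("tissue", ""))
        if term is None:
            needs_review.append(record)  # left untouched for a human curator
        else:
            harmonized.append({**record, "tissue": term})
    return harmonized, needs_review


if __name__ == "__main__":
    samples = [
        {"sample": "S1", "tissue": "Large_Intestine"},
        {"sample": "S2", "tissue": " PBMC "},
        {"sample": "S3", "tissue": "tumour bed"},
    ]
    ok, review = harmonize_records(samples)
    print(ok)
    print(review)
```

Unknown labels are returned separately rather than guessed, which mirrors the combination of automated mapping and manual validation described above.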
A major milestone in this effort is the creation of a uniform, accessible single-cell database for solid tumors, which already includes 12 million cells from 1,427 cancer patients. This database continues to grow, with future expansion into additional therapeutic areas.
Instant and Interactive Access, for Any Scientist Who Needs It
Most existing omics databases provide data in raw formats, which require tremendous processing effort before scientists can visualize the data and extract insights.
Pythiomics isn't just about providing data; it's about making that data easy to explore. The database is hosted on the C-DIAM Multi-Omics Studio platform, a modern web platform that supports interactive and comprehensive analysis of multi-omics data. Scientists can simply browse the data and instantly analyze it through an easy-to-use graphical UI with a rich set of state-of-the-art machine learning algorithms and analysis workflows.
Example: Meta-analysis of CRC multi-omics data to validate emerging CRC tumor markers
In this example, we leveraged the Pythiomics database to extract relevant colorectal cancer (CRC) datasets for validating 25 emerging CRC tumor markers. Using the intuitive GUI of the C-DIAM Multi-Omics Studio, we were able to quickly gather 6 different omics datasets for this analysis.
From these 6 datasets, using the C-DIAM multi-omics integration dashboards, we identified 9 potential tumor markers that were consistently upregulated across all 6 datasets. Explore our full work here: 25 Emerging Tumor Markers for Colorectal Cancer Tested across Omics
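For readers who want to see the logic behind "consistent upregulation across datasets", the following Python sketch shows one simple way such a filter could be expressed. It is not the C-DIAM implementation. It assumes each dataset has already been reduced to a log2 fold change (tumor vs. normal) per marker, and the function name, the threshold parameter, and the placeholder marker names and values are hypothetical.

```python
def consistently_upregulated(log2fc_by_dataset, min_log2fc=0.0):
    """Return markers whose log2 fold change exceeds min_log2fc in every dataset.

    log2fc_by_dataset maps a dataset name to {marker: log2 fold change}.
    A marker that is missing from any dataset is excluded.
    """
    datasets = list(log2fc_by_dataset.values())
    if not datasets:
        return []
    # Keep only markers measured in all datasets.
    shared = set(datasets[0]).intersection(*datasets[1:])
    return sorted(
        marker
        for marker in shared
        if all(d[marker] > min_log2fc for d in datasets)
    )


if __name__ == "__main__":
    # Toy values with placeholder marker names, not real results.
    toy = {
        "bulk_rnaseq": {"GENE_A": 1.2, "GENE_B": -0.3, "GENE_C": 0.8},
        "single_cell": {"GENE_A": 0.5, "GENE_B": 0.9, "GENE_C": 0.1},
        "proteomics": {"GENE_A": 2.0, "GENE_B": 0.4},
    }
    print(consistently_upregulated(toy))
```

A real meta-analysis would usually also account for statistical significance and effect-size heterogeneity; the sketch covers only the direction-of-change criterion.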
Access Pythiomics
Pythiomics is currently hosted on the C-DIAM Multi-Omics Studio platform and will also be made available through direct data delivery. For access requests and database inquiries, please follow the link below: