#datasets — oanor

Gene Expression API

Functional-genomics experiments as an API, powered by NCBI GEO (Gene Expression Omnibus), the largest public repository of gene-expression data. GEO archives expression series and curated datasets from microarray and high-throughput-sequencing experiments across a wide range of organisms. Search experiments by keyword and optionally by organism, and look up any series or dataset to get its metadata: title, summary, assay type (expression profiling by array or by sequencing), organism, number of samples, platform and the publication behind it. From β-cell stress studies to cancer transcriptomics across human and mouse, it turns the GEO archive into a simple search-and-fetch API for transcriptomics, bioinformatics and research-data discovery. This is a gene-expression and functional-genomics dataset repository, distinct from sequence (ENA), variant (ClinVar, dbVar), structure (PDB) and ontology databases. Open data from NCBI GEO (public domain).

api.oanor.com/geodatasets-api

DataCite API

DataCite as an API — the global registry of DOIs (Digital Object Identifiers) for research outputs. Where Crossref registers DOIs for journal articles, DataCite registers and describes DOIs for research data, software, samples, dissertations, preprints, models, images and other outputs, from repositories such as Zenodo, Dryad and thousands of institutions. /v1/search?query=climate full-text searches the registry and can be narrowed by resource type (type=dataset, software, text, image, audiovisual, collection, model and more), returning each DOI with its title, type, creators, publisher and publication year. /v1/doi?id=10.5281/zenodo.3509134 returns a single DOI's full metadata — title, resource type, creators, publisher, publication year, description, subjects, version, license and registration date. DOIs look like 10.5281/zenodo.3509134 (Zenodo) or 10.5061/dryad.xxxx (Dryad). Ideal for research-data discovery and citation, data-repository and reference-management tools, software-citation features and reproducibility workflows. Metadata is CC0 from DataCite. This is the registry of research data and software DOIs — distinct from the journal-article DOI index (Crossref) and from preprint and open-access services.

api.oanor.com/datacite-api
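
As a rough sketch of how the two DataCite endpoints described above might be called, the Python below only builds the request URLs. The /v1/search and /v1/doi paths, the query, type and id parameters and the example DOI come from the description; the https:// scheme and the helper names search_url and doi_url are assumptions made for illustration. Authentication and response parsing are left out because the listing does not describe them.

```python
from urllib.parse import urlencode

# Assumption: the base path listed for this API is served over HTTPS.
BASE_URL = "https://api.oanor.com/datacite-api"


def search_url(query, resource_type=None):
    """Build a /v1/search URL, optionally narrowed with the `type` filter."""
    params = {"query": query}
    if resource_type is not None:
        params["type"] = resource_type
    return f"{BASE_URL}/v1/search?{urlencode(params)}"


def doi_url(doi):
    """Build a /v1/doi URL for one DOI, leaving '/' unescaped as in the text."""
    return f"{BASE_URL}/v1/doi?{urlencode({'id': doi}, safe='/')}"


print(search_url("climate", resource_type="dataset"))
print(doi_url("10.5281/zenodo.3509134"))
```

The resulting strings can be passed to any HTTP client; keeping URL construction separate makes it easy to check the request before sending it.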

BioStudies API

BioStudies as an API, powered by EMBL-EBI — the database that holds the descriptions of biological studies and links their data together across EBI resources, including imaging (BioImage Archive), functional genomics (ArrayExpress), proteomics, and the literature (Europe PMC). Each study has an accession, a title and abstract, the collection it belongs to and links to its underlying data and publications. /v1/search?query=covid searches the studies and returns each match's accession (e.g. S-EPMC8017430), title, author, study type, release date and link/file counts. /v1/study?id=S-EPMC8017430 returns a study's metadata — its accession, the collection it belongs to (such as EuropePMC, ArrayExpress or BioImages), title, abstract, release date, authors and the number of linked resources. Accessions look like S-EPMC8017430 or S-BSST123; get one from the search endpoint. Ideal for research-data discovery, linking literature to its underlying datasets, systematic reviews and reproducibility tooling. Data from EMBL-EBI BioStudies (public). This is a studies and datasets metadata index — distinct from the sequence (UniProt, ENA), structure (PDB, EMDB), variant (ClinVar) and ontology databases.

api.oanor.com/biostudies-api
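
The search-then-fetch flow described for BioStudies could be sketched as follows. The endpoint paths, the query and id parameters and the two example accessions are taken from the description; the https:// scheme, the helper names and the accession pattern (inferred only from S-EPMC8017430 and S-BSST123) are assumptions, so real accessions may take shapes this check rejects.

```python
import re
from urllib.parse import urlencode

# Assumption: the base path listed for this API is served over HTTPS.
BASE_URL = "https://api.oanor.com/biostudies-api"

# Hypothetical pattern inferred from the two example accessions only:
# "S-", then uppercase letters, then digits.
ACCESSION_RE = re.compile(r"S-[A-Z]+\d+")


def looks_like_accession(value):
    """Return True if value has the shape of the example accessions."""
    return ACCESSION_RE.fullmatch(value) is not None


def search_url(query):
    """Build a /v1/search URL for a keyword query."""
    return f"{BASE_URL}/v1/search?{urlencode({'query': query})}"


def study_url(accession):
    """Build a /v1/study URL, rejecting strings that do not look like accessions."""
    if not looks_like_accession(accession):
        raise ValueError(f"unexpected accession shape: {accession!r}")
    return f"{BASE_URL}/v1/study?{urlencode({'id': accession})}"


print(search_url("covid"))
print(study_url("S-EPMC8017430"))
```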

Hugging Face API

The Hugging Face Hub as an API — the central, open registry of machine-learning models and datasets that powers much of the modern AI ecosystem. This API wraps the public huggingface.co Hub into clean JSON. /v1/models searches the Hub's models and lets you filter by task (pipeline_tag — e.g. text-generation, text-to-image, image-classification, automatic-speech-recognition, sentence-similarity) and by library (transformers, diffusers, sentence-transformers, …), sorted by downloads, likes, last-modified, created or trending score — each model returned with its id, author, task, library, download and like counts, license, tags and timestamps. /v1/model?id=google-bert/bert-base-uncased returns a single model's full metadata. /v1/datasets searches ML datasets the same way, and /v1/dataset?id=ILSVRC/imagenet-1k returns a single dataset's metadata. Ids are in org/name form (take them from the search endpoints). Ideal for ML and MLOps tooling, model-discovery and comparison sites, AI leaderboards and dashboards, and AI assistants that recommend models. Data comes from the public Hugging Face Hub (free to use). This is the AI/ML model and dataset hub — distinct from software-package registries (npm, PyPI, Maven, NuGet) and academic paper indexes (arXiv).

api.oanor.com/huggingface-api
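
A possible way to assemble requests for the model endpoints is sketched below. The /v1/models and /v1/model paths, the id parameter and the example model id come from the description. The query-parameter names search, pipeline_tag, library and sort are assumptions: the description names the filters but does not spell out how they are passed. The https:// scheme and the helper names are assumptions as well.

```python
from urllib.parse import urlencode

# Assumption: the base path listed for this API is served over HTTPS.
BASE_URL = "https://api.oanor.com/huggingface-api"


def models_url(search=None, pipeline_tag=None, library=None, sort=None):
    """Build a /v1/models URL; every parameter name here is hypothetical."""
    candidates = {
        "search": search,
        "pipeline_tag": pipeline_tag,
        "library": library,
        "sort": sort,
    }
    # Drop unset filters so only the requested ones reach the query string.
    params = {key: value for key, value in candidates.items() if value is not None}
    query = urlencode(params)
    return f"{BASE_URL}/v1/models" + (f"?{query}" if query else "")


def model_url(model_id):
    """Build a /v1/model URL for an id in org/name form."""
    org, sep, name = model_id.partition("/")
    if not (org and sep and name) or "/" in name:
        raise ValueError(f"expected an id in org/name form, got {model_id!r}")
    return f"{BASE_URL}/v1/model?{urlencode({'id': model_id}, safe='/')}"


print(models_url(pipeline_tag="text-generation", sort="downloads"))
print(model_url("google-bert/bert-base-uncased"))
```

The same pattern would apply to /v1/datasets and /v1/dataset, which the description says work the same way.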

MGnify API

MGnify as an API, powered by EMBL-EBI — the world's largest free resource for the analysis and archiving of microbiome sequencing data, and the metagenomics sister to PRIDE (proteomics) and MetaboLights (metabolomics). MGnify holds tens of thousands of public metagenomics and metabarcoding studies spanning the human gut microbiome, marine and freshwater environments, soils, wastewater, the built environment and host-associated communities. Search the studies by keyword, getting each study's MGnify accession (MGYS...), name, abstract, biome, sample count and the source sequencing BioProject; read a study's full metadata including its name and abstract, biome classification, number of samples, submitting centre, public status, data origination and last-update date; and browse the GOLD-style biome classification tree — from root:Host-associated:Human:Digestive system to root:Environmental:Aquatic:Marine — with per-biome sample and study counts, for discovery by environment. Ideal for microbiome and environmental-genomics research, dataset reuse and meta-analysis, bioinformatics pipelines and teaching. Study accessions look like MGYS00006862. Data from EMBL-EBI MGnify.

api.oanor.com/mgnify-api

EU Open Data API

The European Union open-data portal as an API, powered by data.europa.eu, the official single point of access to more than 1.8 million open datasets published by the EU institutions and harvested from the national open-data portals of the 27 member states and other European countries (sources include data.gov.uk, data.gouv.fr and GovData Germany). Search datasets across every theme (energy, health, transport, environment, agriculture, economy, justice and more) with optional filters by file format and by publishing country, getting each dataset's identifier, English title and description, publisher, source portal, country, available formats, resource count, last-modified date and licence; read a dataset's full metadata together with all of its downloadable distributions (each distribution's title, format and direct URL), plus categories, keywords, languages and temporal coverage; and explore discovery facets for any query: the most common file formats and the countries publishing matching datasets. Ideal for data journalism, civic-tech and govtech applications, research, market and policy analysis, and any tool that needs to find and download European public-sector information. Dataset identifiers come from search results; titles and descriptions are returned in English where available. Data from data.europa.eu (licences vary per dataset; most are CC-BY or public domain).

api.oanor.com/eudata-api

MetaboLights API

MetaboLights as an API, powered by EMBL-EBI — the world's premier open repository for metabolomics experiments (NMR spectroscopy and mass spectrometry) and a sister resource to PRIDE for proteomics. Search the public metabolomics studies by keyword (returning each study's accession, title, description and organism); read a study's full metadata including its abstract, status, submission and release dates, study-design descriptors, experimental factors, the analytical assays with their measurement type, technology and platform, the contributors and their roles, the linked publications with DOI and PubMed identifiers, submitters, sample count, FTP download URL and data license; inspect the analytical workflow — every protocol with its name, type, description and parameters (sample collection, extraction, chromatography, NMR/MS spectroscopy, data transformation and metabolite identification); and list the organisms and organism parts studied with their ontology terms. Ideal for metabolomics and systems-biology research, dataset reuse and meta-analysis, bioinformatics pipelines and tools that integrate experimental evidence. Study accessions look like MTBLS1. Data from EMBL-EBI MetaboLights.

api.oanor.com/metabolights-api

PRIDE API

The PRIDE proteomics archive as an API, powered by the EMBL-EBI PRIDE Archive — the world's largest public repository of mass-spectrometry proteomics data and a founding member of ProteomeXchange. Search the public proteomics experiments by keyword (returning each project's accession, title, organisms, diseases and instruments); read a project's full metadata including its description, keywords, organisms and organism parts, mass-spectrometry instruments, software, the protein modifications identified, sample- and data-processing protocols, submitters, affiliations and the linked publication (DOI and PubMed); list a project's data files with their category, format, size and a direct download link; and explore facets — the diseases, organisms, instruments, experiment types, software and countries represented across matching projects — for discovery. Ideal for proteomics and systems-biology research, dataset reuse and meta-analysis, bioinformatics pipelines, and tools that integrate experimental evidence. Project accessions look like PXD000001. Data from EMBL-EBI.

api.oanor.com/pride-api
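
The accession shapes quoted in the MGnify, MetaboLights and PRIDE entries (MGYS00006862, MTBLS1, PXD000001) differ by prefix, so a client could route an accession to the matching API before making a request. The sketch below assumes patterns inferred from those single examples only; the names ACCESSION_PATTERNS, BASE_PATHS and classify_accession are hypothetical, and no endpoint paths are built because the listing does not give any for these three APIs.

```python
import re

# Hypothetical patterns, each inferred from the one example in the listing.
ACCESSION_PATTERNS = {
    "mgnify": re.compile(r"MGYS\d+"),
    "metabolights": re.compile(r"MTBLS\d+"),
    "pride": re.compile(r"PXD\d+"),
}

# Base paths exactly as listed for each API.
BASE_PATHS = {
    "mgnify": "api.oanor.com/mgnify-api",
    "metabolights": "api.oanor.com/metabolights-api",
    "pride": "api.oanor.com/pride-api",
}


def classify_accession(accession):
    """Return the key of the archive whose pattern matches, or None."""
    for archive, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(accession):
            return archive
    return None


for example in ("MGYS00006862", "MTBLS1", "PXD000001"):
    archive = classify_accession(example)
    print(example, archive, BASE_PATHS[archive])
```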