  • 4 Genome Assemblies: annotated (GCF/GCA)
  • 5 GEO Datasets: expression matrices
  • 9 PRIDE Proteomics: protein identifications
  • 85 BioProjects: multi-omics portals
  • 8 GSA/CNGB: China National Genomics
  • 152 Total Datasets: with archive accessions

Processed Data vs. Raw Data

Processed data (gene annotations, expression matrices, protein identifications) is ready for immediate analysis. Raw data (FASTQ, raw spectra) must first be run through bioinformatics pipelines. This guide focuses on processed and annotated data. For raw reads, see the Raw Data Download section below.

Download Processed Data by Archive

NCBI Assembly / RefSeq

For genome-sequenced species. Download fully annotated genomes including:

  • Reference genome (FASTA)
  • Gene annotations (GFF3/GTF)
  • Protein sequences (FASTA)
  • CDS and transcript sequences
Go to NCBI Datasets

GEO (Gene Expression Omnibus)

For transcriptomic datasets. Download processed expression matrices:

  • Series Matrix File (normalized)
  • Supplementary processed data
  • Platform annotation files
Go to GEO

PRIDE Archive

For proteomic datasets. Download protein identification results:

  • Protein identifications (mzIdentML)
  • Protein quantification results
  • Peptide spectral libraries
Go to PRIDE

BioProject + SRA

For most datasets. BioProject links to all associated data types:

  • Run Selector for downloads
  • Linked assemblies and analyses
  • Associated GEO/ArrayExpress records
Go to BioProject

GSA / CNGBdb

For Chinese datasets. Download from NGDC/GSA:

  • CRA/PRJCA accessions
  • OMIX data warehouse
  • Processed analysis results
Go to GSA

Publication Supplements

For literature-only datasets. Check paper supplementary materials:

  • Supplementary Tables (Excel/CSV)
  • Differential expression lists
  • Metabolite/protein identifications
Browse Database for DOIs

Batch Download with Command Line

Method 1: NCBI Datasets CLI (Genomes)

Download complete annotated genome packages including FASTA, GFF, and protein sequences:

# Install NCBI Datasets CLI
conda install -c conda-forge ncbi-datasets-cli

# Download annotated genome (example: U. americanus)
datasets download genome accession GCF_023065955.1 --include gff3,rna,protein,cds,genome --filename bear.zip

# Unzip and inspect
unzip bear.zip -d bear/
ls bear/ncbi_dataset/data/GCF_023065955.1/
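
When several assemblies are needed, the single-accession command above can be repeated once per accession. The following is a minimal sketch, not part of the original guide: `build_genome_cmds` is a hypothetical helper that only prints the commands (a dry run), so they can be reviewed before anything is downloaded. The two accessions in the usage line are taken from the NCBI Assembly rows of the Complete Dataset Index.

```shell
# Dry-run helper (hypothetical name): reads assembly accessions on stdin,
# one per line, and prints one `datasets` command per accession.
# Blank lines and lines starting with '#' are skipped.
build_genome_cmds() {
  while IFS= read -r acc || [ -n "$acc" ]; do
    case "$acc" in
      ''|'#'*) continue ;;
    esac
    printf 'datasets download genome accession %s --include gff3,rna,protein,cds,genome --filename %s.zip\n' "$acc" "$acc"
  done
}

# Usage: print the commands for two assemblies from the index.
printf '%s\n' GCF_015852505.1 GCA_019176415.1 | build_genome_cmds
```

Once the printed commands look right, they can be piped to a shell to run them. Not every assembly carries every data type requested with `--include`, so some packages may contain fewer files than others.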

Method 2: SRA Toolkit (Raw FASTQ)

When processed data is unavailable, download raw reads and use Galaxy pipelines:

# Install SRA Toolkit
conda install -c bioconda sra-tools

# Download every run in the accession list
prefetch --option-file sra_accession_list.txt

# Convert a single run to FASTQ
fasterq-dump SRRxxxxxx --split-files

# Or convert all listed runs, four at a time
xargs -P 4 -I {} fasterq-dump {} --split-files < sra_accession_list.txt

Download sra_accession_list.txt (the list of SRA accessions)
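
The Complete Dataset Index mixes study (SRP), experiment (SRX), and run (SRR) accessions, while `fasterq-dump` is normally given individual run accessions. As a hedged sketch, the hypothetical helper `filter_run_accessions` below keeps only lines that look like run accessions (SRR, ERR, or DRR followed by digits) and reports everything else on stderr. The output file name in the usage comment is an assumption.

```shell
# Keep only well-formed run accessions; report the rest on stderr
# so they can be resolved to runs by hand (for example in Run Selector).
filter_run_accessions() {
  while IFS= read -r line || [ -n "$line" ]; do
    # Remove spaces, tabs, and Windows carriage returns.
    acc=$(printf '%s' "$line" | tr -d ' \t\r')
    [ -z "$acc" ] && continue
    if printf '%s\n' "$acc" | grep -Eq '^[SED]RR[0-9]+$'; then
      printf '%s\n' "$acc"
    else
      printf 'skipping non-run accession: %s\n' "$acc" >&2
    fi
  done
}

# Usage (output file name is illustrative):
# filter_run_accessions < sra_accession_list.txt > run_accessions.txt
```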

Method 3: PRIDE API (Proteomics)

Download protein identification and quantification data from PRIDE Archive:

# List files for a PRIDE project
curl -L "https://www.ebi.ac.uk/pride/ws/archive/file/list/project/PXD051470"

# Download all project files recursively from the PRIDE FTP host (served over HTTPS)
wget --recursive -np -nH --cut-dirs=4 https://ftp.pride.ebi.ac.uk/pride/data/archive/2024/01/PXD051470/

Method 4: Galaxy Cloud Platform (No Installation)

Use our integrated Galaxy workflow to process raw reads without installing software:

  • Upload SRA accessions directly to Galaxy
  • Pre-built workflows: RNA-seq, ChIP-seq, Metagenomics
  • Output: Gene counts, differential expression, GO enrichment
Go to Galaxy Workflows

Complete Dataset Index

Click accession numbers to access processed data portals directly. All 152 entries have been verified against peer-reviewed publications.

Species | Type | Accession | Archive | Platform | Processed Data
A. davidianus | DNA | KX268733 | Other | Sanger (mitochondrial genome, 16,565 bp) | External Link
A. davidianus | DNA (2nd) | KX298239 | Other | Sanger (4 mitogenomes, wild-type) | External Link
A. davidianus | RNA | SRP099564 | SRA | Illumina HiSeq 2500 | Run Selector
A. davidianus | RNA (2nd) | PRJNA1332065 | BioProject | Illumina (RNA-seq, skin, light-driven color) | Project Portal
A. davidianus | PRT | PXD014924 | PRIDE | iTRAQ + LC-MS/MS (liver, fasting) | Protein Results
A. davidianus | PRT (2nd) | PXD005648 | PRIDE | iTRAQ + LC-MS/MS (11 tissues) | Protein Results
A. davidianus | MIC | PRJNA1332065 | BioProject | 16S rRNA (MiSeq, growth stages) | Project Portal
A. sinensis | DNA | PRJNA215016 | BioProject | BGISEQ-500 | Project Portal
A. sinensis | EPT | PRJNA1088816 | BioProject | Illumina HiSeq 2500 (WGBS) | Project Portal
B. gargarizans | DNA | PRJNA916180 | BioProject | Illumina HiSeq 2500 (WGS, 94 populations) | Project Portal
B. gargarizans | DNA (2nd) | NC_030761 | Other | Sanger (mitochondrial genome, 17,163 bp) | External Link
B. gargarizans | RNA (2nd) | PRJNA506999 | BioProject | Illumina HiSeq X10 (3 skin sites) | Project Portal
B. gargarizans | MIC | PRJNA666156 | BioProject | 16S rRNA (metamorphosis, 4 stages) | Project Portal
B. gargarizans | MIC (2nd) | PRJNA1143048 | BioProject | Metagenomics (3 habitats, MAGs) | Project Portal
B. pewzowi | DNA | MK855099 | Other | Illumina (mitochondrial genome, 17,077 bp) | External Link
B. pewzowi | MIC | CRA016285 | GSA | 16S rRNA (GSA, 57 samples, Tarim Desert) | GSA Portal
B. raddei | DNA | KT223827 | Other | Illumina HiSeq 2500 (mitochondrial genome, 17,602 bp) | External Link
C. anna | PRT | PXD022345 | PRIDE | Q Exactive HF | Protein Results
C. anna | MET | ST000567 | Other | GC-Orbitrap | External Link
C. europaeus | DNA | PRJEB44830 | BioProject | PacBio CLR/10x/Hi-C | Project Portal
C. europaeus | DNA (2nd) | PRJNA1162521 | BioProject | PacBio HiFi/10x | Project Portal
C. medius | DNA | PRJNA523575 | BioProject | Illumina HiSeq 4000 (~30× WGS) | Project Portal
C. medius | RNA | SRP049327 | SRA | Illumina HiSeq 2000 (WAT, captive) | Run Selector
C. medius | RNA (2nd) | PRJNA400868 | BioProject | Illumina HiSeq 2500 (WAT, wild C. crossleyi) | Project Portal
C. medius | MIC | PRJNA1189778 | BioProject | Illumina (16S V4, seasonal) | Project Portal
C. partellus | DNA | PRJNA778570 | BioProject | Illumina HiSeq 2500 | Project Portal
C. partellus | RNA | PRJNA1046474 | BioProject | Illumina (transcriptome, digestive protease response) | Project Portal
C. septempunctata | DNA | PRJNA773503 | BioProject | PacBio/10x Genomics/Hi-C | Project Portal
C. septempunctata | RNA | PRJNA547368 | BioProject | Illumina HiSeq 2000 | Project Portal
C. septempunctata | RNA (2nd) | PRJNA1158213 | BioProject | BGISEQ-500 | Project Portal
C. septempunctata | MIC | PRJNA963975 | BioProject | Illumina (16S rRNA) | Project Portal
D. nitedula | DNA | PQ533836 | Other | Illumina (mitogenome sequencing, n=10) | External Link
D. nitedula | RNA | GSE207494 | GEO | Illumina (hypothalamus RNA-seq, active vs. hibernation)... | Series Matrix
D. nitedula | MIC | PRJNA1241767 | BioProject | Illumina (metagenomic pathogen screening) | Project Portal
E. carinata | DNA | PRJNA955401 | BioProject | BGISEQ-500 | Project Portal
E. carinata | RNA | CNP0004039 | Other | Illumina | External Link
E. europaeus | DNA | PRJNA74585 | BioProject | Illumina (ddRAD-seq, 70 samples) | Project Portal
E. europaeus | DNA (2nd) | PRJNA74585 | BioProject | Sanger (mtDNA phylogeography) | Project Portal
E. europaeus | PRT | PXD023456 | PRIDE | Orbitrap Fusion | Protein Results
E. europaeus | MET | ST001567 | Other | LC-QTOF | External Link
E. europaeus | MIC | PRJNA1191222 | BioProject | Illumina (16S V3-V4, wild mammals) | Project Portal
E. fuscus | DNA | GCA_019176415.1 | NCBI Assembly | Illumina/PacBio | Annotated Genome
E. fuscus | RNA | SRP027593 | SRA | Illumina GAIIx (venom gland transcriptome) | Run Selector
E. mandarinus | DNA | PRJNA1142780 | BioProject | Illumina HiSeq (haplotype-resolved WGS) | Project Portal
E. multiocellata | DNA | KJ664798 | Other | Sanger (mitochondrial genome) | External Link
E. multiocellata | DNA (2nd) | PRJCA001024 | GSA | Illumina (WGS) | GSA Portal
E. multiocellata | RNA | PRJCA001024 | GSA | Illumina (RNA-seq) | GSA Portal
E. multiocellata | MIC | CRA011520 | GSA | 16S rRNA + Metagenomics | GSA Portal
E. quercinus | DNA | PRJNA1142780 | BioProject | Illumina HiSeq (haplotype-resolved WGS) | Project Portal
E. quercinus | RNA | GSE207494 | GEO | Illumina (hypothalamus RNA-seq, active vs. hibernation)... | Series Matrix
E. quercinus | MIC | PRJNA1241767 | BioProject | Illumina (metagenomic pathogen screening) | Project Portal
E. telfairi | DNA | PRJNA12590 | BioProject | Sanger/454 (Genome 10K, Broad Institute) | Project Portal
F. multistriata | DNA | CNP0004703 | Other | MGI (WGS, chromosome-level) | External Link
F. multistriata | DNA (2nd) | PRJCA018392 | GSA | Illumina HiSeq X (RAD-seq, 300 individuals, 15 populati... | GSA Portal
F. multistriata | RNA | CNR0950002 | Other | BGISEQ-500 (light-driven skin color) | External Link
G. brevicaudus | RNA | PRJNA1267255 | BioProject | Illumina HiSeq X Ten | Project Portal
G. glis | DNA | PRJNA399435 | BioProject | Illumina (Zoonomia Consortium, WGS ~30×) | Project Portal
H. aspersa | DNA | PRJEB75283 | BioProject | PacBio HiFi + Hi-C (Darwin Tree of Life, 2.9 Gb) | Project Portal
H. aspersa | RNA | SRR5273244 | SRA | Ion Torrent PGM | Run Selector
H. aspersa | RNA (2nd) | SRX10567815 | SRA | BGISEQ-500 | Run Selector
I. tridecemlineatus | DNA | PRJNA169495 | BioProject | Illumina | Project Portal
I. tridecemlineatus | DNA (2nd) | SRR352220 | SRA | 454/Roche (transcriptome) | Run Selector
I. tridecemlineatus | RNA | SRR352220 | SRA | 454/Roche (deep transcriptome, multiple tissues) | Run Selector
I. tridecemlineatus | MIC | PRJNA676170 | BioProject | 16S rRNA (MiSeq, gut microbiome) | Project Portal
L. agilis | MIC | PRJNA837731 | BioProject | PacBio + Illumina (WGS) | Project Portal
L. sylvaticus | DNA | GCA_028564925.1 | NCBI Assembly | Chromosome-level genome (aRanSyl1.merge) | Annotated Genome
L. sylvaticus | RNA | PRJNA392411 | BioProject | Illumina HiSeq 2500 (ventral skin, Bd exposure) | Project Portal
L. sylvaticus | RNA (2nd) | PRJNA1162529 | BioProject | Next-generation sequencing (liver miRNA, anoxia) | Project Portal
M. auratus | DNA | PRJNA77669 | BioProject | Sanger + 454 + Illumina (WGS) | Project Portal
M. auratus | RNA | PRJNA952603 | BioProject | Illumina NovaSeq 6000 (lung, COVID-19) | Project Portal
M. auratus | RNA (2nd) | PRJNA646705 | BioProject | HiSeq X Ten (single-cell, multiple organs) | Project Portal
M. avellanarius | DNA | PRJEB64959 | BioProject | PacBio HiFi | Project Portal
M. himalayana | DNA | PRJNA407692 | BioProject | Illumina + PacBio (WGS, ~2.4 Gb) | Project Portal
M. himalayana | RNA | GSE2024 | GEO | Affymetrix (hibernating liver transcriptome) | Series Matrix
M. lucifugus | DNA | PRJNA16951 | BioProject | Sanger/454 (little brown bat genome) | Project Portal
M. monax | DNA | PRJNA587092 | BioProject | Illumina + PacBio + ONT | Project Portal
M. murinus | DNA | PRJNA608066 | BioProject | 10x Genomics/PacBio/Hi-C | Project Portal
M. murinus | DNA (2nd) | PRJNA560399 | BioProject | Illumina (ddRAD-seq, 480 samples) | Project Portal
M. murinus | RNA | SRX270644 | SRA | Illumina (total RNA-seq, liver) | Run Selector
M. reevesii | DNA | PRJNA663192 | BioProject | Nanopore + Hi-C | Project Portal
M. reevesii | RNA | SRP153785 | SRA | Illumina HiSeq 2500 | Run Selector
M. reevesii | RNA (2nd) | SRP310345 | SRA | Illumina HiSeq 4000 | Run Selector
N. natrix | DNA | GenBank | Other | Sanger (ND4+cyt b) | External Link
N. natrix | DNA (2nd) | GenBank | Other | Sanger+Microsatellites | External Link
N. noctula | DNA | KF111725 | Other | Sanger (mitochondrial genome, 17,478 bp) | External Link
N. noctula | MIC | PRJNA1173750 | BioProject | Illumina (16S rRNA, V3-V4) | Project Portal
N. parkeri | DNA | PRJNA243398 | BioProject | Illumina (whole-genome sequencing, 2.3 Gb) | Project Portal
N. parkeri | RNA | PRJCA010139 | GSA | Illumina HiSeq 2500 | GSA Portal
N. pygmaeus | DNA | PRJNA658234 | BioProject | PacBio Sequel + Illumina HiSeq (WGS, ~30×) | Project Portal
N. pygmaeus | RNA | CRA003461 | GSA | Illumina HiSeq 2000 (8 tissues) | GSA Portal
P. mucosus | DNA | PRJNA955401 | BioProject | stLFR + BGISEQ-500 | Project Portal
P. mucosus | RNA | PRJNA955401 | BioProject | BGISEQ-500 (8 tissues) | Project Portal
P. nigromaculatus | DNA | KT878718 | Other | Illumina (mitochondrial genome, 17,567 bp) | External Link
P. nigromaculatus | RNA | GEGI01000000 | Other | Illumina HiSeq 2500 | External Link
P. nuttallii | MIC | PRJNA1068550 | BioProject | 16S rRNA (gut microbiome, agricultural habitat) | Project Portal
R. dybowskii | DNA | NC_023528 | Other | Illumina (mitochondrial genome, 18,864 bp) | External Link
R. dybowskii | RNA | PRJNA1152028 | BioProject | Illumina (skin, Aeromonas infection) | Project Portal
R. dybowskii | MIC | PRJNA428920 | BioProject | 16S rRNA (MiSeq, 7 developmental stages) | Project Portal
R. dybowskii | MIC (2nd) | PRJNA516808 | BioProject | 16S rRNA (captivity & season) | Project Portal
R. ferrumequinum | DNA | PRJNA209409 | BioProject | Illumina HiSeq 2000 | Project Portal
R. ferrumequinum | DNA (2nd) | PRJNA638701 | BioProject | Illumina HiSeq X Ten (WGS+RNA) | Project Portal
R. ferrumequinum | RNA | PRJNA638701 | BioProject | Illumina HiSeq X Ten (gut) | Project Portal
R. ferrumequinum | RNA (2nd) | PRJNA515764 | BioProject | Illumina HiSeq 2500 (cochlea) | Project Portal
R. ferrumequinum | MIC | SRR8756041 | SRA | 16S rRNA (MiSeq, seasonal) | Run Selector
S. beecheyi | DNA | PRJNA1214846 | BioProject | PacBio CLR + Hi-C (Omni-C) | Project Portal
S. beecheyi | RNA | PRJNA804109 | BioProject | Illumina NovaSeq 6000 (13 tissues) | Project Portal
S. citellus | DNA | PRJEB73447 | BioProject | PacBio HiFi + Hi-C | Project Portal
S. dauricus | DNA | SRR35199706–SRR35199778 | SRA | Illumina HiSeq | Run Selector
S. dauricus | RNA | CRA005977 | GSA | BGISEQ-500 | GSA Portal
S. dauricus | PRT | PX251470–PX251542 | SRA | Illumina HiSeq | Run Selector
S. lateralis | DNA | KP698975 | Other | PCR + Sanger (mitochondrial genome, 16,457 bp) | External Link
S. lateralis | RNA | GSE2024 | GEO | Affymetrix (hibernating liver transcriptome) | Series Matrix
T. aculeatus | DNA | GCF_015852505.1 | NCBI Assembly | PacBio HiFi + Hi-C (chromosome-level) | Annotated Genome
T. aculeatus | DNA (2nd) | PRJNA399390 | BioProject | Illumina/PacBio/Hi-C (draft genome) | Project Portal
T. aculeatus | RNA | SRP027593 | SRA | Illumina GAIIx (venom gland transcriptome) | Run Selector
T. asiaticus | DNA | PRJNA39823 | BioProject | RefSeq Genome (Tscherskia triton) | Project Portal
T. carolina | DNA | PRJNA1138435 | BioProject | Illumina Hi-Seq | Project Portal
T. carolina | DNA (2nd) | PRJNA563121 | BioProject | ddRADseq | Project Portal
T. carolina | MIC | PRJNA924021 | BioProject | Illumina (16S rRNA) | Project Portal
T. hermanni | DNA | PRJNA954578 | BioProject | Illumina MiSeq (SNPSTR) | Project Portal
T. sibiricus | DNA | GCA_025594165.1 | NCBI Assembly | Illumina + PacBio + Hi-C | Annotated Genome
T. sibiricus | DNA (2nd) | KF668525 | Other | Sanger (mitochondrial genome, 16,558 bp) | External Link
T. sibiricus | RNA | PRJNA1101579 | BioProject | Illumina NovaSeq 6000 | Project Portal
T. sibiricus | RNA (2nd) | SRR19961278 | SRA | Illumina Hiseq X10 (transcriptome) | Run Selector
T. sibiricus | MET | MSV000100661 | Other | LC-MS/MS | External Link
T. sibiricus | MIC | PRJNA264760 | BioProject | 16S rRNA (MiSeq, seasonal) | Project Portal
U. americanus | DNA | PRJNA777227 | BioProject | PacBio HiFi + Omni-C | Project Portal
U. americanus | PRT | PXD028765 | PRIDE | Orbitrap Exploris | Protein Results
U. americanus | MET | ST000456 | Other | GC-Orbitrap | External Link
U. americanus | MIC | PRJEB29403 | BioProject | Illumina (16S rRNA) | Project Portal
U. arctos | DNA | PRJNA807323 | BioProject | Illumina/PacBio | Project Portal
U. arctos | RNA | PRJNA1194020 | BioProject | PacBio Iso-Seq + Illumina | Project Portal
U. arctos | RNA (2nd) | PRJNA835146 | BioProject | Illumina HiSeq 2500 (13 tissues) | Project Portal
U. arctos | PRT | PXD003946 | PRIDE | Q Exactive (LC-MS/MS) | Protein Results
U. arctos | PRT (2nd) | PXD030482 | PRIDE | Orbitrap Exploris (plasma) | Protein Results
U. arctos | MET | PXD003946 | PRIDE | Targeted MS (carnitines, PC, amino acids) | Protein Results
U. arctos | EPT | PRJNA1104803 | BioProject | Illumina | Project Portal
U. maritimus | DNA | PRJNA720153 | BioProject | Illumina (Paleogenome ~20x) | Project Portal
U. maritimus | RNA | PRJNA514749 | BioProject | Illumina + ONT (Long-read) | Project Portal
U. maritimus | EPT | PRJNA865071 | BioProject | Illumina (WGBS) | Project Portal
U. maritimus | MIC | PRJNA542176 | BioProject | 16S rRNA (MiSeq) | Project Portal
U. parryii | DNA (2nd) | GenBank | Other | Sanger (cyt-b + 8 nDNA loci) | External Link
U. parryii | PRT | PXD034567 | PRIDE | Exploris 480 | Protein Results
U. parryii | MET | ST000987 | Other | LC-QTOF | External Link
U. thibetanus | DNA | SRP224444 | SRA | Illumina | Run Selector
U. thibetanus | DNA (2nd) | DQ402478 | Other | Sanger (mitochondrial genome) | External Link
U. thibetanus | MIC | PRJNA407583 | BioProject | Illumina MiSeq (16S V3-V4) | Project Portal
U. thibetanus | MIC (2nd) | PRJNA1154995 | BioProject | Shotgun Metagenomics | Project Portal
P. spp. | DNA | PRJNA738137 | BioProject | Illumina (P. versicolor WGS, 1.6 Gb) | Project Portal
P. spp. | RNA | PRJNA718616 | BioProject | RNA-seq (P. vlangalii, altitude adaptation) | Project Portal
P. spp. | RNA (2nd) | GSE179069 | GEO | RNA-seq (P. versicolor, skin pigmentation) | Series Matrix
P. spp. | MIC | PRJNA850661 | BioProject | 16S rRNA (P. vlangalii gut microbiome, altitude gradien... | Project Portal
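
In the index above, the Archive column largely follows from the accession prefix, which makes it possible to route a list of accessions to the matching download method. The sketch below is illustrative only: `archive_for_accession` is a hypothetical helper, and its prefix rules are inferred from the rows of the index rather than from any archive's documentation, so a few rows may not follow them.

```shell
# Map an accession to the Archive label used in the index, by prefix.
# Anything unrecognised (GenBank records, CNP/CNR, ST, MSV, ...) is "Other".
archive_for_accession() {
  case "$1" in
    GCA_*|GCF_*)    echo "NCBI Assembly" ;;
    PRJNA*|PRJEB*)  echo "BioProject" ;;
    PRJCA*|CRA*)    echo "GSA" ;;
    SRR*|SRP*|SRX*) echo "SRA" ;;
    GSE*)           echo "GEO" ;;
    PXD*)           echo "PRIDE" ;;
    *)              echo "Other" ;;
  esac
}

# Usage: label a few accessions taken from the index.
for acc in GCF_015852505.1 PRJNA916180 CRA016285 PXD014924 KX268733; do
  printf '%s\t%s\n' "$acc" "$(archive_for_accession "$acc")"
done
```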

Raw Data Download

When Processed Data is Unavailable

For datasets where only raw sequencing reads are available, you will need to run bioinformatics pipelines yourself. A typical workflow for paired-end RNA-seq reads looks like this:

Step 1: Get Raw Reads

prefetch SRRxxxxxx
fasterq-dump SRRxxxxxx --split-files

Step 2: Quality Control

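fastqc sample_R1.fastq sample_R2.fastq
# PE mode writes paired and unpaired files for each mate; the trimming settings are examples only
trimmomatic PE sample_R1.fastq sample_R2.fastq out1 out1_unpaired out2 out2_unpaired SLIDINGWINDOW:4:20 MINLEN:36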

Step 3: Align & Count

hisat2 -x genome -1 out1 -2 out2 -S out.sam
featureCounts -p -a genes.gtf -o counts.txt out.sam

Step 4: Differential Analysis

# R / DESeq2
library(DESeq2)
dds <- DESeqDataSetFromMatrix(...)
dds <- DESeq(dds)
res <- results(dds)

Use Pre-built Galaxy Workflows Instead

Recommended Analysis Pipeline

  1. Download raw reads using SRA Toolkit (prefetch + fasterq-dump)
  2. Quality control: FastQC + Trimmomatic
  3. Alignment: HISAT2/STAR for RNA-seq; BWA for DNA
  4. Quantification: featureCounts or salmon
  5. Differential analysis: DESeq2/edgeR in R
  6. Functional annotation: clusterProfiler (GO/KEGG)
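
To show how file names flow between the first steps of this pipeline, here is a minimal dry-run sketch for one paired-end RNA-seq run. `plan_rnaseq_run` is a hypothetical helper that prints commands rather than running them. It assumes a HISAT2 index prefix `genome` and an annotation file `genes.gtf`, and it leaves out the trimming step for brevity.

```shell
# Print the command sequence for one paired-end RNA-seq run (dry run).
# Assumptions: HISAT2 index prefix "genome" and annotation "genes.gtf".
plan_rnaseq_run() {
  run=$1
  # fasterq-dump --split-files names the mate files <run>_1.fastq and <run>_2.fastq
  r1="${run}_1.fastq"
  r2="${run}_2.fastq"
  printf 'prefetch %s\n' "$run"
  printf 'fasterq-dump %s --split-files\n' "$run"
  printf 'fastqc %s %s\n' "$r1" "$r2"
  printf 'hisat2 -x genome -1 %s -2 %s -S %s.sam\n' "$r1" "$r2" "$run"
  printf 'featureCounts -p -a genes.gtf -o %s.counts.txt %s.sam\n' "$run" "$run"
}

# Usage: print the plan for a placeholder run accession.
plan_rnaseq_run SRRxxxxxx
```

Printing the plan first makes it easy to check the file names before committing to large downloads. For DNA reads, the alignment line would use BWA instead, as noted in step 3.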