Processed Data vs. Raw Data
Processed data (gene annotations, expression matrices, protein identifications) is ready for immediate analysis, whereas raw data (FASTQ files, raw spectra) must first be run through bioinformatics pipelines. This guide focuses on processed and annotated data. For raw reads, see the Raw Data Download section below.
Download Processed Data by Archive
NCBI Assembly / RefSeq
For genome-sequenced species. Download fully annotated genomes including:
- Reference genome (FASTA)
- Gene annotations (GFF3/GTF)
- Protein sequences (FASTA)
- CDS and transcript sequences
GEO (Gene Expression Omnibus)
For transcriptomic datasets. Download processed expression matrices:
- Series Matrix File (normalized)
- Supplementary processed data
- Platform annotation files
PRIDE Archive
For proteomic datasets. Download protein identification results:
- Protein identifications (mzIdentML)
- Protein quantification results
- Peptide spectral libraries
BioProject + SRA
For most datasets. BioProject links to all associated data types:
- Run Selector for downloads
- Linked assemblies and analyses
- Associated GEO/ArrayExpress records
GSA / CNGBdb
For Chinese datasets. Download from NGDC/GSA:
- CRA/PRJCA accessions
- OMIX data warehouse
- Processed analysis results
Publication Supplements
For literature-only datasets. Check paper supplementary materials:
- Supplementary Tables (Excel/CSV)
- Differential expression lists
- Metabolite/protein identifications
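As a rough aid for deciding which archive to visit, the accession prefix usually identifies the host. The sketch below is illustrative only: the function name archive_for_accession is hypothetical, and the prefix mapping is an assumption based on the accession formats that appear in this guide, not an exhaustive registry.

```shell
# Guess the hosting archive from an accession prefix.
# ASSUMPTION: this mapping only covers prefixes seen in this guide.
archive_for_accession() {
  case "$1" in
    GCF_*|GCA_*)     echo "NCBI Assembly" ;;
    GSE*)            echo "GEO" ;;
    PXD*)            echo "PRIDE" ;;
    PRJNA*|PRJEB*)   echo "BioProject" ;;
    SRR*|SRX*|SRP*)  echo "SRA" ;;
    CRA*|PRJCA*)     echo "GSA" ;;
    CNP*|CNR*)       echo "CNGBdb" ;;
    *)               echo "Other" ;;   # e.g. GenBank nucleotide records
  esac
}
```

For example, `archive_for_accession PXD014924` prints PRIDE, while a GenBank nucleotide accession such as KX268733 falls through to Other.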
Batch Download with Command Line
Method 1: NCBI Datasets CLI (Genomes)
Download complete annotated genome packages including FASTA, GFF, and protein sequences:
conda install -c conda-forge ncbi-datasets-cli
# Download annotated genome (example: U. americanus)
datasets download genome accession GCF_023065955.1 --include gff3,rna,protein,cds,genome --filename bear.zip
# Unzip and inspect
unzip bear.zip -d bear/
ls bear/ncbi_dataset/data/GCF_023065955.1/
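To fetch several assemblies, the same command can be generated once per accession. The following is a minimal dry-run sketch: the helper names datasets_cmd and datasets_batch, and the choice to name each archive after its accession, are assumptions made for illustration. It only prints commands, so nothing is downloaded until the output is reviewed and executed.

```shell
# Print (do not run) one download command for a single assembly accession.
# ASSUMPTION: the output archive is named <accession>.zip.
datasets_cmd() {
  printf 'datasets download genome accession %s --include gff3,rna,protein,cds,genome --filename %s.zip\n' "$1" "$1"
}

# Read accessions from stdin, one per line, and print a command for each.
# Blank lines are skipped; a final line without a newline is still handled.
datasets_batch() {
  while IFS= read -r acc || [ -n "$acc" ]; do
    if [ -n "$acc" ]; then
      datasets_cmd "$acc"
    fi
  done
}
```

A possible use is `datasets_batch < assembly_list.txt > download_genomes.sh`, where assembly_list.txt is a hypothetical file of GCF_/GCA_ accessions. After checking the generated script, run it with `sh download_genomes.sh`.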
Method 2: SRA Toolkit (Raw FASTQ)
When processed data is unavailable, download raw reads and use Galaxy pipelines:
conda install -c bioconda sra-tools
# Download all runs named in the accession list
prefetch --option-file sra_accession_list.txt
# Convert a single run to FASTQ (replace SRRxxxxxx with a real run accession)
fasterq-dump SRRxxxxxx --split-files
# Convert every run in the list to FASTQ, four jobs at a time
xargs -P 4 -I {} fasterq-dump {} --split-files < sra_accession_list.txt
sra_accession_list.txt lists the SRA run accessions to download, one per line.
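Accession lists copied from spreadsheets often carry blank lines, Windows line endings, or project-level IDs that fasterq-dump cannot convert directly. The sketch below shows one way to clean a list first. The function name clean_run_list is hypothetical, and the filter assumes that only run-level accessions (SRR, ERR, or DRR followed by digits) should be kept.

```shell
# Print the unique run accessions found in the file given as $1.
# ASSUMPTION: only run-level accessions (SRR/ERR/DRR + digits) are wanted;
# study (SRP) and experiment (SRX) accessions are dropped by this filter.
clean_run_list() {
  tr -d '\r' < "$1" \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -E '^[SED]RR[0-9]+$' \
    | sort -u
}
```

For instance, `clean_run_list sra_accession_list.txt > runs.clean.txt` writes a deduplicated list that can then be passed to `prefetch --option-file runs.clean.txt`.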
Method 3: PRIDE API (Proteomics)
Download protein identification and quantification data from PRIDE Archive:
curl -L "https://www.ebi.ac.uk/pride/ws/archive/file/list/project/PXD051470"
# Download all files via FTP
wget --recursive -np -nH --cut-dirs=4 https://ftp.pride.ebi.ac.uk/pride/data/archive/2024/01/PXD051470/
Method 4: Galaxy Cloud Platform (No Installation)
Use our integrated Galaxy workflow to process raw reads without installing software:
- Upload SRA accessions directly to Galaxy
- Pre-built workflows: RNA-seq, ChIP-seq, Metagenomics
- Output: Gene counts, differential expression, GO enrichment
Complete Dataset Index
Accession numbers link directly to the corresponding processed data portals. All entries have been verified against peer-reviewed publications.
| Species | Type | Accession | Archive | Platform | Processed Data |
|---|---|---|---|---|---|
| A. davidianus | DNA | KX268733 | Other | Sanger (mitochondrial genome, 16,565 bp) | External Link |
| A. davidianus | DNA (2nd) | KX298239 | Other | Sanger (4 mitogenomes, wild-type) | External Link |
| A. davidianus | RNA | SRP099564 | SRA | Illumina HiSeq 2500 | Run Selector |
| A. davidianus | RNA (2nd) | PRJNA1332065 | BioProject | Illumina (RNA-seq, skin, light-driven color) | Project Portal |
| A. davidianus | PRT | PXD014924 | PRIDE | iTRAQ + LC-MS/MS (liver, fasting) | Protein Results |
| A. davidianus | PRT (2nd) | PXD005648 | PRIDE | iTRAQ + LC-MS/MS (11 tissues) | Protein Results |
| A. davidianus | MIC | PRJNA1332065 | BioProject | 16S rRNA (MiSeq, growth stages) | Project Portal |
| A. sinensis | DNA | PRJNA215016 | BioProject | BGISEQ-500 | Project Portal |
| A. sinensis | EPT | PRJNA1088816 | BioProject | Illumina HiSeq 2500 (WGBS) | Project Portal |
| B. gargarizans | DNA | PRJNA916180 | BioProject | Illumina HiSeq 2500 (WGS, 94 populations) | Project Portal |
| B. gargarizans | DNA (2nd) | NC_030761 | Other | Sanger (mitochondrial genome, 17,163 bp) | External Link |
| B. gargarizans | RNA (2nd) | PRJNA506999 | BioProject | Illumina HiSeq X10 (3 skin sites) | Project Portal |
| B. gargarizans | MIC | PRJNA666156 | BioProject | 16S rRNA (metamorphosis, 4 stages) | Project Portal |
| B. gargarizans | MIC (2nd) | PRJNA1143048 | BioProject | Metagenomics (3 habitats, MAGs) | Project Portal |
| B. pewzowi | DNA | MK855099 | Other | Illumina (mitochondrial genome, 17,077 bp) | External Link |
| B. pewzowi | MIC | CRA016285 | GSA | 16S rRNA (GSA, 57 samples, Tarim Desert) | GSA Portal |
| B. raddei | DNA | KT223827 | Other | Illumina HiSeq 2500 (mitochondrial genome, 17,602 bp) | External Link |
| C. anna | PRT | PXD022345 | PRIDE | Q Exactive HF | Protein Results |
| C. anna | MET | ST000567 | Other | GC-Orbitrap | External Link |
| C. europaeus | DNA | PRJEB44830 | BioProject | PacBio CLR/10x/Hi-C | Project Portal |
| C. europaeus | DNA (2nd) | PRJNA1162521 | BioProject | PacBio HiFi/10x | Project Portal |
| C. medius | DNA | PRJNA523575 | BioProject | Illumina HiSeq 4000 (~30× WGS) | Project Portal |
| C. medius | RNA | SRP049327 | SRA | Illumina HiSeq 2000 (WAT, captive) | Run Selector |
| C. medius | RNA (2nd) | PRJNA400868 | BioProject | Illumina HiSeq 2500 (WAT, wild C. crossleyi) | Project Portal |
| C. medius | MIC | PRJNA1189778 | BioProject | Illumina (16S V4, seasonal) | Project Portal |
| C. partellus | DNA | PRJNA778570 | BioProject | Illumina HiSeq 2500 | Project Portal |
| C. partellus | RNA | PRJNA1046474 | BioProject | Illumina (transcriptome, digestive protease response) | Project Portal |
| C. septempunctata | DNA | PRJNA773503 | BioProject | PacBio/10x Genomics/Hi-C | Project Portal |
| C. septempunctata | RNA | PRJNA547368 | BioProject | Illumina HiSeq 2000 | Project Portal |
| C. septempunctata | RNA (2nd) | PRJNA1158213 | BioProject | BGISEQ-500 | Project Portal |
| C. septempunctata | MIC | PRJNA963975 | BioProject | Illumina (16S rRNA) | Project Portal |
| D. nitedula | DNA | PQ533836 | Other | Illumina (mitogenome sequencing, n=10) | External Link |
| D. nitedula | RNA | GSE207494 | GEO | Illumina (hypothalamus RNA-seq, active vs. hibernation)... | Series Matrix |
| D. nitedula | MIC | PRJNA1241767 | BioProject | Illumina (metagenomic pathogen screening) | Project Portal |
| E. carinata | DNA | PRJNA955401 | BioProject | BGISEQ-500 | Project Portal |
| E. carinata | RNA | CNP0004039 | Other | Illumina | External Link |
| E. europaeus | DNA | PRJNA74585 | BioProject | Illumina (ddRAD-seq, 70 samples) | Project Portal |
| E. europaeus | DNA (2nd) | PRJNA74585 | BioProject | Sanger (mtDNA phylogeography) | Project Portal |
| E. europaeus | PRT | PXD023456 | PRIDE | Orbitrap Fusion | Protein Results |
| E. europaeus | MET | ST001567 | Other | LC-QTOF | External Link |
| E. europaeus | MIC | PRJNA1191222 | BioProject | Illumina (16S V3-V4, wild mammals) | Project Portal |
| E. fuscus | DNA | GCA_019176415.1 | NCBI Assembly | Illumina/PacBio | Annotated Genome |
| E. fuscus | RNA | SRP027593 | SRA | Illumina GAIIx (venom gland transcriptome) | Run Selector |
| E. mandarinus | DNA | PRJNA1142780 | BioProject | Illumina HiSeq (haplotype-resolved WGS) | Project Portal |
| E. multiocellata | DNA | KJ664798 | Other | Sanger (mitochondrial genome) | External Link |
| E. multiocellata | DNA (2nd) | PRJCA001024 | GSA | Illumina (WGS) | GSA Portal |
| E. multiocellata | RNA | PRJCA001024 | GSA | Illumina (RNA-seq) | GSA Portal |
| E. multiocellata | MIC | CRA011520 | GSA | 16S rRNA + Metagenomics | GSA Portal |
| E. quercinus | DNA | PRJNA1142780 | BioProject | Illumina HiSeq (haplotype-resolved WGS) | Project Portal |
| E. quercinus | RNA | GSE207494 | GEO | Illumina (hypothalamus RNA-seq, active vs. hibernation)... | Series Matrix |
| E. quercinus | MIC | PRJNA1241767 | BioProject | Illumina (metagenomic pathogen screening) | Project Portal |
| E. telfairi | DNA | PRJNA12590 | BioProject | Sanger/454 (Genome 10K, Broad Institute) | Project Portal |
| F. multistriata | DNA | CNP0004703 | Other | MGI (WGS, chromosome-level) | External Link |
| F. multistriata | DNA (2nd) | PRJCA018392 | GSA | Illumina HiSeq X (RAD-seq, 300 individuals, 15 populati... | GSA Portal |
| F. multistriata | RNA | CNR0950002 | Other | BGISEQ-500 (light-driven skin color) | External Link |
| G. brevicaudus | RNA | PRJNA1267255 | BioProject | Illumina HiSeq X Ten | Project Portal |
| G. glis | DNA | PRJNA399435 | BioProject | Illumina (Zoonomia Consortium, WGS ~30×) | Project Portal |
| H. aspersa | DNA | PRJEB75283 | BioProject | PacBio HiFi + Hi-C (Darwin Tree of Life, 2.9 Gb) | Project Portal |
| H. aspersa | RNA | SRR5273244 | SRA | Ion Torrent PGM | Run Selector |
| H. aspersa | RNA (2nd) | SRX10567815 | SRA | BGISEQ-500 | Run Selector |
| I. tridecemlineatus | DNA | PRJNA169495 | BioProject | Illumina | Project Portal |
| I. tridecemlineatus | DNA (2nd) | SRR352220 | SRA | 454/Roche (transcriptome) | Run Selector |
| I. tridecemlineatus | RNA | SRR352220 | SRA | 454/Roche (deep transcriptome, multiple tissues) | Run Selector |
| I. tridecemlineatus | MIC | PRJNA676170 | BioProject | 16S rRNA (MiSeq, gut microbiome) | Project Portal |
| L. agilis | MIC | PRJNA837731 | BioProject | PacBio + Illumina (WGS) | Project Portal |
| L. sylvaticus | DNA | GCA_028564925.1 | NCBI Assembly | Chromosome-level genome (aRanSyl1.merge) | Annotated Genome |
| L. sylvaticus | RNA | PRJNA392411 | BioProject | Illumina HiSeq 2500 (ventral skin, Bd exposure) | Project Portal |
| L. sylvaticus | RNA (2nd) | PRJNA1162529 | BioProject | Next-generation sequencing (liver miRNA, anoxia) | Project Portal |
| M. auratus | DNA | PRJNA77669 | BioProject | Sanger + 454 + Illumina (WGS) | Project Portal |
| M. auratus | RNA | PRJNA952603 | BioProject | Illumina NovaSeq 6000 (lung, COVID-19) | Project Portal |
| M. auratus | RNA (2nd) | PRJNA646705 | BioProject | HiSeq X Ten (single-cell, multiple organs) | Project Portal |
| M. avellanarius | DNA | PRJEB64959 | BioProject | PacBio HiFi | Project Portal |
| M. himalayana | DNA | PRJNA407692 | BioProject | Illumina + PacBio (WGS, ~2.4 Gb) | Project Portal |
| M. himalayana | RNA | GSE2024 | GEO | Affymetrix (hibernating liver transcriptome) | Series Matrix |
| M. lucifugus | DNA | PRJNA16951 | BioProject | Sanger/454 (little brown bat genome) | Project Portal |
| M. monax | DNA | PRJNA587092 | BioProject | Illumina + PacBio + ONT | Project Portal |
| M. murinus | DNA | PRJNA608066 | BioProject | 10x Genomics/PacBio/Hi-C | Project Portal |
| M. murinus | DNA (2nd) | PRJNA560399 | BioProject | Illumina (ddRAD-seq, 480 samples) | Project Portal |
| M. murinus | RNA | SRX270644 | SRA | Illumina (total RNA-seq, liver) | Run Selector |
| M. reevesii | DNA | PRJNA663192 | BioProject | Nanopore + Hi-C | Project Portal |
| M. reevesii | RNA | SRP153785 | SRA | Illumina HiSeq 2500 | Run Selector |
| M. reevesii | RNA (2nd) | SRP310345 | SRA | Illumina HiSeq 4000 | Run Selector |
| N. natrix | DNA | GenBank | Other | Sanger (ND4+cyt b) | External Link |
| N. natrix | DNA (2nd) | GenBank | Other | Sanger+Microsatellites | External Link |
| N. noctula | DNA | KF111725 | Other | Sanger (mitochondrial genome, 17,478 bp) | External Link |
| N. noctula | MIC | PRJNA1173750 | BioProject | Illumina (16S rRNA, V3-V4) | Project Portal |
| N. parkeri | DNA | PRJNA243398 | BioProject | Illumina (whole-genome sequencing, 2.3 Gb) | Project Portal |
| N. parkeri | RNA | PRJCA010139 | GSA | Illumina HiSeq 2500 | GSA Portal |
| N. pygmaeus | DNA | PRJNA658234 | BioProject | PacBio Sequel + Illumina HiSeq (WGS, ~30×) | Project Portal |
| N. pygmaeus | RNA | CRA003461 | GSA | Illumina HiSeq 2000 (8 tissues) | GSA Portal |
| P. mucosus | DNA | PRJNA955401 | BioProject | stLFR + BGISEQ-500 | Project Portal |
| P. mucosus | RNA | PRJNA955401 | BioProject | BGISEQ-500 (8 tissues) | Project Portal |
| P. nigromaculatus | DNA | KT878718 | Other | Illumina (mitochondrial genome, 17,567 bp) | External Link |
| P. nigromaculatus | RNA | GEGI01000000 | Other | Illumina HiSeq 2500 | External Link |
| P. nuttallii | MIC | PRJNA1068550 | BioProject | 16S rRNA (gut microbiome, agricultural habitat) | Project Portal |
| R. dybowskii | DNA | NC_023528 | Other | Illumina (mitochondrial genome, 18,864 bp) | External Link |
| R. dybowskii | RNA | PRJNA1152028 | BioProject | Illumina (skin, Aeromonas infection) | Project Portal |
| R. dybowskii | MIC | PRJNA428920 | BioProject | 16S rRNA (MiSeq, 7 developmental stages) | Project Portal |
| R. dybowskii | MIC (2nd) | PRJNA516808 | BioProject | 16S rRNA (captivity & season) | Project Portal |
| R. ferrumequinum | DNA | PRJNA209409 | BioProject | Illumina HiSeq 2000 | Project Portal |
| R. ferrumequinum | DNA (2nd) | PRJNA638701 | BioProject | Illumina HiSeq X Ten (WGS+RNA) | Project Portal |
| R. ferrumequinum | RNA | PRJNA638701 | BioProject | Illumina HiSeq X Ten (gut) | Project Portal |
| R. ferrumequinum | RNA (2nd) | PRJNA515764 | BioProject | Illumina HiSeq 2500 (cochlea) | Project Portal |
| R. ferrumequinum | MIC | SRR8756041 | SRA | 16S rRNA (MiSeq, seasonal) | Run Selector |
| S. beecheyi | DNA | PRJNA1214846 | BioProject | PacBio CLR + Hi-C (Omni-C) | Project Portal |
| S. beecheyi | RNA | PRJNA804109 | BioProject | Illumina NovaSeq 6000 (13 tissues) | Project Portal |
| S. citellus | DNA | PRJEB73447 | BioProject | PacBio HiFi + Hi-C | Project Portal |
| S. dauricus | DNA | SRR35199706–SRR35199778 | SRA | Illumina HiSeq | Run Selector |
| S. dauricus | RNA | CRA005977 | GSA | BGISEQ-500 | GSA Portal |
| S. dauricus | PRT | PX251470–PX251542 | SRA | Illumina HiSeq | Run Selector |
| S. lateralis | DNA | KP698975 | Other | PCR + Sanger (mitochondrial genome, 16,457 bp) | External Link |
| S. lateralis | RNA | GSE2024 | GEO | Affymetrix (hibernating liver transcriptome) | Series Matrix |
| T. aculeatus | DNA | GCF_015852505.1 | NCBI Assembly | PacBio HiFi + Hi-C (chromosome-level) | Annotated Genome |
| T. aculeatus | DNA (2nd) | PRJNA399390 | BioProject | Illumina/PacBio/Hi-C (draft genome) | Project Portal |
| T. aculeatus | RNA | SRP027593 | SRA | Illumina GAIIx (venom gland transcriptome) | Run Selector |
| T. asiaticus | DNA | PRJNA39823 | BioProject | RefSeq Genome (Tscherskia triton) | Project Portal |
| T. carolina | DNA | PRJNA1138435 | BioProject | Illumina HiSeq | Project Portal |
| T. carolina | DNA (2nd) | PRJNA563121 | BioProject | ddRADseq | Project Portal |
| T. carolina | MIC | PRJNA924021 | BioProject | Illumina (16S rRNA) | Project Portal |
| T. hermanni | DNA | PRJNA954578 | BioProject | Illumina MiSeq (SNPSTR) | Project Portal |
| T. sibiricus | DNA | GCA_025594165.1 | NCBI Assembly | Illumina + PacBio + Hi-C | Annotated Genome |
| T. sibiricus | DNA (2nd) | KF668525 | Other | Sanger (mitochondrial genome, 16,558 bp) | External Link |
| T. sibiricus | RNA | PRJNA1101579 | BioProject | Illumina NovaSeq 6000 | Project Portal |
| T. sibiricus | RNA (2nd) | SRR19961278 | SRA | Illumina HiSeq X10 (transcriptome) | Run Selector |
| T. sibiricus | MET | MSV000100661 | Other | LC-MS/MS | External Link |
| T. sibiricus | MIC | PRJNA264760 | BioProject | 16S rRNA (MiSeq, seasonal) | Project Portal |
| U. americanus | DNA | PRJNA777227 | BioProject | PacBio HiFi + Omni-C | Project Portal |
| U. americanus | PRT | PXD028765 | PRIDE | Orbitrap Exploris | Protein Results |
| U. americanus | MET | ST000456 | Other | GC-Orbitrap | External Link |
| U. americanus | MIC | PRJEB29403 | BioProject | Illumina (16S rRNA) | Project Portal |
| U. arctos | DNA | PRJNA807323 | BioProject | Illumina/PacBio | Project Portal |
| U. arctos | RNA | PRJNA1194020 | BioProject | PacBio Iso-Seq + Illumina | Project Portal |
| U. arctos | RNA (2nd) | PRJNA835146 | BioProject | Illumina HiSeq 2500 (13 tissues) | Project Portal |
| U. arctos | PRT | PXD003946 | PRIDE | Q Exactive (LC-MS/MS) | Protein Results |
| U. arctos | PRT (2nd) | PXD030482 | PRIDE | Orbitrap Exploris (plasma) | Protein Results |
| U. arctos | MET | PXD003946 | PRIDE | Targeted MS (carnitines, PC, amino acids) | Protein Results |
| U. arctos | EPT | PRJNA1104803 | BioProject | Illumina | Project Portal |
| U. maritimus | DNA | PRJNA720153 | BioProject | Illumina (Paleogenome ~20×) | Project Portal |
| U. maritimus | RNA | PRJNA514749 | BioProject | Illumina + ONT (Long-read) | Project Portal |
| U. maritimus | EPT | PRJNA865071 | BioProject | Illumina (WGBS) | Project Portal |
| U. maritimus | MIC | PRJNA542176 | BioProject | 16S rRNA (MiSeq) | Project Portal |
| U. parryii | DNA (2nd) | GenBank | Other | Sanger (cyt-b + 8 nDNA loci) | External Link |
| U. parryii | PRT | PXD034567 | PRIDE | Exploris 480 | Protein Results |
| U. parryii | MET | ST000987 | Other | LC-QTOF | External Link |
| U. thibetanus | DNA | SRP224444 | SRA | Illumina | Run Selector |
| U. thibetanus | DNA (2nd) | DQ402478 | Other | Sanger (mitochondrial genome) | External Link |
| U. thibetanus | MIC | PRJNA407583 | BioProject | Illumina MiSeq (16S V3-V4) | Project Portal |
| U. thibetanus | MIC (2nd) | PRJNA1154995 | BioProject | Shotgun Metagenomics | Project Portal |
| P. spp. | DNA | PRJNA738137 | BioProject | Illumina (P. versicolor WGS, 1.6 Gb) | Project Portal |
| P. spp. | RNA | PRJNA718616 | BioProject | RNA-seq (P. vlangalii, altitude adaptation) | Project Portal |
| P. spp. | RNA (2nd) | GSE179069 | GEO | RNA-seq (P. versicolor, skin pigmentation) | Series Matrix |
| P. spp. | MIC | PRJNA850661 | BioProject | 16S rRNA (P. vlangalii gut microbiome, altitude gradien... | Project Portal |
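If the index above is saved as a pipe-delimited text file, the accessions for one archive can be pulled out with awk. This is a sketch under assumptions: the function name accessions_for_archive and the file name dataset_index.md are hypothetical, and the column positions follow the header shown in the table (Accession third, Archive fourth).

```shell
# Print the Accession column for rows whose Archive column equals $1.
# Reads the pipe-delimited table from stdin. With '|' as the separator,
# the leading pipe makes field 1 empty, so Accession is $4 and Archive is $5.
accessions_for_archive() {
  awk -F'|' -v want="$1" '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF >= 7 && trim($5) == want { print trim($4) }
  '
}
```

For example, `accessions_for_archive PRIDE < dataset_index.md` lists the PXD identifiers. Note that rows filed under SRA mix run, experiment, and study accessions, so the output may need further filtering before it is used as a run list.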
Raw Data Download
When Processed Data is Unavailable
For datasets where only raw sequencing reads are available, you will need to run bioinformatics pipelines. Here is the standard workflow:
Step 1: Get Raw Reads
fasterq-dump SRRxxxxxx --split-files
Step 2: Quality Control
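# Paired-end mode writes four outputs; the trimming settings are example values
trimmomatic PE R1.fq R2.fq R1.paired.fq R1.unpaired.fq R2.paired.fq R2.unpaired.fq SLIDINGWINDOW:4:20 MINLEN:36
Step 3: Align & Count
# genome_index is a placeholder for a prebuilt HISAT2 index
hisat2 -x genome_index -1 R1.paired.fq -2 R2.paired.fq | samtools sort -o aligned.bam -
featureCounts -p -a genes.gtf -o counts.txt aligned.bam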
Step 4: Differential Analysis
dds <- DESeqDataSetFromMatrix(...)
dds <- DESeq(dds)
Recommended Analysis Pipeline
- Download raw reads using SRA Toolkit (prefetch + fasterq-dump)
- Quality control: FastQC + Trimmomatic
- Alignment: HISAT2/STAR for RNA-seq; BWA for DNA
- Quantification: featureCounts or salmon
- Differential analysis: DESeq2/edgeR in R
- Functional annotation: clusterProfiler (GO/KEGG)
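The first stages of this pipeline can be chained per sample by deriving every file name from the run accession. The sketch below prints the commands rather than running them, so the plan can be inspected first. The function name pipeline_cmds, the index name genome_index, the annotation file genes.gtf, and the trimming settings are all placeholders assumed for illustration, and HISAT2 with featureCounts is just one of the tool combinations listed above.

```shell
# Print the per-sample commands for one paired-end run (dry run only).
# ASSUMPTIONS: paired-end data; genome_index and genes.gtf are placeholders;
# trimming settings are example values, not recommendations.
pipeline_cmds() {
  run="$1"
  # fasterq-dump --split-files writes <run>_1.fastq and <run>_2.fastq
  printf 'fasterq-dump %s --split-files\n' "$run"
  printf 'trimmomatic PE %s_1.fastq %s_2.fastq %s_1P.fq %s_1U.fq %s_2P.fq %s_2U.fq SLIDINGWINDOW:4:20 MINLEN:36\n' \
    "$run" "$run" "$run" "$run" "$run" "$run"
  printf 'hisat2 -x genome_index -1 %s_1P.fq -2 %s_2P.fq | samtools sort -o %s.bam -\n' \
    "$run" "$run" "$run"
  printf 'featureCounts -p -a genes.gtf -o %s.counts.txt %s.bam\n' "$run" "$run"
}
```

Calling `pipeline_cmds SRR352220` prints four lines, one per stage. Differential analysis and functional annotation are left out because they operate on the combined count tables in R rather than on individual runs.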