Non-linear Dimensionality Reduction

t-SNE & UMAP Analysis:
High-Dimensional Data Visualization Platform

Interactive web-based dimensionality reduction analysis supporting t-SNE (t-Distributed Stochastic Neighbor Embedding) and UMAP (Uniform Manifold Approximation and Projection). Generate publication-ready visualizations with automatic R code generation for single-cell transcriptomics and high-dimensional data analysis.

t-SNE

Preserves local neighborhood structure by converting high-dimensional Euclidean distances into conditional probabilities. Particularly effective for revealing clusters in single-cell RNA sequencing data and visualizing complex cell populations.

  • Perplexity-based neighborhood sizing
  • Barnes-Hut O(n log n) approximation
  • Package: Rtsne
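
As a rough illustration of the distance-to-probability step, the base-R sketch below computes the conditional probabilities p(j|i) for a single point with a fixed Gaussian bandwidth, and the perplexity of the resulting distribution. The function names `cond_prob` and `perplexity_of` and the toy data are assumptions made for this example. A real t-SNE implementation tunes the bandwidth separately for each point until the perplexity matches the requested value.

```r
# Conditional probabilities p(j|i) for one point i, given a Gaussian bandwidth sigma
cond_prob <- function(X, i, sigma) {
  # Squared Euclidean distance from point i to every row of X
  d2 <- rowSums(sweep(X, 2, X[i, ])^2)
  w <- exp(-d2 / (2 * sigma^2))
  w[i] <- 0                 # a point is not its own neighbor
  w / sum(w)                # normalize so the probabilities sum to 1
}

# Perplexity = 2^(Shannon entropy in bits) of a probability vector
perplexity_of <- function(p) {
  p <- p[p > 0]
  2^(-sum(p * log2(p)))
}

set.seed(1)
X <- matrix(rnorm(50 * 5), nrow = 50)   # toy data: 50 points in 5 dimensions
p <- cond_prob(X, i = 1, sigma = 1)
sum(p)                                  # total probability mass
perplexity_of(p)                        # effective number of neighbors of point 1
```

A larger `sigma` spreads the probability mass over more neighbors and therefore raises the perplexity.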

UMAP

Theoretically grounded in Riemannian geometry and algebraic topology. Typically faster than t-SNE and often preserves more of the global data structure while maintaining local neighborhood relationships. Well suited to large-scale datasets.

  • n_neighbors & min_dist control
  • Multi-core parallel processing
  • Package: uwot

Analysis Parameters

Configure dimensionality reduction settings and generate R code

Method Selection

Upload Expression Matrix

CSV/TSV format: Cells × Genes

Perplexity: 30 (default). Must be at most (n_cells - 1) / 3

Theta: 0 = exact, >0 = approximation

Visualization Preview

Upload coordinate files or run generated R code

Upload t-SNE/UMAP Coordinates CSV

Format: Dim1, Dim2, Label (optional)
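
A coordinates file in this layout could be written from R roughly as follows. This is a minimal sketch: the toy values stand in for a real embedding, the file name is hypothetical, and whether the uploader requires these exact header names is an assumption based on the format line above.

```r
# Toy coordinates standing in for a real t-SNE or UMAP embedding
set.seed(1)
coords <- data.frame(
  Dim1  = rnorm(6),
  Dim2  = rnorm(6),
  Label = rep(c("A", "B"), each = 3)   # optional column
)

# Hypothetical output path; row.names = FALSE keeps exactly three columns
out_file <- file.path(tempdir(), "coordinates.csv")
write.csv(coords, out_file, row.names = FALSE)

# Read the file back to confirm the column layout
names(read.csv(out_file))
```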

Implementation Guide

Complete workflow documentation for dimensionality reduction analysis

t-SNE Implementation (Rtsne Package)

Key Parameters

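  • perplexity: Effective number of neighbors per point; balances local against global structure. Default 30. Constraint: perplexity ≤ (n - 1) / 3, where n is the number of cells.
  • theta: Barnes-Hut approximation threshold (speed/accuracy trade-off). Default 0.5, suitable for large datasets (>1000 cells); 0.0 runs the exact O(n²) algorithm.
  • max_iter: Number of optimization iterations. Default 1000; at least 1000 is recommended for convergence.
  • pca: Preprocess with PCA (the number of retained components is set by initial_dims, default 50). Reducing to 30-50 PCs before t-SNE is strongly recommended for single-cell data.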
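
The parameters above might be combined as in the sketch below. The helpers `max_perplexity` and `safe_perplexity` are hypothetical names introduced here to enforce the perplexity constraint, and the random matrix is toy data. The Rtsne call is guarded so that the script still runs when the package is not installed.

```r
# Largest perplexity allowed for n samples: 3 * perplexity must not exceed n - 1
max_perplexity <- function(n) floor((n - 1) / 3)

# Hypothetical helper: cap the requested perplexity at the allowed maximum
safe_perplexity <- function(n, requested = 30) min(requested, max_perplexity(n))

set.seed(42)                                # t-SNE is stochastic
X <- matrix(rnorm(200 * 100), nrow = 200)   # toy data: 200 cells x 100 features

if (requireNamespace("Rtsne", quietly = TRUE)) {
  fit <- Rtsne::Rtsne(
    X,
    dims         = 2,
    perplexity   = safe_perplexity(nrow(X)),  # 30 here, since (200 - 1) / 3 > 30
    theta        = 0.5,                       # Barnes-Hut approximation; 0 = exact
    max_iter     = 1000,
    pca          = TRUE,                      # PCA preprocessing inside Rtsne
    initial_dims = 50                         # number of PCs kept when pca = TRUE
  )
  dim(fit$Y)                                  # one row per cell, one column per dimension
}
```

For a small dataset of 50 cells, `safe_perplexity(50)` caps the value at 16, which avoids the "perplexity is too large" error.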

Critical Considerations

For high-dimensional sparse data (e.g., scRNA-seq), always perform PCA preprocessing first. Running t-SNE directly on more than 1000 features is computationally expensive and may produce noisy results. The theta parameter controls the speed/accuracy trade-off.

Workflow Steps

  1. Load expression matrix (cells as rows, genes as columns)
  2. Remove zero-variance genes and perform quality control
  3. Apply PCA to reduce dimensions to 30-50 components
  4. Run Rtsne on PCA-reduced data with appropriate perplexity
  5. Visualize results using ggplot2 with cell type annotations
  6. Export coordinates and high-resolution figures
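
The six steps above can be sketched end to end as follows. This is an illustrative outline on simulated counts, not the script the platform generates: the group labels, file names, and the choice of 30 components are assumptions, and normalization and fuller quality control are omitted. The Rtsne and ggplot2 calls are guarded so the script runs even if those packages are missing.

```r
set.seed(42)

# 1. Expression matrix: cells as rows, genes as columns (simulated counts)
cell_type <- rep(c("A", "B", "C"), each = 100)          # hypothetical annotation
lambda <- c(A = 2, B = 5, C = 10)[cell_type]            # one mean count per cell
expr <- matrix(rpois(300 * 200, lambda = rep(lambda, times = 200)), nrow = 300,
               dimnames = list(paste0("cell", 1:300), paste0("gene", 1:200)))
expr[, 1] <- 0                                          # plant one zero-variance gene

# 2. Remove zero-variance genes
keep <- apply(expr, 2, var) > 0
expr <- expr[, keep, drop = FALSE]

# 3. PCA down to 30 components
pca <- prcomp(expr, center = TRUE, scale. = TRUE, rank. = 30)
pcs <- pca$x

if (requireNamespace("Rtsne", quietly = TRUE)) {
  # 4. t-SNE on the PCA scores; pca = FALSE because PCA was already applied
  fit <- Rtsne::Rtsne(pcs, perplexity = 30, pca = FALSE, max_iter = 1000)
  coords <- data.frame(Dim1 = fit$Y[, 1], Dim2 = fit$Y[, 2], Label = cell_type)

  # 6a. Export coordinates
  write.csv(coords, file.path(tempdir(), "tsne_coordinates.csv"), row.names = FALSE)

  if (requireNamespace("ggplot2", quietly = TRUE)) {
    # 5. Scatter plot colored by cell type
    p <- ggplot2::ggplot(coords, ggplot2::aes(Dim1, Dim2, colour = Label)) +
      ggplot2::geom_point(size = 1) +
      ggplot2::theme_minimal()
    # 6b. Export a vector figure
    ggplot2::ggsave(file.path(tempdir(), "tsne.pdf"), p, width = 6, height = 5)
  }
}
```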

UMAP Implementation (uwot Package)

Key Parameters

  • n_neighbors: Controls local vs global structure trade-off (similar to perplexity). Range: 2-100. Default 15.
  • min_dist: Controls clustering tightness. Lower values (0.01-0.1) create tighter clusters, higher values (0.5+) expand them.
  • metric: Distance metric. "euclidean" for general use, "cosine" for gene expression (normalized data).
  • n_threads: Number of threads for parallel processing. parallel::detectCores() - 1 uses all but one core, which keeps the machine responsive.
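
A call using these parameters might look like the sketch below. The input is a random toy matrix standing in for PCA scores, and the specific values are illustrative choices rather than recommendations from uwot. The call is guarded so the script still runs when uwot is not installed.

```r
set.seed(42)                                # UMAP is stochastic
X <- matrix(rnorm(500 * 30), nrow = 500)    # toy data, e.g. 500 cells x 30 PCs

# Leave one core free; detectCores() can return NA on some platforms
cores <- parallel::detectCores()
n_threads <- if (is.na(cores)) 1 else max(1, cores - 1)

if (requireNamespace("uwot", quietly = TRUE)) {
  emb <- uwot::umap(
    X,
    n_neighbors = 15,        # local vs global trade-off
    min_dist    = 0.3,       # lower values pack points more tightly
    metric      = "cosine",  # "euclidean" is the general-purpose choice
    n_threads   = n_threads
  )
  dim(emb)                   # one row per cell, two columns by default
}
```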

Comparison with t-SNE

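Speed: UMAP is usually faster in practice. Barnes-Hut t-SNE scales as O(n log n); UMAP's cost is dominated by an approximate nearest-neighbor search that is roughly linear to slightly superlinear in n empirically, so O(n) is a heuristic rather than a guarantee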
Global Structure: UMAP better preserves inter-cluster relationships
Reproducibility: Both require set.seed() for consistent results
Memory: UMAP handles larger datasets (>100k cells) more efficiently

Parameter Tuning Guidelines

For single-cell data, start with n_neighbors=15 and min_dist=0.3. Decrease min_dist to 0.1 for tighter clusters, increase to 0.5 for more dispersed visualization. The n_neighbors parameter behaves similarly to perplexity in t-SNE but typically uses smaller values (5-50).
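
One way to explore these settings is a small grid over n_neighbors and min_dist, as sketched below. The grid values follow the ranges mentioned above. The toy data and the naming scheme for the results are assumptions, and the uwot call is guarded in case the package is not installed.

```r
set.seed(42)
X <- matrix(rnorm(300 * 20), nrow = 300)    # toy data: 300 cells x 20 PCs

# All combinations of the two tuning parameters
grid <- expand.grid(n_neighbors = c(5, 15, 50), min_dist = c(0.1, 0.3, 0.5))

if (requireNamespace("uwot", quietly = TRUE)) {
  embeddings <- lapply(seq_len(nrow(grid)), function(k) {
    set.seed(42)                            # same seed for every run, for comparability
    uwot::umap(X,
               n_neighbors = grid$n_neighbors[k],
               min_dist    = grid$min_dist[k])
  })
  names(embeddings) <- sprintf("nn=%d, md=%.1f", grid$n_neighbors, grid$min_dist)
}
```

Plotting the resulting embeddings side by side makes it easier to judge which combination separates the populations of interest.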

Input Data Format

Expression Matrix

  • Rows: Cells/observations
  • Columns: Genes/features
  • Format: CSV or TSV
  • First column: Cell IDs (rownames)
  • Values: Normalized or raw counts

Metadata (Optional)

  • Rows: Cell IDs (matching expression matrix)
  • Columns: cell_type, batch, condition
  • Used for coloring points in visualization
  • First column: Cell IDs
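
Files in these layouts could be loaded and aligned in R roughly as follows. The sketch writes two tiny example files first so that it is self-contained. The file names, gene names, and values are made up for illustration. For TSV input, `read.delim` would replace `read.csv`.

```r
# Write small example files (hypothetical names and contents)
expr_file <- file.path(tempdir(), "expression.csv")
meta_file <- file.path(tempdir(), "metadata.csv")
writeLines(c("cell_id,GeneA,GeneB,GeneC",
             "cell1,0,3,1",
             "cell2,2,0,5",
             "cell3,1,1,0"), expr_file)
writeLines(c("cell_id,cell_type,batch,condition",
             "cell3,Tcell,b1,ctrl",
             "cell1,Bcell,b1,ctrl",
             "cell2,Tcell,b2,treated"), meta_file)

# The first column holds the cell IDs and becomes the row names
expr <- as.matrix(read.csv(expr_file, row.names = 1, check.names = FALSE))
meta <- read.csv(meta_file, row.names = 1, stringsAsFactors = FALSE)

# Every cell needs metadata; reorder the metadata to match the matrix rows
stopifnot(all(rownames(expr) %in% rownames(meta)))
meta <- meta[rownames(expr), , drop = FALSE]
```

After this step, `meta$cell_type` is in the same order as the rows of `expr` and can be used directly to color points.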