Non-linear Dimensionality Reduction

t-SNE & UMAP Analysis:
High-Dimensional Data Visualization Platform

Interactive web-based dimensionality reduction analysis supporting t-SNE (t-Distributed Stochastic Neighbor Embedding) and UMAP (Uniform Manifold Approximation and Projection). Generate publication-ready visualizations with automatic R code generation for single-cell transcriptomics and high-dimensional data analysis.

t-SNE

Preserves local neighborhood structure by converting high-dimensional Euclidean distances into conditional probabilities. Particularly effective for revealing clusters in single-cell RNA sequencing data and visualizing complex cell populations.

Perplexity-based neighborhood sizing
Barnes-Hut O(n log n) approximation
Package: Rtsne
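The conversion from distances to conditional probabilities can be sketched in base R. This is a minimal, unoptimized illustration of the standard t-SNE calibration, not Rtsne's internal code. For each point, a Gaussian precision is tuned by bisection until the entropy of p(j|i) matches log2(perplexity). The function names `cond_probs()` and `tsne_affinities()` are hypothetical.

```r
# Minimal sketch of t-SNE input affinities (hypothetical helpers, base R only).
# Rtsne performs an equivalent calibration internally in C++.

# Conditional probabilities p(j|i) for one point, given squared distances d2
# to all other points. beta = 1 / (2 * sigma^2) is tuned by bisection so that
# the Shannon entropy H (in bits) satisfies 2^H == perplexity.
cond_probs <- function(d2, perplexity, tol = 1e-5, max_iter = 100) {
  target_H <- log2(perplexity)
  beta <- 1
  lo <- 0
  hi <- Inf
  for (iter in seq_len(max_iter)) {
    w <- exp(-(d2 - min(d2)) * beta)  # shifting by min(d2) avoids underflow
    p <- w / sum(w)
    nz <- p > 0
    H <- -sum(p[nz] * log2(p[nz]))
    if (abs(H - target_H) < tol) break
    if (H > target_H) {
      # Distribution too flat: increase precision (narrower Gaussian)
      lo <- beta
      beta <- if (is.finite(hi)) (beta + hi) / 2 else beta * 2
    } else {
      # Distribution too peaked: decrease precision (wider Gaussian)
      hi <- beta
      beta <- (beta + lo) / 2
    }
  }
  p
}

# Row-stochastic matrix of p(j|i) for all points, with p(i|i) = 0.
tsne_affinities <- function(X, perplexity = 30) {
  D2 <- as.matrix(dist(X))^2
  n <- nrow(X)
  P <- matrix(0, n, n)
  for (i in seq_len(n)) {
    P[i, -i] <- cond_probs(D2[i, -i], perplexity)
  }
  P
}
```

t-SNE then symmetrizes these values into joint probabilities p_ij = (p(j|i) + p(i|j)) / (2n). For example, `(P + t(P)) / (2 * nrow(P))` produces a matrix whose entries sum to 1.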

UMAP

Grounded in Riemannian geometry and algebraic topology. UMAP is typically faster than t-SNE. It generally preserves global data structure better while still maintaining local neighborhood relationships, which makes it well suited to large datasets.

n_neighbors & min_dist control
Multi-core parallel processing
Package: uwot
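The way n_neighbors sets each point's neighborhood scale can be sketched in base R, following the construction in the original UMAP paper. For each point, rho is the distance to its nearest neighbor. Sigma is then found by bisection so that the membership strengths over its k nearest neighbors sum to log2(k). The directed memberships are finally combined with a fuzzy set union. This is a simplified illustration, not uwot's implementation, and `smooth_knn()` and `fuzzy_union()` are hypothetical names.

```r
# Simplified sketch of UMAP's local scaling (hypothetical helpers, base R).
# knn_dist: distances from one point to its k nearest neighbours (self excluded).
smooth_knn <- function(knn_dist, tol = 1e-6, max_iter = 64) {
  k <- length(knn_dist)
  rho <- min(knn_dist[knn_dist > 0])  # distance to the closest neighbour
  target <- log2(k)
  lo <- 0
  hi <- Inf
  sigma <- 1
  for (iter in seq_len(max_iter)) {
    s <- sum(exp(-pmax(0, knn_dist - rho) / sigma))
    if (abs(s - target) < tol) break
    if (s > target) {
      # Memberships too large: shrink sigma
      hi <- sigma
      sigma <- (lo + sigma) / 2
    } else {
      # Memberships too small: grow sigma
      lo <- sigma
      sigma <- if (is.finite(hi)) (sigma + hi) / 2 else sigma * 2
    }
  }
  weights <- exp(-pmax(0, knn_dist - rho) / sigma)  # nearest neighbour gets 1
  list(rho = rho, sigma = sigma, weights = weights)
}

# Symmetrize a matrix of directed memberships A[i, j] with the probabilistic
# t-conorm (fuzzy set union): A + t(A) - A * t(A).
fuzzy_union <- function(A) A + t(A) - A * t(A)
```

Because each point's nearest neighbor receives weight 1, every point is connected to at least one neighbor, regardless of local density.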

Analysis Parameters

Configure dimensionality reduction settings and generate R code

Method Selection

Upload Expression Matrix

CSV/TSV format: Cells × Genes

Perplexity

Must not exceed (n_cells - 1) / 3

Max Iterations

Theta (Barnes-Hut)

0 = exact, >0 = approximation

PCA Preprocessing

Reduce to 30 PCs before DR (recommended)

Cell Labels/Meta

Upload cluster annotations for coloring

Visualization Preview

Upload coordinate files or run generated R code

Upload t-SNE/UMAP Coordinates CSV

Format: Dim1, Dim2, Label (optional)
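Below is a minimal base R reader for this coordinate format. It assumes a header row with the column names Dim1, Dim2 and, optionally, Label. `read_embedding()` is a hypothetical helper and is not part of the platform.

```r
# Read and validate a coordinates CSV with columns Dim1, Dim2 and optional Label.
read_embedding <- function(path) {
  coords <- utils::read.csv(path, stringsAsFactors = FALSE)
  missing <- setdiff(c("Dim1", "Dim2"), names(coords))
  if (length(missing) > 0) {
    stop("missing column(s): ", paste(missing, collapse = ", "))
  }
  if (!is.numeric(coords$Dim1) || !is.numeric(coords$Dim2)) {
    stop("Dim1 and Dim2 must be numeric")
  }
  if ("Label" %in% names(coords)) {
    coords$Label <- factor(coords$Label)  # categorical values for colouring
  }
  coords
}
```

When a Label column is present, `plot(coords$Dim1, coords$Dim2, col = as.integer(coords$Label), pch = 19)` gives a quick, package-free preview colored by label.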

Implementation Guide

Complete workflow documentation for dimensionality reduction analysis

t-SNE Implementation (Rtsne Package)

Key Parameters

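perplexity: Roughly the effective number of neighbors considered for each point, balancing local against global structure. Default 30. Constraint: perplexity ≤ (n - 1)/3, where n is the number of cells.
theta: Barnes-Hut approximation threshold that controls the speed/accuracy trade-off. The default of 0.5 suits large datasets (>1000 cells). A value of 0.0 gives exact t-SNE, whose cost grows quadratically with the number of cells.
max_iter: Number of optimization iterations. Default 1000; at least 1000 is recommended for convergence.
pca: Whether Rtsne runs an initial PCA step (default TRUE, keeping initial_dims = 50 components). For single-cell data, reducing to 30-50 PCs before t-SNE is strongly recommended. Set pca = FALSE if the input is already a matrix of PCA scores.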
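Below is a thin wrapper around `Rtsne::Rtsne()` that uses the parameters above. It assumes the input is already a matrix of PCA scores, hence `pca = FALSE`. The perplexity check mirrors the stated constraint, so an invalid value fails early with a clear message. `run_tsne()` is a hypothetical name.

```r
# Hypothetical wrapper: t-SNE on a cells x PCs matrix via Rtsne.
run_tsne <- function(pcs, perplexity = 30, theta = 0.5, max_iter = 1000,
                     seed = 42) {
  n <- nrow(pcs)
  # Enforce perplexity <= (n - 1) / 3 before calling Rtsne
  if (3 * perplexity > n - 1) {
    stop(sprintf("perplexity %g is too large for %d cells (max %.1f)",
                 perplexity, n, (n - 1) / 3))
  }
  set.seed(seed)  # t-SNE is stochastic; fix the seed for reproducibility
  fit <- Rtsne::Rtsne(pcs, dims = 2, perplexity = perplexity, theta = theta,
                      max_iter = max_iter, pca = FALSE)
  fit$Y  # n x 2 matrix of embedding coordinates
}
```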

Critical Considerations

For high-dimensional sparse data (e.g., scRNA-seq), always perform PCA preprocessing first. Direct t-SNE on >1000 features is computationally expensive and may produce noisy results. Theta parameter controls the speed/accuracy trade-off.
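PCA preprocessing can be done with base R's `prcomp()`. This sketch centers genes without scaling them. It also caps the number of components at what the data can support. `pca_reduce()` is a hypothetical helper.

```r
# Hypothetical helper: reduce a cells x genes matrix to its top principal
# components with base R, returning the scores and variance explained.
pca_reduce <- function(expr, n_pcs = 30) {
  # Cannot request more PCs than min(n_cells - 1, n_genes)
  k <- min(n_pcs, nrow(expr) - 1, ncol(expr))
  pca <- stats::prcomp(expr, center = TRUE, scale. = FALSE, rank. = k)
  # sdev holds all component standard deviations, even when rank. is set
  var_explained <- pca$sdev^2 / sum(pca$sdev^2)
  list(scores = pca$x, var_explained = var_explained[seq_len(k)])
}
```

`cumsum(res$var_explained)` shows how much variance the retained components capture, which helps judge whether 30 PCs are enough. The `scores` matrix is what t-SNE or UMAP would then receive.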

Workflow Steps

Load expression matrix (cells as rows, genes as columns)
Remove zero-variance genes and perform quality control
Apply PCA to reduce dimensions to 30-50 components
Run Rtsne on PCA-reduced data with appropriate perplexity
Visualize results using ggplot2 with cell type annotations
Export coordinates and high-resolution figures
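Steps 2 to 6 above can be strung together as a minimal sketch. It assumes a numeric cells × genes matrix with cell IDs as row names and requires the Rtsne package; ggplot2 is used only if it is installed. Zero-variance filtering stands in for fuller quality control. `run_workflow()` and its arguments are hypothetical.

```r
# Hypothetical end-to-end t-SNE workflow (steps 2-6); Rtsne must be installed.
run_workflow <- function(expr, labels = NULL, n_pcs = 30, perplexity = 30,
                         seed = 1, out_file = NULL, plot_file = NULL) {
  # Step 2: drop zero-variance genes (a stand-in for full QC)
  expr <- expr[, apply(expr, 2, stats::var) > 0, drop = FALSE]

  # Step 3: PCA down to at most n_pcs components
  k <- min(n_pcs, nrow(expr) - 1, ncol(expr))
  pcs <- stats::prcomp(expr, center = TRUE, rank. = k)$x

  # Step 4: Barnes-Hut t-SNE on the PCA scores
  set.seed(seed)
  tsne <- Rtsne::Rtsne(pcs, dims = 2, perplexity = perplexity, pca = FALSE)
  out <- data.frame(Dim1 = tsne$Y[, 1], Dim2 = tsne$Y[, 2],
                    row.names = rownames(expr))
  if (!is.null(labels)) out$Label <- labels

  # Step 5 (optional): scatter plot, coloured by label when labels are given
  if (!is.null(plot_file) && requireNamespace("ggplot2", quietly = TRUE)) {
    p <- ggplot2::ggplot(out, ggplot2::aes(x = Dim1, y = Dim2)) +
      ggplot2::geom_point(size = 0.5)
    if (!is.null(labels)) p <- p + ggplot2::aes(colour = Label)
    ggplot2::ggsave(plot_file, p, width = 6, height = 5, dpi = 300)
  }

  # Step 6: export coordinates in the Dim1, Dim2, Label format
  if (!is.null(out_file)) utils::write.csv(out, out_file, row.names = FALSE)
  out
}
```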

UMAP Implementation (uwot Package)

Key Parameters

n_neighbors: Size of the local neighborhood; controls the local vs. global structure trade-off (similar to perplexity). Typical range: 2-100. Default 15.
min_dist: Minimum distance between embedded points; controls how tightly clusters pack. Lower values (0.01-0.1) create tighter clusters, while higher values (0.5+) spread them out. uwot default: 0.01.
metric: Distance metric. "euclidean" for general use, "cosine" for normalized gene expression data.
n_threads: Threads for all parallel steps except stochastic gradient descent, which n_sgd_threads controls separately. parallel::detectCores() - 1 uses all but one core.
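Below is a sketch of the corresponding `uwot::umap()` call. It assumes the input is a matrix of PCA scores and that uwot is installed. The settings follow the guidance here: cosine metric, all but one core for the parallel steps, and single-threaded SGD. `run_umap()` is a hypothetical wrapper.

```r
# Hypothetical wrapper around uwot::umap() for a cells x PCs matrix.
run_umap <- function(pcs, n_neighbors = 15, min_dist = 0.3,
                     metric = "cosine", seed = 1) {
  # detectCores() can return NA; fall back to a single thread
  n_threads <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
  set.seed(seed)
  uwot::umap(pcs,
             n_neighbors = n_neighbors,
             n_components = 2,
             metric = metric,
             min_dist = min_dist,
             n_threads = n_threads,
             n_sgd_threads = 0)  # uwot's default; keeps SGD single-threaded
}
```

Leaving n_sgd_threads at 0 trades some speed for runs that repeat exactly under the same seed.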

Comparison with t-SNE

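Speed: UMAP is typically much faster in practice; its approximate nearest-neighbor search scales close to linearly with the number of cells, versus O(n log n) for Barnes-Hut t-SNE

Global Structure: UMAP generally preserves inter-cluster relationships better

Reproducibility: Both require set.seed() for consistent results; in uwot, running stochastic gradient descent on more than one thread (n_sgd_threads > 1) can make results non-reproducible even with a fixed seed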

Memory: UMAP handles larger datasets (>100k cells) more efficiently

Parameter Tuning Guidelines

For single-cell data, start with n_neighbors=15 and min_dist=0.3. Decrease min_dist to 0.1 for tighter clusters, increase to 0.5 for more dispersed visualization. The n_neighbors parameter behaves similarly to perplexity in t-SNE but typically uses smaller values (5-50).
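Choosing min_dist is ultimately a visual judgment. One way to apply this advice is to compute several embeddings with the same seed and n_neighbors and compare them side by side. This is a minimal sketch that assumes uwot is installed and `pcs` is a cells × PCs matrix. `sweep_min_dist()` is a hypothetical helper.

```r
# Hypothetical helper: one UMAP embedding per min_dist value, with the same
# seed and n_neighbors so that differences come mainly from min_dist.
sweep_min_dist <- function(pcs, min_dists = c(0.1, 0.3, 0.5),
                           n_neighbors = 15, seed = 1) {
  names(min_dists) <- paste0("min_dist_", min_dists)
  lapply(min_dists, function(md) {
    set.seed(seed)
    uwot::umap(pcs, n_neighbors = n_neighbors, min_dist = md)
  })
}
```

Each element of the result is an n × 2 matrix. The matrices can be plotted in a row, for example with `par(mfrow = c(1, 3))` and base `plot()`.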

Input Data Format

Expression Matrix

• Rows: Cells/observations
• Columns: Genes/features
• Format: CSV or TSV
• First column: Cell IDs (rownames)
• Values: Normalized or raw counts (raw counts should be normalized and log-transformed before PCA)

Metadata (Optional)

• Rows: Cell IDs (matching expression matrix)
• Columns: cell_type, batch, condition
• Used for coloring points in visualization
• First column: Cell IDs
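Both files can be loaded in base R so that the metadata rows line up with the expression matrix. This sketch assumes comma-separated files unless the expression file ends in .tsv, and expects metadata as CSV with cell IDs in the first column of each file. `load_inputs()` is a hypothetical helper.

```r
# Hypothetical loader: cells x genes expression matrix plus optional metadata,
# reordered so that metadata rows match expression rows by cell ID.
load_inputs <- function(expr_path, meta_path = NULL) {
  is_tsv <- grepl("\\.tsv$", expr_path, ignore.case = TRUE)
  reader <- if (is_tsv) utils::read.delim else utils::read.csv
  # row.names = 1 turns the first column (cell IDs) into row names;
  # check.names = FALSE keeps gene symbols such as "HLA-DRA" unchanged
  expr <- as.matrix(reader(expr_path, row.names = 1, check.names = FALSE))

  meta <- NULL
  if (!is.null(meta_path)) {
    meta <- utils::read.csv(meta_path, row.names = 1, stringsAsFactors = FALSE)
    missing <- setdiff(rownames(expr), rownames(meta))
    if (length(missing) > 0) {
      stop(length(missing), " cell(s) in the expression matrix lack metadata")
    }
    meta <- meta[rownames(expr), , drop = FALSE]  # align row order
  }
  list(expr = expr, meta = meta)
}
```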
