Platform Documentation
Learn how to browse, search, analyze, and export peptide data using the Pep2Net-DB platform.
Platform Overview
Pep2Net-DB — a curated peptide evidence database integrating sequences, assays, structures, and ML predictions.
Pep2Net-DB is built on a PostgreSQL database storing thousands of peptide entries sourced from public databases (DRAMP, APD, etc.) and literature. Each entry links a peptide sequence with its biological source, functional annotations, experimental assay evidence, computed physicochemical properties, predicted 3D structures, and machine-learning-based function predictions.
The platform uses a single ESM-2 + GATv2 deep learning architecture for function prediction, so all 40+ biological activity models share the same backbone and inference pipeline.
Data Highlights
What makes this database distinctive.
- Integrated Sequence + Assay Evidence — Each peptide entry links directly to its experimental assay records (MIC, IC50, EC50, LD50) with target organisms, measurement units, PubMed references, and source provenance.
- Predicted 3D Structures — ESMFold-generated PDB files available for structure-ready peptides, visualized interactively with 3Dmol.js in the browser.
- Structure Similarity Search — Find peptides with similar 3D conformations by uploading a PDB file or referencing an existing structure. Uses PDB coordinate parsing with a precomputed signature cache.
- Sequence Similarity Search — k-mer prefiltering + global alignment scoring for finding similar peptides, with adjustable identity and length tolerance thresholds.
- Curated Function Ontology — Hierarchical function classification tree built from the database content, from broad Antimicrobial/Anticancer categories down to specific organism-level subtypes.
- ESM-2 + GATv2 Multi-label Predictions — Toxicity bundle predicts 6 labels simultaneously (Toxin, Hemolytic, Cytotoxic, Cytolytic, Mammalian Toxicity, Celiac Toxicity). Antiviral bundle predicts 7 labels. Subtype models share the same backbone.
- Streaming Data Export — Download presets for evidence-backed subsets (non-hemolytic candidates, MIC/MBC potency, antimicrobial collections) in CSV, FASTA, or JSON with streaming for large datasets.
- Unique Sequence Deduplication — Browse and analyze by unique peptide sequence, not just by entry ID, to avoid inflated statistics from duplicate records.
Key Features
Core capabilities of the Pep2Net-DB platform.
Curated Database
Access standardized peptide data with comprehensive annotations including sequence, function, source, and cross-references.
Smart Browsing
Filter peptides by type, source organism, modifications, and biological function with intuitive interfaces.
ML-Based Predictions
Single ESM-2 + GATv2 architecture for 40+ prediction tasks — Toxicity, Antiviral, Antibacterial, and more.
Flexible Export
Download datasets in CSV, FASTA, or JSON format with complete annotation information.
Interactive Statistics
ECharts-powered interactive charts showing database composition across dimensions.
Browse Database
Explore peptides through interactive filters and multiple classification dimensions.
The Browse page lets you navigate the entire peptide database through multiple classification dimensions. Clicking any entry opens a detailed single-peptide view.
Filter Dimensions
Peptide Type
- Linear: Straight-chain peptides
- Cyclic: Head-to-tail cyclized
- Branched: Multi-chain structures
- Other: Special topologies
Source Organism
- Animal: Mammalian, amphibian, insect
- Plant: Botanical sources
- Bacterium: Microbial-derived
- Synthetic: Lab-designed
Modifications
- N-terminal: Acetylation, formylation
- C-terminal: Amidation, carboxylation
- Chemical: Side-chain modifications
Biological Function
- Antimicrobial: Antibacterial, antifungal
- Therapeutic: Anticancer, anti-aging
- Functional: Signaling, cell-penetrating
Single Peptide View
Clicking any entry opens a comprehensive detail page featuring the full amino acid sequence, all computed physicochemical properties (molecular weight, pI, net charge, GRAVY, hydrophobic moment, instability index, Boman index), functional annotations, and linked assay evidence records.
Interactive 3D Structure Viewer: predicted PDB structures are rendered directly in the browser with 3Dmol.js, with zoom, rotation, and style toggles. Previous / Next navigation lets you step through neighboring peptides without returning to the Browse list.
Function Ontology Browser
A hierarchical function ontology tree built from the database lets you navigate from broad categories (e.g., Antimicrobial) down to specific subtypes (e.g., Anti-Gram-positive, Anti-MRSA), with counts showing how many peptides belong to each node. Filter controls allow narrowing by evidence status, sequence length, and standard/non-standard sequences.
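The counting behavior of such a tree can be illustrated with a minimal sketch. It assumes each peptide annotation is available as a path from a broad category to a subtype; the function name `build_ontology`, the nested-dict layout, and the sample paths are hypothetical and do not describe the platform's actual schema.

```python
def build_ontology(paths):
    """Build a nested tree where every node counts the annotations beneath it.

    `paths` is an iterable of tuples such as ("Antimicrobial", "Anti-MRSA").
    Assumption: one path per peptide annotation.
    """
    root = {"count": 0, "children": {}}
    for path in paths:
        root["count"] += 1
        node = root
        for label in path:
            # Create the child node on first sight, then increment its count.
            node = node["children"].setdefault(label, {"count": 0, "children": {}})
            node["count"] += 1
    return root


# Hypothetical annotations for three peptides.
tree = build_ontology([
    ("Antimicrobial", "Anti-Gram-positive"),
    ("Antimicrobial", "Anti-MRSA"),
    ("Anticancer",),
])
antimicrobial = tree["children"]["Antimicrobial"]
```

In this sketch a parent's count is the sum over everything below it, which is the behavior a per-node peptide count would need if each peptide carried a single annotation path.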
Other Browse Views
- Unique Sequence Browse — Deduplicated view grouping all entries by unique peptide sequence.
- Assay Record Browse — Paginated raw experimental evidence table with target organisms, assay methods (MIC, IC50, EC50, LD50), and activity values.
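As a rough illustration of the deduplicated view, the sketch below groups entry IDs by normalized sequence. The normalization rule (strip whitespace, uppercase) and the `(entry_id, sequence)` tuple layout are assumptions made for the example, not the platform's documented behavior.

```python
from collections import defaultdict


def group_by_sequence(entries):
    """Map each unique sequence to the list of entry IDs that carry it."""
    groups = defaultdict(list)
    for entry_id, sequence in entries:
        # Assumed normalization: remove whitespace and compare case-insensitively.
        key = "".join(sequence.split()).upper()
        groups[key].append(entry_id)
    return dict(groups)


# Hypothetical entries; E1 and E2 are the same peptide recorded twice.
entries = [("E1", "GIGKFLHSAKKF"), ("E2", "gigkflhsakkf"), ("E3", "KWKLFKKI")]
unique = group_by_sequence(entries)
```

Statistics computed over the keys of `unique` count each peptide once, regardless of how many entries reference it.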
Search
Find peptides by sequence, name, cross-reference, or structural features.
The Search page offers two search modes with downloadable results (CSV export, up to 50,000 records):
Sequence Search
Search by peptide name (e.g., LL-37), amino acid sequence, or database cross-reference (DRAMP ID, APD ID, PubMed ID). Apply additional filters for sequence length, molecular weight range, isoelectric point, source organism, and function category.
Sequence Similarity Search — paste a query sequence to find similar peptides using k-mer prefiltering followed by global alignment scoring. Adjustable parameters include minimum identity, length tolerance, maximum candidates to scan, and result count.
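The two-stage idea (cheap k-mer prefilter, then global alignment on the survivors) can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the scoring scheme (+1 match, -1 mismatch, -1 gap), the definition of identity as matches divided by alignment columns, and all function and parameter names are hypothetical.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch alignment; returns matches / alignment columns."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the corner, counting matched columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        columns += 1
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return matches / columns if columns else 0.0


def similarity_search(query, database, k=3, min_identity=0.7,
                      length_tolerance=0.3, max_candidates=1000, top_n=10):
    """Prefilter by length and shared k-mers, then rank by global identity."""
    query = query.upper()
    query_kmers = kmers(query, k)
    prefiltered = []
    for pid, seq in database.items():
        if abs(len(seq) - len(query)) > length_tolerance * len(query):
            continue  # outside the length tolerance
        shared = len(query_kmers & kmers(seq, k))
        if shared:
            prefiltered.append((shared, pid, seq))
    prefiltered.sort(key=lambda item: (-item[0], item[1]))
    hits = []
    for _, pid, seq in prefiltered[:max_candidates]:
        identity = global_identity(query, seq)
        if identity >= min_identity:
            hits.append((pid, identity))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:top_n]


# Hypothetical three-peptide database.
database = {"P1": "KWKLFKKI", "P2": "KWKLFKKA", "P3": "GIGKFLHSAKKF"}
results = similarity_search("KWKLFKKI", database)
```

The prefilter keeps the quadratic-time alignment off most of the database: only candidates that pass the length check and share at least one k-mer with the query are aligned, and `max_candidates` bounds that number.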
Structure Search
Search by 3D structural similarity in two ways: select an existing peptide from the database (using its predicted PDB structure), or upload your own PDB file. The search generates a structure signature from PDB coordinates and compares against a precomputed cache for fast similarity ranking, with configurable score threshold and bounded candidate scanning.
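The documentation does not specify how the structure signature is defined. As one plausible illustration, the sketch below parses C-alpha coordinates from fixed-column PDB `ATOM` records, builds a normalized histogram of pairwise distances, and ranks cached signatures by cosine similarity. The histogram signature, the bin settings, and every function name here are assumptions.

```python
import math


def ca_coordinates(pdb_text):
    """Extract C-alpha coordinates from ATOM records (fixed PDB columns)."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def distance_signature(coords, bin_width=2.0, n_bins=16):
    """Normalized histogram of pairwise C-alpha distances (assumed signature)."""
    hist = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            # Distances beyond the last bin are clamped into it.
            hist[min(int(d // bin_width), n_bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else [0.0] * n_bins


def cosine(u, v):
    """Cosine similarity between two signatures."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_structures(query_pdb, signature_cache, min_score=0.5, max_candidates=1000):
    """Compare a query structure against precomputed signatures."""
    query_sig = distance_signature(ca_coordinates(query_pdb))
    candidates = list(signature_cache.items())[:max_candidates]
    scored = [(pid, cosine(query_sig, sig)) for pid, sig in candidates]
    return sorted((s for s in scored if s[1] >= min_score), key=lambda s: -s[1])
```

Because signatures for database structures are computed once and cached, a query costs one signature computation plus a vector comparison per candidate, which is what makes bounded candidate scanning cheap.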
Prediction Tools
Compute physicochemical properties and run ML-based function predictions.
The Tools page provides two categories: Physicochemical Property Calculations (deterministic) and Machine Learning Function Predictions (ESM-2 + GATv2).
How to Use
Input Sequences
Paste one or more peptide sequences (one per line) or upload a FASTA/CSV file.
Select Models
Choose from available physicochemical calculators and ML function classifiers.
Run Analysis
Click Submit to execute all selected predictions. ML results include confidence scores.
Export Results
Download prediction results as a CSV file for offline analysis.
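For the input step above, a small parser shows how pasted text might be normalized before analysis. This is a sketch only: the function name `parse_sequences`, the auto-generated `seqN` names, and the rule that any `>` line switches to FASTA mode are assumptions, and CSV upload is not covered.

```python
def parse_sequences(text):
    """Parse pasted input as FASTA if it has '>' headers, else one sequence per line.

    Returns a list of (name, sequence) tuples with uppercase sequences.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not any(ln.startswith(">") for ln in lines):
        # Plain mode: every non-empty line is its own sequence.
        return [(f"seq{i}", ln.upper()) for i, ln in enumerate(lines, start=1)]
    records = []
    for ln in lines:
        if ln.startswith(">"):
            records.append([ln[1:].strip(), ""])
        elif records:
            # FASTA sequences may wrap across several lines; join them.
            records[-1][1] += ln.upper()
    return [(name, seq) for name, seq in records]


# Hypothetical inputs.
plain = parse_sequences("kwklfkki\nGIGKF\n")
fasta = parse_sequences(">pep1 example\nKWKL\nFKKI\n>pep2\nGIGKF\n")
```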
Physicochemical Properties (Deterministic)
Computed without machine learning, so results are fast and deterministic.
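To show what "deterministic" means in practice, here is a minimal sketch of two such calculations. GRAVY is the mean Kyte-Doolittle hydropathy over the sequence. The `formal_charge` function is a deliberately crude approximation (count of K and R minus count of D and E) and is an assumption for illustration; the platform's net-charge method is not documented here and most likely accounts for pH and termini.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Non-standard residues raise KeyError rather than being silently skipped.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def formal_charge(sequence):
    """Crude charge estimate: (K + R) - (D + E). Ignores pH, His, and termini."""
    seq = sequence.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative
```

The same input always yields the same output, with no model weights or random state involved, which is why these values can be precomputed and stored alongside each entry.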
ESM-2 + GATv2 Architecture
Unified deep learning architecture for all function prediction models.
All ML-based function prediction models share a unified architecture:
- ESM-2 Backbone: 650M-parameter protein language model (Meta AI) generates rich per-residue sequence embeddings that capture evolutionary and structural information.
- GATv2 Layers: Graph Attention Networks v2 with dynamic attention mechanism process the sequence as a graph, learning residue-residue interaction patterns.
- Bundle Models: Top-level models (toxicity, antiviral) perform multi-label classification (6 and 7 labels respectively). Selecting individual subtypes automatically promotes the parent bundle for efficient inference.
- Single-Label Models: Subtype models for antibacterial, therapeutic, antifungal, and other categories use specialized prediction heads sharing the same backbone.
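Bundle promotion can be pictured as a small resolution step before inference. In the sketch below, the six toxicity labels come from the Data Highlights section, but the lowercase identifiers, the `BUNDLES` mapping, and `resolve_models` are hypothetical names. The antiviral bundle is omitted because its seven labels are not listed in this documentation.

```python
# Assumed identifiers for the six toxicity labels named in Data Highlights.
BUNDLES = {
    "toxicity": [
        "toxin", "hemolytic", "cytotoxic",
        "cytolytic", "mammalian_toxicity", "celiac_toxicity",
    ],
}

# Reverse lookup: subtype label -> parent bundle.
PARENT = {label: bundle for bundle, labels in BUNDLES.items() for label in labels}


def resolve_models(selected):
    """Replace bundle subtypes with their parent bundle, keeping order, no duplicates."""
    resolved = []
    for name in selected:
        target = PARENT.get(name, name)  # single-label models pass through unchanged
        if target not in resolved:
            resolved.append(target)
    return resolved


# Two toxicity subtypes collapse into one bundle run.
to_run = resolve_models(["hemolytic", "cytotoxic", "antibacterial"])
```

Since one forward pass of a bundle model yields all of its labels at once, running the bundle a single time is cheaper than running it once per selected subtype.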
Prediction Workflow
- Sequence embedding generation using ESM-2
- Graph construction and feature propagation
- Attention-based feature aggregation with GATv2
- Task-specific classification and confidence scoring
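The workflow above can be traced end to end in a dependency-free sketch. Several pieces are assumptions: random vectors stand in for ESM-2 embeddings, the graph links each residue to its sequence neighbors within a fixed window (the platform's graph construction is not documented), and the classifier is mean pooling followed by a sigmoid. The attention scoring follows the GATv2 form, where the LeakyReLU is applied before the attention vector `a`.

```python
import math
import random


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return [v if v > 0 else slope * v for v in x]


def sequence_graph(n, window=2):
    # Assumption: residue i links to residues within `window` positions, itself included.
    return {i: list(range(max(0, i - window), min(n, i + window + 1))) for i in range(n)}


def gatv2_layer(h, neighbors, W_left, W_right, a):
    """One GATv2 head: score = a . LeakyReLU(W_left h_i + W_right h_j)."""
    out = []
    for i, h_i in enumerate(h):
        left = matvec(W_left, h_i)
        messages = [matvec(W_right, h[j]) for j in neighbors[i]]
        scores = [
            sum(ak * z for ak, z in zip(a, leaky_relu([l + r for l, r in zip(left, msg)])))
            for msg in messages
        ]
        # Softmax over the neighborhood (shifted by the max for stability).
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w / total * msg[d] for w, msg in zip(weights, messages))
            for d in range(len(a))
        ])
    return out


def predict(h, w_out, bias=0.0):
    """Mean-pool residue features, then apply a sigmoid head (assumed classifier)."""
    pooled = [sum(col) / len(h) for col in zip(*h)]
    logit = sum(w * p for w, p in zip(w_out, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


embeddings = rand_matrix(6, 4)   # placeholder for ESM-2 output: 6 residues, 4 dims
graph = sequence_graph(len(embeddings))
W_left, W_right = rand_matrix(3, 4), rand_matrix(3, 4)
a = [rng.uniform(-1, 1) for _ in range(3)]
hidden = gatv2_layer(embeddings, graph, W_left, W_right, a)
confidence = predict(hidden, [rng.uniform(-1, 1) for _ in range(3)])
```

A production model would use learned weights, multiple attention heads, and the real 650M-parameter ESM-2 embeddings; the sketch only shows how the four workflow steps connect.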
Available Predictions
40+ trained models using the ESM-2 + GATv2 unified architecture.
Antimicrobial Activity
Therapeutic Activity
Functional Peptides
Plant & Defense
Toxicity
Data Download
Export curated peptide datasets with complete annotations.
The Download page provides curated preset data packages for offline analysis in CSV, FASTA, and JSON formats, with streaming export for large datasets.
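Streaming export means the server emits the file in pieces instead of building it fully in memory. A minimal sketch of that pattern for CSV, using a generator, is shown below; `stream_csv`, the `chunk_size` parameter, and the sample field names are hypothetical and not taken from the platform's code.

```python
import csv
import io


def stream_csv(rows, fieldnames, chunk_size=1000):
    """Yield CSV text in chunks of `chunk_size` rows; the header is in the first chunk."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for count, row in enumerate(rows, start=1):
        writer.writerow(row)
        if count % chunk_size == 0:
            yield buffer.getvalue()
            # Reset the buffer so memory use stays bounded by one chunk.
            buffer.seek(0)
            buffer.truncate(0)
    remaining = buffer.getvalue()
    if remaining:
        yield remaining


# Hypothetical rows produced lazily, as a database cursor would.
rows = ({"id": f"E{i}", "sequence": "KWKLFKKI"} for i in range(1, 6))
chunks = list(stream_csv(rows, ["id", "sequence"], chunk_size=2))
```

Because `rows` can itself be a lazy iterator, neither the query results nor the output file need to be held in memory at once, which is what keeps large preset downloads practical.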
Available Preset Packages
Evidence-Based Subsets
- Verified Peptide Index — experimentally verified entries
- Activity-Supported Peptides — entries with quantitative evidence
- MIC/MBC Potency — peptides with potency data
- PMID-Supported Evidence — literature-linked records
Function-Focused Collections
- Antimicrobial Sequences — antibacterial, antifungal set
- Antibacterial / Antiviral / Anticancer — activity-specific exports
- Non-Hemolytic Candidates — therapeutic development subset
- Structure-Ready Index — peptides with predicted 3D structures
Source-Specific Packages
- Animal / Synthetic / Bacterium
- Plant / Virus
- Verified Assay Evidence
Statistics
Interactive ECharts visualizations summarizing database contents.
The Statistics page presents interactive charts built with ECharts, backed by a precomputed JSON cache for fast loading:
- Peptide Type Distribution — pie/bar chart across linear, cyclic, branched, and other topologies.
- Source Organism Breakdown — distribution across animals, plants, bacteria, viruses, fungi, and synthetic sources.
- Function Category Overview — top functions, sources, and assay methods ranked by count.
- Sequence & Property Distributions — histograms of sequence length, molecular weight, pI, net charge, GRAVY, hydrophobic moment, instability index, and Boman index across the entire database.
- Evidence Coverage — verified vs unverified peptides, PMID-linked records, unique vs duplicate sequences, and standard vs nonstandard amino acid composition.
- Top Studied Peptides — peptides with the most assay records and activity evidence.
All charts support hover details, zoom, and series toggling.