Algorithm & Calculation Logic#
1. Algorithm Overview#
HBAT uses a geometric approach to identify hydrogen bonds by analyzing distance and angular criteria between donor-hydrogen-acceptor triplets. The main calculation is performed by the NPMolecularInteractionAnalyzer class in hbat/core/np_analyzer.py, which provides enhanced performance through NumPy vectorization.
Module Structure (Updated):
- hbat/core/analyzer.py: Main analysis engine interface
- hbat/core/np_analyzer.py: High-performance NumPy-based implementation
- hbat/core/interactions.py: Interaction data classes and structures
- hbat/core/parameters.py: Analysis parameters and constants
- hbat/ccd/ccd_analyzer.py: CCD data management and BinaryCIF parsing
2. Core Calculation Steps#
Step 1: Donor-Acceptor Identification#
- Donors: Heavy atoms (N, O, S) bonded to hydrogen atoms ( - _get_hydrogen_bond_donors())
- Acceptors: Electronegative atoms (N, O, S, F, Cl) ( - _get_hydrogen_bond_acceptors())
Step 2: Distance Criteria#
Two distance checks are performed:
- H…A Distance: Hydrogen to acceptor distance - Cutoff: 3.5 Å (default from - ParametersDefault.HB_DISTANCE_CUTOFF)
- Calculated: Using 3D Euclidean distance via - Vec3D.distance_to()
 
- D…A Distance: Donor to acceptor distance - Cutoff: 4.0 Å (default from - ParametersDefault.HB_DA_DISTANCE)
- Purpose: Ensures realistic hydrogen bond geometry 
 
Step 3: Angular Criteria#
- Angle: D-H…A angle using - angle_between_vectors()from- hbat/core/vector.py
- Cutoff: 120° minimum (default from - ParametersDefault.HB_ANGLE_CUTOFF)
- Calculation: Uses vector dot product formula: - cos(θ) = (BA·BC)/(|BA||BC|)
3. Geometric Validation Process#
def _check_hydrogen_bond(donor, hydrogen, acceptor):
    # Distance validation
    h_a_distance = hydrogen.coords.distance_to(acceptor.coords)
    if h_a_distance > 3.5:  # Distance cutoff
        return None
    d_a_distance = donor.coords.distance_to(acceptor.coords)
    if d_a_distance > 4.0:  # Donor-acceptor cutoff
        return None
    # Angular validation
    angle = angle_between_vectors(donor.coords, hydrogen.coords, acceptor.coords)
    if math.degrees(angle) < 120.0:  # Angle cutoff
        return None
    # Bond classification and creation
    return HydrogenBond(...)
4. Key Parameters and Defaults#
From hbat/constants/parameters (ParametersDefault class):
| Parameter | Default Value | Description | 
|---|---|---|
| 
 | 3.5 Å | Maximum H…A distance | 
| 
 | 120.0° | Minimum D-H…A angle | 
| 
 | 4.0 Å | Maximum D…A distance | 
| 
 | 0.6 | Van der Waals to covalent bond factor | 
| 
 | 2.5 Å | Maximum covalent bond distance | 
| 
 | 0.5 Å | Minimum realistic bond distance | 
5. PDB Structure Fixing and Preprocessing#
Missing Hydrogen Atom Detection#
HBAT now includes automatic PDB structure fixing capabilities to handle structures with missing hydrogen atoms:
PDB Fixing Methods:
- OpenBabel Method (default): - Uses OpenBabel’s built-in hydrogen addition functionality - Fast and efficient for standard residues - Preserves original atom coordinates 
- PDBFixer Method: - Uses PDBFixer library with OpenMM - More comprehensive fixing capabilities - Can add missing heavy atoms and replace nonstandard residues 
PDB Fixing Parameters:
From ParametersDefault class:
| Parameter | Default Value | Description | 
|---|---|---|
| 
 | True | Enable/disable PDB structure fixing | 
| 
 | “openbabel” | Choose fixing method (“openbabel” or “pdbfixer”) | 
| 
 | True | Add missing hydrogen atoms | 
| 
 | False | Add missing heavy atoms (PDBFixer only) | 
| 
 | False | Replace nonstandard residues | 
| 
 | False | Remove heterogens | 
| 
 | True | Keep water when removing heterogens | 
Workflow Process:
- Input validation: Check if PDB fixing is enabled and needed 
- Method selection: Choose between OpenBabel or PDBFixer 
- Structure fixing: Add missing atoms and fix structural issues 
- Output generation: Create fixed PDB file (e.g., - structure_fixed.pdb)
- Analysis continuation: Use fixed structure for interaction analysis 
6. CCD Data Integration and Bond Detection#
Chemical Component Dictionary (CCD) Integration#
HBAT now integrates with the RCSB Chemical Component Dictionary (CCD) for accurate bond information:
CCD Data Manager:
- Automatically downloads CCD BinaryCIF files from RCSB 
- Atom data: - cca.bcifcontaining atomic properties
- Bond data: - ccb.bcifcontaining bond connectivity information
- Storage location: - ~/.hbat/ccd-data/directory
- Auto-download: Files are downloaded on first use and cached locally 
Bond Detection Priority (Updated)#
The enhanced bond detection follows this priority:
- RESIDUE_LOOKUP (new, preferred): - Uses pre-defined bond information from CCD for standard residues 
- Provides chemically accurate bond connectivity 
- Includes bond order (single/double) and aromaticity information 
- Covers all standard amino acids and nucleotides 
 
- CONECT Records (if available): - Parses explicit bond information from CONECT records in the PDB file 
- Creates bonds with - bond_type="explicit"
- Preserves author-specified connectivity 
 
- Distance-based Detection (fallback): - Only used when no CONECT records are present or no bonds were found 
- Uses optimized spatial grid algorithm for large structures 
- Implements - _are_atoms_bonded_with_distance()method
 
Distance-based Bond Criteria#
When detecting bonds by distance:
- Van der Waals radii from - AtomicData.VDW_RADII
- Distance criteria: - MIN_BOND_DISTANCE ≤ distance ≤ min(vdw_cutoff, MAX_BOND_DISTANCE)
- VdW cutoff formula: - vdw_cutoff = (vdw1 + vdw2) × COVALENT_CUTOFF_FACTOR
- Example: C-C bond = (1.70 + 1.70) × 0.6 = 2.04 Å maximum (but limited to 2.5 Å by MAX_BOND_DISTANCE) 
Bond Types#
- "residue_lookup": Bonds from CCD residue definitions
- "explicit": Bonds from CONECT records
- "covalent": Bonds detected by distance criteria
7. Performance Optimization and Vectorization#
NumPy-based High-Performance Analyzer#
HBAT now uses a high-performance NumPy-based analyzer (NPMolecularInteractionAnalyzer) for enhanced computational efficiency:
Key Optimizations:
- Vectorized Distance Calculations: - Uses - compute_distance_matrix()for batch distance calculations - Replaces nested loops with NumPy array operations - Reduces computational complexity from O(n²) to O(n) for many operations
- Spatial Indexing: - Pre-computed atom indices by type (hydrogen, donor, acceptor) - Optimized residue indexing for fast same-residue filtering - Grid-based spatial partitioning for bond detection 
- Batch Processing: - Vectorized angle calculations using NumPy operations - Simultaneous processing of multiple atom pairs - Optimized memory access patterns 
Performance Benefits:
- Large structures: Significant speedup for structures with >1000 atoms 
- Memory efficiency: Reduced memory allocation overhead 
- Scalability: Better performance scaling with structure size 
Spatial Grid Algorithm#
For distance-based bond detection, HBAT uses a spatial grid algorithm:
Grid Setup:
- Grid cell size based on MAX_BOND_DISTANCE (2.5 Å)
- Atoms are assigned to grid cells based on coordinates
- Only neighboring cells are checked for potential bonds
Benefits: - Reduces bond detection complexity from O(n²) to approximately O(n) - Particularly effective for large molecular systems - Maintains accuracy while improving performance
8. Vector Mathematics#
The NPVec3D class (hbat/core/np_vector.py) provides NumPy-based vector operations:
- 3D coordinates: - NPVec3D(x, y, z)or- NPVec3D(np.array([x, y, z]))
- Batch operations: Support for multiple vectors simultaneously - NPVec3D(np.array([[x1,y1,z1], [x2,y2,z2]]))
- Distance calculation: - √[(x₂-x₁)² + (y₂-y₁)² + (z₂-z₁)²]with vectorized operations
- Angle calculation: - arccos(dot_product / (mag1 × mag2))using NumPy for efficiency
- Performance: Leverages NumPy’s optimized C implementations for mathematical operations 
9. Enhanced Analysis Flow#
Updated Analysis Process:
- Structure preprocessing → PDB fixing if enabled (add missing H atoms) 
- CCD data loading → Download/load chemical component dictionary 
- Parse PDB file → Extract atomic coordinates from fixed structure 
- Bond detection → Apply RESIDUE_LOOKUP → CONECT → Distance-based priority 
- Identify donors → Find N/O/S atoms bonded to H 
- Identify acceptors → Find N/O/S/F/Cl atoms 
- Distance screening → Apply H…A and D…A cutoffs (vectorized) 
- Angular validation → Check D-H…A geometry (batch processing) 
- Bond classification → Determine bond type (e.g., “N-H…O”) 
- Cooperativity analysis → Identify interaction chains 
10. Output Structure and Analysis Summary#
Enhanced Analysis Summary:
The analysis now provides comprehensive summary information including:
- Structure Information: Original vs. fixed structure statistics 
- PDB Fixing Details: Atoms added, bonds created, method used 
- Bond Detection Statistics: Counts by detection method (residue_lookup, explicit, covalent) 
- Performance Metrics: Analysis timing information 
- Interaction Counts: Detailed breakdown by interaction type 
Interaction Data Classes:
Each detected interaction is stored with enhanced information:
- HydrogenBond: Donor, hydrogen, acceptor atoms with geometric parameters 
- HalogenBond: Halogen, carbon, acceptor atoms with X-bond specifics 
- PiInteraction: Donor, hydrogen, aromatic ring center coordinates 
- CooperativityChain: Linked interaction sequences 
11. Additional Features#
Halogen Bonds#
HBAT also detects halogen bonds (X-bonds) using similar geometric criteria:
- Distance: X…A ≤ 4.0 Å 
- Angle: C-X…A ≥ 120° 
- Halogens: F, Cl, Br, I 
π Interactions#
X-H…π interactions are detected using the aromatic ring center as a pseudo-acceptor:
Aromatic Ring Center Calculation (_calculate_aromatic_center())#
The center of aromatic rings is calculated as the geometric centroid of specific ring atoms:
Phenylalanine (PHE): - Ring atoms: CG, CD1, CD2, CE1, CE2, CZ (6-membered benzene ring) - Forms a planar hexagonal structure
Tyrosine (TYR): - Ring atoms: CG, CD1, CD2, CE1, CE2, CZ (6-membered benzene ring) - Same as PHE but with hydroxyl group at CZ
Tryptophan (TRP): - Ring atoms: CG, CD1, CD2, NE1, CE2, CE3, CZ2, CZ3, CH2 (9-atom indole system) - Includes both pyrrole and benzene rings
Histidine (HIS): - Ring atoms: CG, ND1, CD2, CE1, NE2 (5-membered imidazole ring) - Contains two nitrogen atoms in the ring
Centroid Calculation Process#
# For each aromatic residue:
center = Vec3D(0, 0, 0)
for atom_coord in ring_atoms_coords:
    center = center + atom_coord
center = center / len(ring_atoms_coords)  # Average position
π Interaction Geometry Validation (_check_pi_interaction())#
Once the aromatic center is calculated:
- Distance Check: H…π center distance - Cutoff: ≤ 4.5 Å (from - ParametersDefault.PI_DISTANCE_CUTOFF)
- Calculation: 3D Euclidean distance from hydrogen to ring centroid 
 
- Angular Check: D-H…π angle - Cutoff: ≥ 90° (from - ParametersDefault.PI_ANGLE_CUTOFF)
- Calculation: Angle between donor-hydrogen vector and hydrogen-π_center vector 
- Uses same - angle_between_vectors()function as regular hydrogen bonds
 
Geometric Interpretation#
- The aromatic ring center acts as a “virtual acceptor” representing the π-electron cloud 
- Distance measures how close the hydrogen approaches the aromatic system 
- Angle ensures the hydrogen is positioned to interact with the π-electron density above/below the ring plane 
Cooperativity Chains#
HBAT identifies cooperative interaction chains where molecular interactions are linked through shared atoms. This occurs when an acceptor atom in one interaction simultaneously acts as a donor in another interaction.
Chain Detection Algorithm (_find_cooperativity_chains())#
Step 1: Interaction Collection - Combines all detected interactions: hydrogen bonds, halogen bonds, and π interactions - Requires minimum of 2 interactions to form chains
Step 2: Atom-to-Interaction Mapping Creates two lookup dictionaries:
- donor_to_interactions: Maps each donor atom to interactions where it participates
- acceptor_to_interactions: Maps each acceptor atom to interactions where it participates
Atom keys are tuples: (chain_id, residue_sequence, atom_name)
Step 3: Chain Building Process (_build_cooperativity_chain_unified())
Starting from each unvisited interaction:
- Initialize: Begin with starting interaction in chain 
- Follow Forward: Look for next interaction where current acceptor acts as donor 
- Validation: Ensure same atom serves dual role (acceptor → donor) 
- Iteration: Continue until no more connections found 
- Termination: π interactions cannot chain further as acceptors (no single acceptor atom) 
Chain Building Logic#
# Simplified chain building process:
chain = [start_interaction]
current_interaction = start_interaction
while True:
    current_acceptor = current_interaction.get_acceptor_atom()
    if not current_acceptor:
        break  # No acceptor atom (π interactions)
    # Find interaction where this acceptor acts as donor
    acceptor_key = (acceptor.chain_id, acceptor.res_seq, acceptor.name)
    next_interaction = None
    for candidate in donor_to_interactions[acceptor_key]:
        candidate_donor = candidate.get_donor_atom()
        if candidate_donor matches current_acceptor:
            next_interaction = candidate
            break
    if next_interaction is None:
        break  # Chain ends
    chain.append(next_interaction)
    current_interaction = next_interaction
Cooperativity Examples#
Example 1: Sequential H-bonds
Residue A (Donor) --H-bond--> Residue B (Acceptor/Donor) --H-bond--> Residue C (Acceptor)
Example 2: Mixed interactions
Residue A (N-H) --H-bond--> Residue B (O) --X-bond--> Residue C (halogen)