Architecture¶

This document describes MolPy's architecture, design principles, and package structure.

Design Philosophy¶

MolPy is built on these core principles:

1. Composability¶

Small, focused components that work together:

# Each component does one thing well
frame = read_pdb("protein.pdb")
topology = Topology(frame)
bonds = topology.bonds()

2. Explicit Over Implicit¶

Clear, predictable behavior:

# Explicit: the result is a new frame; frame1 and frame2 are unchanged
merged = frame1.merge(frame2)

# Avoided: an in-place call that leaves the reader guessing
# frame1.merge(frame2)  # Does this modify frame1?

3. Type Safety¶

Full type hints throughout:

def read_pdb(
    filepath: str | Path,
    model: int = 0
) -> Frame:
    """Read PDB file and return Frame."""
    ...

4. Immutability Where Possible¶

Core data structures behave like values:

# Operations return new objects
frame2 = frame1.select("type == 'O'")
frame3 = frame2.translate([1, 0, 0])

# Original frame1 is unchanged
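
One way to get this value-like behavior is a frozen dataclass whose methods build new instances. The sketch below is only an illustration: `MiniFrame`, its fields, and the predicate-based `select` are hypothetical stand-ins, not MolPy's actual `Frame` API (which takes a selection string).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MiniFrame:
    """Hypothetical stand-in for Frame: atom types plus (x, y, z) positions."""

    types: tuple[str, ...]
    positions: tuple[tuple[float, float, float], ...]

    def select(self, keep: Callable[[str], bool]) -> MiniFrame:
        """Return a new frame holding only atoms whose type passes `keep`."""
        kept = [(t, p) for t, p in zip(self.types, self.positions) if keep(t)]
        return MiniFrame(
            types=tuple(t for t, _ in kept),
            positions=tuple(p for _, p in kept),
        )

    def translate(self, shift: tuple[float, float, float]) -> MiniFrame:
        """Return a new frame with every position moved by `shift`."""
        moved = tuple(
            (x + shift[0], y + shift[1], z + shift[2])
            for x, y, z in self.positions
        )
        return MiniFrame(types=self.types, positions=moved)


frame1 = MiniFrame(
    types=("O", "H", "H"),
    positions=((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
)
frame2 = frame1.select(lambda t: t == "O")
frame3 = frame2.translate((1.0, 0.0, 0.0))
```

Because the dataclass is frozen, an accidental `frame1.types = ...` raises an error instead of silently changing shared state.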

5. Separation of Concerns¶

Core - Data structures (Frame, Block, Box)
IO - Reading/writing files
Compute - Calculations and analysis
Builder - System construction
Engine - External tool integration

Package Structure¶

molpy/
├── core/           # Core data structures
├── io/             # File I/O
├── compute/        # Calculations and analysis
├── data/           # Force field data
├── engine/         # Simulation engines
├── optimize/       # Geometry optimization
├── wrapper/        # External tool wrappers
├── parser/         # String parsing (SMILES, SMARTS)
├── reacter/        # Chemical reactions
├── builder/        # System builders
├── pack/           # Molecular packing
├── potential/      # Potential functions
├── adapter/        # Toolkit adapters (RDKit)
├── typifier/       # Atom typing
└── op/             # Geometric operations

Core Module¶

The core module provides fundamental data structures.

Frame¶

The central data structure representing a molecular system:

class Frame:
    """Container for molecular structure data.

    A Frame contains:
    - Named blocks of data (atoms, bonds, etc.)
    - Simulation box
    - Metadata
    """

    def __init__(self, box: Box | None = None):
        self._blocks: dict[str, Block] = {}
        # Explicit None check: `box or Box()` would also replace a falsy box
        self._box = box if box is not None else Box()
        self._metadata: dict[str, Any] = {}

Design decisions:
- Blocks are accessed by name: frame["atoms"]
- Immutable operations return new Frame
- Box is always present (infinite if not specified)
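
These three decisions can be sketched in a few lines. Everything here is an assumption made for illustration: `SketchBox`, `SketchFrame`, and `with_block` are hypothetical names, and blocks are plain dicts rather than real `Block` objects.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SketchBox:
    """Hypothetical box: `lengths=None` stands for an infinite box."""

    lengths: tuple[float, float, float] | None = None

    @property
    def is_infinite(self) -> bool:
        return self.lengths is None


class SketchFrame:
    """Hypothetical frame holding named blocks and an always-present box."""

    def __init__(self, box: SketchBox | None = None):
        self._blocks: dict[str, dict[str, Any]] = {}
        # A missing box becomes an infinite one, so `frame.box` is never None.
        self._box = box if box is not None else SketchBox()

    @property
    def box(self) -> SketchBox:
        return self._box

    def __getitem__(self, name: str) -> dict[str, Any]:
        return self._blocks[name]          # access by block name

    def __contains__(self, name: str) -> bool:
        return name in self._blocks

    def with_block(self, name: str, block: dict[str, Any]) -> SketchFrame:
        """Return a new frame that also holds `block`; self is untouched."""
        new = SketchFrame(self._box)
        new._blocks = {**self._blocks, name: block}
        return new


empty = SketchFrame()
frame = empty.with_block("atoms", {"x": [0.0, 1.0]})
```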

Block¶

Generic container for tabular data:

class Block:
    """Container for columnar data.

    Like a lightweight DataFrame:
    - Named columns
    - Homogeneous length
    - NumPy-backed storage
    """

    def __init__(self, data: dict[str, ArrayLike]):
        self._data = {k: np.asarray(v) for k, v in data.items()}

Design decisions:
- Column-oriented storage
- NumPy arrays for efficiency
- No row-based indexing (use slicing)
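
A minimal sketch of such a container follows, assuming that the equal-length rule is checked at construction and that a slice applies to every column at once. `SketchBlock` is a hypothetical name; the real `Block` may behave differently.

```python
import numpy as np


class SketchBlock:
    """Hypothetical columnar container; not MolPy's actual Block."""

    def __init__(self, data):
        self._data = {name: np.asarray(values) for name, values in data.items()}
        lengths = {len(col) for col in self._data.values()}
        if len(lengths) > 1:
            raise ValueError(f"columns differ in length: {sorted(lengths)}")

    def __len__(self):
        # All columns share one length; an empty block has zero rows.
        return len(next(iter(self._data.values()), ()))

    def __getitem__(self, key):
        if isinstance(key, str):
            return self._data[key]          # column access by name
        if isinstance(key, slice):
            # Slicing applies to every column and yields a new block.
            return SketchBlock({n: col[key] for n, col in self._data.items()})
        raise TypeError("index with a column name or a slice")


atoms = SketchBlock({"x": [0.0, 1.0, 2.0], "type": ["O", "H", "H"]})
tail = atoms[1:]
```

Rejecting integer keys keeps the "no row-based indexing" rule visible at the call site: a single row is written as `atoms[0:1]`.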

Box¶

Simulation box with periodic boundaries:

class Box:
    """Simulation box for periodic boundaries.

    Supports:
    - Orthogonal boxes
    - Triclinic boxes
    - Infinite boxes
    """

    def __init__(
        self,
        lengths: Sequence[float] | None = None,
        angles: Sequence[float] | None = None
    ):
        ...
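
To make the three box kinds concrete, the sketch below classifies a box from its arguments and computes the cell volume with the standard triclinic formula. It assumes angles are given in degrees as (alpha, beta, gamma) and that `lengths=None` means an infinite box; neither convention is confirmed by the constructor above, and both function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def box_kind(
    lengths: Sequence[float] | None,
    angles: Sequence[float] | None = None,
) -> str:
    """Classify a box as 'infinite', 'orthogonal', or 'triclinic'."""
    if lengths is None:
        return "infinite"
    if angles is None or all(math.isclose(a, 90.0) for a in angles):
        return "orthogonal"
    return "triclinic"


def box_volume(
    lengths: Sequence[float],
    angles: Sequence[float] = (90.0, 90.0, 90.0),
) -> float:
    """Volume from edge lengths a, b, c and angles alpha, beta, gamma (degrees)."""
    a, b, c = lengths
    ca, cb, cg = (math.cos(math.radians(x)) for x in angles)
    # V = abc * sqrt(1 - cos^2(alpha) - cos^2(beta) - cos^2(gamma)
    #                + 2 cos(alpha) cos(beta) cos(gamma))
    return a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )
```

For right angles the square-root term reduces to 1, so the orthogonal case needs no separate code path.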

Atomistic¶

Specialized structures for atoms, bonds, angles, dihedrals:

class Atom:
    """Single atom representation."""
    id: int
    type: str
    position: np.ndarray
    mass: float

class Bond:
    """Bond between two atoms."""
    i: int  # Atom index
    j: int  # Atom index
    type: str | None

IO Module¶

The io module handles file reading and writing.

Design Pattern: Reader/Writer Classes¶

Each format has dedicated Reader and Writer classes:

class PDBReader(DataReader):
    """Read PDB files."""

    def read(self, filepath: Path) -> Frame:
        ...

class PDBWriter(DataWriter):
    """Write PDB files."""

    def write(self, filepath: Path, frame: Frame) -> None:
        ...

Factory Functions¶

Convenience functions wrap the reader and writer classes:

def read_pdb(filepath: str | Path, **kwargs) -> Frame:
    """Read PDB file."""
    reader = PDBReader(**kwargs)
    return reader.read(Path(filepath))
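
The reader class and factory function can be seen working together in a self-contained sketch for the XYZ format. The `DataReader` base class here is a hypothetical re-creation, `XYZReader` and `read_xyz` are illustrative names, and a plain dict stands in for `Frame`.

```python
from __future__ import annotations

import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class DataReader(ABC):
    """Hypothetical base class; the real one may differ."""

    @abstractmethod
    def read(self, filepath: Path) -> dict:
        ...


class XYZReader(DataReader):
    """Read a single-frame XYZ file into a plain dict (stand-in for Frame)."""

    def read(self, filepath: Path) -> dict:
        lines = filepath.read_text().splitlines()
        n_atoms = int(lines[0])
        # Line 2 is a free-form comment; atom records start on line 3.
        records = [line.split() for line in lines[2:2 + n_atoms]]
        return {
            "comment": lines[1],
            "symbols": [r[0] for r in records],
            "positions": [tuple(float(v) for v in r[1:4]) for r in records],
        }


def read_xyz(filepath: str | Path) -> dict:
    """Factory function: accept str or Path, delegate to the reader class."""
    return XYZReader().read(Path(filepath))


def demo_read() -> dict:
    """Write a two-atom file to a temporary directory and read it back."""
    text = "2\nhydrogen molecule\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "h2.xyz"
        path.write_text(text)
        return read_xyz(str(path))
```

The factory keeps path normalization in one place, so reader classes only ever see a `Path`.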

Hierarchy¶

io/
├── data/           # Single-frame formats (PDB, XYZ, LAMMPS)
├── trajectory/     # Multi-frame formats
├── forcefield/     # Force field files
├── readers.py      # Factory functions for reading
└── writers.py      # Factory functions for writing

Compute Module¶

The compute module provides analysis and calculations.

Design: Functional Style¶

Computations are pure functions:

def calculate_rdf(
    frame: Frame,
    r_max: float = 10.0,
    n_bins: int = 100
) -> tuple[np.ndarray, np.ndarray]:
    """Calculate radial distribution function.

    Returns:
        (r, g_r): Distance bins and RDF values
    """
    ...

Result Objects¶

Complex results use dedicated classes:

@dataclass
class RDFResult:
    """Result of RDF calculation."""
    r: np.ndarray
    g_r: np.ndarray
    n_bins: int
    r_max: float

    def plot(self) -> None:
        """Plot the RDF."""
        ...
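
A pure function and a result object combine naturally. The sketch below is a simplification chosen for illustration: it histograms raw pair distances and does not normalize them, so it is not yet a g(r). `PairHistogram` and `pair_histogram` are hypothetical names, and positions are plain tuples.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PairHistogram:
    """Hypothetical result object: bin edges, counts, and the inputs used."""

    edges: tuple[float, ...]
    counts: tuple[int, ...]
    r_max: float
    n_bins: int


def pair_histogram(positions, r_max: float = 10.0, n_bins: int = 100) -> PairHistogram:
    """Histogram all pair distances below r_max without touching the input."""
    width = r_max / n_bins
    counts = [0] * n_bins
    for p, q in combinations(positions, 2):
        d = math.dist(p, q)
        if d < r_max:
            # Clamp guards against rounding that would index one past the end.
            counts[min(int(d / width), n_bins - 1)] += 1
    edges = tuple(i * width for i in range(n_bins + 1))
    return PairHistogram(edges, tuple(counts), r_max, n_bins)


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
hist = pair_histogram(points, r_max=4.0, n_bins=4)
```

Storing `r_max` and `n_bins` on the result makes it self-describing, so downstream code need not carry the call arguments separately.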

Wrapper vs Adapter Pattern¶

MolPy uses two distinct patterns for external integration:

Wrapper Pattern¶

Purpose: Execute external command-line tools

Examples: LAMMPS, Packmol, AmberTools

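class LammpsWrapper:
    """Wrapper for the LAMMPS executable."""

    def __init__(self, executable: str = "lmp"):
        # Name or path of the LAMMPS binary; "lmp" is an assumed default
        self.executable = executable

    def run(
        self,
        script: Script,
        working_dir: Path
    ) -> subprocess.CompletedProcess:
        """Execute LAMMPS with the given script."""
        cmd = [self.executable, "-in", str(script.path)]
        # check=True raises CalledProcessError if LAMMPS exits with an error
        return subprocess.run(cmd, cwd=working_dir, check=True)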

Characteristics:
- Subprocess execution
- File-based communication
- Error handling for external failures
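
The three characteristics can be exercised without any simulation package installed. In this sketch the Python interpreter itself plays the external tool; `CommandWrapper`, `ToolError`, and `run_demo` are hypothetical names, not part of MolPy.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from pathlib import Path


class ToolError(RuntimeError):
    """Raised when the external program exits with a non-zero status."""


class CommandWrapper:
    """Hypothetical wrapper; `executable` may be any command-line program."""

    def __init__(self, executable: str):
        self.executable = executable

    def run(self, script: Path, working_dir: Path) -> subprocess.CompletedProcess:
        cmd = [self.executable, str(script)]
        # Subprocess execution: output is captured rather than inherited.
        result = subprocess.run(cmd, cwd=working_dir, capture_output=True, text=True)
        if result.returncode != 0:
            # Error handling: turn a failed exit status into an exception.
            raise ToolError(
                f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}"
            )
        return result


def run_demo(source: str) -> str:
    """Write `source` to a script file, run it, and return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        # File-based communication: the input reaches the tool through a file.
        script = Path(tmp) / "input.py"
        script.write_text(source)
        return CommandWrapper(sys.executable).run(script, Path(tmp)).stdout
```

Raising a dedicated exception type lets callers separate tool failures from bugs in their own code.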

Adapter Pattern¶

Purpose: Convert between MolPy and other Python libraries

Examples: RDKit, MDAnalysis

class RDKitAdapter:
    """Adapter between MolPy and RDKit."""

    @staticmethod
    def to_rdkit(frame: Frame) -> Chem.Mol:
        """Convert MolPy Frame to RDKit Mol."""
        ...

    @staticmethod
    def from_rdkit(mol: Chem.Mol) -> Frame:
        """Convert RDKit Mol to MolPy Frame."""
        ...

Characteristics:
- In-memory conversion
- Bidirectional (to/from)
- Type preservation
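
A round trip is the simplest check of these properties. The sketch below does not touch RDKit: `ForeignAtom` is a hypothetical stand-in for another library's per-atom object, and `ListAdapter` converts between it and a columnar dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignAtom:
    """Stand-in for another library's per-atom object (hypothetical)."""

    symbol: str
    charge: int


class ListAdapter:
    """Hypothetical adapter: columnar dict <-> list of ForeignAtom."""

    @staticmethod
    def to_foreign(columns: dict) -> list:
        """Columns in, one object per atom out; nothing is written to disk."""
        return [
            ForeignAtom(symbol, charge)
            for symbol, charge in zip(columns["symbol"], columns["charge"])
        ]

    @staticmethod
    def from_foreign(atoms: list) -> dict:
        """Inverse conversion; values keep their Python types."""
        return {
            "symbol": [a.symbol for a in atoms],
            "charge": [a.charge for a in atoms],
        }


columns = {"symbol": ["N", "H", "H", "H", "H"], "charge": [1, 0, 0, 0, 0]}
round_trip = ListAdapter.from_foreign(ListAdapter.to_foreign(columns))
```

Asserting that `round_trip == columns` in a test suite catches lossy conversions early.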

See Wrapper & Adapter Patterns for detailed examples.

Builder Module¶

The builder module constructs molecular systems.

Design: Builder Pattern¶

Builders provide fluent interfaces:

builder = PolymerBuilder()
polymer = (
    builder
    .add_monomer("CC(C)O", name="A")
    .add_monomer("CCCO", name="B")
    .set_chain_length(100)
    .set_sequence("AABB")
    .build()
)
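
The fluent style relies on each setter returning the builder itself. The sketch below shows that mechanism only; `ChainBuilder` is a hypothetical class, and it assumes that each character of the sequence names one monomer and that the pattern repeats up to the chain length, which the real `PolymerBuilder` may not do.

```python
from __future__ import annotations


class ChainBuilder:
    """Hypothetical fluent builder; every setter returns self for chaining."""

    def __init__(self) -> None:
        self._monomers: dict[str, str] = {}
        self._sequence = ""
        self._length = 0

    def add_monomer(self, smiles: str, name: str) -> ChainBuilder:
        self._monomers[name] = smiles
        return self

    def set_sequence(self, pattern: str) -> ChainBuilder:
        self._sequence = pattern
        return self

    def set_chain_length(self, n: int) -> ChainBuilder:
        self._length = n
        return self

    def build(self) -> list[str]:
        """Repeat the pattern up to the chain length; return monomer SMILES."""
        unknown = set(self._sequence) - set(self._monomers)
        if unknown or not self._sequence:
            raise ValueError(
                f"sequence needs known monomer names, unknown: {sorted(unknown)}"
            )
        names = [
            self._sequence[i % len(self._sequence)] for i in range(self._length)
        ]
        return [self._monomers[n] for n in names]


chain = (
    ChainBuilder()
    .add_monomer("CC(C)O", name="A")
    .add_monomer("CCCO", name="B")
    .set_chain_length(6)
    .set_sequence("AABB")
    .build()
)
```

Validation happens in `build()`, so setters can be called in any order.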

Hierarchy¶

builder/
├── crystal.py      # Crystal builders
└── polymer/        # Polymer builders
    ├── linear.py
    ├── branched.py
    └── crosslinked.py

Reacter Module¶

The reacter module handles chemical reactions and topology modification.

Design: Template-Based¶

Reactions are defined by templates:

template = ReactionTemplate(
    reactants=["[C:1]=[C:2]", "[H:3][O:4]"],
    products=["[C:1][C:2]([O:4][H:3])"]
)

result = template.apply(frame)

Components¶

Template - Reaction definition
Selector - Select reactive sites
Transformer - Apply transformations
Connector - Form new bonds

Typifier Module¶

The typifier module assigns atom types.

Design: Rule-Based Engine¶

Atom typing uses pattern matching:

typifier = AtomTypifier(forcefield="oplsaa")
typed_frame = typifier.assign_types(frame)

Layered Approach¶

Graph matching - Identify chemical environments
Dependency analysis - Resolve type dependencies
Type assignment - Assign final types
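
The three layers can be sketched on methanol. This is a toy model under stated assumptions: the rule table, the type labels, and `assign_types` are hypothetical, and each rule may depend on at most one neighbor type. Real force field typing rules are far richer.

```python
from graphlib import TopologicalSorter

ELEMENTS = ["C", "O", "H", "H", "H", "H"]          # methanol, CH3OH
BONDS = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]

# Hypothetical rules: type -> (element, neighbor type that must already exist)
RULES = {
    "CT": ("C", None),
    "OH": ("O", "CT"),
    "HC": ("H", "CT"),
    "HO": ("H", "OH"),
}


def assign_types(elements, bonds, rules):
    # Layer 1, graph matching: build the neighbor map used by every rule.
    neighbors = {i: set() for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    # Layer 2, dependency analysis: order rules so prerequisites run first.
    graph = {name: ({dep} if dep else set()) for name, (_, dep) in rules.items()}
    order = list(TopologicalSorter(graph).static_order())

    # Layer 3, type assignment: apply rules in order; first match wins.
    types = [None] * len(elements)
    for name in order:
        element, dep = rules[name]
        for i, el in enumerate(elements):
            if types[i] is not None or el != element:
                continue
            if dep is None or any(types[n] == dep for n in neighbors[i]):
                types[i] = name
    return types


methanol_types = assign_types(ELEMENTS, BONDS, RULES)
```

The topological sort is what lets the hydroxyl hydrogen rule rely on the oxygen having been typed already.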

Data Flow¶

Typical MolPy workflow:

Input File
    ↓
[IO Reader]
    ↓
Frame (core data structure)
    ↓
[Builder/Reacter] → Modified Frame
    ↓
[Typifier] → Typed Frame
    ↓
[Compute] → Analysis Results
    ↓
[IO Writer]
    ↓
Output File

Extension Points¶

Adding New File Format¶

1. Create a reader class in io/data/:

class MyFormatReader(DataReader):
    def read(self, filepath: Path) -> Frame:
        ...

2. Add a factory function in io/readers.py:

def read_myformat(filepath: str | Path) -> Frame:
    reader = MyFormatReader()
    return reader.read(Path(filepath))

3. Export it in io/__init__.py

Adding New Computation¶

1. Create a function in compute/:

def calculate_my_property(frame: Frame) -> np.ndarray:
    """Calculate my property."""
    ...

2. Export it in compute/__init__.py

Adding New Builder¶

1. Create a builder class in builder/:

class MySystemBuilder:
    def build(self) -> Frame:
        ...

2. Export it in builder/__init__.py

Performance Considerations¶

NumPy Vectorization¶

Use NumPy operations instead of loops:

Good:

i, j = np.asarray(pairs).T  # index arrays, one entry per pair
distances = np.linalg.norm(positions[i] - positions[j], axis=1)

Bad:

distances = [
    np.linalg.norm(positions[i] - positions[j])
    for i, j in pairs
]
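
The two variants can be compared directly. This sketch assumes `positions` is an (N, 3) array and `pairs` an (M, 2) integer array; the random data and both function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
positions = rng.random((50, 3))                 # 50 atoms, xyz coordinates
pairs = np.array([(a, b) for a in range(50) for b in range(a + 1, 50)])


def distances_loop(positions, pairs):
    """One norm call per pair: simple, but slow for many pairs."""
    return np.array(
        [np.linalg.norm(positions[i] - positions[j]) for i, j in pairs]
    )


def distances_vectorized(positions, pairs):
    """One norm call in total, using index arrays to gather both endpoints."""
    i, j = pairs[:, 0], pairs[:, 1]
    return np.linalg.norm(positions[i] - positions[j], axis=1)


loop = distances_loop(positions, pairs)
vec = distances_vectorized(positions, pairs)
```

Both functions return the same values; the vectorized one moves the per-pair work out of the Python interpreter.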

Lazy Evaluation¶

Defer expensive operations:

class Topology:
    def __init__(self, frame: Frame):
        self._frame = frame
        self._bonds = None  # Compute on demand

    def bonds(self) -> list[Bond]:
        if self._bonds is None:
            self._bonds = self._detect_bonds()
        return self._bonds
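
The standard library offers the same pattern through `functools.cached_property`. The sketch below is an alternative, not MolPy's code: `LazyTopology` is hypothetical, `bonds` becomes an attribute rather than a method, and bonds are guessed from a plain distance cutoff purely for illustration.

```python
import math
from functools import cached_property


class LazyTopology:
    """Hypothetical stand-in: bonds are detected on first access, then cached."""

    def __init__(self, positions, cutoff=1.2):
        self._positions = positions
        self._cutoff = cutoff
        self.detect_calls = 0       # counts how often detection really ran

    @cached_property
    def bonds(self):
        self.detect_calls += 1
        n = len(self._positions)
        # Every pair closer than the cutoff counts as bonded.
        return [
            (i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if math.dist(self._positions[i], self._positions[j]) <= self._cutoff
        ]


topology = LazyTopology([(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (5.0, 0.0, 0.0)])
calls_before = topology.detect_calls
first = topology.bonds
second = topology.bonds
```

The cache never invalidates itself, which is safe only as long as the underlying positions are not changed after construction.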

Memory Efficiency¶

Use views instead of copies when possible:

# View (no copy)
subset = frame["atoms"]["x"][start:end]

# Copy (when needed)
subset = frame["atoms"]["x"][start:end].copy()
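
Whether an expression yields a view or a copy can be checked with `np.shares_memory`. The sketch uses a bare array in place of a frame column.

```python
import numpy as np

x = np.arange(10.0)

view = x[2:5]            # basic slicing returns a view on the same buffer
copy = x[2:5].copy()     # explicit copy owns its own memory
fancy = x[[2, 3, 4]]     # integer-array indexing always copies

view[0] = -1.0           # writes through to x
copy[1] = -2.0           # leaves x untouched

shares = (
    np.shares_memory(x, view),
    np.shares_memory(x, copy),
    np.shares_memory(x, fancy),
)
```

Views save memory, but writing to one also changes the source array, so code that should behave immutably must copy before modifying.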

Testing Architecture¶

Unit Tests¶

Test individual components:
- tests/test_core/ - Core data structures
- tests/test_io/ - IO readers/writers
- tests/test_compute/ - Computations

Integration Tests¶

Test components working together:
- Read → Modify → Write roundtrips
- Builder → Typifier → Writer workflows

Fixtures¶

Reusable test data:

@pytest.fixture
def water_frame():
    """Provide water molecule frame."""
    return create_water_molecule()

Further Reading¶