
MolPy

A composable, strongly typed toolkit for computational molecular modeling — from single-molecule parameterization to polydisperse polymer system construction.

Python 3.12+ · License: BSD · Docs · Ruff · Type checked: ty


Representative Workflows

Parameterize a small organic molecule from a SMILES string using the bundled OPLS-AA force field and export complete LAMMPS input files.

import molpy as mp

mol   = mp.parser.parse_molecule("CCO")          # ethanol from SMILES
ff    = mp.io.read_xml_forcefield("oplsaa.xml")  # bundled OPLS-AA
typed = mp.typifier.OplsAtomisticTypifier(ff).typify(mol)

mp.io.write_lammps_system("output/", typed.to_frame(), ff)
# → output/system.data  output/system.in

Specify a poly(ethylene oxide) chain via G-BigSMILES notation. MolPy generates three-dimensional coordinates and exports a simulation-ready topology.

import molpy as mp

# PEO chain, degree of polymerization = 10
peo = mp.tool.polymer("{[<]CCOCC[>]}|10|")

ff    = mp.io.read_xml_forcefield("oplsaa.xml")
typed = mp.typifier.OplsAtomisticTypifier(ff).typify(peo)
mp.io.write_lammps_system("output/", typed.to_frame(), ff)

Sample a Schulz–Zimm molecular-weight distribution, construct each chain atomistically, and pack the ensemble into a periodic simulation box.

import molpy as mp

# Mn = 1500 Da, Mw = 3000 Da, target total mass ≈ 500 kDa
chains = mp.tool.polymer_system(
    "{[<]CCOCC[>]}|schulz_zimm(1500,3000)||5e5|",
    random_seed=42,
)
print(f"Built {len(chains)} chains")

frames = [c.to_frame() for c in chains]
packed = mp.pack.pack(frames, box=[80, 80, 80])
ff = mp.io.read_xml_forcefield("oplsaa.xml")  # bundled OPLS-AA, as in the earlier examples
mp.io.write_lammps_system("peo_bulk/", packed, ff)
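
The sampling step in this workflow can be illustrated without MolPy. The sketch below is not MolPy's implementation: it assumes the common convention in which the Schulz–Zimm number distribution of chain masses is a gamma distribution with shape z = 1 / (Mw/Mn - 1) and mean Mn, and it draws chains until the accumulated mass reaches the target. The function name `sample_schulz_zimm` is hypothetical, and a real builder would also round each mass to a whole number of repeat units.

```python
import random


def sample_schulz_zimm(mn, mw, target_mass, seed):
    """Draw chain masses (Da) until their sum reaches target_mass.

    Assumes the number distribution is Gamma(shape=z, scale=Mn/z)
    with z = 1 / (Mw/Mn - 1), so that the mean is Mn and the
    weight-average is (z + 1) / z * Mn = Mw.
    """
    if mw <= mn:
        raise ValueError("Mw must exceed Mn for a Schulz-Zimm distribution")
    rng = random.Random(seed)      # fixed seed gives a reproducible population
    shape = mn / (mw - mn)         # z = 1 / (Mw/Mn - 1)
    scale = mn / shape             # theta, chosen so that z * theta = Mn
    masses, total = [], 0.0
    while total < target_mass:
        m = rng.gammavariate(shape, scale)
        masses.append(m)
        total += m
    return masses


masses = sample_schulz_zimm(1500.0, 3000.0, 5e5, seed=42)
mn_est = sum(masses) / len(masses)                 # number-average
mw_est = sum(m * m for m in masses) / sum(masses)  # weight-average
print(f"{len(masses)} chains, Mn ~ {mn_est:.0f} Da, Mw ~ {mw_est:.0f} Da")
```

With a finite population of a few hundred chains, the sampled averages scatter around the requested Mn and Mw rather than matching them exactly.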

Prepare a monomer with partial charges via antechamber, assemble a chain with GAFF2 parameters via tleap, and retrieve AMBER topology files programmatically.

import molpy as mp

# BigSMILES → three-dimensional structure with port annotation
eo = mp.tool.PrepareMonomer().run("{[<]CCOCC[>]}")

# Assemble DP = 20 chain via AmberTools
result = mp.tool.polymer(
    "{[#EO]|20}",
    library={"EO": eo},
    backend="amber",
)
# result.prmtop_path  result.inpcrd_path  result.pdb_path

Design Principles

  • Explicit representational hierarchy — Molecular graphs (Atomistic), numerical snapshots (Frame), and force field parameters (ForceField) occupy distinct layers with explicit conversion boundaries.

  • Native support for polymer chemistry notations — SMILES, BigSMILES, CGSmiles, and G-BigSMILES are parsed directly. A monomer, an architecture, or a polydisperse ensemble can each be expressed as a single string.

  • Statistical molecular-weight distributions — Schulz–Zimm, Poisson, Flory–Schulz, and uniform distributions are implemented natively. Target number- and weight-average molecular weights are specified directly; reproducible chain populations are generated from a fixed random seed.

  • Force fields as queryable data structures — A ForceField object is an inspectable typed dictionary. Parameter completeness and type consistency are verifiable at the Python level before any file export occurs.

  • Programmatic reaction framework — Chemical reactions are expressed through composable anchor selectors and leaving-group selectors. Pre- and post-reaction topology templates for LAMMPS fix bond/react are generated automatically.

  • Modular, independently composable packages — The parser, builder, typifier, packer, and I/O subsystems share no hidden coupling. Each may be used independently or assembled into composite pipelines through explicit function calls.
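
The idea of checking parameter completeness before export can be sketched with plain Python containers. Everything below is a hypothetical stand-in, not MolPy's `ForceField` API: `ToyForceField`, `bond_key`, and `missing_parameters` are illustrative names, and the numeric values are placeholders rather than real force field parameters.

```python
from dataclasses import dataclass, field


@dataclass
class ToyForceField:
    """Hypothetical stand-in: parameters stored in dicts keyed by type name."""
    atom_types: dict = field(default_factory=dict)  # name -> parameters
    bond_types: dict = field(default_factory=dict)  # sorted (name, name) -> parameters


def bond_key(a, b):
    """Order-independent key so that A-B and B-A map to the same entry."""
    return tuple(sorted((a, b)))


def missing_parameters(ff, atom_type_names, bonds):
    """Return atom types and bond type pairs the system needs but ff lacks."""
    missing_atoms = sorted(set(atom_type_names) - set(ff.atom_types))
    needed = {bond_key(atom_type_names[i], atom_type_names[j]) for i, j in bonds}
    missing_bonds = sorted(needed - set(ff.bond_types))
    return missing_atoms, missing_bonds


# Placeholder values for illustration only.
ff = ToyForceField(
    atom_types={"CT": {"mass": 12.011}, "OH": {"mass": 15.999}},
    bond_types={("CT", "CT"): {"k": 1.0, "r0": 1.0}},
)
types = ["CT", "CT", "OH", "HO"]      # per-atom type names
bonds = [(0, 1), (1, 2), (2, 3)]      # index pairs into `types`

missing_atoms, missing_bonds = missing_parameters(ff, types, bonds)
print(missing_atoms)   # atom types with no entry
print(missing_bonds)   # bond type pairs with no entry
```

Running a check of this kind before writing any files surfaces untyped atoms or absent bonded terms as ordinary Python values instead of as errors inside the simulation engine.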


External Integrations

  • AmberTools — Antechamber (partial charge assignment), parmchk2 (missing parameter estimation), and tleap (topology assembly) are invoked programmatically with structured Python interfaces.

  • RDKit — RDKitAdapter provides bidirectional conversion between Atomistic and RDKit Mol objects, enabling three-dimensional embedding, conformer generation, and SMILES export.

  • Packmol — Molecule packing into periodic simulation boxes is managed through a typed constraint interface wrapping the Packmol executable.

  • LAMMPS · CP2K — Complete input decks are generated from MolPy data objects. The engine abstraction layer decouples system description from simulation-code-specific syntax.
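
For orientation, the AmberTools steps named above correspond roughly to the command lines assembled in the sketch below. This is not MolPy's wrapper: the helper names and file names are hypothetical, and the sketch assumes a MOL2 input, AM1-BCC charges, and GAFF2 atom types. It only builds and prints the argument lists, so it runs without AmberTools installed.

```python
import shlex


def antechamber_cmd(inp, out, net_charge=0):
    """Argument list for charge and atom-type assignment (assumed settings)."""
    return [
        "antechamber",
        "-i", inp, "-fi", "mol2",     # input file and its format
        "-o", out, "-fo", "mol2",     # output file and its format
        "-c", "bcc",                  # AM1-BCC partial charges
        "-nc", str(net_charge),       # net molecular charge
        "-at", "gaff2",               # GAFF2 atom types
    ]


def parmchk2_cmd(mol2, frcmod):
    """Argument list for estimating parameters missing from GAFF2."""
    return ["parmchk2", "-i", mol2, "-f", "mol2", "-o", frcmod, "-s", "gaff2"]


# Hypothetical file names for an ethylene oxide monomer.
steps = [
    antechamber_cmd("eo.mol2", "eo_gaff2.mol2"),
    parmchk2_cmd("eo_gaff2.mol2", "eo.frcmod"),
]
for cmd in steps:
    print(shlex.join(cmd))
```

On a machine with AmberTools on the PATH, each list could be passed to `subprocess.run(cmd, check=True)`; tleap would then consume the resulting MOL2 and frcmod files to assemble the topology.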


Documentation Structure

  • Getting Started — Installation, environment verification, and a five-minute end-to-end example establishing the Atomistic → Frame → export pipeline.

  • Concepts — Systematic exposition of the core data model: Atomistic, Block, Frame, Box, Trajectory, ForceField, and their inter-relationships.

  • Guides — Task-oriented executable notebooks covering chemistry parsing, polymer construction, force field typification, and simulation file generation.

  • Developer Guide — Conventions, extension patterns, and internal architecture for contributors and library developers.