MolPy

A programmable toolkit for molecular simulation workflows.

Build, edit, type, and export molecular systems in Python with explicit, composable data structures.

Python 3.12+ · License: BSD · Docs · Ruff · Type checked: ty


Representative Workflows

Parse a small organic molecule from SMILES, assign OPLS-AA types, and export complete LAMMPS input files.

import molpy as mp

mol   = mp.parser.parse_molecule("CCO")          # ethanol from SMILES
ff    = mp.io.read_xml_forcefield("oplsaa.xml")  # bundled OPLS-AA
typed = mp.typifier.OplsAtomisticTypifier(ff).typify(mol)

mp.io.write_lammps_system("output/", typed.to_frame(), ff)
# → output/system.data  output/system.in

Construct a poly(ethylene oxide) chain from G-BigSMILES notation, generate three-dimensional coordinates, and export a simulation-ready topology.

import molpy as mp

# PEO chain, degree of polymerization = 10
peo = mp.tool.polymer("{[<]CCOCC[>]}|10|")

ff    = mp.io.read_xml_forcefield("oplsaa.xml")
typed = mp.typifier.OplsAtomisticTypifier(ff).typify(peo)
mp.io.write_lammps_system("output/", typed.to_frame(), ff)

Sample a Schulz-Zimm molecular-weight distribution, construct each chain atomistically, and pack the ensemble into a periodic simulation box.

import molpy as mp

# Mn = 1500 Da, Mw = 3000 Da, target total mass ≈ 500 kDa
chains = mp.tool.polymer_system(
    "{[<]CCOCC[>]}|schulz_zimm(1500,3000)||5e5|",
    random_seed=42,
)
print(f"Built {len(chains)} chains")

ff     = mp.io.read_xml_forcefield("oplsaa.xml")  # same force field as in the examples above
frames = [c.to_frame() for c in chains]
packed = mp.pack.pack(frames, box=[80, 80, 80])
mp.io.write_lammps_system("peo_bulk/", packed, ff)
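
The sampling step can be illustrated independently of MolPy. The sketch below assumes the usual gamma-distribution form of the Schulz-Zimm number distribution, with shape k = Mn / (Mw - Mn) and scale Mn / k, and draws chain masses until the target total mass is reached. The function name `sample_schulz_zimm` is hypothetical and is not part of MolPy's API; how MolPy itself samples and truncates the population may differ.

```python
import random


def sample_schulz_zimm(mn, mw, total_mass, seed=None):
    """Draw chain masses (Da) from a Schulz-Zimm number distribution.

    Assumes a gamma distribution with shape k = Mn / (Mw - Mn) and
    scale Mn / k, whose number average is Mn and whose weight average
    is Mn * (k + 1) / k = Mw.
    """
    if mw <= mn:
        raise ValueError("Mw must be larger than Mn")
    rng = random.Random(seed)      # fixed seed gives a reproducible population
    shape = mn / (mw - mn)
    scale = mn / shape
    masses, total = [], 0.0
    while total < total_mass:      # stop once the target total mass is reached
        m = rng.gammavariate(shape, scale)
        masses.append(m)
        total += m
    return masses


masses = sample_schulz_zimm(1500.0, 3000.0, 5e5, seed=42)
mn_est = sum(masses) / len(masses)                      # number average
mw_est = sum(m * m for m in masses) / sum(masses)       # weight average
print(f"{len(masses)} chains, Mn ~ {mn_est:.0f} Da, Mw ~ {mw_est:.0f} Da")
```

With Mn = 1500 Da and Mw = 3000 Da the shape parameter is 1, so the sampled masses follow an exponential distribution; a finite sample reproduces the target averages only approximately.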

Prepare a monomer with partial charges via antechamber, assemble a chain with GAFF2 parameters via tleap, and retrieve AMBER topology files programmatically.

import molpy as mp

# BigSMILES → three-dimensional structure with port annotation
eo = mp.tool.PrepareMonomer().run("{[<]CCOCC[>]}")

# Assemble DP = 20 chain via AmberTools
result = mp.tool.polymer(
    "{[#EO]|20}",
    library={"EO": eo},
    backend="amber",
)
# Outputs: result.prmtop_path, result.inpcrd_path, result.pdb_path
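
For orientation, the monomer-preparation step corresponds roughly to the following AmberTools command lines. This is a sketch under stated assumptions, not MolPy's actual invocation: the helper `ambertools_commands`, the file stem `eo`, and the choice of AM1-BCC charges (`-c bcc`) are illustrative. The commands are only assembled and printed here, so AmberTools does not need to be installed to run the example.

```python
import shlex


def ambertools_commands(stem, net_charge=0):
    """Build antechamber and parmchk2 argument lists for one monomer.

    Assumes an input file <stem>.pdb, AM1-BCC charges, and GAFF2 atom types.
    """
    mol2 = f"{stem}.mol2"
    antechamber = [
        "antechamber",
        "-i", f"{stem}.pdb", "-fi", "pdb",   # input file and format
        "-o", mol2, "-fo", "mol2",           # output file and format
        "-c", "bcc",                         # AM1-BCC partial charges (assumed)
        "-nc", str(net_charge),              # net molecular charge
        "-at", "gaff2",                      # GAFF2 atom types
    ]
    parmchk2 = [
        "parmchk2",
        "-i", mol2, "-f", "mol2",
        "-o", f"{stem}.frcmod",              # estimated missing parameters
        "-s", "gaff2",
    ]
    return [antechamber, parmchk2]


for cmd in ambertools_commands("eo"):
    print(shlex.join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute the step, provided the AmberTools executables are on the `PATH`.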

Core Capabilities

  • Explicit representational hierarchy — Molecular graphs (Atomistic), numerical snapshots (Frame), and force field parameters (ForceField) occupy distinct layers with explicit conversion boundaries.

  • Polymer notation support — SMILES, BigSMILES, CGSmiles, and G-BigSMILES are parsed directly. Monomer definitions, architectures, and polydisperse specifications can be represented in a compact textual form.

  • Statistical molecular-weight distributions — Schulz–Zimm, Poisson, Flory–Schulz, and uniform distributions are implemented natively. Target number- and weight-average molecular weights are specified directly; reproducible chain populations are generated from a fixed random seed.

  • Force fields as queryable data structures — A ForceField object is an inspectable typed dictionary. Parameter completeness and type consistency can be checked in Python before file export.

  • Reactive topology editing — Chemical reactions are expressed through anchor selectors and leaving-group selectors. Pre- and post-reaction topology templates for LAMMPS fix bond/react can be generated directly from the edited system.

  • Explicit subsystem boundaries — The parser, builder, typifier, packer, and I/O subsystems interact through explicit data conversions rather than hidden shared state, so each layer can be used independently or combined into a larger workflow.
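
The idea behind checking parameter completeness before export can be sketched with plain Python containers. The example below does not use MolPy's ForceField class; the function `missing_parameters`, the type labels, and the parameter values are hypothetical and serve only to illustrate the kind of check that becomes possible when a force field is ordinary queryable data.

```python
def missing_parameters(atom_types, bonds, bond_params):
    """Return the bond type pairs used in a topology but absent from bond_params.

    atom_types  -- list of type labels, one per atom
    bonds       -- list of (i, j) atom index pairs
    bond_params -- dict keyed by alphabetically sorted type pairs
    """
    missing = set()
    for i, j in bonds:
        key = tuple(sorted((atom_types[i], atom_types[j])))
        if key not in bond_params:
            missing.add(key)
    return sorted(missing)


# Heavy atoms of ethanol, C-C-O, with illustrative type labels
atom_types = ["CT", "CT", "OH"]
bonds = [(0, 1), (1, 2)]
bond_params = {("CT", "CT"): {"k": 268.0, "r0": 1.529}}  # illustrative values

print(missing_parameters(atom_types, bonds, bond_params))
# → [('CT', 'OH')]
```

Running such a check in Python surfaces a missing bond type before any simulation input file is written, rather than as an error from the simulation engine.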


External Integrations

  • MCP interface for large language model agents — MolPy can expose source symbols and documentation through the Model Context Protocol, supporting structured agent access without altering the core data model.

  • AmberTools — antechamber (partial charge assignment), parmchk2 (missing parameter estimation), and tleap (topology assembly) are invoked programmatically through structured Python interfaces.

  • RDKit — RDKitAdapter provides bidirectional conversion between Atomistic and RDKit Mol objects, enabling three-dimensional embedding, conformer generation, and SMILES export.

  • Packmol — Molecule packing into periodic simulation boxes is managed through a typed constraint interface wrapping the Packmol executable.

  • LAMMPS · CP2K — Complete input decks are generated from MolPy data objects. The engine abstraction layer decouples system description from simulation-code-specific syntax.
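
To make the Packmol integration concrete, the sketch below writes the kind of plain-text input that a wrapper ultimately has to hand to the Packmol executable. It is an assumption-laden illustration, not MolPy's constraint interface: the helper `packmol_input` and the file names are hypothetical, while the keywords (`tolerance`, `filetype`, `output`, `structure`, `number`, `inside box`) follow Packmol's own input format.

```python
def packmol_input(structures, box, output="packed.pdb", tolerance=2.0):
    """Build a Packmol input file as a string.

    structures -- list of (pdb_path, copies) pairs
    box        -- (lx, ly, lz) box edge lengths in angstroms
    """
    lx, ly, lz = box
    lines = [f"tolerance {tolerance}", "filetype pdb", f"output {output}", ""]
    for path, copies in structures:
        lines += [
            f"structure {path}",
            f"  number {copies}",
            f"  inside box 0. 0. 0. {lx} {ly} {lz}",
            "end structure",
            "",
        ]
    return "\n".join(lines)


text = packmol_input(
    [("chain_000.pdb", 1), ("chain_001.pdb", 1)],  # hypothetical file names
    box=(80.0, 80.0, 80.0),
)
print(text)
```

Packmol reads this text from standard input, for example `packmol < pack.inp`. A typed interface replaces the string formatting with validated objects, but the generated file has this general shape.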


Documentation Map

  • Getting Started — Installation, environment verification, MCP setup, and the core Atomistic → Frame → export workflow for first-time users.

  • Concepts — Systematic exposition of the core data model plus the surrounding I/O, Tool, and engine boundaries that connect MolPy to external workflows.

  • Guides — Task-oriented executable notebooks and worked examples covering chemistry parsing, polymer construction, force field typification, and AmberTools-based system preparation.

  • Developer Guide — Conventions, extension patterns, and internal architecture for contributors and library developers.