Polydisperse Systems¶
After reading this page you will be able to sample chain lengths from a target distribution, build typed copolymer chains with incremental junction re-typification, pack them into a periodic box, and export LAMMPS files.
!!! note "Prerequisites"
This guide requires RDKit, Packmol, and the oplsaa.xml force field. Familiarity with Stepwise Polymer Construction is assumed.
From distribution to simulation box¶
Real polymer systems are polydisperse — chains have varying lengths. MolPy provides distribution samplers, a system planner, and packing tools to go from a target molecular weight distribution to a simulation-ready box in one continuous workflow.
The system in this guide is a statistical copolymer of styrene (Sty, 80 mol%) and methyl acrylate (MA, 20 mol%), produced by controlled radical polymerization. The corresponding GBigSMILES is:
CCOC(=O)C(C)(C){[>][<|8|]CC([>|8|])c1ccccc1, [<|2|]CC([>|2|])C(=O)OC [<]}|schulz_zimm(1400,1500)|[Br].|5e5|
The three summary statistics that describe polydispersity are the number-average molecular weight Mn, the weight-average molecular weight Mw, and the dispersity PDI = Mw/Mn, which equals 1.0 for a perfectly monodisperse sample.
Typification happens per monomer, not per chain¶
Each monomer is parsed, expanded to 3D with hydrogens, and assigned force field types individually. The builder then handles incremental re-typification at each junction during chain growth, so there is no need to re-typify the entire chain after assembly.
import molpy as mp
from molpy.core.element import Element
from molpy.typifier import OplsAtomisticTypifier
ff = mp.io.read_xml_forcefield("oplsaa.xml")
typifier = OplsAtomisticTypifier(ff, strict_typing=False)
BIGSMILES = {
"Sty": "{[][<]CC(c1ccccc1)[>][]}",
"MA": "{[][<]CC(C(=O)OC)[>][]}",
}
def build_typed_monomer(bigsmiles, typifier):
monomer = mp.parser.parse_monomer(bigsmiles)
monomer = mp.tool.generate_3d(monomer, add_hydrogens=True, optimize=False)
monomer = monomer.get_topo(gen_angle=True, gen_dihe=True)
monomer = typifier.typify(monomer)
return monomer
library = {label: build_typed_monomer(bs, typifier) for label, bs in BIGSMILES.items()}
monomer_mass = {}
for label, mon in library.items():
mass = sum(Element(a.get("element")).mass for a in mon.atoms)
ports = [a.get("port") for a in mon.atoms if a.get("port")]
monomer_mass[label] = mass
print(f"{label}: atoms={len(mon.atoms)}, mass={mass:.1f}, ports={ports}")
2026-04-10 07:27:00,951 - molpy.potential.dihedral.opls - WARNING - RB coefficients do not lie on the ideal 4-term OPLS manifold (C0+C1+C2+C3+C4 = 10.041600, expected ≈ 0). Conversion will preserve forces and relative energies exactly, but will introduce a constant energy offset of ΔE = 10.041600 kJ/mol. This does not affect MD simulations.
Sty: atoms=24, mass=112.2, ports=['<', '>'] MA: atoms=14, mass=88.1, ports=['<', '>']
Sampling draws chain lengths from a statistical distribution¶
The sampling layer has three components that compose cleanly. WeightedSequenceGenerator controls the monomer mole ratio (80:20 here). PolydisperseChainGenerator draws a degree of polymerization or mass from the chosen distribution for each chain. SystemPlanner accumulates chains until a target total mass is reached, stopping when the accumulated mass is within max_rel_error of the target. Four distributions are demonstrated below so that their shape differences become visible in the next section.
import numpy as np
from molpy.builder.polymer import (
SchulzZimmPolydisperse,
UniformPolydisperse,
PoissonPolydisperse,
FlorySchulzPolydisperse,
WeightedSequenceGenerator,
PolydisperseChainGenerator,
SystemPlanner,
)
distributions = {
"Schulz-Zimm": SchulzZimmPolydisperse(Mn=1400, Mw=1500, random_seed=42),
"Uniform": UniformPolydisperse(min_dp=8, max_dp=22, random_seed=42),
"Poisson": PoissonPolydisperse(lambda_param=14, random_seed=42),
"Flory-Schulz": FlorySchulzPolydisperse(a=0.08, random_seed=42),
}
seq_gen = WeightedSequenceGenerator(monomer_weights={"Sty": 8.0, "MA": 2.0})
target_total_mass = 5e5
results = {}
for name, dist in distributions.items():
chain_gen = PolydisperseChainGenerator(
seq_generator=seq_gen,
monomer_mass=monomer_mass,
end_group_mass=0.0,
distribution=dist,
)
planner = SystemPlanner(
chain_generator=chain_gen,
target_total_mass=target_total_mass,
max_rel_error=0.02,
)
plan = planner.plan_system(np.random.default_rng(42))
results[name] = plan.chains
for name, chains in results.items():
mw = np.array([c.mass for c in chains])
Mn = float(np.mean(mw))
Mw = float(np.sum(mw**2) / np.sum(mw))
print(f"{name:15s}: {len(chains):4d} chains, Mn={Mn:.0f}, PDI={Mw / Mn:.3f}")
Schulz-Zimm : 374 chains, Mn=1338, PDI=1.076 Uniform : 312 chains, Mn=1609, PDI=1.086 Poisson : 334 chains, Mn=1500, PDI=1.073 Flory-Schulz : 193 chains, Mn=2597, PDI=1.619
Visualising the sampled ensembles reveals the distribution shape¶
The four panels below overlay sampled histograms against their theoretical curves. Schulz-Zimm is plotted as a continuous probability density; the other three are plotted as probability mass functions over degree of polymerization. Vertical dashed lines mark Mn and Mw for each ensemble.
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
# ── colour palette ──
CLR_HIST = "#6baed6" # steel blue – sampled histogram
CLR_EDGE = "#3182bd" # darker blue – histogram edge
CLR_THEO = "#e6550d" # orange – theoretical curve
CLR_MN = "#31a354" # green – Mn line
CLR_MW = "#de2d26" # red – Mw line
CLR_BOX = "#f7f7f7" # near-white – annotation box
def annotate_stats(ax, Mn, Mw, PDI, n_chains):
txt = "\n".join(
[
rf"$M_n = {Mn:.0f}$ g/mol",
rf"$M_w = {Mw:.0f}$ g/mol",
rf"PDI $= {PDI:.3f}$",
rf"$N = {n_chains}$",
]
)
ax.text(
0.97,
0.97,
txt,
transform=ax.transAxes,
ha="right",
va="top",
fontsize=6.5,
linespacing=1.4,
family="monospace",
bbox=dict(
boxstyle="round,pad=0.35",
facecolor=CLR_BOX,
edgecolor="0.75",
alpha=0.95,
linewidth=0.6,
),
)
fig, axes = plt.subplots(2, 2, figsize=(7.5, 6), constrained_layout=True)
for idx, (ax, (name, chains)) in enumerate(zip(axes.flatten(), results.items())):
dist_obj = distributions[name]
mw = np.array([c.mass for c in chains])
dps = np.array([c.dp for c in chains])
Mn = float(np.mean(mw))
Mw = float(np.sum(mw**2) / np.sum(mw))
PDI = Mw / Mn
if isinstance(dist_obj, SchulzZimmPolydisperse):
# ── continuous: Freedman–Diaconis bins ──
iqr = np.subtract(*np.percentile(mw, [75, 25]))
bw = max(2.0 * iqr / len(mw) ** (1 / 3), 20)
bins = np.arange(mw.min() - bw, mw.max() + 2 * bw, bw)
ax.hist(
mw,
bins=bins,
density=True,
color=CLR_HIST,
edgecolor=CLR_EDGE,
linewidth=0.5,
alpha=0.55,
)
M_grid = np.linspace(max(0, mw.min() * 0.3), mw.max() * 1.3, 500)
ax.plot(
M_grid,
dist_obj.mass_pdf(M_grid),
color=CLR_THEO,
linewidth=1.6,
label="Theory",
)
ax.axvline(Mn, color=CLR_MN, ls="--", lw=1, label=r"$M_n$")
ax.axvline(Mw, color=CLR_MW, ls="--", lw=1, label=r"$M_w$")
ax.set_xlabel(r"Molecular weight $M$ (g mol$^{-1}$)", fontsize=8)
ax.set_ylabel("Probability density", fontsize=8)
else:
# ── discrete: unit-width bars centred on integers ──
dp_min, dp_max = int(dps.min()), int(dps.max())
counts = np.bincount(dps)[dp_min:]
freq = counts / (counts.sum() or 1)
x_bar = np.arange(dp_min, dp_min + len(counts))
ax.bar(
x_bar,
freq,
width=1.0,
align="center",
color=CLR_HIST,
edgecolor=CLR_EDGE,
linewidth=0.5,
alpha=0.55,
)
# Theory curve extends to 99th-percentile of the sample
if isinstance(dist_obj, UniformPolydisperse):
support = np.arange(dp_min, dp_max + 1)
else:
hi = max(dp_max, int(np.percentile(dps, 99) * 1.15))
support = np.arange(max(1, dp_min), hi + 1)
pmf = dist_obj.dp_pmf(support)
ax.plot(
support,
pmf,
"-o",
color=CLR_THEO,
markersize=2.2,
linewidth=1.3,
markeredgewidth=0,
label="Theory",
)
avg_mass = float(np.mean(mw / dps))
ax.axvline(Mn / avg_mass, color=CLR_MN, ls="--", lw=1, label=r"$M_n$")
ax.axvline(Mw / avg_mass, color=CLR_MW, ls="--", lw=1, label=r"$M_w$")
# Clip x-axis so long tails don't crush the peak
x_hi = int(np.percentile(dps, 99.5)) + 2
ax.set_xlim(max(0, dp_min - 1), x_hi)
ax.xaxis.set_major_locator(MaxNLocator(integer=True))
ax.set_xlabel(r"Degree of polymerization $n$", fontsize=8)
ax.set_ylabel("Probability", fontsize=8)
ax.set_title(name, fontsize=9.5, fontweight="semibold", pad=6)
ax.tick_params(labelsize=7)
ax.spines[["top", "right"]].set_visible(False)
annotate_stats(ax, Mn, Mw, PDI, len(chains))
if idx == 0:
ax.legend(fontsize=7, loc="upper left", framealpha=0.85)
plt.savefig("05_polydisperse_distributions.png", dpi=200, bbox_inches="tight")
plt.show()
Radical addition couples monomers without leaving groups¶
The reaction here is radical addition: each connection removes one hydrogen from each backbone carbon and forms a new C–C bond. This differs from the dehydration condensation used in earlier guides — there are no hydroxyl leaving groups, only hydrogen removal from both sides.
The typifier is passed to PolymerBuilder so that incremental re-typification handles the junction atoms at each coupling step. There is no whole-chain typification after building; only the atoms immediately flanking each new bond are re-typed.
from molpy.builder.polymer import (
Connector,
CovalentSeparator,
LinearOrienter,
Placer,
PolymerBuilder,
)
from molpy.reacter import (
Reacter,
form_single_bond,
select_hydrogens,
select_self,
)
rxn = Reacter(
name="addition",
anchor_selector_left=select_self,
anchor_selector_right=select_self,
leaving_selector_left=select_hydrogens(1),
leaving_selector_right=select_hydrogens(1),
bond_former=form_single_bond,
)
rules = {(l, r): (">", "<") for l in library for r in library}
connector = Connector(port_map=rules, reacter=rxn)
placer = Placer(
separator=CovalentSeparator(buffer=-0.1),
orienter=LinearOrienter(),
)
builder = PolymerBuilder(
library=library,
connector=connector,
placer=placer,
typifier=typifier, # incremental re-typification at junctions
)
sz_chains = results["Schulz-Zimm"]
atomistic_chains = []
n_chains = 10 # truncated for this tutorial; use len(sz_chains) for a production run
for i, chain in enumerate(sz_chains[:n_chains]):
labels = " ".join(f"[#{m}]" for m in chain.monomers)
cgsmiles = "{" + labels + "}"
result = builder.build(cgsmiles)
atomistic_chains.append(result.polymer)
if (i + 1) % 5 == 0:
print(f" built {i + 1}/{len(sz_chains)} chains ...")
total_atoms = sum(len(c.atoms) for c in atomistic_chains)
print(f"built {len(atomistic_chains)} chains, total atoms: {total_atoms}")
built 5/374 chains ...
built 10/374 chains ... built 10 chains, total atoms: 2080
Packing and exporting follow the same pattern as earlier guides¶
The box size follows from total molecular weight and target density. Each chain is added to the packer as an individual target with count 1, and the packed frame is written as a LAMMPS data file together with the force field.
from pathlib import Path
from molpy.pack import InsideBoxConstraint, Molpack
total_mw = sum(
sum(Element(a.get("element")).mass for a in c.atoms) for c in atomistic_chains
)
target_density = 0.05 # g/cm^3 (use ~1.0 for production)
volume = (total_mw / 6.022e23) / target_density * 1e24
box_length = volume ** (1 / 3)
packer = Molpack(workdir=Path("05_output/packmol"))
constraint = InsideBoxConstraint(
length=np.array([box_length] * 3),
origin=np.zeros(3),
)
for chain in atomistic_chains:
packer.add_target(chain.to_frame(), number=1, constraint=constraint)
packed = packer.optimize(max_steps=10000, seed=42)
packed.box = mp.Box.cubic(length=box_length)
mp.io.write_lammps_system("05_output/lammps", packed, ff)
print(f"packed: {packed['atoms'].nrows} atoms, box: {box_length:.1f} A")
packed: 2080 atoms, box: 71.5 A
The engine assembles a runnable input script from the exported data¶
Writing the data file is only half the story. To actually run the simulation, LAMMPS needs an input script that says how to read that file, which force field styles to activate, and what protocol to follow. MolPy models this through LAMMPSEngine, which pairs a Script object with subprocess management.
A Script is an editable, ordered list of lines that can be built programmatically and saved to disk without executing anything. This separation matters: you can inspect, modify, and version-control the script before committing to a run. When you are ready, engine.run() writes the script to the working directory and launches lmp -in input.lmp -log log.lammps -screen none.
The code below builds a minimal OPLS-AA equilibration protocol for the packed system. The force field styles must match those written by write_lammps_system—harmonic bonds and angles, opls dihedrals, lj/cut/coul/long non-bonds—because LAMMPS validates style consistency when it reads the data file.
from molpy.core.script import Script
from molpy.engine import LAMMPSEngine
# Build the LAMMPS input script line-by-line.
# Script.from_text() dedents and normalises the block.
lmp_script = Script.from_text(
name="input",
language="other",
text="""
# Polydisperse PS/PMA system — generated by MolPy
units real
atom_style full
read_data lammps.data
include lammps.ff
pair_style lj/cut/coul/long 12.0
pair_modify mix arithmetic tail yes
kspace_style pppm 1e-4
bond_style harmonic
angle_style harmonic
dihedral_style opls
improper_style cvff
# Energy minimisation before dynamics
minimize 1.0e-4 1.0e-6 10000 100000
timestep 1.0
thermo 1000
thermo_style custom step temp press etotal
# NVT equilibration at 300 K
fix nvt all nvt temp 300.0 300.0 100.0
run 100000
""",
)
# Save the script alongside the data files without launching LAMMPS.
# check_executable=False lets the call succeed in notebooks where lmp
# may not be on PATH.
engine = LAMMPSEngine("lmp", check_executable=False)
script_path = lmp_script.save("05_output/lammps/input.lmp")
print("Input script written to:", script_path)
print(lmp_script.preview(max_lines=12))
# To run the simulation, replace the two lines above with:
# result = engine.run(lmp_script, workdir="05_output/lammps")
# print("Exit code:", result.returncode)
Input script written to: 05_output/lammps/input.lmp 1 | 2 | # Polydisperse PS/PMA system — generated by MolPy 3 | units real 4 | atom_style full 5 | 6 | read_data lammps.data 7 | include lammps.ff 8 | 9 | pair_style lj/cut/coul/long 12.0 10 | pair_modify mix arithmetic tail yes 11 | kspace_style pppm 1e-4 12 | ... (15 more lines)
GBigSMILES encodes the whole recipe in one string¶
If the repeat units, their weights, the distribution, and the target mass are already known, the entire specification can be written as a single GBigSMILES string. The parser extracts the monomer identities, stochastic weights, distribution parameters, and system mass directly from the notation, making the string self-documenting and portable.
gbigsmiles = (
"CCOC(=O)C(C)(C)" # initiator
"{[>]"
"[<|8|]CC([>|8|])c1ccccc1," # styrene (80 mol%)
" [<|2|]CC([>|2|])C(=O)OC" # methyl acrylate (20 mol%)
" [<]}"
"|schulz_zimm(1400,1500)|" # Mn=1400, Mw=1500
"[Br]" # end group
".|5e5|" # total system mass
)
ir = mp.parser.parse_gbigsmiles(gbigsmiles)
print(f"molecules: {len(ir.molecules)}, total_mass: {ir.total_mass}")
molecules: 1, total_mass: 500000.0
Troubleshooting¶
| Step | Check |
|---|---|
| Monomer mass wrong | Verify monomer has explicit hydrogens before mass calculation |
| SystemPlanner total mass off | Check max_rel_error setting |
| Chain topology missing | Call get_topo(gen_angle=True, gen_dihe=True) before typification |
| Untyped atoms at junction | Pass typifier to PolymerBuilder for incremental re-typification |
| Packing fails | Lower target density or increase max_steps |
See also: Topology-Driven Assembly, Crosslinked Networks.