Polydisperse Polymer Systems¶
"How do I declare a molecular-weight distribution and sample a polydisperse chain ensemble in MolPy?"
This guide demonstrates MolPy’s workflow from a GBigSMILES distribution declaration to a sampled chain ensemble, and then compares $M_n$ / $M_w$ / PDI against the theoretical distribution.
What We Will Do¶
- Declare four common distributions in GBigSMILES (Schulz–Zimm / Uniform / Poisson / Flory–Schulz)
- Convert the parsed
DistributionIRinto a samplable distribution object (create_polydisperse_from_ir) - Sample a chain ensemble with
PolydisperseChainGenerator+SystemPlanner - Compute $M_n$, $M_w$, and PDI from sampled chains and plot sample vs theory correctly (PMF vs PDF)
SystemPlanner (system level) repeatedly requests chains from PolydisperseChainGenerator (chain level) until the target total mass is met. The chain generator requests monomer sequences from a SequenceGenerator (sequence level).
Setup: Imports¶
We will use:
- GBigSMILES parser: parses a string into an IR (including distribution parameters)
- Distribution factory:
create_polydisperse_from_ir(turnsDistributionIRinto a samplable distribution object) - Chain generator:
PolydisperseChainGenerator(samples DP or mass, then generates a chain) - System planner:
SystemPlanner(accumulates chain mass until the target mass is met) - Plotting: Matplotlib
from random import Random
import matplotlib.pyplot as plt
import numpy as np
import scienceplots
plt.style.use(["science", 'nature', "no-latex"])
import molpy as mp
from molpy.builder.polymer.sequence_generator import WeightedSequenceGenerator
from molpy.builder.polymer.system import (
FlorySchulzPolydisperse,
PoissonPolydisperse,
PolydisperseChainGenerator,
SchulzZimmPolydisperse,
SystemPlanner,
UniformPolydisperse,
create_polydisperse_from_ir,
)
from molpy.core.element import Element
Step 1: Declare Distributions in GBigSMILES¶
We use a compact GBigSMILES form like:
{repeat_units}|distribution(params)|end_group|target_mass|
Where:
repeat_units: repeat unit set (can include multiple monomers)distribution(params): distribution type and parametersend_group: end group (we use[H].as a simple example)target_mass: target total mass for the system (used bySystemPlanner.target_total_mass)
We compare four distributions:
- Schulz–Zimm (continuous, mass space):
schulz_zimm(Mn, Mw) - Uniform (discrete, DP space):
uniform(min_dp, max_dp) - Poisson (discrete, DP space):
poisson(mean_dp) - Flory–Schulz (discrete, DP space):
flory_schulz(a)Checkpoint: you can parse each string and find aDistributionIR(name, params)in the resulting IR.
def _build_distributions() -> list[dict]:
"""Configure the four distributions to test (GBigSMILES strings)."""
return [
{
"name": "Schulz-Zimm",
"gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|schulz_zimm(4800, 6000)|[H].|1e8|",
},
{
"name": "Uniform",
"gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|uniform(10,60)|[H].|1e8|",
},
{
"name": "Poisson",
"gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|poisson(30)|[H].|1e8|",
},
{
"name": "Flory-Schulz",
"gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|flory_schulz(0.06)|[H].|1e8|",
},
]
Step 2: Parse GBigSMILES and Extract Distribution Information¶
For each distribution, we need to:
- Parse the GBigSMILES string to get the
DistributionIR - Extract repeat units and compute monomer masses
- Create a samplable distribution object using
create_polydisperse_from_ir
Checkpoint: After parsing, you should have a list of distribution objects ready for sampling.
distributions_to_test = _build_distributions()
random_seed = 43
results: list[dict] = []
for dist_config in distributions_to_test:
# Parse GBigSMiles string
ir = mp.parser.parse_gbigsmiles(dist_config["gbigsmiles"])
# Extract component information
component_ir = ir.molecules[0]
mol_ir = component_ir.molecule
target_mass = component_ir.target_mass
# Extract repeat units and convert to Atomistic structures
repeat_units = mol_ir.structure.stochastic_objects[0].repeat_units
stoch_obj = mol_ir.structure.stochastic_objects[0]
# Convert graphs to Atomistic structures to compute masses
from molpy.parser.smiles.converter import create_monomer_from_repeat_unit
monomer_structures = []
for unit in repeat_units:
monomer = create_monomer_from_repeat_unit(unit, stoch_obj)
if monomer is not None:
monomer_structures.append(monomer)
# Compute monomer masses by summing atomic masses
monomer_masses_list = []
for monomer in monomer_structures:
total_mass = 0.0
for atom in monomer.atoms:
element_symbol = atom.get("element") or atom.get("symbol")
if element_symbol:
try:
elem = Element(element_symbol)
total_mass += elem.mass
except KeyError:
# Skip unknown elements
pass
monomer_masses_list.append(total_mass)
# Equal weights for all monomers (can be customized)
weights_dict = {str(i): 1.0 for i in range(len(monomer_structures))}
# Extract distribution IR
distribution_ir = None
for meta in mol_ir.stochastic_metadata:
if meta.distribution:
distribution_ir = meta.distribution
break
if distribution_ir is None:
raise ValueError(f"No distribution IR found for {dist_config['name']}")
# Create samplable distribution object
dist_obj = create_polydisperse_from_ir(distribution_ir, random_seed=random_seed)
results.append(
{
"name": dist_config["name"],
"dist_obj": dist_obj,
"weights": weights_dict,
"monomer_masses": {str(i): m for i, m in enumerate(monomer_masses_list)},
"target_mass": float(target_mass) if target_mass is not None else None,
}
)
print(f"Parsed {len(results)} distributions:")
for r in results:
print(f" - {r['name']}: {type(r['dist_obj']).__name__}")
Parsed 4 distributions: - Schulz-Zimm: SchulzZimmPolydisperse - Uniform: UniformPolydisperse - Poisson: PoissonPolydisperse - Flory-Schulz: FlorySchulzPolydisperse
Step 3: Build Sequence Generator¶
The WeightedSequenceGenerator controls the composition of repeat units in each chain.
Here we use equal weights for all monomers, but you can customize this to control copolymer composition.
# Build sequence generators for each distribution
for result in results:
seq_gen = WeightedSequenceGenerator(monomer_weights=result["weights"])
result["seq_generator"] = seq_gen
Step 4: Build Chain Generator¶
The PolydisperseChainGenerator samples chain size (DP or mass) from the distribution and generates a chain sequence.
It requires:
- A sequence generator (from step 4)
- Monomer masses (to convert DP to chain mass)
- End group mass (we use 0.0 for simplicity)
- The distribution object (from step 3)
# Build chain generators for each distribution
for result in results:
chain_gen = PolydisperseChainGenerator(
seq_generator=result["seq_generator"],
monomer_mass=result["monomer_masses"],
end_group_mass=0.0,
distribution=result["dist_obj"],
)
result["chain_generator"] = chain_gen
Step 5: Sample Chains with System Planner¶
The SystemPlanner accumulates chains until the target total mass is reached.
It repeatedly requests chains from the PolydisperseChainGenerator until:
- The accumulated mass reaches the target (within
max_rel_error) - Or the maximum number of iterations is reached
Checkpoint: After planning, you should have a list of chains with varying molecular weights and DPs.
# Sample chains for each distribution
for result in results:
target_mass = result.get("target_mass", None)
rng = Random(random_seed)
planner = SystemPlanner(
chain_generator=result["chain_generator"],
target_total_mass=target_mass if target_mass is not None else 1e7,
max_rel_error=0.02,
)
system_plan = planner.plan_system(rng)
chains = system_plan.chains
result["molecular_weights"] = [chain.mass for chain in chains]
result["dps"] = [chain.dp for chain in chains]
result["n_chains"] = len(chains)
print("Sampling complete:")
for r in results:
print(f" - {r['name']}: {r['n_chains']} chains, total mass = {sum(r['molecular_weights']):.2e} g/mol")
Sampling complete: - Schulz-Zimm: 21200 chains, total mass = 1.00e+08 g/mol - Uniform: 18125 chains, total mass = 1.00e+08 g/mol - Poisson: 21062 chains, total mass = 1.00e+08 g/mol - Flory-Schulz: 19528 chains, total mass = 1.00e+08 g/mol
Step 6: Compute Statistics (Mn, Mw, and PDI)¶
From the sampled chains, we compute:
- $M_n$ (number-average molecular weight): $\bar{M}_n = \frac{\sum M_i}{N}$
- $M_w$ (weight-average molecular weight): $\bar{M}_w = \frac{\sum M_i^2}{\sum M_i}$
- PDI (polydispersity index): $\mathrm{PDI} = \frac{M_w}{M_n}$
These values can be compared against the theoretical distribution parameters.
# Compute Mn, Mw, PDI from sampled chains
for result in results:
mw_array = np.array(result["molecular_weights"], dtype=float)
Mn = float(np.mean(mw_array))
Mw = float(np.sum(mw_array**2) / np.sum(mw_array))
PDI = Mw / Mn
result["Mn"] = Mn
result["Mw"] = Mw
result["PDI"] = PDI
print("Statistics:")
for r in results:
print(f" - {r['name']}:")
print(f" Mn = {r['Mn']:.0f} g/mol, Mw = {r['Mw']:.0f} g/mol, PDI = {r['PDI']:.3f}")
Statistics:
- Schulz-Zimm:
Mn = 4717 g/mol, Mw = 5922 g/mol, PDI = 1.256
- Uniform:
Mn = 5517 g/mol, Mw = 6505 g/mol, PDI = 1.179
- Poisson:
Mn = 4748 g/mol, Mw = 4908 g/mol, PDI = 1.034
- Flory-Schulz:
Mn = 5121 g/mol, Mw = 7677 g/mol, PDI = 1.499
Step 7: Visualize Sampled vs Theoretical Distributions¶
We plot histograms of the sampled chains overlaid with the theoretical PDF (for continuous distributions) or PMF (for discrete distributions).
Important:
- Schulz–Zimm is a continuous distribution in mass space (PDF). We plot
mass_pdf(M)vs histogram of molecular weights. - Uniform, Poisson, Flory–Schulz are discrete distributions in DP space (PMF). We plot
dp_pmf(n)vs histogram of DPs.
Do not overlay a PMF on a PDF (different units).
# Helper function: add statistics box to each subplot
def annotate_stats(ax, Mn: float, Mw: float, PDI: float, n_chains: int) -> None:
"""Add statistics box at top-right of axis."""
Mn_txt = f"{Mn:.0f}\\,\\mathrm{{g/mol}}"
Mw_txt = f"{Mw:.0f}\\,\\mathrm{{g/mol}}"
txt = (
rf"$M_n={Mn_txt}$" "\n"
rf"$M_w={Mw_txt}$" "\n"
rf"$\mathrm{{PDI}}={PDI:.3f}$" "\n"
rf"$N={n_chains}$"
)
ax.text(
0.96, 0.96, txt,
transform=ax.transAxes,
ha="right", va="top",
bbox=dict(boxstyle="round,pad=0.25", facecolor="white", edgecolor="none", alpha=0.90),
)
# Create figure with 2x2 subplots
fig, axes = plt.subplots(
2, 2,
figsize=(5.5, 5.2),
constrained_layout=True,
sharex=False,
sharey=False,
)
axes = axes.flatten()
for idx, (ax, result) in enumerate(zip(axes, results)):
dist_obj = result["dist_obj"]
if isinstance(dist_obj, SchulzZimmPolydisperse):
# Continuous distribution in mass space (PDF)
bins = 60
n_grid = 700
x_pad_lo, x_pad_hi = 0.3, 1.3
mw = np.asarray(result["molecular_weights"], dtype=float)
ax.hist(mw, bins=bins, density=True, histtype="step", label="Chain hist" if idx == 0 else "")
M_min = max(0.0, mw.min() * x_pad_lo)
M_max = mw.max() * x_pad_hi
M_grid = np.linspace(M_min, M_max, n_grid)
pdf = dist_obj.mass_pdf(M_grid)
ax.plot(M_grid, pdf, label="Fit" if idx == 0 else "")
Mn_val = result["Mn"]
Mw_val = result["Mw"]
pdf_Mn = float(dist_obj.mass_pdf(Mn_val))
pdf_Mw = float(dist_obj.mass_pdf(Mw_val))
ax.plot([Mn_val, Mn_val], [0, pdf_Mn], color="C2", linestyle="--")
ax.plot([Mw_val, Mw_val], [0, pdf_Mw], color="C3", linestyle="--")
# Add text labels for Mn and Mw
ylim = ax.get_ylim()
ax.text(Mn_val * 1.15, pdf_Mn * 1.05, f"$M_n$", color="C2", ha="center", va="bottom")
ax.text(Mw_val * 1.1, pdf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")
ax.set_xlabel(r"Molecular weight $M$ (g/mol)")
ax.set_ylabel(r"PDF $f_M(M)$")
elif isinstance(dist_obj, UniformPolydisperse):
# Discrete distribution in DP space (PMF)
dp = np.asarray(result["dps"], dtype=int)
dp_min, dp_max = int(dp.min()), int(dp.max())
bins_edges = np.arange(dp_min - 0.5, dp_max + 1.5, 1.0)
ax.hist(dp, bins=bins_edges, density=True, histtype="step", label="Chain hist" if idx == 0 else "")
support = np.arange(dp_min, dp_max + 1)
pmf = dist_obj.dp_pmf(support)
ax.plot(support, pmf, marker="o", label="Fit" if idx == 0 else "")
# Calculate DP values for Mn and Mw
monomer_masses = list(result["monomer_masses"].values())
avg_monomer_mass = np.mean(monomer_masses)
dp_Mn = result["Mn"] / avg_monomer_mass
dp_Mw = result["Mw"] / avg_monomer_mass
# Find closest DP in support for plotting
dp_Mn_idx = np.argmin(np.abs(support - dp_Mn))
dp_Mw_idx = np.argmin(np.abs(support - dp_Mw))
dp_Mn_plot = support[dp_Mn_idx]
dp_Mw_plot = support[dp_Mw_idx]
pmf_Mn = pmf[dp_Mn_idx]
pmf_Mw = pmf[dp_Mw_idx]
ax.plot([dp_Mn_plot, dp_Mn_plot], [0, pmf_Mn], color="C2", linestyle="--")
ax.plot([dp_Mw_plot, dp_Mw_plot], [0, pmf_Mw], color="C3", linestyle="--")
ax.text(dp_Mn_plot * 1.1, pmf_Mn * 1.1, f"$M_n$", color="C2", ha="center", va="bottom")
ax.text(dp_Mw_plot * 1.1, pmf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")
ax.set_xlabel(r"Degree of polymerization $n$")
ax.set_ylabel(r"PMF $P(n)$")
ax.set_ylim(0, pmf.max() * 1.5)
elif isinstance(dist_obj, PoissonPolydisperse):
# Discrete distribution in DP space (PMF)
dp = np.asarray(result["dps"], dtype=int)
dp_min, dp_max = int(dp.min()), int(dp.max())
bins_edges = np.arange(dp_min - 0.5, dp_max + 1.5, 1.0)
ax.hist(dp, bins=bins_edges, density=True, histtype="step", label="Chain hist" if idx == 0 else "")
theory_min = 1
theory_max = max(dp_max, int(dp.mean() + 4 * dp.std()))
support = np.arange(theory_min, theory_max + 1)
pmf = dist_obj.dp_pmf(support)
ax.plot(support, pmf, marker="o", label="Fit" if idx == 0 else "")
# Calculate DP values for Mn and Mw
monomer_masses = list(result["monomer_masses"].values())
avg_monomer_mass = np.mean(monomer_masses)
dp_Mn = result["Mn"] / avg_monomer_mass
dp_Mw = result["Mw"] / avg_monomer_mass
# Find closest DP in support for plotting
dp_Mn_idx = np.argmin(np.abs(support - dp_Mn))
dp_Mw_idx = np.argmin(np.abs(support - dp_Mw))
dp_Mn_plot = support[dp_Mn_idx]
dp_Mw_plot = support[dp_Mw_idx]
pmf_Mn = pmf[dp_Mn_idx]
pmf_Mw = pmf[dp_Mw_idx]
ax.plot([dp_Mn_plot, dp_Mn_plot], [0, pmf_Mn], color="C2", linestyle="--")
ax.plot([dp_Mw_plot, dp_Mw_plot], [0, pmf_Mw], color="C3", linestyle="--")
ax.text(dp_Mn_plot * 1.1, pmf_Mn * 1.1, f"$M_n$", color="C2", ha="center", va="bottom")
ax.text(dp_Mw_plot * 1.1, pmf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")
ax.set_xlabel(r"Degree of polymerization $n$")
ax.set_ylabel(r"PMF $P(n)$")
elif isinstance(dist_obj, FlorySchulzPolydisperse):
# Discrete distribution in DP space (PMF)
dp = np.asarray(result["dps"], dtype=int)
dp_min, dp_max = int(dp.min()), int(dp.max())
bins_edges = np.arange(dp_min - 0.5, dp_max + 1.5, 1.0)
ax.hist(dp, bins=bins_edges, density=True, histtype="step", label="Chain hist" if idx == 0 else "")
theory_min = 1
theory_max = max(dp_max, int(dp.mean() + 4 * dp.std()))
support = np.arange(theory_min, theory_max + 1)
pmf = dist_obj.dp_pmf(support)
ax.plot(support, pmf, marker="o", label="Fit" if idx == 0 else "")
# Calculate DP values for Mn and Mw
monomer_masses = list(result["monomer_masses"].values())
avg_monomer_mass = np.mean(monomer_masses)
dp_Mn = result["Mn"] / avg_monomer_mass
dp_Mw = result["Mw"] / avg_monomer_mass
# Find closest DP in support for plotting
dp_Mn_idx = np.argmin(np.abs(support - dp_Mn))
dp_Mw_idx = np.argmin(np.abs(support - dp_Mw))
dp_Mn_plot = support[dp_Mn_idx]
dp_Mw_plot = support[dp_Mw_idx]
pmf_Mn = pmf[dp_Mn_idx]
pmf_Mw = pmf[dp_Mw_idx]
ax.plot([dp_Mn_plot, dp_Mn_plot], [0, pmf_Mn], color="C2", linestyle="--")
ax.plot([dp_Mw_plot, dp_Mw_plot], [0, pmf_Mw], color="C3", linestyle="--")
ax.text(dp_Mn_plot * 1.1, pmf_Mn * 1.1, f"$M_n$", color="C2", ha="center", va="bottom")
ax.text(dp_Mw_plot * 1.1, pmf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")
ax.set_xlabel(r"Degree of polymerization $n$")
ax.set_ylabel(r"PMF $P(n)$")
else:
raise TypeError(f"Unsupported distribution type: {type(dist_obj)}")
ax.set_title(result["name"])
ax.ticklabel_format(axis="y", style="sci", scilimits=(0, 0))
annotate_stats(ax, result["Mn"], result["Mw"], result["PDI"], result["n_chains"])
if idx == 0:
ax.legend(loc="best")
plt.show()
Interpretation¶
- PDF vs PMF: use
mass_pdf(M)for continuous mass distributions (Schulz–Zimm), anddp_pmf(n)for discrete DP distributions (Uniform/Poisson/Flory–Schulz). Do not overlay a PMF on a PDF (different units). - Target total mass: a large
target_massgenerates many chains. For quick smoke tests, reduce thetarget_total_massor use a smaller value in the GBigSMILES string. - Reproducibility: this example fixes both the distribution
random_seedand the PythonRandom(...)seed so runs are repeatable.
Extensions¶
- Turn a system plan into an atomistic structure: feed
SystemPlan.chainsinto aPolymerBuilder(atomistic build layer), then pack with Packmol to create an initial configuration. - Fit to experimental distributions: implement a custom distribution that provides
mass_pdf/sample_massand plug it into the same pipeline. - Control copolymer composition: adjust
WeightedSequenceGeneratorweights to change the repeat-unit composition.
Summary¶
You now have the standard path for generating a polydisperse polymer system plan with MolPy:
parse_gbigsmilesparses repeat units andDistributionIRfrom a GBigSMILES stringcreate_polydisperse_from_irturnsDistributionIRinto a samplable distribution (DP space or mass space)PolydisperseChainGeneratorsamples chain size and generates sequences + chain massSystemPlanneraccumulates chains until the target total mass is reached- Finally, compute $M_n$ / $M_w$ / PDI from the sampled chains and visualize histogram vs theoretical PMF/PDF
To convert this chain plan into an atomistic system suitable for LAMMPS, continue with the atomistic build + packing guides.