Polydisperse Polymer Systems¶

"How do I declare a molecular-weight distribution and sample a polydisperse chain ensemble in MolPy?"

This guide demonstrates MolPy’s workflow from a GBigSMILES distribution declaration to a sampled chain ensemble, and then compares $M_n$ / $M_w$ / PDI against the theoretical distribution.

What We Will Do¶

Declare four common distributions in GBigSMILES (Schulz–Zimm / Uniform / Poisson / Flory–Schulz)
Convert the parsed DistributionIR into a samplable distribution object (create_polydisperse_from_ir)
Sample a chain ensemble with PolydisperseChainGenerator + SystemPlanner
Compute $M_n$, $M_w$, and PDI from sampled chains and plot sample vs theory correctly (PMF vs PDF)

SystemPlanner (system level) repeatedly requests chains from PolydisperseChainGenerator (chain level) until the target total mass is met. The chain generator requests monomer sequences from a SequenceGenerator (sequence level).

Setup: Imports¶

We will use:

GBigSMILES parser: parses a string into an IR (including distribution parameters)
Distribution factory: create_polydisperse_from_ir (turns DistributionIR into a samplable distribution object)
Chain generator: PolydisperseChainGenerator (samples DP or mass, then generates a chain)
System planner: SystemPlanner (accumulates chain mass until the target mass is met)
Plotting: Matplotlib

In [1]:

Copied!





from random import Random

import matplotlib.pyplot as plt
import numpy as np
import scienceplots

plt.style.use(["science", 'nature', "no-latex"])

import molpy as mp
from molpy.builder.polymer.sequence_generator import WeightedSequenceGenerator
from molpy.builder.polymer.system import (
    FlorySchulzPolydisperse,
    PoissonPolydisperse,
    PolydisperseChainGenerator,
    SchulzZimmPolydisperse,
    SystemPlanner,
    UniformPolydisperse,
    create_polydisperse_from_ir,
 )
from molpy.core.element import Element
from random import Random

import matplotlib.pyplot as plt
import numpy as np
import scienceplots

plt.style.use(["science", 'nature', "no-latex"])

import molpy as mp
from molpy.builder.polymer.sequence_generator import WeightedSequenceGenerator
from molpy.builder.polymer.system import (
    FlorySchulzPolydisperse,
    PoissonPolydisperse,
    PolydisperseChainGenerator,
    SchulzZimmPolydisperse,
    SystemPlanner,
    UniformPolydisperse,
    create_polydisperse_from_ir,
 )
from molpy.core.element import Element

Step 1: Declare Distributions in GBigSMILES¶

We use a compact GBigSMILES form like:

{repeat_units}|distribution(params)|end_group|target_mass|

Where:

repeat_units: repeat unit set (can include multiple monomers)
distribution(params): distribution type and parameters
end_group: end group (we use [H]. as a simple example)
target_mass: target total mass for the system (used by SystemPlanner.target_total_mass)

We compare four distributions:

Schulz–Zimm (continuous, mass space): schulz_zimm(Mn, Mw)
Uniform (discrete, DP space): uniform(min_dp, max_dp)
Poisson (discrete, DP space): poisson(mean_dp)
Flory–Schulz (discrete, DP space): flory_schulz(a) Checkpoint: you can parse each string and find a DistributionIR(name, params) in the resulting IR.

In [2]:

Copied!





def _build_distributions() -> list[dict]:
    """Configure the four distributions to test (GBigSMILES strings)."""
    return [
        {
            "name": "Schulz-Zimm",
            "gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|schulz_zimm(4800, 6000)|[H].|1e8|",
        },
        {
            "name": "Uniform",
            "gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|uniform(10,60)|[H].|1e8|",
        },
        {
            "name": "Poisson",
            "gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|poisson(30)|[H].|1e8|",
        },
        {
            "name": "Flory-Schulz",
            "gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|flory_schulz(0.06)|[H].|1e8|",
        },
    ]
def _build_distributions() -> list[dict]:
    """Configure the four distributions to test (GBigSMILES strings)."""
    return [
        {
            "name": "Schulz-Zimm",
            "gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|schulz_zimm(4800, 6000)|[H].|1e8|",
        },
        {
            "name": "Uniform",
            "gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|uniform(10,60)|[H].|1e8|",
        },
        {
            "name": "Poisson",
            "gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|poisson(30)|[H].|1e8|",
        },
        {
            "name": "Flory-Schulz",
            "gbigsmiles": "{[<]OCCOCCOCCOCCO[>],[<]OCC(c1ccccc1)CO[>]}|flory_schulz(0.06)|[H].|1e8|",
        },
    ]

Step 2: Parse GBigSMILES and Extract Distribution Information¶

For each distribution, we need to:

Parse the GBigSMILES string to get the DistributionIR
Extract repeat units and compute monomer masses
Create a samplable distribution object using create_polydisperse_from_ir

Checkpoint: After parsing, you should have a list of distribution objects ready for sampling.

In [3]:

Copied!





distributions_to_test = _build_distributions()
random_seed = 43

results: list[dict] = []

for dist_config in distributions_to_test:
    # Parse GBigSMiles string
    ir = mp.parser.parse_gbigsmiles(dist_config["gbigsmiles"])

    # Extract component information
    component_ir = ir.molecules[0]
    mol_ir = component_ir.molecule
    target_mass = component_ir.target_mass

    # Extract repeat units and convert to Atomistic structures
    repeat_units = mol_ir.structure.stochastic_objects[0].repeat_units
    stoch_obj = mol_ir.structure.stochastic_objects[0]
    
    # Convert graphs to Atomistic structures to compute masses
    from molpy.parser.smiles.converter import create_monomer_from_repeat_unit
    monomer_structures = []
    for unit in repeat_units:
        monomer = create_monomer_from_repeat_unit(unit, stoch_obj)
        if monomer is not None:
            monomer_structures.append(monomer)
    
    # Compute monomer masses by summing atomic masses
    monomer_masses_list = []
    for monomer in monomer_structures:
        total_mass = 0.0
        for atom in monomer.atoms:
            element_symbol = atom.get("element") or atom.get("symbol")
            if element_symbol:
                try:
                    elem = Element(element_symbol)
                    total_mass += elem.mass
                except KeyError:
                    # Skip unknown elements
                    pass
        monomer_masses_list.append(total_mass)

    # Equal weights for all monomers (can be customized)
    weights_dict = {str(i): 1.0 for i in range(len(monomer_structures))}

    # Extract distribution IR
    distribution_ir = None
    for meta in mol_ir.stochastic_metadata:
        if meta.distribution:
            distribution_ir = meta.distribution
            break
    if distribution_ir is None:
        raise ValueError(f"No distribution IR found for {dist_config['name']}")

    # Create samplable distribution object
    dist_obj = create_polydisperse_from_ir(distribution_ir, random_seed=random_seed)

    results.append(
        {
            "name": dist_config["name"],
            "dist_obj": dist_obj,
            "weights": weights_dict,
            "monomer_masses": {str(i): m for i, m in enumerate(monomer_masses_list)},
            "target_mass": float(target_mass) if target_mass is not None else None,
        }
    )

print(f"Parsed {len(results)} distributions:")
for r in results:
    print(f"  - {r['name']}: {type(r['dist_obj']).__name__}")
distributions_to_test = _build_distributions()
random_seed = 43

results: list[dict] = []

for dist_config in distributions_to_test:
    # Parse GBigSMiles string
    ir = mp.parser.parse_gbigsmiles(dist_config["gbigsmiles"])

    # Extract component information
    component_ir = ir.molecules[0]
    mol_ir = component_ir.molecule
    target_mass = component_ir.target_mass

    # Extract repeat units and convert to Atomistic structures
    repeat_units = mol_ir.structure.stochastic_objects[0].repeat_units
    stoch_obj = mol_ir.structure.stochastic_objects[0]
    
    # Convert graphs to Atomistic structures to compute masses
    from molpy.parser.smiles.converter import create_monomer_from_repeat_unit
    monomer_structures = []
    for unit in repeat_units:
        monomer = create_monomer_from_repeat_unit(unit, stoch_obj)
        if monomer is not None:
            monomer_structures.append(monomer)
    
    # Compute monomer masses by summing atomic masses
    monomer_masses_list = []
    for monomer in monomer_structures:
        total_mass = 0.0
        for atom in monomer.atoms:
            element_symbol = atom.get("element") or atom.get("symbol")
            if element_symbol:
                try:
                    elem = Element(element_symbol)
                    total_mass += elem.mass
                except KeyError:
                    # Skip unknown elements
                    pass
        monomer_masses_list.append(total_mass)

    # Equal weights for all monomers (can be customized)
    weights_dict = {str(i): 1.0 for i in range(len(monomer_structures))}

    # Extract distribution IR
    distribution_ir = None
    for meta in mol_ir.stochastic_metadata:
        if meta.distribution:
            distribution_ir = meta.distribution
            break
    if distribution_ir is None:
        raise ValueError(f"No distribution IR found for {dist_config['name']}")

    # Create samplable distribution object
    dist_obj = create_polydisperse_from_ir(distribution_ir, random_seed=random_seed)

    results.append(
        {
            "name": dist_config["name"],
            "dist_obj": dist_obj,
            "weights": weights_dict,
            "monomer_masses": {str(i): m for i, m in enumerate(monomer_masses_list)},
            "target_mass": float(target_mass) if target_mass is not None else None,
        }
    )

print(f"Parsed {len(results)} distributions:")
for r in results:
    print(f"  - {r['name']}: {type(r['dist_obj']).__name__}")

Parsed 4 distributions:
  - Schulz-Zimm: SchulzZimmPolydisperse
  - Uniform: UniformPolydisperse
  - Poisson: PoissonPolydisperse
  - Flory-Schulz: FlorySchulzPolydisperse

Step 3: Build Sequence Generator¶

The WeightedSequenceGenerator controls the composition of repeat units in each chain.

Here we use equal weights for all monomers, but you can customize this to control copolymer composition.

In [4]:

Copied!





# Build sequence generators for each distribution
for result in results:
    seq_gen = WeightedSequenceGenerator(monomer_weights=result["weights"])
    result["seq_generator"] = seq_gen
# Build sequence generators for each distribution
for result in results:
    seq_gen = WeightedSequenceGenerator(monomer_weights=result["weights"])
    result["seq_generator"] = seq_gen

Step 4: Build Chain Generator¶

The PolydisperseChainGenerator samples chain size (DP or mass) from the distribution and generates a chain sequence.

It requires:

A sequence generator (from step 4)
Monomer masses (to convert DP to chain mass)
End group mass (we use 0.0 for simplicity)
The distribution object (from step 3)

In [5]:

Copied!





# Build chain generators for each distribution
for result in results:
    chain_gen = PolydisperseChainGenerator(
        seq_generator=result["seq_generator"],
        monomer_mass=result["monomer_masses"],
        end_group_mass=0.0,
        distribution=result["dist_obj"],
    )
    result["chain_generator"] = chain_gen
# Build chain generators for each distribution
for result in results:
    chain_gen = PolydisperseChainGenerator(
        seq_generator=result["seq_generator"],
        monomer_mass=result["monomer_masses"],
        end_group_mass=0.0,
        distribution=result["dist_obj"],
    )
    result["chain_generator"] = chain_gen

Step 5: Sample Chains with System Planner¶

The SystemPlanner accumulates chains until the target total mass is reached.

It repeatedly requests chains from the PolydisperseChainGenerator until:

The accumulated mass reaches the target (within max_rel_error)
Or the maximum number of iterations is reached

Checkpoint: After planning, you should have a list of chains with varying molecular weights and DPs.

In [6]:

Copied!





# Sample chains for each distribution
for result in results:
    target_mass = result.get("target_mass", None)
    rng = Random(random_seed)
    planner = SystemPlanner(
        chain_generator=result["chain_generator"],
        target_total_mass=target_mass if target_mass is not None else 1e7,
        max_rel_error=0.02,
    )
    system_plan = planner.plan_system(rng)
    chains = system_plan.chains

    result["molecular_weights"] = [chain.mass for chain in chains]
    result["dps"] = [chain.dp for chain in chains]
    result["n_chains"] = len(chains)

print("Sampling complete:")
for r in results:
    print(f"  - {r['name']}: {r['n_chains']} chains, total mass = {sum(r['molecular_weights']):.2e} g/mol")
# Sample chains for each distribution
for result in results:
    target_mass = result.get("target_mass", None)
    rng = Random(random_seed)
    planner = SystemPlanner(
        chain_generator=result["chain_generator"],
        target_total_mass=target_mass if target_mass is not None else 1e7,
        max_rel_error=0.02,
    )
    system_plan = planner.plan_system(rng)
    chains = system_plan.chains

    result["molecular_weights"] = [chain.mass for chain in chains]
    result["dps"] = [chain.dp for chain in chains]
    result["n_chains"] = len(chains)

print("Sampling complete:")
for r in results:
    print(f"  - {r['name']}: {r['n_chains']} chains, total mass = {sum(r['molecular_weights']):.2e} g/mol")

Sampling complete:
  - Schulz-Zimm: 21200 chains, total mass = 1.00e+08 g/mol
  - Uniform: 18125 chains, total mass = 1.00e+08 g/mol
  - Poisson: 21062 chains, total mass = 1.00e+08 g/mol
  - Flory-Schulz: 19528 chains, total mass = 1.00e+08 g/mol

Step 6: Compute Statistics (Mn, Mw, and PDI)¶

From the sampled chains, we compute:

$M_n$ (number-average molecular weight): $\bar{M}_n = \frac{\sum M_i}{N}$
$M_w$ (weight-average molecular weight): $\bar{M}_w = \frac{\sum M_i^2}{\sum M_i}$
PDI (polydispersity index): $\mathrm{PDI} = \frac{M_w}{M_n}$

These values can be compared against the theoretical distribution parameters.

In [7]:

Copied!





# Compute Mn, Mw, PDI from sampled chains
for result in results:
    mw_array = np.array(result["molecular_weights"], dtype=float)
    Mn = float(np.mean(mw_array))
    Mw = float(np.sum(mw_array**2) / np.sum(mw_array))
    PDI = Mw / Mn

    result["Mn"] = Mn
    result["Mw"] = Mw
    result["PDI"] = PDI

print("Statistics:")
for r in results:
    print(f"  - {r['name']}:")
    print(f"    Mn = {r['Mn']:.0f} g/mol, Mw = {r['Mw']:.0f} g/mol, PDI = {r['PDI']:.3f}")
# Compute Mn, Mw, PDI from sampled chains
for result in results:
    mw_array = np.array(result["molecular_weights"], dtype=float)
    Mn = float(np.mean(mw_array))
    Mw = float(np.sum(mw_array**2) / np.sum(mw_array))
    PDI = Mw / Mn

    result["Mn"] = Mn
    result["Mw"] = Mw
    result["PDI"] = PDI

print("Statistics:")
for r in results:
    print(f"  - {r['name']}:")
    print(f"    Mn = {r['Mn']:.0f} g/mol, Mw = {r['Mw']:.0f} g/mol, PDI = {r['PDI']:.3f}")

Statistics:
  - Schulz-Zimm:
    Mn = 4717 g/mol, Mw = 5922 g/mol, PDI = 1.256
  - Uniform:
    Mn = 5517 g/mol, Mw = 6505 g/mol, PDI = 1.179
  - Poisson:
    Mn = 4748 g/mol, Mw = 4908 g/mol, PDI = 1.034
  - Flory-Schulz:
    Mn = 5121 g/mol, Mw = 7677 g/mol, PDI = 1.499

Step 7: Visualize Sampled vs Theoretical Distributions¶

We plot histograms of the sampled chains overlaid with the theoretical PDF (for continuous distributions) or PMF (for discrete distributions).

Important:

Schulz–Zimm is a continuous distribution in mass space (PDF). We plot mass_pdf(M) vs histogram of molecular weights.
Uniform, Poisson, Flory–Schulz are discrete distributions in DP space (PMF). We plot dp_pmf(n) vs histogram of DPs.

Do not overlay a PMF on a PDF (different units).

In [8]:

Copied!





# Helper function: add statistics box to each subplot
def annotate_stats(ax, Mn: float, Mw: float, PDI: float, n_chains: int) -> None:
    """Add statistics box at top-right of axis."""
    Mn_txt = f"{Mn:.0f}\\,\\mathrm{{g/mol}}"
    Mw_txt = f"{Mw:.0f}\\,\\mathrm{{g/mol}}"

    txt = (
        rf"$M_n={Mn_txt}$" "\n"
        rf"$M_w={Mw_txt}$" "\n"
        rf"$\mathrm{{PDI}}={PDI:.3f}$" "\n"
        rf"$N={n_chains}$"
    )
    ax.text(
        0.96, 0.96, txt,
        transform=ax.transAxes,
        ha="right", va="top",
        bbox=dict(boxstyle="round,pad=0.25", facecolor="white", edgecolor="none", alpha=0.90),
    )
# Helper function: add statistics box to each subplot
def annotate_stats(ax, Mn: float, Mw: float, PDI: float, n_chains: int) -> None:
    """Add statistics box at top-right of axis."""
    Mn_txt = f"{Mn:.0f}\\,\\mathrm{{g/mol}}"
    Mw_txt = f"{Mw:.0f}\\,\\mathrm{{g/mol}}"

    txt = (
        rf"$M_n={Mn_txt}$" "\n"
        rf"$M_w={Mw_txt}$" "\n"
        rf"$\mathrm{{PDI}}={PDI:.3f}$" "\n"
        rf"$N={n_chains}$"
    )
    ax.text(
        0.96, 0.96, txt,
        transform=ax.transAxes,
        ha="right", va="top",
        bbox=dict(boxstyle="round,pad=0.25", facecolor="white", edgecolor="none", alpha=0.90),
    )

In [9]:

Copied!





# Create figure with 2x2 subplots
fig, axes = plt.subplots(
    2, 2,
    figsize=(5.5, 5.2),
    constrained_layout=True,
    sharex=False,
    sharey=False,
)
axes = axes.flatten()

for idx, (ax, result) in enumerate(zip(axes, results)):
    dist_obj = result["dist_obj"]

    if isinstance(dist_obj, SchulzZimmPolydisperse):
        # Continuous distribution in mass space (PDF)
        bins = 60
        n_grid = 700
        x_pad_lo, x_pad_hi = 0.3, 1.3

        mw = np.asarray(result["molecular_weights"], dtype=float)
        ax.hist(mw, bins=bins, density=True, histtype="step", label="Chain hist" if idx == 0 else "")

        M_min = max(0.0, mw.min() * x_pad_lo)
        M_max = mw.max() * x_pad_hi
        M_grid = np.linspace(M_min, M_max, n_grid)
        pdf = dist_obj.mass_pdf(M_grid)
        ax.plot(M_grid, pdf, label="Fit" if idx == 0 else "")

        Mn_val = result["Mn"]
        Mw_val = result["Mw"]
        pdf_Mn = float(dist_obj.mass_pdf(Mn_val))
        pdf_Mw = float(dist_obj.mass_pdf(Mw_val))
        ax.plot([Mn_val, Mn_val], [0, pdf_Mn], color="C2", linestyle="--")
        ax.plot([Mw_val, Mw_val], [0, pdf_Mw], color="C3", linestyle="--")
        
        # Add text labels for Mn and Mw
        ylim = ax.get_ylim()
        ax.text(Mn_val * 1.15, pdf_Mn * 1.05, f"$M_n$", color="C2", ha="center", va="bottom")
        ax.text(Mw_val * 1.1, pdf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")

        ax.set_xlabel(r"Molecular weight $M$ (g/mol)")
        ax.set_ylabel(r"PDF $f_M(M)$")

    elif isinstance(dist_obj, UniformPolydisperse):
        # Discrete distribution in DP space (PMF)
        dp = np.asarray(result["dps"], dtype=int)
        dp_min, dp_max = int(dp.min()), int(dp.max())
        bins_edges = np.arange(dp_min - 0.5, dp_max + 1.5, 1.0)
        ax.hist(dp, bins=bins_edges, density=True, histtype="step", label="Chain hist" if idx == 0 else "")

        support = np.arange(dp_min, dp_max + 1)
        pmf = dist_obj.dp_pmf(support)
        ax.plot(support, pmf, marker="o", label="Fit" if idx == 0 else "")

        # Calculate DP values for Mn and Mw
        monomer_masses = list(result["monomer_masses"].values())
        avg_monomer_mass = np.mean(monomer_masses)
        dp_Mn = result["Mn"] / avg_monomer_mass
        dp_Mw = result["Mw"] / avg_monomer_mass
        
        # Find closest DP in support for plotting
        dp_Mn_idx = np.argmin(np.abs(support - dp_Mn))
        dp_Mw_idx = np.argmin(np.abs(support - dp_Mw))
        dp_Mn_plot = support[dp_Mn_idx]
        dp_Mw_plot = support[dp_Mw_idx]
        pmf_Mn = pmf[dp_Mn_idx]
        pmf_Mw = pmf[dp_Mw_idx]
        
        ax.plot([dp_Mn_plot, dp_Mn_plot], [0, pmf_Mn], color="C2", linestyle="--")
        ax.plot([dp_Mw_plot, dp_Mw_plot], [0, pmf_Mw], color="C3", linestyle="--")
        ax.text(dp_Mn_plot * 1.1, pmf_Mn * 1.1, f"$M_n$", color="C2", ha="center", va="bottom")
        ax.text(dp_Mw_plot * 1.1, pmf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")

        ax.set_xlabel(r"Degree of polymerization $n$")
        ax.set_ylabel(r"PMF $P(n)$")
        ax.set_ylim(0, pmf.max() * 1.5)

    elif isinstance(dist_obj, PoissonPolydisperse):
        # Discrete distribution in DP space (PMF)
        dp = np.asarray(result["dps"], dtype=int)
        dp_min, dp_max = int(dp.min()), int(dp.max())
        bins_edges = np.arange(dp_min - 0.5, dp_max + 1.5, 1.0)
        ax.hist(dp, bins=bins_edges, density=True, histtype="step", label="Chain hist" if idx == 0 else "")

        theory_min = 1
        theory_max = max(dp_max, int(dp.mean() + 4 * dp.std()))
        support = np.arange(theory_min, theory_max + 1)
        pmf = dist_obj.dp_pmf(support)
        ax.plot(support, pmf, marker="o", label="Fit" if idx == 0 else "")

        # Calculate DP values for Mn and Mw
        monomer_masses = list(result["monomer_masses"].values())
        avg_monomer_mass = np.mean(monomer_masses)
        dp_Mn = result["Mn"] / avg_monomer_mass
        dp_Mw = result["Mw"] / avg_monomer_mass
        
        # Find closest DP in support for plotting
        dp_Mn_idx = np.argmin(np.abs(support - dp_Mn))
        dp_Mw_idx = np.argmin(np.abs(support - dp_Mw))
        dp_Mn_plot = support[dp_Mn_idx]
        dp_Mw_plot = support[dp_Mw_idx]
        pmf_Mn = pmf[dp_Mn_idx]
        pmf_Mw = pmf[dp_Mw_idx]
        
        ax.plot([dp_Mn_plot, dp_Mn_plot], [0, pmf_Mn], color="C2", linestyle="--")
        ax.plot([dp_Mw_plot, dp_Mw_plot], [0, pmf_Mw], color="C3", linestyle="--")
        ax.text(dp_Mn_plot * 1.1, pmf_Mn * 1.1, f"$M_n$", color="C2", ha="center", va="bottom")
        ax.text(dp_Mw_plot * 1.1, pmf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")

        ax.set_xlabel(r"Degree of polymerization $n$")
        ax.set_ylabel(r"PMF $P(n)$")

    elif isinstance(dist_obj, FlorySchulzPolydisperse):
        # Discrete distribution in DP space (PMF)
        dp = np.asarray(result["dps"], dtype=int)
        dp_min, dp_max = int(dp.min()), int(dp.max())
        bins_edges = np.arange(dp_min - 0.5, dp_max + 1.5, 1.0)
        ax.hist(dp, bins=bins_edges, density=True, histtype="step", label="Chain hist" if idx == 0 else "")

        theory_min = 1
        theory_max = max(dp_max, int(dp.mean() + 4 * dp.std()))
        support = np.arange(theory_min, theory_max + 1)
        pmf = dist_obj.dp_pmf(support)
        ax.plot(support, pmf, marker="o", label="Fit" if idx == 0 else "")

        # Calculate DP values for Mn and Mw
        monomer_masses = list(result["monomer_masses"].values())
        avg_monomer_mass = np.mean(monomer_masses)
        dp_Mn = result["Mn"] / avg_monomer_mass
        dp_Mw = result["Mw"] / avg_monomer_mass
        
        # Find closest DP in support for plotting
        dp_Mn_idx = np.argmin(np.abs(support - dp_Mn))
        dp_Mw_idx = np.argmin(np.abs(support - dp_Mw))
        dp_Mn_plot = support[dp_Mn_idx]
        dp_Mw_plot = support[dp_Mw_idx]
        pmf_Mn = pmf[dp_Mn_idx]
        pmf_Mw = pmf[dp_Mw_idx]
        
        ax.plot([dp_Mn_plot, dp_Mn_plot], [0, pmf_Mn], color="C2", linestyle="--")
        ax.plot([dp_Mw_plot, dp_Mw_plot], [0, pmf_Mw], color="C3", linestyle="--")
        ax.text(dp_Mn_plot * 1.1, pmf_Mn * 1.1, f"$M_n$", color="C2", ha="center", va="bottom")
        ax.text(dp_Mw_plot * 1.1, pmf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")

        ax.set_xlabel(r"Degree of polymerization $n$")
        ax.set_ylabel(r"PMF $P(n)$")
        
    else:
        raise TypeError(f"Unsupported distribution type: {type(dist_obj)}")

    ax.set_title(result["name"])
    ax.ticklabel_format(axis="y", style="sci", scilimits=(0, 0))
    annotate_stats(ax, result["Mn"], result["Mw"], result["PDI"], result["n_chains"])
    if idx == 0:
        ax.legend(loc="best")

plt.show()
# Create figure with 2x2 subplots
fig, axes = plt.subplots(
    2, 2,
    figsize=(5.5, 5.2),
    constrained_layout=True,
    sharex=False,
    sharey=False,
)
axes = axes.flatten()

for idx, (ax, result) in enumerate(zip(axes, results)):
    dist_obj = result["dist_obj"]

    if isinstance(dist_obj, SchulzZimmPolydisperse):
        # Continuous distribution in mass space (PDF)
        bins = 60
        n_grid = 700
        x_pad_lo, x_pad_hi = 0.3, 1.3

        mw = np.asarray(result["molecular_weights"], dtype=float)
        ax.hist(mw, bins=bins, density=True, histtype="step", label="Chain hist" if idx == 0 else "")

        M_min = max(0.0, mw.min() * x_pad_lo)
        M_max = mw.max() * x_pad_hi
        M_grid = np.linspace(M_min, M_max, n_grid)
        pdf = dist_obj.mass_pdf(M_grid)
        ax.plot(M_grid, pdf, label="Fit" if idx == 0 else "")

        Mn_val = result["Mn"]
        Mw_val = result["Mw"]
        pdf_Mn = float(dist_obj.mass_pdf(Mn_val))
        pdf_Mw = float(dist_obj.mass_pdf(Mw_val))
        ax.plot([Mn_val, Mn_val], [0, pdf_Mn], color="C2", linestyle="--")
        ax.plot([Mw_val, Mw_val], [0, pdf_Mw], color="C3", linestyle="--")
        
        # Add text labels for Mn and Mw
        ylim = ax.get_ylim()
        ax.text(Mn_val * 1.15, pdf_Mn * 1.05, f"$M_n$", color="C2", ha="center", va="bottom")
        ax.text(Mw_val * 1.1, pdf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")

        ax.set_xlabel(r"Molecular weight $M$ (g/mol)")
        ax.set_ylabel(r"PDF $f_M(M)$")

    elif isinstance(dist_obj, UniformPolydisperse):
        # Discrete distribution in DP space (PMF)
        dp = np.asarray(result["dps"], dtype=int)
        dp_min, dp_max = int(dp.min()), int(dp.max())
        bins_edges = np.arange(dp_min - 0.5, dp_max + 1.5, 1.0)
        ax.hist(dp, bins=bins_edges, density=True, histtype="step", label="Chain hist" if idx == 0 else "")

        support = np.arange(dp_min, dp_max + 1)
        pmf = dist_obj.dp_pmf(support)
        ax.plot(support, pmf, marker="o", label="Fit" if idx == 0 else "")

        # Calculate DP values for Mn and Mw
        monomer_masses = list(result["monomer_masses"].values())
        avg_monomer_mass = np.mean(monomer_masses)
        dp_Mn = result["Mn"] / avg_monomer_mass
        dp_Mw = result["Mw"] / avg_monomer_mass
        
        # Find closest DP in support for plotting
        dp_Mn_idx = np.argmin(np.abs(support - dp_Mn))
        dp_Mw_idx = np.argmin(np.abs(support - dp_Mw))
        dp_Mn_plot = support[dp_Mn_idx]
        dp_Mw_plot = support[dp_Mw_idx]
        pmf_Mn = pmf[dp_Mn_idx]
        pmf_Mw = pmf[dp_Mw_idx]
        
        ax.plot([dp_Mn_plot, dp_Mn_plot], [0, pmf_Mn], color="C2", linestyle="--")
        ax.plot([dp_Mw_plot, dp_Mw_plot], [0, pmf_Mw], color="C3", linestyle="--")
        ax.text(dp_Mn_plot * 1.1, pmf_Mn * 1.1, f"$M_n$", color="C2", ha="center", va="bottom")
        ax.text(dp_Mw_plot * 1.1, pmf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")

        ax.set_xlabel(r"Degree of polymerization $n$")
        ax.set_ylabel(r"PMF $P(n)$")
        ax.set_ylim(0, pmf.max() * 1.5)

    elif isinstance(dist_obj, PoissonPolydisperse):
        # Discrete distribution in DP space (PMF)
        dp = np.asarray(result["dps"], dtype=int)
        dp_min, dp_max = int(dp.min()), int(dp.max())
        bins_edges = np.arange(dp_min - 0.5, dp_max + 1.5, 1.0)
        ax.hist(dp, bins=bins_edges, density=True, histtype="step", label="Chain hist" if idx == 0 else "")

        theory_min = 1
        theory_max = max(dp_max, int(dp.mean() + 4 * dp.std()))
        support = np.arange(theory_min, theory_max + 1)
        pmf = dist_obj.dp_pmf(support)
        ax.plot(support, pmf, marker="o", label="Fit" if idx == 0 else "")

        # Calculate DP values for Mn and Mw
        monomer_masses = list(result["monomer_masses"].values())
        avg_monomer_mass = np.mean(monomer_masses)
        dp_Mn = result["Mn"] / avg_monomer_mass
        dp_Mw = result["Mw"] / avg_monomer_mass
        
        # Find closest DP in support for plotting
        dp_Mn_idx = np.argmin(np.abs(support - dp_Mn))
        dp_Mw_idx = np.argmin(np.abs(support - dp_Mw))
        dp_Mn_plot = support[dp_Mn_idx]
        dp_Mw_plot = support[dp_Mw_idx]
        pmf_Mn = pmf[dp_Mn_idx]
        pmf_Mw = pmf[dp_Mw_idx]
        
        ax.plot([dp_Mn_plot, dp_Mn_plot], [0, pmf_Mn], color="C2", linestyle="--")
        ax.plot([dp_Mw_plot, dp_Mw_plot], [0, pmf_Mw], color="C3", linestyle="--")
        ax.text(dp_Mn_plot * 1.1, pmf_Mn * 1.1, f"$M_n$", color="C2", ha="center", va="bottom")
        ax.text(dp_Mw_plot * 1.1, pmf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")

        ax.set_xlabel(r"Degree of polymerization $n$")
        ax.set_ylabel(r"PMF $P(n)$")

    elif isinstance(dist_obj, FlorySchulzPolydisperse):
        # Discrete distribution in DP space (PMF)
        dp = np.asarray(result["dps"], dtype=int)
        dp_min, dp_max = int(dp.min()), int(dp.max())
        bins_edges = np.arange(dp_min - 0.5, dp_max + 1.5, 1.0)
        ax.hist(dp, bins=bins_edges, density=True, histtype="step", label="Chain hist" if idx == 0 else "")

        theory_min = 1
        theory_max = max(dp_max, int(dp.mean() + 4 * dp.std()))
        support = np.arange(theory_min, theory_max + 1)
        pmf = dist_obj.dp_pmf(support)
        ax.plot(support, pmf, marker="o", label="Fit" if idx == 0 else "")

        # Calculate DP values for Mn and Mw
        monomer_masses = list(result["monomer_masses"].values())
        avg_monomer_mass = np.mean(monomer_masses)
        dp_Mn = result["Mn"] / avg_monomer_mass
        dp_Mw = result["Mw"] / avg_monomer_mass
        
        # Find closest DP in support for plotting
        dp_Mn_idx = np.argmin(np.abs(support - dp_Mn))
        dp_Mw_idx = np.argmin(np.abs(support - dp_Mw))
        dp_Mn_plot = support[dp_Mn_idx]
        dp_Mw_plot = support[dp_Mw_idx]
        pmf_Mn = pmf[dp_Mn_idx]
        pmf_Mw = pmf[dp_Mw_idx]
        
        ax.plot([dp_Mn_plot, dp_Mn_plot], [0, pmf_Mn], color="C2", linestyle="--")
        ax.plot([dp_Mw_plot, dp_Mw_plot], [0, pmf_Mw], color="C3", linestyle="--")
        ax.text(dp_Mn_plot * 1.1, pmf_Mn * 1.1, f"$M_n$", color="C2", ha="center", va="bottom")
        ax.text(dp_Mw_plot * 1.1, pmf_Mw * 1.1, f"$M_w$", color="C3", ha="center", va="bottom")

        ax.set_xlabel(r"Degree of polymerization $n$")
        ax.set_ylabel(r"PMF $P(n)$")
        
    else:
        raise TypeError(f"Unsupported distribution type: {type(dist_obj)}")

    ax.set_title(result["name"])
    ax.ticklabel_format(axis="y", style="sci", scilimits=(0, 0))
    annotate_stats(ax, result["Mn"], result["Mw"], result["PDI"], result["n_chains"])
    if idx == 0:
        ax.legend(loc="best")

plt.show()

No description has been provided for this image

Interpretation¶

PDF vs PMF: use mass_pdf(M) for continuous mass distributions (Schulz–Zimm), and dp_pmf(n) for discrete DP distributions (Uniform/Poisson/Flory–Schulz). Do not overlay a PMF on a PDF (different units).
Target total mass: a large target_mass generates many chains. For quick smoke tests, reduce the target_total_mass or use a smaller value in the GBigSMILES string.
Reproducibility: this example fixes both the distribution random_seed and the Python Random(...) seed so runs are repeatable.

Extensions¶

Turn a system plan into an atomistic structure: feed SystemPlan.chains into a PolymerBuilder (atomistic build layer), then pack with Packmol to create an initial configuration.
Fit to experimental distributions: implement a custom distribution that provides mass_pdf/sample_mass and plug it into the same pipeline.
Control copolymer composition: adjust WeightedSequenceGenerator weights to change the repeat-unit composition.

Summary¶

You now have the standard path for generating a polydisperse polymer system plan with MolPy:

parse_gbigsmiles parses repeat units and DistributionIR from a GBigSMILES string
create_polydisperse_from_ir turns DistributionIR into a samplable distribution (DP space or mass space)
PolydisperseChainGenerator samples chain size and generates sequences + chain mass
SystemPlanner accumulates chains until the target total mass is reached
Finally, compute $M_n$ / $M_w$ / PDI from the sampled chains and visualize histogram vs theoretical PMF/PDF

To convert this chain plan into an atomistic system suitable for LAMMPS, continue with the atomistic build + packing guides.