Skip to content

Builder

The builder module provides tools for building molecular systems, such as crystals and polymers.

Crystal

Crystal lattice builder module - LAMMPS-style crystal structure generator.

This module provides tools for creating crystal structures: - Define Bravais lattices with basis sites - Predefined common lattice types (SC, BCC, FCC, rocksalt) - Define regions in lattice or Cartesian coordinates - Efficient vectorized unit cell tiling and atom generation

Example

lat = Lattice.cubic_fcc(a=3.52, species="Ni") region = BlockRegion(0, 10, 0, 10, 0, 10, coord_system="lattice") builder = CrystalBuilder(lat) structure = builder.build_block(region)

BlockRegion

BlockRegion(xmin, xmax, ymin, ymax, zmin, zmax, coord_system='lattice')

Bases: Region

Axis-aligned box region

Define a box region specified by x, y, z ranges.

Parameters

xmin, xmax : float x-direction range [xmin, xmax] ymin, ymax : float y-direction range [ymin, ymax] zmin, zmax : float z-direction range [zmin, zmax] coord_system : CoordSystem, optional Coordinate system, default is "lattice"

Examples

Region in lattice coordinates

region = BlockRegion(0, 10, 0, 10, 0, 10, coord_system="lattice")

Region in Cartesian coordinates

region = BlockRegion(0, 30, 0, 30, 0, 30, coord_system="cartesian")

Initialize box region

Parameters

xmin, xmax : float x-direction range ymin, ymax : float y-direction range zmin, zmax : float z-direction range coord_system : CoordSystem, optional Coordinate system

Source code in src/molpy/builder/crystal.py
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
def __init__(
    self,
    xmin: float,
    xmax: float,
    ymin: float,
    ymax: float,
    zmin: float,
    zmax: float,
    coord_system: CoordSystem = "lattice",
) -> None:
    """
    Initialize box region

    Parameters
    ----------
    xmin, xmax : float
        x-direction range
    ymin, ymax : float
        y-direction range
    zmin, zmax : float
        z-direction range
    coord_system : CoordSystem, optional
        Coordinate system
    """
    super().__init__(coord_system)
    self.xmin = xmin
    self.xmax = xmax
    self.ymin = ymin
    self.ymax = ymax
    self.zmin = zmin
    self.zmax = zmax

contains_mask

contains_mask(points)

Check if points are in the box (vectorized)

Parameters

points : np.ndarray Point coordinates array of shape (N, 3)

Returns

np.ndarray Boolean array of shape (N,)

Examples

region = BlockRegion(0, 10, 0, 10, 0, 10) points = np.array([[5, 5, 5], [15, 5, 5]]) mask = region.contains_mask(points) print(mask) # [True, False]

Source code in src/molpy/builder/crystal.py
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
def contains_mask(self, points: np.ndarray) -> np.ndarray:
    """
    Check if points are in the box (vectorized)

    Parameters
    ----------
    points : np.ndarray
        Point coordinates array of shape (N, 3)

    Returns
    -------
    np.ndarray
        Boolean array of shape (N,)

    Examples
    --------
    >>> region = BlockRegion(0, 10, 0, 10, 0, 10)
    >>> points = np.array([[5, 5, 5], [15, 5, 5]])
    >>> mask = region.contains_mask(points)
    >>> print(mask)  # [True, False]
    """
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points.reshape(1, -1)

    x = points[:, 0]
    y = points[:, 1]
    z = points[:, 2]

    # Vectorized boundary check
    mask = (
        (self.xmin <= x)
        & (x <= self.xmax)
        & (self.ymin <= y)
        & (y <= self.ymax)
        & (self.zmin <= z)
        & (z <= self.zmax)
    )

    return mask

CrystalBuilder

CrystalBuilder(lattice)

Crystal structure builder

Efficiently generate crystal structures using NumPy vectorized operations. Supports tiling lattices and creating atoms in specified regions.

Parameters

lattice : Lattice Lattice definition to use

Examples

Create a simple FCC structure

lat = Lattice.cubic_fcc(a=3.52, species="Ni") region = BlockRegion(0, 10, 0, 10, 0, 10, coord_system="lattice") builder = CrystalBuilder(lat) structure = builder.build_block(region) print(len(structure.atoms))

Initialize crystal builder

Parameters

lattice : Lattice Lattice definition

Source code in src/molpy/builder/crystal.py
504
505
506
507
508
509
510
511
512
513
def __init__(self, lattice: Lattice) -> None:
    """
    Initialize crystal builder

    Parameters
    ----------
    lattice : Lattice
        Lattice definition
    """
    self.lattice = lattice

build_block

build_block(region, *, i_range=None, j_range=None, k_range=None)

Build crystal structure within a box region

This method efficiently generates crystal structures using vectorized operations: 1. Determine cell index ranges to tile 2. Use NumPy meshgrid and broadcasting to generate all atom positions 3. Apply region filtering 4. Create and return Atomistic structure

Parameters

region : BlockRegion Box defining the region for atom generation i_range, j_range, k_range : range | None, optional Explicitly specify cell index ranges. If not provided: - For "lattice" coordinate system: inferred from region boundaries - For "cartesian" coordinate system: must be provided, otherwise raises error

Returns

Atomistic Generated crystal structure containing atoms and box information

Raises

ValueError If coord_system == "cartesian" and explicit ranges are not provided

Examples

Using lattice coordinates (auto-infer ranges)

lat = Lattice.cubic_sc(a=2.0, species="Cu") region = BlockRegion(0, 10, 0, 10, 0, 10, coord_system="lattice") builder = CrystalBuilder(lat) structure = builder.build_block(region)

Using explicit ranges

structure = builder.build_block( ... region, ... i_range=range(0, 5), ... j_range=range(0, 5), ... k_range=range(0, 5) ... )

Cartesian coordinates (must provide ranges)

region_cart = BlockRegion(0, 20, 0, 20, 0, 20, coord_system="cartesian") structure = builder.build_block( ... region_cart, ... i_range=range(0, 10), ... j_range=range(0, 10), ... k_range=range(0, 10) ... )

Notes
  • This method uses no Python loops, fully based on NumPy vectorized operations
  • Generated structure contains:
  • Atom positions (Cartesian coordinates)
  • Atom species
  • Box information (lattice vectors)
  • For empty basis (no basis sites), returns empty Atomistic structure
Source code in src/molpy/builder/crystal.py
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
def build_block(
    self,
    region: BlockRegion,
    *,
    i_range: range | None = None,
    j_range: range | None = None,
    k_range: range | None = None,
) -> Atomistic:
    """
    Build crystal structure within a box region

    This method efficiently generates crystal structures using vectorized operations:
    1. Determine cell index ranges to tile
    2. Use NumPy meshgrid and broadcasting to generate all atom positions
    3. Apply region filtering
    4. Create and return Atomistic structure

    Parameters
    ----------
    region : BlockRegion
        Box defining the region for atom generation
    i_range, j_range, k_range : range | None, optional
        Explicitly specify cell index ranges. If not provided:
        - For "lattice" coordinate system: inferred from region boundaries
        - For "cartesian" coordinate system: must be provided, otherwise raises error

    Returns
    -------
    Atomistic
        Generated crystal structure containing atoms and box information

    Raises
    ------
    ValueError
        If coord_system == "cartesian" and explicit ranges are not provided

    Examples
    --------
    >>> # Using lattice coordinates (auto-infer ranges)
    >>> lat = Lattice.cubic_sc(a=2.0, species="Cu")
    >>> region = BlockRegion(0, 10, 0, 10, 0, 10, coord_system="lattice")
    >>> builder = CrystalBuilder(lat)
    >>> structure = builder.build_block(region)
    >>>
    >>> # Using explicit ranges
    >>> structure = builder.build_block(
    ...     region,
    ...     i_range=range(0, 5),
    ...     j_range=range(0, 5),
    ...     k_range=range(0, 5)
    ... )
    >>>
    >>> # Cartesian coordinates (must provide ranges)
    >>> region_cart = BlockRegion(0, 20, 0, 20, 0, 20, coord_system="cartesian")
    >>> structure = builder.build_block(
    ...     region_cart,
    ...     i_range=range(0, 10),
    ...     j_range=range(0, 10),
    ...     k_range=range(0, 10)
    ... )

    Notes
    -----
    - This method uses no Python loops, fully based on NumPy vectorized operations
    - Generated structure contains:
      - Atom positions (Cartesian coordinates)
      - Atom species
      - Box information (lattice vectors)
    - For empty basis (no basis sites), returns empty Atomistic structure
    """
    basis_sites = self.lattice.basis

    # If no basis sites, return empty structure
    if len(basis_sites) == 0:
        structure = Atomistic()
        structure["box"] = Box(matrix=self.lattice.cell)
        return structure

    # Step 1: Determine cell index ranges
    if i_range is not None and j_range is not None and k_range is not None:
        # Use explicitly provided ranges
        i_arr = np.array(list(i_range))
        j_arr = np.array(list(j_range))
        k_arr = np.array(list(k_range))
    elif region.coord_system == "lattice":
        # Infer ranges from lattice coordinate region
        i_min = int(np.floor(region.xmin))
        i_max = int(np.ceil(region.xmax))
        j_min = int(np.floor(region.ymin))
        j_max = int(np.ceil(region.ymax))
        k_min = int(np.floor(region.zmin))
        k_max = int(np.ceil(region.zmax))

        i_arr = np.arange(i_min, i_max)
        j_arr = np.arange(j_min, j_max)
        k_arr = np.arange(k_min, k_max)
    else:
        # Cartesian coordinate system requires explicit ranges
        raise ValueError(
            "For cartesian coordinate system, you must provide explicit "
            "i_range, j_range, and k_range parameters."
        )

    # Step 2: Build cell index grid using NumPy
    I, J, K = np.meshgrid(i_arr, j_arr, k_arr, indexing="ij")
    cells = np.stack([I, J, K], axis=-1).reshape(-1, 3)  # (Nc, 3)

    # Step 3: Combine cells and basis via broadcasting (no Python loops)
    basis_fracs = np.array(
        [list(s.frac) for s in basis_sites], dtype=float
    )  # (Nb, 3)

    # Broadcasting: cells[:, None, :] is (Nc, 1, 3), basis_fracs[None, :, :] is (1, Nb, 3)
    # Result is (Nc, Nb, 3)
    frac_lattice = cells[:, None, :] + basis_fracs[None, :, :]  # (Nc, Nb, 3)
    frac_lattice = frac_lattice.reshape(-1, 3)  # (N, 3) where N = Nc * Nb

    # Step 4: Region filtering + coordinate system conversion
    if region.coord_system == "lattice":
        # Filter in lattice coordinates
        mask = region.contains_mask(frac_lattice)
        frac_selected = frac_lattice[mask]

        # Convert to Cartesian coordinates
        cart_selected = self.lattice.frac_to_cart(frac_selected)
    else:
        # coord_system == "cartesian"
        # First convert all coordinates to Cartesian
        cart_all = self.lattice.frac_to_cart(frac_lattice)
        mask = region.contains_mask(cart_all)
        cart_selected = cart_all[mask]
        frac_selected = frac_lattice[mask]

    # Step 5: Get species and site info via vectorization
    Nc = cells.shape[0]
    species_array = np.array(
        [s.species for s in basis_sites], dtype=object
    )  # (Nb,)
    site_array = np.array(basis_sites, dtype=object)  # (Nb,)

    # Tile to match each cell
    species_tiled = np.tile(species_array, Nc)  # (N,)
    site_tiled = np.tile(site_array, Nc)  # (N,)

    # Apply filtering
    species_selected = species_tiled[mask]  # (M,)
    site_selected = site_tiled[mask]  # (M,)

    # Step 6: Build project's Atomistic structure type
    structure = Atomistic()

    # Add atoms
    for i in range(len(cart_selected)):
        xyz = cart_selected[i]
        species = species_selected[i]
        site = site_selected[i]

        structure.def_atom(
            xyz=xyz.tolist(),
            symbol=species,
            charge=site.charge,
            label=site.label,
        )

    # Set box (using lattice matrix)
    structure["box"] = Box(matrix=self.lattice.cell)

    return structure

Lattice

Lattice(a1, a2, a3, basis)

Bravais lattice with basis sites.

This class defines a crystal lattice structure, including lattice vectors and basis sites. Lattice vectors define the shape and size of the unit cell, while basis sites define the positions of atoms within the cell (in fractional coordinates).

Parameters

a1, a2, a3 : np.ndarray Lattice vectors, each is a NumPy array of shape (3,) basis : list[Site] List of basis sites in fractional coordinates

Attributes

a1, a2, a3 : np.ndarray Lattice vectors basis : list[Site] List of basis sites

Examples

Create simple cubic lattice

lat = Lattice.cubic_sc(a=2.0, species="Cu")

Create face-centered cubic lattice

lat = Lattice.cubic_fcc(a=3.52, species="Ni")

Create rocksalt structure

lat = Lattice.rocksalt(a=5.64, species_a="Na", species_b="Cl")

Initialize lattice

Parameters

a1, a2, a3 : np.ndarray Lattice vectors of shape (3,) basis : list[Site] List of basis sites

Source code in src/molpy/builder/crystal.py
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
def __init__(
    self, a1: np.ndarray, a2: np.ndarray, a3: np.ndarray, basis: list[Site]
) -> None:
    """
    Initialize lattice

    Parameters
    ----------
    a1, a2, a3 : np.ndarray
        Lattice vectors of shape (3,)
    basis : list[Site]
        List of basis sites
    """
    self.a1 = np.asarray(a1, dtype=float)
    self.a2 = np.asarray(a2, dtype=float)
    self.a3 = np.asarray(a3, dtype=float)
    self.basis = list(basis)

    # Validate shape
    assert self.a1.shape == (3,), "a1 must be a 1D array of length 3"
    assert self.a2.shape == (3,), "a2 must be a 1D array of length 3"
    assert self.a3.shape == (3,), "a3 must be a 1D array of length 3"

cell property

cell

Return 3×3 cell matrix with lattice vectors as rows

Returns

np.ndarray Matrix of shape (3, 3), each row is a lattice vector [a1; a2; a3]

add_site

add_site(site)

Add a basis site

Parameters

site : Site Basis site to add

Source code in src/molpy/builder/crystal.py
133
134
135
136
137
138
139
140
141
142
def add_site(self, site: Site) -> None:
    """
    Add a basis site

    Parameters
    ----------
    site : Site
        Basis site to add
    """
    self.basis.append(site)

cubic_bcc classmethod

cubic_bcc(a, species)

Create body-centered cubic (Body-Centered Cubic, BCC) lattice

Body-centered cubic lattice has two atoms per unit cell: one at corner and one at body center.

Parameters

a : float Lattice constant (in Å) species : str Atomic species (e.g., "Fe", "W")

Returns

Lattice Body-centered cubic lattice

Examples

lat = Lattice.cubic_bcc(a=3.0, species="Fe") print(len(lat.basis)) # 2

Source code in src/molpy/builder/crystal.py
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
@classmethod
def cubic_bcc(cls, a: float, species: str) -> Lattice:
    """
    Create body-centered cubic (Body-Centered Cubic, BCC) lattice

    Body-centered cubic lattice has two atoms per unit cell: one at corner and one at body center.

    Parameters
    ----------
    a : float
        Lattice constant (in Å)
    species : str
        Atomic species (e.g., "Fe", "W")

    Returns
    -------
    Lattice
        Body-centered cubic lattice

    Examples
    --------
    >>> lat = Lattice.cubic_bcc(a=3.0, species="Fe")
    >>> print(len(lat.basis))  # 2
    """
    a1 = np.array([a, 0.0, 0.0])
    a2 = np.array([0.0, a, 0.0])
    a3 = np.array([0.0, 0.0, a])

    basis = [
        Site(label="A", species=species, frac=(0.0, 0.0, 0.0)),
        Site(label="B", species=species, frac=(0.5, 0.5, 0.5)),
    ]

    return cls(a1=a1, a2=a2, a3=a3, basis=basis)

cubic_fcc classmethod

cubic_fcc(a, species)

Create face-centered cubic (Face-Centered Cubic, FCC) lattice

Face-centered cubic lattice has four atoms per unit cell: one at corner and one at each face center.

Parameters

a : float Lattice constant (in Å) species : str Atomic species (e.g., "Ni", "Cu", "Al")

Returns

Lattice Face-centered cubic lattice

Examples

lat = Lattice.cubic_fcc(a=3.52, species="Ni") print(len(lat.basis)) # 4

Source code in src/molpy/builder/crystal.py
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
@classmethod
def cubic_fcc(cls, a: float, species: str) -> Lattice:
    """
    Create face-centered cubic (Face-Centered Cubic, FCC) lattice

    Face-centered cubic lattice has four atoms per unit cell: one at corner and one at each face center.

    Parameters
    ----------
    a : float
        Lattice constant (in Å)
    species : str
        Atomic species (e.g., "Ni", "Cu", "Al")

    Returns
    -------
    Lattice
        Face-centered cubic lattice

    Examples
    --------
    >>> lat = Lattice.cubic_fcc(a=3.52, species="Ni")
    >>> print(len(lat.basis))  # 4
    """
    a1 = np.array([a, 0.0, 0.0])
    a2 = np.array([0.0, a, 0.0])
    a3 = np.array([0.0, 0.0, a])

    basis = [
        Site(label="A", species=species, frac=(0.0, 0.0, 0.0)),
        Site(label="B", species=species, frac=(0.5, 0.5, 0.0)),
        Site(label="C", species=species, frac=(0.5, 0.0, 0.5)),
        Site(label="D", species=species, frac=(0.0, 0.5, 0.5)),
    ]

    return cls(a1=a1, a2=a2, a3=a3, basis=basis)

cubic_sc classmethod

cubic_sc(a, species)

Create simple cubic (Simple Cubic, SC) lattice

Simple cubic lattice is the simplest lattice type with one atom per unit cell.

Parameters

a : float Lattice constant (in Å) species : str Atomic species (e.g., "Cu", "Fe")

Returns

Lattice Simple cubic lattice

Examples

lat = Lattice.cubic_sc(a=2.0, species="Cu") print(len(lat.basis)) # 1

Source code in src/molpy/builder/crystal.py
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
@classmethod
def cubic_sc(cls, a: float, species: str) -> Lattice:
    """
    Create simple cubic (Simple Cubic, SC) lattice

    Simple cubic lattice is the simplest lattice type with one atom per unit cell.

    Parameters
    ----------
    a : float
        Lattice constant (in Å)
    species : str
        Atomic species (e.g., "Cu", "Fe")

    Returns
    -------
    Lattice
        Simple cubic lattice

    Examples
    --------
    >>> lat = Lattice.cubic_sc(a=2.0, species="Cu")
    >>> print(len(lat.basis))  # 1
    """
    a1 = np.array([a, 0.0, 0.0])
    a2 = np.array([0.0, a, 0.0])
    a3 = np.array([0.0, 0.0, a])

    basis = [Site(label="A", species=species, frac=(0.0, 0.0, 0.0))]

    return cls(a1=a1, a2=a2, a3=a3, basis=basis)

frac_to_cart

frac_to_cart(frac)

Convert fractional coordinates to Cartesian coordinates

Fractional coordinates (u, v, w) represent position relative to lattice vectors: cart = ua1 + va2 + w*a3 = frac @ cell

Parameters

frac : np.ndarray Fractional coordinates of shape (N, 3) or (3,), containing (u, v, w) values

Returns

np.ndarray Cartesian coordinates, same shape as input

Examples

lat = Lattice.cubic_sc(a=2.0, species="Cu") frac = np.array([0.5, 0.5, 0.5]) cart = lat.frac_to_cart(frac) print(cart) # [1.0, 1.0, 1.0]

Source code in src/molpy/builder/crystal.py
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
def frac_to_cart(self, frac: np.ndarray) -> np.ndarray:
    """
    Convert fractional coordinates to Cartesian coordinates

    Fractional coordinates (u, v, w) represent position relative to lattice vectors:
    cart = u*a1 + v*a2 + w*a3 = frac @ cell

    Parameters
    ----------
    frac : np.ndarray
        Fractional coordinates of shape (N, 3) or (3,), containing (u, v, w) values

    Returns
    -------
    np.ndarray
        Cartesian coordinates, same shape as input

    Examples
    --------
    >>> lat = Lattice.cubic_sc(a=2.0, species="Cu")
    >>> frac = np.array([0.5, 0.5, 0.5])
    >>> cart = lat.frac_to_cart(frac)
    >>> print(cart)  # [1.0, 1.0, 1.0]
    """
    frac = np.asarray(frac, dtype=float)
    # Use single matrix multiplication: cart = frac @ cell
    return frac @ self.cell

rocksalt classmethod

rocksalt(a, species_a, species_b)

Create rocksalt (NaCl) structure

Rocksalt structure consists of two interpenetrating FCC sublattices. Each unit cell contains 4 A atoms and 4 B atoms.

Parameters

a : float Lattice constant (in Å) species_a : str First atomic species (e.g., "Na") species_b : str Second atomic species (e.g., "Cl")

Returns

Lattice Rocksalt structure lattice

Examples

lat = Lattice.rocksalt(a=5.64, species_a="Na", species_b="Cl") print(len(lat.basis)) # 8

Source code in src/molpy/builder/crystal.py
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
@classmethod
def rocksalt(cls, a: float, species_a: str, species_b: str) -> Lattice:
    """
    Create rocksalt (NaCl) structure

    Rocksalt structure consists of two interpenetrating FCC sublattices.
    Each unit cell contains 4 A atoms and 4 B atoms.

    Parameters
    ----------
    a : float
        Lattice constant (in Å)
    species_a : str
        First atomic species (e.g., "Na")
    species_b : str
        Second atomic species (e.g., "Cl")

    Returns
    -------
    Lattice
        Rocksalt structure lattice

    Examples
    --------
    >>> lat = Lattice.rocksalt(a=5.64, species_a="Na", species_b="Cl")
    >>> print(len(lat.basis))  # 8
    """
    a1 = np.array([a, 0.0, 0.0])
    a2 = np.array([0.0, a, 0.0])
    a3 = np.array([0.0, 0.0, a])

    # FCC sites for species_a
    basis_a = [
        Site(label="A1", species=species_a, frac=(0.0, 0.0, 0.0)),
        Site(label="A2", species=species_a, frac=(0.5, 0.5, 0.0)),
        Site(label="A3", species=species_a, frac=(0.5, 0.0, 0.5)),
        Site(label="A4", species=species_a, frac=(0.0, 0.5, 0.5)),
    ]

    # Offset FCC sites for species_b (shifted by 0.5 along x direction)
    basis_b = [
        Site(label="B1", species=species_b, frac=(0.5, 0.0, 0.0)),
        Site(label="B2", species=species_b, frac=(0.0, 0.5, 0.0)),
        Site(label="B3", species=species_b, frac=(0.0, 0.0, 0.5)),
        Site(label="B4", species=species_b, frac=(0.5, 0.5, 0.5)),
    ]

    basis = basis_a + basis_b

    return cls(a1=a1, a2=a2, a3=a3, basis=basis)

Region

Region(coord_system='lattice')

Bases: ABC

Abstract geometric region class

Define a spatial region that can be represented in lattice or Cartesian coordinates.

Parameters

coord_system : CoordSystem Coordinate system, "lattice" or "cartesian" - "lattice": point coordinates in lattice units - "cartesian": point coordinates in Cartesian coordinates (Å)

Notes

Subclasses must implement contains_mask method using NumPy vectorized operations to efficiently check if multiple points are in the region.

Initialize region

Parameters

coord_system : CoordSystem, optional Coordinate system, default is "lattice"

Source code in src/molpy/builder/crystal.py
347
348
349
350
351
352
353
354
355
356
def __init__(self, coord_system: CoordSystem = "lattice") -> None:
    """
    Initialize region

    Parameters
    ----------
    coord_system : CoordSystem, optional
        Coordinate system, default is "lattice"
    """
    self.coord_system = coord_system

contains_mask abstractmethod

contains_mask(points)

Check if points are in the region (vectorized)

Parameters

points : np.ndarray Point coordinates array of shape (N, 3)

Returns

np.ndarray Boolean array of shape (N,), True indicates point is in region

Notes
  • If coord_system == "lattice": points are in lattice units
  • If coord_system == "cartesian": points are in Cartesian coordinates
  • Must use vectorized operations, no Python loops
Source code in src/molpy/builder/crystal.py
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
@abstractmethod
def contains_mask(self, points: np.ndarray) -> np.ndarray:
    """
    Check if points are in the region (vectorized)

    Parameters
    ----------
    points : np.ndarray
        Point coordinates array of shape (N, 3)

    Returns
    -------
    np.ndarray
        Boolean array of shape (N,), True indicates point is in region

    Notes
    -----
    - If coord_system == "lattice": points are in lattice units
    - If coord_system == "cartesian": points are in Cartesian coordinates
    - Must use vectorized operations, no Python loops
    """
    pass

Site dataclass

Site(label, species, frac, charge=0.0, meta=None)

Lattice basis site in fractional coordinates.

Attributes

label : str Site identifier or name (e.g., "A", "B1") species : str Chemical species or type name (e.g., "Ni", "Na", "Cl") frac : tuple[float, float, float] Fractional coordinates (u, v, w) relative to the Bravais cell, typically in [0, 1) charge : float, optional Charge, default is 0.0 meta : dict[str, Any] | None, optional Optional metadata dictionary

Examples

site = Site(label="A", species="Cu", frac=(0.0, 0.0, 0.0)) site_charged = Site(label="Na", species="Na", frac=(0.0, 0.0, 0.0), charge=1.0)

Polymer

Polymer assembly module.

Provides linear polymer assembly with both topology-only and chemical reaction connectors, plus optional geometric placement via Placer strategies.

AutoConnector

Bases: Connector

BigSMILES-guided automatic port selection.

Strategy: 1. If left has port with role='right' and right has role='left' -> use those 2. Else if each side has exactly one unconsumed port -> use that pair 3. Else raise AmbiguousPortsError

This implements the common case where: - BigSMILES uses [<] for "left" role and [>] for "right" role - We connect left's "right" port to right's "left" port

Ports are stored directly on atoms using the "port" or "ports" attribute.

select_ports

select_ports(left, right, left_ports, right_ports, ctx)

Select ports using BigSMILES role heuristics.

Source code in src/molpy/builder/polymer/connectors.py
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
def select_ports(
    self,
    left: Atomistic,
    right: Atomistic,
    left_ports: Mapping[str, list[PortInfo]],
    right_ports: Mapping[str, list[PortInfo]],
    ctx: ConnectorContext,
) -> tuple[str, int, str, int, BondKind | None]:
    """Select ports using BigSMILES role heuristics."""

    # Strategy 1: Try role-based selection (BigSMILES < and >)
    # Case 1a: left has role='right', right has role='left' (normal chain extension)
    # Flatten lists and find ports with matching roles
    left_right_role = [
        (name, idx)
        for name, port_list in left_ports.items()
        for idx, p in enumerate(port_list)
        if p.role == "right"
    ]
    right_left_role = [
        (name, idx)
        for name, port_list in right_ports.items()
        for idx, p in enumerate(port_list)
        if p.role == "left"
    ]

    if len(left_right_role) == 1 and len(right_left_role) == 1:
        return (
            left_right_role[0][0],
            left_right_role[0][1],
            right_left_role[0][0],
            right_left_role[0][1],
            None,
        )

    # Case 1b: left has role='left', right has role='right' (terminus connection)
    # E.g., HO[*:t] (left terminus) connects to CC[>] (right-directed monomer)
    left_left_role = [
        (name, idx)
        for name, port_list in left_ports.items()
        for idx, p in enumerate(port_list)
        if p.role == "left"
    ]
    right_right_role = [
        (name, idx)
        for name, port_list in right_ports.items()
        for idx, p in enumerate(port_list)
        if p.role == "right"
    ]

    if len(left_left_role) == 1 and len(right_right_role) == 1:
        return (
            left_left_role[0][0],
            left_left_role[0][1],
            right_right_role[0][0],
            right_right_role[0][1],
            None,
        )

    # Strategy 2: If both sides have the same port name(s) (e.g., all $), randomly select one
    # Since all ports are equivalent, just pick the first available one
    if len(left_ports) == 1 and len(right_ports) == 1:
        left_name = next(iter(left_ports.keys()))
        right_name = next(iter(right_ports.keys()))
        # If same port name or both are $, they're compatible - use first available
        if left_name == right_name or (left_name == "$" and right_name == "$"):
            # Use the first port in each list (all are equivalent)
            return (left_name, 0, right_name, 0, None)

    # Strategy 3: Try to match port names - if same name exists on both sides, use it
    common_port_names = set(left_ports.keys()) & set(right_ports.keys())
    if common_port_names:
        # Use first common port name
        port_name = next(iter(common_port_names))
        return (port_name, 0, port_name, 0, None)

    # Strategy 4: If both sides have $ ports, use them (all ports are equivalent)
    if "$" in left_ports and "$" in right_ports:
        return ("$", 0, "$", 0, None)

    # Strategy 5: Ambiguous - cannot decide
    # Count total ports (flattened)
    left_total = sum(len(port_list) for port_list in left_ports.values())
    right_total = sum(len(port_list) for port_list in right_ports.values())

    raise AmbiguousPortsError(
        f"Cannot auto-select ports between {ctx.get('left_label')} and {ctx.get('right_label')}: "
        f"left has {left_total} available ports across {len(left_ports)} port names {list(left_ports.keys())}, "
        f"right has {right_total} available ports across {len(right_ports)} port names {list(right_ports.keys())}. "
        "Use TableConnector or CallbackConnector to specify explicit rules."
    )

CallbackConnector

CallbackConnector(fn)

Bases: Connector

User-defined callback for port selection.

Example

def my_selector(left, right, left_ports, right_ports, ctx): # Custom logic here return ("port_out", "port_in", "-")

connector = CallbackConnector(my_selector)

Initialize callback connector.

Parameters:

Name Type Description Default
fn Callable[[Atomistic, Atomistic, Mapping[str, PortInfo], Mapping[str, PortInfo], ConnectorContext], tuple[str, str] | tuple[str, str, BondKind]]

Callable that takes (left, right, left_ports, right_ports, ctx) and returns (left_port, right_port [, bond_kind])

required
Source code in src/molpy/builder/polymer/connectors.py
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
def __init__(
    self,
    fn: Callable[
        [
            Atomistic,
            Atomistic,
            Mapping[str, PortInfo],
            Mapping[str, PortInfo],
            ConnectorContext,
        ],
        tuple[str, str] | tuple[str, str, BondKind],
    ],
):
    """
    Initialize callback connector.

    Args:
        fn: Callable that takes (left, right, left_ports, right_ports, ctx)
            and returns (left_port, right_port [, bond_kind])
    """
    self.fn = fn

select_ports

select_ports(left, right, left_ports, right_ports, ctx)

Select ports using user callback.

Source code in src/molpy/builder/polymer/connectors.py
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
def select_ports(
    self,
    left: Atomistic,
    right: Atomistic,
    left_ports: Mapping[str, list[PortInfo]],
    right_ports: Mapping[str, list[PortInfo]],
    ctx: ConnectorContext,
) -> tuple[str, int, str, int, BondKind | None]:
    """Select ports using user callback."""

    result = self.fn(left, right, left_ports, right_ports, ctx)

    # Callback can return either (name, idx, name, idx) or (name, idx, name, idx, bond_kind)
    if len(result) == 4:
        return (result[0], result[1], result[2], result[3], None)
    else:
        return (result[0], result[1], result[2], result[3], result[4])

Chain dataclass

Chain(dp, monomers, mass)

Represents a single polymer chain.

Attributes:

Name Type Description
dp int

Degree of polymerization (number of monomers)

monomers list[str]

List of monomer identifiers in the chain

mass float

Total mass of the chain (g/mol)

ChainConnector

ChainConnector(connectors)

Bases: Connector

Try a list of connectors in order; first one that succeeds wins.

Example

connector = ChainConnector([ TableConnector(specific_rules), AutoConnector(), ])

Initialize chain connector.

Parameters:

Name Type Description Default
connectors Iterable[Connector]

List of connectors to try in order

required
Source code in src/molpy/builder/polymer/connectors.py
311
312
313
314
315
316
317
318
def __init__(self, connectors: Iterable[Connector]):
    """
    Initialize chain connector.

    Args:
        connectors: List of connectors to try in order
    """
    self.connectors = list(connectors)

select_ports

select_ports(left, right, left_ports, right_ports, ctx)

Try connectors in order until one succeeds.

Source code in src/molpy/builder/polymer/connectors.py
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
def select_ports(
    self,
    left: Atomistic,
    right: Atomistic,
    left_ports: Mapping[str, list[PortInfo]],
    right_ports: Mapping[str, list[PortInfo]],
    ctx: ConnectorContext,
) -> tuple[str, int, str, int, BondKind | None]:
    """Try connectors in order until one succeeds."""

    errors = []
    for connector in self.connectors:
        try:
            return connector.select_ports(left, right, left_ports, right_ports, ctx)
        except (
            AmbiguousPortsError,
            MissingConnectorRule,
            NoCompatiblePortsError,
        ) as e:
            errors.append(f"{type(connector).__name__}: {e}")
            continue

    # All connectors failed
    raise AmbiguousPortsError(
        f"All connectors failed for ({ctx.get('left_label')}, {ctx.get('right_label')}): "
        f"{'; '.join(errors)}"
    )

ConnectionMetadata dataclass

ConnectionMetadata(port_L, port_R, reaction_name, formed_bonds=list(), new_angles=list(), new_dihedrals=list(), modified_atoms=set(), requires_retype=False, entity_maps=list())

Metadata about a single monomer connection step.

Attributes:

Name Type Description
port_L str

Name of the port used on the left monomer

port_R str

Name of the port used on the right monomer

reaction_name str

Name of the reaction used

formed_bonds list[Any]

List of newly formed bonds

new_angles list[Any]

List of newly created angles

new_dihedrals list[Any]

List of newly created dihedrals

modified_atoms set[Atom]

Set of atoms whose types may have changed

requires_retype bool

Whether retypification is needed

entity_maps list[dict[Atom, Atom]]

List of entity mappings for port remapping

ConnectionResult dataclass

ConnectionResult(product, metadata)

Result of connecting two monomers.

Attributes:

Name Type Description
product Atomistic

The resulting Atomistic assembly after connection

metadata ConnectionMetadata

Metadata about the connection

Connector

Abstract base for port selection between two adjacent Atomistic structures.

This is topology-only: connectors decide WHICH ports to connect, not HOW to position them geometrically.

select_ports

select_ports(left, right, left_ports, right_ports, ctx)

Select which ports to connect between left and right structures.

Parameters:

Name Type Description Default
left Atomistic

Left Atomistic structure in the sequence

required
right Atomistic

Right Atomistic structure in the sequence

required
left_ports Mapping[str, list[PortInfo]]

Available (unconsumed) ports on left structure (port name -> list of PortInfo)

required
right_ports Mapping[str, list[PortInfo]]

Available (unconsumed) ports on right structure (port name -> list of PortInfo)

required
ctx ConnectorContext

Shared context with step info, sequence, etc.

required

Returns:

Type Description
str

Tuple of (left_port_name, left_port_index, right_port_name, right_port_index, optional_bond_kind_override)

int

The indices specify which port to use when multiple ports have the same name.

Raises:

Type Description
AmbiguousPortsError

Cannot uniquely determine ports

NoCompatiblePortsError

No valid port pair found

MissingConnectorRule

Required rule not found (TableConnector)

Source code in src/molpy/builder/polymer/connectors.py
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
def select_ports(
    self,
    left: Atomistic,
    right: Atomistic,
    left_ports: Mapping[str, list[PortInfo]],  # unconsumed ports
    right_ports: Mapping[str, list[PortInfo]],  # unconsumed ports
    ctx: ConnectorContext,
) -> tuple[str, int, str, int, BondKind | None]:
    """
    Select which ports to connect between left and right structures.

    Args:
        left: Left Atomistic structure in the sequence
        right: Right Atomistic structure in the sequence
        left_ports: Available (unconsumed) ports on left structure (port name -> list of PortInfo)
        right_ports: Available (unconsumed) ports on right structure (port name -> list of PortInfo)
        ctx: Shared context with step info, sequence, etc.

    Returns:
        Tuple of (left_port_name, left_port_index, right_port_name, right_port_index, optional_bond_kind_override)
        The indices specify which port to use when multiple ports have the same name.

    Raises:
        AmbiguousPortsError: Cannot uniquely determine ports
        NoCompatiblePortsError: No valid port pair found
        MissingConnectorRule: Required rule not found (TableConnector)
    """
    raise NotImplementedError("Subclasses must implement select_ports()")

ConnectorContext

Bases: dict[str, Any]

Shared context passed to connectors during linear build.

Contains information like: - step: int (current connection step index) - sequence: str (full sequence being built) - left_label: str (label of left monomer) - right_label: str (label of right monomer) - audit: list (accumulated connection records)

FlorySchulzPolydisperse

FlorySchulzPolydisperse(a, random_seed=None)

Bases: DPDistribution

Flory-Schulz distribution for degree of polymerization (DP).

Implements :class:DPDistribution - sampling is done directly in DP space. In this formulation the Flory-Schulz distribution is a geometric distribution over the chain length :math:N, commonly used for step-growth polymerization.

The probability mass function (PMF) is

.. math::

P(N = k) = (1 - p)^{k-1} p, \qquad k = 1, 2, \dots,

where :math:p \in (0, 1) is related to the extent of reaction.

Parameters

p: Success probability :math:p in the geometric PMF above (:math:0 < p < 1). random_seed: Optional random seed used when sampling.

Initialize Flory-Schulz DP distribution.

Parameters:

Name Type Description Default
p

Success probability (0 < p < 1), related to extent of reaction

required
random_seed int | None

Random seed for reproducibility (optional)

None
Source code in src/molpy/builder/polymer/system.py
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
def __init__(
    self,
    a: float,  # Success probability (0 < a < 1)
    random_seed: int | None = None,
):
    """
    Initialize Flory-Schulz DP distribution.

    Args:
        p: Success probability (0 < p < 1), related to extent of reaction
        random_seed: Random seed for reproducibility (optional)
    """
    if not (0 < a < 1):
        raise ValueError(f"a must be in (0, 1), got {a}")

    self.a = a
    self.random_seed = random_seed

sample_dp

sample_dp(rng)

Sample degree of polymerization from Flory-Schulz distribution.

Parameters:

Name Type Description Default
rng Generator

NumPy random number generator

required

Returns:

Type Description
int

Degree of polymerization (>= 1)

Source code in src/molpy/builder/polymer/system.py
674
675
676
677
678
679
680
681
682
683
684
def sample_dp(self, rng: np.random.Generator) -> int:
    """Sample degree of polymerization from Flory-Schulz distribution.

    Args:
        rng: NumPy random number generator

    Returns:
        Degree of polymerization (>= 1)
    """
    a = self.a
    return max(1, int(rng.geometric(p=a) + rng.geometric(p=a) - 1))

GrowthKernel

Bases: Protocol

Protocol for local transition function in port-level stochastic growth.

A GrowthKernel decides which monomer (if any) to add next for a given reactive port on the growing polymer. This encapsulates the reaction probability logic from G-BigSMILES notation.

choose_next_for_port

choose_next_for_port(polymer, port, candidates, rng=None)

Choose next monomer for a given port.

Parameters:

Name Type Description Default
polymer Atomistic

Current polymer structure

required
port PortInfo

Port to extend from

required
candidates Sequence[MonomerTemplate]

Available monomer templates

required
rng Generator | None

Random number generator for sampling

None

Returns:

Name Type Description
MonomerPlacement MonomerPlacement | None

Add this template at target port

None MonomerPlacement | None

Terminate this port (implicit end-group)

Source code in src/molpy/builder/polymer/growth_kernel.py
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
def choose_next_for_port(
    self,
    polymer: Atomistic,
    port: PortInfo,
    candidates: Sequence[MonomerTemplate],
    rng: np.random.Generator | None = None,
) -> MonomerPlacement | None:
    """Choose next monomer for a given port.

    Args:
        polymer: Current polymer structure
        port: Port to extend from
        candidates: Available monomer templates
        rng: Random number generator for sampling

    Returns:
        MonomerPlacement: Add this template at target port
        None: Terminate this port (implicit end-group)
    """
    ...

MonomerPlacement dataclass

MonomerPlacement(template, target_descriptor_id)

Decision for next monomer placement during stochastic growth.

Attributes:

Name Type Description
template MonomerTemplate

MonomerTemplate to add

target_descriptor_id int

Which port descriptor on the new monomer to connect

MonomerTemplate dataclass

MonomerTemplate(label, structure, port_descriptors, mass, metadata=dict())

Template for a monomer with port descriptors and metadata.

This represents a monomer type that can be instantiated multiple times during stochastic growth. Each instantiation creates a fresh copy of the structure.

Attributes:

Name Type Description
label str

Monomer label (e.g., "EO2", "PS")

structure Atomistic

Base Atomistic structure (will be copied on instantiation)

port_descriptors dict[int, PortDescriptor]

Mapping from descriptor_id to PortDescriptor

mass float

Molecular weight (g/mol)

metadata dict[str, Any]

Additional metadata (optional)

get_all_descriptors

get_all_descriptors()

Get all port descriptors for this template.

Returns:

Type Description
list[PortDescriptor]

List of all PortDescriptor objects

Source code in src/molpy/builder/polymer/types.py
134
135
136
137
138
139
140
def get_all_descriptors(self) -> list[PortDescriptor]:
    """Get all port descriptors for this template.

    Returns:
        List of all PortDescriptor objects
    """
    return list(self.port_descriptors.values())

get_port_by_descriptor

get_port_by_descriptor(descriptor_id)

Get port descriptor for a specific descriptor ID.

Parameters:

Name Type Description Default
descriptor_id int

Descriptor ID to look up

required

Returns:

Type Description
PortDescriptor | None

PortDescriptor if found, None otherwise

Source code in src/molpy/builder/polymer/types.py
123
124
125
126
127
128
129
130
131
132
def get_port_by_descriptor(self, descriptor_id: int) -> PortDescriptor | None:
    """Get port descriptor for a specific descriptor ID.

    Args:
        descriptor_id: Descriptor ID to look up

    Returns:
        PortDescriptor if found, None otherwise
    """
    return self.port_descriptors.get(descriptor_id)

instantiate

instantiate()

Create a fresh copy of the structure.

Returns:

Type Description
Atomistic

New Atomistic instance with independent atoms and bonds

Source code in src/molpy/builder/polymer/types.py
115
116
117
118
119
120
121
def instantiate(self) -> Atomistic:
    """Create a fresh copy of the structure.

    Returns:
        New Atomistic instance with independent atoms and bonds
    """
    return self.structure.copy()

PoissonPolydisperse

PoissonPolydisperse(lambda_param, random_seed=None)

Bases: DPDistribution

Poisson distribution for the degree of polymerization (DP).

Implements :class:DPDistribution - sampling is done directly in DP space. The number of repeat units is modeled as a Poisson process with mean :math:\lambda.

The (untruncated) Poisson probability mass function is

.. math::

P(N = k)
= \frac{\lambda^{k} e^{-\lambda}}{k!},
\qquad k = 0, 1, 2, \dots

In this implementation we restrict to :math:k \ge 1 when sampling chains, i.e. a sampled value :math:k = 0 is mapped to :math:k = 1 so that every chain contains at least one monomer.

Parameters

lambda_param: Mean :math:\lambda of the Poisson distribution. random_seed: Optional random seed used when sampling.

Initialize Poisson DP distribution.

Parameters:

Name Type Description Default
lambda_param float

Mean (lambda) parameter of Poisson distribution

required
random_seed int | None

Random seed for reproducibility (optional)

None
Source code in src/molpy/builder/polymer/system.py
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
def __init__(
    self,
    lambda_param: float,  # Mean of Poisson distribution
    random_seed: int | None = None,
):
    """
    Initialize Poisson DP distribution.

    Args:
        lambda_param: Mean (lambda) parameter of Poisson distribution
        random_seed: Random seed for reproducibility (optional)
    """
    if lambda_param <= 0:
        raise ValueError(f"lambda_param must be > 0, got {lambda_param}")

    self.lambda_param = lambda_param
    self.random_seed = random_seed

dp_pmf

dp_pmf(dp_array)

Compute the probability mass function (PMF) over DP values.

Poisson PMF: P(k; λ) = (λ^k * e^(-λ)) / k! for k >= 1

Parameters:

Name Type Description Default
dp_array ndarray

Array of DP values

required

Returns:

Type Description
ndarray

Array of PMF values

Source code in src/molpy/builder/polymer/system.py
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
def dp_pmf(self, dp_array: np.ndarray) -> np.ndarray:
    """
    Compute the probability mass function (PMF) over DP values.

    Poisson PMF: P(k; λ) = (λ^k * e^(-λ)) / k! for k >= 1

    Args:
        dp_array: Array of DP values

    Returns:
        Array of PMF values
    """
    k = np.asarray(dp_array, dtype=int)
    pmf = np.zeros_like(k, dtype=float)

    lam = float(self.lambda_param)
    if lam <= 0:
        return pmf

    mask = k >= 1
    ks = k[mask]

    # log P(k) = k log lam - lam - log(k!)
    log_p = ks * np.log(lam) - lam - np.array([math.lgamma(int(x) + 1) for x in ks])
    p = np.exp(log_p)

    # zero-truncated normalization: divide by (1 - e^{-lam})
    p /= 1.0 - np.exp(-lam)

    pmf[mask] = p
    return pmf

sample_dp

sample_dp(rng)

Sample degree of polymerization from Poisson distribution.

Parameters:

Name Type Description Default
rng Generator

NumPy random number generator

required

Returns:

Type Description
int

Degree of polymerization (>= 1)

Source code in src/molpy/builder/polymer/system.py
608
609
610
611
612
613
614
615
616
617
618
def sample_dp(self, rng: np.random.Generator) -> int:
    """Sample degree of polymerization from Poisson distribution.

    Args:
        rng: NumPy random number generator

    Returns:
        Degree of polymerization (>= 1)
    """
    dp = rng.poisson(self.lambda_param)
    return max(1, int(dp))

PolydisperseChainGenerator

PolydisperseChainGenerator(seq_generator, monomer_mass, end_group_mass=0.0, distribution=None)

Middle layer: Chain-level generator.

Responsible for: - Sampling chain size: - Either in DP-space via a DPDistribution (sample_dp) - Or in mass-space via a MassDistribution (sample_mass) - Using a SequenceGenerator to build the chain sequence - Computing the mass of a chain using monomer mass table and optional end-group mass

Does NOT know anything about total system mass. Only returns one chain at a time.

Initialize polydisperse chain generator.

Parameters:

Name Type Description Default
seq_generator SequenceGenerator

Sequence generator for generating monomer sequences

required
monomer_mass dict[str, float]

Dictionary mapping monomer identifiers to their masses (g/mol)

required
end_group_mass float

Mass of end groups (g/mol), default 0.0

0.0
distribution DPDistribution | MassDistribution | None

Distribution implementing DPDistribution or MassDistribution protocol

None
Source code in src/molpy/builder/polymer/system.py
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
def __init__(
    self,
    seq_generator: SequenceGenerator,
    monomer_mass: dict[str, float],
    end_group_mass: float = 0.0,
    distribution: DPDistribution | MassDistribution | None = None,
):
    """
    Initialize polydisperse chain generator.

    Args:
        seq_generator: Sequence generator for generating monomer sequences
        monomer_mass: Dictionary mapping monomer identifiers to their masses (g/mol)
        end_group_mass: Mass of end groups (g/mol), default 0.0
        distribution: Distribution implementing DPDistribution or MassDistribution protocol
    """
    self.seq_generator = seq_generator
    self.monomer_mass = monomer_mass
    self.end_group_mass = end_group_mass

    if distribution is None:
        raise ValueError("distribution must be provided")
    self.distribution = distribution

build_chain

build_chain(rng)

Sample DP, generate monomer sequence, and compute mass.

Parameters:

Name Type Description Default
rng Random

Random number generator

required

Returns:

Type Description
Chain

Chain object with dp, monomers, and mass

Source code in src/molpy/builder/polymer/system.py
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
def build_chain(self, rng: Random) -> Chain:
    """
    Sample DP, generate monomer sequence, and compute mass.

    Args:
        rng: Random number generator

    Returns:
        Chain object with dp, monomers, and mass
    """
    has_sample_dp = callable(getattr(self.distribution, "sample_dp", None))
    has_sample_mass = callable(getattr(self.distribution, "sample_mass", None))

    # Pure DPDistribution path: sample DP once and build fixed-length chain
    if has_sample_dp and not has_sample_mass:
        dp = self.sample_dp(rng)
        monomers = self.seq_generator.generate_sequence(dp, rng)
        mass = self._compute_mass(monomers)
        return Chain(dp=dp, monomers=monomers, mass=mass)

    # Pure MassDistribution path: sample a target mass and grow chain incrementally
    if has_sample_mass and not has_sample_dp:
        target_mass = self.sample_mass(rng)

        monomers: list[str] = []
        current_mass = self.end_group_mass

        # Conservative safety cap to prevent pathological infinite loops
        max_steps = 10_000
        steps = 0

        while steps < max_steps:
            steps += 1

            # Propose adding a single monomer
            next_label = self.seq_generator.generate_sequence(1, rng)[0]
            proposed_mass = current_mass + self.monomer_mass.get(next_label, 0.0)

            # Always accept at least one monomer, even if it overshoots
            if not monomers:
                monomers.append(next_label)
                current_mass = proposed_mass
                if current_mass >= target_mass:
                    break
                continue

            # For subsequent monomers, reject if this would overshoot the target
            if proposed_mass > target_mass:
                break

            monomers.append(next_label)
            current_mass = proposed_mass

            # If we have reached or are extremely close to the target, stop
            if current_mass >= target_mass:
                break

        dp = len(monomers)
        mass = current_mass
        return Chain(dp=dp, monomers=monomers, mass=mass)

    # Either the distribution implements neither method or both, which is invalid
    raise TypeError(
        f"Distribution {type(self.distribution).__name__} must implement exactly one of "
        "'sample_dp' or 'sample_mass'."
    )

sample_dp

sample_dp(rng)

Sample a degree of polymerization from the distribution.

Parameters:

Name Type Description Default
rng Random

Random number generator

required

Returns:

Type Description
int

Degree of polymerization (>= 1)

Source code in src/molpy/builder/polymer/system.py
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
def sample_dp(self, rng: Random) -> int:
    """
    Sample a degree of polymerization from the distribution.

    Args:
        rng: Random number generator

    Returns:
        Degree of polymerization (>= 1)
    """
    if self.distribution is None:
        raise ValueError("distribution must be set")

    # Convert Random to numpy Generator
    seed = rng.randint(0, 2**31 - 1)
    np_rng = np.random.Generator(np.random.PCG64(seed))

    # Determine what capabilities the distribution actually provides
    has_sample_dp = callable(getattr(self.distribution, "sample_dp", None))
    has_sample_mass = callable(getattr(self.distribution, "sample_mass", None))

    # DP-based distributions may only be used via sample_dp
    if has_sample_dp and not has_sample_mass:
        return self.distribution.sample_dp(np_rng)
    # Mass-distributions are not valid for DP sampling; caller should use sample_mass path.
    raise TypeError(
        f"Distribution {type(self.distribution).__name__} does not support 'sample_dp'. "
        "Use mass-based sampling via 'sample_mass' and the corresponding build_chain logic."
    )

sample_mass

sample_mass(rng)

Sample a target chain mass from a mass-based distribution.

Parameters:

Name Type Description Default
rng Random

Random number generator

required

Returns:

Type Description
float

Target chain mass in g/mol (>= 0)

Source code in src/molpy/builder/polymer/system.py
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
def sample_mass(self, rng: Random) -> float:
    """
    Sample a target chain mass from a mass-based distribution.

    Args:
        rng: Random number generator

    Returns:
        Target chain mass in g/mol (>= 0)
    """
    if self.distribution is None:
        raise ValueError("distribution must be set")

    has_sample_mass = callable(getattr(self.distribution, "sample_mass", None))
    if not has_sample_mass:
        raise TypeError(
            f"Distribution {type(self.distribution).__name__} does not support 'sample_mass'. "
            "Use DP-based sampling via 'sample_dp' instead."
        )

    seed = rng.randint(0, 2**31 - 1)
    np_rng = np.random.Generator(np.random.PCG64(seed))
    target_mass = float(self.distribution.sample_mass(np_rng))
    return max(0.0, target_mass)

PolymerBuildResult dataclass

PolymerBuildResult(polymer, connection_history=list(), total_steps=0)

Result of building a polymer.

Attributes:

Name Type Description
polymer Atomistic

The assembled Atomistic structure

connection_history list[ConnectionMetadata]

List of connection metadata for each step

total_steps int

Total number of connection steps performed

PolymerBuilder

PolymerBuilder(library, connector, typifier=None, placer=None)

Build polymers from CGSmiles notation with support for arbitrary topologies.

This builder parses CGSmiles strings and constructs polymers using a graph-based approach, supporting: - Linear chains: {[#A][#B][#C]} - Branched structures: {#A[#C]} - Cyclic structures: {[#A]1[#B][#C]1} - Repeat operators: {[#A]|10}

Example

builder = PolymerBuilder( ... library={"EO2": eo2_monomer, "PS": ps_monomer}, ... connector=connector, ... typifier=typifier, ... ) result = builder.build("{[#EO2]|8[#PS]}")

Initialize the polymer builder.

Parameters:

Name Type Description Default
library Mapping[str, Atomistic]

Mapping from CGSmiles labels to Atomistic monomer structures

required
connector ReacterConnector

ReacterConnector for port selection and chemical reactions

required
typifier TypifierBase | None

Optional typifier for automatic retypification

None
placer Placer | None

Optional Placer for positioning structures before connection

None
Source code in src/molpy/builder/polymer/polymer_builder.py
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
def __init__(
    self,
    library: Mapping[str, Atomistic],
    connector: ReacterConnector,
    typifier: TypifierBase | None = None,
    placer: Placer | None = None,
):
    """
    Initialize the polymer builder.

    Args:
        library: Mapping from CGSmiles labels to Atomistic monomer structures
        connector: ReacterConnector for port selection and chemical reactions
        typifier: Optional typifier for automatic retypification
        placer: Optional Placer for positioning structures before connection
    """
    self.library = library
    self.connector = connector
    self.typifier = typifier
    self.placer = placer

build

build(cgsmiles)

Build a polymer from a CGSmiles string.

Parameters:

Name Type Description Default
cgsmiles str

CGSmiles notation string (e.g., "{[#EO2]|8[#PS]}")

required

Returns:

Type Description
PolymerBuildResult

PolymerBuildResult containing the assembled polymer and metadata

Raises:

Type Description
ValueError

If CGSmiles is invalid

SequenceError

If labels in CGSmiles are not found in library

Source code in src/molpy/builder/polymer/polymer_builder.py
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
def build(self, cgsmiles: str) -> PolymerBuildResult:
    """
    Build a polymer from a CGSmiles string.

    Args:
        cgsmiles: CGSmiles notation string (e.g., "{[#EO2]|8[#PS]}")

    Returns:
        PolymerBuildResult containing the assembled polymer and metadata

    Raises:
        ValueError: If CGSmiles is invalid
        SequenceError: If labels in CGSmiles are not found in library
    """
    # Parse CGSmiles
    ir = parse_cgsmiles(cgsmiles)

    # Validate
    self._validate_ir(ir)

    # Build polymer from graph
    polymer, history = self._build_from_graph(ir.base_graph)

    return PolymerBuildResult(
        polymer=polymer,
        connection_history=history,
        total_steps=len(history),
    )

PortDescriptor dataclass

PortDescriptor(descriptor_id, port_name, role=None, bond_kind=None, compat=None)

Descriptor for a reactive port on a monomer template.

Attributes:

Name Type Description
descriptor_id int

Unique ID within template (e.g., 0, 1, 2)

port_name str

Port name on atom (e.g., "<", ">", "branch")

role str | None

Port role (e.g., "left", "right", "branch")

bond_kind str | None

Bond type (e.g., "-", "=", "#")

compat set[str] | None

Compatibility set for port matching

ProbabilityTableKernel

ProbabilityTableKernel(probability_tables, end_group_templates=None)

GrowthKernel based on G-BigSMILES probability tables.

This kernel uses pre-computed probability tables that map each port descriptor to weighted choices over (template, target_descriptor_id) pairs. Weights are integers that are normalized to probabilities during sampling.

Initialize probability table kernel.

Parameters:

Name Type Description Default
probability_tables dict[int, list[tuple[MonomerTemplate, int, int]]]

Maps descriptor_id -> [(template, target_desc, integer_weight)] Integer weights are normalized to probabilities during sampling.

required
end_group_templates dict[int, MonomerTemplate] | None

Maps descriptor_id -> end-group template (no ports)

None
Source code in src/molpy/builder/polymer/growth_kernel.py
55
56
57
58
59
60
61
62
63
64
65
66
67
68
def __init__(
    self,
    probability_tables: dict[int, list[tuple[MonomerTemplate, int, int]]],
    end_group_templates: dict[int, MonomerTemplate] | None = None,
):
    """Initialize probability table kernel.

    Args:
        probability_tables: Maps descriptor_id -> [(template, target_desc, integer_weight)]
            Integer weights are normalized to probabilities during sampling.
        end_group_templates: Maps descriptor_id -> end-group template (no ports)
    """
    self.tables = probability_tables
    self.end_groups = end_group_templates or {}

choose_next_for_port

choose_next_for_port(polymer, port, candidates, rng=None)

Choose next monomer based on probability table.

Parameters:

Name Type Description Default
polymer Atomistic

Current polymer structure

required
port PortInfo

Port to extend from

required
candidates Sequence[MonomerTemplate]

Available monomer templates

required
rng Generator | None

Random number generator (uses default if None)

None

Returns:

Type Description
MonomerPlacement | None

MonomerPlacement or None (terminate)

Source code in src/molpy/builder/polymer/growth_kernel.py
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
def choose_next_for_port(
    self,
    polymer: Atomistic,
    port: PortInfo,
    candidates: Sequence[MonomerTemplate],
    rng: np.random.Generator | None = None,
) -> MonomerPlacement | None:
    """Choose next monomer based on probability table.

    Args:
        polymer: Current polymer structure
        port: Port to extend from
        candidates: Available monomer templates
        rng: Random number generator (uses default if None)

    Returns:
        MonomerPlacement or None (terminate)
    """
    if rng is None:
        rng = np.random.default_rng()

    # Extract descriptor ID from port metadata
    descriptor_id = self._get_descriptor_id(port)

    # Get probability table for this descriptor
    if descriptor_id not in self.tables:
        return None  # Terminate

    options = self.tables[descriptor_id]

    # Filter by available candidates and non-zero weights
    valid_options = [
        (tmpl, target_desc, weight)
        for tmpl, target_desc, weight in options
        if tmpl in candidates and weight > 0
    ]

    if not valid_options:
        return None  # Terminate

    # Normalize integer weights to probabilities and sample
    templates, target_descs, weights = zip(*valid_options)
    weights_array = np.array(weights, dtype=float)
    weights_array /= weights_array.sum()  # Normalize to probabilities

    idx = rng.choice(len(valid_options), p=weights_array)
    return MonomerPlacement(
        template=templates[idx], target_descriptor_id=target_descs[idx]
    )

ReacterConnector

ReacterConnector(default, port_map, overrides=None)

Bases: Connector

Connector that uses chemical reactions (Reacter) for polymer assembly.

This connector integrates port selection and chemical reaction execution. It manages multiple Reacter instances and port mapping strategies for different structure pairs.

Port Selection Strategy: Port selection is handled via a port_map which must be a dict: - Dict: Explicit mapping {('A','B'): ('1','2'), ...}

There is NO 'auto' mode - port selection must be explicit via port_map. Ports are stored directly on atoms using the "port" or "ports" attribute.

Attributes:

Name Type Description
default

Default Reacter for most connections

overrides

Dict mapping (left_type, right_type) -> specialized Reacter

port_map

Dict mapping (left_type, right_type) -> (port_L, port_R)

Example

from molpy.reacter import Reacter from molpy.reacter.selectors import select_port_atom, select_one_hydrogen from molpy.reacter.transformers import form_single_bond

default_reacter = Reacter( ... name="C-C_coupling", ... port_selector_left=select_port_atom, ... port_selector_right=select_port_atom, ... leaving_selector_left=select_one_hydrogen, ... leaving_selector_right=select_one_hydrogen, ... bond_former=form_single_bond, ... )

Explicit port mapping for all structure pairs

connector = ReacterConnector( ... default=default_reacter, ... port_map={ ... ('A', 'B'): ('port_1', 'port_2'), ... ('B', 'C'): ('port_3', 'port_4'), ... }, ... overrides={('B', 'C'): special_reacter}, ... )

Initialize ReacterConnector.

Parameters:

Name Type Description Default
default Reacter

Default Reacter for most connections

required
port_map dict[tuple[str, str], tuple[str, str]]

Mapping from (left_type, right_type) to (port_L, port_R)

required
overrides dict[tuple[str, str], Reacter] | None

Optional mapping from (left_type, right_type) to specialized Reacter instances

None

Raises:

Type Description
TypeError

If port_map is not a dict

Source code in src/molpy/builder/polymer/connectors.py
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
def __init__(
    self,
    default: "Reacter",
    port_map: dict[tuple[str, str], tuple[str, str]],
    overrides: dict[tuple[str, str], "Reacter"] | None = None,
):
    """
    Initialize ReacterConnector.

    Args:
        default: Default Reacter for most connections
        port_map: Mapping from (left_type, right_type) to (port_L, port_R)
        overrides: Optional mapping from (left_type, right_type) to
            specialized Reacter instances

    Raises:
        TypeError: If port_map is not a dict
    """
    if not isinstance(port_map, dict):
        raise TypeError(f"port_map must be dict, got {type(port_map).__name__}")

    self.default = default
    self.overrides = overrides or {}
    self.port_map = port_map
    self._history: list = []  # List of ReactionResult objects

connect

connect(left, right, left_type, right_type, port_atom_L, port_atom_R, typifier=None)

Execute chemical reaction between two Atomistic structures.

This method performs the full chemical reaction including: 1. Selecting appropriate reacter based on structure types 2. Executing reaction (merging, bond making, removing leaving groups) 3. Computing new topology (angles, dihedrals) 4. Collecting metadata for retypification

Parameters:

Name Type Description Default
left Atomistic

Left Atomistic structure

required
right Atomistic

Right Atomistic structure

required
left_type str

Type label of left structure

required
right_type str

Type label of right structure

required
port_atom_L Entity

Port atom from left structure

required
port_atom_R Entity

Port atom from right structure

required
typifier TypifierBase | None

Optional typifier

None

Returns:

Type Description
ConnectionResult

ConnectionResult containing product and metadata

Source code in src/molpy/builder/polymer/connectors.py
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
def connect(
    self,
    left: Atomistic,
    right: Atomistic,
    left_type: str,
    right_type: str,
    port_atom_L: Entity,
    port_atom_R: Entity,
    typifier: TypifierBase | None = None,
) -> ConnectionResult:
    """
    Execute chemical reaction between two Atomistic structures.

    This method performs the full chemical reaction including:
    1. Selecting appropriate reacter based on structure types
    2. Executing reaction (merging, bond making, removing leaving groups)
    3. Computing new topology (angles, dihedrals)
    4. Collecting metadata for retypification

    Args:
        left: Left Atomistic structure
        right: Right Atomistic structure
        left_type: Type label of left structure
        right_type: Type label of right structure
        port_atom_L: Port atom from left structure
        port_atom_R: Port atom from right structure
        typifier: Optional typifier

    Returns:
        ConnectionResult containing product and metadata
    """
    from molpy.reacter.base import ReactionResult

    # Select reacter
    reacter = self.get_reacter(left_type, right_type)

    # Execute reaction
    product_set: ReactionResult = reacter.run(
        left,
        right,
        port_atom_L=port_atom_L,
        port_atom_R=port_atom_R,
        compute_topology=True,
        typifier=typifier,
    )

    # Store in history
    self._history.append(product_set)

    # Create metadata dataclass
    port_name_L = port_atom_L.get("port", "unknown")
    port_name_R = port_atom_R.get("port", "unknown")
    metadata = ConnectionMetadata(
        port_L=port_name_L,
        port_R=port_name_R,
        reaction_name=reacter.name,
        formed_bonds=product_set.topology_changes.new_bonds,
        new_angles=product_set.topology_changes.new_angles,
        new_dihedrals=product_set.topology_changes.new_dihedrals,
        modified_atoms=product_set.topology_changes.modified_atoms,
        requires_retype=product_set.metadata.requires_retype,
        entity_maps=product_set.metadata.entity_maps,
    )

    return ConnectionResult(
        product=product_set.product_info.product, metadata=metadata
    )

get_all_modified_atoms

get_all_modified_atoms()

Get all atoms modified across all reactions.

Returns:

Type Description
set[Entity]

Set of all atoms that were modified during reactions

Source code in src/molpy/builder/polymer/connectors.py
557
558
559
560
561
562
563
564
565
566
def get_all_modified_atoms(self) -> set[Entity]:
    """Get all atoms modified across all reactions.

    Returns:
        Set of all atoms that were modified during reactions
    """
    all_atoms: set[Entity] = set()
    for result in self._history:
        all_atoms.update(result.topology_changes.modified_atoms)
    return all_atoms

get_history

get_history()

Get all reaction history (list of ReactionResult).

Source code in src/molpy/builder/polymer/connectors.py
553
554
555
def get_history(self) -> list:
    """Get all reaction history (list of ReactionResult)."""
    return self._history

get_reacter

get_reacter(left_type, right_type)

Get appropriate reacter for a structure pair.

Parameters:

Name Type Description Default
left_type str

Type label of left structure (e.g., 'A', 'B')

required
right_type str

Type label of right structure

required

Returns:

Type Description
Reacter

The appropriate Reacter (override if exists, else default)

Source code in src/molpy/builder/polymer/connectors.py
420
421
422
423
424
425
426
427
428
429
430
431
432
def get_reacter(self, left_type: str, right_type: str) -> "Reacter":
    """
    Get appropriate reacter for a structure pair.

    Args:
        left_type: Type label of left structure (e.g., 'A', 'B')
        right_type: Type label of right structure

    Returns:
        The appropriate Reacter (override if exists, else default)
    """
    key = (left_type, right_type)
    return self.overrides.get(key, self.default)

needs_retypification

needs_retypification()

Check if any reactions require retypification.

Returns:

Type Description
bool

True if any reaction requires retypification

Source code in src/molpy/builder/polymer/connectors.py
568
569
570
571
572
573
574
def needs_retypification(self) -> bool:
    """Check if any reactions require retypification.

    Returns:
        True if any reaction requires retypification
    """
    return any(r.metadata.requires_retype for r in self._history)

select_ports

select_ports(left, right, left_ports, right_ports, ctx)

Select ports using the configured port_map.

Parameters:

Name Type Description Default
left Atomistic

Left Atomistic structure

required
right Atomistic

Right Atomistic structure

required
left_ports Mapping[str, list[PortInfo]]

Available ports on left (port name -> list of PortInfo)

required
right_ports Mapping[str, list[PortInfo]]

Available ports on right (port name -> list of PortInfo)

required
ctx ConnectorContext

Connector context with structure type information

required

Returns:

Type Description
str

Tuple of (port_L, port_L_idx, port_R, port_R_idx, None)

int

Uses index 0 when multiple ports share the same name.

Raises:

Type Description
ValueError

If port mapping not found or ports invalid

Source code in src/molpy/builder/polymer/connectors.py
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
def select_ports(
    self,
    left: Atomistic,
    right: Atomistic,
    left_ports: Mapping[str, list[PortInfo]],
    right_ports: Mapping[str, list[PortInfo]],
    ctx: ConnectorContext,
) -> tuple[str, int, str, int, BondKind | None]:
    """
    Select ports using the configured port_map.

    Args:
        left: Left Atomistic structure
        right: Right Atomistic structure
        left_ports: Available ports on left (port name -> list of PortInfo)
        right_ports: Available ports on right (port name -> list of PortInfo)
        ctx: Connector context with structure type information

    Returns:
        Tuple of (port_L, port_L_idx, port_R, port_R_idx, None)
        Uses index 0 when multiple ports share the same name.

    Raises:
        ValueError: If port mapping not found or ports invalid
    """
    # Get monomer types from context
    left_type = ctx.get("left_label", "")
    right_type = ctx.get("right_label", "")

    # Look up explicit mapping
    key = (left_type, right_type)
    if key not in self.port_map:
        raise ValueError(
            f"No port mapping defined for ({left_type}, {right_type}). "
            f"Available mappings: {list(self.port_map.keys())}"
        )
    port_L, port_R = self.port_map[key]

    # Validate ports exist
    if port_L not in left_ports:
        raise ValueError(
            f"Selected port '{port_L}' not found in left structure ({left_type})"
        )
    if port_R not in right_ports:
        raise ValueError(
            f"Selected port '{port_R}' not found in right structure ({right_type})"
        )

    # Use first port (index 0) when multiple ports share the same name
    return port_L, 0, port_R, 0, None  # bond_kind determined by reacter

SchulzZimmPolydisperse

SchulzZimmPolydisperse(Mn, Mw, random_seed=None)

Bases: MassDistribution

Schulz-Zimm molecular weight distribution for polydisperse polymer chains.

Implements :class:MassDistribution - sampling is done directly in molecular-weight (:math:M) space.

Following the notation in the paper, the probability density (often also called “PMF” there) is

.. math::

\operatorname{PMF}(M)
= \frac{z^{z+1}}{\Gamma(z+1)}
  \frac{M^{z-1}}{M_n^{z}}
  \exp\left(-\frac{z M}{M_n}\right),

where

.. math::

z = \frac{M_n}{M_w - M_n},

:math:M_n is the number-average molecular weight, and :math:M_w is the weight-average molecular weight.

This expression is mathematically equivalent to a Gamma distribution with shape :math:z and scale

.. math::

\theta = \frac{M_n}{z} = M_w - M_n,

which satisfies the prescribed :math:M_n and :math:M_w with polydispersity index :math:\text{PDI} = M_w / M_n.

Parameters

Mn: Number-average molecular weight :math:M_n (g/mol). Mw: Weight-average molecular weight :math:M_w (g/mol), must satisfy :math:M_w > M_n. random_seed: Optional random seed used when sampling.

Initialize Schulz-Zimm polydisperse distribution.

Parameters:

Name Type Description Default
Mn float

Number-average molecular weight (g/mol)

required
Mw float

Weight-average molecular weight (g/mol)

required
random_seed int | None

Random seed for reproducibility (optional)

None
Source code in src/molpy/builder/polymer/system.py
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
def __init__(
    self,
    Mn: float,
    Mw: float,
    random_seed: int | None = None,
):
    """
    Initialize Schulz-Zimm polydisperse distribution.

    Args:
        Mn: Number-average molecular weight (g/mol)
        Mw: Weight-average molecular weight (g/mol)
        random_seed: Random seed for reproducibility (optional)
    """
    if Mw <= Mn:
        raise ValueError(
            f"Mw ({Mw}) must be greater than Mn ({Mn}) for valid Schulz-Zimm distribution"
        )

    self.Mn = Mn
    self.Mw = Mw
    self.random_seed = random_seed

    # Calculate Gamma-equivalent parameters.
    # Using the paper's notation:
    #   z = Mn / (Mw - Mn)
    # and the Schulz-Zimm PDF can be written as a Gamma distribution
    # with shape parameter z and scale theta = Mn / z = Mw - Mn.
    self.z = Mn / (Mw - Mn)
    self.theta = Mw - Mn
    self.PDI = Mw / Mn

mass_pdf

mass_pdf(mass_array)

Probability density function for mass values.

This implements the Gamma PDF described in the class docstring and returns :math:f(M) evaluated at the entries of mass_array.

Args

mass_array: Array of molecular weights :math:M (g/mol).

Returns

numpy.ndarray Array of probability density values with the same shape as mass_array.

Source code in src/molpy/builder/polymer/system.py
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
def mass_pdf(self, mass_array: np.ndarray) -> np.ndarray:
    """Probability density function for mass values.

    This implements the Gamma PDF described in the class docstring and
    returns :math:`f(M)` evaluated at the entries of ``mass_array``.

    Args
    ----
    mass_array:
        Array of molecular weights :math:`M` (g/mol).

    Returns
    -------
    numpy.ndarray
        Array of probability density values with the same shape as
        ``mass_array``.
    """
    M = np.asarray(mass_array, dtype=float)
    if self.Mn <= 0 or self.Mw <= self.Mn:
        return np.zeros_like(M)

    z = self.z
    theta = self.theta
    out = np.zeros_like(M, dtype=float)

    mask = M > 0
    Ms = M[mask]

    log_pdf = (
        (z - 1.0) * np.log(Ms) - Ms / theta - z * np.log(theta) - math.lgamma(z)
    )
    out[mask] = np.exp(log_pdf)
    return out

sample_mass

sample_mass(rng)

Sample molecular weight directly from Schulz-Zimm distribution.

Parameters:

Name Type Description Default
rng Generator

NumPy random number generator

required

Returns:

Type Description
float

Molecular weight (g/mol)

Source code in src/molpy/builder/polymer/system.py
380
381
382
383
384
385
386
387
388
389
390
def sample_mass(self, rng: np.random.Generator) -> float:
    """Sample molecular weight directly from Schulz-Zimm distribution.

    Args:
        rng: NumPy random number generator

    Returns:
        Molecular weight (g/mol)
    """
    # Schulz-Zimm samples from Gamma distribution directly
    return rng.gamma(shape=self.z, scale=self.theta)

SequenceGenerator

Bases: Protocol

Protocol for sequence generators.

A sequence generator controls how monomers are arranged in a single chain. It encapsulates monomer reaction/selection probabilities and generates monomer sequences given a desired degree of polymerization (DP).

This is the bottom layer in the three-layer architecture, responsible only for local monomer selection and reaction probabilities.

expected_composition

expected_composition()

Optional: Return expected long-chain monomer fractions.

Used for rough mass estimates. Returns a dictionary mapping monomer identifiers to their expected fraction in long chains.

Returns:

Type Description
dict[str, float]

Dictionary mapping monomer identifiers to expected fractions

Source code in src/molpy/builder/polymer/sequence_generator.py
46
47
48
49
50
51
52
53
54
55
56
def expected_composition(self) -> dict[str, float]:
    """
    Optional: Return expected long-chain monomer fractions.

    Used for rough mass estimates. Returns a dictionary mapping
    monomer identifiers to their expected fraction in long chains.

    Returns:
        Dictionary mapping monomer identifiers to expected fractions
    """
    ...

generate_sequence

generate_sequence(dp, rng)

Generate a monomer sequence of specified degree of polymerization.

Parameters:

Name Type Description Default
dp int

Degree of polymerization (number of monomers)

required
rng Random

Random number generator for reproducible sampling

required

Returns:

Type Description
list[str]

List of monomer identifiers (or monomer keys as strings)

Source code in src/molpy/builder/polymer/sequence_generator.py
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
def generate_sequence(
    self,
    dp: int,
    rng: Random,
) -> list[str]:
    """
    Generate a monomer sequence of specified degree of polymerization.

    Args:
        dp: Degree of polymerization (number of monomers)
        rng: Random number generator for reproducible sampling

    Returns:
        List of monomer identifiers (or monomer keys as strings)
    """
    ...

StochasticChain dataclass

StochasticChain(polymer, dp, mass, growth_history=list())

Result of stochastic BFS growth.

Attributes:

Name Type Description
polymer Atomistic

The assembled Atomistic structure

dp int

Degree of polymerization (number of monomers added)

mass float

Total molecular weight (g/mol)

growth_history list[dict[str, Any]]

Metadata for each monomer addition step

SystemPlan dataclass

SystemPlan(chains, total_mass, target_mass)

Represents a complete system plan with all chains.

Attributes:

Name Type Description
chains list[Chain]

List of all chains in the system

total_mass float

Total mass of all chains (g/mol)

target_mass float

Target total mass that was requested (g/mol)

SystemPlanner

SystemPlanner(chain_generator, target_total_mass, max_rel_error=0.02, max_chains=None, enable_trimming=True)

Top layer: System-level planner.

Responsible for: - Enforcing a target total mass for the overall system - Iteratively requesting chains from PolydisperseChainGenerator - Maintaining a running sum of total mass - Stopping when mass reaches target window, and optionally trimming the final chain

Does NOT micromanage sequence probabilities or DP distribution; only orchestrates at the ensemble level.

Initialize system planner.

Parameters:

Name Type Description Default
chain_generator PolydisperseChainGenerator

Chain generator for building chains

required
target_total_mass float

Target total system mass (g/mol)

required
max_rel_error float

Maximum relative error allowed (default 0.02 = 2%)

0.02
max_chains int | None

Maximum number of chains to generate (None = no limit)

None
enable_trimming bool

Whether to enable chain trimming to better hit target mass

True
Source code in src/molpy/builder/polymer/system.py
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
def __init__(
    self,
    chain_generator: PolydisperseChainGenerator,
    target_total_mass: float,
    max_rel_error: float = 0.02,
    max_chains: int | None = None,
    enable_trimming: bool = True,
):
    """
    Initialize system planner.

    Args:
        chain_generator: Chain generator for building chains
        target_total_mass: Target total system mass (g/mol)
        max_rel_error: Maximum relative error allowed (default 0.02 = 2%)
        max_chains: Maximum number of chains to generate (None = no limit)
        enable_trimming: Whether to enable chain trimming to better hit target mass
    """
    self.chain_generator = chain_generator
    self.target_total_mass = target_total_mass
    self.max_rel_error = max_rel_error
    self.max_chains = max_chains
    self.enable_trimming = enable_trimming

plan_system

plan_system(rng)

Repeatedly ask chain_generator for new chains until accumulated mass reaches target_total_mass within max_rel_error.

Parameters:

Name Type Description Default
rng Random

Random number generator

required

Returns:

Type Description
SystemPlan

SystemPlan with all chains and total mass

Source code in src/molpy/builder/polymer/system.py
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
def plan_system(self, rng: Random) -> SystemPlan:
    """
    Repeatedly ask chain_generator for new chains until accumulated mass
    reaches target_total_mass within max_rel_error.

    Args:
        rng: Random number generator

    Returns:
        SystemPlan with all chains and total mass
    """
    total_mass = 0.0
    chains: list[Chain] = []
    max_allowed_mass = self.target_total_mass * (1 + self.max_rel_error)

    while total_mass < self.target_total_mass:
        # Check max_chains constraint
        if self.max_chains is not None and len(chains) >= self.max_chains:
            break

        # Generate next chain
        chain = self.chain_generator.build_chain(rng)

        # Check if adding this chain would exceed the maximum allowed mass
        if total_mass + chain.mass <= max_allowed_mass:
            chains.append(chain)
            total_mass += chain.mass
        else:
            # Try to trim this chain if enabled
            if self.enable_trimming:
                remaining_mass = self.target_total_mass - total_mass
                trimmed = self._try_trim_chain(chain, remaining_mass, rng)
                if trimmed is not None:
                    chains.append(trimmed)
                    total_mass += trimmed.mass
            break

    return SystemPlan(
        chains=chains,
        total_mass=total_mass,
        target_mass=self.target_total_mass,
    )

TableConnector

TableConnector(rules, fallback=None)

Bases: Connector

Rule-based port selection using a lookup table.

Maps (left_label, right_label) -> (left_port, right_port [, bond_kind])

Example

rules = { ("A", "B"): ("1", "2"), ("B", "A"): ("3", "1", "="), # with bond kind override ("T", "A"): ("t", "1"), } connector = TableConnector(rules, fallback=AutoConnector())

Initialize table connector.

Parameters:

Name Type Description Default
rules Mapping[tuple[str, str], tuple[str, str] | tuple[str, str, BondKind]]

Mapping from (left_label, right_label) to port specifications

required
fallback Connector | None

Optional connector to try if pair not in rules

None
Source code in src/molpy/builder/polymer/connectors.py
200
201
202
203
204
205
206
207
208
209
210
211
212
213
def __init__(
    self,
    rules: Mapping[tuple[str, str], tuple[str, str] | tuple[str, str, BondKind]],
    fallback: Connector | None = None,
):
    """
    Initialize table connector.

    Args:
        rules: Mapping from (left_label, right_label) to port specifications
        fallback: Optional connector to try if pair not in rules
    """
    self.rules = dict(rules)  # Convert to dict for internal use
    self.fallback = fallback

select_ports

select_ports(left, right, left_ports, right_ports, ctx)

Select ports using table lookup.

Source code in src/molpy/builder/polymer/connectors.py
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
def select_ports(
    self,
    left: Atomistic,
    right: Atomistic,
    left_ports: Mapping[str, list[PortInfo]],
    right_ports: Mapping[str, list[PortInfo]],
    ctx: ConnectorContext,
) -> tuple[str, int, str, int, BondKind | None]:
    """Select ports using table lookup."""

    left_label = ctx.get("left_label", "")
    right_label = ctx.get("right_label", "")
    key = (left_label, right_label)

    if key in self.rules:
        rule = self.rules[key]
        # Use first port (index 0) when multiple ports have the same name
        if len(rule) == 2:
            return (rule[0], 0, rule[1], 0, None)
        else:
            return (rule[0], 0, rule[1], 0, rule[2])

    # Try fallback if available
    if self.fallback is not None:
        return self.fallback.select_ports(left, right, left_ports, right_ports, ctx)

    # No rule and no fallback
    raise MissingConnectorRule(
        f"No rule found for ({left_label}, {right_label}) and no fallback connector"
    )

UniformPolydisperse

UniformPolydisperse(min_dp, max_dp, random_seed=None)

Bases: DPDistribution

Uniform distribution over degree of polymerization (DP).

Implements :class:DPDistribution - sampling is done directly in DP space. All integer DP values between :math:N_{\min} and :math:N_{\max} (inclusive) are equally likely.

The probability mass function (PMF) is

.. math::

P(N = k) =
\begin{cases}
    \dfrac{1}{N_{\max} - N_{\min} + 1},
        & N_{\min} \le k \le N_{\max}, \\
    0, & \text{otherwise},
\end{cases}

where :math:N denotes the degree of polymerization.

Parameters

min_dp: Lower bound :math:N_{\min} for the degree of polymerization (must be :math:\ge 1). max_dp: Upper bound :math:N_{\max} for the degree of polymerization (must satisfy :math:N_{\max} \ge N_{\min}). random_seed: Optional random seed used when sampling.

Initialize uniform DP distribution.

Parameters:

Name Type Description Default
min_dp int

Minimum degree of polymerization (must be >= 1)

required
max_dp int

Maximum degree of polymerization (must be >= min_dp)

required
random_seed int | None

Random seed for reproducible sampling (optional)

None

Raises:

Type Description
ValueError

If min_dp < 1 or max_dp < min_dp

Source code in src/molpy/builder/polymer/system.py
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
def __init__(
    self,
    min_dp: int,
    max_dp: int,
    random_seed: int | None = None,
):
    """
    Initialize uniform DP distribution.

    Args:
        min_dp: Minimum degree of polymerization (must be >= 1)
        max_dp: Maximum degree of polymerization (must be >= min_dp)
        random_seed: Random seed for reproducible sampling (optional)

    Raises:
        ValueError: If min_dp < 1 or max_dp < min_dp
    """
    if min_dp < 1:
        raise ValueError(f"min_dp must be >= 1, got {min_dp}")
    if max_dp < min_dp:
        raise ValueError(f"max_dp ({max_dp}) must be >= min_dp ({min_dp})")

    self.min_dp = min_dp
    self.max_dp = max_dp
    self.random_seed = random_seed

dp_pmf

dp_pmf(dp_array)

Compute the probability mass function (PMF) over DP values.

The PMF assigns equal probability to all integer DP values between min_dp and max_dp (inclusive), and zero probability outside this range.

Parameters:

Name Type Description Default
dp_array ndarray

Array of DP values (typically integer, but can be float)

required

Returns:

Type Description
ndarray

Array of PMF values, same shape as dp_array.

ndarray

PMF[i] = 1 / (max_dp - min_dp + 1) if min_dp <= dp_array[i] <= max_dp, 0 otherwise.

Source code in src/molpy/builder/polymer/system.py
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
def dp_pmf(self, dp_array: np.ndarray) -> np.ndarray:
    """
    Compute the probability mass function (PMF) over DP values.

    The PMF assigns equal probability to all integer DP values between
    min_dp and max_dp (inclusive), and zero probability outside this range.

    Args:
        dp_array: Array of DP values (typically integer, but can be float)

    Returns:
        Array of PMF values, same shape as dp_array.
        PMF[i] = 1 / (max_dp - min_dp + 1) if min_dp <= dp_array[i] <= max_dp,
                0 otherwise.
    """
    dp_array = np.asarray(dp_array, dtype=float)
    pmf = np.zeros_like(dp_array, dtype=float)

    # Count number of valid integer DP values in the range
    n_valid = self.max_dp - self.min_dp + 1
    uniform_prob = 1.0 / n_valid

    # Assign probability to DP values within the valid range
    # Use np.round to handle float DP values (rounds to nearest integer)
    dp_rounded = np.round(dp_array).astype(int)
    mask = (dp_rounded >= self.min_dp) & (dp_rounded <= self.max_dp)
    pmf[mask] = uniform_prob

    return pmf

sample_dp

sample_dp(rng)

Sample degree of polymerization from uniform distribution.

Parameters:

Name Type Description Default
rng Generator

NumPy random number generator

required

Returns:

Type Description
int

Degree of polymerization (>= 1)

Source code in src/molpy/builder/polymer/system.py
517
518
519
520
521
522
523
524
525
526
def sample_dp(self, rng: np.random.Generator) -> int:
    """Sample degree of polymerization from uniform distribution.

    Args:
        rng: NumPy random number generator

    Returns:
        Degree of polymerization (>= 1)
    """
    return int(rng.integers(self.min_dp, self.max_dp + 1))

WeightedSequenceGenerator

WeightedSequenceGenerator(weights=None, n_monomers=None, monomer_weights=None)

Sequence generator based on monomer weights/proportions.

This generator selects monomers based on their relative weights. Each selection is independent (no memory of previous selections).

Conforms to the SequenceGenerator protocol.

Initialize weighted sequence generator.

Parameters:

Name Type Description Default
weights dict[int, float] | None

Dictionary mapping monomer index to selection weight (legacy format)

None
n_monomers int | None

Total number of available monomers (legacy format)

None
monomer_weights dict[str, float] | None

Dictionary mapping monomer identifier to selection weight (new format)

None

Note: Either (weights, n_monomers) or monomer_weights should be provided. If monomer_weights is provided, it takes precedence.

Source code in src/molpy/builder/polymer/sequence_generator.py
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
def __init__(
    self,
    weights: dict[int, float] | None = None,
    n_monomers: int | None = None,
    monomer_weights: dict[str, float] | None = None,
):
    """
    Initialize weighted sequence generator.

    Args:
        weights: Dictionary mapping monomer index to selection weight (legacy format)
        n_monomers: Total number of available monomers (legacy format)
        monomer_weights: Dictionary mapping monomer identifier to selection weight (new format)

    Note: Either (weights, n_monomers) or monomer_weights should be provided.
    If monomer_weights is provided, it takes precedence.
    """
    if monomer_weights is not None:
        # New format: use string identifiers
        self.monomer_weights = monomer_weights
        self.monomer_ids = sorted(monomer_weights.keys())
        self.n_monomers = len(self.monomer_ids)

        # Normalize weights to probabilities
        total_weight = sum(monomer_weights.values())
        if total_weight > 0:
            self.probs = [
                monomer_weights[mid] / total_weight for mid in self.monomer_ids
            ]
        else:
            # Equal probability if all weights are zero
            self.probs = [1.0 / self.n_monomers] * self.n_monomers
    elif weights is not None and n_monomers is not None:
        # Legacy format: convert indices to string identifiers
        self.n_monomers = n_monomers
        self.monomer_ids = [str(i) for i in range(n_monomers)]
        self.monomer_weights = {
            str(i): weights.get(i, 1.0) for i in range(n_monomers)
        }

        # Normalize weights to probabilities
        monomer_weights_list = [weights.get(i, 1.0) for i in range(n_monomers)]
        total_weight = sum(monomer_weights_list)
        if total_weight > 0:
            self.probs = [w / total_weight for w in monomer_weights_list]
        else:
            # Equal probability if all weights are zero
            self.probs = [1.0 / n_monomers] * n_monomers
    else:
        raise ValueError(
            "Either (weights, n_monomers) or monomer_weights must be provided"
        )

expected_composition

expected_composition()

Return expected long-chain monomer fractions.

Returns:

Type Description
dict[str, float]

Dictionary mapping monomer identifiers to their expected fractions

Source code in src/molpy/builder/polymer/sequence_generator.py
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
def expected_composition(self) -> dict[str, float]:
    """
    Return expected long-chain monomer fractions.

    Returns:
        Dictionary mapping monomer identifiers to their expected fractions
    """
    total_weight = sum(self.monomer_weights.values())
    if total_weight > 0:
        return {
            mid: self.monomer_weights[mid] / total_weight
            for mid in self.monomer_ids
        }
    else:
        # Equal probability if all weights are zero
        equal_prob = 1.0 / len(self.monomer_ids)
        return {mid: equal_prob for mid in self.monomer_ids}

generate_sequence

generate_sequence(dp, rng)

Generate a sequence of specified degree of polymerization.

Parameters:

Name Type Description Default
dp int

Degree of polymerization (number of monomers)

required
rng Random

Random number generator for reproducible sampling

required

Returns:

Type Description
list[str]

list of monomer identifiers (strings)

Source code in src/molpy/builder/polymer/sequence_generator.py
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
def generate_sequence(
    self,
    dp: int,
    rng: Random,
) -> list[str]:
    """
    Generate a sequence of specified degree of polymerization.

    Args:
        dp: Degree of polymerization (number of monomers)
        rng: Random number generator for reproducible sampling

    Returns:
        list of monomer identifiers (strings)
    """
    # Use rng.choices for weighted random selection
    sequence = rng.choices(
        self.monomer_ids,
        weights=[self.monomer_weights[mid] for mid in self.monomer_ids],
        k=dp,
    )
    return sequence