Tool

High-level packaged recipes and analysis operations.

Quick reference

Recipes (polymer building)

Symbol | Summary | Preferred for
PrepareMonomer | BigSMILES → 3D Atomistic with ports and topology | Monomer template creation
polymer(notation) | Auto-detect notation → built chain | Quick single-chain building
polymer_system(gbigsmiles) | GBigSMILES → polydisperse chain list | Multi-chain system building
generate_3d(mol) | Add 3D coordinates via RDKit | Coordinate generation

Analysis (compute operations)

Symbol | Summary | Preferred for
MSD | Mean squared displacement | Diffusion analysis
DisplacementCorrelation | Cross-displacement correlation | Ion transport correlations
compute_msd(positions) | Convenience function for MSD | Quick MSD calculation
compute_acf(series) | Autocorrelation function | Time-series analysis

Canonical examples

from molpy.tool import PrepareMonomer, polymer, generate_3d

# Prepare one monomer
prep = PrepareMonomer()
eo = prep.run("{[<]CCO[>]}")

# Build a chain (auto-detects notation)
chain = polymer("{[<]CCO[>]}|10|")

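# Generate 3D coordinates for an existing structure `mol` (defined elsewhere)
mol_3d = generate_3d(mol, add_hydrogens=True)

from molpy.tool import MSD

# unwrapped_positions: array of shape (n_frames, n_particles, n_dim)
msd = MSD(max_lag=3000)
msd_values = msd(unwrapped_positions)  # shape (max_lag,)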

Key behavior

  • Tool subclasses are frozen dataclasses: configuration at init, execution via .run()
  • Compute subclasses are callable: compute(data) or compute.run(data)
  • generate_3d requires RDKit; raises ImportError if not installed

Full API

Base

base

Base abstractions for computation and tool operations.

Provides:

  • Compute: frozen-dataclass ABC for analysis operations (MSD, correlations)
  • Tool: frozen-dataclass ABC for executable tools (builders, transforms)
  • ToolRegistry: auto-discovery registry for Tool subclasses
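One common way to get this kind of auto-discovery in Python is the `__init_subclass__` hook. The sketch below is only an illustration of that pattern: `REGISTRY`, `RegisteredTool`, `AbstractBuilder`, `Echo`, and `concrete_tools` are hypothetical names, and nothing here is taken from MolPy's actual ToolRegistry implementation.

```python
import inspect
from abc import ABC, abstractmethod

# Hypothetical registry mapping class name -> class (not MolPy's ToolRegistry).
REGISTRY: dict[str, type] = {}


class RegisteredTool(ABC):
    """Hypothetical base class that records every subclass when it is defined."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        REGISTRY[cls.__name__] = cls

    @abstractmethod
    def run(self, *args, **kwargs):
        """Core logic; concrete subclasses must implement."""


class AbstractBuilder(RegisteredTool):
    """Still abstract: run() is not implemented, so it is filtered out below."""


class Echo(RegisteredTool):
    def run(self, text: str) -> str:
        return text


def concrete_tools() -> dict[str, type]:
    # Abstractness is only known once the class is fully created,
    # so abstract classes are filtered at lookup time, not at registration.
    return {name: cls for name, cls in REGISTRY.items() if not inspect.isabstract(cls)}


discovered = sorted(concrete_tools())
```

Here `discovered` contains only `Echo`, because `AbstractBuilder` never implements `run()`.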

Tool dataclass

Tool()

Bases: ABC

Base class for executable tools (builders, transforms).

Concrete subclasses are auto-registered in ToolRegistry and discovered by the MCP server. Unlike Compute (analysis-only), Tool is intended for molecular operations that produce or transform structures.

Usage::

from dataclasses import dataclass

@dataclass(frozen=True)
class MyTool(Tool):
    param: int = 10

    def run(self, input: str) -> dict:
        return {"result": input, "param": self.param}

tool = MyTool(param=5)
result = tool("hello")  # delegates to run()

run abstractmethod
run(*args, **kwargs)

Core tool logic. Subclasses must implement.
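The "configure at init, execute via run()" contract can be sketched with the standard library alone. `SketchTool` and `Repeat` below are hypothetical stand-ins written for illustration, not MolPy classes; the only assumption carried over from this page is that `__call__` delegates to `run()` and that instances are frozen.

```python
from abc import ABC, abstractmethod
from dataclasses import FrozenInstanceError, dataclass


class SketchTool(ABC):
    """Hypothetical stand-in for the Tool base class."""

    @abstractmethod
    def run(self, *args, **kwargs):
        """Core logic; subclasses must implement."""

    def __call__(self, *args, **kwargs):
        # Calling the instance is shorthand for run().
        return self.run(*args, **kwargs)


@dataclass(frozen=True)
class Repeat(SketchTool):
    times: int = 2  # configuration is fixed at construction time

    def run(self, text: str) -> str:
        return text * self.times


tool = Repeat(times=3)
result = tool("ab")  # same as tool.run("ab")

# Frozen dataclasses reject attribute assignment after construction.
try:
    tool.times = 5
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Freezing the configuration means a tool instance can be shared or reused without its parameters changing between runs.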

Compute dataclass

Compute()

Bases: ABC

Base class for analysis operations (MSD, correlations).

Duck-type compatible with pydantic-graph BaseNode:

  • Dataclass fields = configuration parameters (like Node fields).
  • run() method = core logic (like Node.run()).
  • get_node_id() classmethod = unique identifier.

Usage::

msd = MSD(max_lag=3000)
result = msd(positions)          # __call__ delegates to run()

get_node_id classmethod
get_node_id()

Unique node identifier (pydantic-graph compatible).

run abstractmethod
run(*args, **kwargs)

Core computation logic. Subclasses must implement.

Polymer Tools

polymer

Polymer building tools.

Recipes that wrap the parser, adapter, builder, and reacter modules into single-call operations for common polymer construction tasks.

Tools (auto-registered in ToolRegistry):

  • PrepareMonomer: BigSMILES → 3D Atomistic with ports
  • BuildPolymer: CGSmiles + library → assembled chain
  • PlanSystem: distribution parameters → chain plan (no atoms)
  • BuildSystem: G-BigSMILES → list of built chains

Convenience functions:

  • polymer(): auto-detect notation, build single chain
  • polymer_system(): G-BigSMILES → multi-chain system

BuildPolymer dataclass

BuildPolymer(reaction_preset='dehydration', use_placer=True)

Bases: Tool

Build a polymer chain from CGSmiles notation and a monomer library.

Preferred for
  • Assembling a single chain from pre-prepared monomers.
  • Iterating over a system plan to build chains one at a time.
Avoid when
  • You want end-to-end build from a string (use polymer() or BuildSystem).
  • You need custom reaction logic (use PolymerBuilder directly).

Attributes:

Name Type Description
reaction_preset str

Name of reaction preset (default "dehydration").

use_placer bool

Enable geometric placement of monomers.

run
run(cgsmiles, library)

Build a polymer chain.

Parameters:

Name Type Description Default
cgsmiles str

CGSmiles notation (e.g. "{[#EO]|10}").

required
library dict[str, Atomistic]

Mapping from label to prepared Atomistic monomer.

required

Returns:

Type Description
dict[str, Any]

Dict with "polymer" (Atomistic), "total_steps" (int), and "connection_history" (list).

BuildPolymerAmber dataclass

BuildPolymerAmber(reaction_preset='dehydration', force_field='gaff2', charge_method='bcc', conda_env=None, work_dir='amber_work')

Bases: Tool

Build a polymer chain using the AmberTools backend.

Uses antechamber, parmchk2, prepgen, and tleap to assemble a polymer from a CGSmiles string and a monomer library. Returns both MolPy structures and AMBER topology/coordinate files.

Preferred for
  • Polymer systems that need AMBER force field parameters (GAFF/GAFF2).
  • Workflows that feed into AMBER or LAMMPS with AMBER-style inputs.
Avoid when
  • You do not need force field parameters (use BuildPolymer).
  • AmberTools is not installed.

Attributes:

Name Type Description
reaction_preset str | None

Named preset for leaving group detection. When None, hydrogen atoms bonded to port atoms are auto-detected.

force_field str

Amber force field ("gaff" or "gaff2").

charge_method str

Antechamber charge method.

conda_env str | None

Conda environment containing AmberTools.

work_dir str

Directory for intermediate files.

run
run(cgsmiles, library)

Build a polymer using AmberTools.

Parameters:

Name Type Description Default
cgsmiles str

CGSmiles notation (e.g. "{[#EO]|10}").

required
library dict[str, Atomistic]

Mapping from label to prepared Atomistic monomer. Each monomer must have port="<" (head) and port=">" (tail) annotations.

required

Returns:

Type Description
dict[str, Any]

Dict with "frame", "forcefield", "prmtop_path", "inpcrd_path", "pdb_path", and "monomer_count".

BuildSystem dataclass

BuildSystem(reaction_preset='dehydration', add_hydrogens=True, optimize=True, random_seed=None)

Bases: Tool

End-to-end polymer system construction from G-BigSMILES.

Parses a G-BigSMILES string and delegates to the GBigSmilesCompiler to produce a list of Atomistic chains.

Preferred for
  • Building a complete polydisperse system in one call.
  • When you do not need to inspect the system plan before building.
Avoid when
  • You need to inspect or modify the plan first (use PlanSystem + BuildPolymer).
  • You need the Amber backend (use BuildPolymerAmber).

Attributes:

Name Type Description
reaction_preset str

Name of reaction preset.

add_hydrogens bool

Add explicit hydrogens during monomer preparation.

optimize bool

Optimize monomer geometry.

random_seed int | None

Random seed for reproducibility.

run
run(gbigsmiles)

Build a polymer system from a G-BigSMILES string.

Parameters:

Name Type Description Default
gbigsmiles str

G-BigSMILES notation (e.g. "{[<]CCOCC[>]}|schulz_zimm(1500,3000)||5e5|").

required

Returns:

Type Description
list[Atomistic]

List of Atomistic structures (one per chain).

PlanSystem dataclass

PlanSystem(random_seed=None)

Bases: Tool

Plan a polydisperse polymer system from distribution parameters.

Returns chain specifications (DP, monomer sequence, mass) without creating any atoms. Use this to validate distribution parameters before committing to an expensive build.

Preferred for
  • Previewing system composition before building.
  • Iterating on distribution parameters cheaply.
Avoid when
  • You want chains built directly (use BuildSystem or polymer_system).

Attributes:

Name Type Description
random_seed int | None

Random seed for reproducibility.

run
run(monomer_weights, monomer_mass, distribution_type, distribution_params, target_total_mass, end_group_mass=0.0, max_rel_error=0.02)

Plan a polydisperse polymer system.

Parameters:

Name Type Description Default
monomer_weights dict[str, float]

Weight fractions for each monomer label.

required
monomer_mass dict[str, float]

Molar mass (g/mol) per monomer label.

required
distribution_type str

Distribution name (e.g. "schulz_zimm").

required
distribution_params dict[str, float]

Distribution parameters as {"p0": ..., "p1": ...}.

required
target_total_mass float

Target total system mass (g/mol).

required
end_group_mass float

Mass of end groups per chain (g/mol).

0.0
max_rel_error float

Maximum relative error for total mass.

0.02

Returns:

Type Description
dict[str, Any]

Dict with "chains" (list of chain dicts), "total_mass", and "target_mass".
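To make the idea of planning without atoms concrete, here is a rough sketch of how chain specifications might be sampled until a target mass is reached. It is not PlanSystem's implementation. `plan_chains` and its parameters are hypothetical, and it assumes a Schulz-Zimm distribution given by number-average (`mn`) and weight-average (`mw`) molar mass, which corresponds to a gamma distribution with shape `mn / (mw - mn)` and mean `mn`. How MolPy maps `distribution_params` onto these quantities is not stated on this page.

```python
import random


def plan_chains(mn, mw, monomer_mass, target_total_mass,
                end_group_mass=0.0, max_rel_error=0.02,
                seed=None, max_draws=100_000):
    """Hypothetical planner: draw chain masses until the total is in tolerance."""
    if mw <= mn:
        raise ValueError("mw must be larger than mn")
    rng = random.Random(seed)
    shape = mn / (mw - mn)   # gamma shape parameter k
    scale = mn / shape       # gamma scale, so that the mean equals mn
    lower = target_total_mass * (1.0 - max_rel_error)
    upper = target_total_mass * (1.0 + max_rel_error)

    chains, total, draws = [], 0.0, 0
    while total < lower:
        if draws >= max_draws:
            raise RuntimeError("could not reach the target mass within max_draws")
        draws += 1
        sampled = rng.gammavariate(shape, scale)
        # Convert the sampled molar mass to a degree of polymerisation (DP).
        dp = max(1, round((sampled - end_group_mass) / monomer_mass))
        mass = dp * monomer_mass + end_group_mass
        if total + mass > upper:
            continue  # this chain would overshoot the tolerance window; redraw
        chains.append({"dp": dp, "mass": mass})
        total += mass
    return {"chains": chains, "total_mass": total, "target_mass": target_total_mass}


plan = plan_chains(mn=1500.0, mw=3000.0, monomer_mass=44.05,
                   target_total_mass=5e5, seed=42)
rel_error = abs(plan["total_mass"] - plan["target_mass"]) / plan["target_mass"]
```

Rejecting chains that would overshoot slightly biases the last few draws toward shorter chains; a production planner may handle the tail differently.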

PrepareMonomer dataclass

PrepareMonomer(add_hydrogens=True, optimize=True, gen_topology=True)

Bases: Tool

Parse a BigSMILES monomer string and produce an Atomistic structure.

Pipeline: parse BigSMILES → convert to Atomistic with port markers → generate 3D coordinates via RDKit (if available) → compute angles/dihedrals.

Preferred for
  • Preparing monomers for BuildPolymer or polymer().
  • One-step SMILES-to-3D when you need port annotations.
Avoid when
  • You already have an Atomistic structure (use the RDKit adapter directly).
  • You need custom 3D embedding parameters (use Generate3D).

Attributes:

Name Type Description
add_hydrogens bool

Add explicit hydrogens during 3D generation.

optimize bool

Optimize geometry after 3D embedding.

gen_topology bool

Compute angles and dihedrals.

run
run(smiles)

Prepare a monomer from a BigSMILES string.

Parameters:

Name Type Description Default
smiles str

BigSMILES string (e.g. "{[<]CCOCC[>]}").

required

Returns:

Type Description
Atomistic

Atomistic structure with ports marked and optional 3D coordinates.

polymer

polymer(spec, *, library=None, reaction_preset='dehydration', use_placer=True, add_hydrogens=True, optimize=True, random_seed=None, backend='default', amber_config=None)

Build a single polymer chain from a string specification.

Auto-detects notation type (for the default backend):

  • G-BigSMILES (contains | annotation): polymer("{[<]CCOCC[>]}|10|")
  • CGSmiles + inline fragments (contains .{#): polymer("{[#EO]|10}.{#EO=[<]COC[>]}")
  • Pure CGSmiles (requires library kwarg): polymer("{[#EO]|10}", library={"EO": eo_monomer})

For the Amber backend:

  • polymer("{[#EO]|10}", library={"EO": eo}, backend="amber")
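The default-backend detection rules listed above can be approximated with plain substring checks. This is a sketch under assumptions: `detect_notation` and its return labels are hypothetical, and the check order (inline fragments first, then CGSmiles, then G-BigSMILES) is chosen here because CGSmiles strings such as "{[#EO]|10}" also contain "|". MolPy's real detection logic may differ.

```python
def detect_notation(spec: str) -> str:
    """Hypothetical classifier for the three notations polymer() accepts."""
    if ".{#" in spec:
        # CGSmiles graph followed by inline fragment definitions.
        return "cgsmiles+fragments"
    if "[#" in spec:
        # CGSmiles node labels without fragments: a library is required.
        return "cgsmiles"
    if "|" in spec:
        # BigSMILES repeat unit with a |...| annotation.
        return "gbigsmiles"
    raise ValueError(f"unrecognised polymer specification: {spec!r}")


examples = {
    "{[<]CCOCC[>]}|10|": detect_notation("{[<]CCOCC[>]}|10|"),
    "{[#EO]|10}.{#EO=[<]COC[>]}": detect_notation("{[#EO]|10}.{#EO=[<]COC[>]}"),
    "{[#EO]|10}": detect_notation("{[#EO]|10}"),
}
```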

Parameters:

Name Type Description Default
spec str

Polymer specification string.

required
library Mapping[str, Atomistic] | None

Monomer library (required for pure CGSmiles and Amber).

None
reaction_preset str

Reaction preset name.

'dehydration'
use_placer bool

Enable geometric placement (default backend only).

True
add_hydrogens bool

Add hydrogens during 3D generation.

True
optimize bool

Optimize geometry.

True
random_seed int | None

Random seed for reproducibility.

None
backend Backend

Builder backend — "default" or "amber".

'default'
amber_config Any

Optional AmberPolymerBuilderConfig for fine-grained control of the Amber backend. When None, defaults are used.

None

Returns:

Type Description
Atomistic | Any

Atomistic (default backend) or AmberBuildResult (amber backend).

polymer_system

polymer_system(spec, *, reaction_preset='dehydration', add_hydrogens=True, optimize=True, random_seed=None)

Build a multi-chain polymer system from G-BigSMILES.

Example::

chains = polymer_system(
    "{[<]CCOCC[>]}|schulz_zimm(1500,3000)||5e5|",
    random_seed=42,
)

Parameters:

Name Type Description Default
spec str

G-BigSMILES specification string.

required
reaction_preset str

Reaction preset name.

'dehydration'
add_hydrogens bool

Add hydrogens during 3D generation.

True
optimize bool

Optimize geometry.

True
random_seed int | None

Random seed for reproducibility.

None

Returns:

Type Description
list[Atomistic]

List of Atomistic structures (one per chain).

MSD

msd

Mean Squared Displacement computation.

Operates on plain NDArrays — no trajectory coupling.

MSD dataclass

MSD(max_lag)

Bases: Compute

Compute mean squared displacement at each time lag.

MSD(dt) = <(r_i(t+dt) - r_i(t))^2>_{i, t}

Parameters:

Name Type Description Default
max_lag int

Maximum time lag in frames.

required

Examples:

Self-diffusion::

cation_coords = unwrapped[:, cation_mask, :]  # (n_frames, n_cations, 3)
msd = MSD(max_lag=3000)
msd_values = msd(cation_coords)               # -> NDArray (max_lag,)

Polarization MSD (no dedicated class needed)::

polarization = (
    coords[:, cat_mask, :].sum(axis=1)
    - coords[:, an_mask, :].sum(axis=1)
)  # (n_frames, 3)
pmsd_values = msd(polarization[:, None, :])    # -> NDArray (max_lag,)

run
run(positions)

Compute MSD from positions.

Parameters:

Name Type Description Default
positions NDArray

Coordinate array with shape (n_frames, n_particles, n_dim). For polarization MSD, reshape a (n_frames, 3) vector to (n_frames, 1, 3).

required

Returns:

Type Description
NDArray

MSD values at each time lag, shape (max_lag,).
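For readers who want to see the averaging spelled out, the formula MSD(dt) = <(r_i(t+dt) - r_i(t))^2>_{i, t} can be written directly in NumPy. `msd_sketch` is a hypothetical reference function, not the MSD class. It assumes that index k of the output corresponds to a lag of k frames (so index 0 is zero), which this page does not state, and it requires max_lag to be no larger than the number of frames.

```python
import numpy as np


def msd_sketch(positions, max_lag):
    """Direct-sum MSD for an array of shape (n_frames, n_particles, n_dim)."""
    positions = np.asarray(positions, dtype=float)
    n_frames = positions.shape[0]
    out = np.empty(max_lag)
    for lag in range(max_lag):
        # Displacement of every particle over `lag` frames, for all time origins.
        disp = positions[lag:] - positions[:n_frames - lag]
        # Squared norm per particle, then average over particles and origins.
        out[lag] = np.mean(np.sum(disp ** 2, axis=-1))
    return out


# Ballistic motion with unit velocity along x: MSD at lag k equals k**2.
frames = np.arange(50, dtype=float)[:, None, None]
positions = frames * np.array([1.0, 0.0, 0.0]) * np.ones((1, 2, 1))
msd_values = msd_sketch(positions, max_lag=10)
```

The loop costs O(max_lag × n_frames × n_particles); it is meant as a readable cross-check on small inputs.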

msd

msd(positions, *, max_lag)

Compute MSD. Shorthand for MSD(max_lag=max_lag)(positions).

Cross-Displacement Correlation

cross_correlation

Cross-displacement correlation computation.

Operates on plain NDArrays — no trajectory coupling.

DisplacementCorrelation dataclass

DisplacementCorrelation(max_lag, exclude_self=False)

Bases: Compute

Compute cross-displacement correlation between two groups.

For two groups A and B the correlation at time lag dt is:

C(dt) = <sum_i dr_i^A(dt) . sum_j dr_j^B(dt)> / N_A

where dr_i(dt) = r_i(t+dt) - r_i(t).

When exclude_self=True and both inputs are the same species, the self-terms are subtracted so only distinct correlations remain:

C_distinct(dt) = <dr_i . (sum_j dr_j - dr_i)>_{i, t}

Parameters:

Name Type Description Default
max_lag int

Maximum time lag in frames.

required
exclude_self bool

If True, subtract self-correlation (for same-species distinct diffusion).

False

Examples:

Cross-species (cation-anion)::

xdc = DisplacementCorrelation(max_lag=3000)
corr = xdc(cation_coords, anion_coords)  # -> NDArray (max_lag,)

Same-species distinct (exclude self-correlation)::

xdc = DisplacementCorrelation(max_lag=3000, exclude_self=True)
corr = xdc(cation_coords, cation_coords)  # -> NDArray (max_lag,)

run
run(positions_a, positions_b)

Compute displacement correlation.

Parameters:

Name Type Description Default
positions_a NDArray

Coordinates of group A, shape (n_frames, n_a, n_dim).

required
positions_b NDArray

Coordinates of group B, shape (n_frames, n_b, n_dim).

required

Returns:

Type Description
NDArray

Correlation values at each time lag, shape (max_lag,).
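The two formulas for C(dt) and C_distinct(dt) can be checked on a small array with a direct NumPy sum. `displacement_correlation_sketch` is a hypothetical function written for this illustration, not the DisplacementCorrelation class. It assumes output index k means a lag of k frames, and it assumes that with exclude_self=True both inputs hold the same particles in the same order.

```python
import numpy as np


def displacement_correlation_sketch(pos_a, pos_b, max_lag, exclude_self=False):
    """Direct-sum C(dt) for arrays of shape (n_frames, n_particles, n_dim)."""
    pos_a = np.asarray(pos_a, dtype=float)
    pos_b = np.asarray(pos_b, dtype=float)
    n_frames, n_a = pos_a.shape[:2]
    out = np.empty(max_lag)
    for lag in range(max_lag):
        dr_a = pos_a[lag:] - pos_a[:n_frames - lag]
        dr_b = pos_b[lag:] - pos_b[:n_frames - lag]
        # (sum_i dr_i^A) . (sum_j dr_j^B) for every time origin.
        collective = np.sum(dr_a.sum(axis=1) * dr_b.sum(axis=1), axis=-1)
        if exclude_self:
            # Remove the i == j terms: sum_i dr_i . dr_i.
            collective = collective - np.sum(dr_a * dr_b, axis=(1, 2))
        out[lag] = collective.mean() / n_a
    return out


# Two particles drifting together with unit velocity along x.
frames = np.arange(20, dtype=float)[:, None, None]
coords = frames * np.array([1.0, 0.0, 0.0]) * np.ones((1, 2, 1))
full = displacement_correlation_sketch(coords, coords, max_lag=5)
distinct = displacement_correlation_sketch(coords, coords, max_lag=5, exclude_self=True)
```

For this perfectly correlated pair the full correlation at lag k is 2k² and the distinct part is k², so half of the signal comes from self-terms.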

displacement_correlation

displacement_correlation(positions_a, positions_b, *, max_lag, exclude_self=False)

Compute displacement correlation.

Shorthand for DisplacementCorrelation(max_lag=max_lag, exclude_self=exclude_self)(positions_a, positions_b).

Time Series

time_series

Time-series analysis operations for trajectory data.

This module provides utilities for computing time-correlation functions, mean squared displacements, and other time-series statistics commonly used in molecular dynamics trajectory analysis.

Adapted from the tame library (https://github.com/Roy-Kid/tame).

TimeAverage

TimeAverage(shape, dtype=np.float64, dropnan='partial')

Compute running time average with NaN handling.

This class accumulates data over time and computes the average, with options for handling NaN values.

Parameters:

Name Type Description Default
shape tuple[int, ...]

Shape of data arrays to average

required
dtype dtype | type

Data type for accumulated arrays

float64
dropnan Literal['none', 'partial', 'all']

How to handle NaN values:

  • 'none': Include NaN values in average (result may be NaN)
  • 'partial': Ignore individual NaN entries
  • 'all': Skip entire frame if any NaN is present

'partial'

Examples:

>>> import numpy as np
>>> avg = TimeAverage(shape=(4,), dropnan='partial')
>>> avg.update(np.array([1.0, 2.0, np.nan, 4.0]))
>>> avg.update(np.array([2.0, 3.0, 3.0, 5.0]))
>>> result = avg.get()  # [1.5, 2.5, 3.0, 4.5]

get
get()

Get current time-averaged value.

Returns:

Type Description
NDArray

Time-averaged data array

reset
reset()

Reset accumulator to initial state.

update
update(new_data)

Add new data to running average.

Parameters:

Name Type Description Default
new_data NDArray

New data array to include in average

required
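The three dropnan modes differ only in how each frame contributes to a running sum and a per-entry count. The class below is a minimal sketch of that bookkeeping. `RunningAverage` is a hypothetical stand-in and is not TimeAverage's source; in particular, returning NaN for entries that never received a valid sample is an assumption.

```python
import numpy as np


class RunningAverage:
    """Hypothetical running mean with the three NaN policies described above."""

    def __init__(self, shape, dropnan="partial"):
        if dropnan not in ("none", "partial", "all"):
            raise ValueError(f"unknown dropnan mode: {dropnan!r}")
        self.dropnan = dropnan
        self._sum = np.zeros(shape)
        self._count = np.zeros(shape)

    def update(self, new_data):
        new_data = np.asarray(new_data, dtype=float)
        nan_mask = np.isnan(new_data)
        if self.dropnan == "all" and nan_mask.any():
            return  # skip the whole frame
        if self.dropnan == "partial":
            # NaN entries add nothing to the sum and are not counted.
            self._sum += np.where(nan_mask, 0.0, new_data)
            self._count += ~nan_mask
        else:
            # 'none' (NaN propagates) or a NaN-free frame in 'all' mode.
            self._sum += new_data
            self._count += 1

    def get(self):
        out = np.full(self._sum.shape, np.nan)
        np.divide(self._sum, self._count, out=out, where=self._count > 0)
        return out


frames = [np.array([1.0, 2.0, np.nan, 4.0]), np.array([2.0, 3.0, 3.0, 5.0])]
results = {}
for mode in ("none", "partial", "all"):
    avg = RunningAverage(shape=(4,), dropnan=mode)
    for frame in frames:
        avg.update(frame)
    results[mode] = avg.get()
```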

TimeCache

TimeCache(cache_size, shape, dtype=np.float64, default_val=np.nan)

Cache previous N frames of trajectory data for correlation calculations.

Uses an in-place ring buffer for O(1) per update (no array allocation).

Parameters:

Name Type Description Default
cache_size int

Number of frames to cache (maximum time lag)

required
shape tuple[int, ...]

Shape of data arrays to cache (e.g., (n_atoms, 3) for coordinates)

required
dtype dtype | type

Data type for cached arrays

float64
default_val float

Default value to fill cache initially (default: NaN)

nan

Examples:

>>> cache = TimeCache(cache_size=100, shape=(10, 3))
>>> coords = np.random.randn(10, 3)
>>> cache.update(coords)
>>> cached_data = cache.get()  # Shape: (100, 10, 3)

cache property
cache

Return data ordered newest-first (for backward compatibility).

get
get()

Get cached data array, ordered newest-first.

Returns:

Type Description
NDArray

Cached data with shape (cache_size, *data_shape)

reset
reset()

Reset cache to initial state.

update
update(new_data)

Add new frame to cache (O(1) in-place write).

Parameters:

Name Type Description Default
new_data NDArray

New data array to add (shape must match self.shape)

required
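The ring-buffer idea behind O(1) updates can be sketched in a few lines: new frames overwrite the oldest slot, and reading reorders the slots newest-first. `RingCache` is a hypothetical illustration of that technique, not TimeCache's code, and its internal attribute names are assumptions.

```python
import numpy as np


class RingCache:
    """Hypothetical fixed-size frame cache backed by a ring buffer."""

    def __init__(self, cache_size, shape, default_val=np.nan):
        self._buf = np.full((cache_size, *shape), default_val, dtype=float)
        self._size = cache_size
        self._head = 0  # slot that the next update will overwrite

    def update(self, new_data):
        # In-place write; no array is allocated or shifted.
        self._buf[self._head] = new_data
        self._head = (self._head + 1) % self._size

    def get(self):
        # Walk backwards from the most recently written slot.
        order = (self._head - 1 - np.arange(self._size)) % self._size
        return self._buf[order]


cache = RingCache(cache_size=3, shape=(1,))
cache.update([1.0])
after_one = cache.get()[:, 0].copy()   # newest first, unfilled slots stay NaN
for value in (2.0, 3.0, 4.0):
    cache.update([value])
after_four = cache.get()[:, 0].copy()  # the oldest frame (1.0) was overwritten
```

With data ordered newest-first, row k of `get()` is the frame from k steps ago, which is the pairing a lagged correlation needs.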

compute_acf

compute_acf(data, cache_size, dropnan='partial')

Compute autocorrelation function over trajectory.

Calculates: <x_i(t+dt) . x_i(t)>_{i,t}

The particle dimension is averaged, and the time dimension is accumulated using a rolling cache to compute correlations at different time lags.

Parameters:

Name Type Description Default
data NDArray

Trajectory data with shape (n_frames, n_particles, n_dim)

required
cache_size int

Maximum time lag (dt) to compute, in frames

required
dropnan Literal['none', 'partial', 'all']

How to handle NaN values in averaging

'partial'

Returns:

Type Description
NDArray

ACF array with shape (cache_size,) containing ACF at each time lag

Examples:

>>> # Velocity autocorrelation
>>> n_frames, n_particles = 1000, 100
>>> velocities = np.random.randn(n_frames, n_particles, 3)
>>> acf = compute_acf(velocities, cache_size=100)

compute_msd

compute_msd(data, cache_size, dropnan='partial')

Compute mean squared displacement over trajectory.

Calculates: <(r_i(t+dt) - r_i(t))^2>_{i,t}

The particle dimension is averaged, and the time dimension is accumulated using a rolling cache to compute correlations at different time lags.

Parameters:

Name Type Description Default
data NDArray

Trajectory data with shape (n_frames, n_particles, n_dim)

required
cache_size int

Maximum time lag (dt) to compute, in frames

required
dropnan Literal['none', 'partial', 'all']

How to handle NaN values in averaging

'partial'

Returns:

Type Description
NDArray

MSD array with shape (cache_size,) containing MSD at each time lag
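An MSD curve is usually turned into a diffusion coefficient by fitting its linear regime, using MSD(dt) ≈ 2 · n_dim · D · dt. The sketch below shows that fit on a synthetic random walk. `msd_direct` is a hypothetical helper standing in for compute_msd, index k is assumed to mean a lag of k frames, and D is expressed per frame because no time step is defined here.

```python
import numpy as np


def msd_direct(data, max_lag):
    """Hypothetical direct-sum MSD for data of shape (n_frames, n_particles, n_dim)."""
    n_frames = data.shape[0]
    return np.array([
        np.mean(np.sum((data[lag:] - data[:n_frames - lag]) ** 2, axis=-1))
        for lag in range(max_lag)
    ])


# 1D random walk with unit-variance steps, seeded for reproducibility.
rng = np.random.default_rng(0)
n_frames, n_particles, n_dim = 1000, 100, 1
positions = np.cumsum(rng.standard_normal((n_frames, n_particles, n_dim)), axis=0)

max_lag = 20
msd_values = msd_direct(positions, max_lag)

# Fit MSD = slope * lag + intercept, skipping the trivial lag-0 point.
lags = np.arange(1, max_lag)
slope, intercept = np.polyfit(lags, msd_values[1:], 1)

# Einstein relation: MSD = 2 * n_dim * D * lag, so D = slope / (2 * n_dim).
diffusion_per_frame = slope / (2 * n_dim)
```

For unit-variance steps the slope should be close to 1, giving D close to 0.5 per frame. With real trajectories, multiply the lag axis by the frame spacing before fitting.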

Examples:

>>> # Simple 1D random walk
>>> n_frames, n_particles = 1000, 100
>>> positions = np.cumsum(np.random.randn(n_frames, n_particles, 1), axis=0)
>>> msd = compute_msd(positions, cache_size=100)
>>> # MSD should grow linearly with time for random walk

RDKit Tools

rdkit

RDKit-based molecular operations for RDKitAdapter.

This module provides frozen-dataclass tools that operate on RDKitAdapter instances. Generate3D inherits Tool and is auto-registered in ToolRegistry. OptimizeGeometry is an internal helper (not a Tool).

Generate3D dataclass

Generate3D(add_hydrogens=True, sanitize=True, embed=True, optimize=True, max_embed_attempts=10, embed_random_seed=0, max_opt_iters=200, forcefield='UFF', update_internal=True)

Bases: Tool

RDKit-based 3D generation pipeline for RDKitAdapter.

Pipeline stages (each optional):

  1. Add explicit hydrogens
  2. Sanitize molecule
  3. Generate 3D coordinates via embedding
  4. Optimize geometry with force field

Attributes:

Name Type Description
add_hydrogens bool

Whether to add explicit hydrogens before embedding

sanitize bool

Whether to sanitize the molecule

embed bool

Whether to perform 3D coordinate embedding

optimize bool

Whether to optimize geometry after embedding

max_embed_attempts int

Maximum number of embedding attempts

embed_random_seed int | None

Random seed for embedding (None for random)

max_opt_iters int

Maximum optimization iterations

forcefield str

Force field to use ("UFF" or "MMFF94")

update_internal bool

Whether to sync internal structure after modifications
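The four stages map onto a handful of plain RDKit calls. The sketch below applies them to a SMILES string rather than to an RDKitAdapter, so it illustrates the stages only and does not reflect how Generate3D is implemented. `embed_smiles` and its parameters are hypothetical, and the import guard exists only so the snippet degrades gracefully when RDKit is missing.

```python
def embed_smiles(smiles, add_hydrogens=True, optimize=True,
                 random_seed=0, max_opt_iters=200, forcefield="UFF"):
    """Hypothetical helper: SMILES -> RDKit Mol with one 3D conformer."""
    from rdkit import Chem                 # raises ImportError without RDKit
    from rdkit.Chem import AllChem

    mol = Chem.MolFromSmiles(smiles)       # parsing also sanitizes by default
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    if add_hydrogens:
        mol = Chem.AddHs(mol)              # explicit hydrogens before embedding
    # RDKit uses -1 for "no fixed seed"; None is mapped onto that here.
    seed = -1 if random_seed is None else random_seed
    if AllChem.EmbedMolecule(mol, randomSeed=seed) == -1:
        raise RuntimeError("3D embedding failed")
    if optimize:
        if forcefield == "UFF":
            AllChem.UFFOptimizeMolecule(mol, maxIters=max_opt_iters)
        elif forcefield == "MMFF94":
            AllChem.MMFFOptimizeMolecule(mol, mmffVariant="MMFF94",
                                         maxIters=max_opt_iters)
        else:
            raise ValueError(f"unsupported force field: {forcefield!r}")
    return mol


try:
    ethanol = embed_smiles("CCO")
    n_atoms = ethanol.GetNumAtoms()        # heavy atoms plus explicit hydrogens
    n_conformers = ethanol.GetNumConformers()
except ImportError:
    n_atoms = None
    n_conformers = None
```

Adding hydrogens before embedding generally gives more reasonable geometries than embedding heavy atoms alone, which is why that stage comes first.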

Examples:

>>> op = Generate3D(add_hydrogens=True, embed=True, optimize=True)
>>> result_adapter = op(adapter)

OptimizeGeometry dataclass

OptimizeGeometry(max_opt_iters=200, forcefield='UFF', update_internal=True, raise_on_failure=False)

RDKit-based geometry optimization for RDKitAdapter.

Attributes:

Name Type Description
max_opt_iters int

Maximum optimization iterations

forcefield str

Force field to use ("UFF" or "MMFF94")

update_internal bool

Whether to sync internal structure after optimization

raise_on_failure bool

Whether to raise exception on optimization failure

Examples:

>>> optimizer = OptimizeGeometry(forcefield="UFF", max_opt_iters=200)
>>> result_adapter = optimizer(adapter)