Tool¶
High-level packaged recipes and analysis operations.
Quick reference¶
Recipes (polymer building)¶
| Symbol | Summary | Preferred for |
|---|---|---|
| PrepareMonomer | BigSMILES → 3D Atomistic with ports and topology | Monomer template creation |
| polymer(notation) | Auto-detect notation → built chain | Quick single-chain building |
| polymer_system(gbigsmiles) | G-BigSMILES → polydisperse chain list | Multi-chain system building |
| generate_3d(mol) | Add 3D coordinates via RDKit | Coordinate generation |
Analysis (compute operations)¶
| Symbol | Summary | Preferred for |
|---|---|---|
| MSD | Mean squared displacement | Diffusion analysis |
| DisplacementCorrelation | Cross-displacement correlation | Ion transport correlations |
| compute_msd(positions) | Convenience function for MSD | Quick MSD calculation |
| compute_acf(series) | Autocorrelation function | Time-series analysis |
Canonical examples¶
from molpy.tool import PrepareMonomer, polymer, generate_3d
# Prepare one monomer
prep = PrepareMonomer()
eo = prep.run("{[<]CCO[>]}")
# Build a chain (auto-detects notation)
chain = polymer("{[<]CCO[>]}|10|")
# Generate 3D coordinates for an existing molecule `mol` (requires RDKit)
mol_3d = generate_3d(mol, add_hydrogens=True)
from molpy.tool import MSD
msd = MSD(max_lag=3000)
msd_values = msd(unwrapped_positions) # shape (max_lag,)
Key behavior¶
- Tool subclasses are frozen dataclasses: configuration at init, execution via .run()
- Compute subclasses are callable: compute(data) or compute.run(data)
- generate_3d requires RDKit; raises ImportError if not installed
Full API¶
Base¶
base ¶
Base abstractions for computation and tool operations.
Provides:
- Compute: frozen-dataclass ABC for analysis operations (MSD, correlations)
- Tool: frozen-dataclass ABC for executable tools (builders, transforms)
- ToolRegistry: auto-discovery registry for Tool subclasses
Tool
dataclass
¶
Bases: ABC
Base class for executable tools (builders, transforms).
Concrete subclasses are auto-registered in ToolRegistry and
discovered by the MCP server. Unlike Compute (analysis-only),
Tool is intended for molecular operations that produce or
transform structures.
Usage::

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class MyTool(Tool):
        param: int = 10

        def run(self, input: str) -> dict:
            return {"result": input, "param": self.param}

    tool = MyTool(param=5)
    result = tool("hello")  # delegates to run()
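The pattern described above (frozen dataclass, `run()` as the core method, call delegation, auto-registration of subclasses) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not MolPy's implementation: `SketchTool`, its `registry` dict, and `Repeat` are hypothetical names, and registration through `__init_subclass__` is only one plausible way a registry could discover subclasses.

```python
from abc import ABC, abstractmethod
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, ClassVar


@dataclass(frozen=True)
class SketchTool(ABC):
    """Hypothetical stand-in for Tool; not MolPy's actual implementation."""

    # Assumed registry: maps class name -> class (stands in for ToolRegistry).
    registry: ClassVar[dict] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        # Runs once per subclass definition, which gives auto-registration.
        super().__init_subclass__(**kwargs)
        SketchTool.registry[cls.__name__] = cls

    @abstractmethod
    def run(self, *args: Any, **kwargs: Any) -> Any:
        """Core logic, implemented by concrete subclasses."""

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Calling the instance delegates to run().
        return self.run(*args, **kwargs)


@dataclass(frozen=True)
class Repeat(SketchTool):
    times: int = 2  # configuration is fixed at construction time

    def run(self, text: str) -> str:
        return text * self.times


tool = Repeat(times=3)
print(tool("ab"))                   # same result as tool.run("ab")
print(sorted(SketchTool.registry))  # names of registered subclasses
```

Because the dataclass is frozen, assigning `tool.times = 5` after construction raises `FrozenInstanceError`; a different configuration means constructing a new instance.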
Compute
dataclass
¶
Bases: ABC
Base class for analysis operations (MSD, correlations).
Duck-type compatible with pydantic-graph BaseNode:
- Dataclass fields = configuration parameters (like Node fields).
- run() method = core logic (like Node.run()).
- get_node_id() classmethod = unique identifier.
Usage::
msd = MSD(max_lag=3000)
result = msd(positions) # __call__ delegates to run()
Polymer Tools¶
polymer ¶
Polymer building tools.
Recipes that wrap the parser, adapter, builder, and reacter modules into single-call operations for common polymer construction tasks.
Tools (auto-registered in ToolRegistry):
- PrepareMonomer — BigSMILES → 3D Atomistic with ports
- BuildPolymer — CGSmiles + library → assembled chain
- PlanSystem — distribution parameters → chain plan (no atoms)
- BuildSystem — G-BigSMILES → list of built chains
Convenience functions:
- polymer() — auto-detect notation, build single chain
- polymer_system() — G-BigSMILES → multi-chain system
BuildPolymer
dataclass
¶
Bases: Tool
Build a polymer chain from CGSmiles notation and a monomer library.
Preferred for
- Assembling a single chain from pre-prepared monomers.
- Iterating over a system plan to build chains one at a time.
Avoid when
- You want end-to-end build from a string (use polymer() or BuildSystem).
- You need custom reaction logic (use PolymerBuilder directly).
Attributes:
| Name | Type | Description |
|---|---|---|
| reaction_preset | str | Name of reaction preset (default …). |
| use_placer | bool | Enable geometric placement of monomers. |
run ¶
Build a polymer chain.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cgsmiles | str | CGSmiles notation (e.g. …). | required |
| library | dict[str, Atomistic] | Mapping from label to prepared Atomistic monomer. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dict with … and … |
BuildPolymerAmber
dataclass
¶
BuildPolymerAmber(reaction_preset='dehydration', force_field='gaff2', charge_method='bcc', conda_env=None, work_dir='amber_work')
Bases: Tool
Build a polymer chain using the AmberTools backend.
Uses antechamber, parmchk2, prepgen, and tleap to assemble a polymer from a CGSmiles string and a monomer library. Returns both MolPy structures and AMBER topology/coordinate files.
Preferred for
- Polymer systems that need AMBER force field parameters (GAFF/GAFF2).
- Workflows that feed into AMBER or LAMMPS with AMBER-style inputs.
Avoid when
- You do not need force field parameters (use BuildPolymer).
- AmberTools is not installed.
Attributes:
| Name | Type | Description |
|---|---|---|
| reaction_preset | str \| None | Named preset for leaving group detection. When None, hydrogen atoms bonded to port atoms are auto-detected. |
| force_field | str | Amber force field (…). |
| charge_method | str | Antechamber charge method. |
| conda_env | str \| None | Conda environment containing AmberTools. |
| work_dir | str | Directory for intermediate files. |
run ¶
Build a polymer using AmberTools.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cgsmiles | str | CGSmiles notation (e.g. …). | required |
| library | dict[str, Atomistic] | Mapping from label to prepared Atomistic monomer. Each monomer must have … | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dict with … |
BuildSystem
dataclass
¶
Bases: Tool
End-to-end polymer system construction from G-BigSMILES.
Parses a G-BigSMILES string and delegates to the GBigSmilesCompiler to produce a list of Atomistic chains.
Preferred for
- Building a complete polydisperse system in one call.
- When you do not need to inspect the system plan before building.
Avoid when
- You need to inspect or modify the plan first (use PlanSystem + BuildPolymer).
- You need the Amber backend (use BuildPolymerAmber).
Attributes:
| Name | Type | Description |
|---|---|---|
| reaction_preset | str | Name of reaction preset. |
| add_hydrogens | bool | Add explicit hydrogens during monomer preparation. |
| optimize | bool | Optimize monomer geometry. |
| random_seed | int \| None | Random seed for reproducibility. |
PlanSystem
dataclass
¶
Bases: Tool
Plan a polydisperse polymer system from distribution parameters.
Returns chain specifications (DP, monomer sequence, mass) without creating any atoms. Use this to validate distribution parameters before committing to an expensive build.
Preferred for
- Previewing system composition before building.
- Iterating on distribution parameters cheaply.
Avoid when
- You want chains built directly (use BuildSystem or polymer_system).
Attributes:
| Name | Type | Description |
|---|---|---|
| random_seed | int \| None | Random seed for reproducibility. |
run ¶
run(monomer_weights, monomer_mass, distribution_type, distribution_params, target_total_mass, end_group_mass=0.0, max_rel_error=0.02)
Plan a polydisperse polymer system.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| monomer_weights | dict[str, float] | Weight fractions for each monomer label. | required |
| monomer_mass | dict[str, float] | Molar mass (g/mol) per monomer label. | required |
| distribution_type | str | Distribution name (e.g. …). | required |
| distribution_params | dict[str, float] | Distribution parameters as … | required |
| target_total_mass | float | Target total system mass (g/mol). | required |
| end_group_mass | float | Mass of end groups per chain (g/mol). | 0.0 |
| max_rel_error | float | Maximum relative error for total mass. | 0.02 |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dict with … and … |
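To make the idea of planning without building concrete, here is a minimal sketch of how chain specifications could be drawn until a target mass is met. It is not the PlanSystem algorithm: `plan_chains` and `dp_range` are hypothetical, and a uniform draw of the degree of polymerization stands in for whatever `distribution_type` and `distribution_params` select. The only parts taken from the documented interface are the weight fractions, per-monomer masses, end-group mass, target mass, and relative-error tolerance.

```python
import random


def plan_chains(monomer_weights, monomer_mass, dp_range, target_total_mass,
                end_group_mass=0.0, max_rel_error=0.02, seed=None):
    """Draw chain specs (no atoms) until total mass is within tolerance."""
    rng = random.Random(seed)
    labels = sorted(monomer_weights)
    # Weight fractions become number fractions by dividing by molar mass.
    number_weights = [monomer_weights[k] / monomer_mass[k] for k in labels]

    lower = target_total_mass * (1.0 - max_rel_error)
    upper = target_total_mass * (1.0 + max_rel_error)
    chains, total = [], 0.0
    attempts = 0
    while total < lower and attempts < 100_000:
        attempts += 1
        dp = rng.randint(*dp_range)  # assumed uniform DP, inclusive bounds
        sequence = rng.choices(labels, weights=number_weights, k=dp)
        mass = sum(monomer_mass[m] for m in sequence) + end_group_mass
        if total + mass > upper:
            continue  # this chain would overshoot the tolerance window
        chains.append({"dp": dp, "sequence": sequence, "mass": mass})
        total += mass
    return {"chains": chains, "total_mass": total}


plan = plan_chains(
    monomer_weights={"EO": 0.7, "PO": 0.3},
    monomer_mass={"EO": 44.05, "PO": 58.08},
    dp_range=(5, 50),
    target_total_mass=1.0e5,
    end_group_mass=18.02,
    seed=42,
)
print(len(plan["chains"]), round(plan["total_mass"], 1))
```

Since only labels and masses are handled, a plan like this is cheap to regenerate while tuning parameters, which is the use case the section describes.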
PrepareMonomer
dataclass
¶
Bases: Tool
Parse a BigSMILES monomer string and produce an Atomistic structure.
Pipeline: parse BigSMILES → convert to Atomistic with port markers → generate 3D coordinates via RDKit (if available) → compute angles/dihedrals.
Preferred for
- Preparing monomers for BuildPolymer or polymer().
- One-step SMILES-to-3D when you need port annotations.
Avoid when
- You already have an Atomistic struct (use RDKit adapter directly).
- You need custom 3D embedding parameters (use Generate3D).
Attributes:
| Name | Type | Description |
|---|---|---|
| add_hydrogens | bool | Add explicit hydrogens during 3D generation. |
| optimize | bool | Optimize geometry after 3D embedding. |
| gen_topology | bool | Compute angles and dihedrals. |
polymer ¶
polymer(spec, *, library=None, reaction_preset='dehydration', use_placer=True, add_hydrogens=True, optimize=True, random_seed=None, backend='default', amber_config=None)
Build a single polymer chain from a string specification.
Auto-detects notation type (for the default backend):
- G-BigSMILES (contains | annotation): polymer("{[<]CCOCC[>]}|10|")
- CGSmiles + inline fragments (contains .{#): polymer("{[#EO]|10}.{#EO=[<]COC[>]}")
- Pure CGSmiles (requires library kwarg): polymer("{[#EO]|10}", library={"EO": eo_monomer})
For the Amber backend:
polymer("{[#EO]|10}", library={"EO": eo}, backend="amber")
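The three notation cases listed above can be told apart with simple substring checks. The sketch below is an assumption about how such detection might work, not the logic inside polymer(): `detect_notation` is a hypothetical helper, and the check order is chosen because a CGSmiles string such as `{[#EO]|10}` also contains a `|` character, so the G-BigSMILES test looks for an annotation that follows a closing brace.

```python
def detect_notation(spec: str) -> str:
    """Classify a polymer specification string (hypothetical helper)."""
    if ".{#" in spec:
        # Fragment definitions are attached after the graph part.
        return "cgsmiles+fragments"
    if "[#" in spec:
        # Node labels such as [#EO] with no inline fragments: needs a library.
        return "cgsmiles"
    if "}|" in spec:
        # A |...| annotation directly after the stochastic object.
        return "gbigsmiles"
    raise ValueError(f"unrecognized polymer notation: {spec!r}")


for example in (
    "{[<]CCOCC[>]}|10|",
    "{[#EO]|10}.{#EO=[<]COC[>]}",
    "{[#EO]|10}",
):
    print(example, "->", detect_notation(example))
```

Testing the most specific marker first keeps the three documented examples unambiguous; real inputs may need a proper parser rather than substring checks.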
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| spec | str | Polymer specification string. | required |
| library | Mapping[str, Atomistic] \| None | Monomer library (required for pure CGSmiles and Amber). | None |
| reaction_preset | str | Reaction preset name. | 'dehydration' |
| use_placer | bool | Enable geometric placement (default backend only). | True |
| add_hydrogens | bool | Add hydrogens during 3D generation. | True |
| optimize | bool | Optimize geometry. | True |
| random_seed | int \| None | Random seed for reproducibility. | None |
| backend | Backend | Builder backend: … | 'default' |
| amber_config | Any | Optional … | None |
Returns:
| Type | Description |
|---|---|
| Atomistic \| Any | Atomistic (default backend) or AmberBuildResult (amber backend). |
polymer_system ¶
polymer_system(spec, *, reaction_preset='dehydration', add_hydrogens=True, optimize=True, random_seed=None)
Build a multi-chain polymer system from G-BigSMILES.
Example::
chains = polymer_system(
"{[<]CCOCC[>]}|schulz_zimm(1500,3000)||5e5|",
random_seed=42,
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| spec | str | G-BigSMILES specification string. | required |
| reaction_preset | str | Reaction preset name. | 'dehydration' |
| add_hydrogens | bool | Add hydrogens during 3D generation. | True |
| optimize | bool | Optimize geometry. | True |
| random_seed | int \| None | Random seed for reproducibility. | None |
Returns:
| Type | Description |
|---|---|
| list[Atomistic] | List of Atomistic structures (one per chain). |
MSD¶
msd ¶
Mean Squared Displacement computation.
Operates on plain NDArrays — no trajectory coupling.
MSD
dataclass
¶
Bases: Compute
Compute mean squared displacement at each time lag.
MSD(dt) = <(r_i(t+dt) - r_i(t))^2>_{i, t}
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_lag | int | Maximum time lag in frames. | required |
Examples:
Self-diffusion::
cation_coords = unwrapped[:, cation_mask, :] # (n_frames, n_cations, 3)
msd = MSD(max_lag=3000)
msd_values = msd(cation_coords) # -> NDArray (max_lag,)
Polarization MSD (no dedicated class needed)::
polarization = (
coords[:, cat_mask, :].sum(axis=1)
- coords[:, an_mask, :].sum(axis=1)
) # (n_frames, 3)
pmsd_values = msd(polarization[:, None, :]) # -> NDArray (max_lag,)
run ¶
Compute MSD from positions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| positions | NDArray | Coordinate array with shape … | required |
Returns:
| Type | Description |
|---|---|
| NDArray | MSD values at each time lag, shape … |
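For checking results on small inputs, the formula MSD(dt) = <(r_i(t+dt) - r_i(t))^2>_{i, t} can be evaluated directly with nested loops. The following dependency-free sketch is a reference implementation of that formula, not the MSD class itself; `msd_reference` is a hypothetical name, positions are nested lists indexed as [frame][particle][dimension], and the lag convention (index k holds lag k, starting at lag 0) is an assumption.

```python
def msd_reference(positions, max_lag):
    """Direct evaluation of MSD(dt), averaged over particles and time origins."""
    n_frames = len(positions)
    result = []
    for lag in range(max_lag):  # assumed convention: lags 0 .. max_lag - 1
        acc, count = 0.0, 0
        for t in range(n_frames - lag):
            for r0, r1 in zip(positions[t], positions[t + lag]):
                # Squared displacement of one particle between t and t + lag.
                acc += sum((b - a) ** 2 for a, b in zip(r0, r1))
                count += 1
        result.append(acc / count if count else float("nan"))
    return result


# Two particles in ballistic motion along x with speeds 1 and 2 per frame,
# so MSD(lag) should equal (1 + 4) / 2 * lag**2.
positions = [[[1.0 * t, 0.0, 0.0], [2.0 * t, 0.0, 0.0]] for t in range(6)]
print(msd_reference(positions, max_lag=4))
```

The cost grows with n_frames × max_lag × n_particles, so this form is only suitable as a cross-check on short trajectories.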
Cross-Displacement Correlation¶
cross_correlation ¶
Cross-displacement correlation computation.
Operates on plain NDArrays — no trajectory coupling.
DisplacementCorrelation
dataclass
¶
Bases: Compute
Compute cross-displacement correlation between two groups.
For two groups A and B the correlation at time lag dt is:
C(dt) = <sum_i dr_i^A(dt) . sum_j dr_j^B(dt)> / N_A
where dr_i(dt) = r_i(t+dt) - r_i(t).
When exclude_self=True and both inputs are the same species,
the self-terms are subtracted so only distinct correlations remain:
C_distinct(dt) = <dr_i . (sum_j dr_j - dr_i)>_{i, t}
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| max_lag | int | Maximum time lag in frames. | required |
| exclude_self | bool | If True, subtract self-correlation (for same-species distinct diffusion). | False |
Examples:
Cross-species (cation-anion)::
xdc = DisplacementCorrelation(max_lag=3000)
corr = xdc(cation_coords, anion_coords) # -> NDArray (max_lag,)
Same-species distinct (exclude self-correlation)::
xdc = DisplacementCorrelation(max_lag=3000, exclude_self=True)
corr = xdc(cation_coords, cation_coords) # -> NDArray (max_lag,)
run ¶
Compute displacement correlation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| positions_a | NDArray | Coordinates of group A, shape … | required |
| positions_b | NDArray | Coordinates of group B, shape … | required |
Returns:
| Type | Description |
|---|---|
| NDArray | Correlation values at each time lag, shape … |
displacement_correlation ¶
Compute displacement correlation.
Shorthand for DisplacementCorrelation(max_lag=max_lag, exclude_self=exclude_self)(positions_a, positions_b).
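The two formulas given for DisplacementCorrelation, C(dt) and C_distinct(dt), can be written out directly as a cross-check. This is a plain-Python sketch under stated assumptions rather than the library code: `displacement_correlation_reference` is a hypothetical name, inputs are nested lists indexed as [frame][particle][dimension], index k of the result holds lag k starting at 0, and with `exclude_self=True` both inputs are assumed to be the same species.

```python
def _displacements(frame0, frame1):
    """Per-particle displacement vectors between two frames."""
    return [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(frame0, frame1)]


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def displacement_correlation_reference(positions_a, positions_b, max_lag,
                                       exclude_self=False):
    n_frames = len(positions_a)
    n_a = len(positions_a[0])
    result = []
    for lag in range(max_lag):  # assumed convention: lags 0 .. max_lag - 1
        acc, n_origins = 0.0, 0
        for t in range(n_frames - lag):
            dr_a = _displacements(positions_a[t], positions_a[t + lag])
            dr_b = _displacements(positions_b[t], positions_b[t + lag])
            sum_a = [sum(col) for col in zip(*dr_a)]  # sum_i dr_i^A
            sum_b = [sum(col) for col in zip(*dr_b)]  # sum_j dr_j^B
            value = _dot(sum_a, sum_b)
            if exclude_self:
                # Remove the i == j terms so only distinct pairs remain.
                value -= sum(_dot(d, d) for d in dr_a)
            acc += value / n_a
            n_origins += 1
        result.append(acc / n_origins if n_origins else float("nan"))
    return result


# One cation moving in +x and one anion moving in -x at unit speed.
cation = [[[1.0 * t, 0.0, 0.0]] for t in range(4)]
anion = [[[-1.0 * t, 0.0, 0.0]] for t in range(4)]
print(displacement_correlation_reference(cation, anion, max_lag=3))
```

For oppositely moving particles the correlation is negative and grows as lag squared, which is a quick sanity check on the sign convention.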
Time Series¶
time_series ¶
Time-series analysis operations for trajectory data.
This module provides utilities for computing time-correlation functions, mean squared displacements, and other time-series statistics commonly used in molecular dynamics trajectory analysis.
Adapted from the tame library (https://github.com/Roy-Kid/tame).
TimeAverage ¶
Compute running time average with NaN handling.
This class accumulates data over time and computes the average, with options for handling NaN values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| shape | tuple[int, ...] | Shape of data arrays to average | required |
| dtype | dtype \| type | Data type for accumulated arrays | float64 |
| dropnan | Literal['none', 'partial', 'all'] | How to handle NaN values: 'none' includes NaN values in the average (result may be NaN); 'partial' ignores individual NaN entries; 'all' skips the entire frame if any NaN is present | 'partial' |
Examples:
>>> import numpy as np
>>> avg = TimeAverage(shape=(4,), dropnan='partial')
>>> avg.update(np.array([1.0, 2.0, np.nan, 4.0]))
>>> avg.update(np.array([2.0, 3.0, 3.0, 5.0]))
>>> result = avg.get()  # [1.5, 2.5, 3.0, 4.5]
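The three `dropnan` modes differ only in what happens when a frame contains NaN. A small standard-library sketch of that behavior follows; `RunningAverage` is a hypothetical 1D stand-in written for illustration, not the TimeAverage implementation, and it keeps a per-entry sum and count instead of NumPy arrays.

```python
import math


class RunningAverage:
    """Hypothetical 1D running average with the three NaN policies."""

    def __init__(self, size, dropnan="partial"):
        if dropnan not in ("none", "partial", "all"):
            raise ValueError(f"unknown dropnan mode: {dropnan!r}")
        self.dropnan = dropnan
        self.total = [0.0] * size
        self.count = [0] * size

    def update(self, values):
        if self.dropnan == "all" and any(math.isnan(v) for v in values):
            return  # skip the whole frame
        for i, v in enumerate(values):
            if self.dropnan == "partial" and math.isnan(v):
                continue  # skip only this entry
            self.total[i] += v  # in "none" mode NaN propagates into the sum
            self.count[i] += 1

    def get(self):
        return [t / c if c else math.nan for t, c in zip(self.total, self.count)]


frames = [[1.0, 2.0, math.nan, 4.0], [2.0, 3.0, 3.0, 5.0]]
for mode in ("none", "partial", "all"):
    avg = RunningAverage(size=4, dropnan=mode)
    for frame in frames:
        avg.update(frame)
    print(mode, avg.get())
```

With per-entry counts, 'partial' averages each column over only the frames where that column was valid, which is why the third entry stays finite.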
TimeCache ¶
Cache previous N frames of trajectory data for correlation calculations.
Uses an in-place ring buffer for O(1) per update (no array allocation).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cache_size | int | Number of frames to cache (maximum time lag) | required |
| shape | tuple[int, ...] | Shape of data arrays to cache (e.g., (n_atoms, 3) for coordinates) | required |
| dtype | dtype \| type | Data type for cached arrays | float64 |
| default_val | float | Default value to fill cache initially (default: NaN) | nan |
Examples:
>>> cache = TimeCache(cache_size=100, shape=(10, 3))
>>> coords = np.random.randn(10, 3)
>>> cache.update(coords)
>>> cached_data = cache.get() # Shape: (100, 10, 3)
get ¶
Get cached data array, ordered newest-first.
Returns:
| Type | Description |
|---|---|
| NDArray | Cached data with shape (cache_size, *data_shape) |
update ¶
Add new frame to cache (O(1) in-place write).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| new_data | NDArray | New data array to add (shape must match self.shape) | required |
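The ring-buffer idea behind update() and get() can be shown with a plain list: each update overwrites one slot and moves a head index, and reading walks backwards from the head so the newest frame comes first. `RingCache` below is a hypothetical sketch of that technique, not the TimeCache code, and it stores arbitrary Python objects rather than a preallocated NumPy array.

```python
class RingCache:
    """Fixed-size cache with O(1) in-place updates (illustrative sketch)."""

    def __init__(self, cache_size, default=float("nan")):
        self._buf = [default] * cache_size  # allocated once, never resized
        self._head = -1                     # index of the newest entry

    def update(self, item):
        # Advance the head and overwrite the oldest slot in place.
        self._head = (self._head + 1) % len(self._buf)
        self._buf[self._head] = item

    def get(self):
        # Walk backwards from the head: newest first, oldest last.
        n = len(self._buf)
        return [self._buf[(self._head - k) % n] for k in range(n)]


cache = RingCache(cache_size=3)
for value in (1, 2, 3, 4):
    cache.update(value)
print(cache.get())  # the three most recent values, newest first
```

In this sketch only get() does work proportional to the cache size; update() touches a single slot regardless of how many frames are cached.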
compute_acf ¶
Compute autocorrelation function over trajectory.
Calculates:
The particle dimension is averaged, and the time dimension is accumulated using a rolling cache to compute correlations at different time lags.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | NDArray | Trajectory data with shape (n_frames, n_particles, n_dim) | required |
| cache_size | int | Maximum time lag (dt) to compute, in frames | required |
| dropnan | Literal['none', 'partial', 'all'] | How to handle NaN values in averaging | 'partial' |
Returns:
| Type | Description |
|---|---|
| NDArray | ACF array with shape (cache_size,) containing ACF at each time lag |
Examples:
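As an illustration only, the sketch below evaluates one common definition of a vector autocorrelation function, the unnormalized average of v_i(t) · v_i(t+dt) over particles and time origins. Whether compute_acf uses exactly this definition (for example, whether it normalizes by the lag-0 value) is not stated here, so treat the formula, the name `acf_reference`, and the lag convention (index k holds lag k) as assumptions.

```python
def acf_reference(data, max_lag):
    """Unnormalized ACF: mean of v_i(t) . v_i(t + lag) over i and t."""
    n_frames = len(data)
    result = []
    for lag in range(max_lag):
        acc, count = 0.0, 0
        for t in range(n_frames - lag):
            for v0, v1 in zip(data[t], data[t + lag]):
                acc += sum(a * b for a, b in zip(v0, v1))  # dot product
                count += 1
        result.append(acc / count if count else float("nan"))
    return result


# One particle whose x-component flips sign every frame.
series = [[[(-1.0) ** t, 0.0, 0.0]] for t in range(6)]
print(acf_reference(series, max_lag=3))
```

A signal that flips sign every frame gives an ACF that alternates in sign with the lag, which makes it a convenient test input.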
compute_msd ¶
Compute mean squared displacement over trajectory.
Calculates: <(r_i(t+dt) - r_i(t))^2>_{i,t}
The particle dimension is averaged, and the time dimension is accumulated using a rolling cache to compute correlations at different time lags.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | NDArray | Trajectory data with shape (n_frames, n_particles, n_dim) | required |
| cache_size | int | Maximum time lag (dt) to compute, in frames | required |
| dropnan | Literal['none', 'partial', 'all'] | How to handle NaN values in averaging | 'partial' |
Returns:
| Type | Description |
|---|---|
| NDArray | MSD array with shape (cache_size,) containing MSD at each time lag |
Examples:
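The rolling-cache accumulation described above can be sketched in a streaming style: frames are consumed one at a time, the most recent `cache_size` frames are kept, and each new frame is compared against every cached frame. This is an illustration, not compute_msd itself: `streaming_msd` is a hypothetical name, a `collections.deque` stands in for the in-place ring buffer, frames are nested lists indexed as [particle][dimension], and index k of the result is assumed to hold lag k.

```python
from collections import deque


def streaming_msd(frames, cache_size):
    """Accumulate MSD per lag while iterating over frames once."""
    cache = deque(maxlen=cache_size)  # newest first; oldest dropped when full
    sums = [0.0] * cache_size
    counts = [0] * cache_size
    for frame in frames:
        cache.appendleft(frame)
        for lag, old in enumerate(cache):  # lag 0 is the frame itself
            squared = sum(
                sum((b - a) ** 2 for a, b in zip(r_old, r_new))
                for r_old, r_new in zip(old, frame)
            )
            sums[lag] += squared / len(frame)  # average over particles
            counts[lag] += 1                   # one more time origin
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Two particles moving along x with speeds 1 and 2 per frame.
trajectory = ([[1.0 * t, 0.0, 0.0], [2.0 * t, 0.0, 0.0]] for t in range(6))
print(streaming_msd(trajectory, cache_size=4))
```

Because `frames` is consumed lazily, the whole trajectory never needs to be held in memory; only `cache_size` frames are alive at any time.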
RDKit Tools¶
rdkit ¶
RDKit-based molecular operations for RDKitAdapter.
This module provides frozen-dataclass tools that operate on RDKitAdapter
instances. Generate3D inherits Tool and is auto-registered in
ToolRegistry. OptimizeGeometry is an internal helper (not a Tool).
Generate3D
dataclass
¶
Generate3D(add_hydrogens=True, sanitize=True, embed=True, optimize=True, max_embed_attempts=10, embed_random_seed=0, max_opt_iters=200, forcefield='UFF', update_internal=True)
Bases: Tool
RDKit-based 3D generation pipeline for RDKitAdapter.
Pipeline stages (each optional):
1. Add explicit hydrogens
2. Sanitize molecule
3. Generate 3D coordinates via embedding
4. Optimize geometry with force field
Attributes:
| Name | Type | Description |
|---|---|---|
| add_hydrogens | bool | Whether to add explicit hydrogens before embedding |
| sanitize | bool | Whether to sanitize the molecule |
| embed | bool | Whether to perform 3D coordinate embedding |
| optimize | bool | Whether to optimize geometry after embedding |
| max_embed_attempts | int | Maximum number of embedding attempts |
| embed_random_seed | int \| None | Random seed for embedding (None for random) |
| max_opt_iters | int | Maximum optimization iterations |
| forcefield | str | Force field to use ("UFF" or "MMFF94") |
| update_internal | bool | Whether to sync internal structure after modifications |
Examples:
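The four pipeline stages map onto standard RDKit calls. The sketch below uses plain RDKit on a SMILES string and bypasses RDKitAdapter entirely, so it shows the stages rather than how Generate3D is invoked; `embed_3d` and its parameters are hypothetical, only UFF is handled, and the import is deferred so that a missing RDKit surfaces as ImportError at call time.

```python
def embed_3d(smiles, add_hydrogens=True, optimize=True,
             max_embed_attempts=10, random_seed=0, max_opt_iters=200):
    """Hypothetical helper: SMILES -> RDKit Mol with one 3D conformer."""
    try:
        from rdkit import Chem
        from rdkit.Chem import AllChem
    except ImportError as exc:
        raise ImportError("RDKit is required for 3D generation") from exc

    # MolFromSmiles sanitizes by default, which covers the sanitize stage.
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    if add_hydrogens:
        mol = Chem.AddHs(mol)  # explicit hydrogens before embedding

    # EmbedMolecule returns the conformer id, or -1 when embedding fails.
    conf_id = AllChem.EmbedMolecule(
        mol, maxAttempts=max_embed_attempts, randomSeed=random_seed
    )
    if conf_id < 0:
        raise RuntimeError("3D embedding failed")

    if optimize:
        AllChem.UFFOptimizeMolecule(mol, maxIters=max_opt_iters)
    return mol
```

With RDKit installed, `embed_3d("CCO")` returns ethanol with explicit hydrogens and a single conformer; fixing `random_seed` makes the embedding reproducible across runs.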
OptimizeGeometry
dataclass
¶
RDKit-based geometry optimization for RDKitAdapter.
Attributes:
| Name | Type | Description |
|---|---|---|
| max_opt_iters | int | Maximum optimization iterations |
| forcefield | str | Force field to use ("UFF" or "MMFF94") |
| update_internal | bool | Whether to sync internal structure after optimization |
| raise_on_failure | bool | Whether to raise exception on optimization failure |
Examples:
>>> optimizer = OptimizeGeometry(forcefield="UFF", max_opt_iters=200)
>>> result_adapter = optimizer(adapter)