Compute & Result Pattern¶
This notebook explains MolPy's Compute & Result pattern for defining reusable, composable computational operations.
The pattern separates:
- Compute: The operation/algorithm (how to calculate)
- Result: The output data structure (what you get)
This separation provides:
- Type safety with generic types
- Reusable computation logic
- Structured, self-documenting results
- Easy testing and composition
When to Use This Pattern¶
| Use Case | Use Compute | Use Function | Use Method |
|---|---|---|---|
| Simple one-off calculation | ❌ | ✅ | ✅ |
| Reusable with configuration | ✅ | ⚠️ | ❌ |
| Complex multi-step algorithm | ✅ | ⚠️ | ❌ |
| Needs setup/cleanup | ✅ | ❌ | ❌ |
| Structured output | ✅ | ⚠️ | ⚠️ |
| Composable operations | ✅ | ⚠️ | ❌ |
| External dependencies | ✅ | ✅ | ❌ |
Key principle: Use Compute when you need configuration + reusability + structure. Use functions for simple calculations, methods for core data class operations.
Quick Start¶
Here's a minimal example showing the pattern in action:
from dataclasses import dataclass
import numpy as np
from molpy.compute import Compute, Result
from molpy.core import Frame, Block
# 1. Define the result
# kw_only=True (Python 3.10+) is needed because the base Result declares a
# defaulted `meta` field. A plain @dataclass subclass fails with
# "TypeError: non-default argument 'com' follows default argument 'meta'".
@dataclass(kw_only=True)
class CenterOfMassResult(Result):
    """Result of center of mass calculation."""
    com: np.ndarray  # Center of mass coordinates
    total_mass: float  # Total mass

# 2. Define the compute operation
class ComputeCenterOfMass(Compute[Frame, CenterOfMassResult]):
    """Compute center of mass for a frame."""

    def compute(self, input: Frame) -> CenterOfMassResult:
        """Calculate center of mass."""
        atoms = input["atoms"]
        positions = np.column_stack([atoms["x"], atoms["y"], atoms["z"]])
        # Fall back to unit masses when no "mass" column is present
        masses = np.array(atoms.get("mass", np.ones(len(atoms))))
        total_mass = float(masses.sum())
        # Mass-weighted average of the positions
        com = (positions * masses[:, np.newaxis]).sum(axis=0) / total_mass
        return CenterOfMassResult(com=com, total_mass=total_mass)

# 3. Use it
frame = Frame()
frame["atoms"] = Block({
    "x": [0.0, 1.0, 2.0],
    "y": [0.0, 0.0, 0.0],
    "z": [0.0, 0.0, 0.0],
    "mass": [1.0, 1.0, 1.0]
})
compute_com = ComputeCenterOfMass()
result = compute_com(frame)
print(f"Center of mass: {result.com}")
print(f"Total mass: {result.total_mass}")
The Pattern Architecture¶
Compute Base Class¶
All compute operations inherit from Compute[InT, OutT]:
from abc import ABC, abstractmethod

# PEP 695 generic class syntax (Python 3.12+)
class Compute[InT, OutT](ABC):
    """Abstract base class for compute operations."""

    def __call__(self, input: InT) -> OutT:
        """Execute the computation."""
        self.before(input)
        result = self.compute(input)
        self.after(input, result)
        return result

    @abstractmethod
    def compute(self, input: InT) -> OutT:
        """Core computation logic (must override)."""
        ...

    def before(self, input: InT) -> None:
        """Optional setup hook."""
        pass

    def after(self, input: InT, result: OutT) -> None:
        """Optional cleanup hook."""
        pass
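To make the call order concrete, here is a minimal, self-contained sketch. It uses a stand-in base class written with `typing.Generic`, so it needs neither MolPy nor the Python 3.12 class syntax. `ComputeSum` and its `log` attribute are hypothetical names introduced only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Generic, TypeVar

InT = TypeVar("InT")
OutT = TypeVar("OutT")


class Compute(ABC, Generic[InT, OutT]):
    """Stand-in with the same call flow as the base class above."""

    def __call__(self, input: InT) -> OutT:
        self.before(input)
        result = self.compute(input)
        self.after(input, result)
        return result

    @abstractmethod
    def compute(self, input: InT) -> OutT: ...

    def before(self, input: InT) -> None:
        pass

    def after(self, input: InT, result: OutT) -> None:
        pass


class ComputeSum(Compute[list[float], float]):
    """Hypothetical operation that records the order in which hooks fire."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def before(self, input: list[float]) -> None:
        self.log.append("before")

    def compute(self, input: list[float]) -> float:
        self.log.append("compute")
        return sum(input)

    def after(self, input: list[float], result: float) -> None:
        self.log.append("after")


op = ComputeSum()
total = op([1.0, 2.0, 3.0])
print(op.log)  # ['before', 'compute', 'after']
print(total)   # 6.0
```

Calling the instance, rather than calling `compute()` directly, is what triggers the hooks.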
Result Base Class¶
Results are dataclasses that hold computation outputs:
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Result:
    """Base class for computation results."""
    meta: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        """Convert result to dictionary."""
        return {k: v for k, v in self.__dict__.items()}
Lifecycle Hooks¶
The before() and after() hooks enable setup and cleanup. This is useful for:
- Input validation
- Resource allocation
- Caching
- Logging
- Cleanup
from dataclasses import dataclass
import numpy as np
from molpy.compute import Compute, Result
from molpy.core import Frame, Block
# kw_only=True lets required fields follow the defaulted `meta` field
# inherited from Result (a plain @dataclass raises TypeError here).
@dataclass(kw_only=True)
class DistanceMatrixResult(Result):
    """Result of distance matrix calculation."""
    distances: np.ndarray  # Pairwise distance matrix
    n_atoms: int  # Number of atoms

class ComputeDistanceMatrix(Compute[Frame, DistanceMatrixResult]):
    """Compute pairwise distance matrix with validation and caching."""

    def before(self, input: Frame) -> None:
        """Validate input and allocate cache."""
        if "atoms" not in input.blocks():
            raise ValueError("Frame must have 'atoms' block")
        atoms = input["atoms"]
        if not all(k in atoms for k in ["x", "y", "z"]):
            raise ValueError("Atoms must have x, y, z coordinates")
        # Allocate cache for intermediate results
        n = len(atoms)
        self._cache = np.zeros((n, n))
        print(f"Allocated cache for {n} atoms")

    def compute(self, input: Frame) -> DistanceMatrixResult:
        """Calculate pairwise distances."""
        atoms = input["atoms"]
        positions = np.column_stack([atoms["x"], atoms["y"], atoms["z"]])
        # Compute pairwise distances (upper triangle, mirrored)
        n = len(positions)
        for i in range(n):
            for j in range(i + 1, n):
                dist = np.linalg.norm(positions[i] - positions[j])
                self._cache[i, j] = dist
                self._cache[j, i] = dist
        return DistanceMatrixResult(
            distances=self._cache.copy(),
            n_atoms=n
        )

    def after(self, input: Frame, result: DistanceMatrixResult) -> None:
        """Cleanup and logging."""
        del self._cache
        print(f"Computed distances for {result.n_atoms} atoms")

# Example usage
frame = Frame()
frame["atoms"] = Block({
    "x": [0.0, 1.0, 0.0],
    "y": [0.0, 0.0, 1.0],
    "z": [0.0, 0.0, 0.0]
})
compute_dist = ComputeDistanceMatrix()
result = compute_dist(frame)
print(f"Distance matrix:\n{result.distances}")
Real-World Example: Trajectory Analysis¶
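MolPy includes MCDCompute, which computes mean displacement correlations (MCD) for diffusion analysis. The self term of this correlation is the familiar mean squared displacement (MSD). This example demonstrates: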
- Complex computation with multiple parameters
- Structured result with time series data
- Processing trajectory data
MCDCompute Overview¶
class MCDCompute(Compute[Trajectory, MCDResult]):
    """Compute mean displacement correlations (MCD) for diffusion analysis.

    Supports:
    - Self diffusion: MSD_i = <(r_i(t+dt) - r_i(t))²>
    - Distinct diffusion: <(r_i(t+dt) - r_i(t)) · (r_j(t+dt) - r_j(t))>

    Args:
        tags: List of atom type specifications
            - Single integer (e.g., "3"): Self-diffusion MSD of type 3
            - Two integers (e.g., "3,4"): Distinct diffusion between types
        max_dt: Maximum time lag in ps
        dt: Timestep in ps
        center_of_mass: Optional COM removal
    """

    def compute(self, input: Trajectory) -> MCDResult:
        # Outline only; the steps are:
        # 1. Extract coordinates and unwrap periodic boundaries
        # 2. Apply center of mass correction if requested
        # 3. Compute correlations for each tag
        return MCDResult(time=time_array, correlations=correlations)
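The two formulas in the docstring can be sketched in plain NumPy. This is not MolPy's implementation: `displacement_correlation` is a hypothetical helper that takes the unwrapped positions of two individual atoms, counts lags in frames rather than picoseconds, and averages over all time origins. Passing the same array twice gives the self term (the MSD).

```python
import numpy as np


def displacement_correlation(r_i, r_j, max_lag):
    """Return <dr_i(lag) . dr_j(lag)> for lag = 0..max_lag (in frames).

    r_i, r_j: unwrapped positions of two atoms, each of shape (n_frames, 3).
    """
    out = np.zeros(max_lag + 1)  # lag 0 is zero by definition
    for lag in range(1, max_lag + 1):
        d_i = r_i[lag:] - r_i[:-lag]  # displacements over `lag` frames
        d_j = r_j[lag:] - r_j[:-lag]
        # Dot product per time origin, then average over origins
        out[lag] = np.mean(np.sum(d_i * d_j, axis=1))
    return out


# Sanity check: an atom moving at constant velocity v has MSD = |v|^2 * t^2
dt = 0.5
t = np.arange(20) * dt
v = np.array([1.0, 2.0, 0.0])
r = t[:, np.newaxis] * v

msd = displacement_correlation(r, r, max_lag=5)
expected = (v @ v) * (np.arange(6) * dt) ** 2
print(np.allclose(msd, expected))  # True
```

For two atoms moving in opposite directions the distinct term is negative, which is how anticorrelated motion shows up in this kind of analysis.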
Usage Pattern¶
from molpy.io import read_h5_trajectory
from molpy.compute import MCDCompute
# Load trajectory
trajectory = read_h5_trajectory("trajectory.h5")
# Compute self-diffusion MSD of atom type 3
mcd = MCDCompute(tags=["3"], max_dt=30.0, dt=0.01)
result = mcd(trajectory)
# Access results
print(result.time) # Time lag values
print(result.correlations["3"]) # MSD values at each time lag
# Compute distinct diffusion between types 3 and 4
mcd_distinct = MCDCompute(tags=["3,4"], max_dt=30.0, dt=0.01)
result_distinct = mcd_distinct(trajectory)
print(result_distinct.correlations["3,4"]) # Correlation values
Composability¶
Compute operations can be chained and composed. This is particularly useful with RDKit integration:
# Example structure (requires RDKit)
# from molpy.compute import Generate3D, OptimizeGeometry
# from molpy.adapter import RDKitAdapter
# from molpy.core.atomistic import Atomistic
# # Create molecule
# mol = Atomistic()
# # ... define atoms and bonds ...
# # Create adapter
# adapter = RDKitAdapter(internal=mol)
# # Chain operations
# generate_3d = Generate3D(add_hydrogens=True, optimize=True)
# optimize = OptimizeGeometry(max_opt_iters=500)
# # Apply sequentially
# adapter = generate_3d(adapter) # Generate 3D coordinates
# adapter = optimize(adapter) # Further optimize geometry
# # Extract result
# optimized_mol = adapter.internal
print("Composition example (requires RDKit to run)")
print("Shows how Compute operations can be chained:")
print(" adapter -> Generate3D -> OptimizeGeometry -> result")
Composition example (requires RDKit to run)
Shows how Compute operations can be chained:
  adapter -> Generate3D -> OptimizeGeometry -> result
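When every step maps a type to itself, as with operations of the form `Compute[X, X]`, sequential application can be wrapped in a small helper. The sketch below is an illustration that runs without RDKit or MolPy: `chain`, `Scale`, and `Shift` are hypothetical names, with the two classes standing in for real operations.

```python
from functools import reduce
from typing import Callable, TypeVar

T = TypeVar("T")


def chain(*steps: Callable[[T], T]) -> Callable[[T], T]:
    """Compose same-type steps left to right into a single callable."""
    def run(value: T) -> T:
        return reduce(lambda acc, step: step(acc), steps, value)
    return run


class Scale:
    """Hypothetical stand-in for a configurable operation."""
    def __init__(self, factor: float) -> None:
        self.factor = factor

    def __call__(self, value: float) -> float:
        return value * self.factor


class Shift:
    """Hypothetical stand-in for a second operation."""
    def __init__(self, offset: float) -> None:
        self.offset = offset

    def __call__(self, value: float) -> float:
        return value + self.offset


pipeline = chain(Scale(2.0), Shift(1.0))
print(pipeline(3.0))  # 7.0, i.e. (3.0 * 2.0) + 1.0
```

Because the configured steps are ordinary objects, the same pipeline can be applied to many inputs.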
Benefits of the Pattern¶
1. Type Safety¶
Generic types ensure correct input/output:
# Type checker knows:
compute: Compute[Frame, CenterOfMassResult]
result: CenterOfMassResult = compute(frame) # ✓ Type safe
result: str = compute(frame) # ✗ Type error
2. Structured Results¶
Results are self-documenting:
@dataclass
class MCDResult(TimeSeriesResult):
    """Results from MCD calculation."""
    time: np.ndarray  # Time lag values
    correlations: dict[str, np.ndarray]  # MSD for each tag
3. Reusability¶
Compute operations are configurable and reusable:
# Different configurations
mcd_short = MCDCompute(tags=["1"], max_dt=5.0, dt=0.1)
mcd_long = MCDCompute(tags=["1", "2"], max_dt=20.0, dt=0.2)
# Reuse one configured instance on different trajectories
result1 = mcd_short(trajectory1)
result2 = mcd_short(trajectory2)
# Apply a different configuration to the same trajectory
result3 = mcd_long(trajectory1)
4. Testability¶
Easy to test in isolation:
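def test_center_of_mass():
    # Create test frame (create_test_frame, expected_com, and expected_mass
    # are placeholders supplied by the test suite)
    frame = create_test_frame()
    # Compute
    compute = ComputeCenterOfMass()
    result = compute(frame)
    # Assert (compare floats with a tolerance)
    assert np.allclose(result.com, expected_com)
    assert np.isclose(result.total_mass, expected_mass)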
Comparison with Other Patterns¶
vs. Simple Functions¶
Function:
def calculate_rdf(frame: Frame, r_max: float = 10.0) -> tuple[np.ndarray, np.ndarray]:
    # Calculate RDF
    return r, g_r
Compute:
class ComputeRDF(Compute[Frame, RDFResult]):
    def __init__(self, r_max: float = 10.0, n_bins: int = 100):
        self.r_max = r_max
        self.n_bins = n_bins

    def compute(self, frame: Frame) -> RDFResult:
        # Calculate RDF
        return RDFResult(r=r, g_r=g_r, r_max=self.r_max)
When to use Compute:
- Multiple configuration parameters
- Need to reuse with different configs
- Complex setup/cleanup
- Want structured results
vs. Methods on Data Classes¶
Method:
class Frame:
    def center_of_mass(self) -> np.ndarray:
        # Calculate COM
        return com
Compute:
class ComputeCenterOfMass(Compute[Frame, CenterOfMassResult]):
    def compute(self, frame: Frame) -> CenterOfMassResult:
        # Calculate COM
        return CenterOfMassResult(com=com, total_mass=mass)
When to use Compute:
- Operation is complex or configurable
- Want to keep data classes lightweight
- Operation is optional (requires external deps)
- Want to test computation separately
Design Guidelines¶
When to Use Compute¶
Use Compute for:
- ✅ Reusable calculations (RDF, MSD, COM)
- ✅ Operations with configuration (parameters, options)
- ✅ Complex multi-step algorithms
- ✅ Operations that need setup/cleanup
Don't use Compute for:
- ❌ Simple one-off calculations (use functions)
- ❌ Data transformations (use methods on data classes)
- ❌ IO operations (use readers/writers)
When to Use Result¶
Use Result for:
- ✅ Structured computation outputs
- ✅ Multiple related values
- ✅ Results that need metadata
- ✅ Results that may be serialized
Don't use Result for:
- ❌ Simple scalar returns (use primitives)
- ❌ Temporary intermediate values
Naming Conventions¶
- Compute classes: Compute<Operation> (e.g., ComputeCenterOfMass, MCDCompute)
- Result classes: <Operation>Result (e.g., CenterOfMassResult, MCDResult)
- Instances: Descriptive names (e.g., compute_com, mcd, optimizer)
Real-World Examples in MolPy¶
Trajectory Analysis¶
# MCDCompute: Mean displacement correlation
class MCDCompute(Compute[Trajectory, MCDResult]):
    ...

# PMSDCompute: Polarization MSD
class PMSDCompute(Compute[Trajectory, PMSDResult]):
    ...
RDKit Integration¶
# Generate3D: 3D coordinate generation
class Generate3D(Compute[RDKitAdapter, RDKitAdapter]):
    ...

# OptimizeGeometry: Force field optimization
class OptimizeGeometry(Compute[RDKitAdapter, RDKitAdapter]):
    ...
Testing Compute Operations¶
Best practices for testing:
# Example test structure
# Assumes ComputeCenterOfMass from the Quick Start has been defined in this
# session; otherwise the test fails with a NameError.
def test_compute_center_of_mass():
    """Test center of mass computation."""
    # Arrange: Create test data
    frame = Frame()
    frame["atoms"] = Block({
        "x": [0.0, 2.0],
        "y": [0.0, 0.0],
        "z": [0.0, 0.0],
        "mass": [1.0, 1.0]
    })
    # Act: Compute
    compute = ComputeCenterOfMass()
    result = compute(frame)
    # Assert: Verify results
    expected_com = np.array([1.0, 0.0, 0.0])
    assert np.allclose(result.com, expected_com)
    assert result.total_mass == 2.0
    print("✓ Test passed")

# Run test
test_compute_center_of_mass()
Performance Considerations¶
Caching¶
Use before() to allocate caches:
def before(self, input: Frame) -> None:
    n = len(input["atoms"])
    self._cache = np.zeros((n, n))
Vectorization¶
Prefer NumPy operations over loops:
# Good: Vectorized
com = (positions * masses[:, np.newaxis]).sum(axis=0) / total_mass
# Bad: Loop
com = sum(pos * mass for pos, mass in zip(positions, masses)) / total_mass
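The same idea applies to the pairwise distance loop used earlier in this notebook. The following self-contained sketch, on random data with NumPy only, checks that the broadcasting versions agree with the loop versions. It is an illustration, not a benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)
positions = rng.random((50, 3))
masses = rng.random(50) + 0.5

# Center of mass: broadcasting vs. a Python-level loop
com_vec = (positions * masses[:, np.newaxis]).sum(axis=0) / masses.sum()
com_loop = sum(p * m for p, m in zip(positions, masses)) / masses.sum()

# Pairwise distances: broadcasting builds an (n, n, 3) array of differences
diff = positions[:, np.newaxis, :] - positions[np.newaxis, :, :]
dist_vec = np.linalg.norm(diff, axis=-1)

# Reference double loop over the upper triangle
n = len(positions)
dist_loop = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = np.linalg.norm(positions[i] - positions[j])
        dist_loop[i, j] = dist_loop[j, i] = d

print(np.allclose(com_vec, com_loop))    # True
print(np.allclose(dist_vec, dist_loop))  # True
```

The broadcast version trades memory for speed: the intermediate array grows as n², so very large systems may need a chunked approach.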
Memory Management¶
Clean up in after():
def after(self, input: Frame, result: Result) -> None:
    del self._cache
    del self._temporary_arrays
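One caveat follows from the `__call__` flow shown in the architecture section: `after()` runs only when `compute()` returns, so a cache allocated in `before()` survives a failed computation. The sketch below demonstrates this with a stand-in base class and shows one possible guard. `Failing` and `FailingSafe` are hypothetical, and the `try`/`finally` override is a suggestion, not MolPy's behavior.

```python
import numpy as np


class Compute:
    """Stand-in with the same __call__ flow as the base class shown earlier."""

    def __call__(self, input):
        self.before(input)
        result = self.compute(input)
        self.after(input, result)
        return result

    def before(self, input):
        pass

    def compute(self, input):
        raise NotImplementedError

    def after(self, input, result):
        pass


class Failing(Compute):
    def before(self, input):
        self._cache = np.zeros(len(input))

    def compute(self, input):
        raise ValueError("simulated failure")

    def after(self, input, result):
        del self._cache


class FailingSafe(Failing):
    """Hypothetical variant: release the cache even when compute() raises."""

    def __call__(self, input):
        try:
            return super().__call__(input)
        finally:
            # pop() tolerates the cache having been deleted by after() already
            self.__dict__.pop("_cache", None)


for op in (Failing(), FailingSafe()):
    try:
        op([1, 2, 3])
    except ValueError:
        pass
    print(type(op).__name__, "cache left behind:", hasattr(op, "_cache"))
```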
Troubleshooting¶
Common Issues¶
Issue: Type errors with generic types
# Wrong: Missing type parameters
class MyCompute(Compute):
    ...

# Correct: Specify input and output types
class MyCompute(Compute[Frame, MyResult]):
    ...
Issue: Forgetting to override compute()
# Wrong: No compute method
class MyCompute(Compute[Frame, Result]):
    pass

# Correct: Override compute
class MyCompute(Compute[Frame, Result]):
    def compute(self, input: Frame) -> Result:
        return Result()
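Because the base class shown earlier is an `ABC` with an abstract `compute()`, this mistake surfaces as a `TypeError` at instantiation time, not at call time. A minimal sketch with a stand-in base class (`Incomplete` and `Complete` are hypothetical names):

```python
from abc import ABC, abstractmethod


class Compute(ABC):
    """Stand-in: only the abstract compute() matters for this example."""

    @abstractmethod
    def compute(self, input): ...

    def __call__(self, input):
        return self.compute(input)


class Incomplete(Compute):
    pass  # compute() is not overridden


class Complete(Compute):
    def compute(self, input):
        return input * 2


try:
    Incomplete()
    instantiated = True
except TypeError as exc:
    instantiated = False
    print(exc)  # the message names the missing abstract method

print(Complete()(21))  # 42
```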
Issue: Modifying input in compute()
# Wrong: Modifies input
def compute(self, input: Frame) -> Result:
    input["atoms"]["x"] += 1.0  # Side effect!
    return Result()

# Correct: Work on a copy and return the new values in the result
def compute(self, input: Frame) -> Result:
    x = input["atoms"]["x"].copy()
    x += 1.0
    return Result(meta={"shifted_x": x})
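The difference is easy to verify with NumPy alone. In this sketch a plain dict of arrays stands in for an atoms block, which is an assumption made so the example runs without MolPy. The function names are hypothetical.

```python
import numpy as np

atoms = {"x": np.array([0.0, 1.0, 2.0])}  # plain dict as a stand-in block


def shift_copy(block):
    """Leave the caller's data untouched."""
    x = block["x"].copy()
    x += 1.0
    return x


def shift_in_place(block):
    """Mutate the caller's array (the side effect to avoid)."""
    block["x"] += 1.0
    return block["x"]


shifted = shift_copy(atoms)
after_copy = atoms["x"].copy()      # snapshot: still the original values

shift_in_place(atoms)
after_in_place = atoms["x"].copy()  # snapshot: the input has changed

print(after_copy)      # [0. 1. 2.]
print(after_in_place)  # [1. 2. 3.]
```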
Summary¶
The Compute & Result pattern provides:
- Type Safety - Generic types ensure correctness
- Reusability - Configurable, composable operations
- Structure - Self-documenting results
- Testability - Easy to test in isolation
- Extensibility - Simple to add new operations
Use this pattern for complex, reusable computations that benefit from configuration and structured outputs.
For simple calculations, plain functions are often sufficient.