Coding Style¶
MolPy follows a "clean, explicit, type-driven" coding style designed to make the library easy to read, test, and extend.
This guide provides detailed coding standards and best practices for contributors.
Core Principles¶
1. Explicit Over Implicit¶
Prefer clear, explicit code over clever tricks:
Good:
def calculate_distance(pos1: np.ndarray, pos2: np.ndarray) -> float:
    """Calculate Euclidean distance between two positions."""
    diff = pos1 - pos2
    return np.sqrt(np.sum(diff**2))
Bad:
def calc_dist(p1, p2):
    return np.linalg.norm(p1 - p2)  # Unclear what p1, p2 are
2. Predictable Behavior¶
Avoid hidden state and side effects:
Good:
def merge_frames(frame1: Frame, frame2: Frame) -> Frame:
    """Merge two frames into a new frame."""
    merged = Frame()
    # ... merge logic
    return merged
Bad:
def merge_frames(frame1: Frame, frame2: Frame) -> None:
    """Merge frame2 into frame1 (modifies frame1)."""
    # Modifying input arguments is surprising
    frame1.blocks.update(frame2.blocks)
3. Type Safety¶
Use type hints everywhere:
Good:
from collections.abc import Sequence

def process_atoms(
    positions: np.ndarray,
    masses: Sequence[float],
    box: Box | None = None,
) -> dict[str, np.ndarray]:
    """Process atom data."""
    ...
Bad:
def process_atoms(positions, masses, box=None):
    """Process atom data."""
    ...
Python Style Guidelines¶
PEP 8 Compliance¶
Follow PEP 8 with these specifics:
- Line length: 88 characters (Black default)
- Indentation: 4 spaces (never tabs)
- Blank lines: 2 between top-level definitions, 1 between methods
- Imports: Grouped and sorted (see Imports)
Naming Conventions¶
Variables and Functions¶
Use snake_case:
atom_count = 100
bond_length = 1.5

def calculate_center_of_mass(positions: np.ndarray, masses: np.ndarray) -> np.ndarray:
    ...
Classes¶
Use PascalCase:
class Frame:
    ...


class AtomicForceField:
    ...


class LammpsDataReader:
    ...
Constants¶
Use UPPER_CASE:
DEFAULT_CUTOFF = 10.0
MAX_ITERATIONS = 1000
AVOGADRO_NUMBER = 6.02214076e23
Private Members¶
Prefix with single underscore:
class Frame:
    def __init__(self) -> None:
        self._blocks: dict[str, Block] = {}

    def _validate_block(self, block: Block) -> None:
        """Private validation method."""
        ...
Imports¶
Organize imports in three groups, sorted alphabetically:
# 1. Standard library
import os
import sys
from collections.abc import Sequence
from pathlib import Path
from typing import Any

# 2. Third-party packages
import numpy as np
from igraph import Graph

# 3. MolPy modules
from molpy.core import Block, Frame
from molpy.io import read_pdb, write_lammps_data
Rules:
- No wildcard imports (from x import *)
- Use absolute imports, not relative
- Prefer importing modules over individual names; importing frequently used names directly (for example Frame or Path) is acceptable
- Use as for common abbreviations: numpy as np, pandas as pd
Type Hints¶
Use modern type-hint syntax (built-in generics and X | Y unions), as supported by Python 3.12+:
Good:
from pathlib import Path
from typing import Any, Literal

def read_file(
    path: str | Path,
    format: Literal["pdb", "xyz", "lammps"] = "pdb",
    options: dict[str, Any] | None = None,
) -> Frame:
    """Read a molecular structure file."""
    ...
Bad:
from pathlib import Path
from typing import Any, Dict, List, Optional, Union

def read_file(
    path: Union[str, Path],
    format: str = "pdb",
    options: Optional[Dict[str, Any]] = None,
) -> Frame:
    ...
Prefer:
- list[T] over List[T]
- dict[K, V] over Dict[K, V]
- X | Y over Union[X, Y]
- X | None over Optional[X]
- tuple[int, ...] for variable-length tuples
Avoid Any unless absolutely necessary. If you must use it, add a comment explaining why.
Code Organization¶
File Structure¶
One class per file (with exceptions for small helper classes):
molpy/
├── core/
│ ├── __init__.py
│ ├── frame.py # Frame class
│ ├── block.py # Block class
│ ├── atomistic.py # Atomistic classes
│ └── box.py # Box class
Module Structure¶
Organize modules consistently:
"""Module docstring describing the module's purpose.

This module provides...
"""

# Imports
import os
from typing import Any

import numpy as np

from molpy.core import Frame

# Constants
DEFAULT_TOLERANCE = 1e-6


# Private helpers
def _validate_input(data: np.ndarray) -> None:
    """Private helper function."""
    ...


# Public classes
class MyClass:
    """Public class."""

    ...


# Public functions
def public_function() -> None:
    """Public function."""
    ...
Function Length¶
Keep functions focused and concise:
- Ideal: < 20 lines
- Acceptable: up to 50 lines
- Refactor if: > 50 lines
If a function is too long, break it into smaller helper functions.
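As an illustration of splitting work into helpers, the sketch below computes a centroid from text. All names here (Point, _parse_line, _centroid, centroid_from_text) are hypothetical and not part of MolPy; the point is only that each function does one thing.

```python
Point = tuple[float, float, float]


def _parse_line(line: str) -> Point:
    """Parse one whitespace-separated 'x y z' line into a point."""
    x, y, z = (float(token) for token in line.split())
    return (x, y, z)


def _centroid(points: list[Point]) -> Point:
    """Average each coordinate over all points."""
    n = len(points)
    sums = [sum(p[axis] for p in points) for axis in range(3)]
    return (sums[0] / n, sums[1] / n, sums[2] / n)


def centroid_from_text(text: str) -> Point:
    """Compute the centroid of the 'x y z' lines in text."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("text contains no coordinate lines")
    return _centroid([_parse_line(line) for line in lines])


center = centroid_from_text("0 0 0\n2 0 0\n1 3 0\n")
```

The public function reads as a three-step summary, and each helper can be tested on its own.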
Docstrings¶
Use Google-style docstrings for all public APIs:
Function Docstrings¶
def calculate_rdf(
    frame: Frame,
    r_max: float = 10.0,
    n_bins: int = 100,
    atom_types: tuple[str, str] | None = None,
) -> tuple[np.ndarray, np.ndarray]:
    """Calculate radial distribution function.

    Computes the RDF g(r) for atom pairs in the frame up to a maximum
    distance r_max.

    Args:
        frame: Input frame containing atomic positions
        r_max: Maximum distance for RDF calculation in Angstroms
        n_bins: Number of bins for the histogram
        atom_types: Pair of atom types to consider. If None, use all atoms.

    Returns:
        Tuple of (r, g_r) where:
        - r: Array of distances (bin centers)
        - g_r: RDF values at each distance

    Raises:
        ValueError: If r_max is not positive or n_bins < 1
        KeyError: If specified atom_types not found in frame

    Examples:
        >>> frame = read_pdb("water.pdb")
        >>> r, g_r = calculate_rdf(frame, r_max=5.0)
        >>> plt.plot(r, g_r)

    Notes:
        The RDF is normalized by the ideal gas density.
    """
    if r_max <= 0:
        raise ValueError("r_max must be positive")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    # Implementation...
    ...
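An Examples section is most useful when it actually runs. One possible way to check it is the standard-library doctest module, sketched below. bin_centers is a hypothetical helper written for this illustration, not a MolPy function, and this guide does not say whether MolPy runs doctests.

```python
import doctest


def bin_centers(r_max: float, n_bins: int) -> list[float]:
    """Return the centers of n_bins equal-width bins on [0, r_max].

    Args:
        r_max: Upper edge of the last bin. Must be positive.
        n_bins: Number of bins. Must be at least 1.

    Returns:
        Bin centers in ascending order.

    Raises:
        ValueError: If r_max is not positive or n_bins < 1

    Examples:
        >>> bin_centers(4.0, 2)
        [1.0, 3.0]
    """
    if r_max <= 0:
        raise ValueError("r_max must be positive")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    width = r_max / n_bins
    return [(i + 0.5) * width for i in range(n_bins)]


# Parse the docstring's >>> examples and execute them against the function.
test = doctest.DocTestParser().get_doctest(
    bin_centers.__doc__, {"bin_centers": bin_centers}, "bin_centers", None, 0
)
results = doctest.DocTestRunner(verbose=False).run(test)
```

Here results is a (failed, attempted) pair, so a nonzero failed count signals that the documented example has drifted from the code.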
Class Docstrings¶
class Frame:
    """Container for molecular structure data.

    A Frame represents a molecular system at a single point in time,
    containing atomic positions, topology, and metadata organized into
    named blocks.

    Attributes:
        box: Simulation box defining periodic boundaries
        metadata: Dictionary of arbitrary metadata

    Examples:
        >>> frame = Frame()
        >>> frame["atoms"] = Block({"x": [0, 1, 2], "y": [0, 0, 0]})
        >>> print(frame.blocks())
        ['atoms']

    Notes:
        Frames are designed to be immutable containers. Modifications
        create new Frame instances rather than modifying in place.
    """

    def __init__(self, box: Box | None = None) -> None:
        """Initialize a new Frame.

        Args:
            box: Optional simulation box. If None, creates infinite box.
        """
        ...
Property Docstrings¶
@property
def n_atoms(self) -> int:
    """Number of atoms in the frame.

    Returns:
        Total atom count across all blocks.
    """
    return sum(len(block) for block in self._blocks.values())
Error Handling¶
Validate Early¶
Fail fast with clear error messages:
def set_positions(self, positions: np.ndarray) -> None:
    """Set atomic positions.

    Args:
        positions: Array of shape (n_atoms, 3)

    Raises:
        ValueError: If positions shape is invalid
        TypeError: If positions is not a numpy array
    """
    if not isinstance(positions, np.ndarray):
        raise TypeError(f"positions must be ndarray, got {type(positions)}")
    if positions.ndim != 2 or positions.shape[1] != 3:
        raise ValueError(
            f"positions must have shape (n, 3), got {positions.shape}"
        )
    self._positions = positions
Use Appropriate Exceptions¶
- ValueError - Invalid value
- TypeError - Wrong type
- KeyError - Missing key
- FileNotFoundError - File not found
- RuntimeError - Runtime error
Don't Silently Swallow Exceptions¶
Bad:
try:
    result = risky_operation()
except Exception:
    pass  # Silent failure!
Good:
try:
    result = risky_operation()
except SpecificError as e:
    logger.warning(f"Operation failed: {e}")
    result = default_value
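A runnable version of the same pattern might look like the sketch below. parse_cutoff is a hypothetical helper invented for this example; DEFAULT_CUTOFF reuses the constant name from the naming section.

```python
import logging

logger = logging.getLogger(__name__)

DEFAULT_CUTOFF = 10.0


def parse_cutoff(text: str) -> float:
    """Parse a cutoff value, falling back to the default on bad input."""
    try:
        return float(text)
    except ValueError as exc:
        # Catch only the expected failure and record it.
        # Any other exception type propagates to the caller.
        logger.warning("Invalid cutoff %r, using default: %s", text, exc)
        return DEFAULT_CUTOFF
```

Because only ValueError is caught, passing None still raises TypeError, so genuine bugs are not hidden behind the fallback.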
Logging¶
Use Python's logging module, not print statements:
import logging

logger = logging.getLogger(__name__)

def process_data(data: np.ndarray) -> np.ndarray:
    """Process data with logging."""
    logger.debug(f"Processing data with shape {data.shape}")
    if data.size == 0:
        logger.warning("Empty data array provided")
        return data
    # Process...
    result = ...
    logger.info(f"Processed {len(result)} items")
    return result
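Library code only emits log records; the application decides where they go. The sketch below shows that split under stated assumptions: count_records and the logger name "molpy.example" are hypothetical, and the basicConfig call stands in for whatever the calling script would do.

```python
import logging

logger = logging.getLogger("molpy.example")  # hypothetical logger name


def count_records(lines: list[str]) -> int:
    """Count non-blank lines, logging what was seen."""
    logger.debug("Received %d lines", len(lines))
    records = [line for line in lines if line.strip()]
    if not records:
        logger.warning("No records found")
    logger.info("Counted %d records", len(records))
    return len(records)


# Application side: choose the level and format once, at startup.
logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s %(name)s: %(message)s",
)
n_records = count_records(["ATOM 1", "", "ATOM 2"])
```

With level INFO the debug message is suppressed and the info message is shown; switching to logging.DEBUG exposes both without touching the library function.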
Core Data Structures¶
Keep core types (Frame, Block, Box, Atomistic) clean:
Value-like Containers¶
Core types should behave like immutable values:
# Good: Returns new Frame
def merge(self, other: Frame) -> Frame:
    """Merge with another frame, returning new Frame."""
    merged = Frame(box=self.box)
    # ... merge logic
    return merged

# Bad: Modifies in place
def merge(self, other: Frame) -> None:
    """Merge other frame into this one."""
    self._blocks.update(other._blocks)
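One way to get value-like behavior in plain Python is a frozen dataclass. The sketch below is an assumption-laden illustration: this Box is a hypothetical stand-in with a single lengths field and a made-up scaled method, not MolPy's Box class.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a simulation box (orthorhombic lengths)."""

    lengths: tuple[float, float, float]

    def scaled(self, factor: float) -> "Box":
        """Return a new Box with every length multiplied by factor."""
        if factor <= 0:
            raise ValueError("factor must be positive")
        x, y, z = self.lengths
        # replace() builds a new instance; self is left untouched.
        return replace(self, lengths=(x * factor, y * factor, z * factor))


box = Box((10.0, 10.0, 20.0))
bigger = box.scaled(2.0)
```

Attempting box.lengths = ... raises dataclasses.FrozenInstanceError, so accidental in-place modification fails loudly instead of silently changing shared state.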
Avoid Engine-Specific Logic¶
Don't put LAMMPS, GROMACS, or format-specific code in core classes:
Bad:
class Frame:
    def to_lammps_data(self, filename: str) -> None:
        """Write to LAMMPS data file."""
        # LAMMPS-specific logic in core class!
        ...
Good:
# In molpy/io/data/lammps.py
def write_lammps_data(filename: str, frame: Frame) -> None:
    """Write frame to LAMMPS data file."""
    # LAMMPS-specific logic in IO module
    ...
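The same separation can be sketched end to end with a simpler format. The example below uses the plain XYZ layout (atom count, comment line, then one "symbol x y z" line per atom) instead of LAMMPS. Atoms and write_xyz are hypothetical names for illustration and do not reflect MolPy's actual classes or IO functions.

```python
import io
from dataclasses import dataclass
from typing import TextIO


@dataclass(frozen=True)
class Atoms:
    """Hypothetical core container: knows nothing about file formats."""

    symbols: tuple[str, ...]
    positions: tuple[tuple[float, float, float], ...]


def write_xyz(stream: TextIO, atoms: Atoms, comment: str = "") -> None:
    """Write atoms in XYZ format; all format knowledge lives here."""
    stream.write(f"{len(atoms.symbols)}\n{comment}\n")
    for symbol, (x, y, z) in zip(atoms.symbols, atoms.positions):
        stream.write(f"{symbol} {x:.3f} {y:.3f} {z:.3f}\n")


buffer = io.StringIO()
write_xyz(buffer, Atoms(("O", "H"), ((0.0, 0.0, 0.0), (0.96, 0.0, 0.0))), "demo")
```

Supporting another format then means adding another writer function in the IO layer, with no change to the core container.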
Testing Code Style¶
Tests should also follow style guidelines:
import numpy as np
import pytest

from molpy.core import Block, Frame


class TestFrame:
    """Tests for Frame class."""

    def test_creation_empty(self) -> None:
        """Test creating an empty frame."""
        frame = Frame()
        assert len(frame.blocks()) == 0

    def test_add_block(self) -> None:
        """Test adding a block to frame."""
        frame = Frame()
        block = Block({"x": [1, 2, 3]})
        frame["atoms"] = block
        assert "atoms" in frame.blocks()
        assert len(frame["atoms"]) == 3

    def test_invalid_block_raises_type_error(self) -> None:
        """Test that invalid block type raises TypeError."""
        frame = Frame()
        with pytest.raises(TypeError, match="must be Block"):
            frame["atoms"] = [1, 2, 3]  # List, not Block
Tools¶
Black¶
Automatic code formatting:
# Format all files
black .
# Check without modifying
black --check .
# Format specific file
black molpy/core/frame.py
Configuration in pyproject.toml:
[tool.black]
line-length = 88
target-version = ['py312']
isort¶
Import sorting:
# Sort imports
isort .
# Check without modifying
isort --check .
Configuration in pyproject.toml:
[tool.isort]
profile = "black"
line_length = 88
Pre-commit¶
Automatically run checks before commits:
# Install hooks
pre-commit install
# Run manually
pre-commit run --all-files
Summary Checklist¶
Before submitting code, ensure:
- Follows PEP 8 style
- Uses type hints everywhere
- Has Google-style docstrings
- Imports are organized and sorted
- Functions are focused and concise
- Error messages are clear and helpful
- No print statements (use logging)
- Black formatting applied
- Tests follow same style guidelines