I/O: Reading and Writing File Formats
Readers turn files into Frames; writers turn Frames back into files — one pattern across every format. To add support for a new format, see Developer Guide: I/O Formats.
MolPy's I/O system has three layers: data (single-frame structures), trajectory (multi-frame sequences), and force field (parameter files). Each layer has its own reader/writer base class. LAMMPS log files are handled separately by a read-only parser, described below.
Data files: Frame in, Frame out
Data readers consume a file and return a Frame. Data writers take a Frame and produce a file. The Frame is the universal exchange object — it carries atom tables, bond tables, metadata, and optionally a box.
Reading
Every format has a factory function at mp.io.read_*:
import molpy as mp
frame = mp.io.read_pdb("molecule.pdb")
frame = mp.io.read_gro("system.gro")
frame = mp.io.read_mol2("ligand.mol2")
frame = mp.io.read_lammps_data("system.data", atom_style="full")
frame = mp.io.read_xyz("coords.xyz")
frame = mp.io.read_h5("snapshot.h5")
All readers follow the same pattern: pass a file path, get a Frame back. An optional frame argument lets you populate an existing frame instead of creating a new one.
# Populate an existing frame (useful for merging metadata)
existing = mp.Frame(timestep=42)
frame = mp.io.read_pdb("molecule.pdb", frame=existing)
print(frame.metadata["timestep"]) # 42
The Frame returned by a data reader typically contains an "atoms" block with coordinates, and may contain "bonds", "angles", and "dihedrals" blocks depending on the format.
frame = mp.io.read_pdb("water.pdb")
print(frame["atoms"].nrows) # number of atoms
print(list(frame["atoms"].keys())) # ['id', 'element', 'x', 'y', 'z', ...]
Writing
Writers follow the same factory pattern:
mp.io.write_pdb("output.pdb", frame)
mp.io.write_gro("output.gro", frame)
mp.io.write_lammps_data("output.data", frame, atom_style="full")
mp.io.write_h5("output.h5", frame)
mp.io.write_xsf("output.xsf", frame)
For LAMMPS, writing both the data file and force-field coefficients at once is the most common pattern. write_lammps_system is a convenience wrapper — it does exactly what calling write_lammps_data and write_lammps_forcefield yourself would. Its first argument is a directory: it is created if needed, and the two files (always named system.data and system.ff) are written inside it:
paths = mp.io.write_lammps_system("system", frame, ff)
# Creates the directory system/ containing:
# system/system.data (topology + coordinates)
# system/system.ff (force-field coefficients)
# and returns {"data": Path("system/system.data"), "ff": Path("system/system.ff")}
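The contract described above can be sketched without MolPy. The helper below is hypothetical: write_system_sketch and the two stub writers are illustrative names, not MolPy functions. It only mirrors the documented behavior, which is to create the directory if needed, write system.data and system.ff inside it, and return both paths.

```python
from pathlib import Path
import tempfile


def write_system_sketch(workdir, write_data, write_ff):
    """Hypothetical stand-in for the wrapper (not MolPy code)."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)  # "created if needed"
    # The two file names are fixed, as the text above states.
    paths = {"data": workdir / "system.data", "ff": workdir / "system.ff"}
    write_data(paths["data"])
    write_ff(paths["ff"])
    return paths


# Stub writers standing in for the real data and force-field writers.
def _stub_data(path):
    path.write_text("# topology + coordinates\n")


def _stub_ff(path):
    path.write_text("# force-field coefficients\n")


with tempfile.TemporaryDirectory() as tmp:
    paths = write_system_sketch(Path(tmp) / "system", _stub_data, _stub_ff)
    keys = sorted(paths)
    names = sorted(p.name for p in paths.values())
    parent_name = paths["data"].parent.name
    all_exist = all(p.exists() for p in paths.values())
```

Returning the paths as a dictionary lets the caller pass them straight to whatever generates the LAMMPS input script, without rebuilding the file names by hand.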
Supported data formats
| Format | Read | Write | Notes |
|---|---|---|---|
| PDB | read_pdb | write_pdb | ATOM/HETATM + CRYST1 + CONECT |
| GRO | read_gro | write_gro | GROMACS structure |
| MOL2 | read_mol2 | — | Tripos MOL2 |
| LAMMPS data | read_lammps_data | write_lammps_data | Requires atom_style |
| LAMMPS molecule | read_lammps_molecule | write_lammps_molecule | Template files |
| XYZ | read_xyz | — | Simple coordinate format |
| XSF | read_xsf | write_xsf | XCrySDen format |
| HDF5 | read_h5 | write_h5 | Binary, compressed |
| AMBER AC | read_amber_ac | — | Antechamber format |
| AMBER inpcrd | read_amber_inpcrd | — | AMBER coordinates |
Trajectory files: lazy Frame sequences
Trajectory readers return objects that behave like indexed, iterable sequences of frames. They use memory-mapped files and persistent indexing to handle large trajectories efficiently.
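To make the idea of memory mapping plus a frame index concrete, here is a minimal sketch for a multi-frame XYZ file. It is not MolPy's implementation: IndexedXYZ, build_index, and parse_frame are hypothetical names, the index is kept in memory only (persisting it to disk is omitted), and the file is assumed to be well formed with no trailing blank lines.

```python
import mmap
import os
import tempfile


def build_index(buf):
    """Scan the buffer once and record the byte offset where each frame starts."""
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        n_atoms = int(buf[pos:buf.find(b"\n", pos)])
        # Skip the count line, the comment line, and n_atoms coordinate lines.
        for _ in range(n_atoms + 2):
            nl = buf.find(b"\n", pos)
            pos = len(buf) if nl == -1 else nl + 1
    return offsets


def parse_frame(buf, start, end):
    """Decode and parse only the bytes that belong to one frame."""
    lines = buf[start:end].decode().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return {"comment": lines[1], "atoms": atoms}


class IndexedXYZ:
    """Hypothetical reader: random access through precomputed offsets."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._buf = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._offsets = build_index(self._buf)
        self._offsets.append(len(self._buf))  # sentinel marking the end of the last frame

    @property
    def n_frames(self):
        return len(self._offsets) - 1

    def __getitem__(self, key):
        if isinstance(key, slice):
            return [self[i] for i in range(*key.indices(self.n_frames))]
        if key < 0:
            key += self.n_frames
        if not 0 <= key < self.n_frames:
            raise IndexError(key)
        return parse_frame(self._buf, self._offsets[key], self._offsets[key + 1])

    def close(self):
        self._buf.close()
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traj.xyz")
    with open(path, "w") as fh:
        for step in range(5):
            fh.write(f"2\nstep {step}\nO 0.0 0.0 {step}.0\nH 0.0 1.0 {step}.0\n")
    reader = IndexedXYZ(path)
    n_frames = reader.n_frames
    last_comment = reader[-1]["comment"]
    window = [frame["comment"] for frame in reader[1:3]]
    reader.close()
```

Only the indexing pass touches the whole file. Each later lookup decodes a single frame's byte range, which is what keeps access to a late frame in a large trajectory cheap.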
Reading
reader = mp.io.read_lammps_trajectory("dump.lammpstrj")
print(reader.n_frames) # total frame count
frame_0 = reader[0] # random access by index
frame_last = reader[-1] # negative indexing
subset = reader[10:20] # slicing returns list[Frame]
Iteration is lazy: frames are parsed on demand, with background prefetching, as you loop over the reader.
Always close the reader when done (or use a context manager) to release memory-mapped file descriptors.
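Both points can be illustrated with a small stand-in. CountingReader below is hypothetical and is not a MolPy class; it only assumes what the text states, namely that a reader is iterable, has a close method, and works as a context manager. Background prefetching is left out to keep the sketch short.

```python
class CountingReader:
    """Hypothetical stand-in for a trajectory reader (not a MolPy class)."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.parsed = 0       # how many frames were actually parsed
        self.closed = False

    def _parse(self, index):
        self.parsed += 1
        return {"timestep": index}

    def __iter__(self):
        for index in range(self.n_frames):
            # A frame is parsed only when the consumer asks for the next one.
            yield self._parse(index)

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()   # runs even if the loop body raises
        return False   # do not swallow exceptions


with CountingReader(1000) as reader:
    for frame in reader:
        if frame["timestep"] == 2:
            break      # stop early: the remaining 997 frames are never parsed
    parsed_before_exit = reader.parsed

closed_after_with = reader.closed
```

The with form is the safer default, because close is called on every exit path, including an exception raised in the middle of the loop.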
Writing
Trajectory writers accept a list of frames:
mp.io.write_lammps_trajectory("output.lammpstrj", frames, atom_style="full")
mp.io.write_xyz_trajectory("output.xyz", frames)
mp.io.write_h5_trajectory("output.h5", frames)
Supported trajectory formats
| Format | Read | Write | Notes |
|---|---|---|---|
| LAMMPS dump | read_lammps_trajectory | write_lammps_trajectory | Custom columns supported |
| XYZ | read_xyz_trajectory | write_xyz_trajectory | Multi-frame XYZ |
| HDF5 | read_h5_trajectory | write_h5_trajectory | Binary, compressed, fast |
Log files: simulation run metadata
LAMMPS log files are different from data and trajectory files: they describe what LAMMPS did during each run, not where atoms were. MolPy preserves that structure. A parsed log contains the LAMMPS header, a tuple of runs, and raw text for anything that is not yet represented by a dedicated field.
Use read_LAMMPS_log when you need run-level diagnostics such as thermo output, loop timing, load balance, neighbor statistics, or warnings. The thermo table is stored in a NumPy-backed dataclass, so downstream analysis can address LAMMPS's dynamic column names directly.
Runs are stored in the order in which they appear in the log:
log = mp.io.read_LAMMPS_log("log.lammps")
run = log.runs[0]
print(run.thermo.columns)
print(run.thermo.data["Temp"].mean())
print(run.loop_time.seconds)
print(run.neighbor_statistics.dangerous_builds)
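The reason for keying the thermo table by column name is that the set of columns depends on the thermo_style used in the input script, so it cannot be hard-coded. The sketch below shows that idea with a hypothetical ThermoTable dataclass; it is not MolPy's class, it stores plain lists instead of NumPy arrays so that it runs with the standard library alone, and the numbers are made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ThermoTable:
    """Hypothetical stand-in: columns are discovered at parse time."""

    columns: tuple
    data: dict  # column name -> list of float values

    @classmethod
    def from_rows(cls, columns, rows):
        columns = tuple(columns)
        data = {
            name: [float(row[i]) for row in rows]
            for i, name in enumerate(columns)
        }
        return cls(columns, data)


# Illustrative values only; a real table would come from a parsed log.
thermo = ThermoTable.from_rows(
    ["Step", "Temp", "PotEng"],
    [[0, 300.0, -10.0], [100, 310.0, -12.0], [200, 290.0, -11.0]],
)

mean_temp = mean(thermo.data["Temp"])
has_pressure = "Press" in thermo.columns  # check before using an optional column
```

Checking membership in columns before indexing keeps the analysis code working across logs that were produced with different thermo settings.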
The parser also keeps compatibility with older thermo-only code. LAMMPSLog(path).read() still supports log["stages"], while new code should prefer log.runs.
legacy = mp.io.LAMMPSLog("log.lammps").read()
first_stage = legacy["stages"][0]
print(first_stage["Step"][0])
Supported log formats
| Format | Read | Write | Notes |
|---|---|---|---|
| LAMMPS log | read_LAMMPS_log | — | Returns nested dataclasses aligned with LAMMPS run output |
Force field files: ForceField in, ForceField out
Force field readers parse parameter files into ForceField objects. Force field writers serialize ForceField objects into engine-specific formats.
Reading
ff = mp.io.read_xml_forcefield("oplsaa.xml")
ff = mp.io.read_lammps_forcefield("system.ff")
ff = mp.io.read_top("forcefield.itp")
AMBER prmtop files contain both structure and parameters, so read_amber returns both as a (Frame, ForceField) tuple: frame, ff = mp.io.read_amber(prmtop, inpcrd).
Writing
Each writer produces output in a specific engine format from the same ForceField object:
from molpy.io.forcefield import LAMMPSForceFieldWriter, XMLForceFieldWriter
from molpy.io.forcefield.top import GromacsForceFieldWriter
# LAMMPS coefficients
LAMMPSForceFieldWriter("system.ff", precision=4).write(ff)
# GROMACS .itp
GromacsForceFieldWriter("system.itp", precision=4).write(ff)
# OpenMM XML
XMLForceFieldWriter("system.xml", precision=6).write(ff)
Type filtering
LAMMPS force field files often need to include only the types actually used in the system. Pass type sets to filter the output:
LAMMPSForceFieldWriter("system.ff").write(
    ff,
    atom_types={"CT", "HC", "OH"},
    bond_types={"CT-HC", "CT-OH"},
)
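The filtering idea itself is simple set membership. The sketch below is a standalone illustration, not the writer's internals: filter_types is a hypothetical helper, the parameter values are placeholders, and the list of per-atom types stands in for whatever type column your system carries.

```python
def filter_types(params, keep=None):
    """Keep only entries whose type name is in keep; None keeps everything."""
    if keep is None:
        return dict(params)
    return {name: value for name, value in params.items() if name in keep}


# Placeholder parameters for a force field that defines more types than we need.
all_atom_params = {"CT": 1.0, "HC": 2.0, "OH": 3.0, "HO": 4.0, "CA": 5.0}

# Stand-in for a per-atom type column; collecting it into a set
# gives "the types actually used in the system".
atom_type_column = ["CT", "HC", "HC", "HC", "OH"]
used_types = set(atom_type_column)

kept = filter_types(all_atom_params, used_types)
kept_names = sorted(kept)

# Types the system uses but the force field does not define are worth flagging.
missing = used_types - all_atom_params.keys()
```

Building the sets from the system rather than typing them by hand avoids the two common mistakes here: writing coefficients for unused types, and silently omitting a type the system needs.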
Supported force field formats
| Format | Read | Write | Notes |
|---|---|---|---|
| OpenMM/OPLS XML | read_xml_forcefield | XMLForceFieldWriter | Primary force field format |
| LAMMPS coefficients | read_lammps_forcefield | LAMMPSForceFieldWriter | Supports hybrid styles |
| GROMACS .itp | read_top | GromacsForceFieldWriter | Topology format |
| AMBER prmtop | read_amber | — | Returns (Frame, ForceField) |
Extending with new formats
Adding a new data, trajectory, or force-field format is done by subclassing the reader/writer base classes and registering a factory function. That extension workflow — DataReader/DataWriter, BaseTrajectoryReader/TrajectoryWriter, and ForceFieldWriter + formatter registration — is documented in full in the Developer Guide: I/O Formats.
Quick reference
| I want to... | Function |
|---|---|
| Read a PDB | mp.io.read_pdb(path) |
| Write LAMMPS data + ff | mp.io.write_lammps_system(dir, frame, ff) |
| Read a trajectory | mp.io.read_lammps_trajectory(path) |
| Load OPLS-AA | mp.io.read_xml_forcefield("oplsaa.xml") |
| Read AMBER topology | mp.io.read_amber(prmtop, inpcrd) |
| Write filtered ff | LAMMPSForceFieldWriter(path).write(ff, atom_types={...}) |
| Add a new format | Developer Guide: I/O Formats |
See also: API Reference: I/O, Concepts: Force Field.