I/O: Reading and Writing File Formats

Readers turn files into Frames; writers turn Frames back into files — one pattern across every format. To add support for a new format, see Developer Guide: I/O Formats.

MolPy's I/O system has three layers: data (single-frame structures), trajectory (multi-frame sequences), and forcefield (parameter files). Each layer has its own reader/writer base class. LAMMPS log files, covered further below, sit outside these layers and are read-only.

Data files: Frame in, Frame out

Data readers consume a file and return a Frame. Data writers take a Frame and produce a file. The Frame is the universal exchange object — it carries atom tables, bond tables, metadata, and optionally a box.

Reading

Every format has a factory function at mp.io.read_*:

import molpy as mp

frame = mp.io.read_pdb("molecule.pdb")
frame = mp.io.read_gro("system.gro")
frame = mp.io.read_mol2("ligand.mol2")
frame = mp.io.read_lammps_data("system.data", atom_style="full")
frame = mp.io.read_xyz("coords.xyz")
frame = mp.io.read_h5("snapshot.h5")

All readers follow the same pattern: pass a file path, get a Frame back. An optional frame argument lets you populate an existing frame instead of creating a new one.

# Populate an existing frame (useful for merging metadata)
existing = mp.Frame(timestep=42)
frame = mp.io.read_pdb("molecule.pdb", frame=existing)
print(frame.metadata["timestep"])  # 42

The Frame returned by a data reader typically contains an "atoms" block with coordinates, and may contain "bonds", "angles", and "dihedrals" blocks depending on the format.

frame = mp.io.read_pdb("water.pdb")
print(frame["atoms"].nrows)           # number of atoms
print(list(frame["atoms"].keys()))    # ['id', 'element', 'x', 'y', 'z', ...]
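
The reader pattern described above (path in, column table out, optional container to populate) can be sketched in plain Python. This is an illustration only, not MolPy's implementation: `read_xyz_sketch` is a hypothetical name, and a nested dict stands in for a real Frame.

```python
from pathlib import Path


def read_xyz_sketch(path, frame=None):
    """Parse a single-frame XYZ file into a dict of blocks.

    Hypothetical stand-in for a data reader; a plain dict plays the Frame.
    """
    # Reuse the caller's container if one is given, so its metadata survives.
    if frame is None:
        frame = {}
    metadata = frame.setdefault("metadata", {})

    lines = Path(path).read_text().splitlines()
    n_atoms = int(lines[0])                 # line 1: atom count
    metadata["comment"] = lines[1].strip()  # line 2: free-form comment

    # Column-oriented "atoms" block: one list per column.
    atoms = {"element": [], "x": [], "y": [], "z": []}
    for line in lines[2 : 2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms["element"].append(element)
        atoms["x"].append(float(x))
        atoms["y"].append(float(y))
        atoms["z"].append(float(z))
    frame["atoms"] = atoms
    return frame
```

Passing `frame={"metadata": {"timestep": 42}}` returns that same dict with an "atoms" block added and the timestep untouched, which mirrors the merging behaviour shown for `read_pdb` above.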

Writing

Writers follow the same factory pattern:

mp.io.write_pdb("output.pdb", frame)
mp.io.write_gro("output.gro", frame)
mp.io.write_lammps_data("output.data", frame, atom_style="full")
mp.io.write_h5("output.h5", frame)
mp.io.write_xsf("output.xsf", frame)

For LAMMPS, writing both the data file and force-field coefficients at once is the most common pattern. write_lammps_system is a convenience wrapper — it does exactly what calling write_lammps_data and write_lammps_forcefield yourself would. Its first argument is a directory: it is created if needed, and the two files (always named system.data and system.ff) are written inside it:

paths = mp.io.write_lammps_system("system", frame, ff)
# Creates the directory system/ containing:
#   system/system.data   (topology + coordinates)
#   system/system.ff     (force-field coefficients)
# and returns {"data": Path("system/system.data"), "ff": Path("system/system.ff")}
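
The directory-plus-fixed-filenames behaviour can be sketched as follows. This is a minimal sketch and not MolPy's code: `write_system_sketch` is a hypothetical name, and `write_data` and `write_ff` are placeholder callables standing in for the two real writers.

```python
from pathlib import Path


def write_system_sketch(workdir, write_data, write_ff):
    """Create workdir if needed, write both files into it, return their paths.

    write_data and write_ff are hypothetical callables that each take a Path.
    """
    out = Path(workdir)
    out.mkdir(parents=True, exist_ok=True)  # "created if needed"

    # The file names are fixed; only the directory is chosen by the caller.
    paths = {"data": out / "system.data", "ff": out / "system.ff"}
    write_data(paths["data"])
    write_ff(paths["ff"])
    return paths
```

Returning the paths as a dict lets the caller hand them straight to whatever generates the LAMMPS input script, without rebuilding the file names by hand.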

Supported data formats

Format	Read	Write	Notes
PDB	`read_pdb`	`write_pdb`	ATOM/HETATM + CRYST1 + CONECT
GRO	`read_gro`	`write_gro`	GROMACS structure
MOL2	`read_mol2`	—	Tripos MOL2
LAMMPS data	`read_lammps_data`	`write_lammps_data`	Requires `atom_style`
LAMMPS molecule	`read_lammps_molecule`	`write_lammps_molecule`	Template files
XYZ	`read_xyz`	—	Simple coordinate format
XSF	`read_xsf`	`write_xsf`	XCrySDen format
HDF5	`read_h5`	`write_h5`	Binary, compressed
AMBER AC	`read_amber_ac`	—	Antechamber format
AMBER inpcrd	`read_amber_inpcrd`	—	AMBER coordinates

Trajectory files: lazy Frame sequences

Trajectory readers return objects that behave like indexed, iterable sequences of frames. They use memory-mapped files and persistent indexing to handle large trajectories efficiently.

Reading

reader = mp.io.read_lammps_trajectory("dump.lammpstrj")

print(reader.n_frames)       # total frame count
frame_0 = reader[0]          # random access by index
frame_last = reader[-1]      # negative indexing
subset = reader[10:20]       # slicing returns list[Frame]

Iteration is lazy — frames are parsed on demand with background prefetching:

for frame in reader:
    atoms = frame["atoms"]
    # process one frame at a time

Always close the reader when done (or use a context manager) to release the memory map and its file descriptor:

reader.close()
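
To make the indexing idea concrete, here is a minimal sketch of a memory-mapped, offset-indexed reader for LAMMPS dump text. It is not MolPy's reader: `DumpIndexSketch` is a hypothetical class, it returns plain dicts rather than Frames, it rebuilds the index on every open instead of persisting it, it does no background prefetching, and it assumes a non-empty file in which every frame starts with `ITEM: TIMESTEP`.

```python
import mmap


class DumpIndexSketch:
    """Index frame offsets in a LAMMPS dump; parse a frame only when asked."""

    MARKER = b"ITEM: TIMESTEP"

    def __init__(self, path):
        self._fh = open(path, "rb")
        self._mm = mmap.mmap(self._fh.fileno(), 0, access=mmap.ACCESS_READ)
        # One scan records where each frame begins; nothing is parsed yet.
        self._offsets = []
        pos = self._mm.find(self.MARKER)
        while pos != -1:
            self._offsets.append(pos)
            pos = self._mm.find(self.MARKER, pos + 1)
        self._offsets.append(len(self._mm))  # sentinel: end of the last frame

    @property
    def n_frames(self):
        return len(self._offsets) - 1

    def __len__(self):
        return self.n_frames

    def __getitem__(self, key):
        if isinstance(key, slice):
            return [self[i] for i in range(*key.indices(self.n_frames))]
        if key < 0:
            key += self.n_frames  # negative indices count from the end
        if not 0 <= key < self.n_frames:
            raise IndexError(key)
        raw = self._mm[self._offsets[key] : self._offsets[key + 1]]
        return self._parse(raw.decode())

    def __iter__(self):
        for i in range(self.n_frames):
            yield self[i]  # each frame is parsed only when reached

    @staticmethod
    def _parse(text):
        lines = text.splitlines()
        timestep = int(lines[1])  # the line after "ITEM: TIMESTEP"
        start = next(i for i, l in enumerate(lines) if l.startswith("ITEM: ATOMS"))
        columns = lines[start].split()[2:]  # names after "ITEM: ATOMS"
        rows = [l.split() for l in lines[start + 1 :] if l.strip()]
        return {"timestep": timestep, "columns": columns, "rows": rows}

    def close(self):
        self._mm.close()
        self._fh.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()
```

With `__enter__` and `__exit__` defined, `with DumpIndexSketch(path) as reader:` closes the map and the file even if processing raises, which is the point of the context-manager advice above.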

Writing

Trajectory writers accept a list of frames:

mp.io.write_lammps_trajectory("output.lammpstrj", frames, atom_style="full")
mp.io.write_xyz_trajectory("output.xyz", frames)
mp.io.write_h5_trajectory("output.h5", frames)
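
As a rough picture of what a multi-frame text writer does, the sketch below writes frames back to back in XYZ layout (atom count, comment line, one line per atom). The function name and the dict-of-lists frame layout are assumptions for illustration; MolPy's writers take Frame objects.

```python
def write_xyz_trajectory_sketch(path, frames):
    """Write frames one after another in XYZ layout.

    Each frame here is a dict with 'element', 'x', 'y', 'z' lists
    (a hypothetical stand-in for a Frame's atoms block).
    """
    with open(path, "w") as fh:
        for i, atoms in enumerate(frames):
            fh.write(f"{len(atoms['element'])}\n")  # line 1: atom count
            fh.write(f"frame {i}\n")                # line 2: comment
            for el, x, y, z in zip(
                atoms["element"], atoms["x"], atoms["y"], atoms["z"]
            ):
                fh.write(f"{el} {x:.6f} {y:.6f} {z:.6f}\n")
```

Because the loop consumes `frames` one item at a time, this sketch also works with a generator, so a long trajectory never has to be held in memory at once.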

Supported trajectory formats

Format	Read	Write	Notes
LAMMPS dump	`read_lammps_trajectory`	`write_lammps_trajectory`	Custom columns supported
XYZ	`read_xyz_trajectory`	`write_xyz_trajectory`	Multi-frame XYZ
HDF5	`read_h5_trajectory`	`write_h5_trajectory`	Binary, compressed, fast

Log files: simulation run metadata

LAMMPS log files are different from data and trajectory files: they describe what LAMMPS did during each run, not where atoms were. MolPy preserves that structure. A parsed log contains the LAMMPS header, a tuple of runs, and raw text for anything that is not yet represented by a dedicated field.

Use read_LAMMPS_log when you need run-level diagnostics such as thermo output, loop timing, load balance, neighbor statistics, or warnings. The thermo table is stored in a NumPy-backed dataclass, so downstream analysis can look up columns by their LAMMPS names, whatever the thermo style happens to print.

Runs are stored in the order they appear in the log:

log = mp.io.read_LAMMPS_log("log.lammps")
run = log.runs[0]

print(run.thermo.columns)
print(run.thermo.data["Temp"].mean())
print(run.loop_time.seconds)
print(run.neighbor_statistics.dangerous_builds)
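
For intuition about how a log splits into runs, the sketch below extracts one thermo table per run from log text. It is not MolPy's parser: `parse_thermo_runs` is a hypothetical name, it uses plain lists instead of NumPy arrays, and it assumes each thermo header starts with `Step` and each run ends with LAMMPS's `Loop time of ...` line.

```python
def parse_thermo_runs(text):
    """Return one {column: [values]} table per run found in LAMMPS log text."""
    runs, columns, table = [], None, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if columns is None:
            if fields[0] == "Step":  # a thermo header opens a run's table
                columns = fields
                table = {name: [] for name in columns}
        elif line.startswith("Loop time of"):  # this line closes the table
            runs.append(table)
            columns = table = None
        elif len(fields) == len(columns):
            try:
                values = [float(f) for f in fields]
            except ValueError:
                continue  # non-numeric line (e.g. a warning) between rows
            for name, value in zip(columns, values):
                table[name].append(value)
    return runs
```

Keying each table by the header names is what lets analysis code ask for "Temp" or "E_pair" directly, without knowing in advance which columns the thermo style printed.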

The parser also keeps compatibility with older thermo-only code. LAMMPSLog(path).read() still supports log["stages"], while new code should prefer log.runs.

legacy = mp.io.LAMMPSLog("log.lammps").read()
first_stage = legacy["stages"][0]
print(first_stage["Step"][0])

Supported log formats

Format	Read	Write	Notes
LAMMPS log	`read_LAMMPS_log`	—	Returns nested dataclasses aligned with LAMMPS run output

Force field files: ForceField in, ForceField out

Force field readers parse parameter files into ForceField objects. Force field writers serialize ForceField objects into engine-specific formats.

Reading

ff = mp.io.read_xml_forcefield("oplsaa.xml")
ff = mp.io.read_lammps_forcefield("system.ff")
ff = mp.io.read_top("forcefield.itp")

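An AMBER prmtop file holds both topology and parameters, while the coordinates come from the accompanying inpcrd file. read_amber takes both paths and returns a (Frame, ForceField) pair: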

frame, ff = mp.io.read_amber("system.prmtop", "system.inpcrd")

Writing

Each writer produces output in a specific engine format from the same ForceField object:

from molpy.io.forcefield import LAMMPSForceFieldWriter, XMLForceFieldWriter
from molpy.io.forcefield.top import GromacsForceFieldWriter

# LAMMPS coefficients
LAMMPSForceFieldWriter("system.ff", precision=4).write(ff)

# GROMACS .itp
GromacsForceFieldWriter("system.itp", precision=4).write(ff)

# OpenMM XML
XMLForceFieldWriter("system.xml", precision=6).write(ff)

Type filtering

LAMMPS force field files often need to include only the types actually used in the system. Pass type sets to filter the output:

LAMMPSForceFieldWriter("system.ff").write(
    ff,
    atom_types={"CT", "HC", "OH"},
    bond_types={"CT-HC", "CT-OH"},
)
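
The type sets usually come from the system itself rather than being typed by hand. The sketch below shows one way to collect them from per-atom type names and a bond list. Both helpers are hypothetical, and the bond naming (the two atom-type names sorted and joined with "-") is an assumption that has to match however the force field actually names its bond types.

```python
def used_type_names(atom_types, bonds):
    """Collect the atom- and bond-type names a system actually uses.

    atom_types: per-atom type names, e.g. ["CT", "HC", ...]
    bonds: (i, j) zero-based atom index pairs
    """
    atom_set = set(atom_types)
    # Assumed naming convention: endpoint type names, sorted, joined by "-".
    bond_set = {
        "-".join(sorted((atom_types[i], atom_types[j]))) for i, j in bonds
    }
    return atom_set, bond_set


def filter_params(params, keep):
    """Keep only the entries of a {type_name: coefficients} mapping in keep."""
    return {name: coeffs for name, coeffs in params.items() if name in keep}
```

Sets computed this way could then be passed as `atom_types=` and `bond_types=` to the writer call shown above, provided the names line up with the force field's own.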

Supported force field formats

Format	Read	Write	Notes
OpenMM/OPLS XML	`read_xml_forcefield`	`XMLForceFieldWriter`	Primary force field format
LAMMPS coefficients	`read_lammps_forcefield`	`LAMMPSForceFieldWriter`	Supports hybrid styles
GROMACS .itp	`read_top`	`GromacsForceFieldWriter`	Topology format
AMBER prmtop	`read_amber`	—	Returns `(Frame, ForceField)`

Extending with new formats

Adding a new data, trajectory, or force-field format is done by subclassing the reader/writer base classes and registering a factory function. That extension workflow — DataReader/DataWriter, BaseTrajectoryReader/TrajectoryWriter, and ForceFieldWriter + formatter registration — is documented in full in the Developer Guide: I/O Formats.

Quick reference

I want to...	Function
Read a PDB	`mp.io.read_pdb(path)`
Write LAMMPS data + ff	`mp.io.write_lammps_system(dir, frame, ff)`
Read a trajectory	`mp.io.read_lammps_trajectory(path)`
Load OPLS-AA	`mp.io.read_xml_forcefield("oplsaa.xml")`
Read AMBER topology	`mp.io.read_amber(prmtop, inpcrd)`
Write filtered ff	`LAMMPSForceFieldWriter(path).write(ff, atom_types={...})`
Add a new format	Developer Guide: I/O Formats