
I/O: Reading and Writing File Formats

Readers turn files into Frames (or ForceField objects, for parameter files); writers turn them back into files. The same pattern holds across every format. To add support for a new format, see Developer Guide: I/O Formats.

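MolPy's I/O system has three layers: data (single-frame structures), trajectory (multi-frame sequences), and forcefield (parameter files). Each layer has its own reader/writer base class. LAMMPS log files, which record run metadata rather than structures, are handled by a separate read-only parser.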

Data files: Frame in, Frame out

Data readers consume a file and return a Frame. Data writers take a Frame and produce a file. The Frame is the universal exchange object — it carries atom tables, bond tables, metadata, and optionally a box.

Reading

Every format has a factory function at mp.io.read_*:

import molpy as mp

frame = mp.io.read_pdb("molecule.pdb")
frame = mp.io.read_gro("system.gro")
frame = mp.io.read_mol2("ligand.mol2")
frame = mp.io.read_lammps_data("system.data", atom_style="full")
frame = mp.io.read_xyz("coords.xyz")
frame = mp.io.read_h5("snapshot.h5")

All readers follow the same pattern: pass a file path, get a Frame back. An optional frame argument lets you populate an existing frame instead of creating a new one.

# Populate an existing frame (useful for merging metadata)
existing = mp.Frame(timestep=42)
frame = mp.io.read_pdb("molecule.pdb", frame=existing)
print(frame.metadata["timestep"])  # 42

The Frame returned by a data reader typically contains an "atoms" block with coordinates, and may contain "bonds", "angles", and "dihedrals" blocks depending on the format.

frame = mp.io.read_pdb("water.pdb")
print(frame["atoms"].nrows)           # number of atoms
print(list(frame["atoms"].keys()))    # ['id', 'element', 'x', 'y', 'z', ...]
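To make the reader contract concrete, here is a minimal, self-contained sketch of a reader that follows the same pattern: it takes a path, optionally fills a caller-supplied frame, and stores atoms as a named block of columns. Frame, Block, and toy_read_xyz are hypothetical stand-ins written for this illustration, not MolPy classes; only the behavior described above (path in, Frame out, optional frame argument, an "atoms" block exposing nrows and keys) is mirrored.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


class Block(dict):
    """Hypothetical stand-in for a Frame block: column name -> list of values."""

    @property
    def nrows(self) -> int:
        # All columns have the same length, so the first column gives the row count.
        return len(next(iter(self.values()), []))


class Frame(dict):
    """Hypothetical stand-in for mp.Frame: named blocks plus a metadata dict."""

    def __init__(self, **metadata):
        super().__init__()
        self.metadata = dict(metadata)


def toy_read_xyz(path, frame: Frame | None = None) -> Frame:
    """Parse a single-frame XYZ file into an "atoms" block (illustrative only)."""
    # An empty Frame is falsy (it is a dict), so test against None explicitly.
    frame = frame if frame is not None else Frame()
    lines = Path(path).read_text().splitlines()
    n_atoms = int(lines[0])
    atoms = Block(id=[], element=[], x=[], y=[], z=[])
    # Line 0 is the atom count, line 1 a comment; atom records follow.
    for i, line in enumerate(lines[2 : 2 + n_atoms], start=1):
        element, x, y, z = line.split()[:4]
        atoms["id"].append(i)
        atoms["element"].append(element)
        atoms["x"].append(float(x))
        atoms["y"].append(float(y))
        atoms["z"].append(float(z))
    frame["atoms"] = atoms
    return frame


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "water.xyz"
    path.write_text("3\nwater\nO 0.0 0.0 0.0\nH 0.96 0.0 0.0\nH -0.24 0.93 0.0\n")
    frame = toy_read_xyz(path, frame=Frame(timestep=42))

print(frame["atoms"].nrows)         # 3
print(list(frame["atoms"].keys()))  # ['id', 'element', 'x', 'y', 'z']
print(frame.metadata["timestep"])   # 42
```

Filling a caller-supplied frame, rather than always building a new one, is what lets metadata such as timestep survive the read.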

Writing

Writers follow the same factory pattern:

mp.io.write_pdb("output.pdb", frame)
mp.io.write_gro("output.gro", frame)
mp.io.write_lammps_data("output.data", frame, atom_style="full")
mp.io.write_h5("output.h5", frame)
mp.io.write_xsf("output.xsf", frame)
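The writer side mirrors the reader: walk the block's columns row by row and emit one record per atom. The sketch below is hypothetical (toy_write_xyz is not a MolPy function), and it assumes the frame is a plain mapping whose "atoms" block holds element, x, y, and z columns.

```python
import tempfile
from pathlib import Path


def toy_write_xyz(path, frame, comment="written by toy_write_xyz"):
    """Write the "atoms" block of a dict-like frame as a single-frame XYZ file."""
    atoms = frame["atoms"]
    # zip walks the columns in lockstep, producing one row per atom.
    rows = zip(atoms["element"], atoms["x"], atoms["y"], atoms["z"])
    lines = [str(len(atoms["element"])), comment]
    lines += [f"{el} {x:.6f} {y:.6f} {z:.6f}" for el, x, y, z in rows]
    Path(path).write_text("\n".join(lines) + "\n")


frame = {
    "atoms": {
        "element": ["O", "H", "H"],
        "x": [0.0, 0.96, -0.24],
        "y": [0.0, 0.0, 0.93],
        "z": [0.0, 0.0, 0.0],
    }
}

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output.xyz"
    toy_write_xyz(out, frame)  # same (path, frame) argument order as mp.io.write_*
    text = out.read_text()

print(text)
```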

For LAMMPS, the most common pattern is to write the data file and the force-field coefficients together. write_lammps_system is a convenience wrapper that does exactly what calling write_lammps_data and write_lammps_forcefield yourself would. Its first argument is a directory, which is created if needed; the two files are always named system.data and system.ff and are written inside it:

paths = mp.io.write_lammps_system("system", frame, ff)
# Creates the directory system/ containing:
#   system/system.data   (topology + coordinates)
#   system/system.ff     (force-field coefficients)
# and returns {"data": Path("system/system.data"), "ff": Path("system/system.ff")}
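The wrapper's documented behavior (create the directory, write two fixed file names inside it, return their paths) fits in a few lines. This is a hypothetical sketch: write_data and write_ff are placeholder callables standing in for write_lammps_data and write_lammps_forcefield, whose exact signatures are not shown on this page, so the real wrapper may differ in its details.

```python
import tempfile
from pathlib import Path


def sketch_write_lammps_system(directory, frame, ff, write_data, write_ff):
    """Mimic the documented behavior of write_lammps_system (illustrative only)."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)  # the directory is created if needed
    paths = {"data": out / "system.data", "ff": out / "system.ff"}
    write_data(paths["data"], frame)  # placeholder for write_lammps_data
    write_ff(paths["ff"], ff)  # placeholder for write_lammps_forcefield
    return paths


# Placeholder writers that only record what they were asked to write.
def fake_data_writer(path, frame):
    path.write_text(f"# data for {frame}\n")


def fake_ff_writer(path, ff):
    path.write_text(f"# coefficients for {ff}\n")


with tempfile.TemporaryDirectory() as tmp:
    paths = sketch_write_lammps_system(
        Path(tmp) / "system", "frame", "ff", fake_data_writer, fake_ff_writer
    )
    names = sorted(p.name for p in paths.values())
    both_exist = all(p.exists() for p in paths.values())

print(names)       # ['system.data', 'system.ff']
print(both_exist)  # True
```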

Supported data formats

| Format | Read | Write | Notes |
|---|---|---|---|
| PDB | read_pdb | write_pdb | ATOM/HETATM + CRYST1 + CONECT |
| GRO | read_gro | write_gro | GROMACS structure |
| MOL2 | read_mol2 | | Tripos MOL2 |
| LAMMPS data | read_lammps_data | write_lammps_data | Requires atom_style |
| LAMMPS molecule | read_lammps_molecule | write_lammps_molecule | Template files |
| XYZ | read_xyz | | Simple coordinate format |
| XSF | read_xsf | write_xsf | XCrySDen format |
| HDF5 | read_h5 | write_h5 | Binary, compressed |
| AMBER AC | read_amber_ac | | Antechamber format |
| AMBER inpcrd | read_amber_inpcrd | | AMBER coordinates |

Trajectory files: lazy Frame sequences

Trajectory readers return objects that behave like indexed, iterable sequences of frames. They use memory-mapped files and persistent indexing to handle large trajectories efficiently.

Reading

reader = mp.io.read_lammps_trajectory("dump.lammpstrj")

print(reader.n_frames)       # total frame count
frame_0 = reader[0]          # random access by index
frame_last = reader[-1]      # negative indexing
subset = reader[10:20]       # slicing returns list[Frame]
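The random-access behavior above (a frame count, integer and negative indexing, slices that return lists) can be built on an index of byte offsets into a memory-mapped file, so that only the requested frames are ever parsed. The sketch below applies that idea to multi-frame XYZ using only the standard library. LazyXYZTrajectory is a hypothetical class; it keeps its index in memory only and assumes well-formed input, whereas MolPy's trajectory readers also persist their index.

```python
import mmap
import tempfile
from pathlib import Path


class LazyXYZTrajectory:
    """Index frame start offsets once; parse a frame only when it is requested."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._offsets = []
        pos, size = 0, len(self._mm)
        while pos < size:
            self._offsets.append(pos)
            n_atoms = int(self._mm[pos : self._line_end(pos)])
            for _ in range(n_atoms + 2):  # count line + comment line + atom lines
                pos = self._line_end(pos) + 1

    def _line_end(self, pos):
        end = self._mm.find(b"\n", pos)
        return len(self._mm) if end == -1 else end

    @property
    def n_frames(self):
        return len(self._offsets)

    def __len__(self):
        return self.n_frames

    def _parse(self, i):
        start = self._offsets[i]
        end = self._offsets[i + 1] if i + 1 < self.n_frames else len(self._mm)
        lines = self._mm[start:end].decode().splitlines()
        rows = [line.split() for line in lines[2:]]
        return {
            "comment": lines[1],
            "atoms": {
                "element": [r[0] for r in rows],
                "xyz": [tuple(float(v) for v in r[1:4]) for r in rows],
            },
        }

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self._parse(i) for i in range(*index.indices(self.n_frames))]
        if index < 0:
            index += self.n_frames  # negative indexing counts from the end
        if not 0 <= index < self.n_frames:
            raise IndexError("frame index out of range")
        return self._parse(index)

    def close(self):
        self._mm.close()
        self._file.close()


text = "".join(f"2\nframe {t}\nO {t}.0 0.0 0.0\nH {t}.5 0.0 0.0\n" for t in range(5))
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "traj.xyz"
    path.write_text(text)
    reader = LazyXYZTrajectory(path)
    n = reader.n_frames
    first_xyz = reader[0]["atoms"]["xyz"]
    last_comment = reader[-1]["comment"]
    subset = [f["comment"] for f in reader[1:3]]
    reader.close()  # release the mapping before the directory is removed

print(n)             # 5
print(last_comment)  # frame 4
print(subset)        # ['frame 1', 'frame 2']
```

Only the offset scan touches the whole file; each frame's text is decoded and parsed on access.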

Iteration is lazy — frames are parsed on demand with background prefetching:

for frame in reader:
    atoms = frame["atoms"]
    # process one frame at a time
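Background prefetching means the next frame is being parsed while your loop body still works on the current one. A minimal sketch of that idea with a single worker thread from concurrent.futures follows; prefetching_iter and parse_frame are hypothetical names, and MolPy's own prefetcher may use a different strategy or prefetch depth.

```python
from concurrent.futures import ThreadPoolExecutor


def prefetching_iter(parse_frame, n_frames):
    """Yield parse_frame(i) for each i, parsing frame i + 1 in the background."""
    if n_frames == 0:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(parse_frame, 0)
        for i in range(n_frames):
            frame = pending.result()  # wait for the frame parsed in the background
            if i + 1 < n_frames:
                pending = pool.submit(parse_frame, i + 1)  # start on the next one
            yield frame  # caller processes this frame while i + 1 is parsed


def parse_frame(i):
    # Stand-in for real parsing work.
    return {"index": i}


indices = [frame["index"] for frame in prefetching_iter(parse_frame, 4)]
print(indices)  # [0, 1, 2, 3]
```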

Always close the reader when done (or use a context manager) to release memory-mapped file descriptors:

reader.close()
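If you are unsure whether a reader implements the context-manager protocol, contextlib.closing from the standard library works with any object that has a close() method and guarantees the call even when processing raises. The sketch uses a hypothetical stand-in reader so it runs on its own; an object returned by mp.io.read_lammps_trajectory, which has close(), could be wrapped the same way.

```python
from contextlib import closing


class StandInReader:
    """Hypothetical reader with a close() method, standing in for a trajectory reader."""

    def __init__(self):
        self.closed = False

    def __getitem__(self, i):
        if self.closed:
            raise ValueError("reader is closed")
        return {"index": i}

    def close(self):
        self.closed = True


reader = StandInReader()
try:
    with closing(reader) as r:
        first = r[0]
        raise RuntimeError("analysis failed")  # simulate an error mid-processing
except RuntimeError:
    pass

print(first)          # {'index': 0}
print(reader.closed)  # True: close() ran despite the exception
```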

Writing

Trajectory writers accept a list of frames:

mp.io.write_lammps_trajectory("output.lammpstrj", frames, atom_style="full")
mp.io.write_xyz_trajectory("output.xyz", frames)
mp.io.write_h5_trajectory("output.h5", frames)
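In a multi-frame XYZ file each frame is a self-contained record, so a trajectory writer can append one record after another. The sketch below is hypothetical: toy_write_xyz_trajectory is not a MolPy function, it writes to any text file handle (so io.StringIO works for trying it out), and frames are assumed to be plain dicts with an "atoms" block of element and coordinate columns.

```python
import io


def toy_write_xyz_trajectory(fh, frames):
    """Append one XYZ record per frame: count line, comment line, atom lines."""
    for step, frame in enumerate(frames):
        atoms = frame["atoms"]
        fh.write(f"{len(atoms['element'])}\nframe {step}\n")
        for el, x, y, z in zip(atoms["element"], atoms["x"], atoms["y"], atoms["z"]):
            fh.write(f"{el} {x:.3f} {y:.3f} {z:.3f}\n")


# Three single-atom frames with the atom moving along x.
frames = [
    {"atoms": {"element": ["He"], "x": [float(t)], "y": [0.0], "z": [0.0]}}
    for t in range(3)
]
buf = io.StringIO()
toy_write_xyz_trajectory(buf, frames)
text = buf.getvalue()
print(text.count("frame "))  # 3
```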

Supported trajectory formats

| Format | Read | Write | Notes |
|---|---|---|---|
| LAMMPS dump | read_lammps_trajectory | write_lammps_trajectory | Custom columns supported |
| XYZ | read_xyz_trajectory | write_xyz_trajectory | Multi-frame XYZ |
| HDF5 | read_h5_trajectory | write_h5_trajectory | Binary, compressed, fast |

Log files: simulation run metadata

LAMMPS log files are different from data and trajectory files: they describe what LAMMPS did during each run, not where atoms were. MolPy preserves that structure. A parsed log contains the LAMMPS header, a tuple of runs, and raw text for anything that is not yet represented by a dedicated field.

Use read_LAMMPS_log when you need run-level diagnostics such as thermo output, loop timing, load balance, neighbor statistics, or warnings. Each run's thermo table is kept in a NumPy-backed dataclass whose columns carry the names LAMMPS printed in the thermo header, so downstream analysis can index those columns (for example "Temp") directly.

Runs appear in log.runs in the same order as in the log file:

log = mp.io.read_LAMMPS_log("log.lammps")
run = log.runs[0]

print(run.thermo.columns)
print(run.thermo.data["Temp"].mean())
print(run.loop_time.seconds)
print(run.neighbor_statistics.dangerous_builds)
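For orientation on the raw file: with the default thermo styles, LAMMPS prints each run's thermo block as a header line beginning with Step, one numeric row per output step, and then a "Loop time of ..." line. The sketch below splits a log into runs on that pattern. Thermo, Run, and parse_thermo_runs are hypothetical names, the columns are plain Python lists rather than NumPy arrays, and the sample log text is made up; a production parser such as read_LAMMPS_log has to handle far more than this.

```python
from dataclasses import dataclass, field


@dataclass
class Thermo:
    columns: tuple
    data: dict = field(default_factory=dict)  # column name -> list of floats


@dataclass
class Run:
    thermo: Thermo
    loop_seconds: float


def parse_thermo_runs(text):
    runs, thermo = [], None
    for line in text.splitlines():
        words = line.split()
        if words and words[0] == "Step":
            # Header row: its words become this run's column names.
            thermo = Thermo(columns=tuple(words), data={w: [] for w in words})
        elif words[:3] == ["Loop", "time", "of"] and thermo is not None:
            runs.append(Run(thermo, float(words[3])))  # seconds follow "Loop time of"
            thermo = None
        elif thermo is not None:
            try:
                values = [float(w) for w in words]
            except ValueError:
                continue  # skip warnings and other non-numeric lines inside a run
            if len(values) == len(thermo.columns):
                for name, value in zip(thermo.columns, values):
                    thermo.data[name].append(value)
    return runs


LOG = """\
Step Temp E_pair
0 1.5 -6.7
50 1.1 -6.2
WARNING: example warning printed during the run
100 0.9 -6.0
Loop time of 0.25 on 1 procs for 100 steps with 4000 atoms

Step Temp E_pair
100 0.9 -6.0
200 1.0 -6.1
Loop time of 0.5 on 1 procs for 100 steps with 4000 atoms
"""

runs = parse_thermo_runs(LOG)
print(len(runs))                    # 2
print(runs[0].thermo.columns)       # ('Step', 'Temp', 'E_pair')
print(runs[0].thermo.data["Step"])  # [0.0, 50.0, 100.0]
print(runs[1].loop_seconds)         # 0.5
```

Keying columns by the header words is what keeps the structure usable whatever thermo_style a simulation used.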

The parser also stays compatible with older thermo-only code: LAMMPSLog(path).read() still supports log["stages"], but new code should prefer log.runs.

legacy = mp.io.LAMMPSLog("log.lammps").read()
first_stage = legacy["stages"][0]
print(first_stage["Step"][0])
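Judging from the access pattern above, a legacy stage behaves like a mapping from column name to that column's values, which is the same information each run's thermo table holds. The adapter below is a hypothetical sketch of that relationship, with runs modeled as plain dicts; it is not how LAMMPSLog is implemented.

```python
def runs_to_stages(runs):
    """Build a legacy-style {"stages": [...]} dict from per-run thermo columns."""
    return {"stages": [dict(run["thermo"]) for run in runs]}


runs = [
    {"thermo": {"Step": [0, 100], "Temp": [1.5, 0.9]}},
    {"thermo": {"Step": [100, 200], "Temp": [0.9, 1.0]}},
]
legacy = runs_to_stages(runs)
print(legacy["stages"][0]["Step"][0])  # 0
print(legacy["stages"][1]["Temp"])     # [0.9, 1.0]
```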

Supported log formats

| Format | Read | Write | Notes |
|---|---|---|---|
| LAMMPS log | read_LAMMPS_log | | Returns nested dataclasses aligned with LAMMPS run output |

Force field files: ForceField in, ForceField out

Force field readers parse parameter files into ForceField objects. Force field writers serialize ForceField objects into engine-specific formats.

Reading

ff = mp.io.read_xml_forcefield("oplsaa.xml")
ff = mp.io.read_lammps_forcefield("system.ff")
ff = mp.io.read_top("forcefield.itp")

AMBER prmtop files contain both structure and parameters. read_amber returns both:

frame, ff = mp.io.read_amber("system.prmtop", "system.inpcrd")

Writing

Each writer produces output in a specific engine format from the same ForceField object:

from molpy.io.forcefield import LAMMPSForceFieldWriter, XMLForceFieldWriter
from molpy.io.forcefield.top import GromacsForceFieldWriter

# LAMMPS coefficients
LAMMPSForceFieldWriter("system.ff", precision=4).write(ff)

# GROMACS .itp
GromacsForceFieldWriter("system.itp", precision=4).write(ff)

# OpenMM XML
XMLForceFieldWriter("system.xml", precision=6).write(ff)

Type filtering

LAMMPS force field files often need to include only the types actually used in the system. Pass type sets to filter the output:

LAMMPSForceFieldWriter("system.ff").write(
    ff,
    atom_types={"CT", "HC", "OH"},
    bond_types={"CT-HC", "CT-OH"},
)
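The type sets passed to write() usually come from the system itself. The sketch below collects them and rests on two assumptions: per-atom type names are available as a list, and bond types are named by joining the two atom types with "-" in alphabetical order. That ordering reproduces the names in the example above, but your force field may name bond types differently. used_types is a hypothetical helper, not part of MolPy.

```python
def used_types(atom_type_column, bond_pairs):
    """Return (atom_types, bond_types) that actually occur in a system.

    atom_type_column: per-atom type names, indexed by atom position.
    bond_pairs: (i, j) atom-index pairs, one per bond.
    """
    atom_types = set(atom_type_column)
    # Assumed naming convention: alphabetical order, joined with "-".
    bond_types = {
        "-".join(sorted((atom_type_column[i], atom_type_column[j])))
        for i, j in bond_pairs
    }
    return atom_types, bond_types


# Methanol-like fragment: carbon, three H on carbon, oxygen, H on oxygen.
types = ["CT", "HC", "HC", "HC", "OH", "HO"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5)]
atom_types, bond_types = used_types(types, bonds)
print(sorted(atom_types))  # ['CT', 'HC', 'HO', 'OH']
print(sorted(bond_types))  # ['CT-HC', 'CT-OH', 'HO-OH']

# These sets would then go to the writer, for example:
# LAMMPSForceFieldWriter("system.ff").write(ff, atom_types=atom_types, bond_types=bond_types)
```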

Supported force field formats

| Format | Read | Write | Notes |
|---|---|---|---|
| OpenMM/OPLS XML | read_xml_forcefield | XMLForceFieldWriter | Primary force field format |
| LAMMPS coefficients | read_lammps_forcefield | LAMMPSForceFieldWriter | Supports hybrid styles |
| GROMACS .itp | read_top | GromacsForceFieldWriter | Topology format |
| AMBER prmtop | read_amber | | Returns (Frame, ForceField) |

Extending with new formats

Adding a new data, trajectory, or force-field format is done by subclassing the reader/writer base classes and registering a factory function. That extension workflow — DataReader/DataWriter, BaseTrajectoryReader/TrajectoryWriter, and ForceFieldWriter + formatter registration — is documented in full in the Developer Guide: I/O Formats.
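The shape of that workflow (subclass a base reader, then expose a read_* factory that wraps it) is a common Python pattern. The sketch below is a generic illustration with hypothetical names (FACTORIES, register_factory, SketchDataReader, ToyReader, read_toy); it is not MolPy's registration API, which the Developer Guide documents.

```python
from pathlib import Path

FACTORIES = {}  # hypothetical registry: factory name -> callable


def register_factory(func):
    """Record a factory function under its own name and return it unchanged."""
    FACTORIES[func.__name__] = func
    return func


class SketchDataReader:
    """Hypothetical base class: subclasses implement read()."""

    def __init__(self, path):
        self.path = Path(path)

    def read(self, frame=None):
        raise NotImplementedError


class ToyReader(SketchDataReader):
    def read(self, frame=None):
        frame = {} if frame is None else frame  # fill the caller's frame if given
        frame["source"] = self.path.name
        return frame


@register_factory
def read_toy(path, frame=None):
    """Factory in the read_* style: path in, frame out."""
    return ToyReader(path).read(frame)


print(sorted(FACTORIES))               # ['read_toy']
print(FACTORIES["read_toy"]("a.toy"))  # {'source': 'a.toy'}
```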

Quick reference

| I want to... | Function |
|---|---|
| Read a PDB | mp.io.read_pdb(path) |
| Write LAMMPS data + ff | mp.io.write_lammps_system(dir, frame, ff) |
| Read a trajectory | mp.io.read_lammps_trajectory(path) |
| Load OPLS-AA | mp.io.read_xml_forcefield("oplsaa.xml") |
| Read AMBER topology | mp.io.read_amber(prmtop, inpcrd) |
| Write filtered ff | LAMMPSForceFieldWriter(path).write(ff, atom_types={...}) |
| Add a new format | Developer Guide: I/O Formats |

See also: API Reference: I/O, Concepts: Force Field.