Parser Module
MolPy includes a flexible parser layer for converting string representations of molecules and polymers into internal data structures (Atomistic, Monomer, Polymer, …).
The parser module focuses on:
- SMILES and BigSMILES‑like notations
- Turning parsed structures into Atomistic and wrapper objects
- Feeding results into builders, reacters and typifiers
From SMILES/BigSMILES to Atomistic
While the parser API is still evolving, the core idea is:
- Take a string (e.g. SMILES or BigSMILES).
- Parse it into an intermediate representation (IR).
- Convert the IR into an Atomistic structure (atoms + bonds).
from molpy.parser import SmartsParser, SmilesParser
# Parse SMILES string
smiles_parser = SmilesParser()
smiles_ir = smiles_parser.parse_smiles("CCO")
print(f"Parsed SMILES: {smiles_ir}")
# Parse SMARTS pattern
smarts_parser = SmartsParser()
smarts_ir = smarts_parser.parse_smarts("[C;H2,H3]")  # Carbon with 2 or 3 hydrogens
print(f"Parsed SMARTS: {smarts_ir}")
Output recorded when this page was built: the call to SmilesParser() did not complete because the grammar file was not found at the expected path.

FileNotFoundError: Grammar file not found: /opt/buildhome/.asdf/installs/python/3.13.3/lib/python3.13/site-packages/molpy/parser/grammar/smiles.lark
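The three steps listed earlier (string, then IR, then atoms + bonds) can be illustrated without relying on MolPy at all. The sketch below is a toy parser for a small SMILES subset (single organic-subset atoms, Cl/Br, the bond symbols -, =, #, and parenthesised branches; no rings, charges, or aromaticity). The names ToyIR, parse_toy_smiles and ir_to_atomistic are hypothetical and exist only for this example; they are not part of MolPy's API, and the plain dictionary merely stands in for an Atomistic structure.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for illustration only; these are NOT MolPy classes.
@dataclass
class ToyIR:
    atoms: list[str] = field(default_factory=list)  # element symbols
    bonds: list[tuple[int, int, int]] = field(default_factory=list)  # (i, j, order)

BOND_ORDERS = {"-": 1, "=": 2, "#": 3}
TWO_LETTER = ("Cl", "Br")
ONE_LETTER = set("BCNOPSFI")

def parse_toy_smiles(text: str) -> ToyIR:
    """Steps 1-2: turn a string in a small SMILES subset into an IR."""
    ir = ToyIR()
    prev = None   # index of the atom the next atom will bond to
    order = 1     # bond order pending for the next bond
    stack = []    # atoms to return to when a branch closes
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "(":
            if prev is None:
                raise ValueError("branch opened before any atom")
            stack.append(prev)
            i += 1
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')'")
            prev = stack.pop()
            i += 1
        elif ch in BOND_ORDERS:
            order = BOND_ORDERS[ch]
            i += 1
        else:
            symbol = text[i:i + 2] if text[i:i + 2] in TWO_LETTER else ch
            if symbol not in TWO_LETTER and symbol not in ONE_LETTER:
                raise ValueError(f"unsupported token {ch!r} at position {i}")
            ir.atoms.append(symbol)
            idx = len(ir.atoms) - 1
            if prev is not None:
                ir.bonds.append((prev, idx, order))
            prev, order = idx, 1
            i += len(symbol)
    if stack:
        raise ValueError("unbalanced '('")
    return ir

def ir_to_atomistic(ir: ToyIR) -> dict:
    """Step 3: convert the IR into a plain atoms + bonds mapping."""
    return {
        "atoms": [{"id": i, "element": el} for i, el in enumerate(ir.atoms)],
        "bonds": [{"i": i, "j": j, "order": o} for i, j, o in ir.bonds],
    }

ethanol = ir_to_atomistic(parse_toy_smiles("CCO"))
print(ethanol["atoms"])
print(ethanol["bonds"])
```

Keeping the IR separate from the final structure means the parsing rules can change without touching the code that consumes atoms and bonds, which is presumably why the parser layer is organised this way.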
Integration with Molecular Building
Parsed Atomistic structures are commonly used as inputs to:
- Monomer / Polymer wrappers
- Reaction workflows via reacter
- Typifiers (molpy.typifier) for assigning atom types
The typical pipeline looks like:
SMILES / BigSMILES string
↓
parser
↓
Atomistic
↓
Monomer / Polymer / Builder / Reacter / Typifier
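To make the last arrow of the pipeline concrete, here is a minimal sketch of a downstream stage consuming an atoms + bonds structure. Everything in it is an assumption made for illustration: the dictionary is a hypothetical stand-in for an Atomistic object, and toy_typify is an invented labelling rule (element symbol plus number of bonded heavy atoms), not the behaviour of molpy.typifier.

```python
# Hypothetical stand-in for a parsed Atomistic structure of ethanol ("CCO").
atomistic = {
    "atoms": [
        {"id": 0, "element": "C"},
        {"id": 1, "element": "C"},
        {"id": 2, "element": "O"},
    ],
    "bonds": [(0, 1), (1, 2)],
}

def toy_typify(structure: dict) -> dict:
    """Return a copy whose atoms carry a 'type' label: element + neighbor count."""
    degree = {atom["id"]: 0 for atom in structure["atoms"]}
    for i, j in structure["bonds"]:
        degree[i] += 1
        degree[j] += 1
    typed_atoms = [
        {**atom, "type": f'{atom["element"]}{degree[atom["id"]]}'}
        for atom in structure["atoms"]
    ]
    return {"atoms": typed_atoms, "bonds": list(structure["bonds"])}

typed = toy_typify(atomistic)
print([atom["type"] for atom in typed["atoms"]])  # ['C1', 'C2', 'O1']
```

The stage returns a new structure rather than mutating its input, so the same parsed result could be handed to several consumers (builder, reacter, typifier) without one affecting another.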