Parser¶

Grammar-based parsing for chemical string notations. Convenience functions are available as mp.parser.*.

Quick reference¶

Function	Input	Output	Use when
`parse_molecule(s)`	SMILES	`Atomistic`	One specific molecule
`parse_mixture(s)`	dot-separated SMILES	`list[Atomistic]`	Multi-component (`[Li+].[F-]`)
`parse_monomer(s)`	BigSMILES	`Atomistic` (with ports)	Repeat unit with `<`/`>`/`$` markers
`parse_polymer(s)`	BigSMILES	`PolymerSpec`	Multi-monomer specification
`parse_smarts(s)`	SMARTS	`SmartsIR`	Pattern matching / typification
`parse_smiles(s)`	SMILES	`SmilesGraphIR`	IR-level inspection
`parse_bigsmiles(s)`	BigSMILES	`BigSmilesMoleculeIR`	IR-level BigSMILES inspection
`parse_cgsmiles(s)`	CGSmiles	`CGSmilesIR`	Topology architecture graphs
`parse_gbigsmiles(s)`	GBigSMILES	`GBigSmilesSystemIR`	System specs with distributions

Canonical example¶

import molpy as mp

mol = mp.parser.parse_molecule("CCO")           # Atomistic
ions = mp.parser.parse_mixture("[Li+].[F-]")    # [Atomistic, Atomistic]
monomer = mp.parser.parse_monomer("{[][<]CCO[>][]}") # Atomistic with ports
spec = mp.parser.parse_polymer("{[<]CC[>],[<]CC(C)[>]}") # PolymerSpec

Related¶

smilesir_to_atomistic: SMILES IR → Atomistic
bigsmilesir_to_monomer: BigSMILES IR → Atomistic
bigsmilesir_to_polymerspec: BigSMILES IR → PolymerSpec
Guide: Parsing Chemistry

Full API¶

Convenience layer¶

parser ¶

Unified parser API for SMILES, BigSMILES, GBigSMILES, CGSmiles, and SMARTS.

Convenience wrappers live here so downstream code can do:

from molpy.parser import parse_molecule, parse_polymer, parse_smarts

PolymerSegment `dataclass` ¶

PolymerSegment(monomers, composition_type=None, distribution_params=None, end_groups=list(), repeat_units_ir=list(), end_groups_ir=list())

Polymer segment specification.

PolymerSpec `dataclass` ¶

PolymerSpec(segments, topology, start_group_ir=None, end_group_ir=None)

Complete polymer specification.

all_monomers ¶

all_monomers()

Get all structures from all segments.
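
As a rough illustration of what collecting monomers across segments could look like, the sketch below uses hypothetical stand-in dataclasses (`SegmentSketch`, `SpecSketch`) rather than molpy's own classes. Only the field names `segments`, `topology`, and `monomers` come from the signatures above; the real method may differ, for example in ordering or deduplication.

```python
from dataclasses import dataclass


@dataclass
class SegmentSketch:
    """Hypothetical stand-in for PolymerSegment (only `monomers` is modeled)."""
    monomers: list


@dataclass
class SpecSketch:
    """Hypothetical stand-in for PolymerSpec."""
    segments: list
    topology: str

    def all_monomers(self) -> list:
        # Concatenate the monomers of every segment, preserving segment order.
        return [m for seg in self.segments for m in seg.monomers]


spec = SpecSketch(
    segments=[SegmentSketch(["A", "B"]), SegmentSketch(["C"])],
    topology="block_copolymer",  # illustrative label, not a documented value
)
print(spec.all_monomers())  # ['A', 'B', 'C']
```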

SmartsParser ¶

SmartsParser()

Bases: GrammarParserBase

Main parser for SMARTS patterns.

Usage

parser = SmartsParser()
ir = parser.parse_smarts("[#6]")
ir = parser.parse_smarts("c1ccccc1")
ir = parser.parse_smarts("[C,N,O]")

parse_smarts ¶

parse_smarts(smarts)

Parse SMARTS string into SmartsIR.

Parameters:

Name	Type	Description	Default
`smarts`	`str`	SMARTS pattern string	required

Returns:

Type	Description
`SmartsIR`	SmartsIR representing the pattern

Raises:

Type	Description
`ValueError`	if parsing fails or rings are unclosed

Examples:

>>> parser = SmartsParser()
>>> ir = parser.parse_smarts("C")
>>> len(ir.atoms)
1
>>> ir = parser.parse_smarts("[#6]")
>>> ir.atoms[0].expression.children[0].type
'atomic_num'

bigsmilesir_to_monomer ¶

bigsmilesir_to_monomer(ir)

Convert BigSmilesMoleculeIR to Atomistic structure (topology only).

Single responsibility: IR → Atomistic conversion only. Parsing should be done separately.

Supports a BigSMILES stochastic object with exactly ONE repeat unit, e.g. {[<]CC[>]}.

Parameters:

Name	Type	Description	Default
`ir`	`BigSmilesMoleculeIR`	BigSmilesMoleculeIR from parser	required

Returns:

Type	Description
`Atomistic`	Atomistic structure with ports marked on atoms, NO positions

Raises:

Type	Description
`ValueError`	If IR contains multiple repeat units (use bigsmilesir_to_polymerspec instead)

Examples:

>>> from molpy.parser.smiles import parse_bigsmiles
>>> ir = parse_bigsmiles("{[<]CC[>]}")
>>> struct = bigsmilesir_to_monomer(ir)
>>> # Ports are marked on atoms: atom["port"] = "<" or ">"

bigsmilesir_to_polymerspec ¶

bigsmilesir_to_polymerspec(ir)

Convert a BigSmilesMoleculeIR into a complete polymer specification.

Single responsibility: IR -> PolymerSpec conversion only. Parsing should be done separately.

Extracts monomers and analyzes polymer topology and composition.

Parameters:

Name	Type	Description	Default
`ir`	`BigSmilesMoleculeIR`	BigSmilesMoleculeIR from parser	required

Returns:

Type	Description
`PolymerSpec`	PolymerSpec with segments, topology, and all monomers

Examples:

>>> from molpy.parser.smiles import parse_bigsmiles
>>> ir = parse_bigsmiles("{[<]CC[>]}")
>>> spec = bigsmilesir_to_polymerspec(ir)
>>> spec.topology
'homopolymer'

parse_bigsmiles ¶

parse_bigsmiles(src)

Parse a BigSMILES string into BigSmilesMoleculeIR.

This parser accepts BigSMILES syntax including stochastic objects, bonding descriptors, and repeat units. It does NOT accept GBigSMILES annotations.

Parameters:

Name	Type	Description	Default
`src`	`str`	BigSMILES string	required

Returns:

Type	Description
`BigSmilesMoleculeIR`	BigSmilesMoleculeIR containing backbone and stochastic objects

Raises:

Type	Description
`ValueError`	if syntax errors detected

Examples:

>>> ir = parse_bigsmiles("{[<]CC[>]}")
>>> len(ir.stochastic_objects)
1

parse_cgsmiles ¶

parse_cgsmiles(src)

Parse a CGSmiles string.

Parameters:

Name	Type	Description	Default
`src`	`str`	CGSmiles string (e.g., `"{[#PEO][#PMA]}.{#PEO=[$]COC[$]}"`)	required

Returns:

Type	Description
`CGSmilesIR`	CGSmilesIR with base graph and fragment definitions

Raises:

Type	Description
`ValueError`	if syntax errors detected

Examples:

>>> result = parse_cgsmiles("{[#PEO][#PMA][#PEO]}")
>>> len(result.base_graph.nodes)
3
>>> result = parse_cgsmiles("{[#PEO]|5}")
>>> len(result.base_graph.nodes)
5
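
The node counts in the examples above follow from expanding the `|N` repeat operator. The sketch below reproduces that counting for linear strings only, using a regular expression. It is an illustrative simplification under that assumption, not molpy's grammar: it ignores bonds, branches, and fragment definitions, and `expand_linear_nodes` is a hypothetical name.

```python
import re

# Matches a node such as [#PEO], optionally followed by a repeat count (|5).
_NODE = re.compile(r"\[#(\w+)\](?:\|(\d+))?")


def expand_linear_nodes(src: str) -> list[str]:
    """Return node labels of a linear CGSmiles-like string, repeats expanded."""
    labels = []
    for name, count in _NODE.findall(src):
        labels.extend([name] * (int(count) if count else 1))
    return labels


print(len(expand_linear_nodes("{[#PEO][#PMA][#PEO]}")))  # 3
print(len(expand_linear_nodes("{[#PEO]|5}")))            # 5
```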

parse_gbigsmiles ¶

parse_gbigsmiles(src)

Parse a GBigSMILES string into GBigSmilesSystemIR.

This parser accepts GBigSMILES syntax including all BigSMILES features plus system size specifications and other generative annotations. Always returns GBigSmilesSystemIR, wrapping single molecules in a system structure.

Parameters:

Name	Type	Description	Default
`src`	`str`	GBigSMILES string	required

Returns:

Type	Description
`GBigSmilesSystemIR`	GBigSmilesSystemIR containing the parsed system

Raises:

Type	Description
`ValueError`	if syntax errors detected

Examples:

>>> ir = parse_gbigsmiles("{[<]CC[>]}|5e5|")
>>> isinstance(ir, GBigSmilesSystemIR)
True

parse_mixture ¶

parse_mixture(smiles)

Parse a (possibly dot-separated) SMILES string into a list of molecules.

Parameters:

Name	Type	Description	Default
`smiles`	`str`	SMILES string, components separated by `'.'`.	required

Returns:

Type	Description
`list[Atomistic]`	List of Atomistic structures (always a list, even for a single component).

parse_molecule ¶

parse_molecule(smiles)

Parse a SMILES string and return a single Atomistic structure.

Parameters:

Name	Type	Description	Default
`smiles`	`str`	SMILES string for a single molecule (no dots).	required

Returns:

Type	Description
`Atomistic`	Atomistic structure.

parse_monomer ¶

parse_monomer(bigsmiles)

Parse a BigSMILES string and return the first monomer as Atomistic.

Parameters:

Name	Type	Description	Default
`bigsmiles`	`str`	BigSMILES string.	required

Returns:

Name	Type	Description
`Monomer`	`Atomistic`	Atomistic structure with port annotations.

parse_polymer ¶

parse_polymer(bigsmiles)

Parse a BigSMILES string and return a PolymerSpec.

Parameters:

Name	Type	Description	Default
`bigsmiles`	`str`	BigSMILES string.	required

Returns:

Type	Description
`PolymerSpec`	PolymerSpec describing segments, topology, and monomers.

parse_smarts ¶

parse_smarts(pattern)

Parse a SMARTS pattern string into SmartsIR.

This is a thin wrapper around SmartsParser().parse_smarts(pattern).

Parameters:

Name	Type	Description	Default
`pattern`	`str`	SMARTS string.	required

Returns:

Name	Type	Description
`Parsed`	`SmartsIR`	SmartsIR representation.

parse_smiles ¶

parse_smiles(src)

Parse a SMILES string into SmilesGraphIR or list of SmilesGraphIR.

This parser only accepts pure SMILES syntax. It will reject BigSMILES or GBigSMILES constructs.

For dot-separated SMILES (e.g., "C.C", "CC.O"), returns a list of SmilesGraphIR, one for each disconnected component.

Parameters:

Name	Type	Description	Default
`src`	`str`	SMILES string (may contain dots for mixtures)	required

Returns:

Type	Description
`SmilesGraphIR \| list[SmilesGraphIR]`	SmilesGraphIR for single molecule, or list[SmilesGraphIR] for mixtures

Raises:

Type	Description
`ValueError`	if syntax errors detected or unclosed rings

Examples:

>>> ir = parse_smiles("CCO")
>>> len(ir.atoms)
3
>>> irs = parse_smiles("C.C")
>>> len(irs)
2
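
Because the return type depends on whether the input contains dots, calling code may want to normalize the result before iterating. A minimal sketch of that pattern follows; `as_component_list` is a hypothetical helper, and plain objects stand in for SmilesGraphIR instances so the snippet runs without molpy.

```python
def as_component_list(result):
    """Return `result` as a list, wrapping a single IR in a one-element list."""
    return result if isinstance(result, list) else [result]


single = object()               # stands in for one SmilesGraphIR
mixture = [object(), object()]  # stands in for the result of a dotted string

print(len(as_component_list(single)))   # 1
print(len(as_component_list(mixture)))  # 2
```

With this wrapper, downstream loops can treat "CCO" and "C.C" uniformly.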

smilesir_to_atomistic ¶

smilesir_to_atomistic(ir)

Convert SmilesGraphIR to Atomistic structure (topology only, no 3D coordinates).

Single responsibility: IR → Atomistic conversion only. Parsing should be done separately using parse_smiles().

This is a simple conversion function for pure SMILES (no BigSMILES features like ports or descriptors). For BigSMILES with ports, use bigsmilesir_to_monomer() instead.

Parameters:

Name	Type	Description	Default
`ir`	`SmilesGraphIR`	SmilesGraphIR from parse_smiles()	required

Returns:

Type	Description
`Atomistic`	Atomistic structure with atoms and bonds (no 3D coordinates, no ports)

Examples:

>>> from molpy.parser.smiles import parse_smiles, smilesir_to_atomistic
>>> ir = parse_smiles("CCO")
>>> struct = smilesir_to_atomistic(ir)
>>> len(struct.atoms)
3
>>> len(struct.bonds)
2

SMARTS¶

smarts ¶

AtomExpressionIR `dataclass` ¶

AtomExpressionIR(op, children=list(), id=(lambda: id(AtomExpressionIR))())

Represents logical expressions combining atom primitives.

Operators

'and' (&): high-priority AND
'or' (,): OR
'weak_and' (;): low-priority AND
'not' (!): negation

Examples:

AtomExpressionIR(op='and', children=[primitive1, primitive2])
AtomExpressionIR(op='not', children=[primitive])
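
The operator tree can be read as an ordinary boolean expression. The sketch below evaluates such a tree against an atom described by a plain dict. `Primitive`, `Expression`, and `matches` are hypothetical stand-ins that borrow only the field names `type`, `value`, `op`, and `children` from the signatures here; how molpy actually matches patterns is not shown in this reference. Note that 'and' and 'weak_and' differ only in parsing precedence, so once the tree is built both reduce to a logical AND.

```python
from dataclasses import dataclass, field


@dataclass
class Primitive:
    """Hypothetical stand-in for AtomPrimitiveIR."""
    type: str
    value: object = None


@dataclass
class Expression:
    """Hypothetical stand-in for AtomExpressionIR."""
    op: str
    children: list = field(default_factory=list)


def matches(node, atom: dict) -> bool:
    """Evaluate a pattern node against an atom described by a plain dict."""
    if isinstance(node, Primitive):
        return atom.get(node.type) == node.value
    results = [matches(child, atom) for child in node.children]
    if node.op in ("and", "weak_and"):
        return all(results)
    if node.op == "or":
        return any(results)
    if node.op == "not":
        return not results[0]
    raise ValueError(f"unknown operator: {node.op!r}")


# [C,N;X3]: (carbon OR nitrogen) AND exactly three neighbors
pattern = Expression("weak_and", [
    Expression("or", [Primitive("symbol", "C"), Primitive("symbol", "N")]),
    Primitive("neighbor_count", 3),
])
print(matches(pattern, {"symbol": "N", "neighbor_count": 3}))  # True
print(matches(pattern, {"symbol": "O", "neighbor_count": 3}))  # False
```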

AtomPrimitiveIR `dataclass` ¶

AtomPrimitiveIR(type, value=None, id=(lambda: id(AtomPrimitiveIR))())

Represents a single primitive atom pattern in SMARTS.

Examples:

symbol='C' (carbon atom)
atomic_num=6 (atomic number 6)
neighbor_count=3 (X3, exactly 3 neighbors)
ring_size=6 (r6, in 6-membered ring)
ring_count=2 (R2, in exactly 2 rings)
has_label='%atomA' (has label %atomA)
matches_smarts=SmartsIR(...) (recursive SMARTS)

SmartsAtomIR `dataclass` ¶

SmartsAtomIR(expression, label=None, id=(lambda: id(SmartsAtomIR))())

Represents a complete SMARTS atom with expression and optional label.

Attributes:

Name	Type	Description
`expression`	`AtomExpressionIR \| AtomPrimitiveIR`	The atom pattern expression
`label`	`int \| None`	Optional numeric label for ring closures or references

SmartsBondIR `dataclass` ¶

SmartsBondIR(itom, jtom, bond_type='implicit')

Represents a bond between two SMARTS atoms.

In SMARTS, bonds are implicit (single or aromatic) unless specified. Explicit bond types can be specified between atoms.

SmartsIR `dataclass` ¶

SmartsIR(atoms=list(), bonds=list())

Complete SMARTS pattern intermediate representation.

Attributes:

Name	Type	Description
`atoms`	`list[SmartsAtomIR]`	List of all atoms in the pattern
`bonds`	`list[SmartsBondIR]`	List of all bonds in the pattern

SmartsParser ¶

SmartsParser()

Bases: GrammarParserBase

Main parser for SMARTS patterns.

Usage

parser = SmartsParser()
ir = parser.parse_smarts("[#6]")
ir = parser.parse_smarts("c1ccccc1")
ir = parser.parse_smarts("[C,N,O]")

parse_smarts ¶

parse_smarts(smarts)

Parse SMARTS string into SmartsIR.

Parameters:

Name	Type	Description	Default
`smarts`	`str`	SMARTS pattern string	required

Returns:

Type	Description
`SmartsIR`	SmartsIR representing the pattern

Raises:

Type	Description
`ValueError`	if parsing fails or rings are unclosed

Examples:

>>> parser = SmartsParser()
>>> ir = parser.parse_smarts("C")
>>> len(ir.atoms)
1
>>> ir = parser.parse_smarts("[#6]")
>>> ir.atoms[0].expression.children[0].type
'atomic_num'

SmartsTransformer ¶

SmartsTransformer()

Bases: Transformer

Transforms Lark parse tree into SmartsIR.

Handles

Atom primitives (symbols, atomic numbers, properties)
Logical expressions (AND, OR, NOT, weak AND)
Branches
Ring closures
Recursive SMARTS patterns

and_expression ¶

and_expression(children)

Process high-priority AND expression (&).

atom ¶

atom(children)

Process complete atom: [expression] or bare_atom, with optional label.

Returns:

Type	Description
`SmartsAtomIR`	SmartsAtomIR

atom_class ¶

atom_class(children)

Extract atom class name.

atom_id ¶

atom_id(children)

Process atom identifier (primitive).

Can be

atom_symbol
# + atomic_num (atomic number)
$( + SMARTS + ) (recursive SMARTS)
%label (has label)
X + N? (neighbor count, optional number)
x + N? (ring connectivity, optional number)
r + N? (ring size, optional number)
R + N? (ring count, optional number)
H + N? (hydrogen count, optional number)
h + N? (implicit hydrogen count, optional number)
D + N? (degree, optional number)
v + N? (valence, optional number)
+/- + N? (charge)
a (aromatic)
A (aliphatic)
@ / @@ (chirality)
NUM + atom_symbol (isotope)
atom_class (atom class reference)

atom_label ¶

atom_label(children)

Extract atom label (numeric).

atom_symbol ¶

atom_symbol(children)

Process atom symbol (element or wildcard).

atomic_num ¶

atomic_num(children)

Extract atomic number.

bare_atom ¶

bare_atom(children)

Process bare (unbracketed) atom: element symbol or atom class.

bond ¶

bond(children)

Extract bond type (may be negated with !).

branch ¶

branch(children)

Process branch: the content inside or after chain. This just returns the SmartsIR from _string.

charge ¶

charge(children)

Extract charge (+ or -).

chirality ¶

chirality(children)

Extract chirality (@ or @@).

degree ¶

degree(children)

Extract degree.

has_label ¶

has_label(children)

Extract label.

hydrogen_count ¶

hydrogen_count(children)

Extract explicit hydrogen count.

implicit_and ¶

implicit_and(children)

Process implicit AND: adjacent primitives without operator (e.g. #6X3r5).
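
To show what "adjacent primitives without operator" means in practice, the sketch below splits a body such as #6X3r5 into (type, value) pairs. It is a simplified illustration, not the Lark grammar molpy uses: it handles only four numeric primitives, and the prefix-to-name mapping is an assumption based on the AtomPrimitiveIR examples.

```python
import re

# Assumed mapping from SMARTS prefixes to primitive type names.
_PREFIXES = {
    "#": "atomic_num",
    "X": "neighbor_count",
    "r": "ring_size",
    "R": "ring_count",
}
_TOKEN = re.compile(r"([#XrR])(\d+)")


def split_implicit_and(body: str) -> list[tuple[str, int]]:
    """Split adjacent numeric primitives into (type, value) pairs."""
    pairs = []
    position = 0
    for match in _TOKEN.finditer(body):
        if match.start() != position:
            raise ValueError(f"unsupported token at offset {position}")
        pairs.append((_PREFIXES[match.group(1)], int(match.group(2))))
        position = match.end()
    if position != len(body):
        raise ValueError(f"unsupported token at offset {position}")
    return pairs


print(split_implicit_and("#6X3r5"))
# [('atomic_num', 6), ('neighbor_count', 3), ('ring_size', 5)]
```

Every pair in the result would become one child of a single AND node.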

implicit_hydrogen_count ¶

implicit_hydrogen_count(children)

Extract implicit hydrogen count.

isotope ¶

isotope(children)

Extract isotope mass number.

isotope_atom ¶

isotope_atom(children)

Process isotope-prefixed atom (e.g. 2H for deuterium).

matches_string ¶

matches_string(children)

Extract recursive SMARTS pattern.

neighbor_count ¶

neighbor_count(children)

Extract neighbor count.

nonlastbranch ¶

nonlastbranch(children)

Process non-last branch: (bond? branch_content).

not_expression ¶

not_expression(children)

Process NOT expression (!).

or_expression ¶

or_expression(children)

Process OR expression (,).

ring_connectivity ¶

ring_connectivity(children)

Extract ring connectivity.

ring_count ¶

ring_count(children)

Extract ring count.

ring_size ¶

ring_size(children)

Extract ring size.

start ¶

start(children)

Entry point: process complete SMARTS pattern.

The grammar produces a tree like: start atom ... atom ...

We need to build the IR from this flat or nested structure.

valence ¶

valence(children)

Extract valence.

weak_and_expression ¶

weak_and_expression(children)

Process low-priority AND expression (;).

SMILES / BigSMILES / CGSmiles¶

smiles ¶

SMILES, BigSMILES, GBigSMILES, and CGSmiles parsers.

This module provides four explicit parser APIs:

parse_smiles: Parse pure SMILES strings
parse_bigsmiles: Parse BigSMILES strings
parse_gbigsmiles: Parse GBigSMILES strings
parse_cgsmiles: Parse CGSmiles strings

Each parser uses its own dedicated grammar and transformer.

BigSmilesMoleculeIR `dataclass` ¶

BigSmilesMoleculeIR(backbone=BigSmilesSubgraphIR(), stochastic_objects=list())

Top-level structural IR for BigSMILES strings.

BigSmilesSubgraphIR `dataclass` ¶

BigSmilesSubgraphIR(atoms=list(), bonds=list(), descriptors=list())

Structural fragment that carries atoms, bonds, and descriptors.

BondingDescriptorIR `dataclass` ¶

BondingDescriptorIR(id=_generate_id(), symbol=None, label=None, bond_order=1, role='internal', anchor_atom=None, non_covalent_context=None, extras=dict(), position_hint=None)

Standalone descriptor node for bonding points.

Per BigSMILES v1.1: bonding descriptors attach to atoms within repeat units. The anchor_atom field tracks which atom this descriptor is attached to. If anchor_atom is None, this is a terminal bonding descriptor at the stochastic object boundary.
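
The anchor_atom rule lends itself to a simple partition. The following sketch separates attached descriptors from terminal ones using a hypothetical `DescriptorSketch` dataclass that models only three of the fields listed in the signature above; the string anchors are placeholders for atom objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DescriptorSketch:
    """Hypothetical stand-in for BondingDescriptorIR (three fields only)."""
    symbol: Optional[str] = None
    label: Optional[int] = None
    anchor_atom: Optional[object] = None


def split_descriptors(descriptors):
    """Separate descriptors attached to atoms from terminal ones."""
    attached = [d for d in descriptors if d.anchor_atom is not None]
    terminal = [d for d in descriptors if d.anchor_atom is None]
    return attached, terminal


descriptors = [
    DescriptorSketch(symbol="<", anchor_atom="C0"),  # attached to a repeat-unit atom
    DescriptorSketch(symbol=">", anchor_atom="C1"),
    DescriptorSketch(symbol="$"),                    # no anchor: terminal descriptor
]
attached, terminal = split_descriptors(descriptors)
print(len(attached), len(terminal))  # 2 1
```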

CGSmilesBondIR `dataclass` ¶

CGSmilesBondIR(node_i, node_j, order=1, id=_generate_id())

Intermediate representation for a CGSmiles bond.

Bonds directly reference NodeIR objects, not just IDs.

CGSmilesFragmentIR `dataclass` ¶

CGSmilesFragmentIR(name='', body='')

Fragment definition.

Maps a fragment name to its SMILES or CGSmiles representation.

CGSmilesGraphIR `dataclass` ¶

CGSmilesGraphIR(nodes=list(), bonds=list())

Coarse-grained graph representation.

Represents a molecular graph with CG nodes and bonds.

CGSmilesIR `dataclass` ¶

CGSmilesIR(base_graph=CGSmilesGraphIR(), fragments=list())

Root-level IR for CGSmiles parser.

Represents a complete CGSmiles string with base graph and fragment definitions. This is the output of the CGSmiles parser.

CGSmilesNodeIR `dataclass` ¶

CGSmilesNodeIR(id=_generate_id(), label='', annotations=dict())

Intermediate representation for a CGSmiles node.

A coarse-grained node with a label (e.g., "PEO", "PMA") and optional annotations.

DistributionIR `dataclass` ¶

DistributionIR(name, params=dict())

Generative distribution applied to stochastic objects.

EndGroupIR `dataclass` ¶

EndGroupIR(id=_generate_id(), graph=BigSmilesSubgraphIR(), extras=dict())

Optional end-group fragments that terminate stochastic objects.

GBBondingDescriptorIR `dataclass` ¶

GBBondingDescriptorIR(structural, global_weight=None, pair_weights=None, extras=dict())

Weights associated with a bonding descriptor.

GBStochasticObjectIR `dataclass` ¶

GBStochasticObjectIR(structural, distribution=None)

Wraps a structural stochastic object plus optional distribution.

GBigSmilesComponentIR `dataclass` ¶

GBigSmilesComponentIR(molecule, target_mass=None, mass_is_fraction=False, extras=dict())

Single component entry in a gBigSMILES system.

GBigSmilesMoleculeIR `dataclass` ¶

GBigSmilesMoleculeIR(structure, descriptor_weights=list(), stochastic_metadata=list(), extras=dict())

gBigSMILES molecule = structure + generative metadata.

GBigSmilesSystemIR `dataclass` ¶

GBigSmilesSystemIR(molecules=list(), total_mass=None)

gBigSMILES system describing an ensemble of molecules.

PolymerSegment `dataclass` ¶

PolymerSegment(monomers, composition_type=None, distribution_params=None, end_groups=list(), repeat_units_ir=list(), end_groups_ir=list())

Polymer segment specification.

PolymerSpec `dataclass` ¶

PolymerSpec(segments, topology, start_group_ir=None, end_group_ir=None)

Complete polymer specification.

all_monomers ¶

all_monomers()

Get all structures from all segments.

RepeatUnitIR `dataclass` ¶

RepeatUnitIR(id=_generate_id(), graph=BigSmilesSubgraphIR(), extras=dict())

Repeat unit captured inside a stochastic object.

SmilesAtomIR `dataclass` ¶

SmilesAtomIR(id=_generate_id(), element=None, aromatic=False, charge=None, hydrogens=None, extras=dict())

Intermediate representation for a SMILES atom.

SmilesBondIR `dataclass` ¶

SmilesBondIR(itom, jtom, order=1, stereo=None, id=_generate_id())

Intermediate representation for a SMILES bond.

Bonds directly reference AtomIR objects, not just IDs.
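
One consequence of bonds holding object references is that an adjacency map can be built without any ID lookup. The sketch below demonstrates this with hypothetical stand-ins (`AtomSketch`, `BondSketch`) that reuse the `itom`, `jtom`, and `order` field names from the signature above; it does not use molpy's classes.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # keep identity-based hashing so atoms can be dict keys
class AtomSketch:
    element: str


@dataclass
class BondSketch:
    itom: AtomSketch
    jtom: AtomSketch
    order: int = 1


def build_adjacency(atoms, bonds):
    """Map each atom object to the list of atom objects bonded to it."""
    adjacency = {atom: [] for atom in atoms}
    for bond in bonds:
        adjacency[bond.itom].append(bond.jtom)
        adjacency[bond.jtom].append(bond.itom)
    return adjacency


c1, c2, o = AtomSketch("C"), AtomSketch("C"), AtomSketch("O")  # ethanol, CCO
adjacency = build_adjacency([c1, c2, o], [BondSketch(c1, c2), BondSketch(c2, o)])
print([a.element for a in adjacency[c2]])  # ['C', 'O']
```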

SmilesGraphIR `dataclass` ¶

SmilesGraphIR(atoms=list(), bonds=list())

Root-level IR for SMILES parser.

Represents a molecular graph with atoms and bonds. This is the output of the SMILES parser.

StochasticObjectIR `dataclass` ¶

StochasticObjectIR(id=_generate_id(), terminals=TerminalDescriptorIR(), repeat_units=list(), end_groups=list(), extras=dict())

Container for repeat units, terminals, and end groups.

TerminalDescriptorIR `dataclass` ¶

TerminalDescriptorIR(descriptors=list(), extras=dict())

Terminal brackets that hold descriptors for stochastic objects.

bigsmilesir_to_monomer ¶

bigsmilesir_to_monomer(ir)

Convert BigSmilesMoleculeIR to Atomistic structure (topology only).

Single responsibility: IR → Atomistic conversion only. Parsing should be done separately.

Supports a BigSMILES stochastic object with exactly ONE repeat unit, e.g. {[<]CC[>]}.

Parameters:

Name	Type	Description	Default
`ir`	`BigSmilesMoleculeIR`	BigSmilesMoleculeIR from parser	required

Returns:

Type	Description
`Atomistic`	Atomistic structure with ports marked on atoms, NO positions

Raises:

Type	Description
`ValueError`	If IR contains multiple repeat units (use bigsmilesir_to_polymerspec instead)

Examples:

>>> from molpy.parser.smiles import parse_bigsmiles
>>> ir = parse_bigsmiles("{[<]CC[>]}")
>>> struct = bigsmilesir_to_monomer(ir)
>>> # Ports are marked on atoms: atom["port"] = "<" or ">"

bigsmilesir_to_polymerspec ¶

bigsmilesir_to_polymerspec(ir)

Convert a BigSmilesMoleculeIR into a complete polymer specification.

Single responsibility: IR -> PolymerSpec conversion only. Parsing should be done separately.

Extracts monomers and analyzes polymer topology and composition.

Parameters:

Name	Type	Description	Default
`ir`	`BigSmilesMoleculeIR`	BigSmilesMoleculeIR from parser	required

Returns:

Type	Description
`PolymerSpec`	PolymerSpec with segments, topology, and all monomers

Examples:

>>> from molpy.parser.smiles import parse_bigsmiles
>>> ir = parse_bigsmiles("{[<]CC[>]}")
>>> spec = bigsmilesir_to_polymerspec(ir)
>>> spec.topology
'homopolymer'

parse_bigsmiles ¶

parse_bigsmiles(src)

Parse a BigSMILES string into BigSmilesMoleculeIR.

This parser accepts BigSMILES syntax including stochastic objects, bonding descriptors, and repeat units. It does NOT accept GBigSMILES annotations.

Parameters:

Name	Type	Description	Default
`src`	`str`	BigSMILES string	required

Returns:

Type	Description
`BigSmilesMoleculeIR`	BigSmilesMoleculeIR containing backbone and stochastic objects

Raises:

Type	Description
`ValueError`	if syntax errors detected

Examples:

>>> ir = parse_bigsmiles("{[<]CC[>]}")
>>> len(ir.stochastic_objects)
1

parse_cgsmiles ¶

parse_cgsmiles(src)

Parse a CGSmiles string.

Parameters:

Name	Type	Description	Default
`src`	`str`	CGSmiles string (e.g., `"{[#PEO][#PMA]}.{#PEO=[$]COC[$]}"`)	required

Returns:

Type	Description
`CGSmilesIR`	CGSmilesIR with base graph and fragment definitions

Raises:

Type	Description
`ValueError`	if syntax errors detected

Examples:

>>> result = parse_cgsmiles("{[#PEO][#PMA][#PEO]}")
>>> len(result.base_graph.nodes)
3
>>> result = parse_cgsmiles("{[#PEO]|5}")
>>> len(result.base_graph.nodes)
5

parse_gbigsmiles ¶

parse_gbigsmiles(src)

Parse a GBigSMILES string into GBigSmilesSystemIR.

This parser accepts GBigSMILES syntax including all BigSMILES features plus system size specifications and other generative annotations. Always returns GBigSmilesSystemIR, wrapping single molecules in a system structure.

Parameters:

Name	Type	Description	Default
`src`	`str`	GBigSMILES string	required

Returns:

Type	Description
`GBigSmilesSystemIR`	GBigSmilesSystemIR containing the parsed system

Raises:

Type	Description
`ValueError`	if syntax errors detected

Examples:

>>> ir = parse_gbigsmiles("{[<]CC[>]}|5e5|")
>>> isinstance(ir, GBigSmilesSystemIR)
True

parse_smiles ¶

parse_smiles(src)

Parse a SMILES string into SmilesGraphIR or list of SmilesGraphIR.

This parser only accepts pure SMILES syntax. It will reject BigSMILES or GBigSMILES constructs.

For dot-separated SMILES (e.g., "C.C", "CC.O"), returns a list of SmilesGraphIR, one for each disconnected component.

Parameters:

Name	Type	Description	Default
`src`	`str`	SMILES string (may contain dots for mixtures)	required

Returns:

Type	Description
`SmilesGraphIR \| list[SmilesGraphIR]`	SmilesGraphIR for single molecule, or list[SmilesGraphIR] for mixtures

Raises:

Type	Description
`ValueError`	if syntax errors detected or unclosed rings

Examples:

>>> ir = parse_smiles("CCO")
>>> len(ir.atoms)
3
>>> irs = parse_smiles("C.C")
>>> len(irs)
2
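
The "unclosed rings" condition in the Raises table can be pictured as a check that every ring-bond label appears an even number of times. The sketch below is a deliberately simplified version of that idea, not the check molpy performs: `check_ring_closures` is a hypothetical helper that skips bracket atoms, understands single digits and two-digit %nn labels, and ignores everything else about SMILES.

```python
def check_ring_closures(smiles: str) -> None:
    """Raise ValueError if any ring-bond label is opened but never closed."""
    open_labels = set()
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip bracket atoms: digits inside are isotopes, counts, or charges.
            end = smiles.find("]", i)
            if end == -1:
                raise ValueError("unterminated bracket atom")
            i = end + 1
            continue
        if ch == "%":
            label, i = int(smiles[i + 1:i + 3]), i + 3  # two-digit label, e.g. %12
        elif ch.isdigit():
            label, i = int(ch), i + 1
        else:
            i += 1
            continue
        # The first occurrence opens a ring, the second closes it.
        if label in open_labels:
            open_labels.remove(label)
        else:
            open_labels.add(label)
    if open_labels:
        raise ValueError(f"unclosed rings: {sorted(open_labels)}")


check_ring_closures("c1ccccc1")  # benzene: label 1 opens and closes
try:
    check_ring_closures("C1CCCC")
except ValueError as exc:
    print(exc)  # unclosed rings: [1]
```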

smilesir_to_atomistic ¶

smilesir_to_atomistic(ir)

Convert SmilesGraphIR to Atomistic structure (topology only, no 3D coordinates).

Single responsibility: IR → Atomistic conversion only. Parsing should be done separately using parse_smiles().

This is a simple conversion function for pure SMILES (no BigSMILES features like ports or descriptors). For BigSMILES with ports, use bigsmilesir_to_monomer() instead.

Parameters:

Name	Type	Description	Default
`ir`	`SmilesGraphIR`	SmilesGraphIR from parse_smiles()	required

Returns:

Type	Description
`Atomistic`	Atomistic structure with atoms and bonds (no 3D coordinates, no ports)

Examples:

>>> from molpy.parser.smiles import parse_smiles, smilesir_to_atomistic
>>> ir = parse_smiles("CCO")
>>> struct = smilesir_to_atomistic(ir)
>>> len(struct.atoms)
3
>>> len(struct.bonds)
2
