
Parser

Grammar-based parsing for chemical string notations. Convenience functions are available under mp.parser.*.

Quick reference

| Function | Input | Output | Use when |
| --- | --- | --- | --- |
| parse_molecule(s) | SMILES | Atomistic | One specific molecule |
| parse_mixture(s) | dot-separated SMILES | list[Atomistic] | Multi-component ([Li+].[F-]) |
| parse_monomer(s) | BigSMILES | Atomistic (with ports) | Repeat unit with </>/$ markers |
| parse_polymer(s) | BigSMILES | PolymerSpec | Multi-monomer specification |
| parse_smarts(s) | SMARTS | SmartsIR | Pattern matching / typification |
| parse_smiles(s) | SMILES | SmilesGraphIR | IR-level inspection |
| parse_bigsmiles(s) | BigSMILES | BigSmilesMoleculeIR | IR-level BigSMILES inspection |
| parse_cgsmiles(s) | CGSmiles | CGSmilesIR | Topology architecture graphs |
| parse_gbigsmiles(s) | GBigSMILES | GBigSmilesSystemIR | System specs with distributions |

Canonical example

import molpy as mp

mol = mp.parser.parse_molecule("CCO")           # Atomistic
ions = mp.parser.parse_mixture("[Li+].[F-]")    # [Atomistic, Atomistic]
monomer = mp.parser.parse_monomer("{[][<]CCO[>][]}") # Atomistic with ports
spec = mp.parser.parse_polymer("{[<]CC[>],[<]CC(C)[>]}") # PolymerSpec
  • smilesir_to_atomistic — SMILES IR → Atomistic
  • bigsmilesir_to_monomer — BigSMILES IR → Atomistic
  • bigsmilesir_to_polymerspec — BigSMILES IR → PolymerSpec
  • Guide: Parsing Chemistry

Full API

Convenience layer

parser

Unified parser API for SMILES, BigSMILES, GBigSMILES, CGSmiles, and SMARTS.

Convenience wrappers live here so that downstream code can import them directly:

from molpy.parser import parse_molecule, parse_polymer, parse_smarts

PolymerSegment dataclass

PolymerSegment(monomers, composition_type=None, distribution_params=None, end_groups=list(), repeat_units_ir=list(), end_groups_ir=list())

Polymer segment specification.

PolymerSpec dataclass

PolymerSpec(segments, topology, start_group_ir=None, end_group_ir=None)

Complete polymer specification.

all_monomers
all_monomers()

Get all structures from all segments.
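As a rough illustration of what "all structures from all segments" means, the sketch below flattens monomers across segments. `Segment` and `Spec` are simplified stand-ins that mirror only a few of the fields listed above; they are not molpy's classes, and the real `all_monomers` may behave differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in for PolymerSegment: only `monomers` is modelled in detail."""
    monomers: list
    composition_type: str | None = None


@dataclass
class Spec:
    """Stand-in for PolymerSpec: only `segments` and `topology` are modelled."""
    segments: list
    topology: str

    def all_monomers(self) -> list:
        # Concatenate the monomers of every segment, preserving segment order.
        return [m for seg in self.segments for m in seg.monomers]


spec = Spec(
    segments=[Segment(monomers=["A", "B"]), Segment(monomers=["C"])],
    topology="block",  # hypothetical label, used only for this illustration
)
monomers = spec.all_monomers()
```

Strings stand in for the monomer structures here; in practice each entry would be a parsed structure.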

SmartsParser

SmartsParser()

Bases: GrammarParserBase

Main parser for SMARTS patterns.

Usage

parser = SmartsParser()
ir = parser.parse_smarts("[#6]")
ir = parser.parse_smarts("c1ccccc1")
ir = parser.parse_smarts("[C,N,O]")

parse_smarts
parse_smarts(smarts)

Parse SMARTS string into SmartsIR.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| smarts | str | SMARTS pattern string | required |

Returns:

| Type | Description |
| --- | --- |
| SmartsIR | SmartsIR representing the pattern |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If parsing fails or rings are unclosed |

Examples:

>>> parser = SmartsParser()
>>> ir = parser.parse_smarts("C")
>>> len(ir.atoms)
1
>>> ir = parser.parse_smarts("[#6]")
>>> ir.atoms[0].expression.children[0].type
'atomic_num'
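The "rings are unclosed" condition can be pictured with a small standalone check. This is only a sketch of the idea and not how SmartsParser detects the error: it assumes ring-closure labels are single digits or `%nn` outside atom brackets, and the helper name `unclosed_ring_labels` is made up for this example.

```python
def unclosed_ring_labels(pattern: str) -> set:
    """Return ring-closure labels that are opened but never closed."""
    open_labels = set()
    depth = 0  # nesting depth of [...] atom brackets
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0:
            label = None
            two = pattern[i + 1 : i + 3]
            if ch.isdigit():
                label = ch
            elif ch == "%" and len(two) == 2 and two.isdigit():
                label = two
                i += 2  # skip the two digits of a %nn label
            if label is not None:
                # The first occurrence opens a ring, the second closes it.
                open_labels ^= {label}
        i += 1
    return open_labels
```

A pattern such as "c1ccccc1" leaves no open labels, while "C1CC" leaves label "1" open, which is the kind of input the parser is documented to reject with ValueError.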

bigsmilesir_to_monomer

bigsmilesir_to_monomer(ir)

Convert BigSmilesMoleculeIR to Atomistic structure (topology only).

Single responsibility: IR → Atomistic conversion only. Parsing should be done separately.

Supports a BigSMILES stochastic object that contains exactly ONE repeat unit, for example {[<]CC[>]}.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| ir | BigSmilesMoleculeIR | BigSmilesMoleculeIR from parser | required |

Returns:

| Type | Description |
| --- | --- |
| Atomistic | Atomistic structure with ports marked on atoms, NO positions |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If IR contains multiple repeat units (use bigsmilesir_to_polymerspec instead) |

Examples:

>>> from molpy.parser.smiles import parse_bigsmiles
>>> ir = parse_bigsmiles("{[<]CC[>]}")
>>> struct = bigsmilesir_to_monomer(ir)
>>> # Ports are marked on atoms: atom["port"] = "<" or ">"
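To show how port markers of this kind could be consumed, here is a minimal sketch that groups atoms by port symbol. It uses plain dicts as stand-ins for atoms and assumes only the `atom["port"]` convention mentioned in the comment above; the `element` key, the helper name, and which atom carries which port are assumptions made for the illustration.

```python
def ports_by_symbol(atoms):
    """Group atom indices by port symbol; atoms without a port are skipped."""
    ports = {}
    for index, atom in enumerate(atoms):
        symbol = atom.get("port")
        if symbol is not None:
            ports.setdefault(symbol, []).append(index)
    return ports


# Plain dicts standing in for the two carbons of a "{[<]CC[>]}" monomer.
atoms = [{"element": "C", "port": "<"}, {"element": "C", "port": ">"}]
ports = ports_by_symbol(atoms)
```

A polymer builder would typically pair a ">" port on one unit with a "<" port on the next, so a lookup of this shape is a natural first step.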

bigsmilesir_to_polymerspec

bigsmilesir_to_polymerspec(ir)

Convert BigSmilesMoleculeIR to a complete polymer specification.

Single responsibility: IR -> PolymerSpec conversion only. Parsing should be done separately.

Extracts monomers and analyzes polymer topology and composition.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| ir | BigSmilesMoleculeIR | BigSmilesMoleculeIR from parser | required |

Returns:

| Type | Description |
| --- | --- |
| PolymerSpec | PolymerSpec with segments, topology, and all monomers |

Examples:

>>> from molpy.parser.smiles import parse_bigsmiles
>>> ir = parse_bigsmiles("{[<]CC[>]}")
>>> spec = bigsmilesir_to_polymerspec(ir)
>>> spec.topology
'homopolymer'

parse_bigsmiles

parse_bigsmiles(src)

Parse a BigSMILES string into BigSmilesMoleculeIR.

This parser accepts BigSMILES syntax including stochastic objects, bond descriptors, and repeat units. It does NOT accept GBigSMILES annotations.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| src | str | BigSMILES string | required |

Returns:

| Type | Description |
| --- | --- |
| BigSmilesMoleculeIR | BigSmilesMoleculeIR containing backbone and stochastic objects |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If syntax errors are detected |

Examples:

>>> ir = parse_bigsmiles("{[<]CC[>]}")
>>> len(ir.stochastic_objects)
1

parse_cgsmiles

parse_cgsmiles(src)

Parse a CGSmiles string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| src | str | CGSmiles string (e.g., "{[#PEO][#PMA]}.{#PEO=[$]COC[$]}") | required |

Returns:

| Type | Description |
| --- | --- |
| CGSmilesIR | CGSmilesIR with base graph and fragment definitions |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If syntax errors are detected |

Examples:

>>> result = parse_cgsmiles("{[#PEO][#PMA][#PEO]}")
>>> len(result.base_graph.nodes)
3
>>> result = parse_cgsmiles("{[#PEO]|5}")
>>> len(result.base_graph.nodes)
5
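The node counts in these examples follow from the `|N` repeat operator. The sketch below reproduces the counting for linear base graphs with a regular expression. It is an illustration only, not the CGSmiles grammar used by the parser: it ignores branches, bond orders, and fragment definitions, and `expand_nodes` is a hypothetical helper.

```python
import re

# A node such as [#PEO], optionally followed by a repeat count such as |5.
NODE = re.compile(r"\[#(\w+)\](?:\|(\d+))?")


def expand_nodes(cgsmiles: str) -> list:
    """List node labels of the base graph, expanding |N repeats."""
    # Only the first {...} group (the base graph) is considered here.
    start = cgsmiles.index("{")
    base = cgsmiles[start + 1 : cgsmiles.index("}", start)]
    labels = []
    for name, count in NODE.findall(base):
        labels.extend([name] * int(count or 1))
    return labels
```

For the two inputs above this yields three and five labels respectively, matching the documented `len(result.base_graph.nodes)` values.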

parse_gbigsmiles

parse_gbigsmiles(src)

Parse a GBigSMILES string into GBigSmilesSystemIR.

This parser accepts GBigSMILES syntax including all BigSMILES features plus system size specifications and other generative annotations. Always returns GBigSmilesSystemIR, wrapping single molecules in a system structure.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| src | str | GBigSMILES string | required |

Returns:

| Type | Description |
| --- | --- |
| GBigSmilesSystemIR | GBigSmilesSystemIR containing the parsed system |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If syntax errors are detected |

Examples:

>>> ir = parse_gbigsmiles("{[<]CC[>]}|5e5|")
>>> isinstance(ir, GBigSmilesSystemIR)
True
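The trailing `|5e5|` in the example is a system-size annotation. As a hedged sketch of what such an annotation carries, the snippet below separates a trailing numeric `|...|` suffix from the rest of the string. It assumes the annotation is a plain number at the very end of the input; the real grammar is richer, and `split_system_size` is a name invented for this example.

```python
import re

# Trailing |number| annotation, e.g. "|5e5|".
SIZE = re.compile(r"\|([0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)\|$")


def split_system_size(gbigsmiles: str):
    """Return (structure part, size as float), or (input, None) if absent."""
    match = SIZE.search(gbigsmiles)
    if match is None:
        return gbigsmiles, None
    return gbigsmiles[: match.start()], float(match.group(1))


body, size = split_system_size("{[<]CC[>]}|5e5|")
```

Non-numeric annotations are deliberately left untouched, since this sketch only targets the numeric form shown above.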

parse_mixture

parse_mixture(smiles)

Parse a (possibly dot-separated) SMILES string into a list of molecules.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| smiles | str | SMILES string, components separated by '.'. | required |

Returns:

| Type | Description |
| --- | --- |
| list[Atomistic] | List of Atomistic structures (always a list, even for a single component). |
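The "always a list" contract can be pictured with a plain string split. This is a simplified sketch, not the parser's implementation: it relies on '.' acting purely as a component separator and does no validation of each component.

```python
def split_components(smiles: str) -> list:
    """Split a dot-separated SMILES string into its component strings."""
    # Empty pieces (e.g. from a trailing dot) are dropped.
    return [part for part in smiles.split(".") if part]


ions = split_components("[Li+].[F-]")
single = split_components("CCO")
```

Even a single-component input comes back wrapped in a list, so callers can iterate without checking the return type.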

parse_molecule

parse_molecule(smiles)

Parse a SMILES string and return a single Atomistic structure.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| smiles | str | SMILES string for a single molecule (no dots). | required |

Returns:

| Type | Description |
| --- | --- |
| Atomistic | Atomistic structure. |

parse_monomer

parse_monomer(bigsmiles)

Parse a BigSMILES string and return the first monomer as an Atomistic structure.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| bigsmiles | str | BigSMILES string. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Monomer | Atomistic | Atomistic structure with port annotations. |

parse_polymer

parse_polymer(bigsmiles)

Parse a BigSMILES string and return a PolymerSpec.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| bigsmiles | str | BigSMILES string. | required |

Returns:

| Type | Description |
| --- | --- |
| PolymerSpec | PolymerSpec describing segments, topology, and monomers. |

parse_smarts

parse_smarts(pattern)

Parse a SMARTS pattern string into a SmartsIR.

This is a thin wrapper around SmartsParser().parse_smarts(pattern).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pattern | str | SMARTS string. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Parsed | SmartsIR | SmartsIR representation. |

parse_smiles

parse_smiles(src)

Parse a SMILES string into SmilesGraphIR or list of SmilesGraphIR.

This parser only accepts pure SMILES syntax. It will reject BigSMILES or GBigSMILES constructs.

For dot-separated SMILES (e.g., "C.C", "CC.O"), returns a list of SmilesGraphIR, one for each disconnected component.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| src | str | SMILES string (may contain dots for mixtures) | required |

Returns:

| Type | Description |
| --- | --- |
| SmilesGraphIR \| list[SmilesGraphIR] | SmilesGraphIR for a single molecule, or list[SmilesGraphIR] for mixtures |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If syntax errors or unclosed rings are detected |

Examples:

>>> ir = parse_smiles("CCO")
>>> len(ir.atoms)
3
>>> irs = parse_smiles("C.C")
>>> len(irs)
2
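Because the return type depends on whether the input contains dots, calling code often benefits from normalizing the result. The helper below is a small sketch of that pattern. `FakeGraph` is a stand-in for SmilesGraphIR so the snippet runs without molpy, and `as_list` is not part of the documented API.

```python
def as_list(result):
    """Wrap a single IR object in a list; pass lists through unchanged."""
    return result if isinstance(result, list) else [result]


class FakeGraph:
    """Stand-in for SmilesGraphIR, used only to keep the sketch runnable."""


single = as_list(FakeGraph())               # like parse_smiles("CCO")
many = as_list([FakeGraph(), FakeGraph()])  # like parse_smiles("C.C")
```

With this in place, downstream loops can treat both cases uniformly.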

smilesir_to_atomistic

smilesir_to_atomistic(ir)

Convert SmilesGraphIR to Atomistic structure (topology only, no 3D coordinates).

Single responsibility: IR → Atomistic conversion only. Parsing should be done separately using parse_smiles().

This is a simple conversion function for pure SMILES (no BigSMILES features like ports or descriptors). For BigSMILES with ports, use bigsmilesir_to_monomer() instead.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| ir | SmilesGraphIR | SmilesGraphIR from parse_smiles() | required |

Returns:

| Type | Description |
| --- | --- |
| Atomistic | Atomistic structure with atoms and bonds (no 3D coordinates, no ports) |

Examples:

>>> from molpy.parser.smiles import parse_smiles, smilesir_to_atomistic
>>> ir = parse_smiles("CCO")
>>> struct = smilesir_to_atomistic(ir)
>>> len(struct.atoms)
3
>>> len(struct.bonds)
2

SMARTS

smarts

AtomExpressionIR dataclass

AtomExpressionIR(op, children=list(), id=(lambda: id(AtomExpressionIR))())

Represents logical expressions combining atom primitives.

Operators
  • 'and' (&): high-priority AND
  • 'or' (,): OR
  • 'weak_and' (;): low-priority AND
  • 'not' (!): negation

Examples:

  • AtomExpressionIR(op='and', children=[primitive1, primitive2])
  • AtomExpressionIR(op='not', children=[primitive])
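One way to read these operators is as a small expression tree evaluated against atom properties. The sketch below uses stand-in dataclasses that mirror the `op`/`children` and `type`/`value` fields; they are not molpy's classes, and representing an atom as a dict of properties is an assumption made for illustration. Note that once the tree is built, 'and' and 'weak_and' evaluate identically; their different priority only affects how the pattern text is grouped during parsing.

```python
from dataclasses import dataclass, field


@dataclass
class Primitive:
    """Stand-in for AtomPrimitiveIR."""
    type: str
    value: object = None


@dataclass
class Expression:
    """Stand-in for AtomExpressionIR."""
    op: str
    children: list = field(default_factory=list)


def matches(node, atom: dict) -> bool:
    """Evaluate an expression tree against a dict of atom properties."""
    if isinstance(node, Primitive):
        return atom.get(node.type) == node.value
    results = [matches(child, atom) for child in node.children]
    if node.op in ("and", "weak_and"):
        return all(results)
    if node.op == "or":
        return any(results)
    if node.op == "not":
        return not results[0]
    raise ValueError(f"unknown operator: {node.op!r}")


# [#6,#7;X3] groups as weak_and(or(#6, #7), X3) because ',' binds tighter than ';'.
pattern = Expression("weak_and", [
    Expression("or", [Primitive("atomic_num", 6), Primitive("atomic_num", 7)]),
    Primitive("neighbor_count", 3),
])
```

A nitrogen with three neighbors satisfies this tree, while an oxygen with three neighbors does not.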

AtomPrimitiveIR dataclass

AtomPrimitiveIR(type, value=None, id=(lambda: id(AtomPrimitiveIR))())

Represents a single primitive atom pattern in SMARTS.

Examples:

  • symbol='C' (carbon atom)
  • atomic_num=6 (atomic number 6)
  • neighbor_count=3 (X3, exactly 3 neighbors)
  • ring_size=6 (r6, in 6-membered ring)
  • ring_count=2 (R2, in exactly 2 rings)
  • has_label='%atomA' (has label %atomA)
  • matches_smarts=SmartsIR(...) (recursive SMARTS)

SmartsAtomIR dataclass

SmartsAtomIR(expression, label=None, id=(lambda: id(SmartsAtomIR))())

Represents a complete SMARTS atom with expression and optional label.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| expression | AtomExpressionIR \| AtomPrimitiveIR | The atom pattern expression |
| label | int \| None | Optional numeric label for ring closures or references |

SmartsBondIR dataclass

SmartsBondIR(itom, jtom, bond_type='implicit')

Represents a bond between two SMARTS atoms.

In SMARTS, bonds are implicit (single or aromatic) unless specified. Explicit bond types can be specified between atoms.

SmartsIR dataclass

SmartsIR(atoms=list(), bonds=list())

Complete SMARTS pattern intermediate representation.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| atoms | list[SmartsAtomIR] | List of all atoms in the pattern |
| bonds | list[SmartsBondIR] | List of all bonds in the pattern |

SmartsTransformer

SmartsTransformer()

Bases: Transformer

Transforms Lark parse tree into SmartsIR.

Handles
  • Atom primitives (symbols, atomic numbers, properties)
  • Logical expressions (AND, OR, NOT, weak AND)
  • Branches
  • Ring closures
  • Recursive SMARTS patterns
and_expression
and_expression(children)

Process high-priority AND expression (&).

atom
atom(children)

Process complete atom: [expression] or bare_atom, with optional label.

Returns:

| Type | Description |
| --- | --- |
| SmartsAtomIR | SmartsAtomIR |

atom_class
atom_class(children)

Extract atom class name.

atom_id
atom_id(children)

Process atom identifier (primitive).

Can be
  • atom_symbol
  • # + atomic_num (atomic number)
  • $( + SMARTS + ) (recursive SMARTS)
  • %label (has label)
  • X + N? (neighbor count, optional number)
  • x + N? (ring connectivity, optional number)
  • r + N? (ring size, optional number)
  • R + N? (ring count, optional number)
  • H + N? (hydrogen count, optional number)
  • h + N? (implicit hydrogen count, optional number)
  • D + N? (degree, optional number)
  • v + N? (valence, optional number)
  • +/- + N? (charge)
  • a (aromatic)
  • A (aliphatic)
  • @ / @@ (chirality)
  • NUM + atom_symbol (isotope)
  • atom_class (atom class reference)
atom_label
atom_label(children)

Extract atom label (numeric).

atom_symbol
atom_symbol(children)

Process atom symbol (element or wildcard).

atomic_num
atomic_num(children)

Extract atomic number.

bare_atom
bare_atom(children)

Process bare (unbracketed) atom: element symbol or atom class.

bond
bond(children)

Extract bond type (may be negated with !).

branch
branch(children)

Process a branch, i.e. the content inside a branch or after the chain. This simply returns the SmartsIR produced by _string.

charge
charge(children)

Extract charge (+ or -).

chirality
chirality(children)

Extract chirality (@ or @@).

degree
degree(children)

Extract degree.

has_label
has_label(children)

Extract label.

hydrogen_count
hydrogen_count(children)

Extract explicit hydrogen count.

implicit_and
implicit_and(children)

Process implicit AND: adjacent primitives without operator (e.g. #6X3r5).
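To make the implicit AND concrete, the sketch below splits a bracket body such as `#6X3r5` into its adjacent primitives. It is a regex-based illustration and not the Lark grammar used by SmartsTransformer: it covers only the numeric-suffix primitives from the atom_id list, and the type names for primitives other than atomic_num, neighbor_count, ring_size, and ring_count are assumed from the transformer method names.

```python
import re

PRIMITIVE_NAMES = {
    "#": "atomic_num",
    "X": "neighbor_count",
    "x": "ring_connectivity",
    "r": "ring_size",
    "R": "ring_count",
    "H": "hydrogen_count",
    "h": "implicit_hydrogen_count",
    "D": "degree",
    "v": "valence",
}
TOKEN = re.compile(r"([#XxrRHhDv])(\d*)")


def split_primitives(body: str) -> list:
    """Split adjacent primitives into (type, value) pairs, left to right."""
    tokens = []
    pos = 0
    while pos < len(body):
        m = TOKEN.match(body, pos)
        if m is None:
            raise ValueError(f"unsupported primitive at position {pos}: {body[pos:]!r}")
        prefix, digits = m.groups()
        # A missing number is kept as None (the "N?" in the atom_id list).
        tokens.append((PRIMITIVE_NAMES[prefix], int(digits) if digits else None))
        pos = m.end()
    return tokens
```

All resulting primitives must hold for the same atom, which is why adjacency without an operator behaves like a high-priority AND.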

implicit_hydrogen_count
implicit_hydrogen_count(children)

Extract implicit hydrogen count.

isotope
isotope(children)

Extract isotope mass number.

isotope_atom
isotope_atom(children)

Process isotope-prefixed atom (e.g. 2H for deuterium).

matches_string
matches_string(children)

Extract recursive SMARTS pattern.

neighbor_count
neighbor_count(children)

Extract neighbor count.

nonlastbranch
nonlastbranch(children)

Process non-last branch: (bond? branch_content).

not_expression
not_expression(children)

Process NOT expression (!).

or_expression
or_expression(children)

Process OR expression (,).

ring_connectivity
ring_connectivity(children)

Extract ring connectivity.

ring_count
ring_count(children)

Extract ring count.

ring_size
ring_size(children)

Extract ring size.

start
start(children)

Entry point: process complete SMARTS pattern.

The grammar produces a tree like: start atom ... atom ...

We need to build the IR from this flat or nested structure.

valence
valence(children)

Extract valence.

weak_and_expression
weak_and_expression(children)

Process low-priority AND expression (;).

SMILES / BigSMILES / CGSmiles

smiles

SMILES, BigSMILES, GBigSMILES, and CGSmiles parsers.

This module provides four explicit parser APIs:

  • parse_smiles: Parse pure SMILES strings
  • parse_bigsmiles: Parse BigSMILES strings
  • parse_gbigsmiles: Parse GBigSMILES strings
  • parse_cgsmiles: Parse CGSmiles strings

Each parser uses its own dedicated grammar and transformer.

BigSmilesMoleculeIR dataclass

BigSmilesMoleculeIR(backbone=BigSmilesSubgraphIR(), stochastic_objects=list())

Top-level structural IR for BigSMILES strings.

BigSmilesSubgraphIR dataclass

BigSmilesSubgraphIR(atoms=list(), bonds=list(), descriptors=list())

Structural fragment that carries atoms, bonds, and descriptors.

BondingDescriptorIR dataclass

BondingDescriptorIR(id=_generate_id(), symbol=None, label=None, bond_order=1, role='internal', anchor_atom=None, non_covalent_context=None, extras=dict(), position_hint=None)

Standalone descriptor node for bonding points.

Per BigSMILES v1.1: bonding descriptors attach to atoms within repeat units. The anchor_atom field tracks which atom this descriptor is attached to. If anchor_atom is None, this is a terminal bonding descriptor at the stochastic object boundary.

CGSmilesBondIR dataclass

CGSmilesBondIR(node_i, node_j, order=1, id=_generate_id())

Intermediate representation for a CGSmiles bond.

Bonds directly reference NodeIR objects, not just IDs.

CGSmilesFragmentIR dataclass

CGSmilesFragmentIR(name='', body='')

Fragment definition.

Maps a fragment name to its SMILES or CGSmiles representation.

CGSmilesGraphIR dataclass

CGSmilesGraphIR(nodes=list(), bonds=list())

Coarse-grained graph representation.

Represents a molecular graph with CG nodes and bonds.

CGSmilesIR dataclass

CGSmilesIR(base_graph=CGSmilesGraphIR(), fragments=list())

Root-level IR for CGSmiles parser.

Represents a complete CGSmiles string with base graph and fragment definitions. This is the output of the CGSmiles parser.

CGSmilesNodeIR dataclass

CGSmilesNodeIR(id=_generate_id(), label='', annotations=dict())

Intermediate representation for a CGSmiles node.

A coarse-grained node with a label (e.g., "PEO", "PMA") and optional annotations.

DistributionIR dataclass

DistributionIR(name, params=dict())

Generative distribution applied to stochastic objects.

EndGroupIR dataclass

EndGroupIR(id=_generate_id(), graph=BigSmilesSubgraphIR(), extras=dict())

Optional end-group fragments that terminate stochastic objects.

GBBondingDescriptorIR dataclass

GBBondingDescriptorIR(structural, global_weight=None, pair_weights=None, extras=dict())

Weights associated with a bonding descriptor.

GBStochasticObjectIR dataclass

GBStochasticObjectIR(structural, distribution=None)

Wraps a structural stochastic object plus optional distribution.

GBigSmilesComponentIR dataclass

GBigSmilesComponentIR(molecule, target_mass=None, mass_is_fraction=False, extras=dict())

Single component entry in a gBigSMILES system.

GBigSmilesMoleculeIR dataclass

GBigSmilesMoleculeIR(structure, descriptor_weights=list(), stochastic_metadata=list(), extras=dict())

gBigSMILES molecule = structure + generative metadata.

GBigSmilesSystemIR dataclass

GBigSmilesSystemIR(molecules=list(), total_mass=None)

gBigSMILES system describing an ensemble of molecules.

RepeatUnitIR dataclass

RepeatUnitIR(id=_generate_id(), graph=BigSmilesSubgraphIR(), extras=dict())

Repeat unit captured inside a stochastic object.

SmilesAtomIR dataclass

SmilesAtomIR(id=_generate_id(), element=None, aromatic=False, charge=None, hydrogens=None, extras=dict())

Intermediate representation for a SMILES atom.

SmilesBondIR dataclass

SmilesBondIR(itom, jtom, order=1, stereo=None, id=_generate_id())

Intermediate representation for a SMILES bond.

Bonds directly reference AtomIR objects, not just IDs.

SmilesGraphIR dataclass

SmilesGraphIR(atoms=list(), bonds=list())

Root-level IR for SMILES parser.

Represents a molecular graph with atoms and bonds. This is the output of the SMILES parser.

StochasticObjectIR dataclass

StochasticObjectIR(id=_generate_id(), terminals=TerminalDescriptorIR(), repeat_units=list(), end_groups=list(), extras=dict())

Container for repeat units, terminals, and end groups.

TerminalDescriptorIR dataclass

TerminalDescriptorIR(descriptors=list(), extras=dict())

Terminal brackets that hold descriptors for stochastic objects.
