Selectors
Overview
Selectors provide a composable, query-like interface for filtering atoms in Block objects. They implement the MaskPredicate protocol, producing boolean masks, and can be combined with the logical operators &, |, and ~ to build complex selection criteria.
Design Philosophy
Selectors follow a functional, composable design where each selector is a predicate function that returns a boolean mask. This design enables chaining and combination of selection criteria without intermediate data copies. The MaskPredicate base class provides operator overloading for logical composition, making complex queries readable and efficient.
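The composition idea can be illustrated with a minimal, dependency-free sketch. Everything in it is hypothetical: the Predicate, ElementIs, and XAtLeast classes are stand-ins written for this illustration, not MolPy's MaskPredicate or its selectors, and plain Python lists take the place of NumPy arrays. It shows only how operator overloading can build a combined predicate that is evaluated when a table is supplied.

```python
class Predicate:
    """Hypothetical stand-in for MaskPredicate: maps columns to a list of bools."""

    def mask(self, columns):
        raise NotImplementedError

    def __and__(self, other):
        return _Combined(self, other, lambda a, b: a and b)

    def __or__(self, other):
        return _Combined(self, other, lambda a, b: a or b)

    def __invert__(self):
        return _Negated(self)


class _Combined(Predicate):
    def __init__(self, left, right, op):
        self.left, self.right, self.op = left, right, op

    def mask(self, columns):
        # Both sides are evaluated only when a table is supplied.
        left = self.left.mask(columns)
        right = self.right.mask(columns)
        return [self.op(a, b) for a, b in zip(left, right)]


class _Negated(Predicate):
    def __init__(self, inner):
        self.inner = inner

    def mask(self, columns):
        return [not flag for flag in self.inner.mask(columns)]


class ElementIs(Predicate):
    """Hypothetical property predicate: element symbol equals `symbol`."""

    def __init__(self, symbol):
        self.symbol = symbol

    def mask(self, columns):
        return [e == self.symbol for e in columns["element"]]


class XAtLeast(Predicate):
    """Hypothetical geometric predicate: x coordinate >= `minimum`."""

    def __init__(self, minimum):
        self.minimum = minimum

    def mask(self, columns):
        return [x >= self.minimum for x in columns["x"]]


columns = {
    "element": ["C", "C", "H", "H", "O", "N"],
    "x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
}
# (C or O) and not (x >= 2.0)
query = (ElementIs("C") | ElementIs("O")) & ~XAtLeast(2.0)
query_mask = query.mask(columns)
print(query_mask)
```

Because each operator returns another predicate object, a combined query can be stored, reused, and extended before it is applied to any data.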
Integration with MolPy
Selectors work with Block objects, typically the atoms block in a Frame. They also integrate with the region system: Region classes implement MaskPredicate as well, so geometric and property-based selections can be combined directly. Selectors are used throughout MolPy for analysis workflows, visualization filtering, and force field assignment.
Basic Selectors
Basic selectors filter atoms by intrinsic properties stored in block columns. ElementSelector matches element symbols, AtomTypeSelector matches type identifiers, and AtomIndexSelector selects by atom indices. These selectors are fast and work with any block containing the required columns.
import molpy as mp
from molpy.core.selector import ElementSelector, AtomTypeSelector, AtomIndexSelector
import numpy as np
# Create a frame with atoms
frame = mp.Frame()
frame["atoms"] = mp.Block(
    {
        "element": ["C", "C", "H", "H", "O", "N"],
        "type": [1, 1, 2, 2, 3, 4],
        "x": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
        "y": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        "z": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    }
)
atoms = frame["atoms"]
# ElementSelector: select by element symbol
sel_c = ElementSelector("C")
carbons = sel_c(atoms) # Returns a Block with selected atoms
print(f"Carbons: {carbons.nrows} atoms")
print(f" Elements: {carbons['element']}")
# AtomTypeSelector: select by type (integer or string)
sel_type2 = AtomTypeSelector(2)
type2_atoms = sel_type2(atoms)
print(f"\nType 2 atoms: {type2_atoms.nrows}")
print(f" Elements: {type2_atoms['element']}")
# AtomIndexSelector: select by indices
sel_indices = AtomIndexSelector([0, 2, 4])
selected = sel_indices(atoms)
print(f"\nSelected by indices [0, 2, 4]: {selected.nrows}")
print(f" Elements: {selected['element']}")
# Get boolean mask instead of Block
mask = sel_c.mask(atoms)
print(f"\nBoolean mask for carbons: {mask}")
print(f" Indices where True: {np.where(mask)[0]}")
Carbons: 2 atoms
Elements: ['C' 'C']
Type 2 atoms: 2
Elements: ['H' 'H']
Selected by indices [0, 2, 4]: 0
Elements: []
Boolean mask for carbons: [ True True False False False False]
Indices where True: [0 1]
Geometric Selectors
Geometric selectors filter atoms based on spatial coordinates. CoordinateRangeSelector selects atoms within a coordinate range along a single axis, useful for slab geometries or spatial partitioning, while DistanceSelector selects atoms within a distance from a reference point, useful for solvation shells or local environment analysis. These selectors require x, y, z columns in the block.
from molpy.core.selector import CoordinateRangeSelector, DistanceSelector
# CoordinateRangeSelector: select by coordinate range along an axis
# Select atoms with 1.0 <= x <= 4.0 (bounds are inclusive: x = 1.0 and x = 4.0 both appear in the output)
sel_x_range = CoordinateRangeSelector(axis="x", min_value=1.0, max_value=4.0)
x_range_atoms = sel_x_range(atoms)
print(f"Atoms with 1.0 < x < 4.0: {x_range_atoms.nrows}")
print(f" Elements: {x_range_atoms['element']}")
print(f" X coordinates: {x_range_atoms['x']}")
# Select atoms with x > 1.5 (only min_value)
sel_x_min = CoordinateRangeSelector(axis="x", min_value=1.5)
right_side = sel_x_min(atoms)
print(f"\nAtoms with x > 1.5: {right_side['element']}")
# DistanceSelector: select within distance from a point
# Select atoms within 1.5 Å of origin
sel_dist = DistanceSelector(center=[0.0, 0.0, 0.0], max_distance=1.5)
close_atoms = sel_dist(atoms)
print(f"\nAtoms within 1.5 Å of origin: {close_atoms.nrows}")
print(f" Elements: {close_atoms['element']}")
# DistanceSelector with min_distance (shell selection)
# Select atoms between 1.0 and 2.5 Å from point [2.0, 0.0, 0.0]
sel_shell = DistanceSelector(
    center=[2.0, 0.0, 0.0],
    min_distance=1.0,
    max_distance=2.5,
)
shell_atoms = sel_shell(atoms)
print(f"\nAtoms in shell (1.0-2.5 Å from [2,0,0]): {shell_atoms.nrows}")
print(f" Elements: {shell_atoms['element']}")
Atoms with 1.0 < x < 4.0: 4
Elements: ['C' 'H' 'H' 'O']
X coordinates: [1. 2. 3. 4.]
Atoms with x > 1.5: ['H' 'H' 'O' 'N']
Atoms within 1.5 Å of origin: 2
Elements: ['C' 'C']
Atoms in shell (1.0-2.5 Å from [2,0,0]): 4
Elements: ['C' 'C' 'H' 'O']
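To make the geometry behind these results concrete, here is a small standard-library sketch of a range test and a distance-shell test over the same six atoms. The helpers range_mask and shell_mask are hypothetical names introduced for this illustration and are not part of MolPy. The inclusive comparisons are an assumption, chosen because they reproduce the output recorded above, where atoms exactly on a boundary are selected.

```python
import math

# The same six atoms as above, laid out along the x axis.
elements = ["C", "C", "H", "H", "O", "N"]
positions = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (3.0, 0.0, 0.0),
    (4.0, 0.0, 0.0),
    (5.0, 0.0, 0.0),
]


def range_mask(values, min_value=-math.inf, max_value=math.inf):
    """Hypothetical helper: True where min_value <= value <= max_value."""
    return [min_value <= v <= max_value for v in values]


def shell_mask(points, center, min_distance=0.0, max_distance=math.inf):
    """Hypothetical helper: True where the distance to center lies in the shell."""
    return [
        min_distance <= math.dist(p, center) <= max_distance for p in points
    ]


# Slab: 1.0 <= x <= 4.0
slab = range_mask([p[0] for p in positions], 1.0, 4.0)
slab_elements = [e for e, keep in zip(elements, slab) if keep]

# Shell: between 1.0 and 2.5 Å from the point (2, 0, 0)
shell = shell_mask(positions, (2.0, 0.0, 0.0), 1.0, 2.5)
shell_elements = [e for e, keep in zip(elements, shell) if keep]

print(slab_elements)
print(shell_elements)
```

With strict inequalities instead, the atoms at x = 1.0 and x = 4.0 would drop out of the slab, which would not match the four atoms reported above.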
Combining Selectors
Selectors can be combined using logical operators to build complex queries: & (AND) requires both conditions, | (OR) requires either condition, and ~ (NOT) inverts the selection. These operators return new MaskPredicate objects that can be further combined, enabling readable composition of complex selection logic.
# Combine selectors with logical operators
# (Carbon OR Oxygen) AND (x > 0.5)
complex_sel = (ElementSelector("C") | ElementSelector("O")) & CoordinateRangeSelector(
    "x", min_value=0.5
)
result = complex_sel(atoms)
print(f"Complex selection (C or O) AND (x > 0.5):")
print(f" Elements: {result['element']}")
print(f" X coordinates: {result['x']}")
# NOT operator: exclude certain elements
no_h = ~ElementSelector("H")
non_hydrogens = no_h(atoms)
print(f"\nNon-hydrogens: {non_hydrogens['element']}")
# Nested combinations: (C or N) AND NOT (x <= 1.0); the upper bound is inclusive, so the C at x = 1.0 is excluded
sel_nested = (ElementSelector("C") | ElementSelector("N")) & ~CoordinateRangeSelector(
    "x", max_value=1.0
)
nested_result = sel_nested(atoms)
print(f"\nNested selection (C or N) AND NOT (x < 1.0):")
print(f" Elements: {nested_result['element']}")
# Multiple conditions: heavy atoms in a specific region
heavy_in_region = (
    ~ElementSelector("H")  # Not hydrogen
    & CoordinateRangeSelector("x", min_value=1.0, max_value=4.0)  # In x range
    & DistanceSelector(center=[2.0, 0.0, 0.0], max_distance=2.0)  # Near point
)
heavy_result = heavy_in_region(atoms)
print(f"\nHeavy atoms in region: {heavy_result['element']}")
Complex selection (C or O) AND (x > 0.5):
Elements: ['C' 'O']
X coordinates: [1. 4.]
Non-hydrogens: ['C' 'C' 'O' 'N']
Nested selection (C or N) AND NOT (x < 1.0):
Elements: ['N']
Heavy atoms in region: ['C' 'O']
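The parentheses in these queries matter because of Python's operator precedence: & binds more tightly than |, and unary ~ binds more tightly than both. The sketch below illustrates the effect with plain sets of atom indices standing in for selections. This is only an analogy for the grouping rules, not how MolPy stores selections.

```python
elements = ["C", "C", "H", "H", "O", "N"]
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# Index sets standing in for three selections.
carbon = {i for i, e in enumerate(elements) if e == "C"}
oxygen = {i for i, e in enumerate(elements) if e == "O"}
right = {i for i, x in enumerate(xs) if x >= 0.5}

# Without parentheses Python groups this as: carbon | (oxygen & right)
without_parens = carbon | oxygen & right

# The intended query: (carbon or oxygen) and x >= 0.5
with_parens = (carbon | oxygen) & right

print(sorted(without_parens))
print(sorted(with_parens))
```

The unparenthesized form keeps the carbon at x = 0.0, while the parenthesized form yields only the atoms at indices 1 and 4, matching the C and O reported above. Grouping every | explicitly avoids this class of mistake.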
Working with Masks
Selectors can return boolean masks instead of filtered Block objects. Masks are useful for indexing operations, combining with NumPy operations, or when you need the original block structure. The mask() method returns a boolean array that can be used with NumPy's boolean indexing.
# Get boolean mask instead of Block
mask = sel_c.mask(atoms)
print(f"Boolean mask for carbons: {mask}")
print(f" Shape: {mask.shape}, dtype: {mask.dtype}")
# Use mask for indexing or NumPy operations
indices = np.where(mask)[0]
print(f" Indices where True: {indices}")
# Apply mask directly to block columns
x_coords = atoms["x"][mask]
print(f" X coordinates of carbons: {x_coords}")
# Combine masks with NumPy operations
mask_c = ElementSelector("C").mask(atoms)
mask_o = ElementSelector("O").mask(atoms)
mask_co = mask_c | mask_o # NumPy boolean OR
print(f"\nCombined mask (C or O): {mask_co}")
print(f" Selected elements: {atoms['element'][mask_co]}")
# Use mask to modify data
# Example: set charge for selected atoms
if "charge" not in atoms:
    atoms["charge"] = np.zeros(atoms.nrows)
atoms["charge"][mask_c] = 0.0 # Set carbon charges
atoms["charge"][mask_o] = -0.5 # Set oxygen charges
print(f"\nCharges after selection-based assignment:")
print(f" Elements: {atoms['element']}")
print(f" Charges: {atoms['charge']}")
Boolean mask for carbons: [ True True False False False False]
Shape: (6,), dtype: bool
Indices where True: [0 1]
X coordinates of carbons: [0. 1.]
Combined mask (C or O): [ True True False False True False]
Selected elements: ['C' 'C' 'O']
Charges after selection-based assignment:
Elements: ['C' 'C' 'H' 'H' 'O' 'N']
Charges: [ 0. 0. 0. 0. -0.5 0. ]
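One NumPy detail is worth keeping in mind when modifying data through a mask: assigning to array[mask] writes into the array in place, whereas reading array[mask] returns a copy. The sketch below uses a plain dictionary of arrays as a hypothetical stand-in for a Block. It assumes that column access returns the underlying array rather than a copy, which the charge output above suggests but which is not confirmed here.

```python
import numpy as np

# A dict of arrays standing in for a Block (assumption, not MolPy's type).
table = {
    "element": np.array(["C", "C", "H", "H", "O", "N"]),
    "charge": np.zeros(6),
}
mask_o = table["element"] == "O"

# Assignment through a boolean mask modifies the column in place.
table["charge"][mask_o] = -0.5

# Reading through a boolean mask yields a copy: editing it leaves the column unchanged.
picked = table["charge"][mask_o]
picked[:] = 9.9

# Counting selected rows: True counts as 1 when summed.
count_o = int(mask_o.sum())

print(table["charge"])
print(count_o)
```

In practice this means a masked assignment should be written against the column itself, not against a previously extracted subset.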
Example: Complex Selection Workflow
This example demonstrates a realistic workflow: selecting atoms based on multiple criteria, combining selections, and using masks for analysis operations.
import molpy as mp
from molpy.core.selector import ElementSelector, CoordinateRangeSelector, DistanceSelector
import numpy as np
# Create a simple system
frame = mp.Frame()
frame["atoms"] = mp.Block({
    "element": ["O", "H", "H", "O", "H", "H", "Na", "Cl"],
    "x": np.array([0.0, 0.76, -0.76, 5.0, 5.76, 4.24, 10.0, 12.0]),
    "y": np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
    "z": np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
})
atoms = frame["atoms"]
# Select oxygen atoms
sel_oxygen = ElementSelector("O")
oxygens = sel_oxygen(atoms)
print(f"Oxygen atoms: {oxygens.nrows}")
print(f" Elements: {list(oxygens['element'])}")
# Select atoms in a region
sel_region = CoordinateRangeSelector("x", min_value=8.0)
atoms_in_region = sel_region(atoms)
print(f"\nAtoms with x > 8.0: {atoms_in_region.nrows}")
print(f" Elements: {list(atoms_in_region['element'])}")
# Select atoms within distance
sel_distance = DistanceSelector(center=[0.0, 0.0, 0.0], max_distance=3.0)
nearby = sel_distance(atoms)
print(f"\nAtoms within 3.0 Å of origin: {nearby.nrows}")
print(f" Elements: {list(nearby['element'])}")
# Combine selectors
sel_h = ElementSelector("H")
sel_near = DistanceSelector(center=[0.0, 0.0, 0.0], max_distance=1.0)
h_near_origin = (sel_h & sel_near)(atoms)
print(f"\nH atoms near origin: {h_near_origin.nrows}")
Oxygen atoms: 2
Elements: [np.str_('O'), np.str_('O')]
Atoms with x > 8.0: 2
Elements: [np.str_('Na'), np.str_('Cl')]
Atoms within 3.0 Å of origin: 3
Elements: [np.str_('O'), np.str_('H'), np.str_('H')]
H atoms near origin: 2
Summary
Selectors implement a composable "predicate → mask" pattern over Block columns.
Use ElementSelector, AtomTypeSelector, or AtomIndexSelector for property-based selection.
Use CoordinateRangeSelector or DistanceSelector for geometry-based selection.
Combine selectors with &, |, and ~ to express complex queries without manual loops.