Tutorial: High-Throughput Screening¶

This tutorial covers systematic screening of compositions, defects, and adsorption configurations.

Learning Objectives¶

Set up parameter space screening
Manage large numbers of calculations
Analyze screening results

Overview¶

High-throughput screening automates exploration of: - Vacancy concentrations - Nitrogen fractions - Dopant elements - Adsorption sites

Step 1: Define Parameter Space¶

from nh3sofc.workflows import ScreeningWorkflow

workflow = ScreeningWorkflow(
    base_structure=surface,
    parameter_space={
        "vacancy_concentration": [0.0, 0.05, 0.10, 0.15, 0.20],
        "nitrogen_fraction": [0.5, 0.67, 0.75],
    },
    work_dir="./oxynitride_screening",
    encut=520,
    kspacing=0.03,
)

This creates 5 × 3 = 15 calculations.

Step 2: Setup Calculations¶

calc_dirs = workflow.setup()
print(f"Created {len(calc_dirs)} calculation directories")

# Directory structure:
# oxynitride_screening/
# ├── vac0.00_N0.50/
# ├── vac0.00_N0.67/
# ├── vac0.00_N0.75/
# ├── vac0.05_N0.50/
# ...

Step 3: Monitor Progress¶

status = workflow.get_status()

for name, state in status.items():
    print(f"{name}: {state}")

# Output:
# vac0.00_N0.50: completed
# vac0.00_N0.67: running
# vac0.05_N0.50: pending
# ...

Step 4: Parse Results¶

import pandas as pd

results = workflow.parse_results()

# Returns DataFrame with columns:
# - vacancy_concentration
# - nitrogen_fraction
# - energy
# - converged
# - ...

print(results.head())

Step 5: Find Optimal¶

optimal = workflow.find_optimal(metric="energy")
print(f"Optimal parameters: {optimal}")

# Or custom criteria
best = results.loc[results["energy"].idxmin()]
print(f"Lowest energy: {best}")

Specialized Screening¶

Composition Screening¶

from nh3sofc.workflows import CompositionScreening

workflow = CompositionScreening(
    base_structure=surface,
    elements_to_vary={
        "La": ["La", "Sr", "Ba", "Ca"],
        "V": ["V", "Ti", "Nb", "Ta"],
    },
    work_dir="./dopant_screening",
)

workflow.setup()
# Creates 4 × 4 = 16 calculations

Adsorbate Screening¶

from nh3sofc.workflows import AdsorbateScreening

workflow = AdsorbateScreening(
    surface=surface,
    adsorbates=["NH3", "NH2", "NH", "N", "H"],
    sites=["ontop_La", "ontop_V", "bridge", "hollow"],
    work_dir="./adsorbate_screening",
)

paths = workflow.setup()
# Creates 5 × 4 = 20 calculations

Generating Adsorbate Configurations¶

For fine-grained control over adsorbate placement, use AdsorbatePlacer directly. All placement methods use uniform sampling on SO(3) for unbiased molecular orientations, ensuring proper coverage of configurational space.

from nh3sofc.structure import AdsorbatePlacer, SlabStructure

# Load surface
slab = SlabStructure.from_file("surface.vasp")
placer = AdsorbatePlacer(slab)

# Method 1: Random placement with random orientations
configs = placer.add_random(
    "NH3",
    n_configs=20,
    height=2.0,
    random_seed=42,
)
# Generates 20 configurations with uniformly sampled positions and orientations

# Method 2: Grid-based placement with multiple orientations per point
configs = placer.add_grid(
    "NH3",
    grid_size=(3, 3),    # 3x3 grid of positions
    orientations=4,       # 4 random orientations per grid point
    height=2.0,
    random_seed=42,
)
# Generates 3 × 3 × 4 = 36 configurations

# Method 3: Site-specific placement (most physically meaningful)
# Step 1: Identify surface atoms (within 2 Å of topmost atom)
# Step 2: Filter to only La and V atoms among those surface atoms
configs = placer.add_on_site(
    "NH3",
    atom_types=["La", "V"],  # Select La/V from surface atoms only
    n_orientations=5,         # 5 orientations per site
    height=2.0,
    layer_tolerance=2.0,      # Top bilayer (default for perovskites)
    random_seed=42,
)
# Places above La and V atoms in the TOP BILAYER only
# Bulk La/V atoms are automatically excluded

# For only the topmost layer (e.g., just VO2 termination)
configs = placer.add_on_site(
    "NH3",
    atom_types=["V"],
    n_orientations=3,
    layer_tolerance=1.0,      # Only topmost ~1 Å
    random_seed=42,
)

# Method 4: Collision-aware random placement
configs = placer.add_with_collision(
    "NH3",
    n_configs=10,
    min_distance=2.0,  # Minimum atom-atom distance
    height=2.0,
    random_seed=42,
)

Rotation Sampling¶

All methods apply uniform random rotations using proper SO(3) sampling:

# The rotation uses: β = arccos(1 - 2u) for uniform spherical coverage
# This avoids clustering near the "poles" that occurs with naive sampling

# For reproducibility, always set random_seed
configs_a = placer.add_random("NH3", n_configs=10, random_seed=42)
configs_b = placer.add_random("NH3", n_configs=10, random_seed=42)
# configs_a and configs_b are identical

Filtering Unique Configurations¶

After generating many configurations, filter duplicates based on adsorbate-only RMSD:

from nh3sofc.structure.adsorbates import filter_unique_configs, save_configs

# Remove duplicates (compares only adsorbate atoms, not the slab)
# n_slab_atoms = number of atoms BEFORE adding adsorbate
n_slab_atoms = len(slab.atoms)
unique = filter_unique_configs(
    configs,
    threshold=0.5,        # RMSD threshold in Angstroms
    n_slab_atoms=n_slab_atoms
)
print(f"Filtered {len(configs)} → {len(unique)} unique configurations")

# Save for calculations
paths = save_configs(unique, output_dir="./adsorbate_configs", format="vasp")

Why n_slab_atoms? Adsorbate atoms are appended to the end of the structure. This parameter tells the filter to compare only atoms at index n_slab_atoms onwards, ignoring the identical slab atoms that would otherwise dominate the RMSD.

Visualizing Results¶

import matplotlib.pyplot as plt

results = workflow.parse_results()

# Heatmap
pivot = results.pivot(
    index="vacancy_concentration",
    columns="nitrogen_fraction",
    values="energy"
)

plt.figure(figsize=(8, 6))
plt.imshow(pivot.values, cmap='RdYlGn_r')
plt.colorbar(label='Energy (eV)')
plt.xlabel('Nitrogen Fraction')
plt.ylabel('Vacancy Concentration')
plt.savefig('screening_heatmap.png', dpi=150)

Complete Example¶

from ase.io import read
from nh3sofc.workflows import ScreeningWorkflow

# 1. Load base structure (POSCAR format)
surface = read("work/surfaces/LaVO3_001/surface.vasp", format="vasp")

# 2. Define screening with meaningful work directory
workflow = ScreeningWorkflow(
    base_structure=surface,
    parameter_space={
        "vacancy_concentration": [0.0, 0.10, 0.20],
        "nitrogen_fraction": [0.5, 0.67],
    },
    work_dir="work/screening/LaVON_composition",
    encut=520,
)

# 3. Setup (generates POSCAR files with atoms sorted by element)
workflow.setup()

# 4. After completion, analyze
results = workflow.parse_results()

# 5. Find optimal
optimal = workflow.find_optimal("energy")
print(f"Best: {optimal}")

# 6. Save results
results.to_csv("work/screening/LaVON_composition/results.csv")

Best Practices¶

Start small - Test with subset before full screening
Use database - Store all results in ASE database
Check convergence - Verify all calculations completed
Automate submission - Use job arrays for HPC

Next Steps¶

Surface Comparison - Rank screened surfaces
MACE Force Fields - Use ML for faster screening