This short tutorial covers running ABISMAL on NERSC to process time-resolved crystallography data.
@kmdalton maintains an ABISMAL install on NERSC for LCLS use:
source /global/common/software/lcls/abismal/setup.sh
Set up phenix refinement
Reference PDB
First let’s make a set of R-free flags and a reference mtz:
wget https://files.rcsb.org/download/7KQO-sf.cif
wget https://files.rcsb.org/download/7KQO.pdb
gemmi cif2mtz 7KQO-sf.cif 7KQO-sf.mtz
Then run the following Python script, which extends the deposited R-free flags to higher resolution:
import numpy as np
import reciprocalspaceship as rs

input_cif = "7KQO-sf.cif"
mtz_out = "rfree.mtz"
dmin = 0.8  # implausibly high resolution

ds = rs.read_cif(input_cif)

# Enumerate every Miller index in the reciprocal asymmetric unit out to dmin
h, k, l = rs.utils.generate_reciprocal_asu(ds.cell, ds.spacegroup, dmin=dmin).T

# Start from random flags drawn from the flag values used in the deposition
rkey = 'FreeR_flag'
rfree = rs.DataSet({
    'H' : h,
    'K' : k,
    'L' : l,
    rkey : np.random.choice(ds[rkey].unique(), len(h)),
    },
    cell=ds.cell,
    spacegroup=ds.spacegroup,
    merged=True
).infer_mtz_dtypes().set_index(["H", "K", "L"])

# Restore the deposited flags wherever they exist
rfree.loc[ds.index, rkey] = ds[rkey]
rfree.write_mtz(mtz_out)
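To make the logic of the script concrete, here is a toy sketch of the same two steps using only the standard library: every reflection in the extended set first receives a random flag, then the deposited flags overwrite the entries they cover. The Miller indices, the flag values, and the extend_flags helper are made up for illustration and are not part of reciprocalspaceship.

```python
import random

def extend_flags(deposited, all_hkl, seed=0):
    """Assign a flag to every index in all_hkl, keeping deposited flags as-is."""
    rng = random.Random(seed)  # fixed seed so the extension is reproducible
    choices = sorted(set(deposited.values()))
    # Step 1: random flag for every reflection out to the extended resolution
    extended = {hkl: rng.choice(choices) for hkl in all_hkl}
    # Step 2: overwrite with the deposited flag wherever one exists
    extended.update(
        {hkl: flag for hkl, flag in deposited.items() if hkl in extended}
    )
    return extended

# Hypothetical deposited flags (low resolution) and an extended index list
deposited = {(1, 0, 0): 0, (1, 1, 0): 1, (2, 0, 0): 1}
all_hkl = list(deposited) + [(3, 0, 0), (3, 1, 0), (4, 0, 0)]
extended = extend_flags(deposited, all_hkl)
```

The script above calls np.random.choice without a seed, so rerunning it assigns different flags to the reflections beyond the deposited resolution. Generating rfree.mtz once and reusing that file keeps the test set consistent across refinements.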
The script writes rfree.mtz, which we can use in all subsequent refinements. Importantly, it contains the same flags that were used in the PDB deposition, extended to higher resolution. Now we can set up an eff file, which tells phenix how to refine this sample. Here’s a pretty basic eff file that does rigid body and isotropic B-factor refinement:
data_manager {
  miller_array {
    file = "$MTZFILE"
    labels {
      name = "mtz:F,SIGF"
      array_type = unknown *amplitude bool complex hendrickson_lattman \
                   integer intensity nonsense
    }
    user_selected_labels = "mtz:F,SIGF"
  }
  miller_array {
    file = "$RFREE_FILE"
    labels {
      name = "R-free-flags"
      array_type = unknown amplitude bool complex hendrickson_lattman \
                   *integer intensity nonsense
    }
    user_selected_labels = "R-free-flags"
  }
  fmodel {
    xray_data {
      outliers_rejection = True
      french_wilson_scale = False
    }
  }
  default_miller_array = "$MTZFILE"
  model {
    file = "$PDB_FILE"
  }
  default_model = "$PDB_FILE"
}
refinement {
  refine {
    strategy = individual_sites individual_sites_real_space *rigid_body \
               *individual_adp group_adp tls occupancies group_anomalous den
  }
  main {
    number_of_macro_cycles = 5
  }
  modify_start_model {
    modify {
      adp {
        atom_selection = """All"""
        set_b_iso = 20.0
      }
    }
  }
}
output {
  prefix = """refine"""
  serial = 1
}
Note that the locations of the RFREE_FILE and the PDB_FILE are supplied as shell variables. These should be set in the top-level slurm script. The MTZFILE variable will be set by abismal at run time. There will be a section in the merging script that looks like this:
################################################################################
# Configuration for phenix.refine
export EFF=/global/cfs/cdirs/lcls/kmdalton/beamtime/20260415_Mous/reference_data/refine.eff
# These shell variables are referenced in the eff file
export RFREE_FILE=/global/cfs/cdirs/lcls/kmdalton/beamtime/20260415_Mous/reference_data/rfree.mtz
export PDB_FILE=/global/cfs/cdirs/lcls/kmdalton/beamtime/20260415_Mous/reference_data/7KQO.pdb
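Because the eff file relies on variables defined elsewhere, it can help to check before submitting a job that every variable it references is actually set. The following is a minimal sketch of such a check using only the standard library. The unresolved_variables helper is hypothetical, and the sketch only scans for $NAME patterns; it does not reproduce how phenix or abismal perform the substitution.

```python
import re

def unresolved_variables(eff_text, env):
    """Return names referenced as $NAME in eff_text that are missing from env."""
    names = re.findall(r"\$([A-Za-z_][A-Za-z0-9_]*)", eff_text)
    # Preserve first-seen order while dropping duplicates
    unique = list(dict.fromkeys(names))
    return [name for name in unique if name not in env]

# A trimmed-down stand-in for the eff file shown earlier
eff_text = '''
file = "$MTZFILE"
file = "$RFREE_FILE"
file = "$PDB_FILE"
default_model = "$PDB_FILE"
'''

# Placeholder paths; MTZFILE is absent because abismal sets it at run time
env = {"RFREE_FILE": "/path/to/rfree.mtz", "PDB_FILE": "/path/to/7KQO.pdb"}
missing = unresolved_variables(eff_text, env)
print(missing)  # ['MTZFILE']
```

In practice you would pass the contents of refine.eff and os.environ. Seeing only MTZFILE reported is the expected result, since that variable is filled in later.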
Merging with Abismal
Abismal can either merge all the inputs together into a single mtz, or it can keep them separate by using the --separate flag. An example of the former (/global/cfs/cdirs/lcls/kmdalton/beamtime/20260415_Mous/merge.sh):
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --account=lcls_g
#SBATCH -N 1
#SBATCH -c 16
#SBATCH -G 1
#SBATCH --mem=96G
#SBATCH -t 0-12:00
#Set up conda env
source /global/common/software/lcls/abismal/setup.sh
#0 - silent
#1 - progress bar
#2 - one line per epoch
KERAS_VERBOSITY=2
num_epochs=30
################################################################################
# All the output will be written here
OUTDIR=/global/cfs/cdirs/lcls/kmdalton/beamtime/20260415_Mous/results/merge
################################################################################
# Add all the .expt and .refl files you like to this list; these are just dummy paths for illustration
INPUTS=(
/global/cfs/cdirs/lcls/kmdalton/abismal-benchmarks/data/cxidb_81/reflection_data/figure7/r0011_t016_rg013_chunk000_reintegrated.expt
/global/cfs/cdirs/lcls/kmdalton/abismal-benchmarks/data/cxidb_81/reflection_data/figure7/r0011_t016_rg013_chunk000_reintegrated.refl
/global/cfs/cdirs/lcls/kmdalton/abismal-benchmarks/data/cxidb_81/reflection_data/figure7/r0012_t016_rg013_chunk000_reintegrated.expt
/global/cfs/cdirs/lcls/kmdalton/abismal-benchmarks/data/cxidb_81/reflection_data/figure7/r0012_t016_rg013_chunk000_reintegrated.refl
)
################################################################################
# Used to force same indexing sense for output
REFERENCE_MTZ=/global/cfs/cdirs/lcls/kmdalton/beamtime/20260415_Mous/reference_data/7KQO-sf.mtz
################################################################################
# Configuration for phenix.refine
export EFF=/global/cfs/cdirs/lcls/kmdalton/beamtime/20260415_Mous/reference_data/refine.eff
# These shell variables are referenced in the eff file
export RFREE_FILE=/global/cfs/cdirs/lcls/kmdalton/beamtime/20260415_Mous/reference_data/rfree.mtz
export PDB_FILE=/global/cfs/cdirs/lcls/kmdalton/beamtime/20260415_Mous/reference_data/7KQO.pdb
# Dataset specific params
# cell and space group will be taken from the inputs if they aren't provided
# it's probably better to provide them if you know them
EXPERIMENT_PARAMS=(
--dmin 2.0
--cell 88.720 88.720 39.680 90.000 90.000 90.000
--space-group "P 43"
#--separate #uncomment when merging time-resolved data
#--disable-index-disambiguation #can provide a speed boost if you already ran cosym
)
################################################################################
# Base parameters for all runs
ABISMAL_BASE_PARAMS=(
--keras-verbosity=$KERAS_VERBOSITY #one line per epoch
--studentt-dof=32
--num-cpus=10
--epochs=$num_epochs
)
################################################################################
# Parameters governing post-training crossvalidation run
CCHALF_PARAMS=(
--keras-verbosity=$KERAS_VERBOSITY #one line per epoch
)
echo "Time: $(date)"
echo "Running on node: $HOSTNAME"
nvidia-smi
if [[ -v EFF ]]; then
    # Pass the phenix config to abismal
    echo "Adding PHENIX config from"
    echo " - $EFF"
    EXPERIMENT_PARAMS+=( --eff-files "$EFF" )
fi
# Prepare output dir
echo "Output will be written to..."
echo "- $OUTDIR"
mkdir -p "$OUTDIR"
cp "$0" "$OUTDIR/merge.sh"
echo "Base parameters from env:"
echo "${ABISMAL_BASE_PARAMS[@]}"
echo "Experiment parameters from env:"
echo "${EXPERIMENT_PARAMS[@]}"
if [[ -v REFERENCE_MTZ ]]; then
    echo "Setting reference mtz as: $REFERENCE_MTZ"
    EXPERIMENT_PARAMS+=( --reference-mtz="$REFERENCE_MTZ" )
    CCHALF_PARAMS+=( --reference-mtz="$REFERENCE_MTZ" )
fi
abismal \
    "${ABISMAL_BASE_PARAMS[@]}" \
    "${EXPERIMENT_PARAMS[@]}" \
    -o "$OUTDIR" \
    "${INPUTS[@]}"
echo "################################################################################"
echo "# Training ended... starting CChalf calculation"
echo "################################################################################"
cd "$OUTDIR"
# Use the most recently written checkpoint
checkpoint_file=$(ls -t epoch_*.keras | head -1)
abismal.cchalf \
    "${CCHALF_PARAMS[@]}" \
    --sf-init epoch_0.keras \
    datamanager.yml \
    "$checkpoint_file"
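The merge script picks the newest checkpoint by modification time (ls -t). If files in the output directory have been copied or touched since training, it may be safer to pick by epoch number instead. Here is a minimal sketch, assuming checkpoints are named epoch_<N>.keras with an integer N, as the epoch_*.keras glob and the epoch_0.keras argument suggest. The latest_checkpoint helper is hypothetical and not part of abismal.

```python
import re
import tempfile
from pathlib import Path

def latest_checkpoint(directory):
    """Return the epoch_<N>.keras path with the largest N, or None if absent."""
    pattern = re.compile(r"epoch_(\d+)\.keras$")
    candidates = []
    for path in Path(directory).glob("epoch_*.keras"):
        match = pattern.search(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare on the integer epoch so that epoch_10 beats epoch_2
    return max(candidates, key=lambda item: item[0])[1]

# Demonstration on empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for n in (0, 2, 10):
        (Path(tmp) / f"epoch_{n}.keras").touch()
    newest = latest_checkpoint(tmp).name
```

Comparing the parsed integers avoids the lexicographic trap in which epoch_2.keras sorts after epoch_10.keras.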
You might also want to use a glob to set the inputs, for instance:
INPUTS=(`ls $ABISMAL_BENCHMARKS/data/cxidb_81/reflection_data/figure7/*.{expt,refl}`)
as I did in abismal-benchmarks.
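A glob will happily pick up an .expt file whose .refl partner is missing, or the reverse. The sketch below builds the input list from a directory and reports any unpaired files. It assumes that each .expt and its .refl share the same base name, as in the example paths in merge.sh. The paired_inputs helper is hypothetical, and the tutorial does not state whether abismal itself checks for this.

```python
import tempfile
from pathlib import Path

def paired_inputs(directory):
    """Return (inputs, unpaired).

    inputs alternates .expt and .refl paths for each complete pair;
    unpaired lists base names that have only one of the two files.
    """
    directory = Path(directory)
    expts = {p.stem: p for p in directory.glob("*.expt")}
    refls = {p.stem: p for p in directory.glob("*.refl")}
    inputs = []
    for stem in sorted(expts.keys() & refls.keys()):
        inputs += [str(expts[stem]), str(refls[stem])]
    unpaired = sorted(expts.keys() ^ refls.keys())
    return inputs, unpaired

# Demonstration on empty placeholder files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.expt", "a.refl", "b.expt"):
        (Path(tmp) / name).touch()
    inputs, unpaired = paired_inputs(tmp)
    names = [Path(p).name for p in inputs]
```

Here names holds the complete pair (a.expt, a.refl) and unpaired holds the base name b, which has no .refl file.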
Merging Time-Resolved Data with Abismal
Abismal needs different time points to be in different files. For DIALS, this means you first need to run dials.combine_experiments to pool the experiments and reflection tables for each time point into a single .expt/.refl pair. With that done, modify the INPUTS variable so that it lists one pair per time point:
################################################################################
# Add all the .expt and .refl files you like to this list; these are just dummy paths for illustration
INPUTS=(
/path/to/data/time_0.expt
/path/to/data/time_0.refl
/path/to/data/time_1.expt
/path/to/data/time_1.refl
/path/to/data/time_2.expt
/path/to/data/time_2.refl
)
and provide --separate, as in:
EXPERIMENT_PARAMS=(
--dmin 2.0
--cell 88.720 88.720 39.680 90.000 90.000 90.000
--space-group "P 43"
--separate #keep the inputs separate instead of merging them into a single mtz
#--disable-index-disambiguation #can provide a speed boost if you already ran cosym
)
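With many time points, a plain alphabetical listing puts time_10 before time_2. The sketch below sorts per-time-point files by their integer index and places each .expt before its .refl, matching the layout of the INPUTS list above. The time_N naming is taken from the dummy paths in this tutorial, and both helpers are hypothetical. The tutorial does not say whether input order affects how abismal labels the separate outputs, so treat the sorting as a readability aid.

```python
import re

def time_index(path):
    """Extract the integer N from a name like time_N.expt or time_N.refl."""
    match = re.search(r"time_(\d+)\.(?:expt|refl)$", path)
    if match is None:
        raise ValueError(f"no time index in {path!r}")
    return int(match.group(1))

def ordered_inputs(paths):
    """Sort by time index, placing each .expt before its .refl."""
    # False sorts before True, so .expt files come first within a time point
    return sorted(paths, key=lambda p: (time_index(p), not p.endswith(".expt")))

# Dummy paths in deliberately scrambled order
paths = [
    "/path/to/data/time_10.refl",
    "/path/to/data/time_2.expt",
    "/path/to/data/time_10.expt",
    "/path/to/data/time_2.refl",
]
ordered = ordered_inputs(paths)
for p in ordered:
    print(p)
```

The printed list can be pasted into the INPUTS array of the merge script.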