Extending Free R Flags

I have a phenix.refine output MTZ with an R-free flag column, but the flags are incomplete and don't extend to the resolution I'd like. How can I use rs to make a new MTZ with R-free flags that are complete out to a higher-resolution cutoff? (@dorismai)


At the end of this post is a minimal example script using the reciprocalspaceship package.

Note, however, that some programs (e.g. phenix.fmodel) cap the maximum number of R-free flags. The reference flags can then fail to match the R-free percentage the user asked for, which affects the extension here: the sanity checks in the commented-out part of the script will fail. The next step is to implement resampling in a way that preserves the existing rows with R-free > 0.
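To make the failure mode concrete, here is a small arithmetic sketch in plain Python. It is not part of the original script: the function name `extra_flags_needed` and all of the counts are made up for illustration, and the cap of 2,000 flags is just an assumed value. It mirrors the two sanity checks from the script.

```python
def extra_flags_needed(n_total, n_flagged_free, n_unflagged, fraction):
    """Return how many new free flags the extension would have to add.

    n_total:        reflections in the full ASU out to the new dmin
    n_flagged_free: reference reflections already flagged free (> 0)
    n_unflagged:    reflections that have no flag yet
    fraction:       requested free fraction, e.g. 0.05 for 5%
    """
    target = int(fraction * n_total)
    extra = target - n_flagged_free
    if extra < 0:
        raise ValueError("reference already has more free flags than the target")
    if extra > n_unflagged:
        raise ValueError("not enough unflagged reflections to reach the target")
    return extra


# Hypothetical uncapped reference: 100,000 reflections with 5% flagged free,
# extended by 20,000 new reflections.
print(extra_flags_needed(120_000, 5_000, 20_000, 0.05))

# Hypothetical capped reference: 100,000 reflections but only 2,000 flagged
# free, extended by 2,500 new reflections. Reaching 5% overall would need
# more new flags than there are new reflections, so the check fails.
try:
    extra_flags_needed(102_500, 2_000, 2_500, 0.05)
except ValueError as err:
    print("check failed:", err)
```

In the capped case the shortfall from the reference cannot be made up in the extended shell alone, which is why a different resampling strategy would be needed.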

import reciprocalspaceship as rs

# User input definitions
ref_refine_mtz_path = '/path/to/reference/mtz/with/rfree/flags'
rfree_percentage = 0.03  # target fraction of free reflections (0.03 = 3%)
anomalous = False
dmin = 0.89  # high-resolution cutoff in angstroms

# Load the reference mtz file
ref_refine_mtz = rs.read_mtz(ref_refine_mtz_path)
ref_refine_mtz.hkl_to_asu(inplace=True, anomalous=anomalous)
# Generate the full ASU
full_asu = rs.utils.generate_reciprocal_asu(
    ref_refine_mtz.cell,
    ref_refine_mtz.spacegroup,
    dmin=dmin,
    anomalous=anomalous,
)
ASU = rs.DataSet(
    {
        "H": full_asu[:, 0],
        "K": full_asu[:, 1],
        "L": full_asu[:, 2],
    },
    cell=ref_refine_mtz.cell,
    spacegroup=ref_refine_mtz.spacegroup,
).set_index(["H", "K", "L"])
# Join the reference mtz into the full ASU
full = ref_refine_mtz.join(ASU, how="right")

# Calculate the number of new R-free-flags to add
num_rfree = (full['R-free-flags'] > 0).sum()  # Note that some programs generate non-binary R-free-flags
target_rfree = int(rfree_percentage * len(full))
extra_rfree = target_rfree - num_rfree
# Rows in the extended part have no flag yet (NaN after the join)
na_mask = full['R-free-flags'].isna()
# You might want sanity checks on the numbers:
# if extra_rfree < 0:
#     raise ValueError(f"Fraction of R-free-flags within dmin={dmin:.2f} already exceeds the limit {rfree_percentage}")
# if na_mask.sum() < extra_rfree:
#     raise ValueError("Not enough extended rows to add more R-free-flags to meet the target percentage.")

# Sample additional R-free-flags in the extended part (which is currently NaN)
chosen_rows = full[na_mask].sample(n=extra_rfree).index
full.loc[chosen_rows, 'R-free-flags'] = 1
full.loc[full['R-free-flags'].isna(), 'R-free-flags'] = 0

# Save the output
output_path = '/path/to/output/mtz'
full.to_mtz(output_path)
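One possible way to approach the resampling step, sketched here in plain Python rather than with reciprocalspaceship, is to leave every existing flag untouched and apply the requested fraction only to the reflections that have no flag yet. This is an assumption about what such a step could look like, not the script author's method; `extend_flags`, its arguments, and the toy data are all hypothetical.

```python
import random


def extend_flags(flags, fraction, seed=None):
    """Return a copy of `flags` with every None replaced by 0 or 1.

    flags:    dict mapping (h, k, l) -> existing flag, or None if unflagged
    fraction: free fraction applied to the unflagged reflections only
    seed:     optional seed so the random selection is reproducible
    """
    rng = random.Random(seed)
    # Sort so the selection depends only on the seed, not on dict order.
    unflagged = sorted(hkl for hkl, flag in flags.items() if flag is None)
    n_new_free = int(fraction * len(unflagged))
    new_free = set(rng.sample(unflagged, n_new_free))

    extended = {}
    for hkl, flag in flags.items():
        if flag is None:
            extended[hkl] = 1 if hkl in new_free else 0
        else:
            extended[hkl] = flag  # existing flags are never modified
    return extended


# Toy data: 100 flagged reference reflections (5 of them free)
# plus 40 new, unflagged reflections at higher resolution.
reference = {(h, 0, 0): (1 if h % 20 == 0 else 0) for h in range(1, 101)}
new_shell = {(h, 0, 0): None for h in range(101, 141)}
flags = {**reference, **new_shell}

extended = extend_flags(flags, fraction=0.05, seed=0)
print(sum(extended[hkl] for hkl in new_shell), "new free flags added")
```

With this choice the extended shell gets the requested density of free reflections regardless of how many flags the reference carries, so neither sanity check can trip; the trade-off is that the overall fraction is no longer forced to the target. In the script above, the rough equivalent would be sampling from `full[na_mask]` with pandas' `sample(frac=..., random_state=...)` instead of a fixed `n`.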