Creation of multi-block sf-cif files for VALDO deposition

Doeke_Hekstra · December 23, 2025, 11:09pm

In preparing VALDO output for PDB deposition, we are thinking of adopting this convention: a multi-block sf-cif file with the following blocks:

- extrapolated structure factor amplitudes as used for structure refinement, along with 2Fo-Fc and Fo-Fc amplitudes and phases
- original, unextrapolated amplitudes from the crystal with the fragment soaked in
- difference map amplitudes, weights, and phases

Would someone have thoughts about the best way to create such a file? Can this be done in rs? Should we go through GEMMI? Or, create one sf-cif file per block and concatenate them?
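
As a rough illustration of the last option (one sf-cif file per block, then concatenate), here is a minimal standard-library sketch. The helper names `block_name` and `concatenate_cif_blocks` and the block names in the example are hypothetical. It assumes each input text holds exactly one `data_` block, and it only checks that block names are unique; it does not validate the CIF content.

```python
def block_name(cif_text: str) -> str:
    """Return the name of the single data block in a CIF text.

    Simplification: any line starting with 'data_' counts as a block header,
    so a multi-line text field containing such a line would be miscounted.
    """
    names = [
        line.strip()[len("data_"):]
        for line in cif_text.splitlines()
        if line.strip().lower().startswith("data_")
    ]
    if len(names) != 1:
        raise ValueError(f"expected exactly one data_ block, found {len(names)}")
    return names[0]


def concatenate_cif_blocks(cif_texts):
    """Join single-block CIF texts into one multi-block CIF text."""
    seen = set()
    parts = []
    for text in cif_texts:
        name = block_name(text)
        # Block names must be unique within a file (compared case-insensitively).
        if name.lower() in seen:
            raise ValueError(f"duplicate block name: {name}")
        seen.add(name.lower())
        parts.append(text.strip() + "\n")
    # A blank line between blocks is only for readability.
    return "\n".join(parts)


# Toy inputs standing in for three single-block sf-cif files.
extrapolated = "data_extrapolated\n_cell.length_a 10.0\n"
original = "data_original\n_cell.length_a 10.0\n"
difference = "data_difference\n_cell.length_a 10.0\n"

combined = concatenate_cif_blocks([extrapolated, original, difference])
```

In practice the three strings would come from reading the per-block files, and `combined` would be written out as the deposition file.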

kmdalton · December 24, 2025, 2:57pm

I think a deficiency of rs right now is that it can convert single-block CIF to MTZ, but it supports neither multi-block CIF nor writing CIFs. You could certainly accomplish what you want with pure gemmi. If rs would be a value add, we can look into extending it to better support multi-block CIF. This might require making some API decisions which are probably long overdue.

Doeke_Hekstra · December 25, 2025, 8:03pm

Thanks. For now, I am using rs to read in the MTZs and preprocess column labels, then calling rs.io.to_gemmi(), after which I create an instance of the gemmi.cif.Document class. I will add this to the VALDO repository and can provide the code if anyone finds that helpful.

JBGreisman · December 28, 2025, 5:46pm

As @kmdalton said, right now rs doesn’t have an explicit way to support multi-block cifs (or more generally, any sort of multi-rs.DataSet-style object). Back in the day, we had toyed around with having such an object to support unmerged datasets (with each image being a separate DataSet—this would enable things like per-image cell parameters, etc).

Implementing this in a logical, coherent way will be a rather big API decision, but it could certainly be worthwhile if there’s a big new class of experiments/analyses that it would support. Keep us posted if you think there’s a compelling use case.

tjlane · December 28, 2025, 6:57pm

Bit of an aside, but I would deposit the original intensities (and their SIGIs) as well.

Randy (for example) is big on retaining the original data, as a lot of his work requires it.

CIF support feels like a nice-to-have to me, rather than an urgent need, but that’s just my 2c ;).

Doeke_Hekstra · December 29, 2025, 10:05pm

Thanks, @JBGreisman. For now, our solution lives in Pull Request #38 of Hekstra-Lab/valdo on GitHub ("added valdo to sf-cif script", by DHekstra). Happy to hold off on an rs-based solution until a compelling use case comes along.

@tjlane, great point. In this case, we didn’t correct the original data and do not have the intensities, but generally fully agree.