I am new to making difference maps to compare on and off states, and I am curious about what people typically do when they don’t see a clear signal that they expect.
Do you further refine the off-state model in the hope of better phase estimates? Is there a rule of thumb for what quality of data and model is needed for success (e.g. R-free, CC1/2, resolution)?
Do you try to scale the on-state reflection data differently than, for example, just running SCALEIT? If scaling is an acceptable hyperparameter to tune in this context, what kind of scaling schemes are out there?
Do you adjust the difference map weight? I see that in rs-booster, the weight is implemented as $w = \frac{1}{1 + \frac{\sigma^2_{\Delta F}}{\langle \sigma^2_{\Delta F} \rangle} + \alpha \frac{|\Delta F|^2}{\langle |\Delta F|^2 \rangle}}$. What is the significance of $\alpha$? Are there other ways of calculating $w$?
@anonymous, welcome! Difference maps for time-resolved crystallography are indeed a confusing subject. There is a lot to unpack here. I will share my opinions, but I’m sure others will have their own.
This will depend on the particulars of the experiment. Usually, I will do a very light refinement (rigid body + atomic B-factors) of a reference structure into the off-state dataset. I like to use the model phases from this refinement for the difference map. A more extensive refinement may be required in cases where the reference structure is significantly different in some way (for instance, if it was collected at a different temperature).
As far as quality measures go, I’m not sure I can offer much that is actionable. I have seen statistically significant and biologically interpretable difference peaks in maps at resolutions as low as 3.0 Å, with corresponding R-values > 0.3. I would say it is important that the resolution meets or exceeds the length scale of the phenomenon you are trying to observe.
In my opinion, the best method is to scale the time points jointly prior to merging, if your software package supports this. I do believe it is acceptable (essential, even?) to use scaling as a handle to improve the difference signal. We recently published a very thorough example of this (Structural Dynamics article; preprint). In this paper we used careless to scale time points as statistically dependent, whereas most approaches treat them as independent.
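For comparison with joint scaling, here is a minimal sketch of the simplest post-merging alternative: fitting one scale factor k and a relative isotropic B-factor so that the on-state amplitudes match the off-state ones. This is a simplification in the spirit of SCALEIT (which can also refine anisotropic terms), not SCALEIT's or careless's actual procedure; the helper names and the log-linear least-squares fit are illustrative assumptions.

```python
import numpy as np


def fit_isotropic_scale(f_ref, f, d_spacing):
    """Fit k and B so that k * exp(-B / (4 d^2)) * f approximates f_ref.

    Hypothetical helper: linear least squares on
    log(f_ref / f) = log(k) - B * s2, with s2 = (sin(theta)/lambda)^2 = 1 / (4 d^2).
    """
    f_ref = np.asarray(f_ref, dtype=float)
    f = np.asarray(f, dtype=float)
    s2 = 1.0 / (4.0 * np.asarray(d_spacing, dtype=float) ** 2)
    ok = (f_ref > 0) & (f > 0)  # logarithms need strictly positive amplitudes
    y = np.log(f_ref[ok] / f[ok])
    design = np.column_stack([np.ones(ok.sum()), -s2[ok]])
    (log_k, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.exp(log_k)), float(b)


def apply_isotropic_scale(f, d_spacing, k, b):
    """Put f on the reference scale using the fitted k and B (hypothetical helper)."""
    s2 = 1.0 / (4.0 * np.asarray(d_spacing, dtype=float) ** 2)
    return k * np.exp(-b * s2) * np.asarray(f, dtype=float)


# Toy data: "on" amplitudes differ from "off" by a known k and B.
rng = np.random.default_rng(0)
d = rng.uniform(1.8, 20.0, size=500)    # resolution of each reflection (Angstrom)
f_off = rng.gamma(2.0, 50.0, size=500)  # reference (off-state) amplitudes
true_k, true_b = 0.5, -8.0
f_on = f_off / (true_k * np.exp(-true_b / (4.0 * d**2)))

k, b = fit_isotropic_scale(f_off, f_on, d)
f_on_scaled = apply_isotropic_scale(f_on, d, k, b)
```

Because the toy data are noise-free, the fit recovers the chosen k and B; with real data, the residual mismatch after such a global fit is exactly where more flexible (e.g. joint, per-image) scaling can help.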
$\alpha$ is a hyperparameter which suppresses outliers in a difference map. The model implicit in $w$ is that the structure factor differences are normally distributed, $|\Delta F| \sim \mathcal{N}(0, \sigma^2_{|\Delta F|})$. $\alpha$ encodes the strength of this prior belief, or equivalently the width of the expected distribution of difference structure factors. Smaller values of $\alpha$ are more permissive of outliers. In my experience it is typical to use a value like $\alpha = 0.05$. For data with few outliers, $\alpha = 0$ may be viable.
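To make the role of $\alpha$ concrete, here is a minimal NumPy sketch of the weight formula quoted in the question. It assumes $\langle\cdot\rangle$ is a plain mean over all reflections (averages within resolution shells are another possible choice), and the function name is hypothetical, not the rs-booster API.

```python
import numpy as np


def difference_weights(delta_f, sig_delta_f, alpha=0.05):
    """w = 1 / (1 + sig^2/<sig^2> + alpha * |dF|^2/<|dF|^2>).

    Hypothetical helper; <.> is taken as a mean over all reflections (an assumption).
    """
    delta_f = np.asarray(delta_f, dtype=float)
    sig_delta_f = np.asarray(sig_delta_f, dtype=float)
    noise_term = sig_delta_f**2 / np.mean(sig_delta_f**2)  # down-weights noisy differences
    outlier_term = delta_f**2 / np.mean(delta_f**2)        # down-weights unusually large |dF|
    return 1.0 / (1.0 + noise_term + alpha * outlier_term)


# Four reflections with equal errors; the last one is a large outlier.
delta_f = np.array([1.0, -1.0, 0.5, 10.0])
sig_delta_f = np.full(4, 0.5)

w_plain = difference_weights(delta_f, sig_delta_f, alpha=0.0)  # every w = 0.5
w_robust = difference_weights(delta_f, sig_delta_f, alpha=0.05)
weighted_coefficients = w_robust * delta_f  # amplitudes for the weighted difference map
```

With $\alpha = 0$ the weights depend only on the measurement errors; increasing $\alpha$ shrinks the weight of the largest $|\Delta F|$ values, which is the outlier suppression described above.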
Yes! You could try total-variation-based denoising using METEOR.
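For intuition about what total variation (TV) denoising does to a map, here is a minimal NumPy sketch: gradient descent on a smoothed TV objective, $\tfrac{1}{2}\lVert u - f\rVert^2 + \lambda \sum \sqrt{|\nabla u|^2 + \epsilon^2}$, on a periodic 3D grid. It is not METEOR's implementation or API; the function name, the regularization strength $\lambda$, the smoothing constant $\epsilon$, and the iteration count are illustrative assumptions.

```python
import numpy as np


def tv_denoise(f, lam=0.2, eps=0.1, n_iter=300):
    """Smoothed total-variation denoising of a periodic 3D map by gradient descent.

    Hypothetical helper. Minimizes 0.5 * ||u - f||^2 + lam * sum(sqrt(|grad u|^2 + eps^2)).
    """
    f = np.asarray(f, dtype=float)
    u = f.copy()
    # Safe step: the objective's gradient is Lipschitz with constant <= 1 + 12 * lam / eps in 3D.
    step = 1.0 / (1.0 + 12.0 * lam / eps)
    for _ in range(n_iter):
        # Forward differences with periodic wrap-around (maps are periodic in the unit cell).
        grads = [np.roll(u, -1, axis=a) - u for a in range(3)]
        norm = np.sqrt(sum(g**2 for g in grads) + eps**2)
        # Divergence of the normalized gradient field (backward differences).
        div = sum(g / norm - np.roll(g / norm, 1, axis=a) for a, g in enumerate(grads))
        # Gradient step on fidelity term plus TV penalty.
        u -= step * ((u - f) - lam * div)
    return u


rng = np.random.default_rng(0)
clean = np.zeros((16, 16, 16))
clean[4:12, 4:12, 4:12] = 1.0  # a blocky toy "feature"
noisy = clean + rng.normal(scale=0.3, size=clean.shape)
denoised = tv_denoise(noisy)
```

TV denoising favors piecewise-smooth maps: it removes voxel-to-voxel noise while largely preserving sharp edges. Larger $\lambda$ removes more noise but also erodes real features, so choosing the denoising strength is the key decision.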
One key question is isomorphism vs. non-isomorphism. Your (F1-F2, phicalc) difference map will only be meaningful if the difference between F1 and F2 is larger than the differences caused by non-isomorphism between the F1 and F2 crystals, AND if phicalc is appropriate for both. In some cases, this condition is not fulfilled. In those cases, other difference maps (the usual (F1-Fcalc1, phicalc1) and (F2-Fcalc2, phicalc2)) must be inspected and compared. There are no clear-cut procedures for analyzing such questions, I think, but Kevin may know better.
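As a toy illustration of the (F1-F2, phicalc) map in a setting where isomorphism holds by construction, here is a minimal NumPy sketch in P1 on a small periodic grid. "Atoms" are Gaussian blobs, the "on" state moves one of them, and phicalc is taken from the off-state model. The grid size, blob width, and helper names are assumptions for illustration, not any package's API.

```python
import numpy as np

N = 16  # grid points per cell edge (toy P1 cell)


def blob_density(positions, sigma=1.0):
    """Hypothetical helper: sum of periodic Gaussian 'atoms' on integer grid positions."""
    idx = np.arange(N)
    x, y, z = np.meshgrid(idx, idx, idx, indexing="ij")
    rho = np.zeros((N, N, N))
    for px, py, pz in positions:
        # Minimum-image offsets so each blob wraps around the periodic cell.
        dx = (x - px + N // 2) % N - N // 2
        dy = (y - py + N // 2) % N - N // 2
        dz = (z - pz + N // 2) % N - N // 2
        rho += np.exp(-(dx**2 + dy**2 + dz**2) / (2.0 * sigma**2))
    return rho


def isomorphous_difference_map(f_on, f_off, phi_calc):
    """Fourier synthesis with coefficients (|F_on| - |F_off|) * exp(i * phi_calc)."""
    coeffs = (np.abs(f_on) - np.abs(f_off)) * np.exp(1j * phi_calc)
    return np.fft.ifftn(coeffs)


rng = np.random.default_rng(0)
fixed_atoms = [tuple(p) for p in rng.integers(0, N, size=(49, 3))]
old_site, new_site = (4, 4, 4), (4, 4, 10)

f_off = np.fft.fftn(blob_density(fixed_atoms + [old_site]))
f_on = np.fft.fftn(blob_density(fixed_atoms + [new_site]))
phi_calc = np.angle(f_off)  # phases from the (here perfect) off-state model

diff_map = isomorphous_difference_map(f_on, f_off, phi_calc).real
# Expect negative density at old_site and positive density at new_site.
```

In this idealized case the map shows roughly half the true density change (negative at the old site, positive at the new one). With real crystals, any non-isomorphism adds amplitude differences unrelated to the change of interest, which is why its size relative to the real signal matters.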
I was thinking about a different aspect. How can you prove that your ligand soak (or whatever you did to make it different from the reference) is really different from the reference? In other words, is there really a difference to the reference, or do you just expect that difference but there is none or so little that it is not detectable in the presence of noise?
Ideally that question should be answered without a model, because a model produces a bias.
I can suggest a procedure that is model-free, but the catch is that you need more than one dataset from each of the F1 and F2 crystal forms. If that condition is fulfilled, the procedure arranges all datasets in a plane, where the angle between two datasets reflects their systematic difference and the distance from the origin reflects their signal-to-noise. You then check whether all datasets of type 1 lie on one line and all datasets of type 2 lie on a different line.
In practice, XDS datasets are analyzed with XSCALE_ISOCLUSTER. A similar procedure now also exists in the DIALS framework (ask Amy Thompson). Whether Kevin has this in his toolbox, I don’t know.
The relevant publication is "Making a difference in multi-data-set crystallography: simple and deterministic data-scaling/selection methods" (IUCr), with the theory in "Dissecting random and systematic differences between noisy composite data sets" (IUCr).
I don't explicitly have anything like Isocluster in my toolbox, but I think one could implement it using gemmi and/or reciprocalspaceship combined with the MDS implementation in scikit-learn. Then again, why bother when it is already available in other nice packages? :)
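Along those lines, here is a minimal NumPy sketch of the model-free idea: represent each dataset as a low-dimensional vector $x_i$ with $x_i \cdot x_j \approx CC_{ij}$ for $i \neq j$, so that datasets differing only by random error point in the same direction. Instead of scikit-learn's MDS, it uses an iterated eigendecomposition that re-estimates the unknown diagonal of the CC matrix. This is a rough stand-in for what XSCALE_ISOCLUSTER does, not its actual algorithm, and the function name and synthetic "datasets" are illustrative assumptions.

```python
import numpy as np


def cc_embed(cc, dim=2, n_iter=100):
    """Hypothetical helper: find vectors x_i with x_i . x_j ~= cc[i, j] for i != j.

    The diagonal of cc is treated as unknown and re-estimated from the fit each iteration.
    """
    m = np.array(cc, dtype=float)
    np.fill_diagonal(m, 1.0)  # starting guess for the unknown diagonal
    for _ in range(n_iter):
        evals, evecs = np.linalg.eigh(m)     # eigenvalues in ascending order
        top = np.argsort(evals)[::-1][:dim]  # indices of the `dim` largest
        x = evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))
        np.fill_diagonal(m, np.sum(x**2, axis=1))  # diagonal of x @ x.T
    return x


# Synthetic "datasets": six of state 1 and six of state 2, each with random noise.
rng = np.random.default_rng(0)
n_refl = 2000
common = rng.normal(size=n_refl)  # signal shared by both states
change = rng.normal(size=n_refl)  # systematic difference present only in state 2
state1 = [common + rng.normal(size=n_refl) for _ in range(6)]
state2 = [common + 0.8 * change + rng.normal(size=n_refl) for _ in range(6)]
cc = np.corrcoef(np.array(state1 + state2))  # 12 x 12 dataset correlation matrix

x = cc_embed(cc)
unit = x / np.linalg.norm(x, axis=1, keepdims=True)
cos_angle = unit @ unit.T  # cosine of the angle between dataset vectors
```

Within each state, the cosines should be close to 1, i.e. the datasets lie on one line from the origin; between states they should be noticeably smaller, indicating a real systematic difference. If the two groups were not separated, the expected difference would not be distinguishable from noise.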
Hey @anonymous – just to say: meteor aims to help pick the best scaling & weighting procedure for you using principled, automatic methods. Further it can denoise the result using a TV denoiser. We’ve had success seeing signals with meteor that were hidden with other methods – not always, but in cases where there is a bit of signal hiding, it can make a substantial difference.
Actually we are still working to improve meteor to try and address some of the good points you brought up, like automatically flagging datasets that have no appreciable signal. That’s still a research topic.
If you try it, we’d be curious to know how well it worked, success or not! Happy to weigh in and provide more advice.
@anonymous I’m just finally joining the discourse now, so I’ll give my hopefully-not-too-late two cents here!
Another option (perhaps a little lower-tech than meteor) is the package I wrote, matchmaps, which generalizes the isomorphous difference map to the non-isomorphous case. As Kay mentioned, the quality of an isomorphous difference map can fall off pretty rapidly with even a small unit cell change. In some cases, even isomorphous datasets can benefit from the matchmaps approach. It’s easy to just run quickly on your data and take a peek, so if you’re still thinking about this, it might be worth a shot!
Happy to chat more about it, and of course raise an issue on GitHub if you’re running into any trouble.