Can 1/(spaceship) output unmerged mtz files in batches? I can’t seem to figure out how to do that if so. Gemmi can, but it’s messy:
cell = ds_corr.cell
sg = ds_corr.spacegroup
ds_corr = rs.DataSet(whatever)
mtz_out = gemmi.Mtz(with_base=True)
mtz_out.spacegroup = sg
mtz_out.cell = cell
ds = mtz_out.add_dataset("combined")
ds.project_name = "whatamidoing"
ds.crystal_name = "pleasehelp"
dataset_id = ds.id
# Add columns: M/ISYM BATCH I SIGI LAMBDA
mtz_out.add_column("M/ISYM", "Y", dataset_id=dataset_id)
mtz_out.add_column("BATCH", "B", dataset_id=dataset_id)
mtz_out.add_column("I", "J", dataset_id=dataset_id)
mtz_out.add_column("SIGI", "Q", dataset_id=dataset_id)
mtz_out.add_column("LAMBDA", "R", dataset_id=dataset_id)
all_data = []
for batch, group in ds_corr.groupby(by='BATCH'):
nrefls = len(group)
# Build array: H K L M/ISYM BATCH I SIGI LAMBDA
batch_data = np.column_stack([
group.index.get_level_values('H').to_numpy(),
group.index.get_level_values('K').to_numpy(),
group.index.get_level_values('L').to_numpy(),
group['M/ISYM'].to_numpy(),
np.full(nrefls, batch, dtype='f4'),
group['I'].to_numpy(),
group['SIGI'].to_numpy(),
group['LAMBDA'].to_numpy()
])
all_data.append(batch_data)
combined = np.vstack(all_data)
mtz_out.set_data(combined)
nbatch = ds_corr.groupby(by='BATCH').ngroups
for b in range(1, nbatch + 1):
batch = gemmi.Mtz.Batch()
batch.number = b
batch.cell = cell
batch.dataset_id = dataset_id
mtz_out.batches.append(batch)
mtz_out.sort()
mtz_out.write_to_file("combined_sorted_lorentz.mtz")
Hi @aaronfinke, can you say a little more about your use case? I’m a little unsure what wrangling your snippet does apart from telling gemmi to write batch headers. If you want to edit the batch headers, that probably needs to happen at the gemmi level. While rs supports “BATCH” columns to associate reflections with specific batches, it doesn’t have support fort batch headers which map BATCH numbers to dataset IDs.
I think you can probably simplify your code if you do whatever manipulations you want in rs, convert your rs.DataSet to gemmi, and edit the batch headers there.
ds = rs.read_mtz(whatever)
... #do whatever fancy stuff you want here to sort / modify ds
dataset_id = 0
cell = ds.cell
mtz = ds.to_gemmi()
for b in ds.BATCH.unique():
batch = gemmi.Mtz.Batch()
batch.number = b
batch.cell = cell
batch.dataset_id = dataset_id
mtz.batches.append(batch)
mtz_out.write_to_file("combined_sorted_lorentz.mtz")
It’s definitely not ideal. Couple of questions
- should we modify
rs to use the above strategy to populate batch headers automatically?
- should we add methods to edit batch headers in rs? or should we just punt that to gemmi?
I wanted to output an mtz file for input into aimless, which requires batch headers.
- should we modify
rs to use the above strategy to populate batch headers automatically?
maybe- up to you if you want to include this functionality. Since it’s exclusive to mtz, maybe just set it as an option when writing to ds.write_mtz, like:
ds.write_mtz("foo.mtz", batch=True)
And if batch=True it can just run the gemmi batch file write like I’ve done above, but only if there’s a proper BATCH column in the dataset (e.g. raise ValueError if no BATCH column).
- should we add methods to edit batch headers in rs? or should we just punt that to gemmi?
I can’t think of a reason to do it other than for output to mtz. rs can do any other manipulation with a simple ds.groupby('BATCH') which is way simpler than anything else. I’d just stick with gemmi.