Suggest optimal patch size based on dataset dimensions.
Uses median shape as the starting point but clamps to the minimum volume size per axis, ensuring the suggested patch fits ALL volumes without requiring padding during training.
Algorithm:
    1. Use min(median, min_volume) per axis for safety.
    2. Round down to nearest multiple of divisor (16 for UNet compatibility).
    3. Clamp to [min_patch_size, max_patch_size] bounds.
    4. Validate: error if min_patch_size exceeds smallest volume.
Args:
    dataset: MedDataset instance with analyzed images.
    target_spacing: Target voxel spacing [x, y, z]. If None, uses dataset.get_suggestion()['target_spacing'].
    min_patch_size: Minimum size per dimension. Default [32, 32, 32].
    max_patch_size: Maximum size per dimension. Default [256, 256, 256].
    divisor: Ensure divisibility (default 16 for UNet compatibility).
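The four steps above can be sketched as a small standalone function. This is an illustrative sketch, not the library's implementation: the function name and the `median_shape`/`min_volume_shape` parameters are hypothetical stand-ins for the statistics a MedDataset would provide.

```python
def suggest_patch_size_sketch(median_shape, min_volume_shape,
                              min_patch_size=(32, 32, 32),
                              max_patch_size=(256, 256, 256),
                              divisor=16):
    """Illustrative sketch of the patch-size suggestion algorithm."""
    patch = []
    for med, vmin, lo, hi in zip(median_shape, min_volume_shape,
                                 min_patch_size, max_patch_size):
        # Step 4 (validation): the requested minimum must fit the smallest volume.
        if lo > vmin:
            raise ValueError(f'min_patch_size {lo} exceeds smallest volume axis {vmin}')
        size = min(med, vmin)               # Step 1: fit ALL volumes, no padding needed
        size = (size // divisor) * divisor  # Step 2: round down to a multiple of divisor
        size = max(lo, min(size, hi))       # Step 3: clamp to [min, max] bounds
        patch.append(size)
    return patch

# Example: median shape (100, 100, 50), smallest volume (90, 120, 40)
print(suggest_patch_size_sketch([100, 100, 50], [90, 120, 40]))  # [80, 96, 32]
```

Note that validation runs before rounding, so an impossible request (a minimum patch larger than the smallest volume on some axis) fails loudly instead of silently producing a patch that would require padding.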
Preprocess dataset to disk, creating new columns for preprocessed paths.
Processes images (and optionally masks) through a transform pipeline, saves to output_dir, then creates new '{col}_preprocessed' columns in the DataFrame. Original columns are preserved unchanged.
Args:
    df: DataFrame with file paths.
    img_col: Column name for image paths.
    mask_col: Optional column name for mask paths.
    output_dir: Output directory. Creates images/ and masks/ subdirectories.
    target_spacing: Target voxel spacing for resampling (e.g., [1.0, 1.0, 1.0]).
    apply_reorder: Whether to reorder to RAS+ canonical orientation.
    transforms: Additional TorchIO or fastMONAI transforms to apply after reordering and resampling.
    max_workers: Number of parallel workers. Each worker loads a full 3D volume into memory, so reduce for large volumes.
    skip_existing: Skip files that already exist on disk (with size > 0).
import tempfile, shutil
from fastcore.test import test_eq, test_fail

_tmp = tempfile.mkdtemp()

# Create synthetic NIfTI files
for i in range(3):
    tio.ScalarImage(tensor=torch.randn(1, 10, 10, 10)).save(f'{_tmp}/img_{i}.nii.gz')
    tio.LabelMap(tensor=torch.randint(0, 2, (1, 10, 10, 10))).save(f'{_tmp}/mask_{i}.nii.gz')

# Test 1: Image-only preprocessing (new columns, originals preserved)
_df1 = pd.DataFrame({'img': [f'{_tmp}/img_{i}.nii.gz' for i in range(3)]})
_orig_paths1 = _df1['img'].tolist()
_out1 = f'{_tmp}/out1'
preprocess_dataset(_df1, img_col='img', output_dir=_out1, apply_reorder=False)
# Original column preserved
test_eq(_df1['img'].tolist(), _orig_paths1)
# New preprocessed column created
test_eq('img_preprocessed' in _df1.columns, True)
test_eq(all(Path(p).exists() for p in _df1['img_preprocessed']), True)
test_eq(all('out1/images/' in p for p in _df1['img_preprocessed']), True)

# Test 2: Skip-existing (rerun with original paths pointing to same filenames)
_df2 = pd.DataFrame({'img': [f'{_tmp}/img_{i}.nii.gz' for i in range(3)]})
preprocess_dataset(_df2, img_col='img', output_dir=_out1, apply_reorder=False)
# Should print "0 processed, 3 skipped"

# Test 3: With masks (both columns preserved, new columns created)
_df3 = pd.DataFrame({
    'img': [f'{_tmp}/img_{i}.nii.gz' for i in range(3)],
    'mask': [f'{_tmp}/mask_{i}.nii.gz' for i in range(3)],
})
_orig_img3 = _df3['img'].tolist()
_orig_mask3 = _df3['mask'].tolist()
_out3 = f'{_tmp}/out3'
preprocess_dataset(_df3, img_col='img', mask_col='mask', output_dir=_out3, apply_reorder=False)
# Original columns preserved
test_eq(_df3['img'].tolist(), _orig_img3)
test_eq(_df3['mask'].tolist(), _orig_mask3)
# New preprocessed columns created
test_eq(all(Path(p).exists() for p in _df3['img_preprocessed']), True)
test_eq(all(Path(p).exists() for p in _df3['mask_preprocessed']), True)
test_eq(all('out3/masks/' in p for p in _df3['mask_preprocessed']), True)

# Test 4: Input validation
test_fail(lambda: preprocess_dataset(pd.DataFrame(), img_col='img'), contains='empty')
test_fail(lambda: preprocess_dataset(pd.DataFrame({'x': [1]}), img_col='img'), contains='not found')
_df_dup = pd.DataFrame({'img': [f'{_tmp}/img_0.nii.gz', f'{_tmp}/img_0.nii.gz']})
test_fail(lambda: preprocess_dataset(_df_dup, img_col='img'), contains='Duplicate')

shutil.rmtree(_tmp)