Breast Cancer Detection Tutorial
Overview
| Estimated Time | Difficulty | Prerequisites |
|---|---|---|
| 60 minutes | Advanced | Python 3.9+, familiarity with deep learning concepts |
What You'll Learn
- Generate synthetic mammography data using DiffuAug
- Evaluate synthetic data quality with EvalDistSim
- Train a detection model with an augmented dataset
Step 1: Setup
Clone and install the DiffuAug repository:
git clone https://github.com/SSTDV-Project/DiffuAug.git
cd DiffuAug && pip install -e .
Step 2: Generate Synthetic Data
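Before generating anything, it may be worth confirming that the install from Step 1 exposed the modules this tutorial imports. The sketch below uses only the standard library. The module names `diffaug` and `evaldistsim` are taken from the import statements in this tutorial, and whether they match the installed package names is an assumption to verify against the repository.

```python
import importlib.util

# Module names assumed from the import statements used in this tutorial.
REQUIRED_MODULES = ("diffaug", "evaldistsim")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If a name is reported as not importable, re-running the install command from inside the cloned directory is a reasonable first thing to try.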
Use DiffuAug to generate conditional synthetic mammography images:
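import diffaug
# Load the pretrained DDPM-based augmenter from a local checkpoint file.
augmenter = diffaug.DiffuAug(model_type='ddpm', checkpoint='diffaug_pretrained.pth')
# `labels` holds the conditioning labels and must be defined before this call.
synthetic_data = augmenter.generate(condition=labels, n_samples=500, diffusion_steps=100)
Step 3: Evaluate Distribution Similarity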
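As background for two of the methods named in this step, the following sketch computes a two-sample Kolmogorov-Smirnov statistic and a 1-D Wasserstein distance in plain Python. It is an illustration of what these metrics measure on a single scalar feature (for example, mean pixel intensity per image), not EvalDistSim's implementation. The function names are hypothetical. FID is left out because it requires a pretrained feature extractor.

```python
from bisect import bisect_right


def _empirical_cdfs(a, b):
    """Sort both samples and reject empty input."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    return sorted(a), sorted(b)


def ks_statistic(a, b):
    """Largest absolute gap between the two empirical CDFs."""
    a_sorted, b_sorted = _empirical_cdfs(a, b)
    n, m = len(a_sorted), len(b_sorted)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n
        cdf_b = bisect_right(b_sorted, x) / m
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def wasserstein_1d(a, b):
    """Area between the two empirical CDFs (first Wasserstein distance in 1-D)."""
    a_sorted, b_sorted = _empirical_cdfs(a, b)
    n, m = len(a_sorted), len(b_sorted)
    points = sorted(a_sorted + b_sorted)
    area = 0.0
    # Both CDFs are constant between consecutive observed values.
    for left, right in zip(points, points[1:]):
        cdf_a = bisect_right(a_sorted, left) / n
        cdf_b = bisect_right(b_sorted, left) / m
        area += abs(cdf_a - cdf_b) * (right - left)
    return area


if __name__ == "__main__":
    real = [0.0, 1.0, 3.0]
    fake = [5.0, 6.0, 8.0]
    print("KS statistic:", ks_statistic(real, fake))
    print("Wasserstein distance:", wasserstein_1d(real, fake))
```

Both values are 0 for identical samples and grow as the distributions separate. The KS statistic is bounded by 1, while the Wasserstein distance is expressed in the units of the feature itself.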
Assess how well the synthetic data matches the real data distribution:
import evaldistsim
evaluator = evaldistsim.EvalDistSim()
# `original_data` is the real dataset; `synthetic_data` comes from Step 2.
metrics = evaluator.compute_metrics(
    original=original_data,
    synthetic=synthetic_data,
    methods=['ks_test', 'wasserstein', 'fid']
)
print(f"KS test: {metrics['ks_test']:.4f}")
Troubleshooting
| Issue | Solution |
|---|---|
| Out of memory during generation | Reduce n_samples or use a smaller batch size via the batch_size parameter |
| Poor FID score | Increase diffusion_steps (try 200–500) or fine-tune on your dataset |
| Checkpoint not found | Download pretrained weights from the DiffuAug releases page on GitHub |
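For the out-of-memory case in the table above, one generic workaround is to request samples in smaller chunks and collect the results. The sketch below shows that pattern with a hypothetical helper, `generate_in_chunks`, and a stand-in generator function. It does not call DiffuAug, and how a real `generate` call should be wrapped (including how conditioning labels are split per chunk) is an assumption left to the reader.

```python
def generate_in_chunks(generate_fn, n_samples, chunk_size):
    """Call generate_fn(k) repeatedly with k <= chunk_size until n_samples are collected.

    generate_fn is any callable that takes a sample count and returns a
    sequence of that many samples (hypothetical interface for illustration).
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    samples = []
    remaining = n_samples
    while remaining > 0:
        current = min(chunk_size, remaining)  # the last chunk may be smaller
        samples.extend(generate_fn(current))
        remaining -= current
    return samples


if __name__ == "__main__":
    # Stand-in generator that returns placeholder samples instead of images.
    def fake_generate(count):
        return ["sample"] * count

    result = generate_in_chunks(fake_generate, n_samples=500, chunk_size=64)
    print(len(result))
```

Peak memory then scales with the chunk size rather than the total sample count, at the cost of more generator calls.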
Citation
If you use DiffuAug or EvalDistSim in your research, please cite:
@software{diffuaug,
title = {DiffuAug: Diffusion-based Data Augmentation},
author = {SSTDV Project},
year = {2024},
url = {https://github.com/SSTDV-Project/DiffuAug}
}