Breast Cancer Detection Tutorial

Overview

Estimated Time: 60 minutes
Difficulty: Advanced
Prerequisites: Python 3.9+, familiarity with deep learning concepts

What You'll Learn

  • Generate synthetic mammography data using DiffuAug
  • Evaluate synthetic data quality with EvalDistSim
  • Train a detection model with an augmented dataset

Step 1: Setup

Clone and install the DiffuAug repository:

git clone https://github.com/SSTDV-Project/DiffuAug.git
cd DiffuAug && pip install -e .
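After installing, a quick sanity check can confirm that the interpreter meets the Python 3.9+ prerequisite and that the package is importable. This is a minimal sketch that uses only the standard library. The module name `diffaug` comes from the import in Step 2 and is an assumption here; change it if the package installs under a different name. The helper `check_environment` is hypothetical.

```python
import importlib.util
import sys


def check_environment(module_name: str = "diffaug", min_version: tuple = (3, 9)) -> dict:
    """Report whether the running interpreter meets `min_version` and whether
    the top-level module `module_name` can be found on the import path.
    (Hypothetical helper; the default module name is an assumption.)"""
    return {
        "python_ok": sys.version_info[:2] >= min_version,
        # find_spec returns None when a top-level module cannot be located
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `module_found` is False even though `pip install -e .` succeeded, the package most likely uses a different import name than the one assumed here.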

Step 2: Generate Synthetic Data

Use DiffuAug to generate conditional synthetic mammography images:

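import diffaug
# Load a DDPM-based augmenter from pretrained weights (see Troubleshooting if the checkpoint is missing)
augmenter = diffaug.DiffuAug(model_type='ddpm', checkpoint='diffaug_pretrained.pth')
# `labels` holds the class conditions (e.g. benign/malignant) and must be defined before this call
synthetic_data = augmenter.generate(condition=labels, n_samples=500, diffusion_steps=100)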
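`model_type='ddpm'` refers to a denoising diffusion probabilistic model. As background for the `diffusion_steps` setting, the sketch below builds the standard DDPM linear noise schedule (β rising from 1e-4 to 0.02, the range used by Ho et al., 2020) and applies the forward-noising rule x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, where ᾱ_t is the running product of (1 − β). This is a generic illustration, not DiffuAug code: how DiffuAug maps `diffusion_steps` onto its schedule (training length or number of sampling steps) is an assumption left open here, and all function names are hypothetical.

```python
import math


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list:
    """Linearly spaced noise variances beta_1..beta_T."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas: list) -> list:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noisy_pixel(x0: float, eps: float, alpha_bar_t: float) -> float:
    """Forward-process sample for one pixel value:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    return math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps


if __name__ == "__main__":
    # Compare how much signal survives at the final step for two step counts
    for T in (100, 500):
        final = alpha_bars(linear_beta_schedule(T))[-1]
        print(f"T={T}: alpha_bar_T={final:.4f}")
```

With the β range held fixed, spreading it over more steps drives ᾱ_T closer to zero, so the last step is closer to pure noise; the printed values let you compare 100 and 500 steps directly.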

Step 3: Evaluate Distribution Similarity

Assess how well the synthetic data matches the real data distribution:

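import evaldistsim

# original_data: your real mammography data; synthetic_data comes from Step 2
evaluator = evaldistsim.EvalDistSim()
metrics = evaluator.compute_metrics(
    original=original_data,
    synthetic=synthetic_data,
    methods=['ks_test', 'wasserstein', 'fid']
)
# Report every requested metric, not only the KS test
for name in ['ks_test', 'wasserstein', 'fid']:
    print(f"{name}: {metrics[name]:.4f}")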
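To see what the first two metrics measure, here is a pure-Python sketch on one-dimensional samples, such as a per-image mean intensity (a hypothetical feature chosen for illustration; EvalDistSim's own feature extraction is not described here). It computes the two-sample Kolmogorov-Smirnov statistic (the largest vertical gap between the two empirical CDFs) and the 1-D Wasserstein-1 distance (the area between them). Whether EvalDistSim reports the KS statistic or its p-value is not stated in this tutorial, so treat the correspondence as an assumption.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x.
    The ECDF difference only changes at data points, so checking them suffices."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    for x in a + b:
        d = max(d, abs(bisect_right(a, x) / n - bisect_right(b, x) / m))
    return d


def wasserstein_1d(a, b):
    """1-D Wasserstein-1 distance: integral of |F_a(x) - F_b(x)| dx,
    summed exactly over the intervals between consecutive pooled points."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    points = sorted(a + b)
    total = 0.0
    for left, right in zip(points, points[1:]):
        gap = abs(bisect_right(a, left) / n - bisect_right(b, left) / m)
        total += gap * (right - left)
    return total


if __name__ == "__main__":
    real = [0.42, 0.51, 0.47, 0.55, 0.49]
    synth = [0.45, 0.50, 0.58, 0.44, 0.53]
    print(ks_statistic(real, synth), wasserstein_1d(real, synth))
```

Both functions should agree with SciPy's `scipy.stats.ks_2samp(a, b).statistic` and `scipy.stats.wasserstein_distance(a, b)`, which are a convenient cross-check. FID works differently: it is computed on Inception-network embeddings as ‖μ_r − μ_s‖² + Tr(Σ_r + Σ_s − 2(Σ_r Σ_s)^{1/2}), where μ and Σ are the feature means and covariances of the real and synthetic sets, so it needs a pretrained feature extractor and is not sketched here.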

Troubleshooting

  • Out of memory during generation: reduce n_samples, or use a smaller batch size via the batch_size parameter.
  • Poor FID score: increase diffusion_steps (try 200–500) or fine-tune on your dataset.
  • Checkpoint not found: download the pretrained weights from the DiffuAug releases page on GitHub.
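For the out-of-memory case, one manual alternative to a library-side batch size is to split the request into chunks yourself and concatenate the results, so peak memory scales with the chunk size rather than with n_samples. The sketch below is hypothetical: `generate_in_batches` and its `generate_fn(labels, count)` callback are illustrative names, and it assumes one conditioning label per requested sample.

```python
from typing import Callable, List, Sequence


def generate_in_batches(generate_fn: Callable[[Sequence, int], list],
                        labels: Sequence, n_samples: int, batch_size: int) -> List:
    """Call `generate_fn` on consecutive chunks of at most `batch_size`
    samples and concatenate the outputs in order.
    Assumes `labels` has one entry per sample (hypothetical convention)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    results = []
    for start in range(0, n_samples, batch_size):
        count = min(batch_size, n_samples - start)
        results.extend(generate_fn(labels[start:start + count], count))
    return results
```

A hypothetical use with the Step 2 call would wrap it as `lambda lab, n: augmenter.generate(condition=lab, n_samples=n, diffusion_steps=100)`, provided `generate` accepts per-sample conditions and returns something iterable.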

Citation

If you use DiffuAug or EvalDistSim in your research, please cite:

@software{diffuaug,
  title   = {DiffuAug: Diffusion-based Data Augmentation},
  author  = {{SSTDV Project}},
  year    = {2024},
  url     = {https://github.com/SSTDV-Project/DiffuAug}
}
