Research

Generation & Evaluation

Generate high-quality synthetic data from real-world spatio-temporal data, and verify its quality with statistical-similarity and objective-relevance metrics.

01 - Generation

Generation

Generate high-quality synthetic spatio-temporal data with domain knowledge encoding.

Knowledge Encoding

  • Knowledge encoding implements computable data structures to represent domain prior knowledge, including structure, dynamics, and physical constraints, of entities in spatio-temporal data.
  • Generative models are built to explicitly leverage this encoded knowledge, enabling principled and controllable synthetic data generation.
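As a rough illustration of what a "computable data structure" for prior knowledge could look like, the sketch below bundles structure, dynamics, and physical constraints into one object and uses the constraints to check and correct a generated sample. The `EncodedKnowledge` class, its field names, and the vertebra values are all hypothetical; they are not taken from BrainShapeToolKit, DiffAM, VCM, or HF-GAN.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


@dataclass
class EncodedKnowledge:
    """Hypothetical container for domain priors about one entity."""
    structure: Dict[str, List[str]]               # which variables are linked
    dynamics: Callable[[State], State]            # how a state evolves one step
    constraints: Dict[str, Tuple[float, float]]   # physical (min, max) bounds

    def violations(self, state: State) -> List[str]:
        """Return the variables whose values break a physical bound."""
        return [
            name for name, (lo, hi) in self.constraints.items()
            if not lo <= state[name] <= hi
        ]

    def project(self, state: State) -> State:
        """Clip a generated state back into the physically valid range."""
        out = dict(state)
        for name, (lo, hi) in self.constraints.items():
            out[name] = min(max(out[name], lo), hi)
        return out


# Toy values only (assumed for illustration, not anatomical reference ranges).
knowledge = EncodedKnowledge(
    structure={"height_mm": ["width_mm"], "width_mm": ["height_mm"]},
    dynamics=lambda s: {**s, "height_mm": s["height_mm"] * 0.99},
    constraints={"height_mm": (5.0, 40.0), "width_mm": (10.0, 60.0)},
)

sample = {"height_mm": 45.0, "width_mm": 30.0}   # stand-in for a raw generator output
bad = knowledge.violations(sample)               # variables outside their bounds
fixed = knowledge.project(sample)                # sample forced back into bounds
next_state = knowledge.dynamics(fixed)           # one step of the encoded dynamics
print(bad, fixed, next_state)
```

A generative model could consult such an object in several ways, for example by rejecting samples with violations or by conditioning on the encoded structure; which strategy the tools listed here use is not stated in this page.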

Software
  • BrainShapeToolKit: Brain MRI-based shape extraction
  • DiffAM: Spine CT-based anatomical modeling
  • VCM: Conditional brain MRI image generation
  • HF-GAN: High-fidelity unified brain MRI synthesis

Open Data
Note: To request the full synthetic dataset, please contact eunhye0323@kaist.ac.kr.
[Figures: Versatile VCM; VCM Details]

Relation Modeling

  • Relation modeling aims to identify how ocean variables are connected across time, space, and resolution. In SST analysis, this helps explain how past temperature changes, extreme high- and low-temperature events, and coarse-resolution ocean fields relate to future or high-resolution SST patterns.
  • To model these relationships, we use EVL-based SST forecasting to capture temporal extreme events and GAN-based SST downscaling to learn spatial relationships between coarse- and high-resolution ocean fields.
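Before a forecasting model can focus on extreme high- and low-temperature events, those events have to be identified in the SST record. The sketch below shows one common way to do that, using percentile thresholds on a time series. The function name `label_extremes`, the 10th/90th percentile cut-offs, and the toy SST values are assumptions for illustration; they are not taken from KIOST-SST-EVL.

```python
from statistics import quantiles


def label_extremes(series, low_pct=10, high_pct=90):
    """Return (labels, low, high); labels are -1 (cold), 0 (normal), +1 (warm)."""
    # 99 cut points; cuts[p - 1] is the p-th percentile of the series.
    cuts = quantiles(series, n=100, method="inclusive")
    low, high = cuts[low_pct - 1], cuts[high_pct - 1]
    labels = [-1 if v < low else 1 if v > high else 0 for v in series]
    return labels, low, high


# Toy SST values in degrees Celsius (illustrative, not KIOST data).
sst = [14.0, 14.2, 13.9, 14.1, 18.5, 14.0, 13.8, 9.5, 14.3, 14.1, 14.2, 14.0]
labels, low, high = label_extremes(sst)
print(labels)
print(low, high)
```

Labels like these are typically rare compared with the "normal" class, which is why extreme-event forecasting usually needs a loss or sampling scheme that pays extra attention to them.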

Software
  • KIOST-SST-Downscaling: GAN-based SST downscaling for spatial relation modeling
  • KIOST-SST-EVL: EVL-based SST forecasting for temporal extreme event modeling

Open Data
  • KIOST-SST-EVL/data
  • KIOST-SST-Downscaling/data/OSTIA/peninsula
Note: Only a partial dataset is available on GitHub due to file size limitations. To request the full synthetic dataset, please contact tkkim@kiost.ac.kr.

Data Assimilation

  • Data assimilation combines numerical model outputs and real observations to estimate a more reliable ocean state. It is especially useful when observations are sparse, noisy, or incomplete, because physical constraints can guide the reconstruction of missing or uncertain variables.
  • We combine 3D-Var (three-dimensional variational) data assimilation with deep learning to generate spatio-temporal ocean data that reflects both observational evidence and physical constraints.
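The 3D-Var part of this combination can be illustrated on a toy problem. Standard 3D-Var seeks the state x that minimizes J(x) = 1/2 (x - x_b)^T B^-1 (x - x_b) + 1/2 (y - Hx)^T R^-1 (y - Hx), where x_b is the model background, B its error covariance, y the observations, H the observation operator, and R the observation error covariance. For a single observation at one grid point the minimizer has a closed form, which the sketch below implements. The three-point grid, the covariance values, and the function name are assumptions for illustration; the deep learning component of KIOST-3D-VAR-DA is not represented here.

```python
def analysis_single_obs(x_b, B, k, y, r):
    """3D-Var analysis for one observation y of grid point k.

    Closed-form minimizer of
        J(x) = 1/2 (x - x_b)^T B^-1 (x - x_b) + (y - x[k])^2 / (2 r)
    which is x_a = x_b + B[:, k] * (y - x_b[k]) / (B[k][k] + r).
    """
    innovation = y - x_b[k]            # observation minus background
    denom = B[k][k] + r                # background plus observation error variance
    return [x_b[i] + B[i][k] / denom * innovation for i in range(len(x_b))]


# Toy setup (assumed values): three neighbouring grid points, SST in deg C.
x_b = [15.0, 15.5, 16.0]               # model background
B = [                                   # background error covariance:
    [1.00, 0.50, 0.25],                 # errors at nearby points are correlated
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
]
k, y, r = 1, 16.5, 1.0                  # one observation at the middle point

x_a = analysis_single_obs(x_b, B, k, y, r)
print(x_a)  # [15.25, 16.0, 16.25]
```

The observation at the middle point also shifts its unobserved neighbours, because the off-diagonal entries of B spread the correction. This is how physically informed covariances help where observations are sparse.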

Software
  • KIOST-3D-VAR-DA: 3D-Var data assimilation combined with deep learning for spatio-temporal ocean data generation

Open Data
  • KIOST-3D-VAR-DA/data
Note: Only a partial dataset is available on GitHub due to file size limitations. To request the full synthetic dataset, please contact tkkim@kiost.ac.kr.

Objective Synthesis

  • Predict the post-operative breast MRI from the patient's pre-operative MRI and planned surgery type.
  • The model learns real surgical changes from paired pre- and post-operative studies, so the predicted outcome can inform surgical planning and patient discussion.

02 - Evaluation

Evaluation

Quantitatively verify the quality of generated synthetic data through statistical similarity and objective relevance.

Statistical Similarity

Quantitatively evaluate distributional similarity and inter-variable correlations between synthetic and real data.

Software
  • EvalCorrSyn: Evaluation of multivariate distribution similarity in synthetic data
  • EvalSynLongD: Statistical similarity of real and synthetic longitudinal data (linear mixed-effects model, Cohen's d)
  • EvalDistSim: 1D data distribution similarity via KS test, KL divergence, and Jensen-Shannon divergence
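The three 1D measures named for EvalDistSim can be sketched with the standard library alone. The code below is an illustrative re-implementation under stated assumptions, not the EvalDistSim code: the function names, the 10-bin shared histogram, the `eps` smoothing, and the Gaussian toy samples are all choices made for this example. It computes the two-sample KS statistic only; the p-value of the KS test is omitted.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, v) / len(sa) - bisect_right(sb, v) / len(sb))
        for v in sa + sb
    )


def shared_histograms(real, synth, bins=10, eps=1e-9):
    """Bin both samples on the same grid; eps keeps every bin non-zero."""
    pooled = list(real) + list(synth)
    lo, hi = min(pooled), max(pooled)
    width = (hi - lo) / bins or 1.0     # guard against all-identical values

    def hist(values):
        counts = [eps] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    return hist(real), hist(synth)


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy data: one synthetic sample matches the real distribution, one is shifted.
rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(500)]
synth_good = [rng.gauss(0.0, 1.0) for _ in range(500)]
synth_bad = [rng.gauss(1.0, 1.0) for _ in range(500)]

for name, synth in [("good", synth_good), ("bad", synth_bad)]:
    p, q = shared_histograms(real, synth)
    print(name, ks_statistic(real, synth), kl_divergence(p, q), js_divergence(p, q))
```

All three values should be close to zero for the matching sample and clearly larger for the shifted one. KL divergence is asymmetric and sensitive to empty bins, which is why the histogram is smoothed and why the symmetric, bounded JS divergence is often reported alongside it.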

Objective Relevance

Synthetic brain imaging data are generated to address limitations of real neuroimaging data: data scarcity, privacy concerns, and disease-group imbalance. The CNN-based evaluation assesses whether such synthetic data have sufficient biological, anatomical, and clinical validity to be used in real research settings and AI model training.

Specifically, this evaluation aims to determine:

  • Whether synthetic images are visually and structurally similar to real brain images.
  • Whether CNN-extracted features are comparable between real and synthetic data.
  • Whether synthetic data preserve disease-related differences between patient and control groups across disease progression.
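One simple way to check the last point is to compare the patient-versus-control effect size in real data with the same effect size in synthetic data. The sketch below does this with Cohen's d for a single scalar feature. The feature values are made-up toy numbers, and the choice of Cohen's d for this particular check is an assumption for illustration rather than a description of the evaluation pipeline.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (sample variances)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Toy values of one feature per subject (hypothetical, not study data).
real_patient = [2.0, 2.6, 2.3, 1.9, 2.7]
real_control = [2.5, 3.1, 2.8, 2.4, 3.2]
synth_patient = [2.1, 2.5, 2.4, 2.0, 2.6]
synth_control = [2.6, 3.0, 2.9, 2.5, 3.1]

d_real = cohens_d(real_patient, real_control)
d_synth = cohens_d(synth_patient, synth_control)

# Same sign means the group difference points the same way in both data sets;
# the gap shows how much the synthetic data shrink or exaggerate the effect.
print(d_real, d_synth, abs(d_real - d_synth))
```

Repeating the comparison at each disease stage would show whether the synthetic data track the change in group differences across progression, not only their direction at one time point.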

Turing Test

  • Test whether radiologists can distinguish synthetic from real post-operative breast MRI in blinded review.
  • Expert judgment serves as a perceptual quality check alongside statistical-similarity and objective-relevance metrics.
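A blinded review of this kind is often summarized by asking whether reader accuracy differs from the 50% expected under pure guessing. The sketch below computes an exact two-sided binomial p-value for that question. The binomial test, the function name, and the counts are assumptions for illustration; they are not results or methods reported by this project.

```python
from math import comb


def binomial_two_sided_p(correct, total, chance=0.5):
    """Exact two-sided binomial p-value against the chance level.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null hypothesis of guessing.
    """
    pmf = [
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(total + 1)
    ]
    observed = pmf[correct]
    # The small tolerance absorbs floating-point ties between symmetric outcomes.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


# Illustrative counts only: correct real-vs-synthetic calls out of 50 images.
p_near_chance = binomial_two_sided_p(27, 50)   # reader barely above guessing
p_obvious = binomial_two_sided_p(45, 50)       # reader spots most synthetic images
print(p_near_chance, p_obvious)
```

A large p-value means the readers' accuracy is statistically indistinguishable from guessing, which is the desired outcome for realistic synthetic images. A very small p-value means the synthetic images are reliably detectable.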
