Research
Generation & Evaluation
Generate high-quality synthetic data from real-world spatio-temporal data, and verify its quality through statistical-similarity and objective-relevance metrics.
01 - Generation
Generation
Generate high-quality synthetic spatio-temporal data with domain knowledge encoding.
Knowledge Encoding
- Knowledge encoding represents prior domain knowledge about entities in spatio-temporal data, including their structure, dynamics, and physical constraints, as computable data structures.
- Generative models are built to explicitly leverage this encoded knowledge, enabling principled and controllable synthetic data generation.
Software
BrainShapeToolKit: Brain MRI-based shape extraction
DiffAM: Spine CT-based anatomical modeling
VCM: Conditional brain MRI image generation
HF-GAN: High-fidelity unified brain MRI synthesis
Open Data
Note: To request the full synthetic dataset, please contact eunhye0323@kaist.ac.kr.


Relation Modeling
- Relation modeling aims to identify how ocean variables are connected across time, space, and resolution. In SST analysis, this helps explain how past temperature changes, extreme high- and low-temperature events, and coarse-resolution ocean fields relate to future or high-resolution SST patterns.
- To model these relationships, we use EVL-based SST forecasting to capture temporal extreme events and GAN-based SST downscaling to learn spatial relationships between coarse- and high-resolution ocean fields.
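One simple way to make "extreme high- and low-temperature events" concrete is to label time steps that fall outside quantile thresholds of an SST series. The sketch below illustrates only that labeling idea. It is not the EVL method or the KIOST-SST-EVL code, and the function name, the 10%/90% thresholds, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np

def label_extremes(sst, low_q=0.10, high_q=0.90):
    """Label each time step as -1 (extreme low), 0 (normal), or +1 (extreme high).

    The quantile thresholds are illustrative assumptions, not project values.
    """
    sst = np.asarray(sst, dtype=float)
    low_thr, high_thr = np.quantile(sst, [low_q, high_q])
    labels = np.zeros(sst.shape, dtype=int)
    labels[sst < low_thr] = -1   # colder than the lower threshold
    labels[sst > high_thr] = 1   # warmer than the upper threshold
    return labels, low_thr, high_thr

# Hypothetical SST series (deg C) with one injected warm and one cold spike.
rng = np.random.default_rng(0)
sst = 15.0 + rng.normal(0.0, 0.3, size=200)
sst[50] += 5.0    # injected extreme high
sst[120] -= 5.0   # injected extreme low

labels, low_thr, high_thr = label_extremes(sst)
```

Labels of this kind could serve as auxiliary targets or as weights that make a forecasting loss pay more attention to rare events; whether the project does so is not stated here.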
Software
KIOST-SST-Downscaling: GAN-based SST downscaling for spatial relation modeling
KIOST-SST-EVL: EVL-based SST forecasting for temporal extreme event modeling
Open Data
KIOST-SST-EVL/data · KIOST-SST-Downscaling/data/OSTIA/peninsula
Note: Only a partial dataset is available on GitHub due to file size limitations. To request the full synthetic dataset, please contact tkkim@kiost.ac.kr.

Data Assimilation
- Data assimilation combines numerical model outputs and real observations to estimate a more reliable ocean state. It is especially useful when observations are sparse, noisy, or incomplete, because physical constraints can guide the reconstruction of missing or uncertain variables.
- To address this, we combine 3D-Var data assimilation with deep learning to generate spatiotemporal ocean data that reflects both observational evidence and physically informed constraints.
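For readers unfamiliar with 3D-Var, the sketch below shows the textbook analysis step for a linear observation operator: it minimizes J(x) = 1/2 (x - x_b)^T B^-1 (x - x_b) + 1/2 (y - Hx)^T R^-1 (y - Hx), whose minimizer is x_a = x_b + B H^T (H B H^T + R)^-1 (y - H x_b). It covers only the classical assimilation part, not the deep learning component or the KIOST-3D-VAR-DA code. The three-point grid, covariances, and observation values are hypothetical.

```python
import numpy as np

def three_d_var_analysis(x_b, B, y, H, R):
    """Closed-form 3D-Var analysis for a linear observation operator H."""
    innovation = y - H @ x_b            # observation minus background
    S = H @ B @ H.T + R                 # innovation covariance
    return x_b + B @ H.T @ np.linalg.solve(S, innovation)

def cost(x, x_b, B, y, H, R):
    """3D-Var cost: background misfit plus observation misfit."""
    db = x - x_b
    do = y - H @ x
    return 0.5 * db @ np.linalg.solve(B, db) + 0.5 * do @ np.linalg.solve(R, do)

# Hypothetical setup: three grid points, observations at the first and third.
x_b = np.array([15.0, 16.0, 17.0])                 # background SST (deg C)
idx = np.arange(3)
B = 0.5 * np.exp(-np.abs(idx[:, None] - idx[None, :]) / 1.0)  # correlated background errors
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])                    # picks grid points 0 and 2
R = 0.1 * np.eye(2)                                # observation error covariance
y = np.array([15.8, 16.4])                         # sparse observations

x_a = three_d_var_analysis(x_b, B, y, H, R)
```

Because B carries spatial correlations, the unobserved middle grid point is also adjusted, which is how the background covariance spreads sparse observations across the field.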
Software
KIOST-3D-VAR-DA: 3D-Var data assimilation combined with deep learning for spatiotemporal ocean data generation
Open Data
KIOST-3D-VAR-DA/data
Note: Only a partial dataset is available on GitHub due to file size limitations. To request the full synthetic dataset, please contact tkkim@kiost.ac.kr.

Objective Synthesis
- Predict the post-operative breast MRI from the patient's pre-operative MRI and planned surgery type.
- The model learns real surgical change from paired studies so the expected outcome can inform surgical planning and patient discussion.

02 - Evaluation
Evaluation
Quantitatively verify the quality of generated synthetic data through statistical similarity and objective relevance.
Statistical Similarity
Quantitatively evaluate distributional similarity and inter-variable correlations between synthetic and real data.
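As an illustration of comparing inter-variable correlations, the sketch below computes the mean absolute difference between the off-diagonal entries of the real and synthetic correlation matrices. This is one simple choice of summary, not necessarily the metric implemented in EvalCorrSyn. The function name and the simulated datasets are hypothetical.

```python
import numpy as np

def correlation_gap(real, synth):
    """Mean absolute difference between pairwise Pearson correlations.

    Rows are samples and columns are variables. A value of 0 means the two
    datasets have identical correlation structure.
    """
    c_real = np.corrcoef(real, rowvar=False)
    c_synth = np.corrcoef(synth, rowvar=False)
    upper = np.triu_indices_from(c_real, k=1)   # off-diagonal pairs only
    return float(np.mean(np.abs(c_real[upper] - c_synth[upper])))

# Hypothetical data: two variables with correlation 0.8 in the "real" set.
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
real = rng.multivariate_normal([0.0, 0.0], cov, size=2000)
synth_good = rng.multivariate_normal([0.0, 0.0], cov, size=2000)  # same structure
synth_bad = rng.normal(size=(2000, 2))                            # independent columns

gap_good = correlation_gap(real, synth_good)
gap_bad = correlation_gap(real, synth_bad)
```

A synthetic set that matches each marginal distribution can still score poorly here, which is why correlation checks complement per-variable tests.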
Software
EvalCorrSyn: Evaluation of multivariate distribution similarity in synthetic data
EvalSynLongD: Statistical similarity of real and synthetic longitudinal data (linear mixed-effects model, Cohen's d)
EvalDistSim: 1D data distribution similarity via KS test, KL divergence, and Jensen-Shannon divergence
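The three 1D measures named for EvalDistSim can be computed with SciPy as sketched below. This is a minimal illustration, not the EvalDistSim implementation. The shared histogram binning, the pseudo-count used to avoid empty bins, and the simulated samples are assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp, entropy
from scipy.spatial.distance import jensenshannon

def distribution_similarity(real, synth, bins=30, pseudo=1e-9):
    """Compare two 1D samples with the KS test, KL divergence, and JS divergence."""
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)

    # The KS test works directly on the samples (no binning needed).
    ks = ks_2samp(real, synth)

    # KL and JS need discrete distributions, so bin both samples on shared edges.
    edges = np.histogram_bin_edges(np.concatenate([real, synth]), bins=bins)
    p = np.histogram(real, bins=edges)[0] + pseudo    # pseudo-count avoids log(0)
    q = np.histogram(synth, bins=edges)[0] + pseudo
    p, q = p / p.sum(), q / q.sum()

    return {
        "ks_stat": float(ks.statistic),
        "ks_pvalue": float(ks.pvalue),
        "kl": float(entropy(p, q)),                   # KL(real || synth), in nats
        "js": float(jensenshannon(p, q, base=2) ** 2) # JS divergence, in [0, 1]
    }

# Hypothetical samples: one synthetic set matches the real one, one is shifted.
rng = np.random.default_rng(2)
real = rng.normal(0.0, 1.0, size=1000)
close = distribution_similarity(real, rng.normal(0.0, 1.0, size=1000))
shifted = distribution_similarity(real, rng.normal(1.5, 1.0, size=1000))
```

Note that `jensenshannon` returns the Jensen-Shannon distance, so it is squared to obtain the divergence. The KL value depends on the bin count and the pseudo-count, so it is best compared only across runs that use the same settings.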
Objective Relevance
Synthetic brain imaging data are generated to address limitations of real neuroimaging data, such as data scarcity, privacy concerns, and disease-group imbalance. The CNN-based evaluation assesses whether these synthetic data have sufficient biological, anatomical, and clinical validity to be used in real research settings and AI model training.
Specifically, this evaluation aims to determine:
- Whether synthetic images are visually and structurally similar to real brain images.
- Whether CNN-extracted features are comparable between real and synthetic data.
- Whether synthetic data preserve disease-related differences between patient and control groups across disease progression.

Turing Test
- Test whether radiologists can distinguish synthetic from real post-operative breast MRI in blinded review.
- Expert judgment serves as a perceptual quality check alongside statistical-similarity and objective-relevance metrics.
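One way to summarize a blinded review is to test whether a reader's real-versus-synthetic accuracy differs from chance (50%). The sketch below uses an exact two-sided binomial test written with the standard library only. The reader counts are hypothetical and are not results from this study, and the choice of test is an assumption rather than the project's stated analysis.

```python
from math import comb

def binomial_two_sided_p(k, n, p0=0.5):
    """Exact two-sided binomial p-value for k correct calls out of n.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null accuracy p0.
    """
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical review: 100 images, each judged as real or synthetic.
p_near_chance = binomial_two_sided_p(54, 100)   # reader correct on 54 images
p_clear_signal = binomial_two_sided_p(80, 100)  # reader correct on 80 images
```

A large p-value means the reader's accuracy is statistically indistinguishable from guessing, which is the desired outcome for realistic synthetic images. A small p-value indicates the synthetic images carry detectable cues.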
