We start by motivating the gap between pre-training and post-training, drawing on seminal work such as DDPM [1] and Latent Diffusion Models [2]. We then cover three broad themes — Alignment, Enhancement, and Control — each complemented by implementation references and noteworthy literature for further study.
A brief discussion of common pre-training methodologies for text-to-image diffusion models, and the practical motivation for post-training large pre-trained models. Sets up the three themes that organize the rest of the tutorial.
What alignment means within the diffusion paradigm, and how it is implemented in contemporary base models. Three core dimensions:
Improving capabilities beyond the original training scope, plus distillation to reduce denoising steps. Covers both inference-only and training-based techniques:
Structural control beyond natural language, plus emerging in-context-learning approaches for image generation:
A half-day event held in the second half of the day on September 8, 2026. Indicative breakdown:
| Part | Topic | Duration |
|---|---|---|
| Part 1 | Pre-training vs. post-training — motivation and overview | 45 min |
| Part 2 | Alignment — definitions, safety, fairness, RLHF | 45 min |
| Break | Coffee & networking | 15 min |
| Part 3 | Enhancement — inference-time and training-based techniques | 45 min |
| Part 4 | Control — structural control and in-context learning | 45 min |
| Q&A | Open discussion on the boundaries of post-training | 15 min |