Spring 2026 · KAIST · Co-taught with Prof. Homin Kim
This course explores the intersection of artificial intelligence and protein science. The AI section covers modern deep learning approaches, from foundational concepts to state-of-the-art models including AlphaFold, RFdiffusion, and ProteinMPNN. All lecture notes are written in a textbook narrative style, accessible to students new to machine learning.
Prerequisites: Python programming, linear algebra, basic probability. No prior deep learning experience required.
Assessment: Midterm project (50%), Final project (50%)
Preliminary Notes
Self-study material for students with a biology background to prepare for in-class lectures.
- Python & Data Basics for Protein AI — NumPy arrays, Pandas DataFrames, and protein file formats—the essential toolkit for turning biological data into ML-ready inputs.
- Protein Representations for Machine Learning — From one-hot encodings to graph neural network inputs—how to represent protein sequences and structures numerically for deep learning.
- Introduction to AI and Deep Learning — From tensors and automatic differentiation to training neural networks—a self-contained guide to deep learning fundamentals for protein science.
- Training & Optimizing Neural Networks for Proteins — Regularization, learning rate schedules, debugging training failures, and protein-specific challenges like sequence-identity splits and variable-length inputs.
Lectures
In-class lectures covering advanced architectures and landmark protein AI models.
- Transformers & Graph Neural Networks for Proteins — Mar 16. Attention mechanisms for protein sequences and message-passing networks for protein structures—the two architectural pillars of modern protein AI.
- Generative Models: VAEs and Diffusion for Proteins — Mar 23. Variational autoencoders and denoising diffusion models—two frameworks for generating novel proteins, from the ELBO derivation to the denoising score-matching objective.
- Protein Language Models — Mar 25. How masked language modeling on millions of protein sequences learns evolutionary, structural, and functional information—from ESM-2 embeddings to LoRA fine-tuning.
- AlphaFold: Protein Structure Prediction — Mar 30. A deep dive into AlphaFold2's architecture—from MSA processing and the Evoformer's triangle updates to invariant point attention and the FAPE loss.
- RFdiffusion: De Novo Protein Structure Generation — Apr 01. How diffusion models on SE(3) frames enable the computational design of novel protein backbones, from the mathematics of rotational noise to conditional scaffold generation.
- ProteinMPNN: Inverse Folding and Sequence Design — Apr 06. How message-passing neural networks solve the inverse folding problem—designing amino acid sequences that fold into target protein structures.
Key References
- Jumper et al. (2021). “Highly accurate protein structure prediction with AlphaFold.” Nature.
- Watson et al. (2023). “De novo design of protein structure and function with RFdiffusion.” Nature.
- Dauparas et al. (2022). “Robust deep learning-based protein sequence design using ProteinMPNN.” Science.
- Lin et al. (2023). “Evolutionary-scale prediction of atomic-level protein structure with a language model.” Science.