Spring 2026 · KAIST · Co-taught with Prof. Homin Kim

This course explores the intersection of artificial intelligence and protein science. The AI section covers modern deep learning approaches — from foundational concepts to state-of-the-art models including AlphaFold, RFdiffusion, and ProteinMPNN. All lecture notes are written in a textbook narrative style, accessible to students new to machine learning.

Prerequisites: Python programming, linear algebra, basic probability. No prior deep learning experience required.

Assessment: Midterm project (50%), Final project (50%)

Preliminary Notes

Self-study material for students with a biology background to prepare for in-class lectures.

  1. Python & Data Basics for Protein AI — NumPy arrays, Pandas DataFrames, and protein file formats—the essential toolkit for turning biological data into ML-ready inputs.
  2. Protein Representations for Machine Learning — From one-hot encodings to graph neural network inputs—how to represent protein sequences and structures numerically for deep learning (a short encoding sketch follows this list).
  3. Introduction to AI and Deep Learning — From tensors and automatic differentiation to training neural networks—a self-contained guide to deep learning fundamentals for protein science.
  4. Training & Optimizing Neural Networks for Proteins — Regularization, learning rate schedules, debugging training failures, and protein-specific challenges like sequence-identity splits and variable-length inputs.
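
As a small taste of notes 1 and 2, the sketch below one-hot encodes a protein sequence with NumPy. It is a minimal illustration only; the fixed 20-letter alphabet ordering and the example fragment are our own choices, not taken from the notes.

    import numpy as np

    # The 20 canonical amino acids in a fixed (but arbitrary) order.
    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
    AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

    def one_hot_encode(sequence: str) -> np.ndarray:
        """Encode a protein sequence as an (L, 20) one-hot matrix."""
        encoding = np.zeros((len(sequence), len(AMINO_ACIDS)), dtype=np.float32)
        for position, aa in enumerate(sequence):
            encoding[position, AA_TO_INDEX[aa]] = 1.0
        return encoding

    # Example: an 8-residue fragment; each row contains a single 1 marking the residue type.
    x = one_hot_encode("MKTAYIAK")
    print(x.shape)  # (8, 20)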

Lectures

In-class lectures covering advanced architectures and landmark protein AI models.

  1. Transformers & Graph Neural Networks for Proteins — Mar 16. Attention mechanisms for protein sequences and message-passing networks for protein structures—the two architectural pillars of modern protein AI (a minimal attention sketch follows this list).
  2. Generative Models: VAEs and Diffusion for Proteins — Mar 23. Variational autoencoders and denoising diffusion models—two frameworks for generating novel proteins, from the ELBO derivation to the denoising score-matching objective.
  3. Protein Language Models — Mar 25. How masked language modeling on millions of protein sequences learns evolutionary, structural, and functional information—from ESM-2 embeddings to LoRA fine-tuning.
  4. AlphaFold: Protein Structure Prediction — Mar 30. A deep dive into AlphaFold2's architecture—from MSA processing and the Evoformer's triangle updates to invariant point attention and the FAPE loss.
  5. RFdiffusion: De Novo Protein Structure Generation — Apr 01. How diffusion models on SE(3) frames enable the computational design of novel protein backbones, from the mathematics of rotational noise to conditional scaffold generation.
  6. ProteinMPNN: Inverse Folding and Sequence Design — Apr 06. How message-passing neural networks solve the inverse folding problem—designing amino acid sequences that fold into target protein structures.
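
Before Lecture 1, it may help to see the core Transformer operation written out once. The sketch below is a plain-PyTorch implementation of scaled dot-product attention; the tensor shapes and the toy input are illustrative assumptions, not code from the lecture itself.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

        Q, K, V: (batch, length, d_k) tensors. Each output position is a
        weighted average of the value vectors V.
        """
        d_k = Q.size(-1)
        scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # (batch, length, length)
        weights = F.softmax(scores, dim=-1)             # each row sums to 1
        return weights @ V

    # Toy self-attention over a "sequence" of 8 residues embedded in 16 dimensions.
    x = torch.randn(1, 8, 16)
    out = scaled_dot_product_attention(x, x, x)
    print(out.shape)  # torch.Size([1, 8, 16])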

Key References

  • Jumper et al. (2021). “Highly accurate protein structure prediction with AlphaFold.” Nature.
  • Watson et al. (2023). “De novo design of protein structure and function with RFdiffusion.” Nature.
  • Dauparas et al. (2022). “Robust deep learning-based protein sequence design using ProteinMPNN.” Science.
  • Lin et al. (2023). “Evolutionary-scale prediction of atomic-level protein structure with a language model.” Science.