Spring 2026 · KAIST · Co-taught with Prof. Homin Kim

This course explores the intersection of artificial intelligence and protein science. The AI section covers modern deep learning approaches — from foundational concepts to state-of-the-art models including AlphaFold, RFdiffusion, and ProteinMPNN. All lecture notes are written in a textbook narrative style, accessible to students new to machine learning.

Prerequisites: Python programming, linear algebra, basic probability. No prior deep learning experience required.

Assessment: Midterm project (50%), Final project (50%)

Preliminary Notes

Self-study material for students with a biology background to prepare for in-class lectures.

  1. Python & Data Basics for Protein AI — NumPy arrays, Pandas DataFrames, and protein file formats—the essential toolkit for turning biological data into ML-ready inputs.
  2. Protein Representations for Machine Learning — From one-hot encodings to graph neural network inputs—how to represent protein sequences and structures numerically for deep learning (a short encoding sketch follows this list).
  3. Introduction to AI and Deep Learning — From tensors and automatic differentiation to training neural networks—a self-contained guide to deep learning fundamentals for protein science.
  4. Training & Optimizing Neural Networks for Proteins — Regularization, learning rate schedules, debugging training failures, and protein-specific challenges like sequence-identity splits and variable-length inputs.
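
As a small taste of notes 1 and 2, the sketch below one-hot encodes a protein sequence with NumPy. It is a minimal illustration only; the fixed 20-letter alphabet ordering and the example fragment are our own choices, not taken from the notes.

    import numpy as np

    # The 20 canonical amino acids in a fixed (but arbitrary) order.
    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
    AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

    def one_hot_encode(sequence: str) -> np.ndarray:
        """Encode a protein sequence as an (L, 20) one-hot matrix."""
        encoding = np.zeros((len(sequence), len(AMINO_ACIDS)), dtype=np.float32)
        for position, aa in enumerate(sequence):
            encoding[position, AA_TO_INDEX[aa]] = 1.0
        return encoding

    # Example: an 8-residue fragment; each row contains a single 1 marking the residue type.
    x = one_hot_encode("MKTAYIAK")
    print(x.shape)  # (8, 20)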

Lectures

In-class lectures covering advanced architectures and landmark protein AI models.

  1. Transformers & Graph Neural Networks for Proteins — Mar 16. Attention mechanisms for protein sequences and message-passing networks for protein structures—the two architectural pillars of modern protein AI (a minimal attention sketch follows this list).
  2. Generative Models: VAEs and Diffusion for Proteins — Mar 23. Variational autoencoders and denoising diffusion models—two frameworks for generating novel proteins, from the ELBO derivation to the denoising score-matching objective.
  3. Protein Language Models — Mar 25. How masked language modeling on millions of protein sequences learns evolutionary, structural, and functional information—from ESM-2 embeddings to LoRA fine-tuning.
  4. AlphaFold: Protein Structure Prediction — Mar 30. A deep dive into AlphaFold2's architecture—from MSA processing and the Evoformer's triangle updates to invariant point attention and the FAPE loss.
  5. RFdiffusion: De Novo Protein Structure Generation — Apr 01. How diffusion models on SE(3) frames enable the computational design of novel protein backbones, from the mathematics of rotational noise to conditional scaffold generation.
  6. ProteinMPNN: Inverse Folding and Sequence Design — Apr 06. How message-passing neural networks solve the inverse folding problem—designing amino acid sequences that fold into target protein structures.
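
Before Lecture 1, it may help to see the core Transformer operation written out once. The sketch below is a plain-PyTorch implementation of scaled dot-product attention; the tensor shapes and the toy input are illustrative assumptions, not code from the lecture itself.

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

        Q, K, V: (batch, length, d_k) tensors. Each output position is a
        weighted average of the value vectors V.
        """
        d_k = Q.size(-1)
        scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # (batch, length, length)
        weights = F.softmax(scores, dim=-1)             # each row sums to 1
        return weights @ V

    # Toy self-attention over a "sequence" of 8 residues embedded in 16 dimensions.
    x = torch.randn(1, 8, 16)
    out = scaled_dot_product_attention(x, x, x)
    print(out.shape)  # torch.Size([1, 8, 16])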

Key References

  • Jumper et al. (2021). “Highly accurate protein structure prediction with AlphaFold.” Nature.
  • Watson et al. (2023). “De novo design of protein structure and function with RFdiffusion.” Nature.
  • Dauparas et al. (2022). “Robust deep learning-based protein sequence design using ProteinMPNN.” Science.
  • Lin et al. (2023). “Evolutionary-scale prediction of atomic-level protein structure with a language model.” Science.