/ module 04
Fine-tuning
Pre-training gives a model general language ability. Fine-tuning teaches it a specific behavior, such as answering in a particular tone, following a format, or handling your domain. It updates the weights using a much smaller dataset and a much smaller learning rate than pre-training.
methods
Three families of fine-tuning
Full fine-tuning, which updates every weight. Highest quality, highest cost.
LoRA / adapters, which freeze the base and train a small low-rank delta. Cheap, composable, a common default since 2024.
RLHF / DPO, which align the model to human preferences using ranked answers. RLHF is how ChatGPT became helpful.
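To make the "small low-rank delta" concrete, here is a minimal NumPy sketch of a single LoRA-style layer. The sizes (512 x 512, rank 8) and the scaling value alpha = 16 are illustrative assumptions, not values from this module, and no training loop is shown; the point is only how few numbers the adapter adds next to the frozen base weight.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the module).
d_in, d_out, rank = 512, 512, 8
alpha = 16  # LoRA scaling hyperparameter (assumed value)

W = rng.normal(size=(d_out, d_in))              # frozen base weight
A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable down-projection
B = np.zeros((d_out, rank))                     # trainable up-projection, zero init

def lora_forward(x):
    """Compute x W^T + (alpha / rank) * x A^T B^T.

    In real training only A and B would receive gradient updates;
    W stays fixed.
    """
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T

x = rng.normal(size=(4, d_in))
y = lora_forward(x)

full_params = W.size            # what full fine-tuning would update
lora_params = A.size + B.size   # what the adapter updates
print(f"full: {full_params}, lora: {lora_params}, "
      f"ratio: {lora_params / full_params:.4f}")
```

Because B starts at zero, the adapter initially leaves the base model's output unchanged, and the delta grows only as training moves A and B. Keeping the delta separate from W is also what makes adapters composable: the same frozen base can be paired with different (A, B) pairs.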
Try a tiny dataset with a high learning rate, then watch val loss diverge from train loss as the model overfits. Add dropout or more data to fix it.
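One way to spot that divergence programmatically is to look for validation loss rising while training loss keeps falling. The sketch below is a hypothetical helper, not part of the simulator; the function name, the `patience` parameter, and the loss values are all made up for illustration.

```python
def first_divergence_epoch(train_loss, val_loss, patience=2):
    """Return the index of the last epoch before val loss began rising
    for `patience` consecutive epochs while train loss kept falling.
    Returns None if no such run is found."""
    streak = 0
    for i in range(1, len(val_loss)):
        val_up = val_loss[i] > val_loss[i - 1]
        train_down = train_loss[i] < train_loss[i - 1]
        streak = streak + 1 if (val_up and train_down) else 0
        if streak >= patience:
            return i - patience
    return None

# Made-up curves for illustration only: train keeps improving,
# val bottoms out at epoch index 2 and then climbs.
train = [2.0, 1.4, 0.9, 0.5, 0.3, 0.2]
val = [2.1, 1.6, 1.3, 1.4, 1.6, 1.9]

best_epoch = first_divergence_epoch(train, val)
print(best_epoch)
```

The returned index is a reasonable checkpoint to keep, since later epochs only improve the fit to the training set.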
Now feel it run. This simulated A100 fine-tunes your chosen method epoch by epoch, streaming GPU utilization, VRAM, tokens/sec, and a live train/val loss curve. Fine-tuning typically uses far less VRAM than pre-training and finishes in minutes, not weeks.
Change Method, Dataset size, Dropout, or Epochs above and press Start to rebuild the run. Watch LoRA finish using a fraction of the VRAM that full fine-tuning needs.
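A rough back-of-the-envelope estimate shows where the VRAM gap comes from. The sketch assumes mixed-precision training with Adam, a common rule of thumb of about 16 bytes per trainable parameter (2 for fp16 weights, 2 for gradients, 12 for the fp32 master copy and two optimizer moments), and 2 bytes per frozen parameter. It ignores activations and framework overhead. The 7B model size and the 0.1% trainable fraction are hypothetical and are not the simulator's settings.

```python
BYTES_FP16_WEIGHT = 2
BYTES_FP16_GRAD = 2
BYTES_ADAM_STATE = 12  # fp32 master weights + first and second moments

def full_ft_bytes(n_params):
    """Every parameter is trainable: weights + grads + optimizer state."""
    per_param = BYTES_FP16_WEIGHT + BYTES_FP16_GRAD + BYTES_ADAM_STATE
    return n_params * per_param

def lora_bytes(n_params, trainable_fraction):
    """Frozen base kept in fp16, plus full training state for the
    adapter parameters only."""
    trainable = n_params * trainable_fraction
    per_trainable = BYTES_FP16_WEIGHT + BYTES_FP16_GRAD + BYTES_ADAM_STATE
    return n_params * BYTES_FP16_WEIGHT + trainable * per_trainable

n = 7e9  # hypothetical 7B-parameter model
full_gb = full_ft_bytes(n) / 1e9
lora_gb = lora_bytes(n, trainable_fraction=0.001) / 1e9

print(f"full fine-tuning: ~{full_gb:.0f} GB, LoRA: ~{lora_gb:.1f} GB")
```

Under these assumptions the frozen base dominates the LoRA footprint, and the optimizer state that makes full fine-tuning expensive almost disappears.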