/ module 04

Fine-tuning

Pre-training gives a model general language ability. Fine-tuning teaches it a specific behavior, such as answering in a particular tone, following a format, or handling your domain's vocabulary. It updates the weights using much smaller datasets and much lower learning rates than pre-training.

methods

Three families of fine-tuning

Full fine-tuning, which updates every weight. Highest quality, highest cost.

LoRA / adapters, which freeze the base and train a small low-rank delta. Cheap, composable, the default in 2024+.

RLHF / DPO, which align the model to human preferences using ranked answers. RLHF is how ChatGPT became helpful.
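To make the "small low-rank delta" concrete, here is a minimal NumPy sketch of a single LoRA-style layer. It is an illustration only: the names `lora_forward` and `trainable_fraction`, the scaling factor `alpha`, the rank `r = 8`, and the layer sizes are assumptions chosen for the example, not values from this module.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Frozen base projection plus a low-rank delta (illustrative sketch)."""
    r = A.shape[0]
    # W stays frozen; in real training only A (r x d_in) and B (d_out x r)
    # would receive gradient updates.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

def trainable_fraction(d_in, d_out, r):
    """Adapter parameters divided by the frozen matrix's parameters."""
    return r * (d_in + d_out) / (d_in * d_out)

rng = np.random.default_rng(0)
d_in, d_out, r = 512, 512, 8              # hypothetical layer sizes
W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))                  # zero init, so the delta starts at 0
x = rng.normal(size=(4, d_in))

y = lora_forward(x, W, A, B)
frac = trainable_fraction(4096, 4096, 8)  # share trained for a 4096 x 4096 layer
```

Because `B` starts at zero, the adapted layer initially reproduces the base layer exactly, and training only has to learn the difference. For a 4096 × 4096 matrix at rank 8, the adapter holds well under 1% as many parameters as the frozen matrix.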

learning rate
How aggressively weights change per step. Too high = chaos; too low = stuck.
overfitting
Train loss falls, val loss rises. The model memorized your data.
LoRA
Low-Rank Adaptation. Typically trains < 1% of the params.
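The learning-rate entry above can be seen on a toy problem. The sketch below runs plain gradient descent on the one-parameter loss w², which stands in for a real training loss. The function name `descend` and the three learning rates are assumptions picked to show each regime, not recommended settings.

```python
def descend(lr, steps=20, w=1.0):
    """Gradient descent on loss(w) = w**2, starting from w = 1.0."""
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w -= lr * grad      # one update step
    return w

too_high = descend(lr=1.5)      # each step overshoots and |w| doubles
too_low = descend(lr=0.001)     # w barely moves away from 1.0
reasonable = descend(lr=0.3)    # w shrinks quickly toward the minimum at 0
```

With these values, the high rate makes the weight grow without bound, the low rate leaves it close to its starting point after 20 steps, and the middle rate converges.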
Live lab · Fine-tuning simulator

Try a tiny dataset with a high learning rate, then watch val loss diverge. Add dropout or more data to fix it.
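One simple way to spot the pattern described here in logged losses is sketched below. The helper names `best_epoch` and `is_overfitting`, the `patience` window, and the sample loss values are all hypothetical. The simulator's own logic is not shown in this module.

```python
def best_epoch(val_losses):
    """1-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

def is_overfitting(train_losses, val_losses, patience=2):
    """True if val loss rose over the last `patience` epochs while train loss fell."""
    if len(train_losses) != len(val_losses) or len(val_losses) <= patience:
        return False
    recent = range(len(val_losses) - patience, len(val_losses))
    val_rising = all(val_losses[i] > val_losses[i - 1] for i in recent)
    train_falling = all(train_losses[i] < train_losses[i - 1] for i in recent)
    return val_rising and train_falling

# Made-up losses for illustration: train keeps falling, val turns upward.
train = [2.0, 1.5, 1.1, 0.8, 0.5, 0.3]
val = [2.1, 1.7, 1.5, 1.6, 1.8, 2.1]

overfit = is_overfitting(train, val)   # True for this sample
keep = best_epoch(val)                 # 3: the checkpoint worth keeping
```

In practice a check like this is often used for early stopping: training halts once validation loss has risen for a few epochs, and the checkpoint from the best epoch is kept.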

Live lab · Animated training run · A100 GPU

Now feel it run. This simulated A100 fine-tunes your chosen method epoch by epoch, streaming GPU utilization, VRAM, tokens/sec, and a live train/val loss curve. Compared with pre-training, fine-tuning typically uses far less VRAM and finishes in minutes, not weeks.

Default run: method LoRA · 7B · 50M params · dataset 500 · 10 steps/epoch
GPU: NVIDIA A100 · 80GB SXM · idle · util 0% · VRAM 0/80 GB · temp 34°C · power 35 W
Progress: epoch 1/20 · step 0/200 (0.0%) · tokens seen 0 · throughput 38.0k tok/s · elapsed 0.0s · ETA n/a

Change Method, Dataset size, Dropout, or Epochs above, then press Start to rebuild the run. Watch LoRA finish using a fraction of the VRAM that full fine-tuning needs.
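A rough back-of-the-envelope estimate shows why the VRAM gap is so large. The sketch below uses the 7B model size and 50M LoRA parameters from the lab readout. Everything else is an assumption: mixed-precision Adam at about 16 bytes per trained parameter (fp16 weight and gradient, plus fp32 master weight and two fp32 optimizer states), a frozen base stored in fp16 at 2 bytes per parameter, and no accounting for activations or framework overhead. The function names are hypothetical.

```python
GB = 1e9  # decimal gigabytes, for a rough estimate only

def full_finetune_gb(n_params, bytes_per_param=16):
    """Weights, gradients, and Adam states when every parameter is trained."""
    return n_params * bytes_per_param / GB

def lora_gb(n_base, n_trainable, base_bytes=2, trainable_bytes=16):
    """Frozen fp16 base plus full training state for the adapter only."""
    return (n_base * base_bytes + n_trainable * trainable_bytes) / GB

full = full_finetune_gb(7e9)    # about 112 GB under these assumptions
lora = lora_gb(7e9, 50e6)       # about 14.8 GB under these assumptions
```

Under these assumptions, full fine-tuning of a 7B model would not fit on a single 80 GB card without sharding or offloading, while the LoRA run fits with room to spare. Real numbers depend heavily on batch size, sequence length, and precision.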