/ module 04

Fine-tuning

Pre-training gives a model general language ability. Fine-tuning teaches it a specific behavior, such as answering in a particular tone, following a format, or handling your domain. It updates the same weights, but with a much smaller dataset and a much lower learning rate than pre-training.

methods

Three families of fine-tuning

Full fine-tuning updates every weight. Highest quality, highest cost.

LoRA / adapters freeze the base model and train a small low-rank delta on top of it. Cheap, composable, and a common default since 2024.

RLHF / DPO align the model to human preferences using ranked answers. RLHF is how ChatGPT was tuned to be helpful.
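To make "ranked answers" concrete, here is a minimal sketch of the DPO loss for a single preference pair. It assumes you already have the summed log-probabilities of the chosen and rejected answers under both the model being trained and a frozen reference model. The function name, argument names, and the `beta=0.1` default are illustrative choices, not taken from this module.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) answer pair.

    All inputs are sequence log-probabilities. `beta` (assumed value)
    controls how strongly the policy is pushed away from the reference.
    """
    # How much more the policy prefers the chosen answer than the reference does.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)) written as log(1 + exp(-beta * margin)).
    return math.log1p(math.exp(-beta * margin))

# Policy identical to the reference: no preference learned yet.
untrained = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy now favors the chosen answer and disfavors the rejected one.
improved = dpo_loss(-5.0, -15.0, -10.0, -10.0)
```

The loss shrinks as the policy raises the chosen answer's probability relative to the rejected one, which is the sense in which ranked pairs steer the model.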

learning rate

How far the weights move on each update step. Too high and the loss oscillates or diverges; too low and training barely progresses.
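A toy illustration of both failure modes, assuming plain gradient descent on the one-dimensional function f(w) = w². The function, starting point, and the three learning rates are made up for the example; real fine-tuning losses are far less well behaved.

```python
def gradient_descent(lr, steps=20, w0=5.0):
    """Minimize f(w) = w**2 (gradient 2*w) with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # one update step
    return w

too_low = gradient_descent(lr=0.001)    # ≈ 4.80, barely moved from 5.0
reasonable = gradient_descent(lr=0.1)   # ≈ 0.058, close to the minimum at 0
too_high = gradient_descent(lr=1.1)     # ≈ 192, each step overshoots further
```

Each step multiplies `w` by `(1 - 2 * lr)`, so any learning rate above 1.0 makes the magnitude grow instead of shrink on this function.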

overfitting

Training loss keeps falling while validation loss rises. The model is memorizing your training data instead of generalizing.
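One way to spot this pattern programmatically is sketched below. The helper names, the `patience` threshold, and the loss values are all invented for illustration; they are not output from the simulator.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

def is_overfitting(train_losses, val_losses, patience=3):
    """True if, over the last `patience` epoch-to-epoch changes,
    validation loss rose every time while training loss fell every time."""
    if len(val_losses) <= patience:
        return False
    recent_val = val_losses[-(patience + 1):]
    recent_train = train_losses[-(patience + 1):]
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    return val_rising and train_falling

# Made-up loss curves showing the classic shape.
train = [2.0, 1.5, 1.1, 0.8, 0.6, 0.4, 0.3]
val = [2.1, 1.7, 1.4, 1.3, 1.35, 1.5, 1.7]

stop_at = best_epoch(val)             # 4
flag = is_overfitting(train, val)     # True
```

In practice you would keep the checkpoint from `stop_at` and discard later epochs, which is the idea behind early stopping.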

LoRA

Low-Rank Adaptation. Typically trains under 1% of the parameters.
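The small percentage follows from the low-rank shape. Instead of updating a full `d_out × d_in` weight matrix, LoRA trains two thin matrices of shapes `d_out × r` and `r × d_in`. A quick count, assuming a single 4096 × 4096 projection and rank 8 (both values are illustrative, not from this module):

```python
def lora_param_fraction(d_out, d_in, rank):
    """Compare trainable parameters: full matrix vs. its LoRA factors."""
    full = d_out * d_in              # every entry of W is trainable
    lora = rank * (d_out + d_in)     # B is d_out x r, A is r x d_in
    return lora, full, lora / full

lora_params, full_params, fraction = lora_param_fraction(4096, 4096, rank=8)
# lora_params == 65_536, full_params == 16_777_216, fraction == 0.00390625
```

For that one matrix the adapter is about 0.4% of the original size. The overall fraction for a model depends on which layers get adapters and on the chosen rank.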

Live lab · Fine-tuning simulator

Method:

Learning rate: 0.050

Epochs: 20

Dataset size: 500

Dropout: 0.10

Try a tiny dataset with a high learning rate, then watch val loss diverge. Add dropout or more data to fix it.
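For readers unsure what the Dropout slider stands for, here is a rough sketch of inverted dropout on a list of activations. It is an assumption that the simulator models dropout this way; the function below is only a generic illustration using the lab's default rate of 0.10.

```python
import random

def dropout(activations, p=0.10, training=True, rng=random):
    """Inverted dropout: zero each activation with probability p during
    training and scale the survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(activations)  # no-op at inference time
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Seeded generator so the example is repeatable.
dropped = dropout([1.0] * 1000, p=0.10, rng=random.Random(0))
kept_count = sum(1 for v in dropped if v != 0.0)  # roughly 900 of 1000
```

Because a different random subset is silenced on every step, the model cannot rely on any single unit, which tends to reduce memorization on small datasets.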

Live lab · Animated training run · A100 GPU

Now feel it run. This simulated A100 fine-tunes your chosen method epoch by epoch, streaming GPU utilization, VRAM, tokens/sec, and a live train/val loss curve. Compared with pre-training, fine-tuning typically needs far less VRAM and finishes in minutes or hours, not weeks.

Method: LoRA · steps/epoch: 10 · dataset: 500 · speed: 70 ms

NVIDIA A100 · 80GB SXM

Status: idle · util 0% · VRAM 0/80 GB · temp 34°C · power 35 W

LoRA · 7B · 50M params

Epoch 1/20 · step 0/200 · 0.0%

Readouts: tokens seen, throughput (38.0k tok/s), elapsed (0.0s), eta (n/a)

[Chart: train/val loss by epoch]

train.log

Press Start to stream epoch logs…

Change Method, Dataset size, Dropout, or Epochs above and press Start to rebuild the run. Watch LoRA finish using a fraction of the VRAM that full fine-tuning needs.
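A back-of-envelope estimate shows why the VRAM gap is so large. The sketch below uses the simulator's 7B model and 50M LoRA parameters, and assumes mixed-precision training with Adam at roughly 16 bytes per trainable parameter (fp16 weight and gradient, plus fp32 master weight, momentum, and variance) and 2 bytes per frozen fp16 parameter. Activation memory is ignored, so real usage would be higher. The function name and defaults are illustrative.

```python
GB = 1e9  # decimal gigabytes

def training_state_gb(trainable_params, frozen_params=0,
                      bytes_per_trainable=16, bytes_per_frozen=2):
    """Rough memory for weights, gradients, and optimizer state.

    Assumed costs: 16 bytes per trainable parameter (mixed-precision Adam),
    2 bytes per frozen fp16 parameter. Activations are not counted.
    """
    total_bytes = (trainable_params * bytes_per_trainable
                   + frozen_params * bytes_per_frozen)
    return total_bytes / GB

full_ft_gb = training_state_gb(trainable_params=7e9)                   # about 112 GB
lora_gb = training_state_gb(trainable_params=50e6, frozen_params=7e9)  # about 14.8 GB
```

Under these assumptions, full fine-tuning of a 7B model would not fit on a single 80 GB A100 without sharding or offloading, while the LoRA run fits with room to spare.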