/ module 01

Tokens

Models don't read characters, instead they read tokens. A tokenizer splits your text into chunks and maps each chunk to an integer ID. The choice of tokenizer changes everything downstream: vocabulary size, sequence length, even bias.

concept

Why tokens?

Neural networks operate on numbers. Tokenization is the bridge from text to integer IDs the model can embed and process. The most common modern approach is Byte-Pair Encoding (BPE), which learns frequent subword chunks like tion, ing, or the.

Trade-offs: character-level tokenizers have tiny vocabularies but long sequences. Word-level tokenizers have huge vocabularies and can't handle new words. BPE sits in the middle.

token
A chunk of text mapped to an integer ID.
context window
Max number of tokens a model can read at once.
BPE
Byte-Pair Encoding, which learns frequent subwords.
Live lab · Tokenizer
mode:
19 tokens · ~15 words
The·quick·brown·fox·is·learning·to·read.
"The"#1084
" "#1032
"qui"#1113
"ck"#1099
" "#1032
"bro"#1098
"wn"#1119
" "#1032
"fox"#1102
" is"#1032
" "#1032
"lea"#1108
"rn"#1114
"ing"#1105
" to"#1032
" "#1032

Try pasting code, emoji, or another language. Notice how token count changes, since that directly affects cost and context length.