/ module 01
Tokens
Models don't read characters, instead they read tokens. A tokenizer splits your text into chunks and maps each chunk to an integer ID. The choice of tokenizer changes everything downstream: vocabulary size, sequence length, even bias.
concept
Why tokens?
Neural networks operate on numbers. Tokenization is the bridge from text to integer IDs the model can embed and process. The most common modern approach is Byte-Pair Encoding (BPE), which learns frequent subword chunks like tion, ing, or the.
Trade-offs: character-level tokenizers have tiny vocabularies but long sequences. Word-level tokenizers have huge vocabularies and can't handle new words. BPE sits in the middle.
Try pasting code, emoji, or another language. Notice how token count changes, since that directly affects cost and context length.