AI Term · 1 min read

Product Quantization

What it is: A compression technique that reduces vector storage size by breaking vectors into smaller pieces and using “codebooks.”

How it works:

  • Splits each vector into several smaller sub-vectors
  • Learns a “codebook” of common patterns for each sub-space (typically via k-means clustering)
  • Stores the index of the nearest codebook entry for each sub-vector instead of the actual values
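The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the helper names (`pq_train`, `pq_encode`, `pq_decode`) and the parameters (4 sub-vectors, 16 centroids each, a basic k-means) are illustrative choices, and real systems typically use a tuned library such as Faiss.

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    # Basic k-means: learns k centroids for one sub-space.
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(data[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return centroids

def pq_train(vectors, m, k):
    # Step 1 + 2: split vectors into m sub-vectors and learn a
    # k-entry codebook per sub-space.
    subs = np.split(vectors, m, axis=1)
    return [kmeans(s, k) for s in subs]

def pq_encode(vectors, codebooks):
    # Step 3: replace each sub-vector with the index of its
    # nearest codebook entry (one small integer per sub-space).
    subs = np.split(vectors, len(codebooks), axis=1)
    codes = [np.linalg.norm(s[:, None] - cb[None], axis=2).argmin(axis=1)
             for s, cb in zip(subs, codebooks)]
    return np.stack(codes, axis=1).astype(np.uint8)

def pq_decode(codes, codebooks):
    # Reconstruct approximate vectors by looking up codebook entries.
    return np.hstack([cb[codes[:, i]] for i, cb in enumerate(codebooks)])

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 32)).astype(np.float32)  # 1000 vectors, 32 dims
books = pq_train(X, m=4, k=16)   # 4 sub-vectors, 16 centroids each
codes = pq_encode(X, books)      # shape (1000, 4): 4 bytes per vector
approx = pq_decode(codes, books)
```

With these toy numbers, each 32-dimensional float32 vector (128 bytes) is stored as just 4 one-byte codes plus the small shared codebooks, a ~32x reduction, at the cost of some reconstruction error.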

Why it matters: Saves massive amounts of memory and storage while maintaining reasonable search quality. Essential for large-scale deployments.

Real-world analogy: Like using abbreviations in texting: “LOL” instead of “laugh out loud.” You lose some nuance but save space and can still communicate effectively.
