Knowledge Distillation
Knowledge distillation is a machine learning technique in which a smaller, more efficient model (the student) is trained to mimic the behavior of a larger, more complex model (the teacher). Knowledge is transferred by training the student on the teacher's soft predictions, typically the class probabilities produced by a temperature-scaled softmax, instead of (or alongside) the hard labels alone. The result is a compact model that retains much of the teacher's performance, making it suitable for deployment in resource-constrained environments.
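As a concrete illustration, here is a minimal PyTorch sketch of one distillation training step for a classification task. The placeholder models, the temperature T, and the mixing weight alpha are illustrative assumptions, not a prescribed recipe; the key idea is combining a KL-divergence loss against the teacher's softened probabilities with an ordinary cross-entropy loss against the ground-truth labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions.
    # Scaling by T*T keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Placeholder teacher and student models and a dummy batch, for illustration only.
teacher = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(32, 784)             # dummy inputs
y = torch.randint(0, 10, (32,))      # dummy labels

teacher.eval()
with torch.no_grad():                # the teacher is frozen; it only provides targets
    t_logits = teacher(x)

s_logits = student(x)
loss = distillation_loss(s_logits, t_logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice the same loss is applied over many epochs of the training set; alpha controls how much the student follows the teacher's soft targets versus the hard labels, and higher temperatures expose more of the teacher's relative confidence across incorrect classes.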
Developers should reach for knowledge distillation when they need to deploy models on hardware with limited compute, memory, or energy, such as mobile phones, edge devices, or embedded systems. It is particularly valuable where model size and inference latency are critical, for example in real-time applications, on IoT devices, or when serving a large user base under cost constraints, because it trades a small amount of accuracy for substantial gains in efficiency.