Mental model
Soft labels (an MCTS visit-count distribution, or a teacher's logits) carry far more bits per sample than a one-hot label or a sparse RL reward. This helps explain why distillation works and why AlphaGo Zero trains its policy on the full visit-count distribution rather than only on the move that was played.
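One rough way to see the gap is to count how many one-hot samples a learner needs before it recovers what a single soft label hands over directly. The sketch below is only an illustration under stated assumptions: the four-move `visit_dist` is made up, and total variation distance is just one convenient way to measure how far an estimate is from the target. It also records the standard upper bounds: a binary win/loss signal carries at most 1 bit, and a one-hot label over K moves carries at most log2(K) bits.

```python
import math
import random


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def empirical_from_one_hot(p, n, rng):
    """Estimate p from n one-hot samples (each sample reveals one move index)."""
    counts = [0] * len(p)
    for idx in rng.choices(range(len(p)), weights=p, k=n):
        counts[idx] += 1
    return [c / n for c in counts]


# Hypothetical visit distribution over four candidate moves (illustrative only).
visit_dist = [0.55, 0.30, 0.10, 0.05]

# A soft label exposes the target distribution in a single sample.
soft_error = total_variation(visit_dist, visit_dist)

# One-hot samples must be averaged over many draws to approach the same target.
rng = random.Random(0)
for n in (1, 10, 100, 10_000):
    estimate = empirical_from_one_hot(visit_dist, n, rng)
    print(f"one-hot samples={n:>6}  TV error={total_variation(visit_dist, estimate):.4f}")

# Upper bounds on information per sample for the discrete signals.
max_bits_binary_reward = math.log2(2)            # win/loss outcome
max_bits_one_hot = math.log2(len(visit_dist))    # single chosen move out of K
print(f"soft label TV error={soft_error:.4f}")
print(f"binary reward <= {max_bits_binary_reward} bit, one-hot <= {max_bits_one_hot} bits")
```

The soft label's error is zero after one sample, while the one-hot estimate only approaches the target as the sample count grows, which is the intuition behind "more bits per sample".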
- Who: Eric Jang
- Topic: Bits per sample