
Self-Regulated Feature Learning via Teacher-free Feature Distillation

  • Conference paper
  • Part of the conference proceedings: Computer Vision – ECCV 2022 (ECCV 2022)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13686)

Abstract

Knowledge distillation conditioned on intermediate feature representations consistently leads to significant performance improvements. Conventional feature distillation frameworks demand extra budgets for selecting and training teachers, as well as complex transformations to align features between the teacher and student models. To address this problem, we analyze the role of the teacher in feature distillation and make an intriguing observation: an additional teacher architecture is not always necessary. We then propose Tf-FD, a simple yet effective Teacher-free Feature Distillation framework that reuses meaningful channel-wise and layer-wise features within the student to provide teacher-like knowledge without an additional model. In particular, our framework is divided into intra-layer and inter-layer distillation. Intra-layer Tf-FD performs feature salience ranking and transfers knowledge from salient features to redundant features within the same layer. Inter-layer Tf-FD distills the high-level semantic knowledge embedded in deeper-layer representations to guide the training of shallow layers. Benefiting from the small gap between these self-features, Tf-FD only needs to optimize additional feature-mimicking losses, without complex transformations. Furthermore, we provide discussions that shed light on Tf-FD from a feature-regularization perspective. Experiments on classification and object detection tasks demonstrate that our technique achieves state-of-the-art results on different models with fast training speeds. Code is available at https://lilujunai.github.io/Teacher-free-Distillation/.
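The abstract only outlines the two feature-mimicking losses; a minimal, hypothetical sketch of how such intra-layer and inter-layer losses could be written in PyTorch is given below. The channel-salience proxy (mean activation magnitude), the top-half/bottom-half channel pairing, and the adaptive-pooling alignment are illustrative assumptions for this sketch, not the paper's exact formulation.

# Hypothetical sketch (not the authors' code): teacher-free intra- and
# inter-layer feature-mimicking losses on student feature maps.
import torch
import torch.nn.functional as F

def intra_layer_tf_fd(feat, top_ratio=0.5):
    """Push less-salient channels toward detached salient channels of one layer.

    feat: (N, C, H, W) feature map from a single student layer.
    Channels are ranked by mean activation magnitude, one plausible
    salience proxy; the bottom-ranked channels mimic the top-ranked ones.
    """
    n, c, h, w = feat.shape
    salience = feat.abs().mean(dim=(0, 2, 3))        # (C,) per-channel salience
    order = torch.argsort(salience, descending=True)
    k = int(c * top_ratio)
    salient = feat[:, order[:k]].detach()            # teacher-like channels
    redundant = feat[:, order[c - k:]]               # channels being guided
    return F.mse_loss(redundant, salient)

def inter_layer_tf_fd(shallow_feat, deep_feat):
    """Let a deeper layer's (detached) features guide a shallower layer.

    Channel-mean maps are compared after adaptive pooling, so no learned
    alignment module is needed.
    """
    target = deep_feat.detach().mean(dim=1, keepdim=True)   # (N, 1, h, w)
    source = shallow_feat.mean(dim=1, keepdim=True)
    source = F.adaptive_avg_pool2d(source, target.shape[-2:])
    return F.mse_loss(source, target)

In training, such terms would simply be added to the task loss with small weights, e.g. loss = ce_loss + alpha * intra_loss + beta * inter_loss, where alpha and beta are hypothetical weighting hyperparameters.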



Author information

Corresponding author

Correspondence to Lujun Li.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 144 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, L. (2022). Self-Regulated Feature Learning via Teacher-free Feature Distillation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13686. Springer, Cham. https://doi.org/10.1007/978-3-031-19809-0_20

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19809-0_20

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19808-3

  • Online ISBN: 978-3-031-19809-0

  • eBook Packages: Computer Science, Computer Science (R0)
