References
1. Simonyan K., Zisserman A.: Two-stream convolutional networks for action recognition in videos. Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS). 568-576 (2014). https://dl.acm.org/doi/10.5555/2968826.2968890
2. Feichtenhofer C., Pinz A., Wildes R.P.: Spatiotemporal multiplier networks for video action recognition. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 4768-4777 (2017). https://doi.org/10.1109/CVPR.2017.787
3. Wang L., Xiong Y., Wang Z., Qiao Y., Lin D., Tang X., Van Gool L.: Temporal segment networks: Towards good practices for deep action recognition. Proceedings of the 14th European Conference on Computer Vision (ECCV). 20-36 (2016). https://doi.org/10.48550/arXiv.1608.00859
4. Tran D., Bourdev L., Fergus R., Torresani L., Paluri M.: Learning spatiotemporal features with 3D convolutional networks. 2015 IEEE International Conference on Computer Vision (ICCV). 4489-4497 (2015). https://doi.org/10.1109/ICCV.2015.510
5. Tran D., Ray J., Shou Z., Chang S., Paluri M.: ConvNet architecture search for spatiotemporal feature learning. arXiv preprint. 1-12 (2017). https://doi.org/10.48550/arXiv.1708.05038
6. Diba A., Fayyaz M., Sharma V., Karami A., Arzani M., Yousefzadeh R., Van Gool L.: Temporal 3D ConvNets: New architecture and transfer learning for video classification. arXiv preprint. 1-9 (2017). https://doi.org/10.48550/arXiv.1711.08200
7. Girdhar R., Carreira J., Doersch C., Zisserman A.: Video action transformer network. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 244-253 (2019). https://doi.org/10.48550/arXiv.1812.02707
8. Bertasius G., Wang H., Torresani L.: Is space-time attention all you need for video understanding? Proceedings of the 38th International Conference on Machine Learning (ICML). (2021). https://doi.org/10.48550/arXiv.2102.05095
9. Soomro K., Zamir A.R., Shah M.: UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint. 1-7 (2012). https://doi.org/10.48550/arXiv.1212.0402
10. Kuehne H., Jhuang H., Garrote E., Poggio T., Serre T.: HMDB: A large video database for human motion recognition. 2011 International Conference on Computer Vision (ICCV). 2556-2563 (2011). https://doi.org/10.1109/ICCV.2011.6126543
11. Carreira J., Zisserman A.: Quo vadis, action recognition? A new model and the Kinetics dataset. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 6299-6308 (2017). https://doi.org/10.1109/CVPR.2017.502
12. Goyal R., Ebrahimi Kahou S., Michalski V., Materzyńska J., Westphal S., Kim H., Haenel V., Fruend I., Yianilos P., Mueller-Freitag M., Hoppe F., Thurau C., Bax I., Memisevic R.: The "something something" video database for learning and evaluating visual common sense. 2017 IEEE International Conference on Computer Vision (ICCV). 5842-5850 (2017). https://doi.org/10.48550/arXiv.1706.04261
13. Lin J., Gan C., Han S.: TSM: Temporal shift module for efficient video understanding. 2019 IEEE/CVF International Conference on Computer Vision (ICCV). 7083-7093 (2019). https://doi.org/10.1109/ICCV.2019.00718
14. Li Y., Ji B., Shi X., Zhang J., Kang B., Wang L.: TEA: Temporal excitation and aggregation for action recognition. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 909-918 (2020). https://doi.org/10.1109/CVPR42600.2020.00099