Self-Supervised Spatio-Temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics | IEEE Conference Publication | IEEE Xplore