Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks | IEEE Conference Publication