Building semantic understanding beyond deep learning from sound and vision | IEEE Conference Publication | IEEE Xplore