Published December 4, 2022 | Version v1
Conference paper Open

Raga Classification From Vocal Performances Using Multimodal Analysis

Description

Work on musical gesture and embodied cognition suggests a rich complementarity between audio and movement information in musical performance. Unlike motion capture, pose estimation algorithms now make it possible to collect rich movement information from unconstrained performances of indefinite length. Vocal performances of Indian art music offer the opportunity to carry out multimodal analysis using this information, combining musicians' body movements (i.e. pose and gesture data) with audio features. In this work we investigate raga identification from 12 s excerpts drawn from a dataset of 3 singers and 9 ragas, using a combination of audio and visual representations that are each semantically salient on their own. While gesture-based classification is relatively weak by itself, we show that combining latent representations from the pre-trained unimodal networks can surpass the already high performance obtained with audio features alone.
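The fusion step described above can be illustrated with a minimal sketch. It assumes the simplest form of fusion, concatenating one latent vector per modality and passing the result through a linear softmax classifier over the 9 ragas. The abstract does not specify the fusion architecture, so the function names, latent dimensions, and random weights below are hypothetical placeholders, not the authors' method.

```python
import math
import random

N_RAGAS = 9  # number of raga classes reported in the abstract

def fuse_latents(audio_latent, gesture_latent):
    """Fuse two unimodal latent vectors by concatenation (assumed fusion scheme)."""
    return list(audio_latent) + list(gesture_latent)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(fused, weights, biases):
    """Apply one linear layer followed by softmax to the fused vector."""
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

# Hypothetical latent sizes; real values depend on the pre-trained networks.
AUDIO_DIM, GESTURE_DIM = 8, 4

rng = random.Random(0)
# Stand-ins for the latent vectors of one 12 s excerpt from each unimodal network.
audio_latent = [rng.gauss(0, 1) for _ in range(AUDIO_DIM)]
gesture_latent = [rng.gauss(0, 1) for _ in range(GESTURE_DIM)]

fused = fuse_latents(audio_latent, gesture_latent)

# Untrained placeholder parameters; in practice these would be learned.
weights = [[rng.gauss(0, 0.1) for _ in range(len(fused))] for _ in range(N_RAGAS)]
biases = [0.0] * N_RAGAS

probs = classify(fused, weights, biases)
predicted = max(range(N_RAGAS), key=probs.__getitem__)
```

Concatenation keeps each pre-trained unimodal network unchanged and leaves only the small classifier on top to be trained, which is one common reason to fuse at the latent level.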

Files

000033.pdf (345.7 kB)
md5:5a2379fb13780f390ad7207db452f393