ABSTRACT
In this paper we present a novel interface for selecting sounds within audio mixtures. Traditional audio editors provide a graphical representation of sound, either a waveform or some variation of a time/frequency transform. Although these representations may let a user visually identify elements of a mixture, they do not facilitate object-specific editing (e.g., selecting only the voice of a singer in a song). Our interface instead uses audio guidance from the user to select a target sound within a mixture: the user is asked to vocalize (or otherwise sonically represent) the desired target sound, and an automatic process identifies and isolates the elements of the mixture that best relate to the user's input. This way of pointing at specific parts of an audio stream allows a user to perform audio selections that would otherwise be infeasible.
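The matching step described above can be sketched with a semi-supervised NMF-style separation: learn spectral bases from the user's guide recording, then explain the mixture with those bases held fixed plus free bases for everything else, and keep only the energy the guide bases account for. This is a minimal illustration under stated assumptions, not the authors' implementation; the `nmf` helper, the toy spectra, and all variable names here are hypothetical.

```python
import numpy as np

def nmf(V, W_fixed=None, n_free=2, n_iter=300, seed=0):
    """Multiplicative-update NMF (Frobenius norm). Columns in W_fixed are
    held constant, so the free columns must explain whatever the fixed
    bases cannot (semi-supervised separation)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Wf = W_fixed if W_fixed is not None else np.empty((F, 0))
    Wl = rng.random((F, n_free)) + 1e-3            # learned (free) bases
    H = rng.random((Wf.shape[1] + n_free, T)) + 1e-3
    for _ in range(n_iter):
        W = np.hstack([Wf, Wl])
        H *= (W.T @ V) / (W.T @ (W @ H) + 1e-9)    # update activations
        W = np.hstack([Wf, Wl])
        upd = (V @ H.T) / ((W @ H) @ H.T + 1e-9)
        Wl *= upd[:, Wf.shape[1]:]                 # update only free bases
    return np.hstack([Wf, Wl]), H

# Toy demo on magnitude spectrograms: a target and an interference with
# disjoint spectral supports, plus a user "guide" resembling the target.
rng = np.random.default_rng(1)
w_tgt = np.array([1., 1., 1., 0., 0., 0.])[:, None]   # target spectrum
w_int = np.array([0., 0., 0., 1., 1., 1.])[:, None]   # interference spectrum
V_tgt = w_tgt @ rng.random((1, 80))
V_int = w_int @ rng.random((1, 80))
V_mix = V_tgt + V_int                                 # observed mixture
V_gd = w_tgt @ rng.random((1, 40))                    # user's vocal guide

# 1) learn target bases from the guide alone
W_gd, _ = nmf(V_gd, n_free=1)
# 2) explain the mixture with the guide bases fixed, plus one free basis
W, H = nmf(V_mix, W_fixed=W_gd, n_free=1)
# 3) Wiener-style mask: keep the fraction attributed to the guide bases
est = (W[:, :1] @ H[:1]) / (W @ H + 1e-9) * V_mix
```

On this synthetic example the masked output `est` recovers the target component of the mixture; a real system would additionally need an STFT front end and phase handling to resynthesize audio.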
Index Terms
- User guided audio selection from complex sound mixtures