ABSTRACT
In this paper we present a novel interface for selecting sounds within audio mixtures. Traditional audio editors provide a graphical representation of sound, either a waveform or some variation of a time/frequency transform. Although these representations may let a user visually identify elements of a mixture, they do not facilitate object-specific editing (e.g., selecting only the voice of a singer in a song). Our interface instead uses audio guidance from the user to select a target sound within a mixture: the user is asked to vocalize (or otherwise sonically represent) the desired target sound, and an automatic process identifies and isolates the elements of the mixture that best relate to the user's input. This way of pointing at specific parts of an audio stream allows a user to perform audio selections that would otherwise be infeasible.
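The matching step described above can be sketched with a semi-supervised NMF-style separation: learn spectral bases from the user's guide recording, then explain the mixture with those bases held fixed plus free bases for everything else, and keep only the energy the guide bases account for. This is a minimal illustration under stated assumptions, not the authors' implementation; the `nmf` helper, the toy spectra, and all variable names here are hypothetical.

```python
import numpy as np

def nmf(V, W_fixed=None, n_free=2, n_iter=300, seed=0):
    """Multiplicative-update NMF (Frobenius norm). Columns in W_fixed are
    held constant, so the free columns must explain whatever the fixed
    bases cannot (semi-supervised separation)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Wf = W_fixed if W_fixed is not None else np.empty((F, 0))
    Wl = rng.random((F, n_free)) + 1e-3            # learned (free) bases
    H = rng.random((Wf.shape[1] + n_free, T)) + 1e-3
    for _ in range(n_iter):
        W = np.hstack([Wf, Wl])
        H *= (W.T @ V) / (W.T @ (W @ H) + 1e-9)    # update activations
        W = np.hstack([Wf, Wl])
        upd = (V @ H.T) / ((W @ H) @ H.T + 1e-9)
        Wl *= upd[:, Wf.shape[1]:]                 # update only free bases
    return np.hstack([Wf, Wl]), H

# Toy demo on magnitude spectrograms: a target and an interference with
# disjoint spectral supports, plus a user "guide" resembling the target.
rng = np.random.default_rng(1)
w_tgt = np.array([1., 1., 1., 0., 0., 0.])[:, None]   # target spectrum
w_int = np.array([0., 0., 0., 1., 1., 1.])[:, None]   # interference spectrum
V_tgt = w_tgt @ rng.random((1, 80))
V_int = w_int @ rng.random((1, 80))
V_mix = V_tgt + V_int                                 # observed mixture
V_gd = w_tgt @ rng.random((1, 40))                    # user's vocal guide

# 1) learn target bases from the guide alone
W_gd, _ = nmf(V_gd, n_free=1)
# 2) explain the mixture with the guide bases fixed, plus one free basis
W, H = nmf(V_mix, W_fixed=W_gd, n_free=1)
# 3) Wiener-style mask: keep the fraction attributed to the guide bases
est = (W[:, :1] @ H[:1]) / (W @ H + 1e-9) * V_mix
```

On this synthetic example the masked output `est` recovers the target component of the mixture; a real system would additionally need an STFT front end and phase handling to resynthesize audio.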
Index Terms
- User guided audio selection from complex sound mixtures