Book Chapter Details
Mandatory Fields
Asari, H;Olsson, RK;Pearlmutter, BA;Zador, AM
2007 January
BLIND SPEECH SEPARATION
Sparsification for Monaural Source Separation
SPRINGER-VERLAG BERLIN
BERLIN
Published
1
Optional Fields
BLIND SOURCE SEPARATION; SPARSE REPRESENTATION; SOUND LOCALIZATION; COCKTAIL PARTY; RECONSTRUCTION; DECONVOLUTION; DECOMPOSITION; MINIMIZATION; DICTIONARY; STATISTICS
We explore the use of sparse representations for separation of a monaural mixture signal, where by a sparse representation we mean one where the number of nonzero elements is smaller than might be expected. This is a surprisingly powerful idea, as the ability to express a signal sparsely in some known, and potentially overcomplete, basis constitutes a strong model, while also lending itself to efficient algorithms. In the framework we explore, the representation of the signal is linear in a vector of coefficients. However, because many coefficient values could represent the same signal, the mapping from signal to coefficients is nonlinear, with the coefficients being chosen to simultaneously represent the signal and maximize a measure of sparsity. This conversion of the signal into the coefficients using L1 optimization is viewed not as a preprocessing step performed before the data reaches the heart of the algorithm, but rather as itself the heart of the algorithm: after the coefficients have been found, only trivial processing remains to be done. We show how, by suitable choice of overcomplete basis, this framework can use a variety of cues (e.g., speaker identity, differential filtering, differential attenuation) to accomplish monaural separation. We also discuss two radically different algorithms for finding the required overcomplete dictionaries: one based on nonnegative matrix factorization of isolated sources, and the other based on end-to-end optimization using automatic differentiation.
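The abstract's central step, choosing coefficients that both represent the mixture and are sparse, can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes an L1-penalized least-squares objective solved by ISTA (iterative soft thresholding), a standard solver swapped in here for illustration. The names `soft_threshold`, `ista`, `separate`, and the penalty weight `lam` are hypothetical, and the per-source dictionaries are assumed to be given as matrices whose columns are basis elements.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of t * ||x||_1: shrink each entry toward zero by t.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    # Approximately minimize 0.5 * ||y - A c||^2 + lam * ||c||_1 over c.
    # L is the Lipschitz constant of the gradient (squared spectral norm of A).
    L = np.linalg.norm(A, 2) ** 2
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ c - y)
        c = soft_threshold(c - grad / L, lam / L)
    return c

def separate(y, dictionaries, lam=0.01, n_iter=500):
    # Stack the per-source dictionaries into one overcomplete basis,
    # find sparse coefficients for the mixture, then rebuild each source
    # from its own block of coefficients (the "trivial processing").
    A = np.hstack(dictionaries)
    c = ista(A, y, lam, n_iter)
    estimates = []
    start = 0
    for D in dictionaries:
        k = D.shape[1]
        estimates.append(D @ c[start:start + k])
        start += k
    return estimates, c
```

In this sketch the mapping from mixture to coefficients is nonlinear even though reconstruction is linear, which mirrors the distinction drawn in the abstract. How well the sources separate depends entirely on the dictionaries, which the sketch takes as given.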
1860-4862
387
410
Grant Details