ERIC Number: ED565551
Record Type: Non-Journal
Publication Date: 2013
Pages: 152
Abstractor: As Provided
Reference Count: N/A
ISBN: 978-1-3036-7760-1
Score-Informed Musical Source Separation and Reconstruction
Han, Yushen
ProQuest LLC, Ph.D. Dissertation, Indiana University
A systematic approach to retrieve individual parts in a monaural music recording with its score is introduced. We are interested in isolating the accompaniment part by removing the solo part from a recording of concerto music in which a solo instrument is accompanied by an orchestra. We require the music audio, the score, and optionally a sample library of individual notes played in isolation. Our approach is based on explicit knowledge of the musical audio at the semantic level (notes or chords) from an audio-score alignment. Such knowledge allows the spectrogram energy to be decomposed into note-based models that could be trained with the sample library. Our approach can be divided into: (1) "masking" to estimate a solo mask to remove the solo and (2) "reconstruction" to impute the missing harmonics of the orchestra notes that have been inevitably damaged in masking. In "masking," we estimate a 2-dimensional binary mask to classify each time-frequency cell of the short-time Fourier Transform (STFT) spectrogram as either solo or accompaniment in the STFT domain. We mainly employ an Expectation Maximization (EM) algorithm to decompose spectrogram magnitude into note-based models. In this process of "erasing" the soloist's contribution to the mixture by applying the mask, the remaining orchestra is degraded. In "reconstruction," we propose a novel technique to repair such degradation. We use a state-space model for each note partial, which is represented by a slowly changing amplitude envelope and an "unwrapped" phase sequence. Such an amplitude-phase representation can be computed by Kalman smoothing. It allows us to "transpose" intact partials of the orchestra part onto the degraded time-frequency region. Objective metrics and subjective listening are used on real and synthesized musical audio data for evaluation and parameter optimization. [The dissertation citations contained here are published with the permission of ProQuest LLC. 
Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A