musisep.dictsep package
Submodules
musisep.dictsep.__main__ module
Wrapper for the dictionary learning algorithm. When invoked, the audio sources in the supplied audio file are separated.
musisep.dictsep.__main__.main(mixed_soundfile, orig_soundfiles, inst_num, tone_num, pexp, qexp, har, sigmas, sampdist, spectheight, logspectheight, minfreq, maxfreq, out_name, runs, lifetime, num_dicts, mask, plot_range)
Wrapper function for the dictionary learning algorithm.
Parameters: - mixed_soundfile (string) – Name of the mixed input file
- orig_soundfiles (list of string or NoneType) – Names of the files with the isolated instrument tracks or None
- inst_num (int) – Number of instruments
- tone_num (int) – Maximum number of simultaneous tones
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
- har (int) – Number of harmonics
- sigmas (float) – Number of standard deviations after which to cut the window/kernel
- sampdist (int) – Time intervals to sample the spectrogram
- spectheight (int) – Height of the linear-frequency spectrogram
- logspectheight (int) – Height of the log-frequency spectrogram
- minfreq (float) – Minimum frequency in Hz to be represented (included)
- maxfreq (float) – Maximum frequency in Hz to be represented (excluded)
- out_name (string) – Prefix for the file names
- runs (int) – Number of training iterations to perform
- lifetime (int) – Number of steps after which to renew the dictionary
- num_dicts (int) – Number of different dictionaries to generate and train
- mask (bool) – Whether to apply spectral masking
- plot_range (slice or NoneType) – Part of the spectrogram to plot
musisep.dictsep.adam_b module
Module containing the modified ADAM algorithm.
class musisep.dictsep.adam_b.Adam_B(init, lo=0, hi=1, alpha=0.0001, beta1=0.9, beta2=0.999, eps=1e-08)
Bases: object
Object for the ADAM algorithm with bounds, adapted for the update of instrument dictionaries. Each column refers to one instrument, and the harmonics are in rows.
Parameters: - init (array-like) – Initial value for the dictionary
- lo (float) – Lower bound for the dictionary entries
- hi (float) – Upper bound for the dictionary entries
- alpha (float) – Global step-size
- beta1 (float) – Inertia of the first moment estimator
- beta2 (float) – Inertia of the second moment estimator
- eps (float) – Value to add in the denominator to avoid division by zero
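The parameters above map onto the usual Adam update. The following is a minimal sketch, assuming that the bounds are enforced by clipping after a standard Adam step; how the module's modified algorithm actually differs from standard Adam is not stated here, and the class name `BoundedAdamSketch` and its `step` method are hypothetical.

```python
import numpy as np


class BoundedAdamSketch:
    """Hypothetical stand-in: standard Adam followed by clipping to [lo, hi]."""

    def __init__(self, init, lo=0, hi=1, alpha=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
        self.x = np.array(init, dtype=float)
        self.lo, self.hi = lo, hi
        self.alpha, self.beta1, self.beta2, self.eps = alpha, beta1, beta2, eps
        self.m = np.zeros_like(self.x)  # first moment estimate
        self.v = np.zeros_like(self.x)  # second moment estimate
        self.t = 0                      # step counter for bias correction

    def step(self, grad):
        grad = np.asarray(grad, dtype=float)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        self.x = self.x - self.alpha * m_hat / (np.sqrt(v_hat) + self.eps)
        # Assumed bound handling: project back onto [lo, hi]
        self.x = np.clip(self.x, self.lo, self.hi)
        return self.x


# Harmonics in rows, instruments in columns
inst_dict = np.array([[1.0, 0.5], [0.2, 0.0]])
opt = BoundedAdamSketch(inst_dict, alpha=0.1)
updated = opt.step(np.array([[-1.0, 1.0], [1.0, 1.0]]))
```

In this toy run, the entries that would leave the interval [0, 1] are clipped back to the nearest bound, while the others move by roughly `alpha` against the sign of the gradient.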
musisep.dictsep.dictlearn module
Module for the training of the dictionary. When invoked, a performance test on artificial data is performed.
class musisep.dictsep.dictlearn.Learner(fsigma, tone_num, inst_num, har, m, lifetime, pexp, qexp, init=None)
Bases: object
Container object for the dictionary learning process.
Parameters: - fsigma (float) – Standard deviation (frequency)
- tone_num (int) – Maximum number of simultaneous tones
- inst_num (int) – Number of instruments in the dictionary
- har (int) – Number of harmonics
- m (int) – Height of the log-frequency spectrogram
- lifetime (int) – Number of steps after which to renew the dictionary
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
get_dict()
Get the active part of the dictionary.
Returns: inst_dict – Dictionary with inst_num columns
Return type: ndarray
musisep.dictsep.dictlearn.gen_random_inst(har)
Generate random harmonic amplitudes according to a Par(1,2) distribution.
Parameters: har (int) – Number of harmonics
Returns: inst – Harmonic amplitudes for one instrument, normalized to the interval [0,1]
Return type: ndarray
musisep.dictsep.dictlearn.gen_random_inst_dict(har, inst_num)
Generate a random instrument dictionary according to a Par(1,2) distribution.
Parameters: - har (int) – Number of harmonics
- inst_num (int) – Number of instruments
Returns: inst_dict – Dictionary with instruments in columns, normalized to the interval [0,1]
Return type: ndarray
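As an illustration of what such a random dictionary can look like, here is a rough sketch. It assumes that Par(1,2) denotes a classical Pareto distribution with scale 1 and shape 2, and that the normalization divides each column by its maximum; neither detail is confirmed by this page, and `random_inst_dict_sketch` is a hypothetical name, not the module's function.

```python
import numpy as np


def random_inst_dict_sketch(har, inst_num, rng):
    """Draw a (har, inst_num) array of Pareto amplitudes, one instrument per column."""
    # NumPy's pareto() samples the Lomax (Pareto II) distribution;
    # adding 1 gives a classical Pareto with scale 1 (assumed reading of Par(1,2)).
    amps = 1.0 + rng.pareto(2, size=(har, inst_num))
    # Assumed normalization: scale every column so that its largest entry is 1.
    return amps / amps.max(axis=0, keepdims=True)


rng = np.random.default_rng(0)
inst_dict = random_inst_dict_sketch(25, 3, rng)
```

Each column then holds the relative amplitudes of 25 harmonics for one instrument, with all entries in (0, 1].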
musisep.dictsep.dictlearn.learn_spect_dict(spect, fsigma, tone_num, inst_num, pexp, qexp, har, m, minfreq, maxfreq, runs, lifetime)
Train the dictionary containing the relative amplitudes of the harmonics.
Parameters: - spect (array_like) – Original log-frequency spectrogram of the recording
- fsigma (float) – Standard deviation (frequency)
- tone_num (int) – Maximum number of simultaneous tones
- inst_num (int) – Number of instruments in the dictionary
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
- har (int) – Number of harmonics
- m (int) – Height of the log-frequency spectrogram
- minfreq (float) – Minimum frequency in Hz to be represented (included)
- maxfreq (float) – Maximum frequency in Hz to be represented (excluded)
- runs (int) – Number of training iterations to perform
- lifetime (int) – Number of steps after which to renew the dictionary
Returns: inst_dict – Dictionary containing the relative amplitudes of the harmonics
Return type: ndarray
musisep.dictsep.dictlearn.mask_spectrums(spects, orig_spect)
Mask the synthesized spectrograms with the original spectrogram.
Parameters: - spects (list of array_like) – List of synthesized spectrograms
- orig_spect (array_like) – Original spectrogram
Returns: - spectrums (list of ndarray) – Masked spectrograms
- mask_spect (ndarray) – Array mask
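Spectral masking of this kind is often done with a ratio mask: each synthesized spectrogram receives the share of the original spectrogram that corresponds to its share of the synthesized total. The sketch below shows that generic technique only; whether `mask_spectrums` uses exactly this rule is an assumption, and `ratio_mask_sketch` and its `eps` parameter are hypothetical.

```python
import numpy as np


def ratio_mask_sketch(spects, orig_spect, eps=1e-12):
    """Distribute orig_spect among the sources in proportion to spects."""
    spects = [np.asarray(s, dtype=float) for s in spects]
    orig_spect = np.asarray(orig_spect, dtype=float)
    total = sum(spects) + eps  # eps avoids division by zero in silent bins
    return [s / total * orig_spect for s in spects]


synth_a = np.array([[1.0, 3.0], [0.0, 2.0]])
synth_b = np.array([[1.0, 1.0], [0.0, 2.0]])
original = np.array([[4.0, 2.0], [5.0, 8.0]])
masked = ratio_mask_sketch([synth_a, synth_b], original)
```

With this rule, the masked spectrograms add up to the original wherever at least one synthesized spectrogram is nonzero, and stay at zero where all of them are silent.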
musisep.dictsep.dictlearn.stoch_grad(y, inst_dict, tone_num, adam, fsigma, harscale, baseshift, inst_spect, pexp, qexp)
Perform a dictionary training step.
Parameters: - y (array_like) – Log-frequency spectrum to represent
- inst_dict (ndarray) – Dictionary containing the relative amplitudes of the harmonics
- tone_num (int) – Maximum number of simultaneous tones
- adam (Adam_B) – Container object for the ADAM optimizer
- fsigma (float) – Standard deviation (frequency)
- harscale (float) – Scaling factor
- baseshift (int) – Length to add to the spectrum in order to avoid circular convolution
- inst_spect (array_like) – Spectra of the instruments, in the columns
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
Returns: - inst_dict (ndarray) – Updated dictionary
- reconstruction (ndarray) – Synthesized spectrum
- inst_amps (ndarray) – Summed amplitudes for each instrument
musisep.dictsep.dictlearn.synth_spect(spect, tone_num, inst_dict, fsigma, spectheight, pexp, qexp, minfreq, maxfreq)
Separate and synthesize the spectrograms from the original spectrogram.
Parameters: - spect (array_like) – Original log-frequency spectrogram of the recording
- tone_num (int) – Maximum number of simultaneous tones
- inst_dict (ndarray) – Dictionary containing the relative amplitudes of the harmonics
- fsigma (float) – Standard deviation (frequency)
- spectheight (int) – Height of the linear-frequency spectrograms
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
- minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)
- maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)
Returns: - dict_spectrum (ndarray) – Synthesized log-frequency spectrogram with all instruments
- inst_spectrums (list of ndarray) – List of synthesized log-frequency spectrograms for the instruments
- dict_spectrum_lin (ndarray) – Synthesized linear-frequency spectrogram with all instruments
- inst_spectrums_lin (list of ndarray) – List of synthesized linear-frequency spectrograms for the instruments
musisep.dictsep.dictlearn.test_learn(fsigma, tone_num, inst_num, pexp, qexp, har, m, runs, test_samples, lifetime)
Evaluate the performance of the dictionary learning algorithm via artificial spectra.
Parameters: - fsigma (float) – Width of the Gaussians in the log-frequency spectrogram
- tone_num (int) – Maximum number of simultaneous tones
- inst_num (int) – Number of instruments in the dictionaries
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
- har (int) – Number of harmonics
- m (int) – Height of the log-frequency spectrogram
- runs (int) – Number of training iterations to perform
- test_samples (int) – Number of test spectra to generate
- lifetime (int) – Number of steps after which to renew the dictionary
Returns: measures – Array containing, in that order, the SDR, SIR, SAR with the original dictionary and the SDR, SIR, SAR with the trained dictionary
Return type: ndarray
musisep.dictsep.dictlearn.test_learn_multi(fsigma, tone_num, inst_num, pexp, qexp, har, m, runs, test_samples, lifetime, num_dicts)
Evaluate the performance of the dictionary learning algorithm via artificial spectra, using num_dicts different dictionaries.
Parameters: - fsigma (float) – Width of the Gaussians in the log-frequency spectrogram
- tone_num (int) – Maximum number of simultaneous tones
- inst_num (int) – Number of instruments in the dictionaries
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
- har (int) – Number of harmonics
- m (int) – Height of the log-frequency spectrogram
- runs (int) – Number of training iterations to perform
- test_samples (int) – Number of test spectra to generate
- lifetime (int) – Number of steps after which to renew the dictionary
- num_dicts (int) – Number of different dictionaries to generate and train
Returns: measures – Array containing, in the rows, the SDR, SIR, SAR with the original dictionary and the SDR, SIR, SAR with the trained dictionary
Return type: ndarray
musisep.dictsep.exptool module
Back-end module for the generation of spectrograms and their gradients.
musisep.dictsep.exptool.inst_scale()
musisep.dictsep.exptool.inst_scale_grad()
musisep.dictsep.exptool.inst_shift()
musisep.dictsep.exptool.inst_shift_dict_grad()
musisep.dictsep.exptool.inst_shift_grad()
musisep.dictsep.pursuit module
Module for the sparse pursuit algorithm and its helper functions.
class musisep.dictsep.pursuit.Peaks(amps, shifts, sigmas, spreads, insts)
Bases: object
Object to represent the parameters for the peaks in the spectrogram.
Parameters: - amps (array_like) – Amplitudes
- shifts (array_like) – Fundamental frequencies
- sigmas (array_like) – Standard deviations (frequency)
- spreads (array_like) – Inharmonicities
- insts (array_like) – Instrument numbers
classmethod empty()
Construct an empty Peaks object.
Returns: A Peaks object with zero peaks
Return type: Peaks
classmethod from_array(array, insts)
Construct a Peaks object from an array.
Parameters: - array (array_like) – Array that contains, in consecutive order, the amplitudes, the fundamental frequencies, the standard deviations, and the inharmonicities
- insts (array_like) – Instrument numbers
get_array()
Returns: Array that contains, in consecutive order, the amplitudes, the fundamental frequencies, the standard deviations, and the inharmonicities
Return type: array_like
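The array form described for from_array and get_array can be pictured as four equally long blocks laid end to end. The sketch below assumes that "in consecutive order" means block-wise concatenation rather than interleaving; it uses plain NumPy arrays and does not call the Peaks class itself.

```python
import numpy as np

# Parameters for two peaks (values are made up for illustration)
amps = np.array([1.0, 0.5])
shifts = np.array([120.0, 310.0])
sigmas = np.array([2.0, 2.0])
spreads = np.array([0.0, 1e-4])

# Pack: amplitudes, fundamental frequencies, standard deviations, inharmonicities
packed = np.concatenate([amps, shifts, sigmas, spreads])

# Unpack: split the flat array back into its four blocks
amps2, shifts2, sigmas2, spreads2 = np.split(packed, 4)
```

A flat layout like this is convenient for general-purpose optimizers, which typically expect a single parameter vector. The instrument numbers are passed separately because they are discrete and not optimized.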
musisep.dictsep.pursuit.calc_harscale(minfreq, maxfreq, numfreqs)
Calculate the scaling factor of the frequency axis for the log-frequency spectrogram.
Parameters: - minfreq (float) – Minimum frequency to be represented (included)
- maxfreq (float) – Maximum frequency to be represented (excluded)
- numfreqs (int) – Intended height of the spectrogram
Returns: harscale – Scaling factor
Return type: float
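To see why a single scaling factor is useful on a log-frequency axis, consider the sketch below. It assumes the axis maps a frequency f to the bin position harscale * log(f / minfreq), with harscale = numfreqs / log(maxfreq / minfreq); this formula is an assumption for illustration and is not taken from the module. The helper names are hypothetical.

```python
import numpy as np


def harscale_sketch(minfreq, maxfreq, numfreqs):
    """Assumed scaling: bins per unit of natural-log frequency."""
    return numfreqs / np.log(maxfreq / minfreq)


def freq_to_bin(freq, minfreq, harscale):
    """Assumed position of a frequency on the log-frequency axis."""
    return harscale * np.log(freq / minfreq)


# 20 Hz to 20480 Hz spans exactly 10 octaves; use 1000 bins
hs = harscale_sketch(20.0, 20480.0, 1000)

# Distance between a fundamental and its second harmonic, for two fundamentals
offset_440 = freq_to_bin(880.0, 20.0, hs) - freq_to_bin(440.0, 20.0, hs)
offset_100 = freq_to_bin(200.0, 20.0, hs) - freq_to_bin(100.0, 20.0, hs)
```

Under this assumption the h-th harmonic always sits harscale * log(h) bins above the fundamental, regardless of pitch, so changing the pitch of a tone only shifts its harmonic pattern along the axis.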
musisep.dictsep.pursuit.fft_selector(y, prenum, baseshift, inst_spect, qexp)
Callback selector to find fundamental frequencies based on the correlation of the spectrum with the instrument spectra.
Parameters: - y (array_like) – Spectrum
- prenum (int) – Number of peaks to consider
- baseshift (int) – Length to add to the spectrum in order to avoid circular convolution
- inst_spect (array_like) – Spectra of the instruments, in the columns
- qexp (float) – Exponent to be applied on the spectrum
Returns: - amps (array_like) – Amplitudes
- shifts (array_like) – Fundamental frequencies
- insts (array_like) – Instrument numbers
musisep.dictsep.pursuit.gen_inst_spect(baseshift, fsigma, inst_dict, harscale, pexp, qexp, m)
Generate an instrument log-frequency spectrum.
Parameters: - baseshift (int) – Length to add to the spectrum in order to avoid circular convolution
- fsigma (float) – Standard deviation (frequency)
- inst_dict (ndarray) – Dictionary containing the relative amplitudes of the harmonics
- harscale (float) – Scaling factor
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
- m (int) – Height of the spectrogram
Returns: inst_spect – Spectra of the instruments, in the columns
Return type: ndarray
musisep.dictsep.pursuit.inst_scale(peaks, inst_dict, pexp, m, n)
Synthesize the linear-frequency spectrum.
Parameters: - peaks (Peaks) – Peak parameters
- inst_dict (array_like) – Dictionary containing the relative amplitudes of the harmonics
- pexp (float) – Exponent for the addition of sinusoids
- m (int) – Height of the spectrogram
- n (int) – Number of instruments
Returns: Linear-frequency spectrum
Return type: ndarray
musisep.dictsep.pursuit.inst_shift(peaks, inst_dict, harscale, pexp, m, n)
Synthesize the log-frequency spectrum.
Parameters: - peaks (Peaks) – Peak parameters
- inst_dict (array_like) – Dictionary containing the relative amplitudes of the harmonics
- harscale (float) – Scaling factor
- pexp (float) – Exponent for the addition of sinusoids
- m (int) – Height of the spectrogram
- n (int) – Number of instruments
Returns: Log-frequency spectrum
Return type: ndarray
musisep.dictsep.pursuit.inst_shift_dict_grad(peak_array, insts, inst_dict, harscale, pexp, qexp, m, n, y)
Least-squares gradient function for the log-frequency spectrum w.r.t. the dictionary.
Parameters: - peak_array (array_like) – Peak parameters in array form
- insts (array_like) – Instrument numbers
- inst_dict (array_like) – Dictionary containing the relative amplitudes of the harmonics
- harscale (float) – Scaling factor
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
- m (int) – Height of the spectrogram
- n (int) – Number of instruments
- y (array_like) – Spectrum to compare with
Returns: grad – Least-squares gradient w.r.t. the dictionary
Return type: ndarray
musisep.dictsep.pursuit.inst_shift_grad(peak_array, insts, inst_dict, harscale, pexp, qexp, m, n, y)
Least-squares gradient function for the log-frequency spectrum w.r.t. the parameters.
Parameters: - peak_array (array_like) – Peak parameters in array form
- insts (array_like) – Instrument numbers
- inst_dict (array_like) – Dictionary containing the relative amplitudes of the harmonics
- harscale (float) – Scaling factor
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
- m (int) – Height of the spectrogram
- n (int) – Number of instruments
- y (array_like) – Spectrum to compare with
Returns: grad – Least-squares gradient
Return type: ndarray
musisep.dictsep.pursuit.inst_shift_obj(peak_array, insts, inst_dict, harscale, pexp, qexp, m, n, y)
Least-squares objective function for the log-frequency spectrum.
Parameters: - peak_array (array_like) – Peak parameters in array form
- insts (array_like) – Instrument numbers
- inst_dict (array_like) – Dictionary containing the relative amplitudes of the harmonics
- harscale (float) – Scaling factor
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
- m (int) – Height of the spectrogram
- n (int) – Number of instruments
- y (array_like) – Spectrum to compare with
Returns: obj – Least-squares error
Return type: float
musisep.dictsep.pursuit.make_bounds(fsigma, length)
Compute sensible bounds for the peak parameters.
Parameters: - fsigma (float) – Standard deviation (frequency)
- length (int) – Number of instruments
Returns: bounds – Bounds for the optimizer
Return type: list of tuple
musisep.dictsep.pursuit.max_selector(y, prenum, n)
Callback selector to find peaks based on the local maxima which are dominant in a discrete interval, viewed from its midpoint.
Parameters: - y (array_like) – Spectrum
- prenum (int) – Number of peaks to consider
- n (int) – Length of the interval
Returns: - amps (array_like) – Amplitudes
- shifts (array_like) – Frequencies
- insts (array_like) – Instrument numbers (always 0)
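One way to read "dominant in a discrete interval, viewed from its midpoint" is that a sample counts as a peak when it is the largest value in a window centered on it. The following sketch implements that reading; the exact window convention, the tie handling, and the name `dominant_maxima_sketch` are assumptions rather than the module's behavior.

```python
import numpy as np


def dominant_maxima_sketch(y, prenum, n):
    """Return the prenum largest samples that are maximal in a centered window."""
    y = np.asarray(y, dtype=float)
    half = n // 2
    # Pad with -inf so that samples near the edges can still be peaks
    padded = np.pad(y, half, constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half + 1)
    is_peak = y >= windows.max(axis=1)
    idx = np.flatnonzero(is_peak)
    # Keep the prenum peaks with the largest amplitudes, strongest first
    idx = idx[np.argsort(y[idx])[::-1][:prenum]]
    amps = y[idx]
    insts = np.zeros(idx.size, dtype=int)  # a single generic "instrument"
    return amps, idx, insts


spectrum = [0.0, 1.0, 0.5, 3.0, 0.2, 0.1, 2.0, 0.0]
amps, shifts, insts = dominant_maxima_sketch(spectrum, prenum=2, n=3)
```

In this toy spectrum the local maxima lie at indices 1, 3 and 6, and the two strongest of them (indices 3 and 6) are returned, each with instrument number 0.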
musisep.dictsep.pursuit.peak_pursuit(y, num, prenum, runs, inst_dict, fsigma, harscale, selector, selector_args, pexp, qexp, beta=1, init=None)
Sparse pursuit algorithm for the identification of peaks in a spectrum.
Parameters: - y (array_like) – Spectrum
- num (int) – Maximum number of peaks
- prenum (int) – Number of new peaks to consider per step
- inst_dict (ndarray) – Dictionary containing the relative amplitudes of the harmonics
- fsigma (float) – Standard deviation (frequency)
- harscale (float) – Scaling factor
- selector (function) – Callback selector accepting y and prenum as arguments
- selector_args (sequence) – Extra arguments to pass to the selector
- pexp (float) – Exponent for the addition of sinusoids
- qexp (float) – Exponent to be applied on the spectrum
- beta (float) – Residual reduction factor
- init (Peaks) – Initial value for the peaks
Returns: - peaks (Peaks) – Identified peaks
- reconstruction (ndarray) – Synthesized spectrum