MDX23C DrumSep model #170

Eddycrack864 · 2024-12-20T04:29:53Z

MDX23C DrumSep by aufr33 & jarredou

Checkpoint Link: ckpt
Config Link: config

SDR: 10.8059

This is a 6 stem model: kick, snare, toms, hh (hi hats), ride, crash.

Note:

This model needs a workaround: the MDX23C separator currently supports only 2 stems, so using this model as-is gives you 2 stems instead of 6.

Eddycrack864 · 2024-12-20T04:33:45Z

As far as I know, this is the last usable model that was still missing.

I will do PRs periodically if new models are published.

beveradb · 2024-12-21T03:27:01Z

Nice one, thanks! I've tested and merged this. It's not ideal, since the model says it produces 6 stems but doesn't actually do so with the current code. Still, it was able to separate an existing mixed "drums" stem from demucs into kick and everything else, which already adds a little value, so I've added it.

If you'd be up for it, it would help to find some inference code that works correctly with this model, so we can understand how it's meant to be used and move towards outputting all stems correctly. For example, does it work fully in UVR? Or is there other inference code somewhere else that we could look at?

Eddycrack864 · 2024-12-28T23:20:16Z

About your questions:
Yes, this model works fully in UVR. It maps all the stems in the configuration file.

About the code that makes it work, I've been looking and reviewing the UVR and MSST code (this model works in both projects).

In UVR, the part that makes it work is here:
https://github.com/Anjok07/ultimatevocalremovergui/blob/376d50af8fa3dd71bcec4194f3b1e2f496315bd9/separate.py#L674 (a for loop that processes each audio using the stems found)

In MSST, they use something like UVR:
https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/dfffc51153fec1b4aff6838375f1e9d88cf2b94a/utils.py#L441 (returns the list of target instruments based on the configuration)
https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/dfffc51153fec1b4aff6838375f1e9d88cf2b94a/inference.py#L52 (creates a copy of the list returned by the prefer_target_instrument function)
https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/dfffc51153fec1b4aff6838375f1e9d88cf2b94a/inference.py#L93 (a for loop, but somewhat different from the UVR5 one)

Well, that's what I more or less understood when analyzing the code 😅
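To make the pattern described above concrete, here is a minimal sketch of it. This is not the actual UVR or MSST code. The names `select_instruments` and `map_outputs_to_stems` are hypothetical, the config is assumed to be a plain dict with a `training` section holding `instruments` and an optional `target_instrument`, and the model output is assumed to be a sequence with one entry per stem.

```python
def select_instruments(config):
    """Return the stems to export: the single target if one is set, else all instruments."""
    training = config["training"]
    target = training.get("target_instrument")
    if target:
        return [target]
    # Copy the list so callers can modify it without touching the config.
    return list(training["instruments"])


def map_outputs_to_stems(config, outputs):
    """Pair each stem name with its entry in the model output (assumed: one entry per stem)."""
    instruments = select_instruments(config)
    if len(instruments) != len(outputs):
        raise ValueError(
            f"expected {len(instruments)} stems, got {len(outputs)} outputs"
        )
    return dict(zip(instruments, outputs))


# Hypothetical config mirroring the six DrumSep stems listed in this PR.
drumsep_config = {
    "training": {
        "instruments": ["kick", "snare", "toms", "hh", "ride", "crash"],
        "target_instrument": None,
    }
}
# Placeholder strings stand in for the per-stem audio arrays.
fake_outputs = [f"audio_{i}" for i in range(6)]
stems = map_outputs_to_stems(drumsep_config, fake_outputs)
```

The length check is there because a mismatch between the configured stems and the model output is the kind of error that would otherwise surface later as a tensor shape problem.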

I also tried to make it work by adding a for loop to the demix function of mdxc_separator.py. It worked, but the Roformer models stopped working (tensor mismatch).

Original:

python-audio-separator/audio_separator/separator/architectures/mdxc_separator.py

Line 327 in 6ae604f

if num_stems > 1 or self.is_primary_stem_main_target:

My change:

sources = {}
if num_stems > 1 or self.is_primary_stem_main_target:
    for key, value in zip(self.model_data_cfgdict.training.instruments, inferenced_outputs.cpu().detach().numpy()):
        self.logger.debug(f"Processing instrument: {key}")
        if self.pitch_shift != 0:
            sources[key] = self.pitch_fix(value, sample_rate, orig_mix)
        else:
            sources[key] = value

        # save every processed stem
        if not self.output_single_stem or self.output_single_stem.lower() == key.lower():
            output_path = self.get_stem_output_path(key, None)
            self.logger.info(f"Saving {key} stem to {output_path}...")
            self.final_process(output_path, sources[key], key)

    return sources
else:
    self.logger.debug("Processing single source...")

The thing is, I'm not sure how to handle this part:

if self.is_primary_stem_main_target:
    self.logger.debug(f"Primary stem: {self.primary_stem_name} is main target, detaching and matching array shapes if necessary...")
    if sources[self.primary_stem_name].shape[1] != orig_mix.shape[1]:
        sources[self.primary_stem_name] = spec_utils.match_array_shapes(sources[self.primary_stem_name], orig_mix)
    sources[self.secondary_stem_name] = orig_mix - sources[self.primary_stem_name]

self.logger.debug("Deleting inferenced outputs to free up memory")
del inferenced_outputs

self.logger.debug("Returning separated sources")
return sources
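One possible way to reconcile the two paths is to branch on whether the model has a single target instrument, rather than sending both cases through the same loop. The sketch below is untested against python-audio-separator. `build_sources` is a hypothetical helper, and it assumes that `is_primary_stem_main_target` corresponds to models whose config sets a single `target_instrument`, that such models return one array for the primary stem only, and that the arrays support `-` the way NumPy arrays do. Pitch fixing and shape matching (`spec_utils.match_array_shapes`) are left out for brevity.

```python
def build_sources(instruments, target_instrument, outputs, orig_mix,
                  primary_name, secondary_name):
    """Hypothetical helper: turn raw model outputs into a {stem_name: audio} dict.

    instruments       -- stem names from the config (training.instruments)
    target_instrument -- training.target_instrument, or None for multi-stem models
    outputs           -- one array per stem for multi-stem models, or a single
                         array for single-target models (assumption)
    orig_mix          -- the input mixture, same shape as one stem
    """
    sources = {}
    if target_instrument:
        # Single-target case (assumed to be the one that broke): the model
        # predicts only the primary stem, so derive the secondary stem by
        # subtracting it from the mix instead of zipping over instruments.
        sources[primary_name] = outputs
        sources[secondary_name] = orig_mix - outputs
    else:
        # Multi-stem case (e.g. DrumSep): one output per configured instrument.
        if len(outputs) != len(instruments):
            raise ValueError(
                f"expected {len(instruments)} stems, got {len(outputs)} outputs"
            )
        for name, stem in zip(instruments, outputs):
            sources[name] = stem
    return sources
```

Keeping the subtraction path separate means a single-target output is never zipped against a longer instrument list, which is one plausible source of the tensor mismatch mentioned above.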

I hope this information can be useful

drumsep model

2a9175c

beveradb merged commit c610c53 into nomadkaraoke:main Dec 21, 2024
9 checks passed
