MDX23C DrumSep model #170

Merged: 1 commit, Dec 21, 2024

Conversation

@Eddycrack864 commented Dec 20, 2024
Contributor

MDX23C DrumSep by aufr33 & jarredou

Checkpoint Link: ckpt
Config Link: config

SDR: 10.8059

This is a 6 stem model: kick, snare, toms, hh (hi hats), ride, crash.

Note:

This model needs a workaround: the MDX23C separator currently supports only 2 stems, so using this model yields 2 stems instead of 6.

@Eddycrack864
Contributor Author

I think this is the last of the usable models I know of that was still missing.

I will do PRs periodically if new models are published.

@beveradb beveradb merged commit c610c53 into nomadkaraoke:main Dec 21, 2024
9 checks passed
@beveradb
Collaborator

Nice one, thanks! I've tested and merged this. It's not ideal, since the model advertises 6 stems but the current code doesn't actually produce them. Still, it was able to separate an existing mixed "drums" stem from Demucs into kick and everything else, which already adds a little value, so I've added it.

If you're up for some digging, it would help to find inference code that works correctly with this model, so we can understand how it's meant to be used and move towards outputting all stems correctly. For example, does it work fully in UVR, or is there other inference code somewhere else that we could look at?

@Eddycrack864
Contributor Author

About your questions:
Yes, this model works fully in UVR. It maps all the stems in the configuration file.

About the code that makes it work, I've been looking and reviewing the UVR and MSST code (this model works in both projects).

In UVR, the part that makes it work is here:
https://github.com/Anjok07/ultimatevocalremovergui/blob/376d50af8fa3dd71bcec4194f3b1e2f496315bd9/separate.py#L674 (a for loop that processes each audio using the stems found)

MSST uses an approach similar to UVR's:
https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/dfffc51153fec1b4aff6838375f1e9d88cf2b94a/utils.py#L441 (returns the list of target instruments based on the configuration)
https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/dfffc51153fec1b4aff6838375f1e9d88cf2b94a/inference.py#L52 (creates a copy of the list returned by the prefer_target_instrument function)
https://github.com/ZFTurbo/Music-Source-Separation-Training/blob/dfffc51153fec1b4aff6838375f1e9d88cf2b94a/inference.py#L93 (a for loop, but somewhat different from the UVR5 one)

Well, that's what I more or less understood when analyzing the code 😅
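
A minimal sketch of the pattern described above: read the instrument names from the configuration and pair each one with one slice of the model output. The function name `map_outputs_to_stems` and the toy data are hypothetical and are not taken from UVR or MSST; real outputs would be audio arrays rather than short lists.

```python
def map_outputs_to_stems(outputs, instruments):
    """Pair each output slice with the instrument name at the same index.

    `outputs` is any sequence with one entry per stem (hypothetical stand-in
    for the model's output tensor); `instruments` is the list from the config.
    """
    outputs = list(outputs)
    if len(outputs) != len(instruments):
        # Fail loudly instead of silently dropping or mislabeling stems.
        raise ValueError(
            f"model returned {len(outputs)} stems, "
            f"config lists {len(instruments)} instruments"
        )
    return dict(zip(instruments, outputs))


# Stem names as listed in this PR's description.
instruments = ["kick", "snare", "toms", "hh", "ride", "crash"]
# Placeholder data: one short silent buffer per stem.
fake_outputs = [[0.0] * 4 for _ in instruments]

stems = map_outputs_to_stems(fake_outputs, instruments)
print(list(stems))
```

The explicit length check is a design choice for this sketch: a plain `zip` would stop at the shorter input and hide a mismatch between the model and the config.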

I also tried to make it work by adding a for loop. It worked for this model, but the Roformer models stopped working (tensor mismatch). This was my change, in the demix function of mdxc_separator.py:

Original:

if num_stems > 1 or self.is_primary_stem_main_target:

My change:

sources = {}
if num_stems > 1 or self.is_primary_stem_main_target:
    for key, value in zip(self.model_data_cfgdict.training.instruments, inferenced_outputs.cpu().detach().numpy()):
        self.logger.debug(f"Processing instrument: {key}")
        if self.pitch_shift != 0:
            sources[key] = self.pitch_fix(value, sample_rate, orig_mix)
        else:
            sources[key] = value

        # Save every processed stem
        if not self.output_single_stem or self.output_single_stem.lower() == key.lower():
            output_path = self.get_stem_output_path(key, None)
            self.logger.info(f"Saving {key} stem to {output_path}...")
            self.final_process(output_path, sources[key], key)

    return sources
else:
    self.logger.debug("Processing single source...")

The thing is, I'm not sure how to handle this part:

if self.is_primary_stem_main_target:
    self.logger.debug(f"Primary stem: {self.primary_stem_name} is main target, detaching and matching array shapes if necessary...")
    if sources[self.primary_stem_name].shape[1] != orig_mix.shape[1]:
        sources[self.primary_stem_name] = spec_utils.match_array_shapes(sources[self.primary_stem_name], orig_mix)
    sources[self.secondary_stem_name] = orig_mix - sources[self.primary_stem_name]

self.logger.debug("Deleting inferenced outputs to free up memory")
del inferenced_outputs

self.logger.debug("Returning separated sources")
return sources
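
One possible way to keep both cases working is to branch on the shape of the model output, sketched below. This is only an illustration under assumptions, not the project's code: it assumes a single-target model returns an array shaped like the mix (channels, samples), while a multi-stem model returns (stems, channels, samples). Whether that is what causes the tensor mismatch is also an assumption. `collect_sources` is a hypothetical helper, and trimming to the shorter length stands in for `spec_utils.match_array_shapes`.

```python
import numpy as np


def collect_sources(outputs, mix, instruments, primary=None, secondary=None):
    """Build a {stem_name: audio} dict (hypothetical helper).

    outputs: (stems, channels, samples) for multi-stem models, or
             (channels, samples) when the model predicts one target stem.
    mix:     the original mixture, shaped (channels, samples).
    """
    outputs = np.asarray(outputs)
    mix = np.asarray(mix)

    if outputs.ndim == mix.ndim:
        # Single-target case: the other stem is the residual of the mix.
        if primary is None or secondary is None:
            raise ValueError("single-target output needs primary and secondary names")
        # Assumption: trim both arrays to a common length before subtracting.
        n = min(outputs.shape[-1], mix.shape[-1])
        target = outputs[..., :n]
        return {primary: target, secondary: mix[..., :n] - target}

    # Multi-stem case: one slice along axis 0 per configured instrument.
    if outputs.shape[0] != len(instruments):
        raise ValueError("stem count does not match the configured instruments")
    return dict(zip(instruments, outputs))


mix = np.ones((2, 8), dtype=np.float32)

# Single-target model: only the primary stem is predicted.
single = collect_sources(np.full((2, 8), 0.25, dtype=np.float32), mix,
                         ["vocals"], primary="vocals", secondary="other")

# Multi-stem model: six stems, named as in this PR's description.
drums = ["kick", "snare", "toms", "hh", "ride", "crash"]
multi = collect_sources(np.zeros((6, 2, 8), dtype=np.float32), mix, drums)
print(sorted(single), list(multi))
```

In the multi-stem branch no residual is computed, since every stem already comes from the model; the subtraction from the mix only makes sense when a single stem is the main target.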

I hope this information is useful.

2 participants