Add support for MDXC models (#50)
* Add support for MDXC models

* Updated poetry lockfile to match dependencies

* fix err: CLI does not work

* Fixed MDXC config YAML download, formatted mdxc separator class, bumped version ready for release

* Added progress bar for file downloads

* Added error handling for failed model load due to incomplete/corrupt download

* Fixed outstanding issues with YAML config loading and file download, added todo list for integration tests to write

* Moved load model into own method for consistency with mdxc class

* Refactored MDXC class to use more descriptive variable names, removed dead code, added debug logging and clearer parameters etc.

* Fixed and tested pitch shift logic for MDXC, added CLI params for other MDXC config parameters and tested these

* Added MDXC to readme

* Added thanks!

---------

Co-authored-by: Andrew Beveridge <[email protected]>
zhzhongshi and beveradb authored Mar 15, 2024
1 parent 70ca099 commit ff2e739
Showing 10 changed files with 528 additions and 143 deletions.
28 changes: 18 additions & 10 deletions README.md
@@ -5,7 +5,7 @@
[![Docker pulls](https://img.shields.io/docker/pulls/beveradb/audio-separator.svg)](https://hub.docker.com/r/beveradb/audio-separator/tags)
[![codecov](https://codecov.io/gh/karaokenerds/python-audio-separator/graph/badge.svg?token=N7YK4ET5JP)](https://codecov.io/gh/karaokenerds/python-audio-separator)

Summary: Easy to use audio stem separation from the command line or as a dependency in your own Python project, using the amazing MDX-Net and VR Arch models available in UVR by @Anjok07 & @aufr33.
Summary: Easy to use audio stem separation from the command line or as a dependency in your own Python project, using the amazing MDX-Net, VR Arch, Demucs and MDXC models available in UVR by @Anjok07 & @aufr33.

Audio Separator is a Python package that allows you to separate an audio file into various stems, using models trained by @Anjok07 for use with UVR (https://github.com/Anjok07/ultimatevocalremovergui).

@@ -136,8 +136,9 @@ Any file listed in the list models output can be specified (with file extension)
usage: audio-separator [-h] [-v] [-d] [-e] [-l] [--log_level LOG_LEVEL] [-m MODEL_FILENAME] [--output_format OUTPUT_FORMAT] [--output_dir OUTPUT_DIR] [--model_file_dir MODEL_FILE_DIR] [--invert_spect]
[--normalization NORMALIZATION] [--single_stem SINGLE_STEM] [--sample_rate SAMPLE_RATE] [--mdx_segment_size MDX_SEGMENT_SIZE] [--mdx_overlap MDX_OVERLAP] [--mdx_batch_size MDX_BATCH_SIZE]
[--mdx_hop_length MDX_HOP_LENGTH] [--mdx_enable_denoise] [--vr_batch_size VR_BATCH_SIZE] [--vr_window_size VR_WINDOW_SIZE] [--vr_aggression VR_AGGRESSION] [--vr_enable_tta]
[--vr_high_end_process] [--vr_enable_post_process] [--vr_post_process_threshold VR_POST_PROCESS_THRESHOLD] [--demucs_stem DEMUCS_STEM] [--demucs_segment_size DEMUCS_SEGMENT_SIZE]
[--demucs_shifts DEMUCS_SHIFTS] [--demucs_overlap DEMUCS_OVERLAP] [--demucs_segments_enabled DEMUCS_SEGMENTS_ENABLED]
[--vr_high_end_process] [--vr_enable_post_process] [--vr_post_process_threshold VR_POST_PROCESS_THRESHOLD] [--demucs_segment_size DEMUCS_SEGMENT_SIZE] [--demucs_shifts DEMUCS_SHIFTS]
[--demucs_overlap DEMUCS_OVERLAP] [--demucs_segments_enabled DEMUCS_SEGMENTS_ENABLED] [--mdxc_segment_size MDXC_SEGMENT_SIZE] [--mdxc_use_model_segment_size] [--mdxc_overlap MDXC_OVERLAP]
[--mdxc_batch_size MDXC_BATCH_SIZE] [--mdxc_pitch_shift MDXC_PITCH_SHIFT]
[audio_file]

Separate audio file into different stems.
@@ -149,11 +150,11 @@ options:
-h, --help show this help message and exit

Info and Debugging:
-v, --version show program's version number and exit
-d, --debug enable debug logging, equivalent to --log_level=debug
-e, --env_info print environment information and exit.
-l, --list_models list all supported models and exit.
--log_level LOG_LEVEL log level, e.g. info, debug, warning (default: info)
-v, --version Show the program's version number and exit.
-d, --debug Enable debug logging, equivalent to --log_level=debug.
-e, --env_info Print environment information and exit.
-l, --list_models List all supported models and exit.
--log_level LOG_LEVEL Log level, e.g. info, debug, warning (default: info).
Separation I/O Params:
-m MODEL_FILENAME, --model_filename MODEL_FILENAME model to use for separation (default: UVR-MDX-NET-Inst_HQ_3.onnx). Example: -m 2_HP-UVR.pth
@@ -164,7 +165,7 @@ Separation I/O Params:
Common Separation Parameters:
--invert_spect invert secondary stem using spectogram (default: False). Example: --invert_spect
--normalization NORMALIZATION max peak amplitude to normalize input and output audio to (default: 0.9). Example: --normalization=0.7
--single_stem SINGLE_STEM output only single stem, either instrumental or vocals. Example: --single_stem=instrumental
--single_stem SINGLE_STEM output only single stem, e.g. Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other. Example: --single_stem=Instrumental
--sample_rate SAMPLE_RATE modify the sample rate of the output audio (default: 44100). Example: --sample_rate=44100
MDX Architecture Parameters:
@@ -184,11 +185,17 @@ VR Architecture Parameters:
--vr_post_process_threshold VR_POST_PROCESS_THRESHOLD threshold for post_process feature: 0.1-0.3 (default: 0.2). Example: --vr_post_process_threshold=0.1

Demucs Architecture Parameters:
--demucs_stem DEMUCS_STEM stem to extract from audio file, e.g. Vocals, Drums, Bass, Other (default: All Stems). Example: --demucs_stem=vocals
--demucs_segment_size DEMUCS_SEGMENT_SIZE size of segments into which the audio is split, 1-100. higher = slower but better quality (default: Default). Example: --demucs_segment_size=256
--demucs_shifts DEMUCS_SHIFTS number of predictions with random shifts, higher = slower but better quality (default: 2). Example: --demucs_shifts=4
--demucs_overlap DEMUCS_OVERLAP overlap between prediction windows, 0.001-0.999. higher = slower but better quality (default: 0.25). Example: --demucs_overlap=0.25
--demucs_segments_enabled DEMUCS_SEGMENTS_ENABLED enable segment-wise processing (default: True). Example: --demucs_segments_enabled=False

MDXC Architecture Parameters:
--mdxc_segment_size MDXC_SEGMENT_SIZE larger consumes more resources, but may give better results (default: 256). Example: --mdxc_segment_size=256
--mdxc_use_model_segment_size use model default segment size instead of the value from the config file. Example: --mdxc_use_model_segment_size
--mdxc_overlap MDXC_OVERLAP amount of overlap between prediction windows, 2-50. higher is better but slower (default: 8). Example: --mdxc_overlap=8
--mdxc_batch_size MDXC_BATCH_SIZE larger consumes more RAM but may process slightly faster (default: 1). Example: --mdxc_batch_size=4
--mdxc_pitch_shift MDXC_PITCH_SHIFT shift audio pitch by a number of semitones while processing. may improve output for deep/high vocals. (default: 0). Example: --mdxc_pitch_shift=2
```
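The MDXC flags listed in the usage text above can be mirrored in a small `argparse` sketch. This is not the project's actual CLI code: the flag names and defaults are taken from the help text, while the helper names `build_mdxc_parser` and `validate_mdxc_overlap`, the `int` types, and the range check are assumptions made for illustration.

```python
import argparse


def build_mdxc_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: declares only the MDXC flags shown in the usage text."""
    parser = argparse.ArgumentParser(prog="audio-separator")
    group = parser.add_argument_group("MDXC Architecture Parameters")
    # Defaults follow the help text; the int types are an assumption.
    group.add_argument("--mdxc_segment_size", type=int, default=256)
    group.add_argument("--mdxc_use_model_segment_size", action="store_true")
    group.add_argument("--mdxc_overlap", type=int, default=8)
    group.add_argument("--mdxc_batch_size", type=int, default=1)
    group.add_argument("--mdxc_pitch_shift", type=int, default=0)
    return parser


def validate_mdxc_overlap(overlap: int) -> int:
    """Hypothetical check for the 2-50 range quoted in the help text."""
    if not 2 <= overlap <= 50:
        raise ValueError(f"mdxc_overlap must be between 2 and 50, got {overlap}")
    return overlap


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a real command line.
    args = build_mdxc_parser().parse_args(["--mdxc_overlap=8", "--mdxc_pitch_shift=2"])
    validate_mdxc_overlap(args.mdxc_overlap)
    print(vars(args))
```

Grouping the flags with `add_argument_group` keeps them under their own heading in `--help` output, which matches the per-architecture sections in the usage text.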
### As a Dependency in a Python Project
@@ -348,6 +355,7 @@ This project is licensed under the MIT [License](LICENSE).
- [Kuielab & Woosung Choi](https://github.com/kuielab) - Developed the original MDX-Net AI code.
- [KimberleyJSN](https://github.com/KimberleyJensen) - Advised and aided the implementation of the training scripts for MDX-Net and Demucs. Thank you!
- [Hv](https://github.com/NaJeongMo/Colab-for-MDX_B) - Helped implement chunks into the MDX-Net AI code. Thank you!
- [zhzhongshi](https://github.com/zhzhongshi) - Helped add support for the MDXC models in `audio-separator`. Thank you!
## Contact 💌
1 change: 1 addition & 0 deletions audio_separator/separator/architectures/__init__.py
@@ -1,3 +1,4 @@
from .mdx_separator import MDXSeparator
from .vr_separator import VRSeparator
from .demucs_separator import DemucsSeparator
from .mdxc_separator import MDXCSeparator
35 changes: 21 additions & 14 deletions audio_separator/separator/architectures/mdx_separator.py
@@ -90,8 +90,28 @@ def __init__(self, common_config, arch_config):
# We haven't implemented support for the checkpoint models here, so we're not using it.
# self.dim_c = 4

# Loading the model for inference
self.load_model()

self.n_bins = 0
self.trim = 0
self.chunk_size = 0
self.gen_size = 0
self.stft = None

self.primary_source = None
self.secondary_source = None
self.audio_file_path = None
self.audio_file_base = None
self.secondary_source_map = None
self.primary_source_map = None

def load_model(self):
"""
Load the model into memory from file on disk, initialize it with config from the model data,
and prepare for inferencing using hardware accelerated Torch device.
"""
self.logger.debug("Loading ONNX model for inference...")

if self.segment_size == self.dim_t:
ort_session_options = ort.SessionOptions()
if self.log_level > 10:
@@ -107,19 +127,6 @@ def __init__(self, common_config, arch_config):
self.model_run.to(self.torch_device).eval()
self.logger.warning("Model converted from onnx to pytorch due to segment size not matching dim_t, processing may be slower.")

self.n_bins = 0
self.trim = 0
self.chunk_size = 0
self.gen_size = 0
self.stft = None

self.primary_source = None
self.secondary_source = None
self.audio_file_path = None
self.audio_file_base = None
self.secondary_source_map = None
self.primary_source_map = None

def separate(self, audio_file_path):
"""
Separates the audio file into primary and secondary sources based on the model's configuration.
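The `mdx_separator.py` change above moves model loading out of `__init__` into a dedicated `load_model` method. A minimal sketch of that pattern follows. `SketchSeparator`, its `backend` attribute, and the placeholder callables are hypothetical stand-ins, not the project's `MDXSeparator`; only the `segment_size == dim_t` branch and the warning text are taken from the diff.

```python
import logging


class SketchSeparator:
    """Hypothetical stand-in illustrating the load_model refactor; not the real class."""

    def __init__(self, segment_size: int, dim_t: int):
        self.logger = logging.getLogger(__name__)
        self.segment_size = segment_size
        self.dim_t = dim_t
        self.model_run = None
        self.backend = None  # hypothetical attribute, used here only to expose the branch taken

        # Model loading lives in its own method, as in the diff.
        self.load_model()

        # Per-file state is initialised after the model is loaded.
        self.primary_source = None
        self.secondary_source = None

    def load_model(self):
        """Pick an inference path depending on whether segment_size matches dim_t."""
        self.logger.debug("Loading model for inference...")
        if self.segment_size == self.dim_t:
            # The diff builds an ONNX Runtime session here; a placeholder callable stands in.
            self.backend = "onnx"
            self.model_run = lambda spectrogram: spectrogram
        else:
            # The diff converts the ONNX model to PyTorch here and logs this warning.
            self.backend = "pytorch"
            self.model_run = lambda spectrogram: spectrogram
            self.logger.warning("Model converted from onnx to pytorch due to segment size not matching dim_t, processing may be slower.")


if __name__ == "__main__":
    print(SketchSeparator(segment_size=256, dim_t=256).backend)
    print(SketchSeparator(segment_size=512, dim_t=256).backend)
```

Keeping the loading logic in one method gives each architecture class the same entry point, which is the consistency with the MDXC class that the commit message mentions.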