
How to increase speed? #32

Closed
YBMENACE opened this issue Jan 12, 2024 · 20 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@YBMENACE

I have a very large GPU (80 GB) and I want to increase speed, but increasing the batch size doesn't help at all. Thanks

@beveradb
Collaborator

What have you tried so far? Could you show me your logs from audio-separator running against one of your test files so I can see some details about your system, e.g. whether it's using CUDA already or not?

@YBMENACE
Author

2024-01-12 17:29:06.296 - INFO - cli - Separator version 0.13.0 beginning with input file: audio.wav
2024-01-12 17:29:08.530 - INFO - separator - Separator version 0.13.0 instantiating with output_dir: processed/hdemucs_mmi/audio, output_format: wav
2024-01-12 17:29:08.530 - DEBUG - separator - Normalization threshold set to 0.9, waveform will lowered to this max amplitude to avoid clipping.
2024-01-12 17:29:08.530 - DEBUG - separator - Denoising enabled, model will be run twice to reduce noise in output audio.
2024-01-12 17:29:08.530 - DEBUG - separator - Separation settings set: sample_rate=44100, hop_length=1024, segment_size=256, overlap=0.25, batch_size=200
2024-01-12 17:29:08.530 - INFO - separator - Checking hardware specifics to configure acceleration
2024-01-12 17:29:08.531 - INFO - separator - Operating System: Linux #1 SMP Wed Sep 6 21:10:58 UTC 2023
2024-01-12 17:29:08.531 - INFO - separator - System: Linux Node: runc Release: 5.4.254-170.358.amzn2.x86_64 Machine: x86_64 Proc: x86_64
2024-01-12 17:29:08.531 - INFO - separator - Python Version: 3.10.12
2024-01-12 17:29:08.531 - INFO - separator - ONNX Runtime GPU package installed with version: 1.17.0
2024-01-12 17:29:08.650 - DEBUG - separator - Python package: onnxruntime-silicon not installed
2024-01-12 17:29:08.770 - DEBUG - separator - Python package: onnxruntime not installed
2024-01-12 17:29:08.770 - INFO - separator - Torch package installed with version: 2.1.2
2024-01-12 17:29:08.770 - INFO - separator - Torchvision package installed with version: 0.16.2
2024-01-12 17:29:08.888 - DEBUG - separator - Python package: torchaudio not installed
2024-01-12 17:29:08.919 - INFO - separator - CUDA is available in Torch, setting Torch device to CUDA
2024-01-12 17:29:08.920 - INFO - separator - ONNXruntime has CUDAExecutionProvider available, enabling acceleration
2024-01-12 17:29:08.920 - DEBUG - separator - Apple Silicon MPS/CoreML not available in Torch installation. If you expect this to work, please see README
2024-01-12 17:29:08.920 - INFO - separator - Loading model UVR-MDX-NET-Voc_FT...
2024-01-12 17:29:08.920 - DEBUG - separator - Model path set to ./mdx/UVR-MDX-NET-Voc_FT.onnx
2024-01-12 17:29:08.922 - DEBUG - separator - Reading model settings...
2024-01-12 17:29:09.037 - DEBUG - separator - Model ./mdx/UVR-MDX-NET-Voc_FT.onnx has hash 77d07b2667ddf05b9e3175941b4454a0
2024-01-12 17:29:09.037 - DEBUG - separator - Model data path set to ./mdx/model_data.json
2024-01-12 17:29:09.039 - DEBUG - separator - Loading model data...
2024-01-12 17:29:09.042 - DEBUG - separator - Model data loaded: {'compensate': 1.021, 'mdx_dim_f_set': 3072, 'mdx_dim_t_set': 8, 'mdx_n_fft_scale_set': 7680, 'primary_stem': 'Vocals'}
2024-01-12 17:29:09.042 - DEBUG - separator - Model params: primary_stem=Vocals, secondary_stem=Instrumental
2024-01-12 17:29:09.042 - DEBUG - separator - Model params: batch_size=200, compensate=1.021, segment_size=256, dim_f=3072, dim_t=256
2024-01-12 17:29:09.043 - DEBUG - separator - Model params: n_fft=7680, hop=1024
2024-01-12 17:29:09.043 - DEBUG - separator - Loading ONNX model for inference...
2024-01-12 17:29:09.415 - DEBUG - separator - Model loaded successfully using ONNXruntime inferencing session.
2024-01-12 17:29:09.415 - DEBUG - separator - Loading model completed.
2024-01-12 17:29:09.415 - INFO - separator - Load model duration: 00:00:00
2024-01-12 17:29:09.415 - INFO - separator - Starting separation process for audio_file_path: audio.wav
2024-01-12 17:29:09.415 - DEBUG - separator - Preparing mix...
2024-01-12 17:29:09.415 - DEBUG - separator - Loading audio from file: audio.wav
2024-01-12 17:29:15.120 - DEBUG - separator - Audio loaded. Sample rate: 44100, Audio shape: (2, 21051392)
2024-01-12 17:29:15.133 - DEBUG - separator - Audio file is valid and contains data.
2024-01-12 17:29:15.133 - DEBUG - separator - Mix preparation completed.
2024-01-12 17:29:15.133 - DEBUG - separator - Normalizing mix before demixing...
2024-01-12 17:29:15.162 - DEBUG - spec_utils - Maximum peak amplitude above clipping threshold, normalizing from 1.0 to max peak 0.9.
2024-01-12 17:29:15.172 - DEBUG - separator - Starting demixing process with is_match_mix: False...
2024-01-12 17:29:15.172 - DEBUG - separator - Initializing model settings...
2024-01-12 17:29:15.181 - DEBUG - separator - Model input params: n_fft=7680 hop_length=1024 dim_f=3072
2024-01-12 17:29:15.181 - DEBUG - separator - Model settings: n_bins=3841, trim=3840, chunk_size=261120, gen_size=253440
2024-01-12 17:29:15.181 - DEBUG - separator - Original mix stored. Shape: (2, 21051392)
2024-01-12 17:29:15.181 - DEBUG - separator - Standard chunk size: 261120, Overlap: 0.25
2024-01-12 17:29:15.181 - DEBUG - separator - Generated size calculated: 253440
2024-01-12 17:29:15.219 - DEBUG - separator - Mixture prepared with padding. Mixture shape: (2, 21296640)
2024-01-12 17:29:15.219 - DEBUG - separator - Step size for processing chunks: 195840 as overlap is set to 0.25.
2024-01-12 17:29:15.219 - DEBUG - separator - Total chunks to process: 109
2024-01-12 17:29:15.219 - DEBUG - separator - Processing chunk 1/109: Start 0, End 261120
2024-01-12 17:29:15.223 - DEBUG - separator - Window applied to the chunk.
/usr/local/lib/python3.10/dist-packages/audio_separator/separator/separator.py:630: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:261.)
mix_part = torch.tensor([mix_part_], dtype=torch.float32).to(self.device)
2024-01-12 17:29:15.303 - DEBUG - separator - Mix part split into batches. Number of batches: 1
2024-01-12 17:29:15.303 - DEBUG - separator - Processing mix_wave batch 1/1
/usr/local/lib/python3.10/dist-packages/torch/functional.py:650: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:863.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
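(Editor's note: the first UserWarning in the log above points at a common PyTorch inefficiency: building a tensor from a Python list of numpy arrays copies element by element. A minimal sketch of the fix the warning suggests, using simulated chunks rather than the project's actual code:)

```python
import numpy as np

# Simulated mix chunks, analogous to the mix_part_ slices in the log above
# (assumption: each chunk is a (channels, samples) float32 array).
chunks = [np.zeros((2, 1024), dtype=np.float32) for _ in range(4)]

# Slow path (triggers the UserWarning):
#   mix_part = torch.tensor([mix_part_], dtype=torch.float32)
# Fast path: collapse the list into one contiguous ndarray first, then convert:
batch = np.stack(chunks)  # shape (4, 2, 1024), single contiguous buffer
# mix_part = torch.from_numpy(batch).to(device)  # zero-copy, no warning
print(batch.shape)
```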

@YBMENACE
Author

Also, when increasing segment_size I get: WARNING - separator - Model converted from onnx to pytorch due to segment size not matching dim_t, processing may be slower.

@beveradb
Collaborator

Gotcha; that all looks right to me - you're using ONNX Runtime 1.17.0 so I assume you compiled it from source to get CUDA 12 support?

How long is it actually taking? In my experience, on a machine with CUDA GPU, a 4 minute track takes about 20-30 seconds to process.

If you check the full logs without the debug log level, there should be a couple of messages, e.g. Load model duration and Separation duration

@YBMENACE
Author

It takes duration: 60.22055721282959s for an 8-minute track. I have quite a large GPU but the model is only using 3-4 GB of it. I want to increase speed and I have no issues with high GPU usage.

@beveradb
Collaborator

Also, when increasing segment_size I get: WARNING - separator - Model converted from onnx to pytorch due to segment size not matching dim_t, processing may be slower.

Yeah that's expected if you change segment size, you can see where that comes from here:

https://github.com/karaokenerds/python-audio-separator/blob/main/audio_separator/separator/separator.py#L317

Why are you changing the segment size? That will certainly make things slower in my experience.

Out of curiosity, have you benchmarked / compared against running the same separation using UVR GUI?
Most of my separation code is either identical to that project or very closely aligned, there should be minimal differences.

@beveradb
Collaborator

It takes duration: 60.22055721282959s for an 8-minute track. I have quite a large GPU but the model is only using 3-4 GB of it. I want to increase speed and I have no issues with high GPU usage.

Gotcha, that time is pretty normal.

If you want to make it faster, you'll need to dig into the code and work out some way to optimize it!

The work which @nnyj did in this fork may help! https://github.com/nnyj/python-audio-separator-live#benchmark-results

See #3

PRs very welcome if you're able to improve performance :)

@beveradb beveradb added enhancement New feature or request help wanted Extra attention is needed labels Jan 12, 2024
@YBMENACE
Author

I was thinking about using ffmpeg to segment the file into 1-minute segments (variable length), then processing each and combining them in a final stage. What do you think of this approach?

@beveradb
Collaborator

It's certainly worth trying!
Is the theory that you could process those segments in parallel?

The codebase already has PyDub as a dependency:
https://github.com/karaokenerds/python-audio-separator/blob/main/audio_separator/separator/separator.py#L16

Which is a wrapper around ffmpeg and has an easy API for slicing audio into chunks:
https://github.com/jiaaro/pydub#quickstart

So you've got a bit of a headstart; though I'm not sure off the top of my head what the right approach to parallelizing would be.
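(Editor's note: a minimal sketch of the slicing approach described above. `chunk_bounds` is a hypothetical helper for computing segment ranges; `split_file` uses PyDub's real slicing API (`AudioSegment` indices are milliseconds) but the `.chunkN.wav` naming scheme is an assumption, not anything from the project.)

```python
from typing import List, Tuple

def chunk_bounds(total_ms: int, chunk_ms: int = 60_000) -> List[Tuple[int, int]]:
    """Compute (start, end) millisecond ranges covering the whole file."""
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]

def split_file(path: str, chunk_ms: int = 60_000) -> List[str]:
    """Slice an audio file into chunk files with PyDub (requires ffmpeg)."""
    from pydub import AudioSegment  # imported lazily; needs pydub + ffmpeg
    audio = AudioSegment.from_file(path)
    out_paths = []
    for i, (start, end) in enumerate(chunk_bounds(len(audio), chunk_ms)):
        out = f"{path}.chunk{i}.wav"  # hypothetical naming scheme
        audio[start:end].export(out, format="wav")  # PyDub slices in ms
        out_paths.append(out)
    return out_paths

# An 8-minute track (480,000 ms) in 1-minute chunks -> 8 ranges
print(chunk_bounds(480_000))
```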

Good luck, and feel free to email me if you want to schedule a pair programming call or any other knowledge transfer :)

@YBMENACE
Author

Hi beveradb,

I tried to run it in parallel! I segmented the file into 5 sections, but your app doesn't seem to handle multiprocessing, especially with one loaded model! It works with batching but not in parallel.

@beveradb
Collaborator

You'd need to modify the code... that's why this is open source, you can just fork it and work on it

@YBMENACE
Author

To be honest, I'm a newbie. If you would just let me know which file to dig into to fix this, I'll try!
Thanks for your responsiveness, really appreciate it!

@beveradb
Collaborator

All of the separation logic is in the main Separator class:

https://github.com/karaokenerds/python-audio-separator/blob/main/audio_separator/separator/separator.py

Good luck! If you haven't written much Python code before you'll probably need to do quite a bit of learning in order to get to the point where you can contribute, but there's a lot of tutorials online :)

If you want to organize a pair programming call at some point, feel free to email me with a suitable date/time and I'm happy to try and help!

@YBMENACE
Author

I was able to get it to work using threading.local and updating def separate(self, audio_file_path)! This way variables are stored locally for each thread, avoiding race conditions while running in parallel!
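(Editor's note: a minimal sketch of the thread-local pattern described above. `FakeSeparator` is a stand-in, since the actual changes to the real Separator class aren't shown in this thread; the point is that each thread lazily creates and reuses its own instance instead of sharing one.)

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_tls = threading.local()

class FakeSeparator:
    """Stand-in for audio_separator's Separator class (hypothetical).
    Loading a real model is expensive, so each thread builds one
    instance and reuses it for all of its segments."""
    def separate(self, segment: str) -> str:
        return f"vocals_{segment}"

def get_separator() -> FakeSeparator:
    # One instance per thread: avoids races on shared model state.
    if not hasattr(_tls, "separator"):
        _tls.separator = FakeSeparator()
    return _tls.separator

def process(segment: str) -> str:
    return get_separator().separate(segment)

segments = [f"seg{i}.wav" for i in range(5)]
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(process, segments))  # map preserves input order
print(results)
```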

@YBMENACE
Author

This cut processing time to a third for now; I'll try to optimize it and make it faster! By the way, for denoising, setting it to either true or false doesn't seem to have any impact. Is that only for me or for everyone? Or did I miss something in the call? Here is my code:

separator = Separator(
    denoise_enabled=True,
    model_file_dir="./mdx",
    output_single_stem="vocals",
    output_dir="vocals",
    log_level=logging.INFO,

@beveradb
Collaborator

beveradb commented Jan 13, 2024

Nice work! Good to know it works as a proof of concept!

If you could add the actual segmenting / thread management functionality to audio-separator and allow users to enable that via an initialization parameter using the class or CLI (e.g. --threads=4 or something), I'd welcome that PR 😄

@YBMENACE
Author

Once everything is optimized I'll ping you for an update! Just wanted to thank you for the quick response and help :) Good work

@beveradb
Collaborator

beveradb commented Feb 5, 2024

FYI @yassinebelatar you may want to check out the latest version of audio-separator (0.14.4 or above)

There's now support for newer models and VR arch models, some of which are much faster on my machine (e.g. 2_HP-UVR.pth) and there are more parameters exposed to you which can let you control the speed of inferencing:

MDX Architecture Parameters:
  --mdx_segment_size MDX_SEGMENT_SIZE                    larger consumes more resources, but may give better results (default: 256). Example: --mdx_segment_size=256
  --mdx_overlap MDX_OVERLAP                              amount of overlap between prediction windows, 0.001-0.999. higher is better but slower (default: 0.25). Example: --mdx_overlap=0.25
  --mdx_batch_size MDX_BATCH_SIZE                        larger consumes more RAM but may process slightly faster (default: 1). Example: --mdx_batch_size=4
  --mdx_hop_length MDX_HOP_LENGTH                        usually called stride in neural networks, only change if you know what you're doing (default: 1024). Example: --mdx_hop_length=1024

VR Architecture Parameters:
  --vr_batch_size VR_BATCH_SIZE                          number of batches to process at a time. higher = more RAM, slightly faster processing (default: 4). Example: --vr_batch_size=16
  --vr_window_size VR_WINDOW_SIZE                        balance quality and speed. 1024 = fast but lower, 320 = slower but better quality. (default: 512). Example: --vr_window_size=320
  --vr_aggression VR_AGGRESSION                          intensity of primary stem extraction, -100 - 100. typically 5 for vocals & instrumentals (default: 5). Example: --vr_aggression=2
  --vr_enable_tta                                        enable Test-Time-Augmentation; slow but improves quality (default: False). Example: --vr_enable_tta
  --vr_high_end_process                                  mirror the missing frequency range of the output (default: False). Example: --vr_high_end_process
  --vr_enable_post_process                               identify leftover artifacts within vocal output; may improve separation for some songs (default: False). Example: --vr_enable_post_process
  --vr_post_process_threshold VR_POST_PROCESS_THRESHOLD  threshold for post_process feature: 0.1-0.3 (default: 0.2). Example: --vr_post_process_threshold=0.1

@beveradb
Collaborator

FYI @yassinebelatar there's some sample code for splitting input audio into shorter segments, separating each, and then rejoining the separated parts afterwards in this comment:
#44 (comment)

You could potentially adapt something like that to launch multiple audio-separator processes in separate threads (or even perhaps in separate docker containers)
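(Editor's note: a minimal sketch of launching one audio-separator CLI process per segment from threads. The positional input file matches the CLI usage implied in this thread, but the `--output_dir` flag and overall invocation are assumptions; check `audio-separator --help` for the real interface.)

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List

def build_cmd(segment_path: str, output_dir: str) -> List[str]:
    # Hypothetical invocation; verify flag names against `audio-separator --help`.
    return ["audio-separator", segment_path, "--output_dir", output_dir]

def run_parallel(segments: List[str], output_dir: str = "out") -> None:
    """Run one separator process per segment concurrently."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        list(pool.map(
            lambda s: subprocess.run(build_cmd(s, output_dir), check=True),
            segments,
        ))

print(build_cmd("seg0.wav", "out"))
```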

Also, I'd encourage you to try some of the VR arch (.pth) models, as I find they provide equally good results to some of the MDX models in about half of the compute time. For example, 2_HP-UVR.pth is my go-to for simple vocal/instrumental split.

I'm going to close this issue now as I think there are several options you can explore to make more efficient use of your resources, but feel free to reply in here if you want to share your progress or get any more support with this!
