Add scrolling lyrics for Karaoke videos #8

Closed
arsaboo opened this issue Aug 18, 2023 · 3 comments

@arsaboo

arsaboo commented Aug 18, 2023

I'm looking for a way to create karaoke videos with scrolling lyrics. This tool works well to remove the vocals, but it would be nice to have some way to transcribe the audio and create a video with scrolling lyrics. Any examples that use other tools like Whisper would be appreciated.

@beveradb
Collaborator

beveradb commented Aug 20, 2023

Hey @arsaboo - thanks for reaching out! Always nice to meet a fellow home automation and karaoke nerd ;)

I'm actually working on the same thing over in lyrics-transcriber and karaoke-generator 😄

The idea is to make a free and open source CLI tool which can take any local audio file (or a YouTube URL), separate the audio, transcribe the lyrics with word-level timestamps, and generate a karaoke video which is good enough for most people. It's still a work in progress though: the biggest unsolved challenge is still the lyrics transcription, which unfortunately needs a ton more work.

That project is just a wrapper around a couple of other projects:

  • yt-dlp to fetch videos from YouTube for convenience
  • audio-separator to separate the audio for an instrumental backing track
  • lyrics-transcriber, which runs whisper-timestamped and attempts to sync the transcription with known lyrics fetched from Genius or Spotify. Note: this part is very much still a WIP. The transcription and lyrics fetch are both complete, but the hard part (matching these up and correcting the transcribed lyrics based on the real lyrics) is something I haven't started working on yet.
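
To make the matching step in the last bullet concrete, here is a minimal sketch of one possible approach. It is not the method used in lyrics-transcriber (that part had not been started at the time of writing). It assumes the transcription is available as a list of `(word, start, end)` tuples and the known lyrics as a plain list of words; the function name `correct_words` is hypothetical. It uses the standard library's `difflib.SequenceMatcher` to align the two word sequences and swaps in the reference spelling wherever the alignment is unambiguous.

```python
from difflib import SequenceMatcher


def correct_words(transcribed, reference):
    """Replace transcribed words with reference words where they align.

    transcribed: list of (word, start_seconds, end_seconds) tuples (assumed shape).
    reference:   list of words from the known lyrics.
    Timestamps always come from the transcription; only the text is corrected.
    """
    heard = [word.lower() for word, _, _ in transcribed]
    known = [word.lower() for word in reference]
    matcher = SequenceMatcher(a=heard, b=known, autojunk=False)

    corrected = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        one_to_one = (i2 - i1) == (j2 - j1)
        if tag == "equal" or (tag == "replace" and one_to_one):
            # Same number of words on both sides: trust the reference text.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                _, start, end = transcribed[i]
                corrected.append((reference[j], start, end))
        else:
            # Ambiguous region: keep whatever was heard. Reference words with
            # no transcribed counterpart are dropped because they have no timing.
            corrected.extend(transcribed[i1:i2])
    return corrected


if __name__ == "__main__":
    heard = [("hello", 0.0, 0.4), ("darkness", 0.5, 1.0), ("my", 1.1, 1.2),
             ("old", 1.3, 1.5), ("fend", 1.6, 2.0)]
    lyrics = ["Hello", "darkness", "my", "old", "friend"]
    for word in correct_words(heard, lyrics):
        print(word)
```

A real implementation would need to handle regions where the word counts differ (merged or split words, ad-libs, repeated choruses), which this sketch deliberately leaves untouched.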

Once I have the lyrics transcriber working and generating reasonably accurate lyrics files in LRC and ASS formats, the generation of the actual video output will likely be a third library, e.g. python-scrolling-lyrics-renderer or something, with the intention of that part taking a lyrics file, audio file, optional background image/video, and outputting various formats, e.g. a video in MP4 but also a CDG+MP3 for traditional karaoke systems.
Then, the top-level karaoke-generator tool becomes a simple CLI wrapper tying it all together, leaving the individual projects usable by other people with their own use cases.
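
For reference, line-level LRC output is simple to produce once timings exist. The sketch below is illustrative only and is not taken from any of the projects mentioned: it assumes the synced lyrics are a list of `(start_seconds, text)` pairs, and the helper names `lrc_timestamp` and `to_lrc` are made up for this example. It emits the common `[mm:ss.xx]text` line format.

```python
def lrc_timestamp(seconds):
    """Format a time in seconds as an LRC tag, e.g. 75.5 -> [01:15.50]."""
    total_cs = round(seconds * 100)          # work in whole centiseconds
    minutes, remainder = divmod(total_cs, 6000)
    secs, centis = divmod(remainder, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def to_lrc(lines):
    """lines: iterable of (start_seconds, text) pairs, one per lyric line."""
    return "\n".join(f"{lrc_timestamp(start)}{text}" for start, text in lines)


if __name__ == "__main__":
    print(to_lrc([(0.0, "First line"), (12.34, "Second line"), (75.5, "Third line")]))
```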

If you'd be interested in collaborating on the bits which aren't yet complete, that's always welcome! I'd even happily hop on a call and talk you through how things work or get you set up etc. Feel free to message me directly if you're keen.

As for examples of other tools, there are two things I know of which are probably of interest to you:

  • The Tuul (the-tuul.com) is an open-source web based tool for making karaoke videos, and the author is lovely :) It does audio separation (though, currently using the outdated spleeter model) and video generation using ffmpeg, but the lyrics syncing is expected to be handled by the user through the web UI.

  • Youka (youka.io) does everything fully automatically, and is worth installing and trying so you can see what the limitations are - it does a pretty impressive job all things considered! It's the main evidence I have that this is viable at all. Sadly it's no longer open-source; the author now has a pretty reasonable monthly subscription to cover his server costs and dev time etc. However, you can still take some inspiration from a 3-year-old version of the code from before he pulled it, here: https://github.com/beveradb/youka-desktop/tree/master/src/lib

@arsaboo
Author

arsaboo commented Aug 21, 2023

@beveradb This is phenomenal and exactly what I am after. I will check your repositories out. Hopefully, I can contribute and plug some of the holes. We can get on a call once I have had a chance to explore.

@beveradb
Collaborator

beveradb commented Aug 21, 2023

Awesome, glad to hear :)

Hope you don't mind @arsaboo but as this isn't exactly audio-separation related, I'm going to close this issue and move discussion / progress over to nomadkaraoke/python-lyrics-transcriber#1

I've also written up a bunch more detail in that issue, e.g. a rough outline of my proposed approach to improving the quality of synced lyrics, which in my opinion is the main unsolved challenge with fully automatic karaoke generation!

(the scrolling lyrics rendering is actually already pretty much solved using ASS and ffmpeg, as seen in the_tuul by @incidentist, so will be easy to integrate into https://github.com/karaokenerds/karaoke-generator once the lyrics transcription is good enough)
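
As a rough illustration of the ASS plus ffmpeg approach, the sketch below builds one ASS `Dialogue` event with `\k` karaoke tags (durations in centiseconds) from word timings, and assembles an ffmpeg command that burns an ASS file over a plain black background. This is not the code from the_tuul or karaoke-generator. The helper names, the `(text, start, end)` input shape, and the file names are all assumptions, the `ass` filter requires an ffmpeg build with libass, and a complete `.ass` file also needs `[Script Info]`, `[V4+ Styles]`, and `[Events]` sections that are omitted here.

```python
def ass_time(seconds):
    """Format seconds as an ASS timestamp, H:MM:SS.cc."""
    total_cs = round(seconds * 100)
    hours, remainder = divmod(total_cs, 360000)
    minutes, remainder = divmod(remainder, 6000)
    secs, centis = divmod(remainder, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{centis:02d}"


def karaoke_dialogue(words):
    """Build one ASS Dialogue event for a lyric line.

    words: non-empty list of (text, start_seconds, end_seconds), in order.
    Each word's highlight lasts until the next word starts, so pauses between
    words are absorbed into the preceding word.
    """
    line_start = words[0][1]
    line_end = words[-1][2]
    parts = []
    for index, (text, start, _end) in enumerate(words):
        is_last = index + 1 == len(words)
        next_start = line_end if is_last else words[index + 1][1]
        duration_cs = round((next_start - start) * 100)
        parts.append(f"{{\\k{duration_cs}}}{text}")
    text_field = " ".join(parts)
    return (f"Dialogue: 0,{ass_time(line_start)},{ass_time(line_end)},"
            f"Default,,0,0,0,,{text_field}")


def ffmpeg_command(audio_path, ass_path, output_path):
    """Argument list for rendering subtitles over a black 1280x720 background.

    Assumes simple file names; paths with special characters would need
    escaping inside the filter argument.
    """
    return ["ffmpeg", "-f", "lavfi", "-i", "color=c=black:s=1280x720",
            "-i", audio_path, "-vf", f"ass={ass_path}",
            "-shortest", output_path]


if __name__ == "__main__":
    print(karaoke_dialogue([("Hello", 1.0, 1.4), ("world", 1.5, 2.0)]))
    print(" ".join(ffmpeg_command("instrumental.mp3", "lyrics.ass", "karaoke.mp4")))
```

The command is only constructed, not executed; passing the list to `subprocess.run` would perform the render.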
