More explicit and fine tuning command line arguments #7

Open

porg opened this issue Feb 4, 2024 · 2 comments

porg commented Feb 4, 2024

I do not know the exact current capabilities of the app, nor which features you may be willing to implement or make user accessible.
If I could sketch out the future of lyrics-transcriber, its manpage would read like this:

Synopsis

lyrics-transcriber [ options ] [ unsynchronized-lyrics-file ] <audio-file>

audio-file — Must be supplied. Needed for base operation.
unsynchronized-lyrics-file — If supplied then spelling, punctuation, line-wrappings are preserved as-is and the service of the app is to create the correlated timestamps.
- Helpful for lesser known languages, aid it further by specifying --language-input.
- Or for strict editorial control for well supported languages to get exactly the orthography, punctuation and line-wrappings you want.
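
To illustrate the synopsis, here is a minimal sketch of how the two positional inputs could be told apart. It assumes Python's argparse; `build_parser` and `split_inputs` are hypothetical names and not part of the current lyrics-transcriber code.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="lyrics-transcriber")
    # One or two paths: an optional lyrics file followed by the audio file.
    parser.add_argument("files", nargs="+", metavar="file")
    return parser


def split_inputs(files):
    """Return (lyrics_file, audio_file); the audio file is always last."""
    if len(files) == 1:
        return None, files[0]
    if len(files) == 2:
        return files[0], files[1]
    raise ValueError("expected [unsynchronized-lyrics-file] <audio-file>")


# Example invocation with both inputs supplied (file names are made up).
args = build_parser().parse_args(["lyrics.txt", "song.flac"])
lyrics_file, audio_file = split_inputs(args.files)
```

Treating the last path as the audio file keeps the required argument unambiguous whether or not a lyrics file is given.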

Optional arguments (multiple values are comma-separated)

-i --language-input (one or more: IETF Language Tags)
- Aid processing by explicitly stating what language(s) and/or regional variant(s) and/or dialect(s) occur(s) in the input.
- Order of supplied arguments carries no meaning/priority.
-v --voices [ <n> || VoiceName1,VoiceName2,…,VoiceName3 ]
- Aid processing by explicitly stating the number of voices in the input
- Either just as a number, or by naming the voices (in order of occurrence in audio)
-V --voice-isolation <n>
- How aggressively it tries to separate the voice track
- 1 (moderate) to 99 (extreme)
- 0 disables voice isolation entirely; use with a voice-only audio-file to expedite processing
-c --correlate-lyrics (one or more services: all, genius, spotify)
- Correlate initial speech-to-text results against lyrics databases to further improve the results.
-k --api-key-openai <key>
- Required for online features such as --correlate-lyrics.
- Supply as argument or environment variable API_KEY_OPENAI or in ~/.conf/lyrics-transcriber.cfg
-e --export (one or more of these: all, ass, json, lrc-midico, mp4, srt)
-o --language-output (one or more: IETF Language Tags)
- Machine translations to be included in each of the --export formats
- All-in-one file formats get all translations integrated in one file
  - JSON
  - MP4 (embedded as subtitle tracks)
- Lyrics/subtitle files get one file per each machine translation
  - e.g. file.en.ass, file.en.lrc, file.fr.ass and file.fr.lrc
    according to --filename template containing a %language% token.
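
As a rough sketch of how the options above might be parsed, assuming Python's argparse: the helpers `csv_list`, `voices` and `isolation_level` are hypothetical, and the defaults are guesses except for `--export`, whose default of `all` comes from the shorthand list further down. `--api-key-openai` is left out because its lookup order (argument, environment variable, config file) needs more than a type converter.

```python
import argparse


def csv_list(value):
    """Split a comma-separated argument into a list of non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def voices(value):
    """Accept either a voice count ("3") or a list of voice names."""
    return int(value) if value.isdigit() else csv_list(value)


def isolation_level(value):
    """Accept 0 (disabled) through 99 (extreme)."""
    level = int(value)
    if not 0 <= level <= 99:
        raise argparse.ArgumentTypeError("voice isolation must be between 0 and 99")
    return level


parser = argparse.ArgumentParser(prog="lyrics-transcriber")
parser.add_argument("-i", "--language-input", type=csv_list, default=[])
parser.add_argument("-v", "--voices", type=voices)
parser.add_argument("-V", "--voice-isolation", type=isolation_level)
parser.add_argument("-c", "--correlate-lyrics", type=csv_list, default=[])
parser.add_argument("-e", "--export", type=csv_list, default=["all"])
parser.add_argument("-o", "--language-output", type=csv_list, default=[])

# Example invocation; the values are made up for illustration.
args = parser.parse_args(
    ["-i", "de-AT,en", "-v", "Alice,Bob", "-V", "40", "-e", "ass,srt"]
)
```

Using small converter functions as `type=` keeps validation next to each option, and argparse turns their errors into the usual usage message.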

Filename pattern for the output file(s)

--filename "<filename template>"
- Default: %basename%.%language%.%ext%
- Literal characters are used as-is.
- Variables are wrapped within percentage symbols "%"
- %basename% — The input file's basename, that is the name without the file extension.
- Literal character strings such as "." or "--" or " karaoke " get inserted as-is.
- %language% — The language of the machine translation version.
- %ext% — The file extension which is to be used for the respective --export file format.
- %hash% — The hash checksum of the input file.
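
The template rules above could be implemented roughly as follows. This is only a sketch: `expand_template` is a hypothetical helper, and the caller is assumed to supply the values (how `%hash%` is computed is not specified here, so it is not shown).

```python
import re


def expand_template(template, values):
    """Replace each %name% token with values[name]; literals pass through."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"unknown template variable: %{name}%")
        return values[name]

    return re.sub(r"%(\w+)%", substitute, template)


# Default template applied to a French translation exported as .ass
name = expand_template(
    "%basename%.%language%.%ext%",
    {"basename": "song", "language": "fr", "ext": "ass"},
)
print(name)  # song.fr.ass
```

Rejecting unknown variables, rather than leaving them in the filename, is a design choice of this sketch and would surface typos in the template early.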

Shorthand arguments

-a → --export=all → exports all available file formats (=default)
-A → --export=ass
-J → --export=json
-L → --export=lrc-midico
-M → --export=mp4
-S → --export=srt
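
One possible way to wire up these shorthands, again assuming argparse: each flag appends its format to the same `export` list via the `append_const` action. The `SHORTHANDS` table and the fallback to `all` are illustrative.

```python
import argparse

# Shorthand flag -> export format, as listed above.
SHORTHANDS = {
    "-a": "all",
    "-A": "ass",
    "-J": "json",
    "-L": "lrc-midico",
    "-M": "mp4",
    "-S": "srt",
}

parser = argparse.ArgumentParser(prog="lyrics-transcriber")
for flag, fmt in SHORTHANDS.items():
    # All shorthands share one destination, so they can be combined freely.
    parser.add_argument(flag, dest="export", action="append_const", const=fmt)

args = parser.parse_args(["-A", "-S"])
# No shorthand given means "export everything", matching the stated default.
formats = args.export or ["all"]
```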

Collaborator

beveradb commented Feb 5, 2024

Hey @porg, just wanted to let you know that I've read your comments, appreciate your interest in this project, and agree with a lot of what you've said!

I'll give you proper, detailed responses when I have a break from other work (likely in a few days) but just wanted to say I'm very open to feedback and suggestions and willing to implement some things in this repo, but of course pull requests with code (or documentation, tests, etc) are much appreciated!

If you wanted to chat sooner I'm on the diveBar karaoke discord (https://m.youtube.com/c/diveBarKaraoke) as @beveradb and very happy to have a call and talk you through things!

Author

porg commented Feb 5, 2024

Thanks for the quick first response ahead of a more detailed one, and for the offer of a direct call/chat. I have contacted you on that other communication channel and am looking forward to a conversation.
