More explicit and fine tuning command line arguments #7

Open
porg opened this issue Feb 4, 2024 · 2 comments

Comments

porg commented Feb 4, 2024

  • I do not know the exact current capabilities of the app.
  • Nor what features you may be willing to implement or make user accessible.
  • If I could sketch out the future of lyrics-transcriber then its manpage would read like this:

Synopsis

lyrics-transcriber [ options ] [ unsynchronized-lyrics-file ] <audio-file>

  • audio-file — Must be supplied. Needed for base operation.
  • unsynchronized-lyrics-file — If supplied, its spelling, punctuation, and line wrapping are preserved as-is, and the app only has to create the matching timestamps.
    • Helpful for lesser-known languages; specify --language-input to aid it further.
    • Also useful for well-supported languages when you want strict editorial control over the exact orthography, punctuation, and line wrapping.
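
The positional part of this synopsis (an optional lyrics file followed by a required audio file) could be resolved as sketched below. This is only an illustration of the proposal, not the app's current behavior; the function name `split_positionals` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def split_positionals(files: Sequence[str]) -> Tuple[Optional[str], str]:
    """Return (lyrics_file, audio_file) from the trailing positional arguments.

    Hypothetical helper: the last argument is always the audio file, and an
    optional unsynchronized lyrics file may precede it.
    """
    if len(files) == 1:
        return None, files[0]
    if len(files) == 2:
        return files[0], files[1]
    raise ValueError("expected: [unsynchronized-lyrics-file] <audio-file>")
```

With one file, only the audio is set; with two, the first is treated as the lyrics file.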

Optional arguments (multiple values are comma-separated)

  • -i --language-input (one or more: IETF Language Tags)
    • Aid processing by explicitly stating which languages, regional variants, or dialects occur in the input.
    • The order of the supplied values carries no meaning or priority.
  • -v --voices [ <n> | VoiceName1,VoiceName2,…,VoiceNameN ]
    • Aid processing by explicitly stating the number of voices in the input.
    • Either just as a number, or by naming the voices (in order of occurrence in the audio).
  • -V --voice-isolation <n>
    • How aggressively the app tries to separate the voice track.
    • 1 (moderate) to 99 (extreme).
    • 0 disables voice isolation entirely; use it with a voice-only audio-file to expedite processing.
  • -c --correlate-lyrics (one or more services: all, genius, spotify)
    • Correlate initial speech-to-text results against lyrics databases to further improve the results.
  • -k --api-key-openai <key>
    • Required for online features such as --correlate-lyrics.
    • Supply as argument or environment variable API_KEY_OPENAI or in ~/.conf/lyrics-transcriber.cfg
  • -e --export (one or more of these: all, ass, json, lrc-midico, mp4, srt)
  • -o --language-output (one or more: IETF Language Tags)
    • Machine translations to be included in each of the --export formats
    • All-in-one file formats get all translations integrated in one file
      • JSON
      • MP4 (embedded as subtitle tracks)
    • Lyrics/subtitle files get one file per machine translation
      • e.g. file.en.ass, file.en.lrc, file.fr.ass and file.fr.lrc,
        according to the --filename template containing a %language% token.
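
A minimal sketch of how the comma-separated options above might be parsed with Python's standard `argparse`. The option names come from the proposal; the helper names (`comma_list`, `voices`, `isolation_level`, `build_parser`) are hypothetical, and nothing here reflects the app's actual implementation.

```python
import argparse
from typing import List, Union


def comma_list(value: str) -> List[str]:
    """Split a comma-separated option value into its non-empty parts."""
    return [part.strip() for part in value.split(",") if part.strip()]


def voices(value: str) -> Union[int, List[str]]:
    """Accept either a voice count ("3") or comma-separated voice names."""
    if value.strip().isdigit():
        return int(value)
    return comma_list(value)


def isolation_level(value: str) -> int:
    """Accept 0 (isolation disabled) through 99 (extreme)."""
    level = int(value)
    if not 0 <= level <= 99:
        raise argparse.ArgumentTypeError("voice isolation must be between 0 and 99")
    return level


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="lyrics-transcriber")
    parser.add_argument("-i", "--language-input", type=comma_list, default=[])
    parser.add_argument("-v", "--voices", type=voices)
    parser.add_argument("-V", "--voice-isolation", type=isolation_level)
    parser.add_argument("-c", "--correlate-lyrics", type=comma_list, default=[])
    # The proposal names "all" as the default export.
    parser.add_argument("-e", "--export", type=comma_list, default=["all"])
    parser.add_argument("-o", "--language-output", type=comma_list, default=[])
    return parser
```

For example, `build_parser().parse_args(["-i", "de-AT,en", "-v", "Alice,Bob"])` would yield a list of two language tags and a list of two voice names.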

Filename pattern for the output file(s)

  • --filename "<filename template>"
    • Default: %basename%.%language%.%ext%
    • Variables are wrapped in percent signs ("%").
    • Literal characters and strings such as "." or "--" or " karaoke " are inserted as-is.
    • %basename% — The input file's basename, that is, the name without the file extension.
    • %language% — The language of the machine translation version.
    • %ext% — The file extension which is to be used for the respective --export file format.
    • %hash% — The hash checksum of the input file.
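
The template rules above could be implemented roughly as follows. This is a sketch under assumptions: the function name `render_filename` is hypothetical, token values are passed in by the caller (the proposal does not say which hash algorithm %hash% would use), and unknown tokens are rejected rather than left in place.

```python
import re
from typing import Mapping

# A token is a word wrapped in percent signs, e.g. %basename%.
TOKEN = re.compile(r"%(\w+)%")


def render_filename(template: str, values: Mapping[str, str]) -> str:
    """Replace each %token% with its value; literal text is kept as-is."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"unknown filename token: %{name}%")
        return values[name]

    return TOKEN.sub(substitute, template)
```

With the default template `%basename%.%language%.%ext%` and the values `song`, `fr` and `ass`, this produces `song.fr.ass`.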

Shorthand arguments

  • -a → --export=all → exports all available file formats (default)
  • -A → --export=ass
  • -J → --export=json
  • -L → --export=lrc-midico
  • -M → --export=mp4
  • -S → --export=srt
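
One possible way to support these shorthands is to rewrite them into their long form before regular option parsing. The sketch below assumes exactly the mapping listed above; the names `SHORTHANDS` and `expand_shorthands` are hypothetical.

```python
from typing import Dict, List, Sequence

# Shorthand flag -> export format, as listed in the proposal.
SHORTHANDS: Dict[str, str] = {
    "-a": "all",
    "-A": "ass",
    "-J": "json",
    "-L": "lrc-midico",
    "-M": "mp4",
    "-S": "srt",
}


def expand_shorthands(argv: Sequence[str]) -> List[str]:
    """Rewrite shorthand flags into --export=<format>; pass everything else through."""
    return [
        f"--export={SHORTHANDS[arg]}" if arg in SHORTHANDS else arg
        for arg in argv
    ]
```

How several shorthands in one call (say `-A -S`) should combine is not specified in the proposal; a real parser would have to merge the resulting --export values.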

beveradb (Collaborator) commented Feb 5, 2024

Hey @porg, just wanted to let you know that I've read your comments, appreciate your interest in this project, and agree with a lot of what you've said!

I'll give you proper, detailed responses when I have a break from other work (likely in a few days) but just wanted to say I'm very open to feedback and suggestions and willing to implement some things in this repo, but of course pull requests with code (or documentation, tests, etc) are much appreciated!

If you wanted to chat sooner I'm on the diveBar karaoke discord (https://m.youtube.com/c/diveBarKaraoke) as @beveradb and very happy to have a call and talk you through things!

porg (Author) commented Feb 5, 2024

Thanks for the quick first response ahead of a more detailed one, and for the offer of a direct call/chat. I have contacted you on that other communication channel and am looking forward to a conversation.
