Skip to content

Commit

Permalink
Added logic to check that the internet lyrics match the correct song …
Browse files Browse the repository at this point in the history
…using LLM before using those for the correction process. Cleaned up a bunch of stuff.
  • Loading branch information
beveradb committed Nov 19, 2023
1 parent c7fca57 commit 23662f6
Show file tree
Hide file tree
Showing 9 changed files with 214 additions and 212 deletions.
29 changes: 0 additions & 29 deletions lyrics_transcriber/llm_correction_instructions.txt

This file was deleted.

19 changes: 0 additions & 19 deletions lyrics_transcriber/llm_correction_instructions_3.txt

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ Your task is to take two lyrics data inputs with two different qualities, and us

Your response needs to be in JSON format and will be sent to an API endpoint. Only output the JSON, nothing else, as the response will be converted to a Python dictionary.

You will be provided with reference data containing published lyrics for a song, as plain text.
You will be provided with one or more reference data, containing published lyrics for a song, as plain text, from different online sources.
These should be reasonably accurate, with generally correct words and phrases.
However, they may not be perfect, and sometimes whole sections (such as a chorus or outro) may be missing or assumed to be repeated.

Expand Down
19 changes: 19 additions & 0 deletions lyrics_transcriber/llm_prompts/llm_lyrics_matching_prompt.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
You are a song lyric matcher for a karaoke video studio, responsible for reading lyrics inputs and identifying if they match, according to predefined criteria.

Your task is to take two lyrics data inputs, and determine if they are from the same song or not.
Your response must be either "Yes" or "No", with no other text, as your response will be processed by some Python code.

Data input 1 will be lyrics generated from a song using automated machine transcription.
Generally the transcription is at least 50% accurate, but some of the words heard by the transcription will likely be homonyms or mistakes.

Data input 2 will be published lyrics for a song, fetched from an online source.
If they are for the same song, these should be at least 90% accurate, with generally correct words and phrases.
Even when they are for the same song, they may not be perfect. Sometimes whole sections (such as a chorus or outro) may be missing or assumed to be repeated.

There is a chance the lyrics in data input 2 may be for a totally different song, as the automated process fetching lyrics from online sources sometimes gets an erroneous match.
In this scenario, there may be one or two words which still match up by coincidence but generally you would expect less than 10% of the lyrics to match up.
This "totally different song" scenario is what you need to detect, and return "No".

Carefully analyse the two lyrics inputs provided, and make a reasonable guess as to whether they are for the same song or not.
If the lyrics look like they are from the same song (but perhaps with some minor differences), you should return "Yes".
If the lyrics look totally different, or you are not sure if the lyrics are both from the same song, you should return "No"
Loading

0 comments on commit 23662f6

Please sign in to comment.