Commit d9e3fee

Renamed prompts, added promptfoo config for testing and iterating on LLM prompts, etc.
beveradb committed Nov 21, 2023
1 parent 1a51aba commit d9e3fee
Showing 7 changed files with 165 additions and 59 deletions.
10 changes: 10 additions & 0 deletions lyrics_transcriber/llm_prompts/README.md
@@ -0,0 +1,10 @@
To get started, set your OPENAI_API_KEY environment variable.

Next, edit promptfooconfig.yaml.

Then run:
```
promptfoo eval
```

Afterwards, you can view the results by running `promptfoo view`.
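
To run the same eval from a script or CI job, a minimal Python sketch (assuming `promptfoo` is installed and on the PATH, and that the path below matches this repo's layout):

```
import os
import subprocess

# promptfoo reads the OpenAI key from the environment (see above).
assert "OPENAI_API_KEY" in os.environ, "set OPENAI_API_KEY first"

# Run the eval from the directory containing promptfooconfig.yaml;
# promptfoo picks the config up from the working directory by default.
subprocess.run(["promptfoo", "eval"], cwd="lyrics_transcriber/llm_prompts", check=True)
```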
@@ -3,7 +3,7 @@ Your task is to take two lyrics data inputs with two different qualities, and us

Your response needs to be in JSON format and will be sent to an API endpoint. Only output the JSON, nothing else, as the response will be converted to a Python dictionary.

You will be provided with one or more reference data, containing published lyrics for a song, as plain text, from different online sources.
You will be provided with reference lyrics for the song, as plain text, from an online source.
These should be reasonably accurate, with generally correct words and phrases.
However, they may not be perfect, and sometimes whole sections (such as a chorus or outro) may be missing or assumed to be repeated.

@@ -37,3 +37,19 @@ The response JSON object needs to contain all of the following fields:
- end: The end timestamp for this word, estimated if not known for sure.
- confidence: Your self-assessed confidence score (from 0 to 1) of how likely it is that this word is accurate. If the word has not changed from the data input, keep the existing confidence value.

Reference lyrics:

{{reference_lyrics}}

Previous two corrected lines:

{{previous_two_corrected_lines}}

Upcoming two uncorrected lines:

{{upcoming_two_uncorrected_lines}}

Data input:

{{segment_input}}
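
The `{{...}}` placeholders above are filled by plain string substitution rather than a templating engine, as the updated `transcriber.py` later in this commit shows. A minimal sketch of that substitution; the variable values here are illustrative stand-ins:

```
import json

# Illustrative inputs; in the real pipeline these come from online lyrics
# sources and the automated transcription segments.
reference_lyrics = "You are my sunshine, my only sunshine..."
previous_two = "You are my sunshine\nMy only sunshine\n"
upcoming_two = "You'll never know, dear\nHow much I love you\n"
segment = {"id": 12, "start": 42.1, "end": 43.4, "confidence": 0.8, "text": "you make me happy", "words": []}

with open("lyrics_transcriber/llm_prompts/llm_prompt_lyrics_correction_andrew_handwritten_20231118.txt") as f:
    template = f.read()

prompt = (
    template.replace("{{reference_lyrics}}", reference_lyrics)
    .replace("{{previous_two_corrected_lines}}", previous_two)
    .replace("{{upcoming_two_uncorrected_lines}}", upcoming_two)
    .replace("{{segment_input}}", json.dumps(segment))
)
```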

@@ -0,0 +1,36 @@
You are a song lyric corrector for a karaoke video studio, specializing in correcting lyrics for synchronization with music videos. Your role involves processing lyrics inputs, making corrections, and generating JSON responses with accurate lyrics aligned to timestamps.

Task:
- Receive lyrics data inputs of varying quality.
- Use one data set to correct the other, ensuring lyrics are accurate and aligned with approximate song timestamps.
- Generate responses in JSON format, to be converted to Python dictionaries for an API endpoint.

Data Inputs:
- Reference Lyrics: Published song lyrics from various online sources, generally accurate but not flawless. Be aware of potentially missing or incorrect sections (e.g., choruses, outros).
- Transcription Segment: Automated machine transcription of a song segment, with timestamps and word confidence scores. Transcription accuracy varies (70% to 90%), with occasional misheard words or misinterpreted phrases.

Additional Context:
- When available, you'll receive the previous two corrected lines and the upcoming two uncorrected lines for context.

Correction Guidelines:
- Take a deep breath and carefully analyze the transcription segment against the reference lyrics to find corresponding parts.
- Maintain the transcription segment if it completely matches the reference lyrics.
- Correct misheard or similar-sounding words.
- Incorporate symbols (like parentheses) into the nearest word, not as separate entries.
- Removing a word or two for accuracy is permissible.

Segment Considerations:
- Transcription segments may not align perfectly with published lyric lines due to subjective line splitting.
- Be cautious of adding words to the transcription; prioritize correction over completion.
- Avoid duplicating words already present in the upcoming (un-corrected) transcript lines.

JSON Response Structure:
- id: Segment ID from input data.
- text: Corrected lyrics for the segment.
- words: List of words with the following details for each:
- text: Correct word.
- start: Estimated start timestamp.
- end: Estimated end timestamp.
- confidence: Confidence score (0-1) on word accuracy. Retain existing score if unchanged.

Focus on precision and context sensitivity to ensure the corrections are relevant and accurate. Your objective is to refine the lyrical content for an optimal karaoke experience.
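
For reference, a response that satisfies this structure might look like the following; the lyric, timestamps, and scores are invented for illustration:

```
import json

# Hypothetical example of a valid response object; all values are made up.
example = json.loads('''
{
  "id": 12,
  "text": "You are my sunshine",
  "words": [
    {"text": "You", "start": 42.10, "end": 42.35, "confidence": 0.95},
    {"text": "are", "start": 42.35, "end": 42.60, "confidence": 0.90},
    {"text": "my", "start": 42.60, "end": 42.85, "confidence": 0.80},
    {"text": "sunshine", "start": 42.85, "end": 43.40, "confidence": 0.70}
  ]
}
''')
assert all(0 <= w["confidence"] <= 1 for w in example["words"])
```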
39 changes: 39 additions & 0 deletions lyrics_transcriber/llm_prompts/promptfooconfig.yaml
@@ -0,0 +1,39 @@
# This configuration runs each prompt through a series of example inputs and checks if they meet requirements.
# Learn more: https://promptfoo.dev/docs/configuration/guide

prompts:
  - file://llm_prompt_lyrics_correction_*.txt
providers: [openai:gpt-3.5-turbo-0613, openai:gpt-4-1106-preview]
tests:
  - description: First test case - automatic review
    vars:
      var1: first variable's value
      var2: another value
      var3: some other value
    # For more information on assertions, see https://promptfoo.dev/docs/configuration/expected-outputs
    assert:
      - type: equals
        value: expected LLM output goes here
      - type: contains
        value: some text
      - type: javascript
        value: 1 / (output.length + 1) # prefer shorter outputs

  - description: Second test case - manual review
    # Test cases don't need assertions if you prefer to manually review the output
    vars:
      var1: new value
      var2: another value
      var3: third value

  - description: Third test case - other types of automatic review
    vars:
      var1: yet another value
      var2: and another
      var3: dear llm, please output your response in json format
    assert:
      - type: contains-json
      - type: similar
        value: ensures that output is semantically similar to this text
      - type: model-graded-closedqa
        value: ensure that output contains a reference to X
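
The built-in assertion types above are generic; for these prompts, the property worth asserting is the JSON schema the correction prompt demands. A sketch of a standalone check that could back a custom assertion (the function and its wiring into promptfoo are hypothetical):

```
import json

def is_valid_corrected_segment(output: str) -> bool:
    """Return True if an LLM response matches the corrected-segment schema."""
    try:
        segment = json.loads(output)
    except json.JSONDecodeError:
        return False
    if not {"id", "text", "words"} <= segment.keys():
        return False
    # Every word needs text, timestamps, and a confidence score in [0, 1].
    return all(
        {"text", "start", "end", "confidence"} <= w.keys()
        and 0 <= w["confidence"] <= 1
        for w in segment["words"]
    )
```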
119 changes: 62 additions & 57 deletions lyrics_transcriber/transcriber.py
@@ -30,6 +30,8 @@ def __init__(
        log_formatter=None,
        transcription_model="medium",
        llm_model="gpt-4-1106-preview",
        llm_prompt_matching="lyrics_transcriber/llm_prompts/llm_prompt_lyrics_matching_andrew_handwritten_20231118.txt",
        llm_prompt_correction="lyrics_transcriber/llm_prompts/llm_prompt_lyrics_correction_andrew_handwritten_20231118.txt",
        render_video=False,
        video_resolution="360p",
        video_background_image=None,
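
The two new keyword arguments make the prompt files swappable from calling code, which pairs with the promptfoo config above for iterating on prompt wording. A hedged usage sketch; the class name and positional audio argument are assumed, and the alternative prompt filename is hypothetical:

```
from lyrics_transcriber.transcriber import LyricsTranscriber  # class name assumed

# Hypothetical: point the corrector at an experimental prompt variant
# without touching the library code.
transcriber = LyricsTranscriber(
    "song.flac",  # audio path argument assumed
    llm_prompt_correction="lyrics_transcriber/llm_prompts/my_experimental_prompt.txt",
)
```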
@@ -62,24 +64,29 @@ def __init__(

        self.transcription_model = transcription_model
        self.llm_model = llm_model
        self.llm_prompt_matching = llm_prompt_matching
        self.llm_prompt_correction = llm_prompt_correction
        self.openai_client = OpenAI()
        self.openai_client.log = self.log_level

        self.render_video = render_video
        self.video_resolution = video_resolution
        self.video_background_image = video_background_image
        self.video_background_color = video_background_color
        self.font_size = 100

        match video_resolution:
            case "4k":
                self.video_resolution_num = ("3840", "2160")
                self.font_size = 250
            case "1080p":
                self.video_resolution_num = ("1920", "1080")
                self.font_size = 140
            case "720p":
                self.video_resolution_num = ("1280", "720")
                self.font_size = 100
            case "360p":
                self.video_resolution_num = ("640", "360")
                self.font_size = 50
            case _:
                raise ValueError("Invalid video_resolution value. Must be one of: 4k, 1080p, 720p, 360p")
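
The font size now scales with the output resolution instead of staying fixed at 100. An equivalent table-driven form of the same mapping, shown here only as a sketch in case the mapping ever needs to be shared or unit-tested:

```
RESOLUTION_SETTINGS = {
    "4k":    (("3840", "2160"), 250),
    "1080p": (("1920", "1080"), 140),
    "720p":  (("1280", "720"), 100),
    "360p":  (("640", "360"), 50),
}

def resolution_settings(name):
    """Return ((width, height), font_size) for a resolution preset."""
    try:
        return RESOLUTION_SETTINGS[name]
    except KeyError:
        raise ValueError("Invalid video_resolution value. Must be one of: 4k, 1080p, 720p, 360p")
```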

@@ -170,7 +177,7 @@ def copy_files_to_output_dir(self):
    def validate_lyrics_match_song(self):
        at_least_one_online_lyrics_validated = False

        with open("lyrics_transcriber/llm_prompts/llm_lyrics_matching_prompt.txt", "r") as file:
        with open(self.llm_prompt_matching, "r") as file:
            llm_matching_instructions = file.read()

        for online_lyrics_source in ["genius", "spotify"]:
@@ -183,7 +190,7 @@ def validate_lyrics_match_song(self):
                f'Data input 1:\n{self.outputs["transcribed_lyrics_text"]}\nData input 2:\n{self.outputs[online_lyrics_text_key]}\n'
            )

            # self.logger.debug(f"llm_instructions:\n{llm_instructions}\ndata_input_str:\n{data_input_str}")
            # self.logger.debug(f"system_prompt:\n{system_prompt}\ndata_input_str:\n{data_input_str}")

            self.logger.debug(f"making API call to LLM model {self.llm_model} to validate {online_lyrics_source} lyrics match")
            response = self.openai_client.chat.completions.create(
@@ -245,93 +252,91 @@ def write_corrected_lyrics_data_file(self):

        corrected_lyrics_dict = {"segments": []}

        with open("lyrics_transcriber/llm_prompts/llm_lyrics_correction_prompt.txt", "r") as file:
            llm_instructions = file.read()
        with open(self.llm_prompt_correction, "r") as file:
            system_prompt_template = file.read()

        reference_data_count = 1

        if self.outputs["genius_lyrics_text"]:
            llm_instructions += f'\nReference data {reference_data_count}:\n{self.outputs["genius_lyrics_text"]}\n'
            reference_data_count += 1

        if self.outputs["spotify_lyrics_text"]:
            llm_instructions += f'\nReference data {reference_data_count}:\n{self.outputs["spotify_lyrics_text"]}\n'
            reference_data_count += 1

        # TODO: Add more to the LLM instructions (or consider post-processing cleanup) to get rid of overlapping segments
        # when there are background vocals or other overlapping lyrics
        reference_lyrics = self.outputs["genius_lyrics_text"] or self.outputs["spotify_lyrics_text"]
        system_prompt = system_prompt_template.replace("{{reference_lyrics}}", reference_lyrics)

        # TODO: Test if results are cleaner when using the vocal file from a background vocal audio separation model

        # TODO: Record more info about the correction process (e.g before/after diffs for each segment) to a file for debugging
        # TODO: Possibly add a step after segment-based correction to get the LLM to self-analyse the diff

        self.outputs["llm_transcript"] = ""
        self.outputs["llm_transcript_filepath"] = os.path.join(
            self.cache_dir, "lyrics-" + self.get_song_slug() + "-llm-correction-transcript.txt"
        )
        self.outputs["llm_transcript"] = ""

        total_segments = len(self.outputs["transcription_data_dict"]["segments"])
        self.logger.info(f"Beginning correction using LLM, total segments: {total_segments}")

        with open(self.outputs["llm_transcript_filepath"], "a", buffering=1) as llm_transcript_file:
            self.logger.debug(f"writing LLM chat instructions: {self.outputs['llm_transcript_filepath']}")
            llm_instructions_header = f"--- SYSTEM instructions passed in for all segments ---:\n\n"
            self.outputs["llm_transcript"] += llm_instructions_header + llm_instructions + "\n"
            llm_transcript_file.write(llm_instructions_header + llm_instructions + "\n")

            llm_transcript_header = f"--- SYSTEM instructions passed in for all segments ---:\n\n{system_prompt}\n"
            self.outputs["llm_transcript"] += llm_transcript_header
            llm_transcript_file.write(llm_transcript_header)

            for segment in self.outputs["transcription_data_dict"]["segments"]:
                # Don't waste dollars on GPT when testing, Andrew ;)
                # # Don't waste OpenAI dollars when testing!
                # if segment["id"] > 10:
                #     break

                simplified_segment = {
                    "id": segment["id"],
                    "start": segment["start"],
                    "end": segment["end"],
                    "confidence": segment["confidence"],
                    "text": segment["text"],
                    "words": segment["words"],
                }

                simplified_segment_str = json.dumps(simplified_segment)
                # continue
                # if segment["id"] < 20 or segment["id"] > 24:
                #     continue

                llm_transcript_segment = ""
                segment_input = json.dumps(
                    {
                        "id": segment["id"],
                        "start": segment["start"],
                        "end": segment["end"],
                        "confidence": segment["confidence"],
                        "text": segment["text"],
                        "words": segment["words"],
                    }
                )

                extra_context_prompt = ""
                previous_two_corrected_lines = ""
                upcoming_two_uncorrected_lines = ""

                if segment["id"] > 2:
                    extra_context_prompt = "Context: Previous two corrected lines:\n\n"

                for previous_segment in corrected_lyrics_dict["segments"]:
                    if previous_segment["id"] == (segment["id"] - 2):
                        extra_context_prompt += previous_segment["text"].strip() + "\n"
                        break

                for previous_segment in corrected_lyrics_dict["segments"]:
                    if previous_segment["id"] == (segment["id"] - 1):
                        extra_context_prompt += previous_segment["text"].strip() + "\n"
                        break
                    if previous_segment["id"] in (segment["id"] - 2, segment["id"] - 1):
                        previous_two_corrected_lines += previous_segment["text"].strip() + "\n"

                for next_segment in self.outputs["transcription_data_dict"]["segments"]:
                    if next_segment["id"] == (segment["id"] + 1):
                        extra_context_prompt += "Context: Next (un-corrected) transcript segment:\n\n"
                        extra_context_prompt += next_segment["text"].strip() + "\n"
                        break

                data_input_str = f"{extra_context_prompt}\nData input:\n\n{simplified_segment_str}\n"
                    if next_segment["id"] in (segment["id"] + 1, segment["id"] + 2):
                        upcoming_two_uncorrected_lines += next_segment["text"].strip() + "\n"

                llm_transcript_segment += f"--- Segment {segment['id']} / {total_segments} ---\n"
                llm_transcript_segment += f"Previous two corrected lines:\n\n{previous_two_corrected_lines}\nUpcoming two uncorrected lines:\n\n{upcoming_two_uncorrected_lines}\nData input:\n\n{segment_input}\n"

                # fmt: off
                segment_prompt = system_prompt_template.replace(
                    "{{previous_two_corrected_lines}}", previous_two_corrected_lines
                ).replace(
                    "{{upcoming_two_uncorrected_lines}}", upcoming_two_uncorrected_lines
                ).replace(
                    "{{segment_input}}", segment_input
                )

                self.logger.info(
                    f'Calling completion model {self.llm_model} with instructions and data input for segment {segment["id"]} / {total_segments}:'
                )
                # self.logger.debug(data_input_str)

                llm_transcript_segment = f"--- INPUT for segment {segment['id']} / {total_segments} ---:\n\n"
                llm_transcript_segment += data_input_str

                response = self.openai_client.chat.completions.create(
                    model=self.llm_model,
                    response_format={"type": "json_object"},
                    messages=[{"role": "system", "content": llm_instructions}, {"role": "user", "content": data_input_str}],
                    seed=10,
                    temperature=0.4,
                    messages=[
                        {
                            "role": "user",
                            "content": segment_prompt
                        }
                    ],
                )
                # fmt: on

                message = response.choices[0].message.content
                finish_reason = response.choices[0].finish_reason
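
The remainder of the loop is collapsed in this view. Whatever the hidden code does with `message`, a JSON-mode response still warrants defensive parsing; a hypothetical sketch of that step (not the actual continuation of the diff):

```
# Hypothetical handling sketch, not the code hidden behind the collapsed diff.
if finish_reason != "stop":
    self.logger.warning(f"segment {segment['id']}: response truncated ({finish_reason})")

try:
    corrected_segment = json.loads(message)
    corrected_lyrics_dict["segments"].append(corrected_segment)
except json.JSONDecodeError:
    # Fall back to the uncorrected transcription rather than dropping the segment.
    corrected_lyrics_dict["segments"].append(segment)
```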
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "lyrics-transcriber"
version = "0.12.6"
version = "0.12.7"
description = "Automatically create synchronised lyrics files in ASS and MidiCo LRC formats with word-level timestamps, using Whisper and lyrics from Genius and Spotify"
authors = ["Andrew Beveridge <[email protected]>"]
license = "MIT"
