Deleted old example files, tweaked prompt 3

nomadkaraoke · Nov 17, 2023 · 0dc22fd
1 parent 2838dde

Showing 3 changed files with 8 additions and 322 deletions.
diff --git a/example-llm-chatcompletion-response.py b/example-llm-chatcompletion-response.py
diff --git a/lyrics_transcriber/example-llm-response.json b/lyrics_transcriber/example-llm-response.json
diff --git a/lyrics_transcriber/llm_correction_instructions_3.txt b/lyrics_transcriber/llm_correction_instructions_3.txt
@@ -2,15 +2,15 @@ As a song lyric corrector for a karaoke video studio, your job involves processi
 You work with two data sets: a reference data set of published lyrics and a machine-transcribed segment of a song. 
 Your primary task is to compare these datasets and correct the transcribed lyrics to match the reference data as closely as possible.
 
-Your response should be formatted in JSON, to be sent to an API endpoint. The JSON output will include:
+Your response should be formatted in JSON, to be sent to an API endpoint. The JSON output must include every field below:
 
-id: The identifier of the segment from the first data input.
-text: The corrected lyric text for the segment.
-words: A list containing each word in the segment, with fields for:
- - text: The correct word.
- - start: The start timestamp for the word, estimated if necessary.
- - end: The end timestamp for the word, estimated if necessary.
- - confidence: A score (0 to 1) indicating the confidence in the accuracy of the word. Retain existing confidence values for unchanged words.
+- id: The identifier of the segment from the first data input.
+- text: The corrected lyric text for the segment.
+- words: A list containing each word in the segment, with fields for:
+  - text: The correct word.
+  - start: The start timestamp for the word, estimated if necessary.
+  - end: The end timestamp for the word, estimated if necessary.
+  - confidence: A score (0 to 1) indicating the confidence in the accuracy of the word. Retain existing confidence values for unchanged words.
 
 The reference data is generally accurate but may have imperfections or missing sections. 
 The transcribed data includes timestamps and confidence scores for each word, but the accuracy of the words is only about 70-90%.
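
The field list in the revised prompt implies a response shape that can be checked before the result is used. Below is a minimal sketch of such a check, not code from this repository: the function name `validate_segment`, the sample response, and the extra check that `end` is not before `start` are all assumptions made for illustration.

```python
import json

# Fields the prompt requires on every entry of "words".
REQUIRED_WORD_FIELDS = ("text", "start", "end", "confidence")


def validate_segment(segment: dict) -> list:
    """Return a list of problems found in one corrected segment (empty if none)."""
    problems = []

    # Top-level fields named in the prompt: id, text, words.
    for field in ("id", "text", "words"):
        if field not in segment:
            problems.append(f"missing segment field: {field}")

    words = segment.get("words", [])
    if not isinstance(words, list):
        problems.append("words is not a list")
        return problems

    for i, word in enumerate(words):
        if not isinstance(word, dict):
            problems.append(f"word {i}: not an object")
            continue

        for field in REQUIRED_WORD_FIELDS:
            if field not in word:
                problems.append(f"word {i}: missing field: {field}")

        # The prompt describes confidence as a score from 0 to 1.
        conf = word.get("confidence")
        if isinstance(conf, (int, float)) and not 0 <= conf <= 1:
            problems.append(f"word {i}: confidence out of range: {conf}")

        # Assumption (not stated in the prompt): a word should not end before it starts.
        start, end = word.get("start"), word.get("end")
        if isinstance(start, (int, float)) and isinstance(end, (int, float)) and end < start:
            problems.append(f"word {i}: end before start")

    return problems


# Hypothetical response, invented here only to exercise the check.
sample_response = """
{
  "id": 0,
  "text": "hello world",
  "words": [
    {"text": "hello", "start": 0.5, "end": 0.9, "confidence": 0.92},
    {"text": "world", "start": 1.0, "end": 1.4, "confidence": 0.81}
  ]
}
"""

print(validate_segment(json.loads(sample_response)))  # prints []
```

A check like this would catch a response that drops a required field, which is the failure the "must include every field below" wording appears to target.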