Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 11 changed files with 64,435 additions and 0 deletions.
@@ -0,0 +1,15 @@
# Late Night Host App powered by Deepgram


A test application used to download some YouTube videos from our favorite Late Night talk show hosts and transcribe them using Deepgram.


## References
* [Deepgram How to Video](https://www.youtube.com/watch?v=tt6fkobEtWk)
* https://github.com/deepgram-devs/ExplainTuber-Analysis


### Project Imports

- I recommend using [yt-dlp](https://github.com/yt-dlp/yt-dlp) instead of [youtube-dl](https://github.com/ytdl-org/youtube-dl) due to this issue:
  - https://github.com/ytdl-org/youtube-dl/issues/31530
Binary file added (+9.19 MB, not shown): late-night-host-app/audio/Conan Remembers David Bowie | CONAN on TBS.mp3.mp3
Binary file added (+8.74 MB, not shown): ... A Lot of Trust To Be A Human Being These Days - P!NK on Her New Album, Trustfall.mp3.mp3
Binary file added (+12.2 MB, not shown): ...dio/James Acaster Became a Stand-Up Comedian to Infiltrate a Gang of Drug Dealers.mp3.mp3
Binary file added (+20.9 MB, not shown): ...n Doing Incredibly Dangerous Stunts, Mission Impossible & Top Gun with Val Kilmer.mp3.mp3
@@ -0,0 +1,33 @@
from deepgram import Deepgram as DG
import asyncio, json, os

# Read the API key from the environment; DEEPGRAM_API_KEY is an assumed variable name.
dg_key = os.environ["DEEPGRAM_API_KEY"]
dg = DG(dg_key)

options = {
    "diarize": True,
    "punctuate": True,
    "paragraphs": True,
    "numerals": True,
    "model": "general",
    "tier": "enhanced",
    "profanity_filter": True,
}


async def main():
    videos = os.listdir("./audio")
    for video in videos:
        print("Currently processing:", video)
        with open(f"./audio/{video}", "rb") as audio:
            source = {"buffer": audio, "mimetype": "audio/mpeg3"}
            res = await dg.transcription.prerecorded(source, options)

        # Write each response to the transcripts directory, dropping the trailing ".mp3".
        with open(f"./transcripts/{video[:-4]}.json", "w") as transcript:
            json.dump(res, transcript)
    return


asyncio.run(main())
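
Keeping the Deepgram key out of the source file avoids committing a secret to the repository. The following is a minimal sketch of one way to do that, not the author's method: the helper `load_api_key` and the variable name `DEEPGRAM_API_KEY` are both hypothetical and chosen only for illustration.

```python
import os


def load_api_key(env_var: str = "DEEPGRAM_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error if it is unset.

    The helper and the default variable name are assumptions made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key
```

With the variable exported in the shell, the script could then build its client with `DG(load_api_key())`. Failing early with a readable message is a design choice here; a bare `os.environ[...]` lookup would also work but raises a less descriptive `KeyError`.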
9,420 additions & 0 deletions (large diff not rendered): late-night-host-app/transcripts/Conan Remembers David Bowie | CONAN on TBS.mp3.json
10,392 additions & 0 deletions (large diff not rendered): ...A Lot of Trust To Be A Human Being These Days - P!NK on Her New Album, Trustfall.mp3.json
17,210 additions & 0 deletions (large diff not rendered): ...ts/James Acaster Became a Stand-Up Comedian to Infiltrate a Gang of Drug Dealers.mp3.json
27,328 additions & 0 deletions (large diff not rendered): ... Doing Incredibly Dangerous Stunts, Mission Impossible & Top Gun with Val Kilmer.mp3.json
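
The transcript diffs are not rendered here, so their exact contents cannot be confirmed from this page. As a rough sketch, assuming each saved file follows Deepgram's usual pre-recorded response layout (`results` → `channels` → `alternatives` → `transcript`), the plain text could be pulled back out as shown below. The helper name `transcript_text` is hypothetical, and the layout should be checked against the actual JSON files before relying on it.

```python
import json
from pathlib import Path


def transcript_text(path: Path) -> str:
    """Return the full transcript string from one saved response file.

    Assumed layout (verify against the real files):
    results -> channels[0] -> alternatives[0] -> transcript
    """
    data = json.loads(path.read_text(encoding="utf-8"))
    return data["results"]["channels"][0]["alternatives"][0]["transcript"]
```

A loop such as `for p in Path("./transcripts").glob("*.json"): print(p.name, len(transcript_text(p)))` would give a quick size check per video.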
@@ -0,0 +1,37 @@
import yt_dlp

obrien_vids = [
    "https://youtu.be/P4d8QrRJvsE",
]

kimmel_vids = [
    "https://youtu.be/KA3KDbU5dqk",
]

meyers_vids = [
    "https://youtu.be/q5UsZ2YdyKE",
]

colbert_vids = [
    "https://youtu.be/PO508nFSIaM",
]


ydl_parameters = {
    "format": "bestaudio/best",
    "postprocessors": [
        {
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }
    ],
    # Output template: edit the directory here to control where downloads are saved.
    "outtmpl": "./audio/%(title)s.mp3",
}
with yt_dlp.YoutubeDL(ydl_parameters) as ydl:
    ydl.download(obrien_vids)
    ydl.download(kimmel_vids)
    ydl.download(meyers_vids)
    ydl.download(colbert_vids)
print()
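
The four per-host lists are downloaded with four separate calls. One possible tidy-up, sketched under the assumption that the hosts are grouped in a dictionary, is to flatten them into a single list first. The names `videos_by_host` and `all_urls` are hypothetical; the URLs are the ones from the script.

```python
from itertools import chain

# Hypothetical grouping of the per-host URL lists from the download script.
videos_by_host = {
    "obrien": ["https://youtu.be/P4d8QrRJvsE"],
    "kimmel": ["https://youtu.be/KA3KDbU5dqk"],
    "meyers": ["https://youtu.be/q5UsZ2YdyKE"],
    "colbert": ["https://youtu.be/PO508nFSIaM"],
}


def all_urls(groups: dict[str, list[str]]) -> list[str]:
    """Flatten the per-host lists into one list, preserving insertion order."""
    return list(chain.from_iterable(groups.values()))
```

Since `YoutubeDL.download` takes a list of URLs, a single `ydl.download(all_urls(videos_by_host))` inside the `with` block would cover every host, and adding a new host becomes a one-line change to the dictionary.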