adds late night host app
jpvajda committed Feb 26, 2023
1 parent e4669b4 commit 4309efd
Showing 11 changed files with 64,435 additions and 0 deletions.
15 changes: 15 additions & 0 deletions late-night-host-app/README.md
@@ -0,0 +1,15 @@
# Late Night Host App powered by Deepgram


A test application that downloads YouTube videos of our favorite late-night talk show hosts and transcribes them with Deepgram.


## References
* [Deepgram How to Video](https://www.youtube.com/watch?v=tt6fkobEtWk)
* https://github.com/deepgram-devs/ExplainTuber-Analysis


### Project Imports

- I recommend using [yt-dlp](https://github.com/yt-dlp/yt-dlp) instead of [youtube-dl](https://github.com/ytdl-org/youtube-dl) because of this issue:
  - https://github.com/ytdl-org/youtube-dl/issues/31530
Binary file not shown.
33 changes: 33 additions & 0 deletions late-night-host-app/deepgram_transcribe.py
@@ -0,0 +1,33 @@
from deepgram import Deepgram as DG
import asyncio, json, os

# read the API key from an environment variable (the name DEEPGRAM_API_KEY is an assumption)
dg_key = os.environ["DEEPGRAM_API_KEY"]
dg = DG(dg_key)

options = {
    "diarize": True,
    "punctuate": True,
    "paragraphs": True,
    "numerals": True,
    "model": "general",
    "tier": "enhanced",
    "profanity_filter": True,
}


async def main():
    videos = os.listdir("./audio")
    for video in videos:
        print("Currently processing:", video)
        with open(f"./audio/{video}", "rb") as audio:
            source = {"buffer": audio, "mimetype": "audio/mpeg3"}
            res = await dg.transcription.prerecorded(source, options)

            # outputs to a directory called transcripts
            with open(f"./transcripts/{video[:-4]}.json", "w") as transcript:
                json.dump(res, transcript)
    return


asyncio.run(main())
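
The script above stores each raw response as JSON and requests diarization, so a natural next step is to turn a saved transcript into speaker turns. The following is a minimal sketch, not part of this commit. It assumes the response nests its word list under `results -> channels[0] -> alternatives[0] -> words` and that each word carries `word`, `punctuated_word`, and `speaker` fields. The helper name `speaker_turns` and the sample data are hypothetical.

```python
from itertools import groupby


def speaker_turns(response):
    """Group consecutive words by speaker label into (speaker, text) turns."""
    # Assumed response shape: results -> channels[0] -> alternatives[0] -> words
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.get("speaker", 0)):
        # Prefer the punctuated form when the response includes it
        text = " ".join(w.get("punctuated_word", w["word"]) for w in group)
        turns.append((speaker, text))
    return turns


# Hand-written sample in the assumed shape (not real Deepgram output)
sample = {
    "results": {"channels": [{"alternatives": [{"words": [
        {"word": "hello", "punctuated_word": "Hello,", "speaker": 0},
        {"word": "everybody", "punctuated_word": "everybody.", "speaker": 0},
        {"word": "thanks", "punctuated_word": "Thanks!", "speaker": 1},
    ]}]}]}
}

for speaker, text in speaker_turns(sample):
    print(f"Speaker {speaker}: {text}")
```

To use it on a file written by the script, load the file with `json.load` and pass the resulting dictionary to `speaker_turns`.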

Large diffs are not rendered by default.

37 changes: 37 additions & 0 deletions late-night-host-app/youtube_download.py
@@ -0,0 +1,37 @@
import yt_dlp

obrien_vids = [
    "https://youtu.be/P4d8QrRJvsE",
]

kimmel_vids = [
    "https://youtu.be/KA3KDbU5dqk",
]

meyers_vids = [
    "https://youtu.be/q5UsZ2YdyKE",
]

colbert_vids = [
    "https://youtu.be/PO508nFSIaM",
]


ydl_parameters = {
    "format": "bestaudio/best",
    "postprocessors": [
        {
            "key": "FFmpegExtractAudio",
            "preferredcodec": "mp3",
            "preferredquality": "192",
        }
    ],
    # change this to set where your downloads are saved
    "outtmpl": "./audio/%(title)s.mp3",
}
with yt_dlp.YoutubeDL(ydl_parameters) as ydl:
    ydl.download(obrien_vids)
    ydl.download(kimmel_vids)
    ydl.download(meyers_vids)
    ydl.download(colbert_vids)
    print()
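
The `FFmpegExtractAudio` postprocessor used above relies on external `ffmpeg` and `ffprobe` binaries, so checking for them before downloading can save a failed run. The following is a small sketch, not part of this commit. The helper name `missing_tools` is hypothetical, and the check only confirms the binaries are on `PATH`, not that they work.

```python
import shutil


def missing_tools(tools=("ffmpeg", "ffprobe")):
    """Return the names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Report which required binaries are absent before starting any download
missing = missing_tools()
print("Missing tools:", ", ".join(missing) if missing else "none")
```

If the list is not empty, install the named tools before running the download script.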
