This is an interview assistant with real-time speech-to-text. It uses OpenAI's Whisper model to transcribe audio from your microphone and provides real-time translation and AI-powered question answering.
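As a rough illustration of the core idea (not the app's actual code), the sketch below records a few seconds from the default microphone and transcribes the buffer with the open-source whisper package; the package choices, sample rate, and clip length are assumptions made for this example.

```python
# Minimal sketch only: record a short clip and transcribe it with Whisper.
# Assumes `pip install openai-whisper sounddevice numpy` (not necessarily the
# same dependencies the app itself uses).
import numpy as np
import sounddevice as sd
import whisper

SAMPLE_RATE = 16000      # Whisper models expect 16 kHz mono audio
DURATION_SECONDS = 5     # arbitrary clip length for the example

# Record from the default microphone into a float32 numpy buffer
audio = sd.rec(int(DURATION_SECONDS * SAMPLE_RATE),
               samplerate=SAMPLE_RATE, channels=1, dtype="float32")
sd.wait()  # block until the recording finishes

# Load a small model and transcribe the recorded buffer
model = whisper.load_model("base")
result = model.transcribe(np.squeeze(audio), language="en")
print(result["text"])
```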
- Python 3.7 or higher
- A working microphone
- Windows/Linux/MacOS
- OpenAI API key (optional; required for translation and question answering)
- Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install the required packages:
pip install -r requirements.txt
- Configure your OpenAI API key (optional, for translation and AI features):
- Create a .env file in the project root
- Add your API key:
OPENAI_API_KEY=your-key-here
- You can also use any OpenAI-compatible API by setting
OPENAI_API_BASE_URL
- Or configure it through the UI in the API Settings (a minimal key-loading sketch follows this list)
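For reference, here is a minimal sketch of loading these settings, assuming the python-dotenv and openai (v1+) packages; the app itself may wire this up differently:

```python
# Minimal sketch: read the key (and optional base URL) from .env and build a
# client. Assumes `pip install python-dotenv openai`.
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # loads OPENAI_API_KEY / OPENAI_API_BASE_URL from the .env file

client = OpenAI(
    api_key=os.getenv("OPENAI_API_KEY"),
    # Leave base_url unset to use api.openai.com; set it for compatible APIs
    base_url=os.getenv("OPENAI_API_BASE_URL") or None,
)
```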
- Run the application:
python stt_app.py
- Configure the transcription:
- Select your language (English/Chinese)
- Choose a Whisper model
- Enable/disable auto-translation
- Enable/disable AI question answering
- Select an AI role for answering questions
- Select a Whisper model:
- Tiny: Fastest but least accurate
- Base: Good balance of speed and accuracy
- Small: Better accuracy than base, slightly slower
- Medium: High accuracy, slower processing
- Large: Best accuracy, but slowest processing
- Faster variants are available for each model size (see the model-loading sketch below)
- Choose an AI assistant role (when AI answering is enabled; see the role-prompt sketch below):
- General Assistant: Balanced, general-purpose responses
- Technical Interviewer: Focus on technical accuracy and depth
- HR Interviewer: Focus on soft skills and behavioral aspects
- Meeting Participant: Helps clarify and summarize discussions
- Student: Educational context from a student's perspective
- Teacher: Educational explanations with teaching context
- Custom: Define your own AI role behavior
- Click "Start Streaming" to begin transcription
- The interface shows three panels:
- Original transcription (streaming word by word)
- Real-time translation (if enabled)
- AI answers to questions (if enabled)
- Click "Stop Streaming" to end the session
- Real-time word-by-word transcription display
- Automatic sentence detection and formatting
- Support for both English and Chinese languages
- Real-time translation between English and Chinese
- AI-powered question detection and answering
- Customizable AI assistant roles
- Multiple Whisper model options for different accuracy/speed trade-offs
- Real-time language switching during transcription
- Model loading status indicator
- Export and load conversation history
- Conversation context awareness for AI answers (see the sketch after this list)
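As a rough sketch of the conversation-context feature (the actual implementation may differ), one common approach is to keep a rolling window of recent transcript lines and include it in each request:

```python
# Minimal sketch: keep the last few transcript lines and prepend them to the
# question. The window size and model name are assumptions for this example.
from collections import deque
from openai import OpenAI

client = OpenAI()
recent_lines = deque(maxlen=10)  # rolling window of recent transcript lines

def answer_with_context(question: str) -> str:
    context = "\n".join(recent_lines)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Use the recent conversation as context:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    recent_lines.append(question)
    return response.choices[0].message.content
```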
- The application uses Whisper models of varying sizes:
- Tiny: ~39M parameters
- Base: ~74M parameters
- Small: ~244M parameters
- Medium: ~769M parameters
- Large: ~1.5B parameters
- The first time you run the application with a particular model, it will download that model (this may take a few minutes)
- Larger models provide better accuracy but require more processing power and memory
- Audio is processed in real-time with word-by-word display
- AI answers consider recent conversation context for more relevant responses
- Translation and AI features require a valid OpenAI API key
- Language can be switched in real-time without stopping the stream
- Model selection is only available before starting the stream