Real-time Speech to Text Application

This is an interview assistant with a real-time speech-to-text feature. It uses OpenAI's Whisper model to transcribe audio from your microphone and provides real-time translation and AI-powered question answering.

Requirements

  • Python 3.7 or higher
  • A working microphone
  • Windows/Linux/MacOS
  • OpenAI API key (for translation and question answering)

Installation

  1. Create a virtual environment (recommended):
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install the required packages:
pip install -r requirements.txt
  3. Configure your OpenAI API key (optional, for translation and AI features):
    • Create a .env file in the project root
    • Add your API key: OPENAI_API_KEY=your-key-here
    • You can also use an OpenAI-compatible API by setting OPENAI_API_BASE_URL
    • Or configure it through the UI under API Settings
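Inside the app, the key and optional base URL can be read from the environment. A minimal sketch using only the standard library (the function name `load_openai_config` and the default URL fallback are illustrative assumptions, not necessarily the app's actual config code):

```python
import os

def load_openai_config():
    """Read OpenAI settings from the environment.

    OPENAI_API_BASE_URL is optional and lets you point the client at an
    OpenAI-compatible endpoint instead of the official API.
    """
    api_key = os.environ.get("OPENAI_API_KEY")  # None if unset
    base_url = os.environ.get("OPENAI_API_BASE_URL",
                              "https://api.openai.com/v1")  # assumed default
    return api_key, base_url
```

If a .env file is used, a loader such as python-dotenv would populate the environment before this function runs.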

Usage

  1. Run the application:
python stt_app.py
  2. Configure the transcription:

    • Select your language (English/Chinese)
    • Choose a Whisper model
    • Enable/disable auto-translation
    • Enable/disable AI question answering
    • Select an AI role for answering questions
  3. Select a Whisper model:

    • Tiny: Fastest but least accurate
    • Base: Good balance of speed and accuracy
    • Small: Better accuracy than Base, slightly slower
    • Medium: High accuracy, slower processing
    • Large: Best accuracy, but slowest processing
    • Faster variants available for each model size
  4. Choose an AI assistant role (when AI answering is enabled):

    • General Assistant: Balanced, general-purpose responses
    • Technical Interviewer: Focus on technical accuracy and depth
    • HR Interviewer: Focus on soft skills and behavioral aspects
    • Meeting Participant: Helps clarify and summarize discussions
    • Student: Educational context from a student's perspective
    • Teacher: Educational explanations with teaching context
    • Custom: Define your own AI role behavior
  5. Click "Start Streaming" to begin transcription

  6. The interface shows three panels:

    • Original transcription (streaming word by word)
    • Real-time translation (if enabled)
    • AI answers to questions (if enabled)
  7. Click "Stop Streaming" to end the session
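The word-by-word streaming with automatic sentence detection described above can be sketched as a small generator (illustrative only; `stream_words` and the punctuation set are assumptions, not the app's actual API):

```python
# Sentence-ending punctuation for both supported languages
SENTENCE_END = (".", "!", "?", "。", "！", "？")

def stream_words(words):
    """Yield (word, completed_sentence_or_None) as each word arrives.

    Words are buffered until one ends with sentence-final punctuation,
    at which point the buffered sentence is emitted and the buffer reset.
    """
    buffer = []
    for word in words:
        buffer.append(word)
        if word.endswith(SENTENCE_END):
            sentence = " ".join(buffer)
            buffer = []
            yield word, sentence
        else:
            yield word, None
```

Each UI update would append the word to the transcription panel immediately, and forward the completed sentence (when one is returned) to the translation and question-answering pipelines.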

Features

  • Real-time word-by-word transcription display
  • Automatic sentence detection and formatting
  • Support for both English and Chinese languages
  • Real-time translation between English and Chinese
  • AI-powered question detection and answering
  • Customizable AI assistant roles
  • Multiple Whisper model options for different accuracy/speed trade-offs
  • Real-time language switching during transcription
  • Model loading status indicator
  • Export and load conversation history
  • Conversation context awareness for AI answers
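The question detection listed above could be as simple as a punctuation-and-keyword heuristic; the app may instead ask the LLM itself, so this sketch is an assumption rather than the repository's actual logic:

```python
# Common English interrogative openers (illustrative list)
QUESTION_WORDS = ("what", "why", "how", "when", "where", "who", "which",
                  "can", "could", "would", "do", "does", "is", "are")

def looks_like_question(sentence: str) -> bool:
    """Heuristic: a sentence is a question if it ends with '?' or
    starts with a common question word."""
    s = sentence.strip().lower()
    if not s:
        return False
    return s.endswith("?") or s.split()[0] in QUESTION_WORDS
```

Only sentences flagged by a check like this would be sent to the AI answering pipeline, avoiding an API call for every transcribed sentence.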

Notes

  • The application uses Whisper models of varying sizes:
    • Tiny: ~39M parameters
    • Base: ~74M parameters
    • Small: ~244M parameters
    • Medium: ~769M parameters
    • Large: ~1.5B parameters
  • The first time you run the application with a particular model, it will download that model (this may take a few minutes)
  • Larger models provide better accuracy but require more processing power and memory
  • Audio is processed in real-time with word-by-word display
  • AI answers consider recent conversation context for more relevant responses
  • Translation and AI features require a valid OpenAI API key
  • Language can be switched in real-time without stopping the stream
  • Model selection is only available before starting the stream
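The conversation-context awareness mentioned above could work along these lines: keep a bounded window of recent Q&A turns and prepend it to each prompt. A sketch under stated assumptions (`ConversationContext` and `max_turns` are illustrative names, not the app's real implementation):

```python
from collections import deque

class ConversationContext:
    """Keep the most recent exchanges so AI answers can reference them.

    A deque with maxlen automatically drops the oldest turn once the
    window is full; max_turns=5 is an assumed default.
    """
    def __init__(self, max_turns: int = 5):
        self.turns = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_prompt(self) -> str:
        """Render the window as text to prepend to the next LLM request."""
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)
```

Bounding the window keeps prompts short (and API costs predictable) while still giving the model the last few exchanges for follow-up questions.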
