A modular audio analysis pipeline that separates, transcribes, and classifies audio content using machine learning.
- Source Separation - Isolate vocals from background using Demucs
- Speech Recognition - Transcribe speech (99+ languages) using OpenAI Whisper
- Sound Classification - Identify 521 sound categories using YAMNet
- Noise Reduction - Spectral gating for cleaner audio
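The spectral-gating idea behind the noise-reduction step can be sketched in a few lines of NumPy. The function name, frame length, and noise-floor estimate below are illustrative assumptions, not the pipeline's actual `denoise.py` implementation:

```python
import numpy as np

def spectral_gate(signal, frame_len=512, strength=0.9, noise_frames=10):
    """Attenuate frequency bins that fall below a noise-floor estimate.

    The floor is estimated from the first `noise_frames` frames, which are
    assumed to contain background noise only (an assumption made for this
    sketch, not taken from the original pipeline).
    """
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1)             # per-frame spectrum
    mag = np.abs(spec)
    noise_floor = mag[:noise_frames].mean(axis=0)  # per-bin noise estimate
    # Gate: keep bins above the floor, scale the rest down by `strength`
    gain = np.where(mag > noise_floor, 1.0, 1.0 - strength)
    cleaned = np.fft.irfft(spec * gain, n=frame_len, axis=1)
    return cleaned.reshape(-1)
```

A `strength` of 0.9 (the default in `config.py`) leaves gated bins at 10% of their original magnitude rather than zeroing them, which avoids the "musical noise" artifacts of hard gating.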
├── main.py                    # Pipeline orchestrator
├── config.py                  # Configuration settings
├── requirements.txt           # Dependencies
├── src/
│   ├── separator.py           # Audio source separation (Demucs)
│   ├── speech_analyser.py     # Speech transcription (Whisper)
│   ├── non_speech_analyser.py # Sound classification (YAMNet)
│   └── denoise.py             # Optional noise reduction
├── samples/                   # Input audio files
└── output/
    ├── separated/             # Separated audio stems
    ├── transcriptions/        # Text output
    └── reports/               # Final analysis
git clone https://github.com/Harshita20052809/Speech_nonspeech_recognizer.git
cd Speech_nonspeech_recognizer
pip install -r requirements.txt
1. Place your audio file in samples/ or update config.py:

   INPUT_AUDIO = os.path.join(BASE_DIR, "samples", "your_file.wav")

2. Run the pipeline:

   python main.py

3. Check output/reports/final_report.txt for the results.
Input Audio (.wav)
↓
[1] Separation (Demucs) → vocals.wav, other.wav
↓
[2] Transcription (Whisper) → transcription.txt
↓
[3] Classification (YAMNet) → nonspeech_report.txt
↓
[4] Report Generation → final_report.txt
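The four stages above map naturally onto a small orchestrator. In the sketch below the model calls are injected as plain functions so the flow can be shown without loading Demucs, Whisper, or YAMNet; the function names and report format are stand-ins, not the real main.py:

```python
from pathlib import Path

def run_pipeline(input_audio, output_dir, separate, transcribe, classify):
    """Run the four-stage pipeline with injected stage functions."""
    out = Path(output_dir)

    # [1] Separation -> vocal stem + residual ("other") stem
    vocals, other = separate(input_audio)

    # [2] Transcription of the vocal stem
    transcript = transcribe(vocals)
    (out / "transcriptions").mkdir(parents=True, exist_ok=True)
    (out / "transcriptions" / "transcription.txt").write_text(transcript)

    # [3] Classification of the non-speech stem
    sounds = classify(other)

    # [4] Report generation
    report = "Transcript:\n{}\n\nTop sounds:\n{}".format(
        transcript, "\n".join(f"- {s}" for s in sounds)
    )
    (out / "reports").mkdir(parents=True, exist_ok=True)
    report_path = out / "reports" / "final_report.txt"
    report_path.write_text(report)
    return report_path
```

Injecting the stages as arguments keeps the orchestration testable with stubs and makes each model swappable independently.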
| Component | Model | Description |
|---|---|---|
| Separation | Demucs (htdemucs) | Hybrid transformer for source separation |
| Transcription | Whisper (large) | Multilingual speech recognition |
| Classification | YAMNet | Audio event classification (521 classes) |
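YAMNet emits one score per class per audio frame, so reducing those to the top-N list that TOP_N_SOUNDS configures is a small aggregation step. A sketch with synthetic scores, assuming (as is common but not confirmed by the source) that per-frame scores are averaged before ranking:

```python
import numpy as np

def top_n_sounds(frame_scores, class_names, n=10):
    """Average per-frame class scores and return the n highest classes.

    frame_scores: (num_frames, num_classes) array, as YAMNet produces
    (521 classes in the real model; a toy set is used below).
    """
    mean_scores = frame_scores.mean(axis=0)
    order = np.argsort(mean_scores)[::-1][:n]
    return [(class_names[i], float(mean_scores[i])) for i in order]

# Toy example: 2 frames, 3 classes
scores = np.array([[0.1, 0.8, 0.1],
                   [0.2, 0.6, 0.2]])
names = ["Silence", "Speech", "Music"]
top = top_n_sounds(scores, names, n=2)
```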
Edit config.py to customize:
INPUT_AUDIO = "samples/1.wav" # Input file
WHISPER_MODEL = "large" # tiny, base, small, medium, large
TOP_N_SOUNDS = 10 # Number of sounds to report
NOISE_REDUCTION_STRENGTH = 0.9 # 0.0 to 1.0

- Python 3.8+
- ~3GB disk space for models (downloaded on first run)
- CUDA optional for GPU acceleration
MIT License - see LICENSE
Author: Rohit
Built with:
- Demucs (Meta AI)
- OpenAI Whisper
- YAMNet (Google)