Audio AI · VAD · Model evaluation
2025-2026
Audio-Language Models for Voice Activity Detection
This research evaluates how audio-language models detect speech when the audio is short, noisy, reverberant, or filtered. The project compares Qwen2-Audio-7B, Qwen2-Audio-7B with LoRA, Qwen3-Omni-30B, and Silero VAD on the same degraded test bank. The best result came from Qwen2-Audio-7B with LoRA and OPRO-Template: 93.3% balanced accuracy on 21,340 degraded clips.
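For readers unfamiliar with the metric, balanced accuracy is the mean of per-class recall, so a model cannot score well simply by favoring the majority class. The sketch below is a minimal illustration that assumes binary labels (1 for speech, 0 for non-speech); the function name and toy labels are hypothetical and are not taken from the project's evaluation code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of recall on speech (1) and recall on non-speech (0) clips."""
    # Correctly detected speech clips and correctly rejected non-speech clips.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Class sizes in the reference labels.
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (tp / pos + tn / neg)


# Toy labels (illustrative only, not project data).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
print(f"balanced accuracy: {score:.3f}")
```

In this toy case, speech recall is 3/4 and non-speech recall is 1/2, giving 0.625, whereas plain accuracy would be 4/6.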
