OmniVoice Studio: How to Use It
What Is OmniVoice Studio?
OmniVoice Studio is an open-source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization. Everything runs locally on your machine: no API keys, no cloud account, and no subscription required.
- 646 languages supported for TTS through the default OmniVoice engine
- 99 languages for transcription through WhisperX
- Available on macOS, Windows, and Linux
- GPU is optional: the full pipeline runs on CPU
- Free for personal, educational, and research use (FSL-1.1-ALv2)
System Requirements
A GPU is optional. Without one, TTS runs roughly 3× slower on CPU. With 8 GB of VRAM or less, TTS automatically offloads to CPU during transcription; no configuration is needed.
| Component | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 / macOS 12+ / Ubuntu 20.04+ | Any modern 64-bit OS |
| RAM | 8 GB | 16 GB+ |
| VRAM | 4 GB (auto-offloads) | 8 GB+ (RTX 3060+) |
| Disk | 10 GB free | 20 GB+ SSD |
| Python | 3.10+ | 3.11–3.12 |
| GPU | Optional | CUDA / MPS / ROCm |
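The offload behaviour described above can be pictured as a small placement rule. The sketch below is hypothetical: the function name tts_device, the accelerator labels, and the exact threshold comparison are assumptions drawn only from this section, not from the project's code.

```python
from typing import Optional

OFFLOAD_THRESHOLD_GB = 8.0  # "<= 8 GB VRAM" from the note above


def tts_device(accelerator: Optional[str], vram_gb: float, transcribing: bool) -> str:
    """Choose where TTS should run (hypothetical rule, not the project's code).

    accelerator: "cuda", "mps", "rocm", or None when no GPU is present (assumed labels).
    vram_gb: GPU memory in GB; ignored when accelerator is None.
    transcribing: True while WhisperX transcription is using the GPU.
    """
    if accelerator is None:
        return "cpu"  # no GPU: everything runs on CPU, with slower TTS
    if transcribing and vram_gb <= OFFLOAD_THRESHOLD_GB:
        return "cpu"  # small cards: move TTS off the GPU while transcription runs
    return accelerator


print(tts_device("cuda", 6, transcribing=True))   # cpu
print(tts_device("cuda", 12, transcribing=True))  # cuda
```

The point of the rule is that small cards never have to hold the transcription model and the TTS model at the same time.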
Installation
The project recommends running from source. Install three prerequisites first: ffmpeg, Bun (a JavaScript runtime), and uv (a Python package manager).
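Before cloning, a quick way to confirm that the three prerequisites are on your PATH is a short check with the standard library's shutil.which. The PREREQUISITES tuple and the function name are illustrative; this script is not part of the repository.

```python
import shutil
from typing import Iterable, List

# Executables this guide lists as prerequisites.
PREREQUISITES = ("ffmpeg", "bun", "uv")


def missing_prerequisites(names: Iterable[str] = PREREQUISITES) -> List[str]:
    """Return the names that shutil.which() cannot resolve to an executable."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    print("Missing: " + ", ".join(missing) if missing else "All prerequisites found.")
```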
git clone https://github.com/debpalash/OmniVoice-Studio.git
cd OmniVoice-Studio
uv sync
bun install
bun dev
The frontend loads at http://localhost:5173, and the API runs on port 8000.
Model weights download automatically on first generation.
Pre-built installers are available (macOS DMG, Windows MSI, Linux AppImage and .deb); see the Releases page on GitHub.
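Once bun dev is running, you can check that both services are listening with a plain TCP probe. This is a minimal sketch using only the standard library; it assumes the default ports stated above and that both services bind to localhost, and it only tests that a port accepts connections, not that the app is healthy.

```python
import socket

# Default ports stated in this guide.
PORTS = {"frontend": 5173, "api": 8000}


def port_open(port: int, host: str = "localhost", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    for name, port in PORTS.items():
        print(f"{name:8} :{port}  {'up' if port_open(port) else 'down'}")
```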
Voice Cloning
Voice cloning uses zero-shot learning: it clones a voice from a clip as short as 3 seconds, without prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.
- Go to the Voice Clone tab in the UI
- Upload or record a 3-second audio clip of the target voice
- Enter your text and select a target language (646 available)
- Click Generate; the output is saved to your project library
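Because cloning needs at least 3 seconds of reference audio, it can help to check a clip's length before uploading. The sketch below assumes an uncompressed PCM WAV file and uses Python's wave module; the helper names and the silent demo clip are hypothetical, and the app itself may accept other formats or enforce the limit differently.

```python
import io
import wave

MIN_REFERENCE_SECONDS = 3.0  # shortest reference clip mentioned in this guide


def clip_seconds(wav_bytes: bytes) -> float:
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(wav_bytes: bytes) -> bool:
    return clip_seconds(wav_bytes) >= MIN_REFERENCE_SECONDS


def silent_wav(seconds: float, rate: int = 16_000) -> bytes:
    """Build a mono 16-bit silent WAV in memory (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()  # wave does not close a file object it did not open


print(is_usable_reference(silent_wav(2.5)))  # False
print(is_usable_reference(silent_wav(3.0)))  # True
```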
Voice Gallery: Search YouTube, browse categories, and download reference clips directly inside the app to build your voice library.
Video Dubbing
The full dubbing pipeline runs locally: transcribe → translate → synthesize → mux. Demucs isolates vocals so the original background audio is preserved in the final export.
- Go to the Dub tab and paste a YouTube URL or upload a local file
- WhisperX transcribes speech with word-level alignment
- Select a target language; translation runs automatically
- The TTS engine re-voices the transcript; Demucs preserves the background audio
- Export the final MP4 with the dubbed audio mixed in
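The final mux step can be illustrated with a plain ffmpeg invocation that keeps the original video stream and mixes the separated background with the synthesized speech. This is a sketch under assumptions, not the command OmniVoice Studio runs: mux_command is a hypothetical helper, the file names are placeholders, and the background track is assumed to be the non-vocal stem produced by Demucs.

```python
from typing import List


def mux_command(video: str, background: str, dubbed_voice: str, output: str) -> List[str]:
    """Build an ffmpeg argv that mixes two audio tracks under the original video."""
    return [
        "ffmpeg", "-y",
        "-i", video,         # input 0: original video (its audio track is not mapped)
        "-i", background,    # input 1: background stem left after vocal separation
        "-i", dubbed_voice,  # input 2: synthesized speech in the target language
        "-filter_complex", "[1:a][2:a]amix=inputs=2:duration=longest[aout]",
        "-map", "0:v",       # keep the original picture
        "-map", "[aout]",    # use the mixed audio
        "-c:v", "copy",      # no video re-encode
        "-c:a", "aac",
        output,
    ]
```

You would run it with subprocess.run(mux_command("in.mp4", "no_vocals.wav", "voice.wav", "out.mp4"), check=True). Copying the video stream avoids a slow re-encode; the relative levels of the two audio inputs may need tuning in practice.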
Batch Queue: Drop in up to 50 videos and walk away. Each job has its own progress bar tracking it through the full pipeline.
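A batch queue with per-job progress can be modelled as a list of jobs that each record which pipeline stages they have finished. The sketch below is hypothetical: the class names, the 50-job cap check, and progress measured as completed stages out of four are assumptions, and a real worker would run each stage rather than just record it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

STAGES = ("transcribe", "translate", "synthesize", "mux")  # pipeline order from this guide
MAX_JOBS = 50  # batch size stated above


@dataclass
class DubJob:
    source: str  # YouTube URL or local file path
    completed: List[str] = field(default_factory=list)

    @property
    def progress(self) -> float:
        """Fraction of stages finished (0.0 to 1.0), i.e. the progress bar value."""
        return len(self.completed) / len(STAGES)

    def next_stage(self) -> Optional[str]:
        """Name of the next stage to run, or None when the job is done."""
        return STAGES[len(self.completed)] if len(self.completed) < len(STAGES) else None

    def finish_stage(self) -> None:
        """Mark the next stage as done (a real worker would run it first)."""
        stage = self.next_stage()
        if stage is None:
            raise RuntimeError(f"{self.source}: all stages already finished")
        self.completed.append(stage)


class BatchQueue:
    def __init__(self, max_jobs: int = MAX_JOBS) -> None:
        self.max_jobs = max_jobs
        self.jobs: List[DubJob] = []

    def add(self, source: str) -> DubJob:
        if len(self.jobs) >= self.max_jobs:
            raise ValueError(f"a batch holds at most {self.max_jobs} videos")
        job = DubJob(source)
        self.jobs.append(job)
        return job


queue = BatchQueue()
job = queue.add("talk.mp4")
job.finish_stage()
job.finish_stage()
print(job.next_stage(), job.progress)  # synthesize 0.5
```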
Dictation & Speaker Diarization
Dictation works system-wide, in any application. Diarization identifies individual speakers in a multi-speaker audio file using Pyannote + WhisperX.
- Press ⌘+⇧+Space (macOS) to open the floating dictation widget
- Speech streams over WebSocket and is auto-pasted into the active input field
- Upload a multi-speaker file to the Diarization tab
- Pyannote identifies who said what; each speaker gets an auto-extracted voice profile
- Assign a TTS voice per speaker for per-speaker dubbing
A Hugging Face token is required for Pyannote diarization. See docs/setup/huggingface-token.md in the repo.
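Combining diarization with word-level transcription comes down to matching timestamps: each aligned word is given the speaker whose diarization segment overlaps it most, and consecutive words from the same speaker are merged into lines that can be voiced with that speaker's assigned TTS voice. This is a simplified sketch with made-up data; the tuple layouts, the SPEAKER_00-style labels, and the voice names are assumptions, not the format the app uses internally.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start_s, end_s, speaker) from diarization
Word = Tuple[float, float, str]     # (start_s, end_s, text) from word-level alignment


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], segments: List[Segment]) -> List[Tuple[str, str]]:
    """Label each word with the speaker whose segment overlaps it the most."""
    labeled = []
    for w_start, w_end, text in words:
        scores = [(overlap(w_start, w_end, s, e), spk) for s, e, spk in segments]
        best_overlap, best_speaker = max(scores, default=(0.0, "UNKNOWN"))
        labeled.append((best_speaker if best_overlap > 0 else "UNKNOWN", text))
    return labeled


def utterances(labeled: List[Tuple[str, str]], voices: Dict[str, str]) -> List[Tuple[str, str]]:
    """Merge consecutive words per speaker and attach each speaker's TTS voice."""
    merged: List[Tuple[str, str]] = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return [(voices.get(speaker, "default"), text) for speaker, text in merged]


segments = [(0.0, 2.0, "SPEAKER_00"), (2.0, 4.0, "SPEAKER_01")]
words = [(0.1, 0.5, "Hello"), (0.6, 1.0, "there"), (2.2, 2.6, "Hi"), (2.7, 3.1, "back")]
voices = {"SPEAKER_00": "narrator", "SPEAKER_01": "guest"}
print(utterances(assign_speakers(words, segments), voices))
# [('narrator', 'Hello there'), ('guest', 'Hi back')]
```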
TTS Engines
Six TTS engines are built in. Switch engines via Settings → TTS Engine or the environment variable OMNIVOICE_TTS_BACKEND (for example, OMNIVOICE_TTS_BACKEND=cosyvoice).
| Engine | Languages | Clone | Platform |
|---|---|---|---|
| OmniVoice (default) | 600+ | ✓ | CUDA / MPS / CPU |
| CosyVoice 3 | 9 + 18 dialects | ✓ | CUDA / MPS / CPU |
| MLX-Audio | Multiple | Varies | Apple Silicon only |
| VoxCPM2 | 30 | ✓ | CUDA / MPS / CPU |
| MOSS-TTS-Nano | 20 | ✓ | CUDA / CPU |
| KittenTTS | English | ✗ | CPU only |
Custom engine: Subclass TTSBackend in backend/services/tts_backend.py and add it to _REGISTRY. It takes roughly 50 lines of Python.
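A registry of engines selected by environment variable might look like the sketch below. The real TTSBackend base class and _REGISTRY live in the backend's tts_backend.py module and their methods are not documented here, so the base class is a stand-in with an assumed synthesize() signature, EchoBackend is a toy engine that returns text instead of audio, and load_backend is a hypothetical helper.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class TTSBackend(ABC):
    """Stand-in base class; the project's real interface may differ."""

    @abstractmethod
    def synthesize(self, text: str, language: str, reference_wav: Optional[str] = None) -> bytes:
        """Return audio bytes for `text` (assumed signature)."""


class EchoBackend(TTSBackend):
    """Hypothetical toy engine: returns the text as UTF-8 bytes instead of audio."""

    def synthesize(self, text: str, language: str, reference_wav: Optional[str] = None) -> bytes:
        return f"[{language}] {text}".encode("utf-8")


# Engine name -> class, mirroring the idea of the project's _REGISTRY.
_REGISTRY: Dict[str, Type[TTSBackend]] = {"echo": EchoBackend}


def load_backend(default: str = "echo") -> TTSBackend:
    """Instantiate the engine named by OMNIVOICE_TTS_BACKEND, falling back to `default`."""
    key = os.environ.get("OMNIVOICE_TTS_BACKEND", default)
    if key not in _REGISTRY:
        raise ValueError(f"unknown TTS backend {key!r}; choose from {sorted(_REGISTRY)}")
    return _REGISTRY[key]()
```

In this sketch, registering a new engine is a single dictionary entry; in the real app the default engine is OmniVoice rather than the toy echo engine.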
MCP Server & Resources
OmniVoice Studio ships a built-in MCP server that exposes voice and dubbing capabilities to any MCP-compatible client (Claude, Cursor, or your own tooling) without opening the desktop UI.
- The MCP server starts alongside the FastAPI backend when you run bun dev
- Point your MCP client at the local server to access all endpoints
- AudioSeal (Meta) embeds an invisible neural watermark in all generated audio for AI provenance
- GitHub: github.com/debpalash/OmniVoice-Studio
- Install docs: docs/install/ (macos / windows / linux / docker)
- Troubleshooting: docs/install/troubleshooting.md
- Discord: discord.gg/bzQavDfVV9
