Meet OmniVoice Studio: A Local, Open-Source Alternative to ElevenLabs

OmniVoice Studio: How to Use It

What Is OmniVoice Studio?

OmniVoice Studio is an open-source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization. Everything runs locally on your machine. No API keys, no cloud account, no subscription required.

646 languages supported for TTS through the default OmniVoice engine
99 languages for transcription through WhisperX
Available on macOS, Windows, and Linux
GPU is optional; the full pipeline runs on CPU
Free for personal, educational, and research use (FSL-1.1-ALv2)

System Requirements

A GPU is optional. Without one, TTS runs roughly 3× slower on CPU. With ≤8 GB VRAM, TTS automatically offloads to CPU during transcription, with no config needed.

Component	Minimum	Recommended
OS	Win 10 / macOS 12+ / Ubuntu 20.04+	Any modern 64-bit OS
RAM	8 GB	16 GB+
VRAM	4 GB (auto-offloads)	8 GB+ (RTX 3060+)
Disk	10 GB free	20 GB+ SSD
Python	3.10+	3.11–3.12
GPU	Optional	CUDA / MPS / ROCm
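
The offload rule described above can be pictured as a small decision function. This is only an illustrative sketch: the function name, the dictionary keys, and the idea of passing VRAM as a number are assumptions made here, not the project's actual code.

```python
from typing import Optional


def plan_devices(vram_gb: Optional[float]) -> dict:
    """Return a hypothetical device plan; vram_gb is None when no GPU is present."""
    if vram_gb is None:
        # No GPU: every stage runs on CPU (the article puts TTS at roughly 3x slower).
        return {"transcribe": "cpu", "tts_during_transcription": "cpu"}
    if vram_gb <= 8:
        # Small GPU: transcription keeps the GPU, TTS is offloaded to CPU meanwhile.
        return {"transcribe": "gpu", "tts_during_transcription": "cpu"}
    # Larger GPU: both stages can stay on the GPU.
    return {"transcribe": "gpu", "tts_during_transcription": "gpu"}
```

In the app this happens automatically; the sketch only makes the 8 GB threshold explicit.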

Installation

The project recommends running from source. Install three prerequisites first: ffmpeg, Bun (JS runtime), and uv (Python package manager).
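
A quick way to confirm the three prerequisites are on your PATH is a short standard-library script. This is a convenience sketch and not part of the project; it assumes the executables are named `ffmpeg`, `bun`, and `uv`, and `missing_tools` is a name chosen here.

```python
import shutil

# Assumed executable names for the three prerequisites listed above.
PREREQUISITES = ("ffmpeg", "bun", "uv")


def missing_tools(tools=PREREQUISITES, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests.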

git clone https://github.com/debpalash/OmniVoice-Studio.git
cd OmniVoice-Studio
uv sync
bun install
bun dev

Frontend loads at http://localhost:5173 | API runs on port 8000.
Model weights download automatically on first generation.

Pre-built installers are available: macOS DMG, Windows MSI, Linux AppImage and .deb. See the Releases page on GitHub.

Voice Cloning

Voice cloning uses zero-shot learning: it clones a voice from a clip as short as 3 seconds, without prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.

Go to the Voice Clone tab in the UI
Upload or record a 3-second audio clip of the target voice
Enter your text and select a target language (646 available)
Click Generate; the output is saved to your project library

Voice Gallery: Search YouTube, browse categories, and download reference clips directly inside the app to build your voice library.
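
If you prepare reference clips outside the app, it can help to confirm they meet the 3-second figure before uploading. The sketch below uses only Python's standard `wave` module and so handles WAV files only; the helper names are made up for this example, and the app itself may accept other formats and apply its own checks.

```python
import io
import wave

MIN_REFERENCE_SECONDS = 3.0  # from the article: clips as short as 3 seconds


def wav_duration_seconds(source) -> float:
    """Duration of a WAV given a path string or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(source, minimum: float = MIN_REFERENCE_SECONDS) -> bool:
    """True when the clip lasts at least `minimum` seconds."""
    return wav_duration_seconds(source) >= minimum


def make_silence(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, used here only as demo input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(is_long_enough(make_silence(3.0)))
    print(is_long_enough(make_silence(1.5)))
```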

Video Dubbing

The full dubbing pipeline runs locally: transcribe → translate → synthesize → mux. Demucs isolates vocals so the original background audio is preserved in the final export.

Go to the Dub tab, then paste a YouTube URL or upload a local file
WhisperX transcribes speech with word-level alignment
Select a target language; translation runs automatically
TTS engine re-voices the transcript; Demucs preserves background audio
Export the final MP4 with dubbed audio mixed in

Batch Queue: Drop up to 50 videos and walk away. Each job has its own progress bar tracking through the full pipeline.
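
The four-stage flow and the per-job progress tracking can be modeled in a few lines. This is a toy model under stated assumptions, not the project's implementation: the stage bodies are placeholders that only append text, and `Job`, `run_job`, and `run_batch` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

MAX_BATCH = 50  # from the article: the batch queue takes up to 50 videos

# Stage names follow the article; the stage bodies are placeholders.
STAGES: List[Tuple[str, Callable[[str], str]]] = [
    ("transcribe", lambda item: item + " > transcript"),
    ("translate", lambda item: item + " > translation"),
    ("synthesize", lambda item: item + " > dubbed audio"),
    ("mux", lambda item: item + " > final.mp4"),
]


@dataclass
class Job:
    source: str
    completed: List[str] = field(default_factory=list)
    result: str = ""

    @property
    def progress(self) -> float:
        """Fraction of pipeline stages finished, from 0.0 to 1.0."""
        return len(self.completed) / len(STAGES)


def run_job(job: Job) -> Job:
    """Run every stage in order, recording each stage name as it finishes."""
    value = job.source
    for name, stage in STAGES:
        value = stage(value)
        job.completed.append(name)
    job.result = value
    return job


def run_batch(sources: List[str]) -> List[Job]:
    """Run each source through the pipeline, enforcing the batch limit."""
    if len(sources) > MAX_BATCH:
        raise ValueError(f"batch limited to {MAX_BATCH} videos")
    return [run_job(Job(source)) for source in sources]
```

Each `Job` carries its own `progress` value, which is the kind of state a per-job progress bar would read.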

Dictation & Speaker Diarization

Dictation works system-wide from any application. Diarization identifies individual speakers in a multi-speaker audio file using Pyannote + WhisperX.

Press ⌘+⇧+Space (macOS) to open the floating dictation widget
Speech streams via WebSocket and auto-pastes into the active input field
Upload a multi-speaker file to the Diarization tab
Pyannote identifies who said what; each speaker gets an auto-extracted voice profile
Assign a TTS voice per speaker for per-speaker dubbing

A Hugging Face token is required for Pyannote diarization. See docs/setup/huggingface-token.md in the repo.
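
One common way to combine diarization output with word-level timestamps is to give each word the speaker whose segment overlaps it most. The sketch below shows that idea on plain tuples; it is not taken from OmniVoice Studio, Pyannote, or WhisperX, and the tuple layouts and function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]  # (start, end, speaker label) from diarization
Word = Tuple[float, float, str]     # (start, end, text) from word-level alignment


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    words: List[Word], segments: List[Segment]
) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose segment overlaps it most.

    Words that overlap no segment are labeled None.
    """
    labeled = []
    for w_start, w_end, text in words:
        best_speaker, best_overlap = None, 0.0
        for s_start, s_end, speaker in segments:
            amount = overlap(w_start, w_end, s_start, s_end)
            if amount > best_overlap:
                best_speaker, best_overlap = speaker, amount
        labeled.append((text, best_speaker))
    return labeled


if __name__ == "__main__":
    segments = [(0.0, 2.0, "SPEAKER_00"), (2.0, 4.0, "SPEAKER_01")]
    words = [(0.2, 0.6, "hello"), (1.8, 2.5, "there")]
    print(assign_speakers(words, segments))
```

Grouping the labeled words by speaker then gives the per-speaker transcript that a per-speaker voice assignment would work from.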

TTS Engines

Six TTS engines are built in. Switch via Settings → TTS Engine or the env var:
OMNIVOICE_TTS_BACKEND=cosyvoice

Engine	Languages	Clone	Platform
OmniVoice (default)	600+	✓	CUDA / MPS / CPU
CosyVoice 3	9 + 18 dialects	✓	CUDA / MPS / CPU
MLX-Audio	Multi	Varies	Apple Silicon only
VoxCPM2	30	✓	CUDA / MPS / CPU
MOSS-TTS-Nano	20	✓	CUDA / CPU
KittenTTS	English	✗	CPU only

Custom engine: Subclass TTSBackend in backend/services/tts_backend.py and add it to _REGISTRY. ~50 lines of Python.
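
The subclass-and-register pattern, together with the environment-variable switch, might look roughly like this. The names TTSBackend, _REGISTRY, and OMNIVOICE_TTS_BACKEND come from the text above; everything else is an assumption, including the `synthesize` method and its signature, the `SilentBackend` toy engine, and the `load_backend` helper. Check backend/services/tts_backend.py for the real interface.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Mapping, Type


class TTSBackend(ABC):
    """Hypothetical stand-in for the project's base class; the real interface may differ."""

    @abstractmethod
    def synthesize(self, text: str, language: str) -> bytes:
        """Return synthesized audio for `text` as raw bytes."""


class SilentBackend(TTSBackend):
    """Toy engine: returns one 16-bit silent sample per input character."""

    def synthesize(self, text: str, language: str) -> bytes:
        return b"\x00\x00" * len(text)


# Registry keyed by the value expected in OMNIVOICE_TTS_BACKEND.
_REGISTRY: Dict[str, Type[TTSBackend]] = {"silent": SilentBackend}


def load_backend(env: Mapping[str, str] = os.environ, default: str = "silent") -> TTSBackend:
    """Instantiate the backend named by the env var, falling back to `default`."""
    name = env.get("OMNIVOICE_TTS_BACKEND", default)
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown TTS backend: {name!r}") from None
```

Keeping the lookup in one dictionary means a new engine needs only a class and one registry entry, which fits the "~50 lines" estimate.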

MCP Server & Resources

OmniVoice Studio ships a built-in MCP Server, exposing voice and dubbing capabilities to any MCP-compatible client (Claude, Cursor, or your own tooling) without opening the desktop UI.

MCP Server starts alongside the FastAPI backend on bun dev
Point your MCP client at the local server to access all endpoints
AudioSeal (Meta) embeds an invisible neural watermark in all generated audio for AI provenance

GitHub: github.com/debpalash/OmniVoice-Studio
Install docs: docs/install/ (macos / windows / linux / docker)
Troubleshooting: docs/install/troubleshooting.md
Discord: discord.gg/bzQavDfVV9
