Tuesday, May 26, 2026

Meet OmniVoice Studio: A Local, Open-Source Alternative to ElevenLabs



OmniVoice Studio: How to Use It

What Is OmniVoice Studio?

OmniVoice Studio is an open-source desktop application for voice cloning, video dubbing, real-time dictation, and speaker diarization. Everything runs locally on your machine. No API keys, no cloud account, no subscription required.

  • 646 languages supported for TTS through the default OmniVoice engine
  • 99 languages for transcription through WhisperX
  • Available on macOS, Windows, and Linux
  • GPU is optional; the full pipeline runs on CPU
  • Free for personal, educational, and research use (FSL-1.1-ALv2)


System Requirements

A GPU is optional. Without one, TTS runs roughly 3× slower on CPU. With ≤8 GB VRAM, TTS automatically offloads to CPU during transcription, with no configuration needed.

Component  Minimum                             Recommended
OS         Win 10 / macOS 12+ / Ubuntu 20.04+  Any modern 64-bit OS
RAM        8 GB                                16 GB+
VRAM       4 GB (auto-offloads)                8 GB+ (RTX 3060+)
Disk       10 GB free                          20 GB+ SSD
Python     3.10+                               3.11–3.12
GPU        Optional                            CUDA / MPS / ROCm
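
As a rough illustration, the Python and disk minimums from the table can be checked before installing. This is a minimal sketch, not part of OmniVoice Studio: the function name `check_requirements` and the constants are made up for this example, and RAM and VRAM are left out because the standard library has no portable way to read them.

```python
import shutil
import sys

# Minimums copied from the requirements table; everything else is illustrative.
MIN_PYTHON = (3, 10)
MIN_DISK_GB = 10


def check_requirements(python_version, free_disk_gb):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(f"at least {MIN_DISK_GB} GB free disk required")
    return problems


if __name__ == "__main__":
    # Free space on the current drive, in decimal gigabytes.
    free_gb = shutil.disk_usage(".").free / 1e9
    for problem in check_requirements(sys.version_info, free_gb):
        print("warning:", problem)
```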


Installation

The project recommends running from source. Install three prerequisites first: ffmpeg, Bun (JS runtime), and uv (Python package manager).

git clone https://github.com/debpalash/OmniVoice-Studio.git
cd OmniVoice-Studio
uv sync
bun install
bun dev

Frontend loads at http://localhost:5173  |  API runs on port 8000.
Model weights download automatically on first generation.

Pre-built installers are available: macOS DMG, Windows MSI, Linux AppImage and .deb. See the Releases page on GitHub.
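
Before running the install commands, it can help to confirm that the three prerequisites are on the PATH. The sketch below is an illustration only; the helper name `missing_tools` is made up here, and it assumes the executables are named `ffmpeg`, `bun`, and `uv`, as their standard installs provide.

```python
import shutil

# Executable names for the three prerequisites listed in this section.
PREREQUISITES = ("ffmpeg", "bun", "uv")


def missing_tools(tools=PREREQUISITES, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    touching the real environment.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```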


Voice Cloning

Voice cloning uses zero-shot learning: it clones a voice from a clip as short as 3 seconds, without prior training on that voice. The default OmniVoice engine conditions a diffusion-based TTS model on the reference audio.

  • Go to the Voice Clone tab in the UI
  • Upload or record a 3-second audio clip of the target voice
  • Enter your text and select a target language (646 available)
  • Click Generate; the output is saved to your project library
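
If reference clips are prepared outside the app, their length can be checked against the 3-second figure first. The following is a minimal sketch using Python's standard `wave` module; it only handles uncompressed PCM WAV files, and the helper names are made up for this example rather than taken from OmniVoice Studio.

```python
import wave

MIN_CLIP_SECONDS = 3.0  # shortest reference clip mentioned in this section


def wav_duration_seconds(path):
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path, minimum=MIN_CLIP_SECONDS):
    """True when the clip is at least `minimum` seconds long."""
    return wav_duration_seconds(path) >= minimum
```

Other formats (MP3, M4A, and so on) would need a decoder such as ffmpeg before a check like this applies.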

Voice Gallery: Search YouTube, browse categories, and download reference clips directly inside the app to build your voice library.


Video Dubbing

The full dubbing pipeline runs locally: transcribe → translate → synthesize → mux. Demucs isolates vocals so the original background audio is preserved in the final export.

  • Go to the Dub tab, then paste a YouTube URL or upload a local file
  • WhisperX transcribes speech with word-level alignment
  • Select a target language; translation runs automatically
  • The TTS engine re-voices the transcript; Demucs preserves background audio
  • Export the final MP4 with dubbed audio mixed in
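
The four stages can be pictured as a simple chain of functions. The sketch below shows only that shape and is not OmniVoice Studio's code: `Segment`, `dub`, and the four stage callables are hypothetical placeholders, and in the real app WhisperX, the translation step, the TTS engine, and the muxer would sit behind them.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed piece of speech (hypothetical structure)."""
    start: float
    end: float
    text: str


def dub(source, target_language, transcribe, translate, synthesize, mux):
    """Run transcribe -> translate -> synthesize -> mux in order.

    Each stage is passed in as a callable, so this function captures
    only the ordering of the pipeline, not any real model call.
    """
    segments = transcribe(source)
    translated = [
        Segment(s.start, s.end, translate(s.text, target_language))
        for s in segments
    ]
    clips = [synthesize(s.text, target_language) for s in translated]
    return mux(source, translated, clips)
```

Keeping the original timestamps on each translated segment is what would let a muxer place every synthesized clip back at the right position in the video.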

Batch Queue: Drop up to 50 videos and walk away. Each job has its own progress bar tracking through the full pipeline.


Dictation & Speaker Diarization

Dictation works system-wide from any application. Diarization identifies individual speakers in a multi-speaker audio file using Pyannote + WhisperX.

  • Press ⌘+⇧+Space (macOS) to open the floating dictation widget
  • Speech streams via WebSocket and auto-pastes into the active input field
  • Upload a multi-speaker file to the Diarization tab
  • Pyannote identifies who said what; each speaker gets an auto-extracted voice profile
  • Assign a TTS voice per speaker for per-speaker dubbing

A Hugging Face token is required for Pyannote diarization. See docs/setup/huggingface-token.md in the repo.
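
Per-speaker dubbing comes down to mapping each diarized segment to a chosen voice. The sketch below illustrates that mapping with plain tuples; the segment layout, the function names, and the voice names are assumptions made for this example, not the app's internal format.

```python
from collections import defaultdict


def group_by_speaker(segments):
    """Group (speaker, start, end, text) tuples by speaker label."""
    grouped = defaultdict(list)
    for speaker, start, end, text in segments:
        grouped[speaker].append((start, end, text))
    return dict(grouped)


def assign_voices(segments, voice_for_speaker, default_voice):
    """Replace each speaker label with its assigned TTS voice.

    Speakers without an explicit assignment fall back to `default_voice`.
    """
    return [
        (voice_for_speaker.get(speaker, default_voice), start, end, text)
        for speaker, start, end, text in segments
    ]


# Illustrative data only: labels and voice names are invented.
segments = [
    ("SPEAKER_00", 0.0, 2.5, "Welcome back."),
    ("SPEAKER_01", 2.5, 4.0, "Thanks for having me."),
    ("SPEAKER_00", 4.0, 6.0, "Let's get started."),
]
voiced = assign_voices(segments, {"SPEAKER_00": "host_voice"}, "narrator")
```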


TTS Engines

Six TTS engines are built in. Switch via Settings → TTS Engine or the env var:
OMNIVOICE_TTS_BACKEND=cosyvoice

Engine               Languages        Clone   Platform
OmniVoice (default)  600+                     CUDA / MPS / CPU
CosyVoice 3          9 + 18 dialects          CUDA / MPS / CPU
MLX-Audio            Multi            Varies  Apple Silicon only
VoxCPM2              30                       CUDA / MPS / CPU
MOSS-TTS-Nano        20                       CUDA / CPU
KittenTTS            English                  CPU only

Custom engine: Subclass TTSBackend in backend/services/tts_backend.py and add it to _REGISTRY. ~50 lines of Python.
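
The subclass-and-register pattern, combined with the environment variable, might look roughly like this. It is a sketch under stated assumptions: only the names `TTSBackend`, `_REGISTRY`, and `OMNIVOICE_TTS_BACKEND` come from the text above. The `synthesize` method, the `SilenceBackend` class, the `"silence"` key, and `get_backend` are invented stand-ins, and the real base class in the repo may define a different interface.

```python
import os
from abc import ABC, abstractmethod


class TTSBackend(ABC):
    """Hypothetical stand-in for the project's base class."""

    @abstractmethod
    def synthesize(self, text: str, language: str) -> bytes:
        """Return audio bytes for `text` (assumed method, not confirmed)."""


class SilenceBackend(TTSBackend):
    """Toy engine that produces no audio; for illustration only."""

    def synthesize(self, text, language):
        return b""


# Name -> backend class. Adding an entry here "registers" an engine.
_REGISTRY = {"silence": SilenceBackend}


def get_backend(default="silence"):
    """Pick the backend named by OMNIVOICE_TTS_BACKEND, else `default`."""
    name = os.environ.get("OMNIVOICE_TTS_BACKEND", default)
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown TTS backend: {name!r}") from None
```

A dictionary registry keeps engine selection to a single lookup, which fits the claim that a new engine takes only a small amount of Python.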


MCP Server & Resources

OmniVoice Studio ships a built-in MCP Server, exposing voice and dubbing capabilities to any MCP-compatible client (Claude, Cursor, or your own tooling) without opening the desktop UI.

  • The MCP Server starts alongside the FastAPI backend on bun dev
  • Point your MCP client at the local server to access all endpoints
  • AudioSeal (Meta) embeds an invisible neural watermark in all generated audio for AI provenance
  • GitHub: github.com/debpalash/OmniVoice-Studio
  • Install docs: docs/install/ (macos / windows / linux / docker)
  • Troubleshooting: docs/install/troubleshooting.md
  • Discord: discord.gg/bzQavDfVV9


