Qwen3-TTS

Open-source speech generation models: voice cloning, voice design, human-like TTS, and natural-language voice control.

Added Feb 07, 2026
Open Source

About

Qwen3-TTS is a series of speech generation models from Qwen that covers voice cloning, voice design, high-quality human-like speech synthesis, and natural-language voice control. It supports 10 major languages and multiple dialects. The models use contextual understanding to adapt tone, speaking rate, and emotional expression to both explicit instructions and the meaning of the input text, so acoustic attributes can be steered with plain-language instructions.

The models are built on Qwen3-TTS-Tokenizer-12Hz, a tokenizer developed by the Qwen team for acoustic compression and semantic modeling. They use a universal end-to-end architecture based on a discrete multi-codebook language model (LM). For streaming, a Dual-Track hybrid architecture enables low-latency generation, with reported synthesis latency as low as 97 ms.
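To give a rough feel for what a low-frame-rate, multi-codebook tokenizer implies for sequence length, here is a small back-of-the-envelope sketch. It is not part of the Qwen3-TTS package. The frame rate of 12 frames per second is an assumption read off the "12Hz" in the tokenizer's name (the exact rate may differ), and the number of codebooks is not stated here, so it is left as a required parameter. The function names `frames_for` and `tokens_for` are hypothetical.

```python
import math

# Assumption: nominal rate taken from the "12Hz" in the tokenizer name.
NOMINAL_FRAME_RATE_HZ = 12.0


def frames_for(duration_s: float, frame_rate_hz: float = NOMINAL_FRAME_RATE_HZ) -> int:
    """Number of token frames needed to cover `duration_s` seconds of audio."""
    if duration_s < 0 or frame_rate_hz <= 0:
        raise ValueError("duration must be >= 0 and frame rate must be > 0")
    return math.ceil(duration_s * frame_rate_hz)


def tokens_for(
    duration_s: float,
    n_codebooks: int,
    frame_rate_hz: float = NOMINAL_FRAME_RATE_HZ,
) -> int:
    """Total discrete tokens if each frame carries one token per codebook."""
    if n_codebooks < 1:
        raise ValueError("n_codebooks must be >= 1")
    return frames_for(duration_s, frame_rate_hz) * n_codebooks


if __name__ == "__main__":
    # A 3-second clip at the nominal rate.
    print(frames_for(3.0))                 # 36
    # Hypothetical codebook count, chosen only for illustration.
    print(tokens_for(3.0, n_codebooks=4))  # 144
```

Under these assumptions a few seconds of audio maps to only a few dozen frames, which is the sense in which a low frame rate keeps the language model's sequence short.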

Key features include:

  • Voice cloning (rapid 3-second voice cloning)
  • Voice design
  • Ultra-high-quality human-like speech generation
  • Natural language-based voice control
  • Multi-language and dialect support
  • Low-latency streaming generation

The models can be downloaded from Hugging Face and ModelScope. The project ships a Python package for custom voice generation, voice design, and voice cloning, along with examples covering environment setup, tokenizer encode/decode, and a local web UI demo.

Code Example

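from qwen_tts import Qwen3TTSModel
import torch
import soundfile as sf

# Load the CustomVoice checkpoint onto the first GPU in bfloat16.
# attn_implementation="flash_attention_2" requires the flash-attn package.
model = Qwen3TTSModel.from_pretrained(
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    device_map="cuda:0",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# Synthesize one utterance with a preset speaker and a style instruction.
wavs, sr = model.generate_custom_voice(
    # English: "Actually, I have really noticed that I am someone who is
    # especially good at observing other people's emotions."
    text="其实我真的有发现,我是一个特别善于观察别人情绪的人。",
    language="Chinese",
    speaker="Vivian",
    instruct="用特别愤怒的语气说",  # English: "Say it in an especially angry tone."
)

# Save the first generated waveform at the returned sample rate.
sf.write("output_custom_voice.wav", wavs[0], sr)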
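The example above requests `attn_implementation="flash_attention_2"`, which only works when the separate flash-attn package is installed. A minimal sketch of one way to pick the value at runtime follows. The helper name `pick_attn_implementation` is hypothetical, and the fallback value `"sdpa"` is an attention backend name used by Hugging Face Transformers; that `Qwen3TTSModel.from_pretrained` accepts it in the same way is an assumption, not something confirmed here.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Choose an attention backend name based on what is installed.

    Returns "flash_attention_2" only when the flash_attn package can be
    located; otherwise falls back to "sdpa" (assumed to be accepted by
    the model loader, as it is by Hugging Face Transformers).
    """
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    return "flash_attention_2" if has_flash_attn else "sdpa"


if __name__ == "__main__":
    print(pick_attn_implementation())
```

The returned string could then be passed as `attn_implementation=pick_attn_implementation()` in the `from_pretrained` call, so the same script runs on machines with and without flash-attn.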

Alternatives

  • Coqui TTS (suggested)
  • Tacotron 2 (suggested)
  • VITS (suggested)