Google Gemini 3.1 Flash TTS: Controllable Text-to-Speech with Audio Tags
Source: X/Twitter · @GoogleAI · 2026-04-17
Post
Today we launched Gemini 3.1 Flash TTS, our most expressive and controllable text-to-speech model yet.
This launch includes audio tags! 🗣🏷
Audio tags are a seamless way to guide vocal style, pace, and delivery using natural language commands embedded directly in your text. Want a different tempo or tone? Just tag the audio to steer the AI-speech output!
The model supports 70+ languages (24 of which are high-quality evaluated languages, including: Japanese, Hindi, and Arabic).
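The post describes audio tags as natural-language commands embedded directly in the text, but it does not show their syntax. The sketch below is only an illustration of that idea: it assembles a narration script in which each segment is prefixed with style hints. The square-bracket form, the tag wording, and the `Segment` and `render_script` names are all assumptions made for this example, not the documented Gemini format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One piece of narration plus optional natural-language style hints."""
    text: str
    tags: tuple[str, ...] = ()


def render_script(segments: list[Segment]) -> str:
    """Join segments into one string, prefixing each with its tags.

    ASSUMPTION: the "[tag] text" layout is hypothetical; the post only says
    that tags are natural-language commands embedded in the text.
    """
    parts = []
    for seg in segments:
        # Each tag becomes a bracketed hint placed before the segment's text.
        prefix = "".join(f"[{tag}] " for tag in seg.tags)
        parts.append(prefix + seg.text.strip())
    return " ".join(parts)


script = render_script([
    Segment("Welcome to our quarterly update.", ("warm", "slow pace")),
    Segment("Revenue grew this quarter!", ("excited",)),
    Segment("Thanks for listening."),
])
print(script)
```

The resulting string would then be sent as the text input to the TTS model. Keeping tags separate from the narration text until render time makes it easy to swap in the real tag syntax once it is confirmed from the official documentation.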
Availability
Gemini 3.1 Flash TTS is rolling out in Google Vids and is available today in preview via the Gemini API and in @GoogleAIStudio.
Whether you're creating a pitch deck or recording a passion project, transform your scripts into studio-quality narration: https://t.co/MG2YIQwKb6
Community Reactions
- @travasites: "Nice to see TTS getting the 'Flash' treatment — low latency matters for real time narration. Curious how the voice quality compares to ElevenLabs or Azure TTS, and whether the API supports streaming or SSML."
- @dan_in_robots: "Audio tags for steering speech output, exactly the practical control developers need. Already testing in AI Studio."
- @PromptSlinger: "Wait so the pitch deck thing, does it handle different speaker voices in the same script? Or is it one narrator per video?"
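Key Specs
| Feature | Details |
|------|------|
| Model | Gemini 3.1 Flash TTS |
| Language support | 70+ languages (24 evaluated as high quality) |
| Key feature | Audio Tags (natural-language commands embedded in the text to control vocal style) |
| Availability | Google Vids, Gemini API, AI Studio |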