Requests are made through /v1beta/models/{model}:generateContent, not the OpenAI-compatible /v1/audio/speech.

    POST https://cloud.dataeyes.ai/v1beta/models/gemini-3.1-flash-tts-preview:generateContent
    Authorization: Bearer sk-YOUR_API_KEY

Example response:

    {
      "candidates": [
        {
          "content": {
            "parts": [
              {
                "inlineData": {
                  "mimeType": "audio/l16; rate=24000; channels=1",
                  "data": "AAAAAAAAAA...(base64-encoded PCM audio data)"
                }
              }
            ]
          }
        }
      ],
      "usageMetadata": {
        "promptTokenCount": 15,
        "candidatesTokenCount": 200,
        "totalTokenCount": 215
      }
    }

| Field | Description |
|---|---|
| mimeType | audio/l16; rate=24000; channels=1 (16-bit PCM, 24 kHz sample rate, mono) |
| data | Base64-encoded raw PCM audio data |
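
Because the data field holds raw PCM with no container, most players will not open it directly. The following is a minimal sketch of wrapping it in a WAV header with Python's standard library. The helper names `parse_l16_mime` and `response_to_wav_bytes` are hypothetical, and the sketch assumes the samples are little-endian, which is what WAV expects.

```python
import base64
import io
import wave


def parse_l16_mime(mime_type: str) -> tuple[int, int]:
    """Return (sample_rate, channels) from 'audio/l16; rate=24000; channels=1'."""
    params = {}
    for part in mime_type.split(";")[1:]:
        key, _, value = part.strip().partition("=")
        params[key.lower()] = value
    # Fall back to the documented values if a parameter is missing.
    return int(params.get("rate", 24000)), int(params.get("channels", 1))


def response_to_wav_bytes(response: dict) -> bytes:
    """Decode the first inlineData part and wrap the raw PCM in a WAV container."""
    inline = response["candidates"][0]["content"]["parts"][0]["inlineData"]
    rate, channels = parse_l16_mime(inline["mimeType"])
    pcm = base64.b64decode(inline["data"])

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples = 2 bytes each
        wav.setframerate(rate)
        wav.writeframes(pcm)  # assumption: PCM is already little-endian
    return buf.getvalue()
```

Writing the returned bytes to a file such as `speech.wav` should give something an ordinary audio player can open.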

The voice is specified in speechConfig.voiceConfig.prebuiltVoiceConfig.voiceName:

| Voice name | Description |
|---|---|
| Kore | Female voice |
| Charon | Male voice |
| Fenrir | Male voice |
| Aoede | Female voice |
| Puck | Male voice |
| Leda | Female voice |
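
To show where the voice name sits in a request, here is a hedged sketch that builds (but does not send) a generateContent call. The body layout, with `contents` and a `generationConfig` wrapper around `responseModalities` and `speechConfig`, follows the public Gemini API convention and is assumed to apply to this gateway as well. `build_tts_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://cloud.dataeyes.ai/v1beta/models"
MODEL = "gemini-3.1-flash-tts-preview"


def build_tts_request(text: str, voice_name: str, api_key: str) -> urllib.request.Request:
    """Build a POST request asking the model to speak `text` with `voice_name`."""
    body = {
        # Assumed request shape (Gemini API convention); verify against the gateway.
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
            },
        },
    }
    return urllib.request.Request(
        f"{BASE_URL}/{MODEL}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request could then be sent with `urllib.request.urlopen(req)` and the JSON reply parsed with `json.load`.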

| Type | Paid tier (per 1 million tokens) |
|---|---|
| Input (text) | $1.00 |
| Output (audio) | $20.00 |
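
As a rough illustration of how these rates translate into a per-request cost, the sketch below applies them to a usageMetadata object. It assumes promptTokenCount is billed at the input rate and candidatesTokenCount at the output rate. `estimate_cost_usd` is a hypothetical helper, not part of any API.

```python
INPUT_USD_PER_MILLION = 1.00    # input (text) rate from the table above
OUTPUT_USD_PER_MILLION = 20.00  # output (audio) rate from the table above


def estimate_cost_usd(usage: dict) -> float:
    """Estimate the cost of one request from its usageMetadata block."""
    # Assumption: prompt tokens are input, candidate tokens are audio output.
    input_cost = usage["promptTokenCount"] * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = usage["candidatesTokenCount"] * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost
```

For a request with 15 prompt tokens and 200 candidate tokens, this comes to about $0.004015, almost all of it from the audio output.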
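
The OpenAI-compatible /v1/audio/speech endpoint is not used for this model.

responseModalities must be set to ["AUDIO"]; otherwise the response contains text instead of speech.

For streaming output, use streamGenerateContent?alt=sse:

    POST /v1beta/models/gemini-3.1-flash-tts-preview:streamGenerateContent?alt=sse

Sample input text: "Hello, welcome to our service. This is a test of text to speech."

Output audio format: audio/l16; rate=24000; channels=1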
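
For the streamGenerateContent?alt=sse variant, the audio presumably arrives in pieces. The sketch below shows one way to reassemble it, assuming each server-sent event is a `data:` line whose JSON has the same candidates/parts/inlineData layout as the non-streaming response. This layout is not confirmed for this gateway, and `collect_sse_audio` is a hypothetical helper name.

```python
import base64
import json
from typing import Iterable


def collect_sse_audio(lines: Iterable[str]) -> bytes:
    """Concatenate the PCM payloads carried by the 'data:' lines of an SSE stream."""
    pcm = bytearray()
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue  # tolerate an empty event or an end-of-stream sentinel
        event = json.loads(payload)
        for candidate in event.get("candidates", []):
            for part in candidate.get("content", {}).get("parts", []):
                inline = part.get("inlineData")
                if inline:
                    pcm.extend(base64.b64decode(inline["data"]))
    return bytes(pcm)
```

The function accepts any iterable of decoded text lines, so it can be fed from an HTTP response body read line by line. The resulting bytes are raw PCM in the format listed above.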