Media API
MiniMax media generation — speech, music, and video
Generate speech, music, and video using MiniMax's media models through a single API.
Endpoints Overview
| Method | Endpoint | Model | Description |
|---|---|---|---|
| POST | /api/v1/media/speech | Speech 2.6 | Text-to-speech (returns audio binary) |
| POST | /api/v1/media/music | Music 2.5 | Music generation from text/lyrics |
| POST | /api/v1/media/video | Hailuo 2.3 | Submit video generation job |
| GET | /api/v1/media/video/status | - | Poll video job status |
Text-to-Speech
POST /api/v1/media/speech
Converts text to speech audio. Returns audio binary directly (mp3, wav, pcm, or flac).
Request
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to synthesize (max 10,000 characters) |
| model | string | No | speech-2.6-hd (default) or speech-2.6-turbo |
| voice_setting.voice_id | string | No | Voice ID (default: female-shaonv). 300+ system voices available. |
| voice_setting.speed | number | No | Playback speed (default: 1.0) |
| voice_setting.vol | number | No | Volume (default: 1.0) |
| audio_setting.format | string | No | mp3 (default), wav, pcm, flac |
Example
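A minimal Python sketch of a speech request, using only the standard library. The parameter names, defaults, and limits come from the table above. The host in `BASE_URL`, the bearer-token header, and the nesting of `voice_setting` and `audio_setting` as JSON objects are assumptions for illustration; the helper names `build_speech_payload` and `synthesize` are hypothetical.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the real base URL for your deployment.
BASE_URL = "https://api.example.com"

ALLOWED_FORMATS = {"mp3", "wav", "pcm", "flac"}


def build_speech_payload(text, model="speech-2.6-hd", voice_id="female-shaonv",
                         speed=1.0, vol=1.0, audio_format="mp3"):
    """Build the JSON body for POST /api/v1/media/speech."""
    if not text:
        raise ValueError("text is required")
    if len(text) > 10_000:
        raise ValueError("text exceeds 10,000 characters")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {audio_format}")
    # Assumption: dotted parameter names map to nested JSON objects.
    return {
        "text": text,
        "model": model,
        "voice_setting": {"voice_id": voice_id, "speed": speed, "vol": vol},
        "audio_setting": {"format": audio_format},
    }


def synthesize(text, api_key, out_path="speech.mp3", **options):
    """Send the request and write the returned audio bytes to out_path."""
    body = json.dumps(build_speech_payload(text, **options)).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL + "/api/v1/media/speech",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
    # The endpoint returns audio binary directly, so the body is saved as-is.
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path
```

Calling `synthesize("Hello, world", api_key="...")` would write the response body to `speech.mp3`. Pass `audio_format="wav"` together with a matching `out_path` to request a different container.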
40+ Languages
Speech 2.6 supports 40+ languages including English, Chinese, Japanese, Korean, French, German, Spanish, and more. The model automatically detects the language from input text.
Music Generation
POST /api/v1/media/music
Generate music from a text prompt and/or lyrics.
Request
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | No* | Music style/mood description (max 2,000 chars) |
| lyrics | string | No* | Song lyrics with structure tags (max 3,500 chars) |
| model | string | No | music-2.5 (default) or music-2.5+ |
| is_instrumental | boolean | No | Generate instrumental only (music-2.5+ only) |
| lyrics_optimizer | boolean | No | Auto-generate lyrics from prompt |
| output_format | string | No | url (default) or hex |
*At least one of prompt or lyrics is required.
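The constraints in the parameter table can be checked client-side before sending a request. The following Python sketch builds a request body for POST /api/v1/media/music and enforces the documented limits. The function name `build_music_payload` is hypothetical, and omitting unset optional fields from the body is an assumption rather than documented behavior.

```python
def build_music_payload(prompt=None, lyrics=None, model="music-2.5",
                        is_instrumental=None, lyrics_optimizer=None,
                        output_format="url"):
    """Build and validate the JSON body for POST /api/v1/media/music."""
    if not prompt and not lyrics:
        raise ValueError("at least one of prompt or lyrics is required")
    if prompt and len(prompt) > 2000:
        raise ValueError("prompt exceeds 2,000 characters")
    if lyrics and len(lyrics) > 3500:
        raise ValueError("lyrics exceed 3,500 characters")
    if is_instrumental and model != "music-2.5+":
        raise ValueError("is_instrumental is only supported by music-2.5+")
    if output_format not in ("url", "hex"):
        raise ValueError(f"unsupported output_format: {output_format}")

    payload = {"model": model, "output_format": output_format}
    # Assumption: optional fields are simply left out when not set.
    if prompt:
        payload["prompt"] = prompt
    if lyrics:
        payload["lyrics"] = lyrics
    if is_instrumental is not None:
        payload["is_instrumental"] = is_instrumental
    if lyrics_optimizer is not None:
        payload["lyrics_optimizer"] = lyrics_optimizer
    return payload
```

For example, `build_music_payload(prompt="Lo-fi piano, rainy evening", model="music-2.5+", is_instrumental=True)` yields a body for an instrumental track, while calling it with neither `prompt` nor `lyrics` raises `ValueError`.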
Lyrics Structure Tags
Use these tags to structure your lyrics:
Response
Video Generation (Hailuo)
Video generation uses an async submit-and-poll pattern.
Submit Job
POST /api/v1/media/video
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Video description (max 2,000 chars) |
| model | string | No | MiniMax-Hailuo-2.3 (default) or MiniMax-Hailuo-2.3-Fast |
| duration | integer | No | Duration in seconds (default: 6) |
| resolution | string | No | 720P, 768P, or 1080P (default) |
| prompt_optimizer | boolean | No | Auto-optimize prompt (default: true) |
Camera Control
Use [command] syntax in your prompt for camera movements:
| Command | Effect |
|---|---|
| [Push in] | Camera moves forward |
| [Zoom out] | Camera zooms out |
| [Pan left] / [Pan right] | Horizontal pan |
| [Tilt up] / [Tilt down] | Vertical tilt |
| [Truck left] / [Truck right] | Camera slides sideways |
| [Static shot] | No camera movement |
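A small Python helper can keep camera commands consistent with the table above. This is a sketch: the function name `add_camera_commands` is hypothetical, and appending the bracketed commands to the end of the prompt is an assumption, since the commands may be placed wherever they fit the description.

```python
# Commands listed in the Camera Control table.
CAMERA_COMMANDS = {
    "Push in", "Zoom out", "Pan left", "Pan right",
    "Tilt up", "Tilt down", "Truck left", "Truck right", "Static shot",
}


def add_camera_commands(prompt, *commands):
    """Append [command] tags to a video prompt after validating them."""
    unknown = [c for c in commands if c not in CAMERA_COMMANDS]
    if unknown:
        raise ValueError(f"unknown camera command(s): {unknown}")
    tags = " ".join(f"[{c}]" for c in commands)
    result = f"{prompt} {tags}" if tags else prompt
    # The prompt parameter is limited to 2,000 characters, tags included.
    if len(result) > 2000:
        raise ValueError("prompt exceeds 2,000 characters")
    return result


print(add_camera_commands("A lighthouse at dusk, waves breaking.", "Push in", "Tilt up"))
# prints: A lighthouse at dusk, waves breaking. [Push in] [Tilt up]
```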
Response
Poll Status
GET /api/v1/media/video/status?task_id=abc123
Status values: Queueing, Processing, Success, Fail.
Full Workflow
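The submit-and-poll pattern can be sketched in Python as follows. The endpoints, the `task_id` query parameter, and the status values come from the sections above. The response field names `task_id` and `status`, the host in `BASE_URL`, and the bearer-token header are assumptions, and all function names are hypothetical. The polling loop takes its transport as arguments so it can be exercised without network access.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: placeholder host


def _request_json(method, path, api_key, body=None):
    """Send a request and decode a JSON response (assumed response format)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed auth scheme
    if data is not None:
        headers["Content-Type"] = "application/json"
    req = urllib.request.Request(BASE_URL + path, data=data,
                                 headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def http_submit(api_key):
    """Return a callable that submits a job via POST /api/v1/media/video."""
    return lambda payload: _request_json("POST", "/api/v1/media/video",
                                         api_key, payload)


def http_get_status(api_key):
    """Return a callable that polls GET /api/v1/media/video/status."""
    def get(task_id):
        query = urllib.parse.urlencode({"task_id": task_id})
        return _request_json("GET", f"/api/v1/media/video/status?{query}", api_key)
    return get


def run_video_job(submit, get_status, payload,
                  interval=5.0, timeout=600.0, sleep=time.sleep):
    """Submit a job, then poll until Success, Fail, or the timeout elapses."""
    task_id = submit(payload)["task_id"]  # assumed response field
    deadline = time.monotonic() + timeout
    while True:
        result = get_status(task_id)
        status = result["status"]  # assumed response field
        if status == "Success":
            return result
        if status == "Fail":
            raise RuntimeError(f"video job {task_id} failed: {result}")
        # Queueing and Processing both mean "keep waiting".
        if time.monotonic() >= deadline:
            raise TimeoutError(f"video job {task_id} still {status} after {timeout}s")
        sleep(interval)
```

A real call might look like `run_video_job(http_submit(key), http_get_status(key), {"prompt": "A lighthouse at dusk [Push in]"})`. Since generation typically takes one to three minutes, a polling interval of a few seconds keeps request volume low without adding much latency.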
Processing Time
Video generation typically takes 1-3 minutes depending on resolution and duration. Use the Fast model (MiniMax-Hailuo-2.3-Fast) for quicker results at slightly lower quality.