Concurred API

Media API

MiniMax media generation: speech, music, and video

Generate speech, music, and video using MiniMax's media models through a single API.

Endpoints Overview

| Method | Endpoint | Model | Description |
|---|---|---|---|
| POST | /api/v1/media/speech | Speech 2.6 | Text-to-speech (returns audio binary) |
| POST | /api/v1/media/music | Music 2.5 | Music generation from text/lyrics |
| POST | /api/v1/media/video | Hailuo 2.3 | Submit video generation job |
| GET | /api/v1/media/video/status | | Poll video job status |
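The curl examples on this page all use the same base URL and the same bearer-token header. The sketch below assumes a bash shell with curl installed. It mirrors the table in a lookup function and adds a small POST wrapper. The names media_endpoint, media_post, and the CONCURRED_API_KEY environment variable are hypothetical and exist only for illustration. They are not part of the API.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the documented API.
# The base URL is taken from the curl examples on this page.
MEDIA_BASE_URL="https://agent-heavy.vercel.app/api/v1/media"

# media_endpoint NAME: print "METHOD URL" for one row of the endpoints table
media_endpoint() {
  case "$1" in
    speech)       echo "POST $MEDIA_BASE_URL/speech" ;;
    music)        echo "POST $MEDIA_BASE_URL/music" ;;
    video)        echo "POST $MEDIA_BASE_URL/video" ;;
    video-status) echo "GET $MEDIA_BASE_URL/video/status" ;;
    *)            echo "unknown endpoint: $1" >&2; return 1 ;;
  esac
}

# media_post NAME JSON_BODY: send an authenticated JSON POST to one of the POST endpoints.
# CONCURRED_API_KEY is an assumed environment variable that holds your API key.
media_post() {
  local endpoint method url
  endpoint=$(media_endpoint "$1") || return 1
  method=${endpoint%% *}
  url=${endpoint#* }
  if [[ $method != POST ]]; then
    echo "$1 is a $method endpoint" >&2
    return 1
  fi
  if [[ -z ${CONCURRED_API_KEY:-} ]]; then
    echo "CONCURRED_API_KEY is not set" >&2
    return 1
  fi
  curl -s "$url" \
    -H "Authorization: Bearer $CONCURRED_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$2"
}
```

For example, after export CONCURRED_API_KEY=YOUR_API_KEY, the command media_post music '{"prompt": "An upbeat pop song about summer"}' prints the JSON response. The GET status endpoint is left to a plain curl call.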

Text-to-Speech

POST /api/v1/media/speech

Converts text to speech. On success, the response body is the raw audio rather than JSON. The audio is mp3, wav, pcm, or flac, as selected by audio_setting.format.

Request

{
  "text": "Hello world, this is a test.",
  "model": "speech-2.6-hd",
  "voice_setting": {
    "voice_id": "female-shaonv",
    "speed": 1.0,
    "vol": 1.0
  },
  "audio_setting": {
    "sample_rate": 44100,
    "bitrate": 256000,
    "format": "mp3"
  }
}

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to synthesize (max 10,000 characters) |
| model | string | No | speech-2.6-hd (default) or speech-2.6-turbo |
| voice_setting.voice_id | string | No | Voice ID (default: female-shaonv); 300+ system voices are available |
| voice_setting.speed | number | No | Playback speed (default: 1.0) |
| voice_setting.vol | number | No | Volume (default: 1.0) |
| audio_setting.format | string | No | mp3 (default), wav, pcm, or flac |
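Checking the documented limits before sending saves a round trip for requests that exceed them. Below is a bash sketch with hypothetical helpers, json_escape and build_speech_payload. It validates text length, model, and format against the table, escapes the text for JSON, and prints a request body. Fields it does not set (speed, vol, sample_rate, bitrate) are omitted. This assumes the API then applies its defaults, which this page lists only for speed and vol.

```shell
#!/usr/bin/env bash
# json_escape STRING: escape backslash, double quote, newline, carriage return, and tab
# so STRING can sit inside a JSON string (other control characters are not handled)
json_escape() {
  local s=$1 bs='\' dq='"'
  s=${s//"$bs"/"$bs$bs"}     # backslashes first, so later escapes are not doubled
  s=${s//"$dq"/"$bs$dq"}
  s=${s//$'\n'/"${bs}n"}
  s=${s//$'\r'/"${bs}r"}
  s=${s//$'\t'/"${bs}t"}
  printf '%s' "$s"
}

# build_speech_payload TEXT [MODEL] [VOICE_ID] [FORMAT]
# Checks the limits from the parameter table, then prints a JSON request body.
build_speech_payload() {
  local text=$1 model=${2:-speech-2.6-hd} voice=${3:-female-shaonv} format=${4:-mp3}
  if [[ -z $text ]]; then
    echo "text is required" >&2; return 1
  fi
  # ${#text} counts characters in a UTF-8 locale (bytes in the C locale)
  if (( ${#text} > 10000 )); then
    echo "text exceeds 10,000 characters" >&2; return 1
  fi
  case "$model" in
    speech-2.6-hd|speech-2.6-turbo) ;;
    *) echo "unsupported model: $model" >&2; return 1 ;;
  esac
  case "$format" in
    mp3|wav|pcm|flac) ;;
    *) echo "unsupported format: $format" >&2; return 1 ;;
  esac
  printf '{"text": "%s", "model": "%s", "voice_setting": {"voice_id": "%s"}, "audio_setting": {"format": "%s"}}\n' \
    "$(json_escape "$text")" "$model" "$(json_escape "$voice")" "$format"
}
```

The output of build_speech_payload 'Hello world!' speech-2.6-turbo can be passed to curl with -d. For anything beyond simple text, jq -n --arg text "$TEXT" '{text: $text}' is a sturdier way to build the JSON.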

Example

curl https://agent-heavy.vercel.app/api/v1/media/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world!", "model": "speech-2.6-hd"}' \
  --output speech.mp3
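The command above writes whatever body the server returns to speech.mp3, including an error response, which leaves a file that is not audio. The bash sketch below avoids that. It uses curl's --fail option, so HTTP errors produce a non-zero exit status, and it moves the download into place only when it succeeded and is non-empty. synthesize_speech and CONCURRED_API_KEY are hypothetical names.

```shell
#!/usr/bin/env bash
# synthesize_speech JSON_BODY OUTFILE
# Hypothetical helper: POST JSON_BODY to the speech endpoint and save the audio to OUTFILE.
# CONCURRED_API_KEY is an assumed environment variable that holds your API key.
synthesize_speech() {
  local body=$1 out=$2 tmp
  if [[ -z ${CONCURRED_API_KEY:-} ]]; then
    echo "CONCURRED_API_KEY is not set" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # --fail: exit non-zero on HTTP status >= 400 instead of saving the error body;
  # -s hides the progress meter, -S still prints curl's own error messages
  if curl -sS --fail "https://agent-heavy.vercel.app/api/v1/media/speech" \
      -H "Authorization: Bearer $CONCURRED_API_KEY" \
      -H "Content-Type: application/json" \
      -d "$body" \
      --output "$tmp" && [[ -s $tmp ]]; then
    mv "$tmp" "$out"
  else
    rm -f "$tmp"
    echo "speech request failed" >&2
    return 1
  fi
}
```

Usage: synthesize_speech '{"text": "Hello world!", "model": "speech-2.6-hd"}' speech.mp3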

40+ Languages

Speech 2.6 supports 40+ languages, including English, Chinese, Japanese, Korean, French, German, and Spanish. The model detects the language automatically from the input text.


Music Generation

POST /api/v1/media/music

Generate music from a text prompt and/or lyrics.

Request

{
  "model": "music-2.5",
  "prompt": "An upbeat pop song about summer",
  "lyrics": "[Verse]\nSunshine on my face\nWarm breeze in my hair\n[Chorus]\nSummer days are here again",
  "output_format": "url"
}

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | No* | Music style/mood description (max 2,000 characters) |
| lyrics | string | No* | Song lyrics with structure tags (max 3,500 characters) |
| model | string | No | music-2.5 (default) or music-2.5+ |
| is_instrumental | boolean | No | Generate instrumental music only (music-2.5+ only) |
| lyrics_optimizer | boolean | No | Auto-generate lyrics from the prompt |
| output_format | string | No | url (default) or hex |

*At least one of prompt or lyrics is required.
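The footnote and the length limits can be checked before sending a music request. Below is a bash sketch with a hypothetical validate_music_request function. It encodes only the rules stated in the table: at least one of prompt or lyrics, the 2,000 and 3,500 character limits, and is_instrumental requiring music-2.5+.

```shell
#!/usr/bin/env bash
# validate_music_request PROMPT LYRICS [MODEL] [IS_INSTRUMENTAL]
# Returns 0 if the arguments satisfy the documented constraints, 1 otherwise.
validate_music_request() {
  local prompt=$1 lyrics=$2 model=${3:-music-2.5} instrumental=${4:-false}
  if [[ -z $prompt && -z $lyrics ]]; then
    echo "at least one of prompt or lyrics is required" >&2; return 1
  fi
  if (( ${#prompt} > 2000 )); then
    echo "prompt exceeds 2,000 characters" >&2; return 1
  fi
  if (( ${#lyrics} > 3500 )); then
    echo "lyrics exceed 3,500 characters" >&2; return 1
  fi
  case "$model" in
    "music-2.5"|"music-2.5+") ;;
    *) echo "unsupported model: $model" >&2; return 1 ;;
  esac
  # is_instrumental is documented as a music-2.5+ feature only
  if [[ $instrumental == true && $model != "music-2.5+" ]]; then
    echo "is_instrumental requires music-2.5+" >&2; return 1
  fi
}
```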

Lyrics Structure Tags

Use these tags to structure your lyrics:

[Verse], [Chorus], [Bridge], [Intro], [Outro], [Pre-Chorus], [Hook]
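To catch typos such as [Chrous] before submitting, you can compare the bracketed tags in your lyrics against the list above. The bash sketch below uses a hypothetical unknown_lyric_tags helper. It only reports tags that are not on this page's list and makes no assumption about how the API treats them.

```shell
#!/usr/bin/env bash
# Structure tags listed on this page for the lyrics field
LYRIC_TAGS=("[Verse]" "[Chorus]" "[Bridge]" "[Intro]" "[Outro]" "[Pre-Chorus]" "[Hook]")

# unknown_lyric_tags LYRICS: print every bracketed tag in LYRICS that is not
# in LYRIC_TAGS, one per line, in order of appearance
unknown_lyric_tags() {
  local rest=$1 tag known found
  local re='\[[^]]*]'
  while [[ $rest =~ $re ]]; do
    tag=${BASH_REMATCH[0]}
    found=no
    for known in "${LYRIC_TAGS[@]}"; do
      if [[ $tag == "$known" ]]; then
        found=yes
        break
      fi
    done
    if [[ $found == no ]]; then
      printf '%s\n' "$tag"
    fi
    # Drop everything up to and including this tag, then keep scanning
    rest=${rest#*"$tag"}
  done
}
```

Empty output means every tag is on the documented list.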

Response

{
  "success": true,
  "data": {
    "status": 2,
    "audio": "https://cdn.minimax.io/...",
    "duration": 180,
    "sample_rate": 44100,
    "format": "mp3"
  }
}
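When output_format is set to hex, the audio is presumably returned inline instead of as a URL. This page does not show a hex response. Assuming that data.audio then holds the audio bytes as a hex string, the bash sketch below decodes it to a file using sed and printf %b. hex_to_file is a hypothetical name.

```shell
#!/usr/bin/env bash
# hex_to_file HEX OUTFILE: decode an even-length hex string and write the raw bytes to OUTFILE
hex_to_file() {
  local hex=$1 out=$2
  if (( ${#hex} % 2 != 0 )) || [[ $hex == *[!0-9a-fA-F]* ]]; then
    echo "not a valid hex string" >&2
    return 1
  fi
  # sed turns "4869" into "\x48\x69"; printf %b then emits each escaped byte
  printf '%b' "$(sed 's/../\\x&/g' <<< "$hex")" > "$out"
}
```

For example, run hex_to_file "$(jq -r '.data.audio' response.json)" song.mp3, picking the file extension from data.format.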

Video Generation (Hailuo)

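Video generation is asynchronous. First, submit a job to receive a task_id. Then poll the status endpoint with that task_id until the status is Success or Fail.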

Submit Job

POST /api/v1/media/video

{
  "prompt": "A cinematic sunset over mountains [Push in]",
  "model": "MiniMax-Hailuo-2.3",
  "duration": 6,
  "resolution": "1080P"
}

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| prompt | string | Yes | Video description (max 2,000 characters) |
| model | string | No | MiniMax-Hailuo-2.3 (default) or MiniMax-Hailuo-2.3-Fast |
| duration | integer | No | Duration in seconds (default: 6) |
| resolution | string | No | 720P, 768P, or 1080P (default) |
| prompt_optimizer | boolean | No | Auto-optimize the prompt (default: true) |
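Below is a bash sketch of client-side checks for a video job, using a hypothetical validate_video_request helper. It checks duration only for being a positive integer, because this page documents the default (6) but not the allowed values.

```shell
#!/usr/bin/env bash
# validate_video_request PROMPT [MODEL] [RESOLUTION] [DURATION]
# Checks the values listed in the parameter table; defaults match the documented ones.
validate_video_request() {
  local prompt=$1 model=${2:-MiniMax-Hailuo-2.3} resolution=${3:-1080P} duration=${4:-6}
  if [[ -z $prompt ]]; then
    echo "prompt is required" >&2; return 1
  fi
  if (( ${#prompt} > 2000 )); then
    echo "prompt exceeds 2,000 characters" >&2; return 1
  fi
  case "$model" in
    MiniMax-Hailuo-2.3|MiniMax-Hailuo-2.3-Fast) ;;
    *) echo "unsupported model: $model" >&2; return 1 ;;
  esac
  case "$resolution" in
    720P|768P|1080P) ;;
    *) echo "unsupported resolution: $resolution" >&2; return 1 ;;
  esac
  # Only "positive whole number of seconds" is checked; no allowed set is documented here
  if ! [[ $duration =~ ^[1-9][0-9]*$ ]]; then
    echo "duration must be a positive integer number of seconds" >&2; return 1
  fi
}
```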

Camera Control

Control camera movement by adding a bracketed command to your prompt (for example, [Push in]):

| Command | Effect |
|---|---|
| [Push in] | Camera moves forward |
| [Zoom out] | Camera zooms out |
| [Pan left] / [Pan right] | Horizontal pan |
| [Tilt up] / [Tilt down] | Vertical tilt |
| [Truck left] / [Truck right] | Camera slides sideways |
| [Static shot] | No camera movement |
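Camera commands are plain text inside the prompt, so a helper can append them and reject names that are not in the table. Below is a bash sketch with a hypothetical with_camera function. This page does not say whether several commands can be combined in one prompt, so treat multi-command prompts as an assumption to verify.

```shell
#!/usr/bin/env bash
# Camera commands listed in the Camera Control table
CAMERA_COMMANDS=("Push in" "Zoom out" "Pan left" "Pan right" "Tilt up" "Tilt down"
                 "Truck left" "Truck right" "Static shot")

# with_camera PROMPT COMMAND...: print PROMPT with each COMMAND appended as " [COMMAND]",
# rejecting commands that are not in CAMERA_COMMANDS
with_camera() {
  local prompt=$1 cmd known ok
  shift
  for cmd in "$@"; do
    ok=no
    for known in "${CAMERA_COMMANDS[@]}"; do
      if [[ $cmd == "$known" ]]; then
        ok=yes
        break
      fi
    done
    if [[ $ok == no ]]; then
      echo "unknown camera command: $cmd" >&2
      return 1
    fi
    prompt+=" [$cmd]"
  done
  printf '%s\n' "$prompt"
}
```

For example, with_camera 'Ocean waves at sunset' 'Push in' produces the prompt used in the workflow example below.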

Response

{
  "success": true,
  "data": {
    "task_id": "abc123...",
    "status": "submitted",
    "model": "MiniMax-Hailuo-2.3"
  }
}

Poll Status

GET /api/v1/media/video/status?task_id=abc123

{
  "success": true,
  "data": {
    "task_id": "abc123...",
    "status": "Success",
    "file_id": "file_xyz",
    "download_url": "https://cdn.minimax.io/..."
  }
}

Status values: Queueing, Processing, Success, Fail.
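Polling is a loop that stops on Success or Fail. The bash sketch below uses hypothetical names, wait_for_video and fetch_video_status. The loop takes the fetch function as an argument, so the logic can be exercised with a stub. The defaults of one poll every 10 seconds, for at most 30 polls (5 minutes), are arbitrary choices meant to cover the typical 1 to 3 minute generation time.

```shell
#!/usr/bin/env bash
# wait_for_video FETCH_FN [INTERVAL_SECONDS] [MAX_POLLS]
# Calls "FETCH_FN POLL_NUMBER" (poll numbers start at 0); FETCH_FN must print one of the
# documented status values. Exit codes: 0 = Success, 1 = Fail,
# 2 = still pending after MAX_POLLS polls, 3 = fetch error or unexpected status.
wait_for_video() {
  local fetch=$1 interval=${2:-10} max=${3:-30} poll status
  for (( poll = 0; poll < max; poll++ )); do
    if ! status=$("$fetch" "$poll"); then
      echo "status request failed" >&2
      return 3
    fi
    case "$status" in
      Success) return 0 ;;
      Fail) return 1 ;;
      Queueing|Processing) sleep "$interval" ;;
      *) echo "unexpected status: $status" >&2; return 3 ;;
    esac
  done
  echo "still pending after $max polls" >&2
  return 2
}

# Hypothetical fetcher for real use: reads TASK_ID and CONCURRED_API_KEY from the
# environment and extracts .data.status with jq (requires curl and jq).
fetch_video_status() {
  curl -sS --fail "https://agent-heavy.vercel.app/api/v1/media/video/status?task_id=$TASK_ID" \
    -H "Authorization: Bearer $CONCURRED_API_KEY" | jq -r '.data.status'
}
```

Usage: export TASK_ID=abc123 CONCURRED_API_KEY=YOUR_API_KEY, then run wait_for_video fetch_video_status && echo "video ready".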

Full Workflow

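# 1. Submit job
TASK_ID=$(curl -s https://agent-heavy.vercel.app/api/v1/media/video \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Ocean waves at sunset [Push in]"}' | jq -r '.data.task_id')

# 2. Poll every 10 seconds while the job is Queueing or Processing
while true; do
  RESULT=$(curl -s "https://agent-heavy.vercel.app/api/v1/media/video/status?task_id=$TASK_ID" \
    -H "Authorization: Bearer YOUR_API_KEY")
  STATUS=$(echo "$RESULT" | jq -r '.data.status')
  case "$STATUS" in
    Success) echo "$RESULT" | jq -r '.data.download_url'; break ;;
    Queueing|Processing) sleep 10 ;;
    *) echo "Video job ended with status: $STATUS" >&2; break ;;
  esac
done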

Processing Time

Video generation typically takes 1-3 minutes depending on resolution and duration. Use the Fast model (MiniMax-Hailuo-2.3-Fast) for quicker results at slightly lower quality.
