Text-to-Speech Service

Deliver natural-sounding speech with configurable voices, streaming playback, and file-based output. All synthesis endpoints accept the same JSON payload via POST, which keeps integration straightforward.

Need an API key? Create one at https://playground.induslabs.io/register

Shared Request Payload

Every text-to-speech endpoint accepts the same JSON schema in the body of a POST request. Adjust parameters such as output_format or stream to suit your use case.

{
  "text": "Hello, this is a test request.",
  "voice": "Indus-hi-maya",
  "output_format": "wav",
  "model": "indus-tts-v1",
  "api_key": "YOUR_API_KEY",
  "normalize": true,
  "stream": true,
  "speed": 1,
  "pitch_shift": 0,
  "loudness_db": 0
}

Payload Fields

Name	Type	Default	Description
`text`	string	required	The text to be synthesized into speech.
`voice`	string	Indus-hi-maya	The voice model to be used (e.g., "Indus-hi-maya").
`output_format`	string	wav	Audio format for output (e.g., "wav", "mp3", "pcm").
`model`	string	indus-tts-v1	The TTS model to use (e.g., "indus-tts-v1").
`api_key`	string	required	Authentication API key.
`normalize`	boolean	true	Whether to normalize the text before synthesis.
`stream`	boolean	true	Whether to stream the output.
`speed`	number	1	Speed of the synthesized speech.
`pitch_shift`	number	0	Pitch shift adjustment.
`loudness_db`	number	0	Loudness adjustment in decibels.
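
As a minimal sketch, the defaults in the table above can be captured in a small Python helper so that callers only pass the fields they want to change. The function name `build_payload` is hypothetical and not part of any IndusLabs SDK, and the client-side checks for an empty `text` or `api_key` are an assumption about what is worth catching before a request is sent.

```python
import json


def build_payload(text, api_key, *, voice="Indus-hi-maya", output_format="wav",
                  model="indus-tts-v1", normalize=True, stream=True,
                  speed=1, pitch_shift=0, loudness_db=0):
    """Return the shared request payload, filling in the documented defaults.

    Hypothetical helper: the field names and defaults come from the table,
    the validation below is an illustrative assumption.
    """
    if not text:
        raise ValueError("text is required")
    if not api_key:
        raise ValueError("api_key is required")
    return {
        "text": text,
        "voice": voice,
        "output_format": output_format,
        "model": model,
        "api_key": api_key,
        "normalize": normalize,
        "stream": stream,
        "speed": speed,
        "pitch_shift": pitch_shift,
        "loudness_db": loudness_db,
    }


if __name__ == "__main__":
    # Override only the fields that differ from the defaults.
    payload = build_payload("Hello, this is a test request.", "YOUR_API_KEY",
                            output_format="mp3", stream=False)
    print(json.dumps(payload, indent=2))
```

Keeping the defaults in one place means a change to the schema only has to be mirrored in a single function.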

POST /v1/audio/speech

Synthesize Speech

Synthesizes speech from text and streams the audio data in the response.

Functionality

Converts input text into speech audio.
Authenticates the request with the API key and charges it against the account's credits.
Returns audio data directly in the response body.
Supports streaming for real-time audio playback.
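
One possible way to consume the streamed response is to read it in fixed-size chunks and write each chunk as it arrives, rather than buffering the whole body. The sketch below uses only the Python standard library. The host is taken from the Quick Integration snippet, while the helper names and the 4096-byte chunk size are illustrative assumptions.

```python
import io
import json
import urllib.request

# Host as used in the Quick Integration snippet.
SPEECH_URL = "https://voice.induslabs.io/v1/audio/speech"


def copy_chunks(source: io.BufferedIOBase, sink: io.BufferedIOBase,
                chunk_size: int = 4096) -> int:
    """Copy source to sink chunk by chunk and return the number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means the stream has ended
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def stream_speech_to_file(text: str, api_key: str, path: str) -> int:
    """POST the payload with stream=True and save the audio as it arrives.

    Hypothetical helper; fields not listed here fall back to the documented defaults.
    """
    payload = {"text": text, "api_key": api_key, "stream": True, "output_format": "wav"}
    request = urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response, open(path, "wb") as out:
        return copy_chunks(response, out)
```

A real-time player would hand each chunk to an audio device instead of a file; `copy_chunks` only needs a sink object with a `write` method.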

Inputs

Name	Type	Default	Description
`text`	string	required	The text to be synthesized into speech.
`voice`	string	Indus-hi-maya	The voice model to be used (e.g., "Indus-hi-maya").
`output_format`	string	wav	Audio format for output (e.g., "wav", "mp3", "pcm").
`model`	string	indus-tts-v1	The TTS model to use (e.g., "indus-tts-v1").
`api_key`	string	required	Authentication API key.
`normalize`	boolean	true	Whether to normalize the text before synthesis.
`stream`	boolean	true	Whether to stream the output.
`speed`	number	1	Speed of the synthesized speech.
`pitch_shift`	number	0	Pitch shift adjustment.
`loudness_db`	number	0	Loudness adjustment in decibels.

Outputs

Status	Type	Default	Description
`200 OK`	audio/wav	-	Returns synthesized speech audio as binary data.
`422 Validation Error`	application/json	-	Validation failure. Inspect detail array.

200 OK

Binary audio data (WAV format)

422 Validation Error

{
  "detail": [
    {
      "loc": ["string", 0],
      "msg": "string",
      "type": "string"
    }
  ]
}
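
The `detail` array can be flattened into readable messages before logging or showing it to a user. The following sketch assumes each entry carries `loc`, `msg`, and `type` as in the schema above; the helper name and the sample error values are hypothetical.

```python
def format_validation_errors(body: dict) -> list:
    """Turn a 422 response body into one human-readable line per error."""
    messages = []
    for item in body.get("detail", []):
        # loc mixes strings and integers (for example a field name and a list index).
        location = ".".join(str(part) for part in item.get("loc", []))
        messages.append(f"{location}: {item.get('msg', '')} ({item.get('type', '')})")
    return messages


if __name__ == "__main__":
    # Hypothetical error body, shaped like the schema above.
    sample = {
        "detail": [
            {"loc": ["body", "text"], "msg": "field required", "type": "value_error.missing"}
        ]
    }
    for line in format_validation_errors(sample):
        print(line)
```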

POST /v1/audio/speech/file

Synthesize Speech File

Synthesizes speech from text and returns the complete audio as a downloadable file.

Functionality

Converts input text into speech audio.
Returns the synthesized audio as a complete file download.
Unlike /v1/audio/speech, this endpoint returns the full audio file at once.
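
A minimal sketch of a whole-file download, using only the Python standard library. It assumes the endpoint lives on the same host as the Quick Integration snippet; the helper names and the output file naming scheme are hypothetical.

```python
import json
import urllib.request

# Assumption: same host as the /v1/audio/speech example in Quick Integration.
FILE_URL = "https://voice.induslabs.io/v1/audio/speech/file"


def build_file_request(text: str, api_key: str,
                       output_format: str = "mp3") -> urllib.request.Request:
    """Build the POST request; omitted fields fall back to the documented defaults."""
    payload = {"text": text, "api_key": api_key, "output_format": output_format}
    return urllib.request.Request(
        FILE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def download_speech(text: str, api_key: str, stem: str = "speech",
                    output_format: str = "mp3") -> str:
    """Fetch the complete audio file and save it as <stem>.<output_format>."""
    request = build_file_request(text, api_key, output_format)
    path = f"{stem}.{output_format}"
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()  # the whole file is read in one call
    with open(path, "wb") as f:
        f.write(audio)
    return path
```

Separating request construction from the network call makes the payload easy to inspect without sending anything.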

Inputs

Name	Type	Default	Description
`text`	string	required	The text to be synthesized into speech.
`voice`	string	Indus-hi-maya	The voice model to be used (e.g., "Indus-hi-maya").
`output_format`	string	wav	Audio format for output (e.g., "wav", "mp3", "pcm").
`model`	string	indus-tts-v1	The TTS model to use (e.g., "indus-tts-v1").
`api_key`	string	required	Authentication API key.
`normalize`	boolean	true	Whether to normalize the text before synthesis.
`stream`	boolean	true	Whether to stream the output.
`speed`	number	1	Speed of the synthesized speech.
`pitch_shift`	number	0	Pitch shift adjustment.
`loudness_db`	number	0	Loudness adjustment in decibels.

Outputs

Status	Type	Default	Description
`200 OK`	audio/wav	-	Returns the synthesized speech audio as a downloadable file.
`422 Validation Error`	application/json	-	Validation failure. Inspect detail array.

200 OK

Binary audio data (WAV format)

422 Validation Error

{
  "detail": [
    {
      "loc": ["string", 0],
      "msg": "string",
      "type": "string"
    }
  ]
}

POST /v1/audio/speech/preview

Speech Preview

This endpoint provides a preview of how text will be processed for speech synthesis without actually generating audio.

Functionality

Accepts input text and parameters, then shows how the text would be processed by the TTS system.
Does not generate audio, only returns metadata and analysis of the input.
Useful for estimating credits, duration, and validating parameters before synthesis.
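
As an illustration of a pre-flight check, the sketch below reads a preview response and decides whether to go ahead with synthesis. The field names follow the sample 200 OK response for this endpoint; the function name `summarize_preview` is hypothetical.

```python
def summarize_preview(preview: dict):
    """Return (ok, summary) from a preview response.

    ok mirrors user_info.sufficient_credits; summary is a one-line description.
    """
    analysis = preview["analysis"]
    user = preview["user_info"]
    summary = (
        f"{analysis['total_words']} words, "
        f"~{analysis['estimated_duration_seconds']} s, "
        f"{analysis['estimated_credits']} of {user['credits_remaining']} credits"
    )
    return user["sufficient_credits"], summary


if __name__ == "__main__":
    # Values copied from the sample preview response (trimmed to the fields used).
    sample = {
        "analysis": {"total_words": 6, "estimated_duration_seconds": 2.4,
                     "estimated_credits": 0.04},
        "user_info": {"credits_remaining": 399.58, "sufficient_credits": True},
    }
    ok, summary = summarize_preview(sample)
    print(ok, summary)
```

Calling the preview endpoint first lets a client refuse a long request locally instead of failing mid-synthesis.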

Inputs

Name	Type	Default	Description
`text`	string	required	The text to be synthesized into speech.
`voice`	string	Indus-hi-maya	The voice model to be used (e.g., "Indus-hi-maya").
`output_format`	string	wav	Audio format for output (e.g., "wav", "mp3", "pcm").
`model`	string	indus-tts-v1	The TTS model to use (e.g., "indus-tts-v1").
`api_key`	string	required	Authentication API key.
`normalize`	boolean	true	Whether to normalize the text before synthesis.
`stream`	boolean	true	Whether to stream the output.
`speed`	number	1	Speed of the synthesized speech.
`pitch_shift`	number	0	Pitch shift adjustment.
`loudness_db`	number	0	Loudness adjustment in decibels.

Outputs

Status	Type	Default	Description
`200 OK`	application/json	-	Returns detailed analysis including character count, word count, estimated duration, credit cost, and configuration details.
`422 Validation Error`	application/json	-	Validation failure. Inspect detail array.

200 OK

{
  "analysis": {
    "total_characters": 30,
    "total_words": 6,
    "estimated_duration_seconds": 2.4,
    "estimated_credits": 0.04,
    "chunking_strategy": {
      "total_chunks": 1,
      "max_words_per_chunk": 15,
      "overlap_words": 0,
      "chunks": [
        {
          "index": 0,
          "word_count": 6,
          "text_preview": "Hello, this is a test request.",
          "is_final": true
        }
      ]
    }
  },
  "configuration": {
    "voice": "Indus-hi-maya",
    "model": "indus-tts-v1",
    "output_format": "wav",
    "stream": true,
    "temperature": 0.6,
    "max_tokens": 1800,
    "top_p": 0.8,
    "repetition_penalty": 1.1,
    "bitrate": null
  },
  "user_info": {
    "user_id": "USR_A3E785AF",
    "credits_remaining": 399.58,
    "tts_unit_cost": 1,
    "sufficient_credits": true
  },
  "output_settings": {
    "format": "wav",
    "voice": "Indus-hi-maya",
    "model": "indus-tts-v1",
    "streaming": true,
    "sample_rate": 24000,
    "channels": 1,
    "bit_depth": 16
  },
  "text_processing": {
    "original_text": "Hello, this is a test request.",
    "processed_text": null,
    "normalization_applied": false,
    "normalize_setting": true,
    "character_change": 0
  },
  "size_estimates": {
    "pcm_bytes": 115200,
    "wav_bytes": 115244,
    "mp3_bytes": 38400,
    "target_format_bytes": 115244
  }
}
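
The `size_estimates` figures in this sample are consistent with plain audio arithmetic applied to `estimated_duration_seconds` and `output_settings`. The sketch below reproduces them; the 44-byte WAV header and the 128 kbps MP3 bitrate are inferred from the sample numbers and are assumptions, not documented behaviour of the service.

```python
WAV_HEADER_BYTES = 44          # assumption: size of a canonical PCM WAV header
ASSUMED_MP3_BITRATE = 128_000  # assumption: bits per second, inferred from the sample


def pcm_bytes(duration_s: float, sample_rate: int = 24000,
              channels: int = 1, bit_depth: int = 16) -> int:
    """Uncompressed size: frames * channels * bytes per sample."""
    frames = round(duration_s * sample_rate)
    return frames * channels * (bit_depth // 8)


def wav_bytes(duration_s: float, sample_rate: int = 24000,
              channels: int = 1, bit_depth: int = 16) -> int:
    """PCM payload plus the assumed fixed-size header."""
    return pcm_bytes(duration_s, sample_rate, channels, bit_depth) + WAV_HEADER_BYTES


def mp3_bytes(duration_s: float, bitrate: int = ASSUMED_MP3_BITRATE) -> int:
    """Constant-bitrate estimate: seconds * bits per second / 8."""
    return round(duration_s * bitrate / 8)


if __name__ == "__main__":
    print(pcm_bytes(2.4))  # 115200, matches pcm_bytes in the sample
    print(wav_bytes(2.4))  # 115244, matches wav_bytes in the sample
    print(mp3_bytes(2.4))  # 38400, matches mp3_bytes in the sample
```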

422 Validation Error

{
  "detail": [
    {
      "loc": ["string", 0],
      "msg": "string",
      "type": "string"
    }
  ]
}

GET /api/voice/get-voices

List Available Voices

Retrieves the catalog of voices available for speech synthesis across multiple languages.

Functionality

Returns a comprehensive list of available voices organized by language.
Each voice includes name, voice_id, and gender information.
Supports multiple languages, including Hindi, English, Bengali, Kannada, Marathi, Telugu, Arabic, and other regional languages.
No authentication required for this endpoint.

Outputs

Status	Type	Default	Description
`200 OK`	application/json	-	Returns voice catalog organized by language with name, voice_id, and gender for each voice.
`422 Validation Error`	application/json	-	Validation failure. Inspect detail array.

200 OK

{
  "status_code": 200,
  "message": "Voices fetched successfully",
  "error": null,
  "data": {
    "hindi": [
      {
        "name": "Maya",
        "voice_id": "Indus-hi-maya",
        "gender": "female"
      },
      {
        "name": "Urvashi",
        "voice_id": "Indus-hi-Urvashi",
        "gender": "female"
      },
      {
        "name": "Aditi",
        "voice_id": "Indus-hi-Aditi",
        "gender": "female"
      },
      {
        "name": "Arjun",
        "voice_id": "Indus-hi-Arjun",
        "gender": "male"
      }
    ],
    "english": [
      {
        "name": "Maya",
        "voice_id": "Indus-en-maya",
        "gender": "female"
      },
      {
        "name": "Urvashi",
        "voice_id": "Indus-en-Urvashi",
        "gender": "female"
      }
    ],
    "bengali": [
      {
        "name": "Alivia",
        "voice_id": "Indus-bn-Alivia",
        "gender": "female"
      },
      {
        "name": "Sayan",
        "voice_id": "Indus-bn-Sayan",
        "gender": "male"
      }
    ],
    "kannada": [
      {
        "name": "Aahna",
        "voice_id": "Indus-bn-Aahna",
        "gender": "female"
      },
      {
        "name": "Chinmay",
        "voice_id": "Indus-bn-Chinmay",
        "gender": "male"
      }
    ],
    "arabic": [
      {
        "name": "Fatima",
        "voice_id": "Indus-ar-Fatima",
        "gender": "female"
      },
      {
        "name": "Hamdan",
        "voice_id": "Indus-ar-Hamdan",
        "gender": "male"
      }
    ]
  }
}
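
Because the catalog is keyed by language, picking a `voice_id` for the synthesis payload is a short nested loop. The sketch below assumes the response shape shown above and that the endpoint is served from the same host as the Quick Integration snippet; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: same host as the synthesis endpoints.
VOICES_URL = "https://voice.induslabs.io/api/voice/get-voices"


def fetch_catalog(url: str = VOICES_URL) -> dict:
    """GET the voice catalog and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def find_voices(catalog: dict, language=None, gender=None) -> list:
    """Return voice_id values, optionally filtered by language and gender."""
    matches = []
    for lang, voices in catalog.get("data", {}).items():
        if language is not None and lang != language:
            continue
        for voice in voices:
            if gender is None or voice.get("gender") == gender:
                matches.append(voice["voice_id"])
    return matches
```

For example, `find_voices(fetch_catalog(), language="hindi", gender="male")` would return the IDs of the male Hindi voices, which can then be passed as the `voice` field of the shared payload.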

422 Validation Error

{
  "detail": [
    {
      "loc": ["string", 0],
      "msg": "string",
      "type": "string"
    }
  ]
}

Quick Integration

Reference snippets for generating speech instantly.


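import requests

url = "https://voice.induslabs.io/v1/audio/speech"

# Request non-streamed MP3 output; omitted fields use the documented defaults.
payload = {
    "text": "Hello from IndusLabs Voice API",
    "voice": "Indus-hi-maya",
    "output_format": "mp3",
    "stream": False,
    "model": "indus-tts-v1",
    "api_key": "YOUR_API_KEY"
}

# json=payload serializes the body and sets the Content-Type header to application/json.
response = requests.post(url, json=payload, timeout=30)
response.raise_for_status()  # raises on 4xx/5xx responses, such as a 422 validation error

# Write the binary audio from the response body to disk.
with open("output.mp3", "wb") as f:
    f.write(response.content)
print("Audio saved to output.mp3")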