
Large Language Model Service

Generate intelligent chat completions using state-of-the-art language models. Our API is fully compatible with the OpenAI SDK, making integration seamless. Use /v1/chat/completions for both streaming and non-streaming responses.

Need an API Key? If you don't have an API key yet, you can create one here: https://playground.induslabs.io/register

OpenAI SDK Compatibility

Our LLM service is fully compatible with the OpenAI Python and JavaScript SDKs. Simply set the base_url parameter to https://voice.induslabs.io/v1 and use your API key.

Voice Agent Optimization

Enable finetune: true in the extra_body parameter to use our fine-tuned model specifically optimized for voice agent use cases. This model delivers superior performance in conversational AI applications with more natural, context-aware responses tailored for voice interactions.

Available Models
  • GPT OSS 120B (gpt-oss-120b)
    High-performance 120B parameter model
  • Llama 4 Maverick (llama-4-maverick)
    17B parameter instruct model with 128k context

Authentication

All requests require authentication via the Authorization header with a Bearer token:

Authorization: Bearer YOUR_API_KEY

Your API key can be found in your dashboard at playground.induslabs.io
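
The Authorization header works the same way without the SDK. Below is a minimal sketch using Python's requests library against the chat completions endpoint documented further down this page; the payload fields simply mirror the SDK examples.

import requests

API_KEY = "YOUR_API_KEY"  # from your dashboard at playground.induslabs.io

response = requests.post(
    "https://voice.induslabs.io/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",  # required on every request
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": "Hello!"}]
    }
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])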

Complete Usage Examples

Simple Chat Completion

from openai import OpenAI

# Initialize the client
client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Simple completion
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Hello! How are you?"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)

Voice Agent with Fine-tuned Model

from openai import OpenAI

# Initialize the client
client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Voice agent optimized completion
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, introduce yourself?"}
    ],
    temperature=0.7,
    max_tokens=1000,
    extra_body={
        "finetune": True,  # Enable voice-optimized model
        "language": "en",
        "gender": "female",
        "accent": "american"
    }
)

print(response.choices[0].message.content)

Multilingual Voice Agent (Hindi)

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Hindi language with Indian accent
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {"role": "user", "content": "नमस्ते, आप कैसे हैं?"}
    ],
    temperature=0.7,
    max_tokens=1000,
    extra_body={
        "finetune": True,
        "language": "hi",
        "gender": "female",
        "accent": "indian"
    }
)

print(response.choices[0].message.content)

Spanish Voice Agent

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Spanish with Mexican accent
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {"role": "user", "content": "¿Puedes presentarte?"}
    ],
    temperature=0.7,
    max_tokens=1000,
    extra_body={
        "finetune": True,
        "language": "es",
        "gender": "female",
        "accent": "mexican"
    }
)

print(response.choices[0].message.content)

Gender-Specific Voice Response

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Male voice with British accent
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Tell me something interesting!"}
    ],
    temperature=0.7,
    max_tokens=1000,
    extra_body={
        "finetune": True,
        "language": "en",
        "gender": "male",
        "accent": "british"
    }
)

print(response.choices[0].message.content)

Voice Agent with Stop Sequences

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# With stop sequences and max tokens
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Write a short motivational quote!"}
    ],
    temperature=0.7,
    max_tokens=150,
    stop=["!", "\n"],
    extra_body={
        "finetune": True,
        "language": "en"
    }
)

print(response.choices[0].message.content)
POST /v1/chat/completions

Chat Completions

Generate chat completions using large language models. Supports both streaming and non-streaming responses, with optional voice agent optimization.

Functionality
  • OpenAI SDK compatible - use the official OpenAI SDK with a custom base URL.
  • Supports multi-turn conversations by including message history (see the sketch after this list).
  • Streaming mode returns Server-Sent Events (SSE) for real-time responses.
  • Non-streaming mode returns complete response with usage metrics.
  • Use extra_body.finetune=true to enable our fine-tuned model, which performs significantly better for voice agent use cases.
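
A minimal multi-turn sketch: earlier turns (including the assistant's replies) are appended to the messages array before each new request. The conversation content here is illustrative.

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Keep the running conversation in a list and send it on every request
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "Roughly how many people live there?"}
]

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=history,
    temperature=0.7,
    max_tokens=500
)

# Append the new reply so the next request keeps the full context
history.append({"role": "assistant", "content": response.choices[0].message.content})
print(history[-1]["content"])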

Request Parameters

  • messages (array, required): Array of message objects with role and content.
  • model (string, required): Model ID (e.g., "gpt-oss-120b", "llama-4-maverick").
  • temperature (number, default 1.0): Sampling temperature (0–2). Higher values = more random.
  • max_tokens (integer, default null): Maximum tokens to generate in the completion.
  • top_p (number, default 1.0): Nucleus sampling parameter (0–1).
  • stream (boolean, default false): Whether to stream responses via SSE.
  • stop (array, default null): Stop sequences to end generation early.
  • extra_body (object, default null): Additional parameters for voice agent optimization (see below).

Outputs

  • 200 OK (application/json): Returns the complete chat completion with usage metrics.
  • 401 Unauthorized (application/json): Invalid or missing API key.
  • 422 Validation Error (application/json): Validation failure. Inspect the detail array.
  • 503 Service Unavailable (application/json): LLM service temporarily unavailable.
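
A minimal sketch of handling these statuses with the OpenAI Python SDK, which raises typed exceptions for error responses; the retry suggestion in the last branch is an assumption, not documented behaviour.

import openai
from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

try:
    response = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)
except openai.AuthenticationError:
    # 401: invalid or missing API key
    print("Check your API key at playground.induslabs.io")
except openai.UnprocessableEntityError as e:
    # 422: validation failure; inspect the detail array in the body
    print("Validation error:", e.body)
except openai.APIStatusError as e:
    # Other statuses, e.g. 503 when the LLM service is temporarily unavailable
    print(f"Service returned {e.status_code}; consider retrying after a short backoff")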

extra_body Parameters (Optional)

  • finetune (boolean, default false): Enable the fine-tuned model optimized for voice agent use cases.
  • language (string, default null): Target language code (e.g., "en", "hi", "es", "fr").
  • gender (string, default null): Voice gender preference: "male" or "female".
  • accent (string, default null): Accent/dialect (e.g., "american", "british", "indian", "mexican").

200 OK (Non-streaming)

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 20,
    "total_tokens": 32
  }
}

401 Unauthorized

{"detail": "Invalid API key"}

422 Validation Error

{
  "detail": [
    {
      "loc": ["string", 0],
      "msg": "string",
      "type": "string"
    }
  ]
}
POST /v1/chat/completions (Streaming)

Chat Completions (Streaming)

Stream chat completions in real-time using Server-Sent Events. Set stream: true in the request body.

Functionality
  • Returns chunks as they are generated for low-latency responses (see the sketch after this list).
  • Each chunk contains a delta with incremental content.
  • Stream ends with a [DONE] message.
  • Usage metrics are included in the final chunk.
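
A minimal streaming sketch with the OpenAI Python SDK: with stream=True the SDK consumes the SSE stream for you and yields chunk objects, so you only read each delta. The guard on chunk.choices is there because a final usage-only chunk may arrive with an empty choices list.

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Streaming completion: chunks arrive as they are generated
stream = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    temperature=0.7,
    max_tokens=500,
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()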

Request Parameters

  • messages (array, required): Array of message objects with role and content.
  • model (string, required): Model ID (e.g., "gpt-oss-120b", "llama-4-maverick").
  • temperature (number, default 1.0): Sampling temperature (0–2). Higher values = more random.
  • max_tokens (integer, default null): Maximum tokens to generate in the completion.
  • top_p (number, default 1.0): Nucleus sampling parameter (0–1).
  • stream (boolean, default false): Whether to stream responses via SSE.
  • stop (array, default null): Stop sequences to end generation early.
  • extra_body (object, default null): Additional parameters for voice agent optimization (see below).

Outputs

  • 200 OK (text/event-stream): Returns chat completion chunks via SSE.
  • 401 Unauthorized (application/json): Invalid or missing API key.
  • 422 Validation Error (application/json): Validation failure. Inspect the detail array.

200 OK (Streaming)

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-oss-120b","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-oss-120b","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-oss-120b","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-oss-120b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
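
If you are not using the SDK, the stream above can be parsed line by line. A minimal sketch with the requests library, assuming exactly the SSE framing shown in this example:

import json
import requests

# Stream raw SSE; the payload mirrors the SDK examples above
with requests.post(
    "https://voice.induslabs.io/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True
    },
    stream=True
) as response:
    for line in response.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        if chunk["choices"]:
            content = chunk["choices"][0]["delta"].get("content")
            if content:
                print(content, end="", flush=True)
print()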

401 Unauthorized

{"detail": "Invalid API key"}

422 Validation Error

{
  "detail": [
    {
      "loc": ["string", 0],
      "msg": "string",
      "type": "string"
    }
  ]
}
GET /v1/chat/models

List Available Models

Retrieve a list of available chat models with their metadata.

Functionality
  • Returns OpenAI-compatible model list format.
  • Use the model IDs in your completion requests (see the sketch after this list).
  • No authentication required for this endpoint.
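
Because the path is /v1/chat/models rather than the SDK's default /v1/models, the sketch below calls the endpoint directly with the requests library instead of client.models.list(); whether the SDK route also works is not stated above, so this is the safer assumption.

import requests

# GET /v1/chat/models (no authentication required)
response = requests.get("https://voice.induslabs.io/v1/chat/models")
response.raise_for_status()

for model in response.json()["data"]:
    print(model["id"], "- owned by", model["owned_by"])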

Outputs

  • 200 OK (application/json): Returns the list of available models.

200 OK (Models List)

{
  "object": "list",
  "data": [
    {
      "id": "gpt-oss-120b",
      "object": "model",
      "created": 1677610602,
      "owned_by": "openai"
    },
    {
      "id": "llama-4-maverick",
      "object": "model",
      "created": 1677610602,
      "owned_by": "meta"
    }
  ]
}