
Large Language Model Service

Generate intelligent chat completions using state-of-the-art language models. Our API is fully compatible with the OpenAI SDK, making integration seamless. Use /v1/chat/completions for both streaming and non-streaming responses.

Need an API Key? If you don't have an API key yet, you can create one here: https://playground.induslabs.io/register

OpenAI SDK Compatibility

Our LLM service is fully compatible with the OpenAI Python and JavaScript SDKs. Simply set the base_url parameter to https://voice.induslabs.io/v1 and use your API key.

Voice Agent Optimization

Enable finetune: true in the extra_body parameter to use our fine-tuned model specifically optimized for voice agent use cases. This model delivers superior performance in conversational AI applications with more natural, context-aware responses tailored for voice interactions.

Available Models
  • GPT OSS 120B (gpt-oss-120b)
    High-performance 120B parameter model
  • Llama 4 Maverick (llama-4-maverick)
    17B parameter instruct model with 128k context

Authentication

All requests require authentication via the Authorization header with a Bearer token:

Authorization: Bearer YOUR_API_KEY

Your API key can be found in your dashboard at playground.induslabs.io
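
The Authorization header works the same way without the SDK. Below is a minimal sketch using Python's requests library against the chat completions endpoint documented further down this page; the payload fields simply mirror the SDK examples.

import requests

API_KEY = "YOUR_API_KEY"  # from your dashboard at playground.induslabs.io

response = requests.post(
    "https://voice.induslabs.io/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",  # required on every request
        "Content-Type": "application/json"
    },
    json={
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": "Hello!"}]
    }
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])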

Complete Usage Examples

Simple Chat Completion

from openai import OpenAI

# Initialize the client
client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Simple completion
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Hello! How are you?"}
    ],
    temperature=0.7,
    max_tokens=1000
)

print(response.choices[0].message.content)

Voice Agent with Fine-tuned Model

from openai import OpenAI

# Initialize the client
client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Voice agent optimized completion
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, introduce yourself?"}
    ],
    temperature=0.7,
    max_tokens=1000,
    extra_body={
        "finetune": True,  # Enable voice-optimized model
        "language": "en",
        "gender": "female",
        "accent": "american"
    }
)

print(response.choices[0].message.content)

Multilingual Voice Agent (Hindi)

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Hindi language with Indian accent
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {"role": "user", "content": "नमस्ते, आप कैसे हैं?"}
    ],
    temperature=0.7,
    max_tokens=1000,
    extra_body={
        "finetune": True,
        "language": "hi",
        "gender": "female",
        "accent": "indian"
    }
)

print(response.choices[0].message.content)

Spanish Voice Agent

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Spanish with Mexican accent
response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {"role": "user", "content": "¿Puedes presentarte?"}
    ],
    temperature=0.7,
    max_tokens=1000,
    extra_body={
        "finetune": True,
        "language": "es",
        "gender": "female",
        "accent": "mexican"
    }
)

print(response.choices[0].message.content)

Gender-Specific Voice Response

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Male voice with British accent
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Tell me something interesting!"}
    ],
    temperature=0.7,
    max_tokens=1000,
    extra_body={
        "finetune": True,
        "language": "en",
        "gender": "male",
        "accent": "british"
    }
)

print(response.choices[0].message.content)

Voice Agent with Stop Sequences

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# With stop sequences and max tokens
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Write a short motivational quote!"}
    ],
    temperature=0.7,
    max_tokens=150,
    stop=["!", "\n"],
    extra_body={
        "finetune": True,
        "language": "en"
    }
)

print(response.choices[0].message.content)
POST /v1/chat/completions

Chat Completions

Generate chat completions using large language models. Supports both streaming and non-streaming responses, with optional voice agent optimization.

Functionality
  • OpenAI SDK compatible - use the official OpenAI SDK with a custom base URL.
  • Supports multi-turn conversations by including message history (see the sketch after this list).
  • Streaming mode returns Server-Sent Events (SSE) for real-time responses.
  • Non-streaming mode returns complete response with usage metrics.
  • Use extra_body.finetune=true to enable our fine-tuned model, which performs significantly better for voice agent use cases.
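
A minimal multi-turn sketch: earlier turns (including the assistant's replies) are appended to the messages array before each new request. The conversation content here is illustrative.

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Keep the running conversation in a list and send it on every request
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "Roughly how many people live there?"}
]

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=history,
    temperature=0.7,
    max_tokens=500
)

# Append the new reply so the next request keeps the full context
history.append({"role": "assistant", "content": response.choices[0].message.content})
print(history[-1]["content"])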

Request Parameters

  • messages (array, required): Array of message objects with role and content.
  • model (string, required): Model ID (e.g., "gpt-oss-120b", "llama-4-maverick").
  • temperature (number, default 1.0): Sampling temperature (0–2). Higher values = more random.
  • max_tokens (integer, default null): Maximum tokens to generate in the completion.
  • top_p (number, default 1.0): Nucleus sampling parameter (0–1).
  • stream (boolean, default false): Whether to stream responses via SSE.
  • stop (array, default null): Stop sequences to end generation early.
  • extra_body (object, default null): Additional parameters for voice agent optimization (see below).

Outputs

  • 200 OK (application/json): Returns the complete chat completion with usage metrics.
  • 401 Unauthorized (application/json): Invalid or missing API key.
  • 422 Validation Error (application/json): Validation failure. Inspect the detail array.
  • 503 Service Unavailable (application/json): LLM service temporarily unavailable.
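
A minimal sketch of handling these statuses with the OpenAI Python SDK, which raises typed exceptions for error responses; the retry suggestion in the last branch is an assumption, not documented behaviour.

import openai
from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

try:
    response = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)
except openai.AuthenticationError:
    # 401: invalid or missing API key
    print("Check your API key at playground.induslabs.io")
except openai.UnprocessableEntityError as e:
    # 422: validation failure; inspect the detail array in the body
    print("Validation error:", e.body)
except openai.APIStatusError as e:
    # Other statuses, e.g. 503 when the LLM service is temporarily unavailable
    print(f"Service returned {e.status_code}; consider retrying after a short backoff")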

extra_body Parameters (Optional)

  • finetune (boolean, default false): Enable the fine-tuned model optimized for voice agent use cases.
  • language (string, default null): Target language code (e.g., "en", "hi", "es", "fr").
  • gender (string, default null): Voice gender preference: "male" or "female".
  • accent (string, default null): Accent/dialect (e.g., "american", "british", "indian", "mexican").

200 OK (Non-streaming)

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I assist you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 20,
    "total_tokens": 32
  }
}

401 Unauthorized

{"detail": "Invalid API key"}

422 Validation Error

{
  "detail": [
    {
      "loc": ["string", 0],
      "msg": "string",
      "type": "string"
    }
  ]
}
POST /v1/chat/completions (Streaming)

Chat Completions (Streaming)

Stream chat completions in real-time using Server-Sent Events. Set stream: true in the request body.

Functionality
  • Returns chunks as they are generated for low-latency responses (see the sketch after this list).
  • Each chunk contains a delta with incremental content.
  • Stream ends with a [DONE] message.
  • Usage metrics are included in the final chunk.
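
A minimal streaming sketch with the OpenAI Python SDK: with stream=True the SDK consumes the SSE stream for you and yields chunk objects, so you only read each delta. The guard on chunk.choices is there because a final usage-only chunk may arrive with an empty choices list.

from openai import OpenAI

client = OpenAI(
    base_url="https://voice.induslabs.io/v1",
    api_key="YOUR_API_KEY"
)

# Streaming completion: chunks arrive as they are generated
stream = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    temperature=0.7,
    max_tokens=500,
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()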

Request Parameters

  • messages (array, required): Array of message objects with role and content.
  • model (string, required): Model ID (e.g., "gpt-oss-120b", "llama-4-maverick").
  • temperature (number, default 1.0): Sampling temperature (0–2). Higher values = more random.
  • max_tokens (integer, default null): Maximum tokens to generate in the completion.
  • top_p (number, default 1.0): Nucleus sampling parameter (0–1).
  • stream (boolean, default false): Whether to stream responses via SSE.
  • stop (array, default null): Stop sequences to end generation early.
  • extra_body (object, default null): Additional parameters for voice agent optimization (see below).

Outputs

  • 200 OK (text/event-stream): Returns chat completion chunks via SSE.
  • 401 Unauthorized (application/json): Invalid or missing API key.
  • 422 Validation Error (application/json): Validation failure. Inspect the detail array.

200 OK (Streaming)

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-oss-120b","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-oss-120b","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-oss-120b","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"gpt-oss-120b","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
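
If you are not using the SDK, the stream above can be parsed line by line. A minimal sketch with the requests library, assuming exactly the SSE framing shown in this example:

import json
import requests

# Stream raw SSE; the payload mirrors the SDK examples above
with requests.post(
    "https://voice.induslabs.io/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True
    },
    stream=True
) as response:
    for line in response.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        chunk = json.loads(payload)
        if chunk["choices"]:
            content = chunk["choices"][0]["delta"].get("content")
            if content:
                print(content, end="", flush=True)
print()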

401 Unauthorized

{"detail": "Invalid API key"}

422 Validation Error

{
  "detail": [
    {
      "loc": ["string", 0],
      "msg": "string",
      "type": "string"
    }
  ]
}
GET /v1/chat/models

List Available Models

Retrieve a list of available chat models with their metadata.

Functionality
  • Returns OpenAI-compatible model list format.
  • Use the model IDs in your completion requests (see the sketch after this list).
  • No authentication required for this endpoint.
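
Because the path is /v1/chat/models rather than the SDK's default /v1/models, the sketch below calls the endpoint directly with the requests library instead of client.models.list(); whether the SDK route also works is not stated above, so this is the safer assumption.

import requests

# GET /v1/chat/models (no authentication required)
response = requests.get("https://voice.induslabs.io/v1/chat/models")
response.raise_for_status()

for model in response.json()["data"]:
    print(model["id"], "- owned by", model["owned_by"])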

Outputs

  • 200 OK (application/json): Returns the list of available models.

200 OK (Models List)

{
  "object": "list",
  "data": [
    {
      "id": "gpt-oss-120b",
      "object": "model",
      "created": 1677610602,
      "owned_by": "openai"
    },
    {
      "id": "llama-4-maverick",
      "object": "model",
      "created": 1677610602,
      "owned_by": "meta"
    }
  ]
}