Kurdish TTS & STT API
A simple HTTP API for Kurdish text-to-speech and speech-to-text, covering both Sorani (Central Kurdish) and Kurmanji (Northern Kurdish). Send text, get natural Kurdish speech from 664 voices; or send audio, get an accurate transcript — by file upload or live streaming. There is a free tier, and paid plans start at $5/month.
Machine-readable spec: OpenAPI 3.1 (/openapi.json). Get an API key in Settings → API. See pricing & plans.
Authentication
All endpoints except GET /api/get-speakers require an API key in the x-api-key request header. Base URL: https://www.kurdishtts.com.
TTS and STT use separate keys. A TTS key authenticates the text-to-speech endpoint; an STT key authenticates the speech-to-text endpoints. They are not interchangeable — a TTS key returns 401 against an STT endpoint. Generate both in Settings → API. Keep keys server-side; never ship them in client code.
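One way to keep the two key spaces apart is to resolve the key from the endpoint path. The sketch below is a minimal illustration, not part of the API: the helper `headers_for` and the environment variable names `KURDISHTTS_TTS_KEY` and `KURDISHTTS_STT_KEY` are assumptions chosen for this example.

```python
import os

# Hypothetical environment variable names; use whatever fits your deployment.
TTS_KEY_ENV = "KURDISHTTS_TTS_KEY"
STT_KEY_ENV = "KURDISHTTS_STT_KEY"


def headers_for(path: str, env=os.environ) -> dict:
    """Return auth headers for an endpoint path, using the matching key space."""
    if path.startswith("/api/get-speakers"):
        return {}  # public endpoint, no key required
    if path.startswith("/api/tts-proxy"):
        key_name = TTS_KEY_ENV
    elif path.startswith("/api/stt-"):
        key_name = STT_KEY_ENV  # /api/stt-proxy and /api/stt-stream-connect
    else:
        raise ValueError(f"unknown endpoint: {path}")
    key = env.get(key_name)
    if not key:
        raise RuntimeError(f"set {key_name} before calling {path}")
    return {"x-api-key": key}
```

Reading keys from the environment on the server keeps them out of client code, and routing by path makes a TTS key sent to an STT endpoint (a 401) harder to produce by accident.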
Text-to-Speech — POST /api/tts-proxy
Converts Kurdish text to speech. Returns audio/wav by default, or JSON with base64 audio and word-level timestamps when include_timestamps is true. The dialect is derived from the speaker_id prefix (sorani_… / kurmanji_…).
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to synthesize. Max 500 chars (free) / 5000 (paid). |
| speaker_id | string | Yes | Voice id from /api/get-speakers, e.g. sorani_1, kurmanji_236. |
| model_version | "v3" \| "v4" | No | Default v3. Strict: any other value is a 422. |
| include_timestamps | boolean | No | Default false. true → JSON with base64 audio + word timestamps. |
| speed | number | No | 0.25–4.0, higher = faster (industry convention; inverted internally). |
| temperature / stability | number | No | v4 only, mutually exclusive. Omit for the default; 0.0 is a 422. |
| seed | integer | No | v4 only. Reproducible output; echoed as generation.seed_used. |
| pitch, top_p, repetition_penalty… | number | No | Optional v4 / post-processing controls. |
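The constraints in the table can be checked locally before a request is sent. The following is a rough sketch under stated assumptions: `build_tts_payload` is a hypothetical helper, the character limit is counted with Python's `len` (the server may count differently), and the 0.0 rule is applied to `temperature` only, since that is the case the documentation names explicitly.

```python
def build_tts_payload(text, speaker_id, *, paid=False, model_version="v3",
                      speed=None, temperature=None, stability=None,
                      seed=None, include_timestamps=False):
    """Validate TTS fields locally and return the JSON body as a dict."""
    max_chars = 5000 if paid else 500
    if not text or len(text) > max_chars:
        raise ValueError(f"text must be 1-{max_chars} characters")
    if not speaker_id.startswith(("sorani_", "kurmanji_")):
        raise ValueError("speaker_id must start with sorani_ or kurmanji_")
    if model_version not in ("v3", "v4"):
        raise ValueError("model_version must be 'v3' or 'v4'")

    payload = {"text": text, "speaker_id": speaker_id,
               "model_version": model_version}
    if include_timestamps:
        payload["include_timestamps"] = True
    if speed is not None:
        if not 0.25 <= speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        payload["speed"] = speed

    # Only fields that were actually set are sent; omitted means "default".
    v4_fields = {"temperature": temperature, "stability": stability, "seed": seed}
    used = {name: value for name, value in v4_fields.items() if value is not None}
    if used and model_version != "v4":
        raise ValueError(f"{sorted(used)} require model_version='v4'")
    if temperature is not None and stability is not None:
        raise ValueError("temperature and stability are mutually exclusive")
    if temperature is not None and temperature == 0:
        raise ValueError("temperature 0.0 is rejected; omit it for the default")
    payload.update(used)
    return payload
```

The returned dict can be passed as the JSON body of `POST /api/tts-proxy`. Failing early on the client avoids spending a round trip on a request the server would answer with 400 or 422.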
Example — cURL
curl -X POST https://www.kurdishtts.com/api/tts-proxy \
  -H "x-api-key: YOUR_TTS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text":"سڵاو، چۆنیت؟","speaker_id":"sorani_1"}' \
  --output speech.wav
Example — Python
import requests

resp = requests.post(
    "https://www.kurdishtts.com/api/tts-proxy",
    headers={"x-api-key": "YOUR_TTS_API_KEY"},
    json={"text": "سڵاو، چۆنیت؟", "speaker_id": "sorani_1"},
)
resp.raise_for_status()  # stop on 4xx/5xx instead of saving an error body as audio
with open("speech.wav", "wb") as f:
    f.write(resp.content)  # audio/wav bytes
Example — JavaScript
const res = await fetch("https://www.kurdishtts.com/api/tts-proxy", {
  method: "POST",
  headers: {
    "x-api-key": "YOUR_TTS_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ text: "سڵاو، چۆنیت؟", speaker_id: "sorani_1" }),
});
if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
const audio = await res.arrayBuffer(); // audio/wav
Good to know
- speed follows the industry convention: higher = faster.
- For v4 controls, omitting a field means "use the default". Sending temperature: 0.0 is a hard 422.
- A v4 response can be 200 with generation.collapsed: true. Treat that as a failed generation (it is not billed).
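Because a collapsed generation still arrives with status 200, the JSON body has to be inspected before the audio is used. The sketch below shows one possible check. `generation.collapsed` comes from the notes above, but the name of the base64 audio field is not given here, so `audio_field="audio"` is an assumption; confirm the real name in /openapi.json.

```python
import base64


def generation_failed(body: dict) -> bool:
    """True when a 200 JSON response reports generation.collapsed."""
    generation = body.get("generation") or {}
    return bool(generation.get("collapsed"))


def extract_audio(body: dict, audio_field: str = "audio") -> bytes:
    """Decode the base64 audio from an include_timestamps response.

    audio_field is a guess made for this example, not a documented name.
    """
    if generation_failed(body):
        raise RuntimeError("generation collapsed; treat as failed (not billed)")
    return base64.b64decode(body[audio_field])
```

Raising on a collapsed result keeps unusable audio from being written to disk or played back as if it were a successful synthesis.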
Speech-to-Text (file upload) — POST /api/stt-proxy
Transcribes an uploaded audio file (WAV/MP3/FLAC/OGG/M4A) sent as multipart/form-data. One credit is debited per successful transcription. Max file size and transcript length depend on your plan (free: 10 MB / 500 chars; starter: 50 MB / unlimited; pro: 100 MB / unlimited).
| Field | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio file (WAV/MP3/FLAC/OGG/M4A). |
| dialect | "sorani" \| "kurmanji" | Yes | Kurdish dialect of the audio. |
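The per-plan upload limits can be enforced before the file leaves the machine. This is a minimal sketch with two assumptions: `check_upload` is a hypothetical helper, and "MB" is read as decimal megabytes (10^6 bytes), which is the stricter interpretation if the server actually uses binary megabytes.

```python
import os

MB = 1_000_000  # assumption: decimal megabytes; stricter than 1024 * 1024
MAX_UPLOAD_BYTES = {"free": 10 * MB, "starter": 50 * MB, "pro": 100 * MB}
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def check_upload(filename: str, size_bytes: int, plan: str = "free") -> None:
    """Raise ValueError if the file would be rejected for format or size."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension or 'none'}")
    if plan not in MAX_UPLOAD_BYTES:
        raise ValueError(f"unknown plan: {plan}")
    limit = MAX_UPLOAD_BYTES[plan]
    if size_bytes > limit:
        raise ValueError(f"{size_bytes} bytes exceeds the {plan} limit of {limit}")
```

A typical call would be `check_upload(path, os.path.getsize(path), plan="starter")` just before the multipart request is built.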
Example — cURL
curl -X POST https://www.kurdishtts.com/api/stt-proxy \
  -H "x-api-key: YOUR_STT_API_KEY" \
  -F "file=@audio.wav" \
  -F "dialect=sorani"
Example — Python
import requests

with open("audio.wav", "rb") as audio:  # closes the file once the upload finishes
    resp = requests.post(
        "https://www.kurdishtts.com/api/stt-proxy",
        headers={"x-api-key": "YOUR_STT_API_KEY"},
        files={"file": audio},
        data={"dialect": "sorani"},
    )
resp.raise_for_status()  # stop on 4xx/5xx before reading the transcript
print(resp.json()["text"])
Response JSON includes text, detected_dialect, detected_script, and language. On the free plan a long transcript may be clipped; this is indicated by truncated: true and truncation_limit.
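A client that only reads `text` will silently miss a clipped transcript. The sketch below shows one way to surface the truncation flags. `describe_transcript` is a hypothetical helper, and treating `truncation_limit` as a character count is an assumption based on the free plan's 500-character limit.

```python
def describe_transcript(body: dict) -> str:
    """Format an STT response as one line, flagging clipped transcripts."""
    dialect = body.get("detected_dialect")
    script = body.get("detected_script")
    line = f"[{dialect}/{script}] {body['text']}"
    if body.get("truncated"):
        # Assumed to be a character count; the unit is not stated explicitly.
        line += f" (clipped at {body.get('truncation_limit')} characters)"
    return line
```

Calling `describe_transcript(resp.json())` after a successful upload makes a truncated result visible in logs instead of looking like a short recording.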
Speech-to-Text (live streaming) — POST /api/stt-stream-connect
Real-time transcription over a WebSocket, in two steps:
1. POST /api/stt-stream-connect with your STT key and { "dialect": "sorani" }. The response contains a temporary websocket_url (connect within 5 minutes; it does not carry your key).
2. Open the WebSocket and stream raw 16-bit PCM, mono, 16 kHz audio as binary frames. Send { "type": "control", "event": "finalize" } to flush. The server streams { "text": "…", "is_final": bool } messages and { "type": "control", "event": "done" } when complete.
One streaming session is debited per connect. Session limits and max duration depend on your plan (free: 20 sessions / 2 min; starter: 100 / 10 min; pro: 500 / 30 min).
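The wire format described above can be sketched without any network or audio library. The helpers below are illustrative names, not part of the API, and little-endian byte order is an assumption (the documentation says only "raw 16-bit PCM"; little-endian is what a browser `Int16Array` produces on common hardware).

```python
import json
import struct

SAMPLE_RATE = 16_000  # Hz, as required by the streaming endpoint

# Control message that asks the server to flush pending audio.
FINALIZE = json.dumps({"type": "control", "event": "finalize"})


def floats_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to 16-bit mono PCM (little-endian assumed)."""
    ints = [max(-32768, min(32767, int(s * 32768))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def handle_message(raw: str):
    """Classify a server message as done, final, partial, or other."""
    msg = json.loads(raw)
    if msg.get("type") == "control" and msg.get("event") == "done":
        return ("done", None)
    if "text" in msg:
        return ("final" if msg.get("is_final") else "partial", msg["text"])
    return ("other", msg)


def frame_duration_ms(num_samples: int) -> float:
    """Length of one binary frame in milliseconds at 16 kHz."""
    return 1000 * num_samples / SAMPLE_RATE
```

With the 4096-sample blocks used in the examples on this page, each binary frame carries 8192 bytes and covers 256 ms of audio.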
Example — JavaScript (browser)
// Demo only: in production, call /api/stt-stream-connect from your server
// and hand just the temporary websocket_url to the browser.
const API_BASE = "https://www.kurdishtts.com";
const API_KEY = "YOUR_STT_API_KEY";
let ws;

async function connect(dialect) {
  const res = await fetch(API_BASE + "/api/stt-stream-connect", {
    method: "POST",
    headers: { "x-api-key": API_KEY, "Content-Type": "application/json" },
    body: JSON.stringify({ dialect }),
  });
  if (!res.ok) throw new Error((await res.json()).detail || "Failed to connect");
  const data = await res.json();
  console.log("Sessions remaining:", data.streaming_sessions_remaining);
  ws = new WebSocket(data.websocket_url); // temporary URL, no key inside
  ws.onopen = () => capture();
  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === "control" && msg.event === "done") return;
    if (msg.text) console.log(msg.is_final ? "Final:" : "Partial:", msg.text);
  };
}

async function capture() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: 16000 }); // 16 kHz required
  const source = ctx.createMediaStreamSource(stream);
  // ScriptProcessorNode is deprecated in favor of AudioWorklet; kept here for brevity.
  const processor = ctx.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (e) => {
    const input = e.inputBuffer.getChannelData(0);
    const pcm16 = new Int16Array(input.length); // 16-bit mono PCM
    for (let i = 0; i < input.length; i++) {
      pcm16[i] = Math.max(-32768, Math.min(32767, input[i] * 32768));
    }
    if (ws && ws.readyState === WebSocket.OPEN) ws.send(pcm16.buffer);
  };
  source.connect(processor);
  processor.connect(ctx.destination);
}

// Call when the speaker is done:
function finalize() {
  if (ws) ws.send(JSON.stringify({ type: "control", event: "finalize" }));
}

connect("sorani");
Example — Python
import asyncio, json
import requests, websockets
import numpy as np
import sounddevice as sd

API_BASE = "https://www.kurdishtts.com"
API_KEY = "YOUR_STT_API_KEY"

async def stream_stt(dialect="sorani"):
    # Step 1: exchange the STT key for a temporary WebSocket URL.
    resp = requests.post(
        API_BASE + "/api/stt-stream-connect",
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        json={"dialect": dialect},
    )
    resp.raise_for_status()
    data = resp.json()
    print("Sessions remaining:", data["streaming_sessions_remaining"])

    # Step 2: stream microphone audio and print transcripts as they arrive.
    async with websockets.connect(data["websocket_url"]) as ws:
        # Capture the loop here: the audio callback runs on another thread,
        # where asyncio.get_event_loop() would not return this loop.
        loop = asyncio.get_running_loop()

        async def receive():
            async for message in ws:
                msg = json.loads(message)
                if msg.get("type") == "control" and msg.get("event") == "done":
                    return
                if "text" in msg:
                    print("Final:" if msg.get("is_final") else "Partial:", msg["text"])

        receiver = asyncio.create_task(receive())

        def callback(indata, frames, time, status):
            # Clip, then scale float samples to 16-bit mono PCM.
            pcm16 = (np.clip(indata[:, 0], -1.0, 1.0) * 32767).astype(np.int16)
            asyncio.run_coroutine_threadsafe(ws.send(pcm16.tobytes()), loop)

        with sd.InputStream(samplerate=16000, channels=1, callback=callback, blocksize=4096):
            await asyncio.sleep(30)  # record for 30s
        await ws.send(json.dumps({"type": "control", "event": "finalize"}))
        await receiver

asyncio.run(stream_stt("sorani"))
List voices — GET /api/get-speakers
A public, unauthenticated catalog of available voices. Use each returned id as the speaker_id for text-to-speech. Pass ?model_version=v3 or v4 to filter (v3 ≈ 198 voices, v4 ≈ 664 voices).
curl "https://www.kurdishtts.com/api/get-speakers?model_version=v4"
Each voice has id, name, dialect (sorani/kurmanji) and gender.
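Once the catalog is fetched, picking a voice is a matter of filtering on the fields listed above. The sketch below assumes the response body is a JSON list of voice objects and that `gender` holds plain strings such as "female"; neither is confirmed here, so check /openapi.json for the actual shape. `filter_voices` and `dialect_of` are hypothetical helper names.

```python
def filter_voices(voices, dialect=None, gender=None):
    """Return the voices matching the given dialect and/or gender."""
    return [
        voice for voice in voices
        if (dialect is None or voice["dialect"] == dialect)
        and (gender is None or voice["gender"] == gender)
    ]


def dialect_of(speaker_id: str) -> str:
    """Read the dialect from a speaker_id prefix such as sorani_1."""
    prefix = speaker_id.split("_", 1)[0]
    if prefix not in ("sorani", "kurmanji"):
        raise ValueError(f"unrecognized speaker_id: {speaker_id}")
    return prefix
```

Each `id` in the filtered list can be used directly as the `speaker_id` for text-to-speech, and `dialect_of` mirrors the prefix rule the TTS endpoint applies.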
Errors & limits
Successful responses use 200. Common error statuses:
- 400: bad request (missing field, invalid dialect, file too large, invalid speed).
- 401: missing or invalid API key (check that you are using the key for the right service, TTS or STT).
- 403: plan inactive, quota/credits exhausted, or a voice/model not on your plan. Body includes error, detail, and often upgrade_url.
- 422: model validation error (e.g. temperature: 0.0 or an unknown model_version); the detail array carries specifics.
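These statuses map naturally onto a small dispatch function. The following is a sketch, not a prescribed client: `explain_error` is a hypothetical helper, and because the shape of the items inside the 422 `detail` array is not described here, each item is simply converted to a string.

```python
def explain_error(status: int, body: dict) -> str:
    """Turn an error status and JSON body into a short, actionable message."""
    detail = body.get("detail")
    if isinstance(detail, list):
        # 422 responses carry an array; item structure is not assumed.
        detail = "; ".join(str(item) for item in detail)
    if status == 400:
        return f"Bad request, fix the input: {detail}"
    if status == 401:
        return "Missing or invalid API key; check TTS key vs STT key"
    if status == 403:
        message = f"Plan or quota problem: {detail}"
        if body.get("upgrade_url"):
            message += f" (upgrade: {body['upgrade_url']})"
        return message
    if status == 422:
        return f"Validation failed: {detail}"
    return f"Unexpected status {status}: {detail}"
```

In a client built on an HTTP library, this would be called with the response's status code and parsed JSON body whenever the status is not 200.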
Plans, quotas and prices are on the pricing page. The full machine-readable contract is at /openapi.json.