Kurdish TTS & STT API

A simple HTTP API for Kurdish text-to-speech and speech-to-text, covering both Sorani (Central Kurdish) and Kurmanji (Northern Kurdish). Send text and get natural Kurdish speech from 664 voices, or send audio (by file upload or live streaming) and get a transcript. There is a free tier, and paid plans start at $5/month.

Machine-readable spec: OpenAPI 3.1 (/openapi.json). Get an API key in Settings → API. See pricing & plans.

Authentication

All endpoints except GET /api/get-speakers require an API key in the x-api-key request header. Base URL: https://www.kurdishtts.com.

TTS and STT use separate keys. A TTS key authenticates the text-to-speech endpoint; an STT key authenticates the speech-to-text endpoints. They are not interchangeable — a TTS key returns 401 against an STT endpoint. Generate both in Settings → API. Keep keys server-side; never ship them in client code.
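One way to keep the two key types apart on the server is to pick the header from the endpoint path. The sketch below assumes the keys live in environment variables named KURDISHTTS_TTS_KEY and KURDISHTTS_STT_KEY; those names and the auth_headers helper are hypothetical, chosen only for illustration.

```python
import os

# Endpoint paths as listed on this page, grouped by the key they expect.
TTS_PATHS = {"/api/tts-proxy"}
STT_PATHS = {"/api/stt-proxy", "/api/stt-stream-connect"}
PUBLIC_PATHS = {"/api/get-speakers"}


def auth_headers(path, env=None):
    """Return the x-api-key header for `path`, or {} for the public endpoint."""
    env = os.environ if env is None else env
    if path in PUBLIC_PATHS:
        return {}
    if path in TTS_PATHS:
        return {"x-api-key": env["KURDISHTTS_TTS_KEY"]}  # hypothetical variable name
    if path in STT_PATHS:
        return {"x-api-key": env["KURDISHTTS_STT_KEY"]}  # hypothetical variable name
    raise ValueError(f"unknown endpoint: {path}")
```

Routing by path means a TTS key can never be sent to an STT endpoint by accident, which is the mistake that produces the 401 described above.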

Text-to-Speech — POST /api/tts-proxy

Converts Kurdish text to speech. Returns audio/wav by default, or JSON with base64 audio and word-level timestamps when include_timestamps is true. The dialect is derived from the speaker_id prefix (sorani_… / kurmanji_…).

Field                               Type           Required  Description
text                                string         Yes       Text to synthesize. Max 500 chars (free) / 5000 (paid).
speaker_id                          string         Yes       Voice id from /api/get-speakers, e.g. sorani_1, kurmanji_236.
model_version                       "v3" | "v4"    No        Default v3. Strict: any other value is a 422.
include_timestamps                  boolean        No        Default false. true → JSON with base64 audio + word timestamps.
speed                               number         No        0.25–4.0, higher = faster (industry convention; inverted internally).
temperature / stability             number         No        v4 only, mutually exclusive. Omit for default; 0.0 is a 422.
seed                                integer        No        v4 only. Reproducible output; echoed as generation.seed_used.
pitch, top_p, repetition_penalty…   number         No        Optional v4 / post-processing controls.
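The constraints in the table can be checked on the client before a request is sent, which avoids spending a round trip on a 400 or 422. The following is a minimal sketch: build_tts_payload and its paid flag are hypothetical, and it assumes the character limit counts Python string characters, which the page does not specify. It rejects only temperature 0.0, the case this page names explicitly.

```python
def build_tts_payload(text, speaker_id, *, paid=False, model_version="v3",
                      speed=None, temperature=None, stability=None, seed=None):
    """Build a /api/tts-proxy JSON body, rejecting values the table rules out."""
    limit = 5000 if paid else 500
    if len(text) > limit:  # assumption: the limit counts characters like len() does
        raise ValueError(f"text exceeds {limit} characters")
    if model_version not in ("v3", "v4"):
        raise ValueError("model_version must be 'v3' or 'v4'")
    if speed is not None and not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if temperature is not None and stability is not None:
        raise ValueError("temperature and stability are mutually exclusive")
    v4_only = {"temperature": temperature, "stability": stability, "seed": seed}
    if model_version != "v4" and any(v is not None for v in v4_only.values()):
        raise ValueError("temperature, stability and seed are v4 only")
    if temperature == 0.0:
        raise ValueError("omit temperature instead of sending 0.0")

    payload = {"text": text, "speaker_id": speaker_id, "model_version": model_version}
    if speed is not None:
        payload["speed"] = speed
    # Optional v4 fields are left out entirely when unset, so the server default applies.
    payload.update({k: v for k, v in v4_only.items() if v is not None})
    return payload
```

The returned dict can be passed as the json= argument of requests.post.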

Example — cURL

curl -X POST https://www.kurdishtts.com/api/tts-proxy \
  -H "x-api-key: YOUR_TTS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text":"سڵاو، چۆنیت؟","speaker_id":"sorani_1"}' \
  --output speech.wav

Example — Python

import requests

resp = requests.post(
    "https://www.kurdishtts.com/api/tts-proxy",
    headers={"x-api-key": "YOUR_TTS_API_KEY"},
    json={"text": "سڵاو، چۆنیت؟", "speaker_id": "sorani_1"},
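)
resp.raise_for_status()  # stop on 4xx/5xx rather than saving an error body as audio
with open("speech.wav", "wb") as f:
    f.write(resp.content)  # audio/wav bytes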

Example — JavaScript

const res = await fetch("https://www.kurdishtts.com/api/tts-proxy", {
  method: "POST",
  headers: {
    "x-api-key": "YOUR_TTS_API_KEY",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ text: "سڵاو، چۆنیت؟", speaker_id: "sorani_1" }),
});
if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
const audio = await res.arrayBuffer(); // audio/wav

Good to know

  • speed follows the industry convention: higher = faster.
  • For v4 controls, omitting a field means "use the default". Sending temperature: 0.0 is a hard 422.
  • A v4 response can be 200 with generation.collapsed: true — treat that as a failed generation (it is not billed).
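Because a collapsed generation still arrives with status 200, the status code alone is not enough to decide whether audio is usable. The sketch below assumes the parsed JSON body carries a generation object with a collapsed field, as the field name generation.collapsed suggests; how this is surfaced for plain audio/wav responses is not stated here, and generation_ok is a hypothetical helper.

```python
def generation_ok(status_code, body):
    """True only if the call returned 200 and the output did not collapse."""
    if status_code != 200:
        return False
    # Assumption: `body` is the parsed JSON response and may lack "generation" (e.g. v3).
    generation = body.get("generation") or {}
    return not generation.get("collapsed", False)
```

A caller might retry with a different seed when this returns False for a 200 response, since that case is not billed.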

Speech-to-Text (file upload) — POST /api/stt-proxy

Transcribes an uploaded audio file (WAV/MP3/FLAC/OGG/M4A) sent as multipart/form-data. One credit is debited per successful transcription. Max file size and transcript length depend on your plan (free: 10 MB / 500 chars; starter: 50 MB / unlimited; pro: 100 MB / unlimited).

Field     Type                     Required  Description
file      file                     Yes       Audio file (WAV/MP3/FLAC/OGG/M4A).
dialect   "sorani" | "kurmanji"    Yes       Kurdish dialect of the audio.
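The plan limits and field rules above lend themselves to a quick check before uploading. This is a sketch only: check_upload is a hypothetical helper, it judges the format by file extension rather than content, and it assumes "MB" means 1024 × 1024 bytes, which the page does not specify.

```python
import os

MB = 1024 * 1024  # assumption: the plan limits are binary megabytes
MAX_UPLOAD_BYTES = {"free": 10 * MB, "starter": 50 * MB, "pro": 100 * MB}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
DIALECTS = {"sorani", "kurmanji"}


def check_upload(filename, size_bytes, dialect, plan="free"):
    """Return a list of problems; an empty list means the upload looks valid."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in AUDIO_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES[plan]:
        problems.append(f"file larger than the {plan} plan limit")
    if dialect not in DIALECTS:
        problems.append(f"invalid dialect: {dialect}")
    return problems
```

In practice size_bytes would come from os.path.getsize on the file about to be sent.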

Example — cURL

curl -X POST https://www.kurdishtts.com/api/stt-proxy \
  -H "x-api-key: YOUR_STT_API_KEY" \
  -F "file=@audio.wav" \
  -F "dialect=sorani"

Example — Python

import requests

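# Open the file in a with-block so the handle is closed after the upload.
with open("audio.wav", "rb") as audio:
    resp = requests.post(
        "https://www.kurdishtts.com/api/stt-proxy",
        headers={"x-api-key": "YOUR_STT_API_KEY"},
        files={"file": audio},
        data={"dialect": "sorani"},
    )
resp.raise_for_status()  # surface 4xx/5xx before reading the body
print(resp.json()["text"])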

Response JSON includes text, detected_dialect, detected_script, and language. On the free plan a long transcript may be clipped — indicated by truncated: true and truncation_limit.
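A client that stores or displays transcripts probably wants to notice clipping rather than treat a clipped transcript as complete. The helper below is a small sketch over the parsed response body; read_transcript is a hypothetical name, and the field names are the ones listed above.

```python
def read_transcript(body):
    """Return (text, warning); warning is None unless the transcript was clipped."""
    text = body["text"]
    if body.get("truncated"):
        limit = body.get("truncation_limit")
        return text, f"transcript clipped at truncation_limit={limit}"
    return text, None
```

With the Python example above, this could be called as read_transcript(resp.json()).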

Speech-to-Text (live streaming) — POST /api/stt-stream-connect

Real-time transcription over a WebSocket, in two steps:

  1. POST /api/stt-stream-connect with your STT key and { "dialect": "sorani" } → returns a temporary websocket_url (connect within 5 minutes; it does not carry your key).
  2. Open the WebSocket and stream raw 16-bit PCM, mono, 16 kHz audio as binary frames. Send { "type": "control", "event": "finalize" } to flush. The server streams { "text": "…", "is_final": bool } messages and { "type": "control", "event": "done" } when complete.
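The two sides of the WebSocket exchange in step 2 can be sketched as pure functions, independent of any audio or socket library. to_pcm16 packs float samples as 16-bit PCM and classify sorts server messages. Both names are hypothetical, and the little-endian byte order is an assumption, since the page says only "raw 16-bit PCM".

```python
import json
import struct

# Control message the client sends to flush the final transcript.
FINALIZE = json.dumps({"type": "control", "event": "finalize"})


def to_pcm16(samples):
    """Pack floats in [-1.0, 1.0] as 16-bit mono PCM bytes (little-endian assumed)."""
    ints = [max(-32768, min(32767, int(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def classify(raw):
    """Map one server message to (kind, text); kind is done/final/partial/other."""
    msg = json.loads(raw)
    if msg.get("type") == "control" and msg.get("event") == "done":
        return ("done", None)
    if "text" in msg:
        return ("final" if msg.get("is_final") else "partial", msg["text"])
    return ("other", None)
```

The bytes from to_pcm16 would be sent as binary frames and FINALIZE as a text frame; a receive loop can stop once classify returns "done".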

One streaming session is debited per connect. Session limits and max duration depend on your plan (free: 20 sessions / 2 min; starter: 100 / 10 min; pro: 500 / 30 min).
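To get a feel for what the duration limits mean in terms of audio data, the arithmetic below converts each plan's maximum session length into frames and bytes for 16 kHz, 16-bit mono audio. It is a rough sketch: session_budget is a hypothetical helper, the 4096-sample frame size is simply the block size used in the examples on this page, and whether the server measures the limit in wall-clock time or in audio received is not stated here.

```python
SAMPLE_RATE = 16_000      # samples per second, as required by the stream
BYTES_PER_SAMPLE = 2      # 16-bit mono
MAX_SECONDS = {"free": 2 * 60, "starter": 10 * 60, "pro": 30 * 60}


def session_budget(plan, frame_samples=4096):
    """Return frame duration plus the whole frames and bytes that fit in one session."""
    total_samples = MAX_SECONDS[plan] * SAMPLE_RATE
    return {
        "frame_seconds": frame_samples / SAMPLE_RATE,
        "max_frames": total_samples // frame_samples,
        "max_bytes": total_samples * BYTES_PER_SAMPLE,
    }
```

For the free plan this works out to frames of 0.256 s each, 468 whole frames, and 3,840,000 bytes of audio per session.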

Example — JavaScript (browser)

const API_BASE = "https://www.kurdishtts.com";
const API_KEY = "YOUR_STT_API_KEY";
let ws;

async function connect(dialect) {
  const res = await fetch(API_BASE + "/api/stt-stream-connect", {
    method: "POST",
    headers: { "x-api-key": API_KEY, "Content-Type": "application/json" },
    body: JSON.stringify({ dialect }),
  });
  if (!res.ok) throw new Error((await res.json()).detail || "Failed to connect");

  const data = await res.json();
  console.log("Sessions remaining:", data.streaming_sessions_remaining);

  ws = new WebSocket(data.websocket_url); // temporary URL, no key inside
  ws.onopen = () => capture();
  ws.onmessage = (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === "control" && msg.event === "done") return;
    if (msg.text) console.log(msg.is_final ? "Final:" : "Partial:", msg.text);
  };
}

async function capture() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: 16000 }); // 16 kHz required
  const source = ctx.createMediaStreamSource(stream);
  // ScriptProcessorNode is deprecated in favor of AudioWorklet; it is kept here for brevity.
  const processor = ctx.createScriptProcessor(4096, 1, 1);
  processor.onaudioprocess = (e) => {
    const input = e.inputBuffer.getChannelData(0);
    const pcm16 = new Int16Array(input.length); // 16-bit mono PCM
    for (let i = 0; i < input.length; i++) {
      pcm16[i] = Math.max(-32768, Math.min(32767, input[i] * 32768));
    }
    if (ws && ws.readyState === WebSocket.OPEN) ws.send(pcm16.buffer);
  };
  source.connect(processor);
  processor.connect(ctx.destination);
}

// Call when the speaker is done:
function finalize() {
  if (ws) ws.send(JSON.stringify({ type: "control", event: "finalize" }));
}

connect("sorani");

Example — Python

import asyncio, json
import requests, websockets
import numpy as np
import sounddevice as sd

API_BASE = "https://www.kurdishtts.com"
API_KEY = "YOUR_STT_API_KEY"

async def stream_stt(dialect="sorani"):
    resp = requests.post(
        API_BASE + "/api/stt-stream-connect",
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        json={"dialect": dialect},
    )
    resp.raise_for_status()
    data = resp.json()
    print("Sessions remaining:", data["streaming_sessions_remaining"])

    async with websockets.connect(data["websocket_url"]) as ws:
        async def receive():
            async for message in ws:
                msg = json.loads(message)
                if msg.get("type") == "control" and msg.get("event") == "done":
                    return
                if "text" in msg:
                    print("Final:" if msg.get("is_final") else "Partial:", msg["text"])

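        receiver = asyncio.create_task(receive())
        # Capture the loop here: the audio callback runs on a separate thread,
        # where asyncio.get_event_loop() would not return this running loop.
        loop = asyncio.get_running_loop()

        def callback(indata, frames, time, status):
            pcm16 = (indata[:, 0] * 32767).astype(np.int16)  # 16-bit mono
            asyncio.run_coroutine_threadsafe(ws.send(pcm16.tobytes()), loop)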

        with sd.InputStream(samplerate=16000, channels=1, callback=callback, blocksize=4096):
            await asyncio.sleep(30)  # record for 30s

        await ws.send(json.dumps({"type": "control", "event": "finalize"}))
        await receiver

asyncio.run(stream_stt("sorani"))

List voices — GET /api/get-speakers

A public, unauthenticated catalog of available voices. Use each returned id as the speaker_id for text-to-speech. Pass ?model_version=v3 or v4 to filter (v3 ≈ 198 voices, v4 ≈ 664 voices).

curl "https://www.kurdishtts.com/api/get-speakers?model_version=v4"

Each voice has id, name, dialect (sorani/kurmanji) and gender.
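Once the catalog is loaded, choosing a voice is a matter of filtering on those four fields. The sketch below assumes the voices are available as a list of dicts with exactly the fields named above; the envelope of the real response, the gender values, and the sample entries are all assumptions made for illustration. dialect_of mirrors the prefix rule the TTS endpoint uses.

```python
def dialect_of(speaker_id):
    """Derive the dialect from a speaker_id prefix such as sorani_1."""
    prefix = speaker_id.split("_", 1)[0]
    if prefix not in ("sorani", "kurmanji"):
        raise ValueError(f"unrecognized speaker_id prefix: {speaker_id}")
    return prefix


def pick_voices(voices, dialect=None, gender=None):
    """Return the ids of voices matching the given dialect and/or gender."""
    return [
        v["id"]
        for v in voices
        if (dialect is None or v["dialect"] == dialect)
        and (gender is None or v["gender"] == gender)
    ]


# Made-up sample entries; real names and gender values may differ.
SAMPLE_VOICES = [
    {"id": "sorani_1", "name": "Voice A", "dialect": "sorani", "gender": "female"},
    {"id": "sorani_2", "name": "Voice B", "dialect": "sorani", "gender": "male"},
    {"id": "kurmanji_236", "name": "Voice C", "dialect": "kurmanji", "gender": "female"},
]
```

Any id returned by pick_voices can be used directly as speaker_id in a text-to-speech request.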

Errors & limits

Successful responses use 200. Common error statuses:

  • 400 — bad request (missing field, invalid dialect, file too large, invalid speed).
  • 401 — missing or invalid API key (check that you are using the right key type: a TTS key for text-to-speech, an STT key for speech-to-text).
  • 403 — plan inactive, quota/credits exhausted, or a voice/model not on your plan. Body includes error, detail, and often upgrade_url.
  • 422 — model validation error (e.g. temperature: 0.0 or an unknown model_version); the detail array carries specifics.
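The statuses above can be folded into a single handler so that callers get one readable message per failure. This is a sketch under assumptions: explain_error is a hypothetical helper, it takes the parsed JSON body, and because the shape of the items in the 422 detail array is not specified here it simply stringifies them.

```python
def explain_error(status_code, body):
    """Turn an error status and parsed JSON body into a short message."""
    detail = body.get("detail")
    if status_code == 400:
        return f"Bad request: {detail}"
    if status_code == 401:
        return "Missing or invalid API key (TTS and STT keys are not interchangeable)"
    if status_code == 403:
        message = f"Not available on this plan: {detail}"
        if body.get("upgrade_url"):
            message += f" (upgrade: {body['upgrade_url']})"
        return message
    if status_code == 422:
        # detail is described as an array; fall back gracefully if it is not.
        items = detail if isinstance(detail, list) else [detail]
        return "Validation error: " + "; ".join(str(item) for item in items)
    return f"Unexpected status {status_code}"
```

With requests, this could be called as explain_error(resp.status_code, resp.json()) whenever resp.ok is False.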

Plans, quotas and prices are on the pricing page. The full machine-readable contract is at /openapi.json.