Text-to-Speech API

Speech in 31 languages. One API call.

Production-grade multilingual TTS at a fraction of the cost of ElevenLabs or Google. CPU-only ONNX inference: no GPU, no third-party data sharing, and sub-10s latency on warm requests.

Quick start

# Synthesise speech in any language
curl -X POST \
  https://api.narrateai.dev/v1/synthesise \
  -H "Authorization: Bearer nai_your_key" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Bonjour le monde",
    "lang": "fr",
    "voice": "M1"
  }'

# Response
{
  "url": "https://...",
  "duration": 1.84,
  "chars_used": 16,
  "request_id": "req_..."
}
31 languages supported
~7s warm request latency
99M-parameter model
Cheaper than ElevenLabs

01

Built for developers

Simple REST API

One POST endpoint. Text in, presigned audio URL out. No SDKs to install, no complex auth flows — just an API key in the Authorization header.
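
For callers who would rather not shell out to curl, the same request can be sketched with the Python standard library alone. The endpoint, headers, and body fields are taken from the quick start; the function names and the 30-second timeout are illustrative assumptions, not part of the API.

```python
import json
import urllib.request

# Endpoint as shown in the quick start.
API_URL = "https://api.narrateai.dev/v1/synthesise"


def build_request(text, api_key, lang="en", voice="M1"):
    """Build (but do not send) the POST request for the synthesise endpoint."""
    body = json.dumps({"text": text, "lang": lang, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesise(text, api_key, lang="en", voice="M1", timeout=30):
    """Send the request and return the decoded JSON response as a dict.

    The timeout value is an assumption; tune it to your own latency budget.
    """
    request = build_request(text, api_key, lang, voice)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `synthesise("Bonjour le monde", "nai_your_key", lang="fr")` should return a dict with the `url`, `duration`, `chars_used`, and `request_id` fields shown in the quick start.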

Broad language coverage

English, French, German, Japanese, Arabic, Hindi, and 25 more. Pass a two-letter ISO 639-1 code and get native-quality synthesis with no language-specific pricing.

Expression tags

Inline <laugh>, <breath>, and <sigh> tags give you natural prosody for conversational content without post-processing.
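
Because only three tags are named here, a client may want to catch typos before sending text. The sketch below is a hypothetical client-side check, not something the API provides. It assumes tags are written exactly as shown above, as standalone `<name>` markers with no attributes or closing form.

```python
import re

# The three expression tags named in this section.
SUPPORTED_TAGS = {"laugh", "breath", "sigh"}

# Assumption: tags are bare <name> markers, exactly as written above.
TAG_PATTERN = re.compile(r"<([A-Za-z]+)>")


def unsupported_tags(text):
    """Return a sorted list of tag names in `text` that are not supported."""
    found = {match.group(1).lower() for match in TAG_PATTERN.finditer(text)}
    return sorted(found - SUPPORTED_TAGS)


script = "Well <sigh> that was close. <laugh> Let's try again <cough>."
print(unsupported_tags(script))
```

How the server treats an unrecognised tag is not stated on this page, so rejecting such text locally is a conservative choice rather than a requirement.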

Low COGS, honest pricing

We run ONNX inference on ARM64 Lambda: no GPU fleet, no cloud TTS markup. The savings are passed directly to you at $0.05/1k chars.

Usage metering built in

Every response includes chars_used and a request_id. Query your usage at any time. Monthly quotas are enforced per key, so you never get surprise bills.
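
Since each response reports `chars_used`, a client can keep its own running total alongside the server-side quota. This is a minimal sketch: the `UsageTracker` class is hypothetical, and the pre-flight check assumes the server counts roughly one unit per character of input, which this page does not spell out.

```python
class UsageTracker:
    """Client-side running total of characters used against a monthly quota."""

    def __init__(self, monthly_quota):
        self.monthly_quota = monthly_quota
        self.used = 0
        self.request_ids = []

    def record(self, response):
        """Add one API response (a dict) to the running total."""
        self.used += response["chars_used"]
        self.request_ids.append(response["request_id"])

    @property
    def remaining(self):
        """Characters left this month, never below zero."""
        return max(self.monthly_quota - self.used, 0)

    def would_exceed(self, text):
        """Rough pre-flight check; assumes usage is about len(text)."""
        return len(text) > self.remaining
```

Keeping the `request_id` values around also gives you something concrete to quote when asking support about a specific call.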

Audio hosted for you

Generated WAV files are stored in S3 and served through a presigned URL that is valid for 1 hour. Audio is automatically purged after 7 days. Bring-your-own storage is coming soon.

02

Try it live

03

API reference

POST /v1/synthesise

Synthesise speech

Convert text to speech. Returns a presigned S3 URL valid for 1 hour. Audio is stored as WAV and purged after 7 days.
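
The two lifetimes above (1 hour for the URL, 7 days for the audio) are easy to track client-side. The sketch below assumes both clocks start when your client receives the response; the real start is whenever the server signs the URL, so the 5-minute safety margin is an illustrative guess.

```python
from datetime import datetime, timedelta, timezone

URL_TTL = timedelta(hours=1)        # presigned URL validity, per this page
AUDIO_RETENTION = timedelta(days=7)  # audio purge window, per this page


def deadlines(received_at):
    """Return when the URL stops working and when the audio is purged."""
    return {
        "url_expires": received_at + URL_TTL,
        "audio_purged": received_at + AUDIO_RETENTION,
    }


def url_still_valid(received_at, now, safety_margin=timedelta(minutes=5)):
    """True if the presigned URL should still be usable, with some slack."""
    return now < received_at + URL_TTL - safety_margin


received = datetime.now(timezone.utc)
print(deadlines(received))
```

If you need the audio beyond either window, download the WAV file shortly after synthesis and store it yourself.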

Request headers

Header          Value
Authorization   Bearer nai_your_key (required)
Content-Type    application/json

Request body

Parameter   Type     Required   Description
text        string   required   Text to synthesise. Max 5,000 characters. Supports <laugh>, <breath>, <sigh> expression tags.
lang        string   optional   ISO 639-1 language code. Default: en. See the supported languages list below.
voice       string   optional   Voice style name. Default: M1.
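
The constraints in this table can be checked before a request leaves your machine, which avoids a round trip that would end in a 400. This is a sketch only: `build_payload` is a hypothetical helper, and it assumes the 5,000-character limit is counted the way Python counts string length, with expression tags included.

```python
MAX_TEXT_CHARS = 5000  # limit stated in the request body table


def build_payload(text, lang="en", voice="M1"):
    """Validate the inputs and return the JSON-serialisable request body.

    Defaults mirror the documented ones: lang "en", voice "M1".
    """
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(
            f"text is {len(text)} characters; the limit is {MAX_TEXT_CHARS}"
        )
    return {"text": text, "lang": lang, "voice": voice}
```

For longer scripts, one option is to split the text at sentence boundaries and synthesise each chunk separately.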

Response

{
  "url":        "https://s3.amazonaws.com/...",
  "duration":   3.42,
  "chars_used": 84,
  "request_id": "a0bb225e-aa77-4990-bf92"
}
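
On the client side, the response maps naturally onto a small typed record. The sketch below uses the four fields shown above; the `SynthesisResult` name is hypothetical, and treating `duration` as seconds is an assumption, since the unit is not stated here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisResult:
    url: str          # presigned audio URL
    duration: float   # assumed to be seconds; the unit is not documented here
    chars_used: int   # characters billed for this request
    request_id: str   # identifier to quote in support requests


def parse_response(raw):
    """Parse the JSON response body (str or bytes) into a SynthesisResult."""
    data = json.loads(raw)
    return SynthesisResult(
        url=data["url"],
        duration=float(data["duration"]),
        chars_used=int(data["chars_used"]),
        request_id=data["request_id"],
    )
```

A missing field raises `KeyError`, which surfaces schema changes early instead of passing `None` values downstream.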

Supported languages

Pass the two-letter ISO 639-1 code in the lang field.

Code  Language     Code  Language
en    English      ko    Korean
fr    French       ja    Japanese
de    German       ar    Arabic
es    Spanish      hi    Hindi
pt    Portuguese   ru    Russian
it    Italian      nl    Dutch
pl    Polish       tr    Turkish
sv    Swedish      uk    Ukrainian
vi    Vietnamese   id    Indonesian
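
A client can normalise and look up codes against this table before sending a request. Note that the table lists 18 codes while the service advertises 31 languages, so the sketch below treats a miss as "not listed here" rather than "unsupported". The `is_listed` helper is hypothetical.

```python
# The 18 codes listed in the table above (the service advertises 31 in total).
LISTED_LANGS = {
    "en": "English", "ko": "Korean",
    "fr": "French", "ja": "Japanese",
    "de": "German", "ar": "Arabic",
    "es": "Spanish", "hi": "Hindi",
    "pt": "Portuguese", "ru": "Russian",
    "it": "Italian", "nl": "Dutch",
    "pl": "Polish", "tr": "Turkish",
    "sv": "Swedish", "uk": "Ukrainian",
    "vi": "Vietnamese", "id": "Indonesian",
}


def is_listed(code):
    """True if `code` (case-insensitive) appears in the table above."""
    return code.strip().lower() in LISTED_LANGS
```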

Error codes

Status  Code            Meaning
401     Unauthorized    Missing or invalid API key
400     Bad Request     Missing text, unsupported language, or text exceeds 5,000 chars
429     Quota Exceeded  Monthly character quota reached for your plan
500     Server Error    Synthesis failed; retry with exponential backoff
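
The retry advice for 500 responses can be sketched as a small wrapper. Everything here is illustrative: `ApiError` and `call_with_backoff` are hypothetical names, and the attempt count, base delay, and jitter are assumptions rather than recommended values. Only 500 is retried, because 400, 401, and 429 (a monthly quota) will not succeed on a second attempt.

```python
import random
import time

RETRYABLE_STATUSES = {500}  # per the table: only server errors are worth retrying


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status, message=""):
        super().__init__(f"{status} {message}".strip())
        self.status = status


def call_with_backoff(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `send()` and retry on 500 with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ApiError as err:
            is_last_attempt = attempt == max_attempts - 1
            if err.status not in RETRYABLE_STATUSES or is_last_attempt:
                raise
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            # Add up to 50% random jitter so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Here `send` is any zero-argument callable that performs one request and raises `ApiError` on a non-2xx status. Passing `sleep` as a parameter keeps the wrapper easy to test.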

Rate limits

The API is rate-limited to 50 requests/second globally, with a burst of 100. Per-key limits also apply, based on your plan tier. If you need higher throughput, contact us.
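
A rate with a burst allowance is commonly modelled as a token bucket, and a client can use the same model to throttle itself. The sketch below plugs in the global figures above purely as placeholders: the global limit is shared across all callers and per-key limits are not listed here, so a real client should use its own plan's numbers. `TokenBucket` is a hypothetical class.

```python
import time


class TokenBucket:
    """Client-side token bucket: `rate` tokens/second, up to `burst` stored."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start full, so an initial burst is allowed
        self.clock = clock
        self.updated = clock()

    def try_acquire(self):
        """Take one token if available; return False if the caller should wait."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill for the time elapsed, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Placeholder values taken from the global limit quoted above.
bucket = TokenBucket(rate=50, burst=100)
```

Call `bucket.try_acquire()` before each request and pause briefly whenever it returns `False`.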

04

Simple, honest pricing

Starter

Free

500k chars / month

  • ~8 hours of audio
  • All 31 languages
  • 1 API key
  • Community support

Scale

$149/mo

10M chars / month

  • ~160 hours of audio
  • All 31 languages
  • 10 API keys
  • Usage dashboard + API
  • Priority support

Over 10M chars/month? Talk to us about volume pricing.
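
To size a plan, it helps to turn the audio estimates above into a rule of thumb. Both plans imply the same ratio (500k chars for about 8 hours, 10M chars for about 160 hours), which works out to roughly 62,500 characters per hour of audio. The sketch below just applies that ratio; actual duration will vary with language, voice, and expression tags.

```python
# Ratio implied by the plan descriptions: 500k chars for ~8 hours of audio.
CHARS_PER_HOUR = 500_000 / 8  # 62,500 characters per hour (approximate)


def estimated_hours(chars):
    """Rough audio duration, in hours, for a given number of characters."""
    return chars / CHARS_PER_HOUR


def estimated_chars(hours):
    """Rough character budget needed for a given number of audio hours."""
    return round(hours * CHARS_PER_HOUR)


print(estimated_hours(10_000_000))  # Scale plan quota
print(estimated_chars(40))          # e.g. a 40-hour audiobook catalogue
```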