Speech AI Examples

License: MIT

Production-ready examples for integrating Brainiall Speech AI APIs into your applications and AI agents.

APIs

API	Model Size	What It Does
Pronunciation Assessment	17 MB	Scores pronunciation accuracy at word and phoneme level
Speech-to-Text (STT)	17 MB (shared)	Transcribes audio with word-level timestamps and confidence
Text-to-Speech (TTS)	115 MB	Generates natural speech from text, 12 English voices (#1 TTS Arena)

All three APIs combined use under 150 MB of model weights (the 17 MB model shared by STT and Pronunciation Assessment, plus 115 MB for TTS) and run on CPU. No GPU is required.

Quick Start

1. Get an API Key

Subscribe on the Azure Marketplace or contact us at [email protected].

2. Set Your Key

export SPEECH_AI_API_KEY="your-subscription-key"

3. Run an Example

Python:

pip install httpx
python python/basic_usage.py

JavaScript (Node.js 18+):

node javascript/basic_usage.js

curl:

bash curl/examples.sh

Examples

File	Description
`python/basic_usage.py`	All 3 APIs in one script — assess, transcribe, synthesize
`python/pronunciation_tutor.py`	Interactive pronunciation tutor using all 3 APIs together
`javascript/basic_usage.js`	Node.js examples for all 3 APIs
`curl/examples.sh`	curl commands for every endpoint
`mcp/claude-desktop-config.json`	MCP config for Claude Desktop
`mcp/cursor-config.json`	MCP config for Cursor IDE

MCP Integration

These APIs are available as MCP servers for AI agents and IDE integrations:

Platform	URL	Pricing
Smithery	pronunciation-assessment	Free (discovery)
MCPize	pronunciation-assessment	$9.99/mo
Apify	pronunciation-assessment-mcp	$0.02/call

See the mcp/ directory for configuration examples.

Marketplaces

Marketplace	Status	Link
Azure Marketplace	Live	View Listing
AWS Marketplace	Coming Soon	—

API Reference

Base URL

https://apim-ai-apis.azure-api.net

Authentication

All requests require the Ocp-Apim-Subscription-Key header:

Ocp-Apim-Subscription-Key: your-key-here
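
As a minimal sketch of the authentication rule above, the helper below attaches the subscription key to an unsent request using only the Python standard library. The function name `build_request` and its parameters are illustrative assumptions, not part of the API or of the example scripts in this repository; the base URL, header name, and `SPEECH_AI_API_KEY` variable come from this README.

```python
import os
import urllib.request

BASE_URL = "https://apim-ai-apis.azure-api.net"


def build_request(path, api_key=None, body=None):
    """Return an unsent Request carrying the subscription-key header.

    `build_request` is a hypothetical helper for illustration only.
    A request with a body is sent as POST with a JSON content type;
    a request without one is sent as GET.
    """
    key = api_key or os.environ.get("SPEECH_AI_API_KEY")
    if not key:
        raise RuntimeError("Set SPEECH_AI_API_KEY or pass api_key explicitly")

    # urllib normalizes the capitalization of header names; HTTP header
    # names are case-insensitive, so the server sees the same header.
    headers = {"Ocp-Apim-Subscription-Key": key}
    if body is not None:
        headers["Content-Type"] = "application/json"

    method = "POST" if body is not None else "GET"
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method=method
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. Keeping construction separate from sending makes the header logic easy to check without network access.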

Pronunciation Assessment

POST /pronunciation/assess/base64
Content-Type: application/json

{
  "audio": "<base64-encoded-wav>",
  "text": "hello world",
  "format": "wav"
}

Response:

{
  "overallScore": 85.5,
  "words": [
    {
      "word": "hello",
      "score": 90.0,
      "phonemes": [
        {"phoneme": "HH", "score": 95.0},
        {"phoneme": "AH", "score": 85.0},
        {"phoneme": "L", "score": 92.0},
        {"phoneme": "OW", "score": 88.0}
      ]
    }
  ]
}
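
The request and response shapes above could be handled along these lines. This is a sketch, not the repository's implementation: `build_assess_payload`, `weak_phonemes`, and the default threshold of 90 are hypothetical names and values chosen for illustration, and the code assumes the response has exactly the fields shown in the sample.

```python
import base64
import json


def build_assess_payload(wav_bytes, reference_text):
    """Encode WAV bytes and reference text as the JSON body shown above."""
    return json.dumps({
        "audio": base64.b64encode(wav_bytes).decode("ascii"),
        "text": reference_text,
        "format": "wav",
    }).encode("utf-8")


def weak_phonemes(result, threshold=90.0):
    """Return (word, phoneme, score) tuples below `threshold`, worst first.

    The threshold is an arbitrary illustrative cutoff, not an API value.
    """
    hits = [
        (word["word"], ph["phoneme"], ph["score"])
        for word in result.get("words", [])
        for ph in word.get("phonemes", [])
        if ph["score"] < threshold
    ]
    return sorted(hits, key=lambda hit: hit[2])
```

Applied to the sample response, `weak_phonemes` would flag `AH` (85.0) and `OW` (88.0) in "hello", which is the kind of feedback a pronunciation tutor might surface first.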

Speech-to-Text

POST /stt/transcribe/base64
Content-Type: application/json

{
  "audio": "<base64-encoded-wav>",
  "include_timestamps": true
}

Response:

{
  "text": "hello world",
  "language": "en",
  "words": [
    {"word": "hello", "start": 0.0, "end": 0.45},
    {"word": "world", "start": 0.50, "end": 0.95}
  ]
}
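
One way to use the word-level timestamps is to measure the silence between consecutive words. The sketch below assumes the response matches the sample above; `build_transcribe_payload` and `pauses_between_words` are hypothetical helper names introduced here for illustration.

```python
import base64
import json


def build_transcribe_payload(wav_bytes, include_timestamps=True):
    """Encode WAV bytes as the JSON body for /stt/transcribe/base64."""
    return json.dumps({
        "audio": base64.b64encode(wav_bytes).decode("ascii"),
        "include_timestamps": include_timestamps,
    }).encode("utf-8")


def pauses_between_words(result):
    """Return (previous_word, next_word, gap_seconds) for each adjacent pair.

    Gaps are rounded to milliseconds to hide floating-point noise.
    """
    words = result.get("words", [])
    return [
        (prev["word"], nxt["word"], round(nxt["start"] - prev["end"], 3))
        for prev, nxt in zip(words, words[1:])
    ]
```

For the sample response this yields a single 0.05 second gap between "hello" and "world".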

Text-to-Speech

POST /tts/synthesize
Content-Type: application/json

{
  "text": "Hello, welcome to Speech AI.",
  "voice": "af_heart",
  "speed": 1.0,
  "format": "wav"
}

Response: Binary WAV audio data.
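
Because the response body is raw audio rather than JSON, it can help to confirm the bytes look like a WAV file before writing them to disk. The following is a sketch under that assumption: `build_tts_payload`, `looks_like_wav`, and `save_wav` are hypothetical helpers, and the check only inspects the standard RIFF/WAVE magic bytes, not the audio content.

```python
import json


def build_tts_payload(text, voice="af_heart", speed=1.0):
    """Build the JSON body shown above; defaults mirror the sample request."""
    return json.dumps({
        "text": text,
        "voice": voice,
        "speed": speed,
        "format": "wav",
    }).encode("utf-8")


def looks_like_wav(data):
    """True if `data` starts with a RIFF container header tagged WAVE."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def save_wav(data, path):
    """Write the audio bytes to `path`, refusing anything that is not WAV."""
    if not looks_like_wav(data):
        # An error reply (for example a JSON body) would land here
        # instead of being saved as a corrupt .wav file.
        raise ValueError("response is not RIFF/WAVE audio")
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

The guard is a design choice for illustration: it keeps a failed request from silently producing an unplayable file.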

Available TTS Voices

GET /tts/voices

Health Checks

GET /pronunciation/health
GET /stt/health
GET /tts/health
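
A small probe over the three health endpoints might look like the sketch below. The helper names `health_urls` and `check_health` are assumptions for illustration, as is the choice to report an HTTP status code (or `None` when the service cannot be reached); the README does not document the body of the health responses, so the code does not parse it.

```python
import urllib.error
import urllib.request

BASE_URL = "https://apim-ai-apis.azure-api.net"
SERVICES = ("pronunciation", "stt", "tts")


def health_urls():
    """Map each service name to its health-check URL."""
    return {name: f"{BASE_URL}/{name}/health" for name in SERVICES}


def check_health(api_key, timeout=5.0):
    """Return {service: HTTP status code, or None if unreachable}."""
    results = {}
    for name, url in health_urls().items():
        req = urllib.request.Request(
            url, headers={"Ocp-Apim-Subscription-Key": api_key}
        )
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                results[name] = resp.status
        except urllib.error.HTTPError as exc:
            # The server answered, but with an error status (e.g. 401).
            results[name] = exc.code
        except OSError:
            # DNS failure, refused connection, or timeout.
            results[name] = None
    return results
```

Calling `check_health(os.environ["SPEECH_AI_API_KEY"])` (after `import os`) would return one entry per service.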

Try It Live

The Hugging Face demo lets you test pronunciation assessment directly in your browser, with no API key needed.

License

MIT — Brainiall
