Speech AI Examples
Production-ready examples for integrating Brainiall Speech AI APIs into your applications and AI agents.
APIs
| API | Model Size | What It Does |
|---|---|---|
| Pronunciation Assessment | 17 MB | Scores pronunciation accuracy at word and phoneme level |
| Speech-to-Text (STT) | 17 MB (shared) | Transcribes audio with word-level timestamps and confidence |
| Text-to-Speech (TTS) | 115 MB | Generates natural speech from text, 12 English voices (#1 TTS Arena) |
All three models combined weigh under 150 MB and run on CPU. No GPU required. STT and Pronunciation share the same compact 17MB model.
Quick Start
1. Get an API Key
Subscribe on the Azure Marketplace or contact us at [email protected].
2. Set Your Key
export SPEECH_AI_API_KEY="your-subscription-key"
3. Run an Example
Python:
pip install httpx
python python/basic_usage.py
JavaScript (Node.js 18+):
node javascript/basic_usage.js
curl:
bash curl/examples.sh
Examples
| File | Description |
|---|---|
python/basic_usage.py |
All 3 APIs in one script — assess, transcribe, synthesize |
python/pronunciation_tutor.py |
Interactive pronunciation tutor using all 3 APIs together |
javascript/basic_usage.js |
Node.js examples for all 3 APIs |
curl/examples.sh |
curl commands for every endpoint |
mcp/claude-desktop-config.json |
MCP config for Claude Desktop |
mcp/cursor-config.json |
MCP config for Cursor IDE |
MCP Integration
These APIs are available as MCP servers for AI agents and IDE integrations:
| Platform | URL | Pricing |
|---|---|---|
| Smithery | pronunciation-assessment | Free (discovery) |
| MCPize | pronunciation-assessment | $9.99/mo |
| Apify | pronunciation-assessment-mcp | $0.02/call |
See the mcp/ directory for configuration examples.
Marketplaces
| Marketplace | Status | Link |
|---|---|---|
| Azure Marketplace | Live | View Listing |
| AWS Marketplace | Coming Soon | — |
API Reference
Base URL
https://apim-ai-apis.azure-api.net
Authentication
All requests require the Ocp-Apim-Subscription-Key header:
Ocp-Apim-Subscription-Key: your-key-here
Pronunciation Assessment
POST /pronunciation/assess/base64
Content-Type: application/json
{
"audio": "<base64-encoded-wav>",
"text": "hello world",
"format": "wav"
}
Response:
{
"overallScore": 85.5,
"words": [
{
"word": "hello",
"score": 90.0,
"phonemes": [
{"phoneme": "HH", "score": 95.0},
{"phoneme": "AH", "score": 85.0},
{"phoneme": "L", "score": 92.0},
{"phoneme": "OW", "score": 88.0}
]
}
]
}
Speech-to-Text
POST /stt/transcribe/base64
Content-Type: application/json
{
"audio": "<base64-encoded-wav>",
"include_timestamps": true
}
Response:
{
"text": "hello world",
"language": "en",
"words": [
{"word": "hello", "start": 0.0, "end": 0.45},
{"word": "world", "start": 0.50, "end": 0.95}
]
}
Text-to-Speech
POST /tts/synthesize
Content-Type: application/json
{
"text": "Hello, welcome to Speech AI.",
"voice": "af_heart",
"speed": 1.0,
"format": "wav"
}
Response: Binary WAV audio data.
Available TTS Voices
GET /tts/voices
Health Checks
GET /pronunciation/health
GET /stt/health
GET /tts/health
Try It Live
The HuggingFace Demo lets you test pronunciation assessment directly in your browser — no API key needed.
License
MIT — Brainiall