Plug in all your MCP servers, APIs, and LLM providers. Route everything through a unified endpoint. Aggregate, govern, and control your AI stack.
Features
- MCP Server Aggregation: Connect multiple MCP servers (STDIO, SSE, HTTP) through a single endpoint
- LLM Provider Routing: Unified interface for OpenAI, Anthropic, Google, and other LLM providers
- Context-Aware Tool Search: Intelligent fuzzy search across all connected tools using natural language queries
- Protocol Support: Supports STDIO (subprocess), SSE (Server-Sent Events), and streamable HTTP MCP servers
- Flexible Configuration: TOML-based configuration with environment variable substitution
- Security: Built-in CORS, CSRF protection, OAuth2, and TLS support
- Rate Limiting: Multi-level rate limiting with in-memory or Redis backends
- Docker Ready: Available as a container image with minimal configuration needed
Installation
Quick Install (Linux/Windows (WSL)/macOS)
curl -fsSL https://nexusrouter.com/install | bash
Docker
Pull the latest image:
docker pull ghcr.io/grafbase/nexus:latest
Or use the stable version:
docker pull ghcr.io/grafbase/nexus:stable
Or use a specific version:
docker pull ghcr.io/grafbase/nexus:X.Y.Z
Build from Source
git clone https://github.com/grafbase/nexus
cd nexus
cargo build --release
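Assuming Cargo's default output layout, the compiled binary can then be started directly:
./target/release/nexus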
Running Nexus
Using the Binary
nexus
Using Docker
docker run -p 8000:8000 -v /path/to/config:/etc/nexus.toml ghcr.io/grafbase/nexus:latest
Docker Compose Example
services:
nexus:
image: ghcr.io/grafbase/nexus:latest
ports:
- "8000:8000"
volumes:
- ./nexus.toml:/etc/nexus.toml
environment:
- GITHUB_TOKEN=${GITHUB_TOKEN}
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
interval: 30s
timeout: 10s
retries: 3
Configuration
Create a nexus.toml file to configure Nexus:
# LLM Provider configuration
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
forward_token = true
# Model configuration (at least one model required per provider)
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-3-5-turbo]
[llm.providers.anthropic]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"
[llm.providers.anthropic.models.claude-3-5-sonnet-20241022]
# MCP Server configuration
[mcp.servers.github]
url = "https://api.githubcopilot.com/mcp/"
auth.token = "{{ env.GITHUB_TOKEN }}"
[mcp.servers.filesystem]
cmd = ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/Users/YOUR_USERNAME/Desktop"]
[mcp.servers.python_server]
cmd = ["python", "-m", "mcp_server"]
env = { PYTHONPATH = "/opt/mcp" }
cwd = "/workspace"
Configuration Options
Server Configuration
- server.listen_address: The address and port Nexus will listen on (default: 127.0.0.1:8000)
- server.health.enabled: Enable the health endpoint (default: true)
- server.health.path: Health check endpoint path (default: /health)
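As a minimal sketch, these options map onto the configuration file like this (values shown are the defaults):
[server]
listen_address = "127.0.0.1:8000"

[server.health]
enabled = true
path = "/health"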
LLM Configuration
- llm.enabled: Enable LLM functionality (default: true)
- llm.path: LLM endpoint path (default: /llm)
For detailed LLM provider configuration, see the LLM Provider Configuration section below.
MCP Configuration
- mcp.enabled: Enable MCP functionality (default: true)
- mcp.path: MCP endpoint path (default: /mcp)
- mcp.enable_structured_content: Control the MCP search tool response format (default: true)
  - When true: uses the modern structuredContent field for better performance and type safety
  - When false: uses the legacy content field with Content::json objects for compatibility with older MCP clients
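These options map onto the configuration file as follows (values shown are the defaults):
[mcp]
enabled = true
path = "/mcp"
enable_structured_content = true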
MCP Server Types
STDIO Servers: Launch local processes that communicate via standard input/output
[mcp.servers.my_tool]
cmd = ["path/to/executable", "--arg1", "--arg2"]

# Optional: Set environment variables
env = { DEBUG = "1", API_KEY = "{{ env.MY_API_KEY }}" }

# Optional: Set working directory
cwd = "/path/to/working/directory"

# Optional: Configure stderr handling (default: "null")
stderr = "inherit" # Show in console
# or
stderr = { file = "/var/log/mcp/server.log" } # Log to file
Note: STDIO servers must output valid JSON-RPC messages on stdout. The cmd array must have at least one element (the executable).
SSE Servers: Connect to Server-Sent Events endpoints
[mcp.servers.my_sse_server]
protocol = "sse"
url = "http://example.com/sse"
message_url = "http://example.com/messages" # Optional
HTTP Servers: Connect to streamable HTTP endpoints
[mcp.servers.my_http_server]
protocol = "streamable-http"
url = "https://api.example.com/mcp"
For remote MCP servers, if you omit the protocol, Nexus will first try streamable HTTP and then fall back to SSE.
Authentication
Add service token authentication to any server:
[mcp.servers.my_server.auth]
token = "your-token-here"
# Or use environment variables
token = "{{ env.MY_API_TOKEN }}"
If you enable OAuth2 authentication on your Nexus server and all of your downstream servers use the same authentication server, you can configure Nexus to forward the request's access token to the downstream server.
[mcp.servers.my_server.auth]
type = "forward"
OAuth2 Authentication
Configure OAuth2 authentication to protect your Nexus endpoints:
[server.oauth]
url = "https://your-oauth-provider.com/.well-known/jwks.json"
poll_interval = "5m"
expected_issuer = "https://your-oauth-provider.com"
expected_audience = "your-service-audience"
[server.oauth.protected_resource]
resource = "https://your-nexus-instance.com"
authorization_servers = ["https://your-oauth-provider.com"]
OAuth2 configuration options:
- url: JWKs endpoint URL for token validation
- poll_interval: How often to refresh JWKs (optional, default: no polling)
- expected_issuer: Expected iss claim in JWT tokens (optional)
- expected_audience: Expected aud claim in JWT tokens (optional)
- protected_resource.resource: URL of this protected resource
- protected_resource.authorization_servers: List of authorization server URLs
When OAuth2 is enabled, all endpoints except /health and /.well-known/oauth-protected-resource require valid JWT tokens in the Authorization: Bearer <token> header.
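For example, once OAuth2 is enabled, a request to the models endpoint must carry the token (the token value below is a placeholder):
curl http://localhost:8000/llm/models \
  -H "Authorization: Bearer <your-jwt-token>"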
Rate Limiting
Nexus supports comprehensive rate limiting to prevent abuse and ensure fair resource usage:
# Global rate limiting configuration
[server.rate_limits]
enabled = true
# Storage backend configuration
[server.rate_limits.storage]
type = "memory" # or "redis" for distributed rate limiting
# For Redis backend:
# url = "redis://localhost:6379"
# key_prefix = "nexus:rate_limit:"
# Global rate limit (applies to all requests)
[server.rate_limits.global]
limit = 1000
interval = "60s"
# Per-IP rate limit
[server.rate_limits.per_ip]
limit = 100
interval = "60s"
# Per-MCP server rate limits
[mcp.servers.my_server.rate_limits]
limit = 50
interval = "60s"
# Tool-specific rate limits (override server defaults)
[mcp.servers.my_server.rate_limits.tools]
expensive_tool = { limit = 10, interval = "60s" }
cheap_tool = { limit = 100, interval = "60s" }
Rate Limiting Features:
- Multiple levels: Global, per-IP, per-server, and per-tool limits
- Storage backends: In-memory (single instance) or Redis (distributed)
- Flexible intervals: Configure time windows for each limit
- Tool-specific overrides: Set different limits for expensive operations
Redis Backend Configuration:
[server.rate_limits.storage]
type = "redis"
url = "redis://localhost:6379"
key_prefix = "nexus:rate_limit:"
response_timeout = "1s"
connection_timeout = "5s"
# Connection pool settings
[server.rate_limits.storage.pool]
max_size = 16
min_idle = 0
timeout_create = "5s"
timeout_wait = "5s"
timeout_recycle = "300s"
# TLS configuration for Redis (optional)
[server.rate_limits.storage.tls]
enabled = true
ca_cert_path = "/path/to/ca.crt"
client_cert_path = "/path/to/client.crt" # For mutual TLS
client_key_path = "/path/to/client.key"
# insecure = true # WARNING: Only for development/testing, skips certificate validation
Note: When configuring tool-specific rate limits, Nexus will warn if you reference tools that don't exist.
LLM Token Rate Limiting
Nexus provides token-based rate limiting for LLM providers to help control costs and prevent abuse. Unlike request-based rate limits, token rate limits count an estimate of the actual tokens consumed.
Prerequisites
IMPORTANT: LLM rate limiting requires client identification to be enabled:
[server.client_identification]
enabled = true
# Choose identification methods (at least one required)
client_id.jwt_claim = "sub" # Extract ID from JWT 'sub' claim
# or
client_id.http_header = "X-Client-ID" # Extract ID from HTTP header
# Optional: Limit groups per user (at most one allowed)
group_id.jwt_claim = "groups" # JWT claim containing user's group
# or
group_id.http_header = "X-Group-ID" # Extract ID from HTTP header
# You must provide a list of allowed groups
[server.client_identification.validation]
group_values = ["free", "pro", "max"]
Without client identification, rate limits cannot be enforced and requests will fail with a configuration error.
Configuration Hierarchy
Token rate limits can be configured at four levels, from most to least specific:
- Model per user + group: Specific model for each user in a group
- Model per user: Specific model for each user
- Provider per user + group: All models from provider for each user in a group
- Provider per user: All models from provider for each user
The most specific applicable limit is always used.
Basic Configuration
# Provider-level default rate limit (applies to all models)
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 100000 # 100K input tokens
interval = "1m" # Per minute
# Model-specific rate limit (overrides provider default)
[llm.providers.openai.models.gpt-4.rate_limits.per_user]
input_token_limit = 50000 # More restrictive for expensive model
interval = "30s"
Group-Based Rate Limits
Configure different limits for user groups (requires group_id and group_values in client identification):
# Provider-level group limits
[llm.providers.openai.rate_limits.per_user.groups]
free = { input_token_limit = 10000, interval = "60s" }
pro = { input_token_limit = 100000, interval = "60s" }
enterprise = { input_token_limit = 1000000, interval = "60s" }
# Model-specific group limits (override provider groups)
[llm.providers.openai.models.gpt-4.rate_limits.per_user.groups]
free = { input_token_limit = 5000, interval = "60s" }
pro = { input_token_limit = 50000, interval = "60s" }
enterprise = { input_token_limit = 500000, interval = "60s" }
The limits are per user, but you can define different limits for users who belong to a specific group. If the user does not belong to any group, the default per-user limits apply.
Complete Example
# Client identification (REQUIRED for rate limiting)
[server.client_identification]
enabled = true
client_id.jwt_claim = "sub"
group_id.jwt_claim = "subscription_tier"
[server.client_identification.validation]
group_values = ["free", "pro", "enterprise"]
# OpenAI provider with comprehensive rate limiting
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
# Provider-level defaults
[llm.providers.openai.rate_limits.per_user]
input_token_limit = 100000
interval = "60s"
[llm.providers.openai.rate_limits.per_user.groups]
free = { input_token_limit = 10000, interval = "60s" }
pro = { input_token_limit = 100000, interval = "60s" }
# GPT-4 specific limits (more restrictive)
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-4.rate_limits.per_user]
input_token_limit = 50000
interval = "60s"
[llm.providers.openai.models.gpt-4.rate_limits.per_user.groups]
free = { input_token_limit = 5000, interval = "60s" }
pro = { input_token_limit = 50000, interval = "60s" }
# GPT-3.5 uses provider defaults
[llm.providers.openai.models.gpt-3-5-turbo]
How Token Counting Works
- Input Tokens Only: Rate limiting is based solely on input tokens counted from the request's messages and system prompts
- No Output Buffering: Output tokens and the max_tokens parameter are NOT considered in rate limit calculations
- Pre-check: Input tokens are checked against rate limits before processing
- Token Accumulation: Uses a sliding window algorithm to track usage over time
Note: The rate limiting is designed to be predictable and based only on what the client sends, not on variable output sizes.
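As a conceptual illustration of sliding-window token accounting (not Nexus's actual implementation), a minimal Python sketch might look like this:
import time
from collections import deque

class SlidingWindowTokenLimiter:
    """Admit a request only if the input tokens used in the last
    `interval` seconds stay under `limit` (illustrative only)."""

    def __init__(self, limit: int, interval: float):
        self.limit = limit          # e.g. input_token_limit
        self.interval = interval    # window length in seconds
        self.events = deque()       # (timestamp, input_tokens) pairs

    def allow(self, input_tokens: int) -> bool:
        now = time.monotonic()
        # Drop usage that has aged out of the window
        while self.events and now - self.events[0][0] >= self.interval:
            self.events.popleft()
        used = sum(tokens for _, tokens in self.events)
        if used + input_tokens > self.limit:
            return False  # would exceed the window's token budget
        self.events.append((now, input_tokens))
        return True

# Example: 10,000 input tokens per 60 seconds for a single client
limiter = SlidingWindowTokenLimiter(limit=10_000, interval=60.0)
print(limiter.allow(4_000))  # True
print(limiter.allow(7_000))  # False: 4,000 + 7,000 exceeds the budget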
Rate Limit Response
When rate limited, the server returns a 429 status code. No Retry-After headers are sent to maintain consistency with downstream LLM provider behavior.
Error Responses
When rate limits are exceeded:
{
"error": {
"message": "Rate limit exceeded: Token rate limit exceeded. Please try again later.",
"type": "rate_limit_error",
"code": 429
}
}
Important Notes
- Per-User Limits: All limits are per individual user/client ID
- No Shared Pools: Currently, there are no shared/global token pools
- Streaming Support: Token counting works with both regular and streaming responses
- Provider Agnostic: Works consistently across all LLM providers
- Validation: Configuration is validated at startup; invalid group names will cause errors
TLS Configuration
Configure TLS for downstream connections:
[mcp.servers.my_server.tls]
verify_certs = true
accept_invalid_hostnames = false
root_ca_cert_path = "/path/to/ca.pem"
client_cert_path = "/path/to/client.pem"
client_key_path = "/path/to/client.key"
LLM Provider Configuration
Nexus provides a unified interface for multiple LLM providers, allowing you to route chat completions through various services with a consistent API.
Enabling LLM Routing
[llm]
enabled = true # Enable LLM functionality (default: true)
path = "/llm" # LLM endpoint path (default: "/llm")
Supported Providers
Nexus currently supports four major LLM providers:
- OpenAI (including OpenAI-compatible APIs)
- Anthropic (Claude models)
- Google (Gemini models)
- AWS Bedrock (Multiple model families via AWS)
Provider Configuration
Configure one or more LLM providers in your nexus.toml:
OpenAI Provider
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
# Optional: Use a custom base URL (for Azure OpenAI, proxies, or compatible APIs)
base_url = "https://api.openai.com/v1" # Default
# Model Configuration (REQUIRED - at least one model must be configured)
[llm.providers.openai.models.gpt-4]
# Optional: Rename the model for your users
# rename = "smart-model" # Users will see "openai/smart-model"
[llm.providers.openai.models.gpt-3-5-turbo]
# Models without rename use their original ID
Anthropic Provider
[llm.providers.anthropic]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"
# Optional: Use a custom base URL
base_url = "https://api.anthropic.com/v1" # Default
# Model Configuration (REQUIRED - at least one model must be configured)
[llm.providers.anthropic.models.claude-3-opus-20240229]
[llm.providers.anthropic.models.claude-3-5-sonnet-20241022]
Google Provider
[llm.providers.google]
type = "google"
api_key = "{{ env.GOOGLE_API_KEY }}"
# Optional: Use a custom base URL
base_url = "https://generativelanguage.googleapis.com/v1beta" # Default
# Model Configuration (REQUIRED - at least one model must be configured)
# Note: Model IDs with dots must be quoted in TOML
[llm.providers.google.models."gemini-1.5-flash"]
[llm.providers.google.models.gemini-pro]
AWS Bedrock Provider
[llm.providers.bedrock]
type = "bedrock"
# Optional: AWS profile to use (defaults to environment settings)
profile = "{{ env.AWS_PROFILE }}"
# Optional: AWS region (defaults to environment or us-east-1)
region = "us-west-2"
# Model Configuration (REQUIRED - at least one model must be configured)
# Bedrock uses model IDs with dots, so they must be quoted
[llm.providers.bedrock.models."anthropic.claude-3-5-sonnet-20241022-v2:0"]
[llm.providers.bedrock.models."anthropic.claude-3-opus-20240229-v1:0"]
[llm.providers.bedrock.models."amazon.nova-micro-v1:0"]
[llm.providers.bedrock.models."meta.llama3-8b-instruct-v1:0"]
[llm.providers.bedrock.models."ai21.jamba-1.5-mini-v1:0"]
# Rename models for simpler access
[llm.providers.bedrock.models.claude-haiku]
rename = "anthropic.claude-3-5-haiku-20241022-v1:0" # Users will access as "bedrock/claude-haiku"
[llm.providers.bedrock.models.jamba-mini]
rename = "ai21.jamba-1.5-mini-v1:0" # Users will access as "bedrock/jamba-mini"
AWS Bedrock provides access to multiple foundation models through a single API. Key features:
- Unified Access: Use models from Anthropic, Amazon, Meta, Cohere, and more through one interface
- AWS Integration: Leverages AWS credentials and IAM for authentication
- Regional Availability: Models may vary by AWS region
- Native Streaming: Full streaming support for all compatible models
Authentication: Bedrock uses standard AWS credential chain:
- Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
- AWS profile (specified in config or via the AWS_PROFILE environment variable)
- IAM role (when running on EC2/ECS/Lambda)
- AWS SSO credentials
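For example, when relying on environment variables you might export the standard AWS credentials before starting Nexus (the values below are placeholders):
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-west-2"
nexus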
Supported Model Families:
- AI21 Jamba: Jamba 1.5 Mini and Large models with 256K context window
- Anthropic Claude: All Claude 3 models (Opus, Sonnet, Haiku) and Claude Instant
- Amazon Nova: Nova Micro, Lite, Pro models
- Amazon Titan: Titan Text and Embeddings models
- Meta Llama: Llama 2 and Llama 3 models
- Cohere Command: Command and Command R models
- DeepSeek: DeepSeek R1 reasoning models
- Mistral: Mistral 7B and Mixtral models
Model ID Format: Bedrock model IDs follow the pattern provider.model-name-version:revision, for example:
ai21.jamba-1.5-mini-v1:0
anthropic.claude-3-5-sonnet-20241022-v2:0
amazon.nova-micro-v1:0
meta.llama3-8b-instruct-v1:0
Model Configuration
Each LLM provider requires explicit model configuration. This ensures that only the models you want to expose are available through Nexus.
Basic Model Configuration
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}"
# Each model you want to expose must be explicitly configured
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-3-5-turbo]
Model Renaming
You can rename models to provide custom identifiers for your users:
[llm.providers.openai.models.gpt-4]
rename = "smart-model" # Users will access this as "openai/smart-model"
[llm.providers.openai.models.gpt-3-5-turbo]
rename = "fast-model" # Users will access this as "openai/fast-model"
This is useful for:
- Creating business-friendly model names
- Abstracting away provider-specific model names
- Providing consistent naming across different providers
TOML Syntax for Model IDs
Model IDs that contain dots must be quoted in TOML:
# Correct - dots in model IDs require quotes
[llm.providers.google.models."gemini-1.5-flash"]
[llm.providers.google.models."gemini-1.5-pro"]
# Also correct - no dots, no quotes needed
[llm.providers.google.models.gemini-pro]
Multiple Provider Configuration
You can configure multiple instances of the same provider type with different names:
# Primary OpenAI account
[llm.providers.openai_primary]
type = "openai"
api_key = "{{ env.OPENAI_PRIMARY_KEY }}"
[llm.providers.openai_primary.models.gpt-4]
[llm.providers.openai_primary.models.gpt-3-5-turbo]
# Secondary OpenAI account or Azure OpenAI
[llm.providers.openai_secondary]
type = "openai"
api_key = "{{ env.OPENAI_SECONDARY_KEY }}"
base_url = "https://my-azure-instance.openai.azure.com/v1"
[llm.providers.openai_secondary.models.gpt-4]
rename = "azure-gpt-4" # Distinguish from primary account
# Anthropic
[llm.providers.claude]
type = "anthropic"
api_key = "{{ env.ANTHROPIC_API_KEY }}"
[llm.providers.claude.models.claude-3-opus-20240229]
# Google Gemini
[llm.providers.gemini]
type = "google"
api_key = "{{ env.GOOGLE_API_KEY }}"
[llm.providers.gemini.models."gemini-1.5-flash"]
Token Forwarding
Nexus supports token forwarding, allowing users to provide their own API keys at request time instead of using the configured keys. This feature is opt-in and disabled by default.
Configuring Token Forwarding
Enable token forwarding for any provider by setting forward_token = true:
[llm.providers.openai]
type = "openai"
api_key = "{{ env.OPENAI_API_KEY }}" # Fallback key (optional with token forwarding)
forward_token = true # Enable token forwarding for this provider
[llm.providers.openai.models.gpt-4]
[llm.providers.openai.models.gpt-3-5-turbo]
[llm.providers.anthropic]
type = "anthropic"
# No api_key required when token forwarding is enabled
forward_token = true
[llm.providers.anthropic.models.claude-3-5-sonnet-20241022]
[llm.providers.google]
type = "google"
api_key = "{{ env.GOOGLE_API_KEY }}"
forward_token = false # Explicitly disabled (default)
[llm.providers.google.models."gemini-1.5-flash"]
Using Token Forwarding
When token forwarding is enabled for a provider, users can pass their own API key using the X-Provider-API-Key header:
# Using your own OpenAI key
curl -X POST http://localhost:8000/llm/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Provider-API-Key: sk-your-openai-key" \
-d '{
"model": "openai/gpt-4",
"messages": [{"role": "user", "content": "Hello"}]
}'
# Using your own Anthropic key
curl -X POST http://localhost:8000/llm/v1/chat/completions \
-H "Content-Type: application/json" \
-H "X-Provider-API-Key: sk-ant-your-anthropic-key" \
-d '{
"model": "anthropic/claude-3-opus-20240229",
"messages": [{"role": "user", "content": "Hello"}]
}'
Token Forwarding Behavior
When token forwarding is enabled (
forward_token = true
):- User-provided keys (via header) take priority
- Falls back to configured key if no header is provided
- Returns 401 error if neither key is available
When token forwarding is disabled (
forward_token = false
, default):- Always uses the configured API key
- Ignores the
X-Provider-API-Key
header - Returns 401 error if no configured key exists
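If you call Nexus through an OpenAI-compatible SDK rather than curl, the header can typically be attached once per client. A minimal sketch using the OpenAI Python SDK's default_headers option (the key values shown are placeholders):
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/llm",
    api_key="your-service-token",  # Nexus-level auth (JWT if OAuth2 is enabled)
    # Forwarded to the provider when forward_token = true
    default_headers={"X-Provider-API-Key": "sk-your-openai-key"},
)

response = client.chat.completions.create(
    model="openai/gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)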
Security Considerations
- OAuth2 Integration: When OAuth2 is enabled, users must still authenticate with Nexus even when using token forwarding
- Key Validation: API keys are validated by the provider's API
- No Logging: User-provided keys are never logged
- HTTPS Recommended: Always use HTTPS in production to protect API keys in transit
Provider Limitations
- AWS Bedrock: Token forwarding is not supported for Bedrock providers. Bedrock uses AWS IAM credentials and request signing, which cannot be provided via simple API key headers. You must configure AWS credentials at the provider level (via environment variables, AWS profile, or explicit credentials in configuration).
Using the LLM API
Once configured, you can interact with LLM providers through Nexus's unified API:
List Available Models
curl http://localhost:8000/llm/models
Response:
{
"object": "list",
"data": [
{
"id": "openai_primary/gpt-4-turbo",
"object": "model",
"created": 1677651200,
"owned_by": "openai"
},
{
"id": "claude/claude-3-5-sonnet-20241022",
"object": "model",
"created": 1709164800,
"owned_by": "anthropic"
},
{
"id": "gemini/gemini-1.5-pro",
"object": "model",
"created": 1710000000,
"owned_by": "google"
}
]
}
Chat Completions
Send a chat completion request using the OpenAI-compatible format:
curl -X POST http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "openai_primary/gpt-4-turbo",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello, how are you?"}
],
"temperature": 0.7,
"max_tokens": 150
}'
The model name format is <provider_name>/<model_id>. Nexus automatically routes the request to the appropriate provider and transforms the request/response as needed.
Streaming Responses
Nexus supports streaming responses for all LLM providers using Server-Sent Events (SSE):
curl -X POST http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-3-5-sonnet-20241022",
"messages": [
{"role": "user", "content": "Write a short poem"}
],
"stream": true,
"max_tokens": 100
}'
When stream: true is set, the response will be streamed as Server-Sent Events with the following format:
data: {"id":"msg_123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3-5-sonnet-20241022","choices":[{"index":0,"delta":{"role":"assistant","content":"Here"}}]}
data: {"id":"msg_123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3-5-sonnet-20241022","choices":[{"index":0,"delta":{"content":" is"}}]}
data: {"id":"msg_123","object":"chat.completion.chunk","created":1234567890,"model":"anthropic/claude-3-5-sonnet-20241022","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":25,"total_tokens":35}}
data: [DONE]
Streaming is supported for all providers (OpenAI, Anthropic, Google) and provides:
- Real-time token delivery as they're generated
- Consistent chunk format across all providers
- Usage statistics in the final chunk
- Standard SSE format compatible with OpenAI SDKs
Provider-Specific Considerations
OpenAI
- Supports all standard OpenAI models (GPT-3.5, GPT-4, etc.)
- Compatible with Azure OpenAI endpoints
- Supports function calling (when available)
- Supports streaming responses with Server-Sent Events (SSE)
Anthropic
- System messages are automatically extracted and placed in the system field
- Messages must alternate between user and assistant roles
- Requires an explicit max_tokens parameter (defaults to 4096 if not specified)
- Supports all Claude models (Opus, Sonnet, Haiku)
- Supports streaming responses with Server-Sent Events (SSE)
Google
- Assistant role is automatically mapped to the "model" role
- System messages are placed in the systemInstruction field
- Supports Gemini models
- Returns appropriate safety ratings when available
- Supports streaming responses with Server-Sent Events (SSE)
AWS Bedrock
- Automatically detects and routes to appropriate model family (Anthropic, Amazon, Meta, etc.)
- Each model family has its own request/response format, handled transparently
- Uses AWS SDK for authentication and request signing
- Supports all Bedrock features including streaming and model-specific parameters
- Regional endpoint selection based on configuration or AWS defaults
- Model availability varies by AWS region
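As with other providers, Bedrock models are requested as bedrock/<model_id> (or the renamed alias). For example, using the claude-haiku rename from the configuration above:
curl -X POST http://localhost:8000/llm/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bedrock/claude-haiku",
    "messages": [{"role": "user", "content": "Hello"}]
  }'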
Using Nexus with LLM SDKs
Nexus provides an OpenAI-compatible API, making it easy to use with existing LLM SDKs and libraries. Simply point the SDK to your Nexus instance instead of the provider's API.
OpenAI SDK (Python)
from openai import OpenAI
# Point to your Nexus instance
client = OpenAI(
base_url="http://localhost:8000/llm",
api_key="your-service-token" # Use a JWT token if OAuth2 is enabled, or any string if not
)
# Use any configured provider/model
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet-20241022",
messages=[
{"role": "user", "content": "Hello!"}
]
)
# Streaming works seamlessly
stream = client.chat.completions.create(
model="openai/gpt-4-turbo",
messages=[
{"role": "user", "content": "Write a poem"}
],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="")
OpenAI SDK (Node.js/TypeScript)
import OpenAI from 'openai';
// Configure to use Nexus
const openai = new OpenAI({
baseURL: 'http://localhost:8000/llm',
apiKey: 'your-service-token', // Use a JWT token if OAuth2 is enabled, or any string if not
});
// Use any provider through Nexus
const response = await openai.chat.completions.create({
model: 'google/gemini-1.5-pro',
messages: [
{ role: 'user', content: 'Explain quantum computing' }
],
});
// Streaming with any provider
const stream = await openai.chat.completions.create({
model: 'anthropic/claude-3-opus-20240229',
messages: [
{ role: 'user', content: 'Write a story' }
],
stream: true,
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
LangChain Integration
from langchain_openai import ChatOpenAI
# Use Nexus as the LLM provider
llm = ChatOpenAI(
base_url="http://localhost:8000/llm",
api_key="your-service-token", # Use a JWT token if OAuth2 is enabled
model="openai/gpt-4-turbo"
)
# Works with any configured provider
claude = ChatOpenAI(
base_url="http://localhost:8000/llm",
api_key="your-service-token", # Use a JWT token if OAuth2 is enabled
model="anthropic/claude-3-5-sonnet-20241022"
)
cURL with jq for Command Line
# Regular completion (with OAuth2 authentication if enabled)
curl -s http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-jwt-token" \
-d '{
"model": "openai/gpt-4",
"messages": [{"role": "user", "content": "Hello"}]
}' | jq -r '.choices[0].message.content'
# Streaming with SSE parsing
curl -s http://localhost:8000/llm/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-jwt-token" \
-d '{
"model": "anthropic/claude-3-5-sonnet-20241022",
"messages": [{"role": "user", "content": "Write a haiku"}],
"stream": true
}' | grep "^data: " | sed 's/^data: //' | jq -r 'select(.choices != null) | .choices[0].delta.content // empty'
Authentication with OAuth2
When OAuth2 is enabled in your Nexus configuration, you must provide a valid JWT token:
# With OAuth2 enabled
client = OpenAI(
base_url="http://localhost:8000/llm",
api_key="eyJhbGciOiJSUzI1NiIs..." # Your JWT token
)
Without OAuth2, the api_key field is still required by most SDKs but can be any non-empty string:
# Without OAuth2
client = OpenAI(
base_url="http://localhost:8000/llm",
api_key="dummy" # Any non-empty string works
)
Error Handling
Nexus provides consistent error responses across all providers:
- 400 Bad Request: Invalid request format or parameters
- 401 Unauthorized: Missing or invalid API key
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Provider API error or network issues
Example error response:
{
"error": {
"message": "Invalid model format: expected 'provider/model', got 'invalid-format'",
"type": "invalid_request_error",
"code": 400
}
}
Adding to AI Assistants
Cursor
Add to your Cursor settings:
- Open Cursor Settings (Cmd+, on macOS)
- Search for "Model Context Protocol"
- Enable MCP support
- Add to the MCP server configuration:
{
"nexus": {
"transport": {
"type": "http",
"url": "http://localhost:8000/mcp"
}
}
}
Make sure Nexus is running on localhost:8000 (or adjust the URL accordingly).
Claude Code
Add to your Claude Code configuration:
Open Claude Code and run the command:
claude mcp add --transport http nexus http://localhost:8000/mcp
Or add it to your project's .mcp.json file:
{
  "mcpServers": {
    "nexus": {
      "type": "http",
      "url": "http://localhost:8000/mcp"
    }
  }
}
Verify the connection:
claude mcp list
Make sure Nexus is running before starting Claude Code.
How It Works
MCP Tool Aggregation
Nexus provides two main tools to AI assistants:
- search: A context-aware tool search that uses fuzzy matching to find relevant tools across all connected MCP servers
- execute: Executes a specific tool with the provided parameters
When an AI assistant connects to Nexus, it can:
- Search for tools using natural language queries
- Discover tool names, descriptions, and required parameters
- Execute tools from any connected MCP server
All tools from downstream servers are namespaced with their server name (e.g., github__search_code, filesystem__read_file).
LLM Provider Routing
Nexus acts as a unified gateway for multiple LLM providers:
- Model Discovery: Lists all available models from configured providers with consistent naming
- Request Routing: Automatically routes requests to the correct provider based on model name
- Format Translation: Converts between OpenAI's API format and provider-specific formats
- Response Normalization: Returns consistent response format regardless of provider
Models are namespaced with their provider name (e.g., openai/gpt-4, anthropic/claude-3-opus-20240229).
STDIO Server Integration
STDIO servers are spawned as child processes and communicate via JSON-RPC over standard input/output:
- Process Management: Nexus automatically manages the lifecycle of STDIO server processes
- Tool Discovery: Tools from STDIO servers are discovered dynamically and indexed for search
- Error Handling: If a STDIO process crashes or outputs invalid JSON, appropriate errors are returned
- Environment Isolation: Each STDIO server runs in its own process with configurable environment
Example Usage
Once configured, AI assistants can interact with Nexus like this:
Search for tools:
User: "I need to search for code on GitHub" Assistant: Let me search for GitHub-related tools... [Calls search with keywords: ["github", "code", "search"]]
Execute tools:
Assistant: I found the `github__search_code` tool. Let me search for your query...
[Calls execute with name: "github__search_code" and appropriate arguments]
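At the MCP protocol level, these correspond to standard JSON-RPC tools/call requests against Nexus's search and execute tools; a rough sketch of the payloads, where the downstream tool's arguments are purely illustrative:
{"jsonrpc": "2.0", "id": 1, "method": "tools/call", "params": {"name": "search", "arguments": {"keywords": ["github", "code", "search"]}}}
{"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "execute", "arguments": {"name": "github__search_code", "arguments": {"query": "fn main"}}}}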
Common STDIO Server Examples
Python MCP Server
[mcp.servers.python_tools]
cmd = ["python", "-m", "my_mcp_server"]
env = { PYTHONPATH = "/opt/mcp", PYTHONUNBUFFERED = "1" }
stderr = "inherit" # See Python output during development
Node.js MCP Server
[mcp.servers.node_tools]
cmd = ["node", "mcp-server.js"]
cwd = "/path/to/project"
env = { NODE_ENV = "production" }
Using npx packages
[mcp.servers.filesystem]
cmd = ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/home/user"]
Troubleshooting STDIO Servers
Server doesn't start
- Check executable path: Ensure the command exists and is executable
- View stderr output: Set stderr = "inherit" temporarily to see error messages
- Verify JSON-RPC output: The server must output valid JSON-RPC on stdout
- Check working directory: Ensure the cwd path exists if specified
Tools not appearing
- Wait for initialization: STDIO servers may take a moment to start
- Use search: STDIO tools only appear in search results, not in the base tool list
- Check server logs: Enable stderr logging to see if the server is responding to tool list requests
Security Considerations
- Always use environment variables for sensitive tokens
- Enable TLS verification for production deployments
- Use CORS configuration to restrict access
- Configure OAuth2 authentication for production deployments
- Ensure JWKs URLs use HTTPS in production
- Validate JWT token issuer and audience claims
- Keep your MCP servers and Nexus updated
- Be cautious when running STDIO servers with elevated privileges
- Validate and sanitize any user input passed to STDIO server commands
OAuth2 Security
When using OAuth2 authentication:
- Use HTTPS: Always use HTTPS for JWKs URLs and protected resources in production
- Validate Claims: Configure expected_issuer and expected_audience to validate JWT claims
- Metadata Endpoint: The /.well-known/oauth-protected-resource endpoint provides OAuth2 metadata and is publicly accessible
- Health Checks: The /health endpoint bypasses OAuth2 authentication for monitoring systems
License
Nexus is licensed under the Mozilla Public License 2.0 (MPL-2.0). See the LICENSE file for details.
Contributing
See CONTRIBUTING.md for guidelines on how to contribute to Nexus.
Support
- Documentation: https://nexusrouter.com/docs
- Issues: https://github.com/grafbase/nexus/issues
- Discord: Grafbase Discord