# Agent Friendly World — Full LLM-Friendly Documentation

> Credit-based LLM inference gateway. Pay with USDC on Base via x402. OpenAI-compatible API. No accounts, no KYC.

## Base URL

- Staging: https://agent-router.gaib.cloud
- Local: http://localhost:8080

---

## AUTHENTICATION

### API Key (for inference + usage)

Header: `Authorization: Bearer sk-<64 hex chars>`

### SIWE — Sign-In with Ethereum (for key management)

Include `message` (prepared SIWE string) and `signature` (hex) in the request body or query params. SIWE messages must have `issuedAt` within the last 5 minutes.

Required SIWE fields: domain (gateway hostname), address (EIP-55 checksummed), uri, version "1", chainId, nonce, issuedAt.

### x402 Payment (for top-up)

No pre-auth needed. The payment signature itself proves the payer. Uses ERC-3009 `transferWithAuthorization` on USDC (Base chain).

---

## ENDPOINTS

### GET /health

No auth. Response: `{"status":"ok"}`

### GET /v1/models

No auth. Lists all available models across all registered providers.

Response shape:

```json
{
  "object": "list",
  "data": [
    {
      "id": "gemini/gemini-2.5-flash",
      "name": "gemini-2.5-flash",
      "provider": "gemini",
      "contextLength": 1048576,
      "promptPricePer1MTokens": 0.15,
      "completionPricePer1M": 0.60
    }
  ]
}
```

Prices are USD per 1M tokens (downstream cost). The gateway applies a 20% markup on top.

### POST /v1/topup

Two-phase x402 payment flow.

Phase 1 — Discover requirements:

Request body: `{"amount": 5}` (USD, minimum $1, maximum $10000)

Response 402:

```json
{
  "accepts": [{
    "scheme": "exact",
    "network": "eip155:8453",
    "maxAmountRequired": "5000000",
    "asset": "0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913",
    "payTo": "0x...",
    "extra": {"name": "USDC", "version": "1", "decimals": 6}
  }]
}
```

Phase 2 — Submit signed payment:

Same request body, plus an `X-Payment` or `Payment-Signature` header carrying the signed payment payload.

The server verifies via the CDP facilitator, credits the balance, and settles on-chain asynchronously.
Response 200:

```json
{"balance_usdc": 5000000, "credited_usdc": 5000000}
```

Database operations:

- UPSERT wallets: creates the wallet if new, adds to balance_usdc if existing
- INSERT topups: records the transaction with tx_hash

### POST /v1/auth/keys

Create an API key. SIWE auth required in body.

Request:

```json
{"message": "<prepared SIWE message>", "signature": "0x...", "label": "my-app"}
```

Response 201:

```json
{"id": 1, "key": "sk-a3f9e2b1c4d5...", "label": "my-app", "created_at": "2026-04-07T12:00:00.000Z"}
```

IMPORTANT: The `key` field is returned ONLY ONCE. It is SHA-256 hashed before storage. Key format: `sk-` followed by 64 random hex characters.

### GET /v1/auth/keys

List non-revoked API keys for the wallet. SIWE via query params.

Request: `GET /v1/auth/keys?message=<SIWE message>&signature=<signature>`

Response:

```json
{"data": [{"id": 1, "label": "my-app", "created_at": "...", "revoked_at": null}]}
```

### DELETE /v1/auth/keys/:key_id

Revoke an API key. SIWE auth in body.

Request body: `{"message": "<prepared SIWE message>", "signature": "0x..."}`

Response: `{"revoked": true}`

Sets the revoked_at timestamp. Revoked keys cannot authenticate.

### POST /v1/chat/completions

OpenAI-compatible chat completion. Requires a Bearer API key.

Request:

```json
{
  "model": "gemini/gemini-2.5-flash",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 512,
  "temperature": 0.7,
  "top_p": 1.0,
  "stream": false
}
```

Supported parameters: model, messages, max_tokens (default 1024), temperature, top_p, stream, presence_penalty, frequency_penalty.

NOT SUPPORTED: `thinking`, `reasoning_effort` — returns 400 VALIDATION_ERROR.

Non-streaming response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Hello! How can I help?"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29}
}
```

Streaming response (SSE): each line is `data: <JSON chunk>\n\n`; the final line is `data: [DONE]\n\n`.

Delta format: `{"id":"...","choices":[{"delta":{"content":"token"}}]}`

The final chunk includes `usage` stats.

### GET /v1/usage/:wallet

Balance and per-key usage breakdown. Auth: Bearer API key OR SIWE query params. The authenticated wallet must match the :wallet param.

Response:

```json
{
  "wallet": "0xabc...",
  "balance_usdc": 4950000,
  "locked_usdc": 0,
  "available_usdc": 4950000,
  "keys": [{
    "api_key_id": 1,
    "label": "my-app",
    "request_count": 3,
    "total_prompt_tokens": 66,
    "total_completion_tokens": 174,
    "total_charged_usdc": 50000,
    "total_platform_revenue_usd": 0.008
  }]
}
```

---

## MODEL ID FORMAT

Pattern: `<provider>/<model>`

The gateway splits on the first `/` to determine the provider.

Examples: `gemini/gemini-2.5-flash`, `openrouter/anthropic/claude-3-5-sonnet`, `kimi/kimi-k2.5`, `minimax/MiniMax-M2.5`, `local/llama-3`

## PROVIDERS AND PRICING

All prices USD per 1M tokens. The gateway applies a 20% markup.
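The split-on-first-`/` rule and the 20% markup can be sketched as follows. This is a minimal client-side illustration; the helper names `parseModelId` and `billedPricePer1M` are not part of the gateway API.

```typescript
// Split "<provider>/<model>" on the FIRST slash only, so provider-side
// IDs that themselves contain slashes (e.g. OpenRouter's
// "anthropic/claude-3-5-sonnet") survive intact.
function parseModelId(modelId: string): { provider: string; model: string } {
  const idx = modelId.indexOf('/')
  if (idx === -1) throw new Error('model must be "<provider>/<model>"')
  return { provider: modelId.slice(0, idx), model: modelId.slice(idx + 1) }
}

// Billed price = upstream provider price plus the gateway's 20% markup.
function billedPricePer1M(upstreamUsd: number): number {
  return upstreamUsd * 1.2
}

console.log(parseModelId('openrouter/anthropic/claude-3-5-sonnet'))
// { provider: 'openrouter', model: 'anthropic/claude-3-5-sonnet' }
```

For example, gemini-2.5-flash's $0.15/1M prompt price bills at $0.18/1M after markup.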
### gemini (Google Gemini)

API: https://generativelanguage.googleapis.com/v1beta/openai

| Model | Prompt | Completion | Context |
|-------|--------|------------|---------|
| gemini-2.5-pro | $1.25 | $10.00 | 1M |
| gemini-2.5-flash | $0.15 | $0.60 | 1M |
| gemini-2.5-flash-lite | $0.10 | $0.40 | 1M |
| gemini-2.0-flash | $0.10 | $0.40 | 1M |
| gemini-1.5-pro | $1.25 | $5.00 | 2M |
| gemini-1.5-flash | $0.075 | $0.30 | 1M |

### kimi (Moonshot AI)

API: https://api.moonshot.ai/v1

| Model | Prompt | Completion | Context | Notes |
|-------|--------|------------|---------|-------|
| kimi-k2.5 | $0.60 | $3.00 | 262k | ignores temperature, top_p, penalties |
| moonshot-v1-8k | $0.20 | $2.00 | 8k | |
| moonshot-v1-32k | $1.00 | $3.00 | 32k | |
| moonshot-v1-128k | $2.00 | $5.00 | 131k | |

### minimax (MiniMax)

API: https://api.minimax.io/v1

| Model | Prompt | Completion | Context | Notes |
|-------|--------|------------|---------|-------|
| MiniMax-M2.7 | $0.30 | $1.20 | 204k | ignores presence/frequency penalty |
| MiniMax-M2.5 | $0.118 | $0.95 | 196k | |
| MiniMax-M2 | $0.255 | $1.00 | 196k | |
| MiniMax-M1 | $0.40 | $1.76 | 1M | |
| MiniMax-Text-01 | $0.20 | $1.10 | 1M | |

### openrouter

API: https://openrouter.ai/api/v1

400+ models. Pricing fetched dynamically. Model IDs: `openrouter/<upstream model id>`.

### local

Self-hosted Ollama/vLLM. Free ($0/$0). Context default 4096.

---

## UNITS AND CONVERSIONS

- balance_usdc, locked_usdc, available_usdc, total_charged_usdc, credited_usdc, amount_usdc: micro-USDC (6 decimal places). Divide by 1_000_000 to get USD. $1 = 1_000_000.
- amount in the POST /v1/topup body: a USD float (e.g. 5 = $5).
- promptPricePer1MTokens, completionPricePer1M: USD per 1 million tokens.
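The micro-USDC convention above amounts to a fixed factor of 1,000,000. A minimal sketch (the helper names are illustrative; the gateway itself only deals in integer micro-USDC):

```typescript
// micro-USDC <-> USD conversion, per the units section: 6 decimal places,
// so $1 = 1_000_000 micro-USDC.
const MICRO_USDC_PER_USD = 1_000_000

function microUsdcToUsd(micro: number): number {
  return micro / MICRO_USDC_PER_USD
}

function usdToMicroUsdc(usd: number): number {
  // Round to the nearest whole micro-USDC to avoid float dust.
  return Math.round(usd * MICRO_USDC_PER_USD)
}

// A balance_usdc of 4_950_000 is $4.95; a $5 top-up credits 5_000_000.
console.log(microUsdcToUsd(4_950_000), usdToMicroUsdc(5))
```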
---

## ERROR RESPONSES

Shape: `{"error": "CODE", "message": "Human readable"}`

| HTTP | Code | Description | Balance impact |
|------|------|-------------|----------------|
| 400 | VALIDATION_ERROR | Bad params, unknown model, thinking/reasoning_effort | No charge |
| 401 | UNAUTHORIZED | Bad API key or SIWE | No charge |
| 402 | INSUFFICIENT_BALANCE | Cannot cover estimated cost | No charge |
| 404 | NOT_FOUND | Resource not found | No charge |
| 429 | RATE_LIMITED | Per-wallet limit hit | No charge, includes retry_after_ms |
| 500 | INTERNAL_ERROR | Server error | No charge |
| 502 | UPSTREAM_ERROR | LLM provider failed | Lock released, no charge |

402 Insufficient balance includes: `{"balance_usdc": 1000, "required_usdc": 5000}`

429 Rate limited includes: `{"retry_after_ms": 15000}`

---

## COST ESTIMATION

The gateway reserves an upper-bound estimate of the request cost (based on input size and `max_tokens`) before calling the LLM. After completion, you are charged only the actual cost; the reserved remainder is released. The actual cost is always <= the estimate. A 20% markup is applied on top of upstream provider prices.

---

## RATE LIMITING

Per-wallet sliding window. Default limits: 60 requests/minute, 10000 requests/day. Exceeding them returns 429 with `retry_after_ms`.

---

## QUICK START CODE

```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://agent-router.gaib.cloud/v1',
  apiKey: 'sk-your-api-key',
})

// Non-streaming
const res = await client.chat.completions.create({
  model: 'gemini/gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Hello' }],
})
console.log(res.choices[0].message.content)

// Streaming
const stream = await client.chat.completions.create({
  model: 'gemini/gemini-2.5-flash',
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true,
})
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '')
}
```
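The error table maps cleanly onto client-side handling: 429 is retryable after `retry_after_ms`, 402 tells you the exact shortfall, and 502 is safe to retry because the reserved lock is released. A minimal sketch (the `nextAction` helper and its return strings are illustrative, not part of the API):

```typescript
// Decide what a client should do with a gateway error response,
// based on the documented status codes and body fields.
type GatewayError = {
  error: string
  message: string
  retry_after_ms?: number  // present on 429
  balance_usdc?: number    // present on 402 (micro-USDC)
  required_usdc?: number   // present on 402 (micro-USDC)
}

function nextAction(status: number, body: GatewayError): string {
  if (status === 429 && body.retry_after_ms !== undefined) {
    return `retry after ${body.retry_after_ms}ms`
  }
  if (status === 402 && body.required_usdc !== undefined && body.balance_usdc !== undefined) {
    // Shortfall arrives in micro-USDC; convert to USD for a top-up amount.
    const shortfallUsd = (body.required_usdc - body.balance_usdc) / 1_000_000
    return `top up at least $${shortfallUsd}`
  }
  if (status === 502) {
    return 'safe to retry: lock released, no charge'
  }
  return `give up: ${body.error}`
}
```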