Authentication
Assisters uses API keys for authentication. Include your key in every request via the Authorization header using the Bearer scheme.
curl https://api.assisters.com/v1/models \
-H "Authorization: Bearer YOUR_API_KEY"
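To call the API without an SDK, you can attach the Bearer header with Python's standard library. This is a minimal sketch. The helper `build_request` and the `ASSISTERS_API_KEY` environment variable name are hypothetical. Only the base URL and the header format come from the example above.

```python
import os
import urllib.request
from typing import Optional

API_BASE = "https://api.assisters.com/v1"


def build_request(path: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request that carries the key as a Bearer token."""
    # ASSISTERS_API_KEY is a hypothetical variable name used for this sketch.
    key = api_key if api_key is not None else os.environ["ASSISTERS_API_KEY"]
    return urllib.request.Request(
        API_BASE + path,
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen` sends the request. You can then read the JSON body with `json.load`.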
Key Management
- Create Keys: `POST /v1/keys` with a body such as `{ "name": "dev-key-01" }`.
- Rotate Keys: `DELETE /v1/keys/{key_id}`, then create a new key.
- Rate Limits: 1,000 requests per minute per key. Exceeding this limit returns HTTP 429.
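A client can track its own request timestamps to stay under the 1,000 requests-per-minute limit before the server starts answering with HTTP 429. The sketch below is a rolling-window limiter. The class name, its method, and the injectable clock are illustrative and are not part of any Assisters SDK.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side guard that keeps one key under `limit` requests per rolling window."""

    def __init__(self, limit: int = 1000, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of recent requests

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the current window."""
        now = self.clock()
        # Drop timestamps that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False  # caller should wait rather than risk an HTTP 429
```

A caller that gets `False` back can sleep briefly and try again. The server-side limit still applies to every process that shares the key, so you still need the 429 retry handling described under Error Handling.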
Best Practices
- Store keys in environment variables (never in code).
- Use separate keys for development, staging, and production.
- Rotate keys every 90 days or after personnel changes.
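You can enforce the environment-variable and 90-day rotation practices with a few lines of Python. This sketch makes two assumptions: a hypothetical `ASSISTERS_API_KEY_<ENV>` naming scheme (for example `ASSISTERS_API_KEY_DEV`), and that you record each key's creation time yourself. Rotation after personnel changes stays a manual decision.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

ROTATION_PERIOD = timedelta(days=90)  # from the best-practice guidance above


def load_api_key(environment: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Read the key for one environment, e.g. 'dev', 'staging', or 'prod'.

    The ASSISTERS_API_KEY_<ENV> naming scheme is a hypothetical convention.
    """
    source = os.environ if env is None else env
    name = f"ASSISTERS_API_KEY_{environment.upper()}"
    if name not in source:
        raise RuntimeError(f"{name} is not set")
    return source[name]


def needs_rotation(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once a key is at least 90 days old."""
    now = now or datetime.now(timezone.utc)
    return now - created_at >= ROTATION_PERIOD
```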
Core Endpoints
Models
List available AI models and their capabilities.
Request
GET /v1/models
Response
{
"models": [
{
"id": "gpt-4.1-mini",
"name": "GPT-4.1 Mini",
"max_tokens": 128000,
"supports": ["chat", "embeddings", "reasoning"]
}
]
}
Use Case: Select a model based on token limits or supported features.
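Choosing a model by token limit or supported feature comes down to filtering the `GET /v1/models` response. The sketch below parses a body shaped like the sample above. `matching_models` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List


def matching_models(catalog: Dict[str, Any], feature: str, min_tokens: int) -> List[str]:
    """IDs of models that support `feature` and allow at least `min_tokens` tokens."""
    return [
        m["id"]
        for m in catalog.get("models", [])
        if feature in m.get("supports", []) and m.get("max_tokens", 0) >= min_tokens
    ]


# Sample body copied from the GET /v1/models response shown above.
SAMPLE = json.loads("""
{"models": [{"id": "gpt-4.1-mini", "name": "GPT-4.1 Mini",
             "max_tokens": 128000, "supports": ["chat", "embeddings", "reasoning"]}]}
""")
```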
Chat Completions
Generate AI responses for chat interactions.
Request
POST /v1/chat/completions
Body
{
"model": "gpt-4.1-mini",
"messages": [
{ "role": "user", "content": "Explain quantum computing." }
],
"temperature": 0.7,
"max_tokens": 1000
}
Response
{
"choices": [
{
"message": {
"role": "assistant",
"content": "Quantum computing..."
}
}
]
}
Parameters
- `model`: Required. The model ID.
- `messages`: Required. An array of `{ role, content }` objects, with roles such as `user` and `assistant`.
- `temperature`: Float (0–1). Lower values give more deterministic output.
- `max_tokens`: Integer. Maximum response length in tokens.
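The parameter rules above translate directly into client-side checks that catch mistakes before they turn into a 400 response. This is a sketch, not the SDK's own validation. `build_chat_body` is a hypothetical helper, and it enforces only the constraints stated in this section.

```python
from typing import Any, Dict, List, Optional


def build_chat_body(model: str, messages: List[Dict[str, Any]],
                    temperature: Optional[float] = None,
                    max_tokens: Optional[int] = None) -> Dict[str, Any]:
    """Assemble a /v1/chat/completions body, checking the documented constraints."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required and must be non-empty")
    for m in messages:
        if "role" not in m or "content" not in m:
            raise ValueError("each message needs 'role' and 'content'")
    body: Dict[str, Any] = {"model": model, "messages": messages}
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:  # documented range is 0 to 1
            raise ValueError("temperature must be between 0 and 1")
        body["temperature"] = temperature
    if max_tokens is not None:
        if not isinstance(max_tokens, int) or max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
        body["max_tokens"] = max_tokens
    return body
```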
Streaming Responses
Set stream: true to receive chunks as they’re generated.
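const response = await fetch("https://api.assisters.com/v1/chat/completions", {
  method: "POST",
  headers: { "Authorization": "Bearer YOUR_KEY", "Content-Type": "application/json" },
  body: JSON.stringify({ model: "gpt-4.1-mini", messages: [{ role: "user", content: "Hello" }], stream: true })
});
const reader = response.body.getReader(); // consume chunks as they arrive
const decoder = new TextDecoder();
for (let chunk = await reader.read(); !chunk.done; chunk = await reader.read()) {
  console.log(decoder.decode(chunk.value, { stream: true }));
}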
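A streamed response has to be parsed chunk by chunk. This section does not document the chunk format, so the Python sketch below ASSUMES an OpenAI-style server-sent events stream. In that format, `data:` lines carry JSON with `choices[].delta.content`, and the stream ends with `data: [DONE]`. Check the actual wire format before relying on this. `iter_stream_text` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from a streamed chat completion.

    ASSUMPTION: SSE `data:` lines hold JSON chunks shaped like
    {"choices": [{"delta": {"content": "..."}}]} and the stream ends with `data: [DONE]`.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece
```

With `urllib.request.urlopen`, you can iterate over the response object line by line, decode each `bytes` line as UTF-8, and feed it to this generator.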
Embeddings
Convert text into vector embeddings for semantic search or clustering.
Request
POST /v1/embeddings
Body
{
"model": "text-embedding-3-small",
"input": "The quick brown fox jumps over the lazy dog."
}
Response
{
"embedding": [0.0012, -0.0045, ..., 0.0078],
"model": "text-embedding-3-small",
"usage": { "tokens": 12 }
}
Use Cases
- Semantic search in document databases.
- Clustering user queries for analytics.
- Input for machine learning models.
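For semantic search, the usual step after calling `/v1/embeddings` is to compare vectors by cosine similarity. The sketch below ranks stored document embeddings against a query embedding in plain Python. The function names are illustrative. For large collections, a vector database or NumPy is the practical choice.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec: Sequence[float],
                   doc_vecs: Dict[str, Sequence[float]],
                   top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (document ID, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```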
Advanced Features
Tools
Extend chat completions with function calling for real-world integrations.
Request
{
"model": "gpt-4.1-mini",
"messages": [{ "role": "user", "content": "What’s the weather in Paris?" }],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location.",
"parameters": {
"type": "object",
"properties": {
"location": { "type": "string" }
}
}
}
}
]
}
Response
{
"choices": [{
"message": {
"role": "assistant",
"content": null,
"tool_calls": [{
"id": "call_123",
"type": "function",
"function": { "name": "get_weather", "arguments": "{\"location\": \"Paris\"}" }
}]
}
}]
}
Handling Tool Calls
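- Parse the `tool_calls` array.
- Execute each named function with the provided arguments. The `arguments` field is a JSON-encoded string.
- Return each result in a new message whose `tool_call_id` matches the call's `id`: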
{
"role": "tool",
"content": "{\"temperature\": 15, \"unit\": \"C\"}",
"tool_call_id": "call_123"
}
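You can implement the three steps above as a small dispatcher that maps tool names to local Python functions. In this sketch, `get_weather` is a hypothetical stub that returns the example values. `TOOL_REGISTRY` and `run_tool_calls` are illustrative names. The message shapes follow the request and response examples in this section.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(location: str) -> Dict[str, Any]:
    """Hypothetical local implementation; replace with a real weather lookup."""
    return {"temperature": 15, "unit": "C"}


# Maps tool names declared in the request's `tools` array to local callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Execute every tool call in the first choice and build `role: tool` messages."""
    message = response["choices"][0]["message"]
    results = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        handler = TOOL_REGISTRY[fn["name"]]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        output = handler(**args)
        results.append({
            "role": "tool",
            "content": json.dumps(output),
            "tool_call_id": call["id"],
        })
    return results
```

Append the returned messages to the conversation and send them in a follow-up `/v1/chat/completions` request.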
Supported Tools
- `web_search`: Real-time web search.
- `code_interpreter`: Execute Python code.
- Custom tools via the `tools` parameter.
Reasoning
Enable step-by-step problem-solving for complex queries.
Request
{
"model": "gpt-4.1-mini",
"messages": [{ "role": "user", "content": "Solve 2x + 3 = 7." }],
"reasoning": true,
"max_tokens": 2000
}
Response
{
"choices": [{
"message": {
"role": "assistant",
"content": "Step 1: Subtract 3 from both sides → 2x = 4.\nStep 2: Divide by 2 → x = 2.",
"reasoning": "Derived from algebraic manipulation."
}
}]
}
Use Cases
- Debugging code.
- Mathematical proofs.
- Multi-step decision making.
Error Handling
Assisters uses standard HTTP status codes. The most common errors are:
| Code | Error Type | Example |
|---|---|---|
| 400 | Bad Request | Missing model parameter. |
| 401 | Unauthorized | Invalid API key. |
| 404 | Not Found | Unknown model ID. |
| 429 | Too Many Requests | Rate limit exceeded. |
| 500 | Internal Server Error | Model inference failed. |
Error Response Format
{
"error": {
"type": "invalid_request_error",
"message": "Model not found.",
"param": "model",
"code": "model_not_found"
}
}
Retry Logic
- For `429`, implement exponential backoff (e.g., 1s, 2s, 4s).
- For `500`, retry up to 3 times, adding random jitter to each delay (e.g., up to +0.5s).
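You can combine the retry guidance above into one helper. To keep the sketch short, it applies the same policy to both status codes: exponential backoff starting at 1 s, up to 0.5 s of jitter, and at most 3 retries. The docs do not give a retry cap for 429, so that limit is an assumption. `send` is a hypothetical callable that returns `(status, body)`.

```python
import random
import time
from typing import Callable, Tuple

# Hypothetical interface: send() performs one request and returns (status_code, body).
Send = Callable[[], Tuple[int, str]]


def call_with_retries(send: Send,
                      sleep: Callable[[float], None] = time.sleep,
                      max_retries: int = 3,
                      base_delay: float = 1.0,
                      max_jitter: float = 0.5) -> Tuple[int, str]:
    """Retry 429 and 500 responses with exponential backoff plus jitter."""
    attempt = 0
    while True:
        status, body = send()
        if status not in (429, 500) or attempt >= max_retries:
            return status, body
        # Waits 1 s, 2 s, 4 s, ... plus up to max_jitter seconds of random jitter.
        delay = base_delay * (2 ** attempt) + random.uniform(0, max_jitter)
        sleep(delay)
        attempt += 1
```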
SDKs and Libraries
Official SDKs
- Python:
pip install assistents
from assistents import Assisters
client = Assisters(api_key="YOUR_KEY")
response = client.chat.completions.create(model="gpt-4.1-mini", messages=[{"role": "user", "content": "Hello"}])
print(response.choices[0].message.content)
- Node.js:
npm install @assisters/sdk
import Assisters from '@assisters/sdk';
const client = new Assisters({ apiKey: "YOUR_KEY" });
const response = await client.chat.completions.create({ model: "gpt-4.1-mini", messages: [{ role: "user", content: "Hello" }] });
console.log(response.choices[0].message.content);
Community Libraries
- Go: `github.com/assisters/go-sdk`
- Ruby: `gem assistents-ruby`
Webhooks
Subscribe to real-time events (e.g., chat completions, errors).
Setup
- Create Hook: `POST /v1/webhooks` with a body such as:
{
"url": "https://your-server.com/events",
"events": ["chat.completion", "model.failed"]
}
- Verify Endpoint: Respond to `GET /webhooks/verify` with a challenge token.
- Receive Events: Assisters sends HTTP POST requests with payloads like:
{
"event": "chat.completion",
"data": { "id": "chat_123", "status": "completed" }
}
Security
- Validate webhook signatures using a shared secret.
- Use HTTPS for the endpoint URL.
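Validating a signature typically means recomputing an HMAC of the raw request body with the shared secret and comparing the result in constant time. This section does not describe Assisters' exact scheme (hash algorithm, header name, encoding). This sketch therefore ASSUMES HMAC-SHA256 with a hex digest; adjust it to the documented format. `verify_signature` is a hypothetical name.

```python
import hashlib
import hmac


def verify_signature(payload: bytes, signature_hex: str, secret: str) -> bool:
    """Check an HMAC-SHA256 signature over the raw request body.

    ASSUMPTION: HMAC-SHA256 with a hex-encoded digest; the real scheme may differ.
    """
    expected = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information about the expected value.
    return hmac.compare_digest(expected, signature_hex)
```

Verify against the raw bytes as received, before any JSON parsing. Re-serializing the payload can change whitespace or key order and break the comparison.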
Performance Optimization
Caching
- Cache embeddings for repeated queries:
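from cachetools import cached, TTLCache

# Keep up to 1,000 embeddings, each for one hour (3,600 s).
cache = TTLCache(maxsize=1000, ttl=3600)

@cached(cache)
def get_embedding(text):
    # `client` is an Assisters SDK client, created as in the SDK example.
    response = client.embeddings.create(model="text-embedding-3-small", input=text)
    return response.embedding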
- Use Redis or Memcached for distributed caching.
Batch Processing
- Embed multiple texts in one request:
{
"model": "text-embedding-3-small",
"input": ["text 1", "text 2", "text 3"]
}
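When you have more texts than fit comfortably in one request, you can split them into several batched bodies of the shape shown above. This section does not document a per-request input limit, so the default `batch_size=100` is an arbitrary placeholder. `batch_embedding_requests` is a hypothetical helper.

```python
from typing import Dict, Iterator, Sequence


def batch_embedding_requests(texts: Sequence[str], batch_size: int = 100,
                             model: str = "text-embedding-3-small") -> Iterator[Dict[str, object]]:
    """Yield /v1/embeddings bodies with at most `batch_size` inputs each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(texts), batch_size):
        yield {"model": model, "input": list(texts[start:start + batch_size])}
```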
Model Selection
- Use smaller models for low-latency tasks (e.g., `gpt-4.1-mini` instead of `gpt-4.1-ultra`).
Compliance and Security
Data Handling
- GDPR/CCPA: Delete data via `DELETE /v1/data/{id}`.
- Encryption: All data in transit uses TLS 1.3, and data at rest is encrypted.
- PII Redaction: Set `mask: true` in requests to redact personally identifiable information.
Audit Logs
Access logs via GET /v1/audit?start=2024-01-01&end=2024-01-31.
Migration Guide
From v1 Legacy API
- Update endpoints: `/v1/completions` → `/v1/chat/completions`.
- Replace `prompt` with a `messages` array:
- { "prompt": "Hello" }
+ { "messages": [{ "role": "user", "content": "Hello" }] }
- Use newer models (e.g., `gpt-4.1-mini` instead of `gpt-3.5-turbo`).
Breaking Changes in v2
- `temperature` now defaults to `1.0` (was `0.5`).
- `max_tokens` includes response tokens (previously excluded).
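You can apply the body changes above mechanically when migrating existing calls. This sketch turns `prompt` into a single user message. Optionally, it pins `temperature` to the old `0.5` default so output randomness does not change silently. You still need to update endpoint and model names separately. `migrate_legacy_body` is a hypothetical name.

```python
from typing import Any, Dict


def migrate_legacy_body(legacy: Dict[str, Any],
                        keep_legacy_temperature: bool = True) -> Dict[str, Any]:
    """Convert a legacy /v1/completions body into a /v1/chat/completions body."""
    body = {k: v for k, v in legacy.items() if k != "prompt"}
    if "prompt" in legacy:
        # The prompt string becomes a single user message.
        body["messages"] = [{"role": "user", "content": legacy["prompt"]}]
    if keep_legacy_temperature:
        # The default moved from 0.5 to 1.0; pin the old value if none was set.
        body.setdefault("temperature", 0.5)
    return body
```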
Best Practices for Developers
- Idempotency: Send an `Idempotency-Key` header so a retried request is not processed twice:
POST /v1/chat/completions
Idempotency-Key: abc123
- Monitoring: Track latency and error rates with `/v1/metrics`.
- Fallbacks: Configure a secondary model for high-priority tasks in case the primary model fails.
- Testing: Use `/v1/models/{model}/test` for canary deployments.
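You can implement the fallback practice by trying models in priority order. In this sketch, `send` is a hypothetical callable that performs one request with a given model and idempotency key, and raises on failure. Each model attempt gets a fresh `uuid4` key because it is a different request. Retries of the identical request should reuse the same key.

```python
import uuid
from typing import Any, Callable, Dict, Sequence

# Hypothetical interface: send(model, idempotency_key) performs one chat request
# and raises an exception on failure.
Sender = Callable[[str, str], Dict[str, Any]]


def complete_with_fallback(models: Sequence[str], send: Sender) -> Dict[str, Any]:
    """Try each model in order and return the first successful response."""
    last_error: Exception = RuntimeError("no models given")
    for model in models:
        # A different model is a different request, so it gets its own key.
        key = str(uuid.uuid4())
        try:
            return send(model, key)
        except Exception as exc:  # narrow this to your client's error types
            last_error = exc
    raise last_error
```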
Assisters’ API lets you integrate AI into applications such as chatbots, search engines, and automation tools. The endpoints, tools, and optimizations outlined here cover the core building blocks for doing so reliably and at scale. Start with the quickstart guide and experiment with the interactive playground to see what’s possible.
