Chat API
POST /chat
Send messages to AI models and receive streaming text responses for conversational AI and content generation
POST
Overview
The POST /chat endpoint is the primary way to interact with SundayPyjamas AI models. It accepts a conversation history and returns an AI-generated response as a streaming text.All responses are streamed in real-time for better user experience in conversational applications.
Authentication
Bearer token with your API key. Format:
Bearer spj_ai_your_api_key_hereRequest Body
Array of conversation messages. Must contain at least one message with valid content.
AI model to use for generating the response. Currently available models:
llama-3.3-70b-versatile- High-quality general-purpose model (default)
Response
Headers
The response includes the following headers for streaming:Body
Streaming text response from the AI model. The complete response is built by concatenating all streamed chunks.
Examples
Basic Request
With System Message
Multi-turn Conversation
Specifying Model
Content Generation
Error Responses
400 Bad Request
401 Unauthorized
403 Forbidden
429 Too Many Requests
500 Internal Server Error
Streaming Implementation
JavaScript/TypeScript
Python
Go
Request Validation
Message Array Requirements
Valid Message Structure
Valid Message Structure
Invalid Examples
Invalid Examples
Content Length Limits
Practical limits:- Single message: ~10,000 characters recommended
- Total conversation: ~20,000 characters for optimal performance
- Token estimation: ~4 characters per token
Rate Limiting Details
Token Usage
Input + output tokens count toward workspace limits
Request Rate
No hard limits, but monitored for abuse
Concurrent Requests
Multiple simultaneous requests supported
Fair Usage
Excessive usage may be throttled
Optimization Tips
Testing with Different Tools
Postman
HTTPie
Insomnia
Performance Considerations
Response Times
Typical response times vary based on:- Request complexity: Simple queries respond faster
- Response length: Longer responses take more time
- Server load: Peak times may have slightly longer latencies
- Simple queries: 1-3 seconds
- Complex content generation: 3-10 seconds
- Very long responses: 10-30 seconds
Best Practices
Optimize for Speed
Optimize for Speed
- Use specific, focused prompts
- Limit conversation history to relevant context
- Request shorter responses when appropriate
- Use streaming to show progress to users
Handle Timeouts
Handle Timeouts
Connection Management
Connection Management
- Reuse HTTP connections when possible
- Implement proper connection pooling
- Handle network interruptions gracefully
- Use appropriate timeouts for your use case
Next Steps
Code Examples
See complete implementation examples in multiple languages
Error Handling
Learn comprehensive error handling patterns
Rate Limits
Understand usage optimization and monitoring
Authentication
Manage API keys and security best practices

