Deploy the Perplexity MCP Server to cloud platforms for scalable, production-ready environments with high availability and automated management.
Overview
Cloud deployment allows you to:
Scale horizontally with multiple instances
Leverage managed infrastructure
Implement automated deployments
Benefit from cloud provider reliability
Integrate with cloud-native monitoring
General Requirements
All cloud deployments require:
Environment Variables
PERPLEXITY_API_KEY (required)
PORT for HTTP server
ALLOWED_ORIGINS for CORS
Networking
HTTP server listening on configured port
Health check endpoint at /health
MCP endpoint at /mcp
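The requirements above can be checked before the server starts. The following is a minimal sketch, not part of the server itself: the function name `check_deploy_env` is hypothetical, and it assumes that only PERPLEXITY_API_KEY is strictly mandatory while a missing ALLOWED_ORIGINS merely deserves a warning.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the variables listed above.
# Returns 0 when the environment looks usable, 1 otherwise.
check_deploy_env() {
  local status=0

  # The API key is the only variable the requirements mark as required.
  if [ -z "${PERPLEXITY_API_KEY:-}" ]; then
    echo "error: PERPLEXITY_API_KEY is not set" >&2
    status=1
  fi

  # If PORT is set, it must be a valid TCP port number.
  if [ -n "${PORT:-}" ]; then
    case "$PORT" in
      *[!0-9]*)
        echo "error: PORT must be a number, got '$PORT'" >&2
        status=1
        ;;
      *)
        if [ "$PORT" -lt 1 ] || [ "$PORT" -gt 65535 ]; then
          echo "error: PORT must be between 1 and 65535" >&2
          status=1
        fi
        ;;
    esac
  fi

  # Assumption: an unset ALLOWED_ORIGINS is worth a warning, not a failure.
  if [ -z "${ALLOWED_ORIGINS:-}" ]; then
    echo "warning: ALLOWED_ORIGINS is not set" >&2
  fi

  return "$status"
}
```

One way to use it is in a container entrypoint, for example `check_deploy_env && npm run start:http`, so that a misconfigured instance fails fast instead of failing its first health check.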
AWS Elastic Container Service (ECS)
Build and push Docker image
Build your Docker image and push it to Amazon ECR:
# Authenticate to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
# Build and tag
docker build -t perplexity-mcp-server .
docker tag perplexity-mcp-server:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/perplexity-mcp-server:latest
# Push to ECR
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/perplexity-mcp-server:latest
Create ECS task definition
Define your container with environment variables:
{
  "family": "perplexity-mcp-server",
  "containerDefinitions": [
    {
      "name": "perplexity-mcp",
      "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/perplexity-mcp-server:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        { "name": "PORT", "value": "8080" },
        { "name": "BIND_ADDRESS", "value": "0.0.0.0" },
        { "name": "ALLOWED_ORIGINS", "value": "https://your-app.com" }
      ],
      "secrets": [
        {
          "name": "PERPLEXITY_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:perplexity-api-key"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    }
  ]
}
Deploy to ECS
Create an ECS service from this task definition and place it behind a load balancer for high availability.
Google Cloud Run
Build and deploy
Deploy directly from source or from a Docker image:
# Deploy from source
# Cloud Run sets PORT itself (it is a reserved variable), so choose the port with --port
gcloud run deploy perplexity-mcp-server \
  --source . \
  --region us-central1 \
  --port 8080 \
  --set-env-vars ALLOWED_ORIGINS=https://your-app.com \
  --set-secrets PERPLEXITY_API_KEY=perplexity-api-key:latest
Configure health checks
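By default, Cloud Run only checks that the container is listening on its port; it probes /health only if an HTTP startup or liveness probe is configured for the service. The following command keeps the connection to the container on HTTP/1 rather than end-to-end HTTP/2:
gcloud run services update perplexity-mcp-server \
  --region us-central1 \
  --no-use-http2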
Azure Container Instances
az container create \
  --resource-group myResourceGroup \
  --name perplexity-mcp-server \
  --image <registry>.azurecr.io/perplexity-mcp-server:latest \
  --dns-name-label perplexity-mcp \
  --ports 8080 \
  --environment-variables \
    PORT=8080 \
    BIND_ADDRESS=0.0.0.0 \
    ALLOWED_ORIGINS=https://your-app.com \
  --secure-environment-variables \
    PERPLEXITY_API_KEY=your_key_here
Heroku
Create Heroku app
heroku create perplexity-mcp-server
Set environment variables
heroku config:set PERPLEXITY_API_KEY=your_key_here
heroku config:set ALLOWED_ORIGINS=https://your-app.com
Render
Create a new Web Service with:
Build Command: npm install && npm run build
Start Command: npm run start:http
Environment Variables:
PERPLEXITY_API_KEY (secret)
PORT (auto-set by Render)
ALLOWED_ORIGINS (your domain)
Railway
# Install Railway CLI
npm install -g @railway/cli
# Initialize and deploy
railway init
railway up
# Set environment variables
railway variables set PERPLEXITY_API_KEY=your_key_here
railway variables set ALLOWED_ORIGINS=https://your-app.com
Health Check Configuration
All cloud platforms should be configured to use the /health endpoint for health checks:
curl http://your-deployment-url/health
Expected response:
{
  "status": "ok",
  "service": "perplexity-mcp-server"
}
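The check above can be scripted for use in a deployment pipeline. This is a minimal sketch under stated assumptions: the function names `health_body_ok` and `check_health` are hypothetical, and the check only looks for `"status": "ok"` in the response body rather than fully parsing the JSON.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for verifying the /health response shown above.

# Succeeds when the given response body contains "status": "ok".
# This is a pattern match, not a full JSON parse.
health_body_ok() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Fetches <base_url>/health and checks the body.
# -f fails on HTTP errors, --max-time mirrors the 5 second timeout.
check_health() {
  local base_url=$1 body
  body=$(curl -fsS --max-time 5 "$base_url/health") || return 1
  health_body_ok "$body"
}
```

For example, `check_health http://your-deployment-url && echo "deployment is healthy"` could gate the final step of a rollout.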
Health Check Settings
| Setting | Recommended Value |
| --- | --- |
| Path | /health |
| Interval | 30 seconds |
| Timeout | 5 seconds |
| Healthy threshold | 2 consecutive successes |
| Unhealthy threshold | 3 consecutive failures |
Configure health checks to restart unhealthy containers automatically. This ensures high availability and automatic recovery from transient failures.
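To illustrate how the healthy and unhealthy thresholds interact, the sketch below simulates a sequence of probe results. It is a simplified model, not any cloud provider's actual implementation; the function name `health_state` is hypothetical, and the instance is assumed to start in the healthy state.

```shell
#!/usr/bin/env bash
# Simulate how consecutive-result thresholds flip an instance's state.
# Usage: health_state <healthy_threshold> <unhealthy_threshold> <result>...
# Each result is "ok" or "fail". Assumption: the starting state is "healthy".
health_state() {
  local healthy_threshold=$1 unhealthy_threshold=$2
  shift 2
  local state=healthy ok_streak=0 fail_streak=0 result

  for result in "$@"; do
    if [ "$result" = ok ]; then
      # A success extends the success streak and resets the failure streak.
      ok_streak=$((ok_streak + 1))
      fail_streak=0
      if [ "$ok_streak" -ge "$healthy_threshold" ]; then
        state=healthy
      fi
    else
      # A failure extends the failure streak and resets the success streak.
      fail_streak=$((fail_streak + 1))
      ok_streak=0
      if [ "$fail_streak" -ge "$unhealthy_threshold" ]; then
        state=unhealthy
      fi
    fi
  done

  echo "$state"
}
```

With the recommended values, `health_state 2 3 fail fail` prints `healthy` because two failures stay below the unhealthy threshold, while `health_state 2 3 fail fail fail ok` prints `unhealthy` because a single success is not enough to recover.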
Security Best Practices
Environment Variable Security
Never commit API keys to version control
Use cloud provider secret management (AWS Secrets Manager, Google Secret Manager, etc.)
Rotate API keys regularly
Use different keys for development and production
Restrict ALLOWED_ORIGINS to specific domains
Use HTTPS/TLS for all external connections
Implement rate limiting at the load balancer level
Use VPC/private networking where possible
Implement authentication/authorization if needed
Use cloud IAM roles for service-to-service communication
Enable audit logging
Monitor API usage and set up alerts
Always use HTTPS in production. The MCP server serves plain HTTP by default, so terminate TLS at your load balancer or reverse proxy.
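The ALLOWED_ORIGINS and HTTPS recommendations above can be enforced with a small lint step before deployment. This is a sketch only: the function name `lint_allowed_origins` is hypothetical, and it assumes ALLOWED_ORIGINS holds a comma-separated list of origins, which should be confirmed against the server's configuration reference.

```shell
#!/usr/bin/env bash
# Hypothetical lint for ALLOWED_ORIGINS.
# Assumption: the value is a comma-separated list of origins.
# Fails on an empty value, a wildcard entry, or any non-HTTPS origin.
lint_allowed_origins() {
  local origins=$1 origin status=0
  local -a list

  if [ -z "$origins" ]; then
    echo "error: ALLOWED_ORIGINS is empty" >&2
    return 1
  fi

  # Split on commas into an array (avoids glob expansion of "*").
  IFS=',' read -r -a list <<< "$origins"

  for origin in "${list[@]}"; do
    origin=${origin// /}   # drop stray spaces around entries
    case "$origin" in
      '*')
        echo "error: wildcard origin is not allowed" >&2
        status=1
        ;;
      https://*)
        ;;                 # specific HTTPS origin: accepted
      *)
        echo "error: origin is not HTTPS: $origin" >&2
        status=1
        ;;
    esac
  done

  return "$status"
}
```

A pipeline could call `lint_allowed_origins "$ALLOWED_ORIGINS"` for production configurations and abort the deployment when it returns a non-zero status.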
Scaling Considerations
Horizontal Scaling
The Perplexity MCP Server is stateless and can be scaled horizontally:
Auto-scaling: Configure based on CPU/memory usage or request count
Load balancing: Distribute traffic across multiple instances
Session handling: Each request is independent (stateless)
Resource Allocation
Recommended resources per instance:
| Environment | CPU | Memory | Instances |
| --- | --- | --- | --- |
| Development | 0.25 vCPU | 512 MB | 1 |
| Production | 0.5-1 vCPU | 1-2 GB | 2+ |
| High traffic | 1-2 vCPU | 2-4 GB | 3+ |
Memory requirements may increase for long-running research queries. Monitor actual usage and adjust accordingly.
Timeout Configuration
For cloud deployments handling research queries, increase timeouts:
export PERPLEXITY_TIMEOUT_MS=600000  # 10 minutes
Ensure your cloud platform’s timeout settings accommodate long-running requests:
Load balancer timeout > PERPLEXITY_TIMEOUT_MS
Container platform timeout > load balancer timeout
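The ordering above is easy to get wrong because platforms usually express timeouts in seconds while PERPLEXITY_TIMEOUT_MS is in milliseconds. The sketch below checks the chain after converting units; the function name `check_timeouts` and the example values of 660 and 720 seconds are illustrative assumptions, not recommended settings.

```shell
#!/usr/bin/env bash
# Hypothetical check of the timeout ordering described above.
# Usage: check_timeouts <app_timeout_ms> <lb_timeout_s> <platform_timeout_s>
check_timeouts() {
  local app_ms=$1 lb_s=$2 platform_s=$3
  # Convert the second-based settings to milliseconds for comparison.
  local lb_ms=$((lb_s * 1000))
  local platform_ms=$((platform_s * 1000))

  if [ "$lb_ms" -le "$app_ms" ]; then
    echo "load balancer timeout (${lb_s}s) must exceed PERPLEXITY_TIMEOUT_MS (${app_ms}ms)" >&2
    return 1
  fi
  if [ "$platform_ms" -le "$lb_ms" ]; then
    echo "container platform timeout (${platform_s}s) must exceed load balancer timeout (${lb_s}s)" >&2
    return 1
  fi
  echo "ok"
}
```

For the 10 minute setting shown earlier, `check_timeouts 600000 660 720` passes, whereas `check_timeouts 600000 600 720` fails because a 600 second load balancer timeout only equals the application timeout instead of exceeding it.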
Monitoring and Logging
Log Level Configuration
export PERPLEXITY_LOG_LEVEL=INFO  # DEBUG, INFO, WARN, ERROR
Metrics to Monitor
Request count and latency
Error rates (4xx, 5xx responses)
Health check success rate
API key usage and quotas
Container CPU and memory usage
Set up alerts for:
Health check failures
Error rate spikes
API quota approaching limits
Unusual traffic patterns
Example: Complete AWS Deployment
For a comprehensive HTTP server deployment guide with Node.js and process management, see the HTTP Server Deployment documentation.
For Docker-based deployments, refer to the Docker Deployment guide.