Cloud Deployment - Perplexity MCP Server

Deploy the Perplexity MCP Server to cloud platforms for scalable, production-ready environments with high availability and automated management.

Overview

Cloud deployment allows you to:

Scale horizontally with multiple instances
Leverage managed infrastructure
Implement automated deployments
Benefit from cloud provider reliability
Integrate with cloud-native monitoring

General Requirements

All cloud deployments require:

Environment Variables

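PERPLEXITY_API_KEY (required): your Perplexity API key
PORT: port the HTTP server listens on
BIND_ADDRESS: address the HTTP server binds to (the container examples below set it to 0.0.0.0)
ALLOWED_ORIGINS: origins permitted by CORS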

Networking

HTTP server listening on configured port
Health check endpoint at /health
MCP endpoint at /mcp
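
As a rough illustration, the requirements above can be checked before a deployment goes out. The sketch below is not part of the server: `check_env` is a hypothetical helper, and it assumes ALLOWED_ORIGINS is a comma-separated list, which this guide does not confirm.

```python
from typing import Mapping


def check_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the basics look fine."""
    problems: list[str] = []

    # PERPLEXITY_API_KEY is the only variable the guide marks as required.
    if not env.get("PERPLEXITY_API_KEY", "").strip():
        problems.append("PERPLEXITY_API_KEY is missing or empty")

    # PORT must be a valid TCP port if it is set at all.
    port = env.get("PORT")
    if port is not None:
        if not port.isdigit() or not 1 <= int(port) <= 65535:
            problems.append(f"PORT is not a valid TCP port: {port!r}")

    # Assumption: ALLOWED_ORIGINS is comma-separated. Flag wildcards as risky.
    origins = [o.strip() for o in env.get("ALLOWED_ORIGINS", "").split(",") if o.strip()]
    if "*" in origins:
        problems.append("ALLOWED_ORIGINS contains '*'; list specific domains instead")

    return problems
```

Calling `check_env(os.environ)` from an entrypoint or CI step and failing when the list is non-empty would surface a missing key before the platform starts routing traffic.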

Platform Examples

AWS Elastic Container Service (ECS)

Build and push Docker image

Build your Docker image and push to Amazon ECR:

# Authenticate to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com

# Build and tag
docker build -t perplexity-mcp-server .
docker tag perplexity-mcp-server:latest <account-id>.dkr.ecr.us-east-1.amazonaws.com/perplexity-mcp-server:latest

# Push to ECR
docker push <account-id>.dkr.ecr.us-east-1.amazonaws.com/perplexity-mcp-server:latest

Create ECS task definition

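Define your container with environment variables and secrets. The example below is abridged: because it uses secrets, the task also needs an executionRoleArn whose role can read the secret, and Fargate tasks additionally need task-level cpu and memory, networkMode set to awsvpc, and requiresCompatibilities. The health check command assumes curl is installed in the image.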

task-definition.json

{
  "family": "perplexity-mcp-server",
  "containerDefinitions": [
    {
      "name": "perplexity-mcp",
      "image": "<account-id>.dkr.ecr.us-east-1.amazonaws.com/perplexity-mcp-server:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "PORT", "value": "8080"},
        {"name": "BIND_ADDRESS", "value": "0.0.0.0"},
        {"name": "ALLOWED_ORIGINS", "value": "https://your-app.com"}
      ],
      "secrets": [
        {
          "name": "PERPLEXITY_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:<account-id>:secret:perplexity-api-key"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    }
  ]
}
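
A task definition like the one above is easy to get subtly wrong, for example by placing the API key in plain environment variables or by letting PORT drift from the mapped container port. The following is a minimal sketch of a pre-registration check; `lint_task_definition` is a hypothetical helper and covers only these two mistakes.

```python
def lint_task_definition(task_def: dict) -> list[str]:
    """Flag two common mistakes in an ECS task definition shaped like the example."""
    findings: list[str] = []
    for container in task_def.get("containerDefinitions", []):
        name = container.get("name", "<unnamed>")
        env = {e.get("name"): e.get("value") for e in container.get("environment", [])}

        # The API key belongs under "secrets", never in plain "environment".
        if "PERPLEXITY_API_KEY" in env:
            findings.append(f"{name}: PERPLEXITY_API_KEY is set in plain environment")

        # The PORT variable should match one of the mapped container ports.
        ports = {str(p.get("containerPort")) for p in container.get("portMappings", [])}
        if "PORT" in env and env["PORT"] not in ports:
            findings.append(f"{name}: PORT={env['PORT']} has no matching containerPort")

    return findings
```

Loading task-definition.json with `json.load` and passing the result to this function returns an empty list for the example as written.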

Deploy to ECS

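Create an ECS service from the task definition and attach it to a load balancer. Run at least two tasks, ideally in different availability zones, so that a single task or zone failure does not take the service down.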

Google Cloud Run

Build and deploy

Deploy directly from source or Docker image:

# Deploy from source. Cloud Run sets PORT itself and rejects it in
# --set-env-vars, so select the container port with --port instead.
gcloud run deploy perplexity-mcp-server \
  --source . \
  --region us-central1 \
  --port 8080 \
  --set-env-vars ALLOWED_ORIGINS=https://your-app.com \
  --set-secrets PERPLEXITY_API_KEY=perplexity-api-key:latest

Configure health checks

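Cloud Run does not probe /health on its own: by default it only runs a TCP startup probe against the container port. To use the /health endpoint, add an HTTP startup or liveness probe with path /health to the service configuration. The command below is unrelated to probing; it keeps end-to-end HTTP/2 disabled so that requests reach the container as HTTP/1.1: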

gcloud run services update perplexity-mcp-server \
  --region us-central1 \
  --no-use-http2

Azure Container Instances

az container create \
  --resource-group myResourceGroup \
  --name perplexity-mcp-server \
  --image <registry>.azurecr.io/perplexity-mcp-server:latest \
  --dns-name-label perplexity-mcp \
  --ports 8080 \
  --environment-variables \
    PORT=8080 \
    BIND_ADDRESS=0.0.0.0 \
    ALLOWED_ORIGINS=https://your-app.com \
  --secure-environment-variables \
    PERPLEXITY_API_KEY=your_key_here

Heroku

Create Heroku app

heroku create perplexity-mcp-server

Set environment variables

heroku config:set PERPLEXITY_API_KEY=your_key_here
heroku config:set ALLOWED_ORIGINS=https://your-app.com

Deploy

git push heroku main

Render

Create a new Web Service with:

Build Command: npm install && npm run build
Start Command: npm run start:http
Environment Variables:
- PERPLEXITY_API_KEY (secret)
- PORT (auto-set by Render)
- ALLOWED_ORIGINS (your domain)

Railway

# Install Railway CLI
npm install -g @railway/cli

# Initialize and deploy
railway init
railway up

# Set environment variables (the subcommand syntax differs between CLI versions; check `railway variables --help`)
railway variables set PERPLEXITY_API_KEY=your_key_here
railway variables set ALLOWED_ORIGINS=https://your-app.com

Health Check Configuration

All cloud platforms should be configured to use the /health endpoint for health checks:

curl http://your-deployment-url/health

Expected response:

{
  "status": "ok",
  "service": "perplexity-mcp-server"
}
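
The same check can be scripted, for example as a post-deploy smoke test. This is a minimal sketch using only the standard library. It assumes the endpoint answers with HTTP 200 and the JSON body shown above; `check_health` and `is_healthy_body` are illustrative names, not part of the server.

```python
import json
import urllib.request


def is_healthy_body(body: bytes) -> bool:
    """True if the payload is a JSON object with "status": "ok"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def check_health(base_url: str, timeout_s: float = 5.0) -> bool:
    """GET <base_url>/health and validate both the status code and the body."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200 and is_healthy_body(resp.read())
    except (OSError, ValueError):  # network errors, HTTP errors, timeouts, malformed URLs
        return False
```

Checking the body as well as the status code guards against a proxy or default page answering 200 on behalf of a container that never started.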

Health Check Settings

Setting	Recommended Value
Path	`/health`
Interval	30 seconds
Timeout	5 seconds
Healthy threshold	2 consecutive successes
Unhealthy threshold	3 consecutive failures

Configure health checks so that the platform restarts or replaces unhealthy containers automatically. This lets the service recover from transient failures without manual intervention.
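
To make the threshold settings concrete, the sketch below models how consecutive-success and consecutive-failure thresholds turn individual probe results into a health state. It is a simplified model, not any provider's actual implementation, and it assumes an instance starts out unhealthy.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    healthy_threshold: int = 2    # consecutive successes needed to become healthy
    unhealthy_threshold: int = 3  # consecutive failures needed to become unhealthy
    healthy: bool = False         # assumption: instances start out unhealthy
    _successes: int = 0
    _failures: int = 0

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the resulting health state."""
        if probe_ok:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

With a 30-second interval and an unhealthy threshold of 3, a dead instance can keep receiving traffic for up to roughly 90 seconds before it is marked unhealthy. Lower the interval or the threshold if that window is too long.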

Security Best Practices

Environment Variable Security

Never commit API keys to version control
Use cloud provider secret management (AWS Secrets Manager, Google Secret Manager, etc.)
Rotate API keys regularly
Use different keys for development and production

Network Security

Restrict ALLOWED_ORIGINS to specific domains
Use HTTPS/TLS for all external connections
Implement rate limiting at the load balancer level
Use VPC/private networking where possible

Access Control

Add authentication and authorization in front of the /mcp endpoint if it is reachable by clients you do not control
Use cloud IAM roles for service-to-service communication
Enable audit logging
Monitor API usage and set up alerts

Always use HTTPS in production. The MCP server serves plain HTTP by default, so terminate TLS at your load balancer or reverse proxy.

Scaling Considerations

Horizontal Scaling

The Perplexity MCP Server is stateless and can be scaled horizontally:

Auto-scaling: Configure based on CPU/memory usage or request count
Load balancing: Distribute traffic across multiple instances
Session handling: Each request is independent (stateless)

Resource Allocation

Recommended resources per instance:

Environment	CPU	Memory	Instances
Development	0.25 vCPU	512 MB	1
Production	0.5-1 vCPU	1-2 GB	2+
High traffic	1-2 vCPU	2-4 GB	3+

Memory requirements may increase for long-running research queries. Monitor actual usage and adjust accordingly.

Timeout Configuration

For cloud deployments handling research queries, increase timeouts:

export PERPLEXITY_TIMEOUT_MS=600000  # 10 minutes

Ensure your cloud platform’s timeout settings accommodate long-running requests:

Load balancer timeout > PERPLEXITY_TIMEOUT_MS (600000 ms is 600 seconds; load balancer timeouts are usually configured in seconds)
Container platform timeout > load balancer timeout
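
The ordering above is easy to break because the values use different units. A minimal sketch of a check, where `check_timeout_chain` and its parameter names are hypothetical:

```python
def check_timeout_chain(perplexity_timeout_ms: int,
                        load_balancer_timeout_s: float,
                        platform_timeout_s: float) -> list[str]:
    """Verify: platform timeout > load balancer timeout > PERPLEXITY_TIMEOUT_MS."""
    problems: list[str] = []
    request_timeout_s = perplexity_timeout_ms / 1000  # e.g. 600000 ms -> 600 s

    if load_balancer_timeout_s <= request_timeout_s:
        problems.append(
            f"load balancer timeout ({load_balancer_timeout_s}s) must exceed "
            f"PERPLEXITY_TIMEOUT_MS ({request_timeout_s}s)"
        )
    if platform_timeout_s <= load_balancer_timeout_s:
        problems.append(
            f"platform timeout ({platform_timeout_s}s) must exceed "
            f"load balancer timeout ({load_balancer_timeout_s}s)"
        )
    return problems
```

For instance, an AWS Application Load Balancer has a default idle timeout of 60 seconds, so `check_timeout_chain(600000, 60, 900)` reports one problem until the idle timeout is raised above 600 seconds.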

Monitoring and Logging

Log Level Configuration

export PERPLEXITY_LOG_LEVEL=INFO  # DEBUG, INFO, WARN, ERROR

Metrics to Monitor

Request count and latency
Error rates (4xx, 5xx responses)
Health check success rate
API key usage and quotas
Container CPU and memory usage

Set up alerts for:

Health check failures
Error rate spikes
API quota approaching limits
Unusual traffic patterns
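
As an illustration of the error-rate metric, the sketch below computes 4xx and 5xx rates from a window of response status codes and decides whether to alert. The 5% threshold is an arbitrary example value, and in practice your cloud provider's monitoring service would normally compute this for you.

```python
from collections import Counter
from typing import Iterable


def error_rates(status_codes: Iterable[int]) -> dict[str, float]:
    """Return the fraction of 4xx and 5xx responses in a window of status codes."""
    counts = Counter(code // 100 for code in status_codes)
    total = sum(counts.values())
    if total == 0:
        return {"4xx": 0.0, "5xx": 0.0}
    return {"4xx": counts[4] / total, "5xx": counts[5] / total}


def should_alert(status_codes: Iterable[int], max_5xx_rate: float = 0.05) -> bool:
    """True when the share of 5xx responses exceeds the (illustrative) threshold."""
    return error_rates(status_codes)["5xx"] > max_5xx_rate
```

Tracking 4xx and 5xx separately is deliberate: a 4xx spike usually points at clients or a bad API key, while a 5xx spike points at the server or the upstream API.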

Example: Complete AWS Deployment

For a comprehensive HTTP server deployment guide with Node.js and process management, see the HTTP Server Deployment documentation. For Docker-based deployments, refer to the Docker Deployment guide.