AWS ECS Deployment

Amazon Elastic Container Service (ECS) orchestrates Docker containers on AWS infrastructure. ECS handles container scheduling, networking, load balancing, and health monitoring, providing a managed platform for running containerized applications.

This guide covers ECS deployment patterns for Django applications, from task definitions to blue-green deployments with CodeDeploy.

Philosophy

ECS should be invisible infrastructure. Your application shouldn't know it's running on ECS. Configuration lives in task definitions and environment variables, not in code. Deployments should be zero-downtime by default.

ECS Fundamentals

Core Concepts

ECS Cluster:
- Logical grouping of tasks and services
- Can run on Fargate (serverless) or EC2 (self-managed)
- Isolated networking and IAM boundaries
- Example: production-cluster, staging-cluster

Task Definition:
- Blueprint for running containers
- Specifies: image, CPU, memory, environment variables, networking
- Versioned (revision number increments on changes)
- Immutable (create new revision to change)

Task:
- Running instance of a task definition
- One or more containers running together
- Can be long-running (service) or one-time (scheduled job)
- Has unique task ID

Service:
- Maintains desired number of tasks running
- Integrates with load balancer for web traffic
- Handles rolling deployments and health checks
- Automatically replaces failed tasks

Container:
- Single Docker container within a task
- Defined in task definition
- Can be essential (task fails if container stops) or non-essential

ECS Architecture for Django

graph TB
    ALB[Application Load Balancer]
    TG1[Target Group Blue]
    TG2[Target Group Green]

    subgraph ECS_Cluster[ECS Cluster]
        subgraph Service[ECS Service]
            T1[Task 1<br/>web + worker]
            T2[Task 2<br/>web + worker]
            T3[Task 3<br/>web + worker]
        end
    end

    ECR[ECR Repository]
    RDS[(RDS Database)]
    REDIS[(ElastiCache Redis)]

    ALB --> TG1
    ALB --> TG2
    TG1 --> T1
    TG1 --> T2
    TG2 --> T3

    T1 --> RDS
    T2 --> RDS
    T3 --> RDS

    T1 --> REDIS
    T2 --> REDIS
    T3 --> REDIS

    ECR -.Pull Image.-> T1
    ECR -.Pull Image.-> T2
    ECR -.Pull Image.-> T3

    style ALB fill:#ff9900
    style ECS_Cluster fill:#f0f0f0
    style Service fill:#e1f5ff

Traffic Flow:
1. User request hits Application Load Balancer
2. ALB routes to target group (blue or green)
3. Target group distributes across healthy tasks
4. Task's web container processes request
5. Container connects to RDS/Redis as needed

Task Definition Structure

Web Application Task Definition

A complete task definition for a Django web application:

{
    "family": "django-webapp",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "1024",
    "memory": "2048",
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::123456789012:role/djangoAppTaskRole",
    "containerDefinitions": [
        {
            "name": "log_router",
            "image": "grafana/fluent-bit-plugin-loki:latest",
            "essential": true,
            "firelensConfiguration": {
                "type": "fluentbit",
                "options": {
                    "enable-ecs-log-metadata": "true"
                }
            },
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/log-router",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "firelens"
                }
            }
        },
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/django-app:abc123",
            "essential": true,
            "portMappings": [
                {
                    "containerPort": 8000,
                    "protocol": "tcp"
                }
            ],
            "environment": [
                {
                    "name": "DJANGO_SETTINGS_MODULE",
                    "value": "myproject.settings.production"
                },
                {
                    "name": "ENVIRONMENT",
                    "value": "production"
                },
                {
                    "name": "AWS_REGION",
                    "value": "us-east-1"
                }
            ],
            "secrets": [
                {
                    "name": "DATABASE_URL",
                    "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/database-url"
                },
                {
                    "name": "SECRET_KEY",
                    "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/django-secret-key"
                }
            ],
            "logConfiguration": {
                "logDriver": "awsfirelens",
                "options": {
                    "Name": "grafana-loki",
                    "Url": "https://logs.example.com/loki/api/v1/push",
                    "Labels": "{job=\"django-web\",environment=\"production\"}",
                    "RemoveKeys": "container_id,ecs_task_arn"
                }
            },
            "healthCheck": {
                "command": [
                    "CMD-SHELL",
                    "curl -f http://localhost:8000/health || exit 1"
                ],
                "interval": 30,
                "timeout": 5,
                "retries": 3,
                "startPeriod": 60
            },
            "dependsOn": [
                {
                    "containerName": "log_router",
                    "condition": "START"
                }
            ]
        }
    ]
}

Key Components:

Resource Allocation:
- cpu: Task-level CPU (1024 = 1 vCPU)
- memory: Task-level memory in MiB
- Must match one of the valid Fargate combinations

IAM Roles:
- executionRoleArn: Role that ECS uses to pull images and fetch secrets
- taskRoleArn: Role that the application uses for AWS API calls

Container Configuration:
- essential: true: Task fails if container stops
- portMappings: Expose container ports
- environment: Plain-text environment variables
- secrets: Sensitive data from SSM/Secrets Manager

Logging:
- logConfiguration: Where logs go
- awsfirelens: Route logs to custom destinations
- awslogs: CloudWatch Logs (default)

Health Checks:
- command: Shell command run inside the container to test health (the image must include curl for this example)
- interval: Seconds between checks
- timeout: Max seconds for check to complete
- retries: Consecutive failed checks before the container is marked unhealthy
- startPeriod: Grace period for app startup
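Inside the container, `environment` and `secrets` entries both arrive as ordinary environment variables, so Django settings can read them without knowing where they came from. The sketch below parses `DATABASE_URL` with the standard library. The helper names `database_config_from_url` and `settings_from_env` and the PostgreSQL engine are assumptions for illustration; a package such as dj-database-url could fill the same role.

```python
import os
from urllib.parse import unquote, urlparse


def database_config_from_url(url: str) -> dict:
    """Turn a postgres://user:pass@host:port/name URL into a Django DATABASES entry."""
    parsed = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",  # assumed engine
        "NAME": parsed.path.lstrip("/"),
        "USER": unquote(parsed.username or ""),
        "PASSWORD": unquote(parsed.password or ""),  # undo percent-encoding
        "HOST": parsed.hostname or "",
        "PORT": str(parsed.port or ""),
    }


def settings_from_env(environ=os.environ) -> dict:
    """Collect the values that the task definition injects into the container."""
    return {
        "SECRET_KEY": environ["SECRET_KEY"],  # fail fast if the secret is missing
        "DATABASES": {"default": database_config_from_url(environ["DATABASE_URL"])},
    }
```

Reading with `environ[...]` rather than `environ.get(...)` makes a missing secret stop the container at startup, which surfaces as a failed deployment instead of a half-configured app.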

Scheduled Task Definition

For background jobs and cron-like tasks:

{
    "family": "django-scheduled-job",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "512",
    "memory": "1024",
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::123456789012:role/djangoAppTaskRole",
    "containerDefinitions": [
        {
            "name": "worker",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/django-app:abc123",
            "essential": true,
            "command": [
                "python",
                "manage.py",
                "sync_external_data"
            ],
            "environment": [
                {
                    "name": "DJANGO_SETTINGS_MODULE",
                    "value": "myproject.settings.production"
                }
            ],
            "secrets": [
                {
                    "name": "DATABASE_URL",
                    "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/database-url"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/django-scheduled-job",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "worker"
                }
            }
        }
    ]
}

Scheduled Task Characteristics:
- No port mappings (doesn't serve traffic)
- No health checks (runs to completion)
- Different command than web containers
- Lower resource allocation
- Triggered by EventBridge schedule

Fargate vs EC2 Decision

Fargate (Serverless Containers)

How Fargate Works:
- AWS manages the underlying infrastructure
- Pay per second for the vCPU and memory each task requests
- No server management required
- No cluster capacity to scale (task count still scales through Service Auto Scaling)

When to Use Fargate:
- Small to medium workloads (most Django apps)
- Variable traffic patterns
- Don't want to manage EC2 instances
- Cost-effective for <100 tasks
- Need rapid scaling

Fargate Pricing Example (example on-demand rates; check current pricing for your region):
- 1 vCPU, 2 GB memory
- $0.04048 per vCPU-hour
- $0.004445 per GB-hour
- Total: ~$36/month per task (if running 24/7, about 730 hours)
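Working the arithmetic from the rates listed above makes the split visible: the vCPU charge alone is about $29.55 per month, and 2 GB of memory adds about $6.49, for roughly $36 per task. This is a minimal sketch that assumes 730 hours per month; the function name is illustrative.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def fargate_monthly_cost(vcpu, memory_gb, vcpu_rate=0.04048, gb_rate=0.004445,
                         hours=HOURS_PER_MONTH):
    """Monthly cost in USD of one task running for `hours` at the given rates."""
    return (vcpu * vcpu_rate + memory_gb * gb_rate) * hours


cpu_only = fargate_monthly_cost(1, 0)   # vCPU charge by itself
total = fargate_monthly_cost(1, 2)      # vCPU plus 2 GB of memory
print(f"vCPU only: ${cpu_only:.2f}  vCPU + memory: ${total:.2f}")
```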

Fargate CPU/Memory Combinations:

vCPU	Memory Options (GB)
0.25	0.5, 1, 2
0.5	1, 2, 3, 4
1	2, 3, 4, 5, 6, 7, 8
2	4-16 (1 GB increments)
4	8-30 (1 GB increments)
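Registering a task definition with a CPU/memory pair outside this table is rejected, so it can be worth checking the pair before calling the API. A small sketch that encodes only the rows of the table above (task-level `cpu` in CPU units, `memory` in MiB, both passed as strings as in the task definition JSON); the function name is an assumption.

```python
# Valid Fargate task sizes from the table above: CPU units -> memory values in MiB.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu: str, memory: str) -> bool:
    """Return True if the task-level cpu/memory strings form a pair from the table."""
    try:
        return int(memory) in FARGATE_SIZES.get(int(cpu), [])
    except ValueError:
        # Non-numeric input is never a valid size
        return False
```

For example, the web task definition above (`"cpu": "1024"`, `"memory": "2048"`) passes, while `"256"` with `"4096"` does not.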

EC2 Launch Type

How EC2 Launch Type Works:
- You provision and manage EC2 instances
- ECS schedules tasks on your instances
- Bin-packing to maximize instance usage
- Pay for EC2 instances (not per task)

When to Use EC2:
- Large workloads (>100 tasks)
- Steady, predictable traffic
- Need specialized instance types such as GPU instances (ARM/Graviton is also available on Fargate)
- Cost-effective for high utilization
- Require specific instance features

EC2 Cost Example:
- m5.xlarge: 4 vCPU, 16 GB memory
- $0.192/hour = ~$140/month
- Can run 8 tasks (0.5 vCPU each) = $17.50/task
- Cheaper than Fargate at high utilization
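"High utilization" can be made concrete by asking how many tasks an instance must host before its per-task cost drops to the Fargate price. The sketch below reuses the Fargate rates from the pricing example and assumes each task needs 0.5 vCPU and 2 GB (16 GB split across 8 tasks); the function names are illustrative.

```python
import math

HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def fargate_task_monthly(vcpu, memory_gb, vcpu_rate=0.04048, gb_rate=0.004445):
    """Monthly Fargate cost of one task at the example rates."""
    return (vcpu * vcpu_rate + memory_gb * gb_rate) * HOURS_PER_MONTH


def break_even_tasks(instance_hourly, vcpu, memory_gb):
    """Smallest task count at which the instance costs no more per task than Fargate."""
    instance_monthly = instance_hourly * HOURS_PER_MONTH
    return math.ceil(instance_monthly / fargate_task_monthly(vcpu, memory_gb))


# m5.xlarge at $0.192/hour, tasks sized 0.5 vCPU / 2 GB
print(break_even_tasks(0.192, 0.5, 2))
```

With these numbers the instance has to run at least 7 of its 8 possible tasks to match Fargate, so a half-empty instance is the more expensive option.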

Decision Matrix

Factor	Fargate	EC2
Management	AWS manages	You manage
Scaling	No instances to provision	Launch instances first
Cost (low scale)	Lower	Higher
Cost (high scale)	Higher	Lower
Flexibility	Limited configs	Full control
Startup time	30-60 seconds	Instant (if instances ready)
Recommendation	Default choice	High-scale workloads

For Most Django Apps: Use Fargate
- Simpler operations
- Better for variable traffic
- No instance management overhead
- Cost-effective up to ~50-100 tasks

Service Configuration

Creating an ECS Service

Service Definition:

# This creates a rolling-update service. The CodeDeploy blue/green setup later in
# this guide instead requires --deployment-controller type=CODE_DEPLOY at creation time.
# Shorthand values are kept on one line because embedded newlines and spaces can break parsing.
aws ecs create-service \
    --cluster production-cluster \
    --service-name django-web \
    --task-definition django-webapp:15 \
    --desired-count 3 \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[subnet-abc123,subnet-def456],securityGroups=[sg-web123],assignPublicIp=DISABLED}" \
    --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/django-web/abc123,containerName=web,containerPort=8000" \
    --health-check-grace-period-seconds 60 \
    --deployment-configuration "maximumPercent=200,minimumHealthyPercent=100" \
    --enable-execute-command

Service Parameters:

Desired Count:
- Number of tasks to keep running
- 3 minimum for high availability
- Auto-scaling adjusts this value

Network Configuration:
- subnets: Private subnets for tasks
- securityGroups: Firewall rules
- assignPublicIp: DISABLED for private subnets (use NAT gateway)

Load Balancer:
- targetGroupArn: ALB target group
- containerName: Which container receives traffic
- containerPort: Container's listening port

Deployment Configuration:
- maximumPercent: Maximum % of desired count allowed during a deployment (200 = double)
- minimumHealthyPercent: Minimum % of desired count that must stay healthy (100 = no capacity loss)

Health Check Grace Period:
- Seconds after a task starts during which the service ignores failing load balancer health checks
- Allows application startup time
- Should be longer than actual startup time
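The two deployment percentages translate into concrete task counts. A small sketch, assuming the rounding ECS documents for rolling updates (the minimum is rounded up, the maximum is rounded down); the function name is illustrative.

```python
import math


def rolling_deployment_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (min_running, max_running) task counts allowed during a rolling update."""
    lower = math.ceil(desired_count * minimum_healthy_percent / 100)
    upper = math.floor(desired_count * maximum_percent / 100)
    return lower, upper


# The service above: 3 desired tasks, 100% minimum healthy, 200% maximum
print(rolling_deployment_bounds(3, 100, 200))
```

With 3 desired tasks this gives a floor of 3 and a ceiling of 6, so the service may start up to 3 new tasks before stopping any old ones. Setting 50/100 instead yields (2, 3): one old task is stopped first, which saves capacity but reduces headroom during the rollout.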

Service Auto-Scaling

Target Tracking Scaling:

# Register scalable target
aws application-autoscaling register-scalable-target \
    --service-namespace ecs \
    --resource-id service/production-cluster/django-web \
    --scalable-dimension ecs:service:DesiredCount \
    --min-capacity 3 \
    --max-capacity 20

# Create scaling policy
aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --resource-id service/production-cluster/django-web \
    --scalable-dimension ecs:service:DesiredCount \
    --policy-name django-cpu-scaling \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300
    }'

Scaling Metrics:

Metric	Target Value	Use Case
CPU Utilization	70%	CPU-bound tasks
Memory Utilization	80%	Memory-intensive apps
Request Count	1000/task	Request-driven scaling
Custom Metric	Variable	Business logic

Cooldown Periods:
- ScaleOutCooldown: Seconds before another scale-out (60s)
- ScaleInCooldown: Seconds before another scale-in (300s)
- Prevents rapid scaling oscillation

CodeDeploy Integration

Blue-Green Deployment Architecture

CodeDeploy orchestrates blue-green deployments on ECS:

graph TB
    Start[Start Deployment] --> Deploy[Deploy to Green]
    Deploy --> Health[Health Checks]
    Health --> Pass{Healthy?}
    Pass -->|Yes| Shift[Shift Traffic]
    Pass -->|No| Rollback1[Auto Rollback]
    Shift --> Monitor[Monitor Metrics]
    Monitor --> Stable{Stable?}
    Stable -->|Yes| Terminate[Terminate Blue]
    Stable -->|No| Rollback2[Auto Rollback]
    Rollback1 --> Alert[Alert Team]
    Rollback2 --> Alert
    Terminate --> Complete[Deployment Complete]

    style Start fill:#e1f5ff
    style Complete fill:#d4edda
    style Rollback1 fill:#f8d7da
    style Rollback2 fill:#f8d7da

Deployment Process:

1. Provision Green Environment:
   - Create new tasks with new image
   - Wait for tasks to reach healthy state
   - Validate health checks pass
2. Traffic Shift:
   - Gradually route traffic from blue to green
   - Monitor error rates during shift
   - Can be instant, linear, or canary
3. Validation Period:
   - Monitor CloudWatch metrics
   - Check error rate, latency, CPU
   - Auto-rollback if alarms trigger
4. Cleanup:
   - After successful validation, terminate blue tasks
   - Blue environment remains briefly for rapid rollback

CodeDeploy Application Setup

Create CodeDeploy Application:

# Create application
aws deploy create-application \
    --application-name django-app \
    --compute-platform ECS

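# Create deployment group
# ECS deployment groups use the blue/green deployment style; the 5-minute
# termination wait below is an example value
aws deploy create-deployment-group \
    --application-name django-app \
    --deployment-group-name production \
    --service-role-arn arn:aws:iam::123456789012:role/CodeDeployServiceRole \
    --ecs-services "clusterName=production-cluster,serviceName=django-web" \
    --deployment-style deploymentType=BLUE_GREEN,deploymentOption=WITH_TRAFFIC_CONTROL \
    --blue-green-deployment-configuration '{
        "terminateBlueInstancesOnDeploymentSuccess": {
            "action": "TERMINATE",
            "terminationWaitTimeInMinutes": 5
        },
        "deploymentReadyOption": {"actionOnTimeout": "CONTINUE_DEPLOYMENT"}
    }' \
    --load-balancer-info '{
        "targetGroupPairInfoList": [{
            "targetGroups": [
                {"name": "django-web-blue"},
                {"name": "django-web-green"}
            ],
            "prodTrafficRoute": {
                "listenerArns": ["arn:aws:elasticloadbalancing:..."]
            }
        }]
    }' \
    --deployment-config-name CodeDeployDefault.ECSAllAtOnce \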
    --auto-rollback-configuration '{
        "enabled": true,
        "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
    }' \
    --alarm-configuration '{
        "enabled": true,
        "alarms": [
            {"name": "django-5xx-errors"},
            {"name": "django-cpu-high"}
        ]
    }'

Key Configuration:

Target Group Pair:
- Blue and green target groups
- Load balancer switches between them
- Both attached to same ALB listener

Deployment Configuration:
- ECSAllAtOnce: Instant traffic shift
- ECSLinear10PercentEvery1Minutes: Gradual shift
- ECSCanary10Percent5Minutes: Canary then full shift

Auto-Rollback:
- Triggers on deployment failure
- Triggers on CloudWatch alarm
- Instantly reverts to blue environment

AppSpec File

AppSpec for ECS (embedded in deployment):

{
    "version": 1,
    "Resources": [
        {
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:15",
                    "LoadBalancerInfo": {
                        "ContainerName": "web",
                        "ContainerPort": 8000
                    },
                    "PlatformVersion": "LATEST",
                    "NetworkConfiguration": {
                        "AwsvpcConfiguration": {
                            "Subnets": ["subnet-abc123", "subnet-def456"],
                            "SecurityGroups": ["sg-web123"],
                            "AssignPublicIp": "DISABLED"
                        }
                    }
                }
            }
        }
    ],
    "Hooks": [
        {
            "BeforeInstall": "LambdaFunctionToRunDatabaseMigrations"
        },
        {
            "AfterInstall": "LambdaFunctionToWarmUpCache"
        },
        {
            "AfterAllowTestTraffic": "LambdaFunctionToRunSmokeTests"
        }
    ]
}

Deployment Hooks:

Hook	Timing	Use Case
`BeforeInstall`	Before new tasks start	Run migrations
`AfterInstall`	After new tasks healthy	Warm cache
`AfterAllowTestTraffic`	After test traffic routed	Smoke tests
`BeforeAllowTraffic`	Before production traffic	Final validation
`AfterAllowTraffic`	After production traffic	Post-deploy tasks
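Each hook names a Lambda function, and CodeDeploy waits until that function reports a result through `put_lifecycle_event_hook_execution_status`. A minimal handler sketch follows; `make_hook_handler` and `run_checks` are hypothetical names, and the client is injected so the logic can be exercised without AWS access.

```python
def make_hook_handler(codedeploy_client, run_checks):
    """Build a Lambda handler that reports the outcome of run_checks() to CodeDeploy."""

    def handler(event, context):
        try:
            passed = bool(run_checks())
        except Exception:
            # Any exception in the checks counts as a failed hook
            passed = False
        status = "Succeeded" if passed else "Failed"
        codedeploy_client.put_lifecycle_event_hook_execution_status(
            deploymentId=event["DeploymentId"],
            lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
            status=status,
        )
        return status

    return handler


# In a real Lambda module (assumes boto3, which the Lambda Python runtime provides):
#   import boto3
#   handler = make_hook_handler(boto3.client("codedeploy"), run_smoke_tests)
```

Reporting `Failed` stops the deployment at that hook, which is what lets a smoke test block the production traffic shift.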

Creating a Deployment

CLI Deployment:

# Create deployment
aws deploy create-deployment \
    --application-name django-app \
    --deployment-group-name production \
    --description "Deploy version abc123" \
    --revision '{
        "revisionType": "AppSpecContent",
        "appSpecContent": {
            "content": "{\"version\":1,\"Resources\":[{\"TargetService\":{\"Type\":\"AWS::ECS::Service\",\"Properties\":{\"TaskDefinition\":\"arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:15\",\"LoadBalancerInfo\":{\"ContainerName\":\"web\",\"ContainerPort\":8000}}}}]}"
        }
    }'

Deployment via GitHub Actions:

- name: Deploy to ECS
  run: |
    # Update the image of the "web" container, selected by name because
    # log_router is the first entry in containerDefinitions
    TASK_DEFINITION=$(aws ecs describe-task-definition \
      --task-definition django-webapp \
      --query 'taskDefinition' \
      --output json \
      | jq '(.containerDefinitions[] | select(.name == "web")).image = "${{ env.ECR_REGISTRY }}/django-app:${{ github.sha }}"' \
      | jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)')

    # Register new task definition revision
    NEW_TASK_DEF=$(aws ecs register-task-definition \
      --cli-input-json "$TASK_DEFINITION" \
      --query 'taskDefinition.taskDefinitionArn' \
      --output text)

    # Build the revision as JSON with jq so the embedded AppSpec is escaped correctly
    REVISION=$(jq -n --arg task_def "$NEW_TASK_DEF" '{
      revisionType: "AppSpecContent",
      appSpecContent: {
        content: ({
          version: 1,
          Resources: [{
            TargetService: {
              Type: "AWS::ECS::Service",
              Properties: {
                TaskDefinition: $task_def,
                LoadBalancerInfo: {ContainerName: "web", ContainerPort: 8000}
              }
            }
          }]
        } | tojson)
      }
    }')

    # Create deployment
    aws deploy create-deployment \
      --application-name django-app \
      --deployment-group-name production \
      --revision "$REVISION"
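The web task definition earlier lists `log_router` before `web`, so the image update should pick the container by name rather than by position. The same transformation can be sketched in Python for pipelines that prefer it over jq; `prepare_new_revision` is a hypothetical helper, and the field list mirrors the `del(...)` call above.

```python
import copy

# Fields returned by describe-task-definition that register-task-definition rejects
READ_ONLY_FIELDS = (
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy",
)


def prepare_new_revision(task_definition, container_name, image):
    """Return a copy of the task definition with one container's image replaced."""
    new_def = copy.deepcopy(task_definition)  # leave the caller's dict untouched
    for field in READ_ONLY_FIELDS:
        new_def.pop(field, None)
    matches = [c for c in new_def["containerDefinitions"] if c["name"] == container_name]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one container named {container_name!r}")
    matches[0]["image"] = image
    return new_def
```

Raising when the container name is missing keeps a renamed container from silently producing a revision that still runs the old image.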

Monitoring Deployment:

# Get deployment ID from create-deployment output
DEPLOYMENT_ID="d-XXXXXXXXX"

# Watch deployment status
aws deploy get-deployment \
    --deployment-id $DEPLOYMENT_ID \
    --query 'deploymentInfo.status' \
    --output text

# Wait for completion
aws deploy wait deployment-successful \
    --deployment-id $DEPLOYMENT_ID

# If failed, get failure reason
aws deploy get-deployment \
    --deployment-id $DEPLOYMENT_ID \
    --query 'deploymentInfo.errorInformation'

Health Checks and Monitoring

Container Health Checks

Health Check in Task Definition:

"healthCheck": {
    "command": [
        "CMD-SHELL",
        "curl -f http://localhost:8000/health || exit 1"
    ],
    "interval": 30,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 60
}

Django Health Endpoint:

# myproject/views/health.py
from django.http import JsonResponse
from django.db import connection
from django.core.cache import cache

def health_check(request):
    """Health check endpoint for ECS."""
    checks = {
        "database": check_database(),
        "cache": check_cache(),
    }

    # Derive the status from the checks instead of hard-coding "healthy"
    all_healthy = all(checks.values())
    status_code = 200 if all_healthy else 503

    payload = {**checks, "status": "healthy" if all_healthy else "unhealthy"}
    return JsonResponse(payload, status=status_code)

def check_database():
    """Verify database connectivity with a trivial query."""
    try:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
        return True
    except Exception:
        return False

def check_cache():
    """Verify cache connectivity."""
    try:
        cache.set("health_check", "ok", 10)
        return cache.get("health_check") == "ok"
    except Exception:
        return False

Health Check Best Practices:
- Return 200 for healthy, 503 for unhealthy
- Check critical dependencies (database, cache)
- Timeout faster than ECS health check timeout
- Don't perform expensive operations
- Cache health check results briefly
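Caching results briefly keeps the container check and the load balancer check from each hitting the database on every probe. A minimal per-process sketch follows; `cached_for` is a hypothetical decorator, and the injectable `clock` exists only to make the behaviour testable.

```python
import time


def cached_for(ttl_seconds, clock=time.monotonic):
    """Cache a zero-argument function's result for ttl_seconds (per process)."""

    def decorator(func):
        state = {"expires": None, "value": None}

        def wrapper():
            now = clock()
            if state["expires"] is None or now >= state["expires"]:
                # Cache is empty or stale: run the real check and restart the timer
                state["value"] = func()
                state["expires"] = now + ttl_seconds
            return state["value"]

        return wrapper

    return decorator
```

A dependency check could then be wrapped as `check_database = cached_for(5)(check_database)`. The TTL should stay well below the health check interval so a real outage is still detected within one or two probes, and each worker process keeps its own cache.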

Load Balancer Health Checks

Target Group Health Check:

aws elbv2 modify-target-group \
    --target-group-arn arn:aws:elasticloadbalancing:... \
    --health-check-protocol HTTP \
    --health-check-path /health \
    --health-check-interval-seconds 30 \
    --health-check-timeout-seconds 5 \
    --healthy-threshold-count 2 \
    --unhealthy-threshold-count 3

Health Check Parameters:

Parameter	Recommended	Purpose
`interval`	30 seconds	Time between checks
`timeout`	5 seconds	Max response time
`healthy-threshold`	2	Consecutive successes to be healthy
`unhealthy-threshold`	3	Consecutive failures to be unhealthy

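Health Check Flow:
1. Task starts; the container health check's startPeriod and the service's health check grace period begin
2. During these grace periods, failed checks do not count against the task
3. The target must pass healthy-threshold consecutive load balancer checks before it receives traffic
4. After unhealthy-threshold consecutive failures, the target is marked unhealthy and the service stops and replaces the task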

CloudWatch Alarms

CPU Alarm:

aws cloudwatch put-metric-alarm \
    --alarm-name django-cpu-high \
    --alarm-description "Alert when CPU exceeds 80%" \
    --metric-name CPUUtilization \
    --namespace AWS/ECS \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --dimensions Name=ServiceName,Value=django-web Name=ClusterName,Value=production-cluster

Error Rate Alarm:

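# Target-level ALB metrics are published per TargetGroup + LoadBalancer pair.
# The LoadBalancer value (app/django-alb/def456) is a placeholder for your ALB.
# notBreaching keeps the alarm in OK when no 5xx datapoints are reported.
aws cloudwatch put-metric-alarm \
    --alarm-name django-5xx-errors \
    --alarm-description "Alert on 5xx errors" \
    --metric-name HTTPCode_Target_5XX_Count \
    --namespace AWS/ApplicationELB \
    --statistic Sum \
    --period 60 \
    --evaluation-periods 2 \
    --threshold 10 \
    --comparison-operator GreaterThanThreshold \
    --treat-missing-data notBreaching \
    --dimensions Name=TargetGroup,Value=targetgroup/django-web/abc123 Name=LoadBalancer,Value=app/django-alb/def456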

Recommended Alarms:

Alarm	Metric	Threshold	Action
CPU High	CPUUtilization	>80%	Scale out or rollback
Memory High	MemoryUtilization	>85%	Scale out or rollback
5xx Errors	HTTPCode_Target_5XX_Count	>10/min	Rollback deployment
Slow Response	TargetResponseTime	>2 seconds	Investigate performance
Unhealthy Hosts	UnHealthyHostCount	>0	Alert on-call
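How `--period`, `--evaluation-periods`, and `--threshold` combine is easier to see with numbers. The function below is a simplified model of a `GreaterThanThreshold` alarm in which every one of the last N periods must breach; it ignores missing-data handling and "M out of N" settings, and its name is an assumption.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Evaluate the most recent periods of a GreaterThanThreshold alarm (simplified)."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    # The alarm fires only when every evaluated period breaches the threshold
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# 5xx counts per 60-second period, threshold 10, 2 evaluation periods
print(alarm_state([2, 12, 15], threshold=10, evaluation_periods=2))
```

With a 60-second period and 2 evaluation periods, a sustained error spike takes about two minutes to trigger a rollback, which is worth comparing against the traffic shift interval of the chosen deployment configuration.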

Rollback Strategies

Automatic Rollback

CodeDeploy automatically rolls back when:

Deployment Failures:
- New tasks fail to reach healthy state
- Health check failures exceed threshold
- Deployment times out

CloudWatch Alarms:
- CPU exceeds threshold during deployment
- Error rate spikes during deployment
- Custom alarm triggers

Configuration:

{
    "autoRollbackConfiguration": {
        "enabled": true,
        "events": [
            "DEPLOYMENT_FAILURE",
            "DEPLOYMENT_STOP_ON_ALARM"
        ]
    },
    "alarmConfiguration": {
        "enabled": true,
        "alarms": [
            {"name": "django-5xx-errors"},
            {"name": "django-cpu-high"}
        ],
        "ignorePollAlarmFailure": false
    }
}

Rollback Process:
1. CodeDeploy detects failure condition
2. Stops routing traffic to green environment
3. Routes all traffic back to blue environment
4. Terminates green tasks
5. Sends an SNS notification if a trigger is configured on the deployment group

Rollback Time: 2-5 minutes (faster than new deployment)

Manual Rollback

When to Manually Rollback:
- Business logic errors discovered after deployment
- Data consistency issues
- Performance degradation not caught by alarms
- User-reported critical bugs

Manual Rollback Methods:

Method 1: Redeploy Previous Task Definition

# List recent task definitions
aws ecs list-task-definitions \
    --family-prefix django-webapp \
    --sort DESC \
    --max-items 5

# Create deployment with previous revision (JSON form avoids shorthand quoting problems)
aws deploy create-deployment \
    --application-name django-app \
    --deployment-group-name production \
    --revision '{
        "revisionType": "AppSpecContent",
        "appSpecContent": {
            "content": "{\"version\":1,\"Resources\":[{\"TargetService\":{\"Type\":\"AWS::ECS::Service\",\"Properties\":{\"TaskDefinition\":\"arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:14\",\"LoadBalancerInfo\":{\"ContainerName\":\"web\",\"ContainerPort\":8000}}}}]}"
        }
    }'

Method 2: Stop Active Deployment

# Stop current deployment (triggers auto-rollback)
aws deploy stop-deployment \
    --deployment-id d-XXXXXXXXX \
    --auto-rollback-enabled

Method 3: Update Service Directly (Emergency)

# Bypass CodeDeploy and update service directly
aws ecs update-service \
    --cluster production-cluster \
    --service django-web \
    --task-definition django-webapp:14 \
    --force-new-deployment

Emergency Only

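Direct service updates bypass CodeDeploy safety checks. Use only in emergencies when CodeDeploy itself is failing. ECS rejects task definition changes through update-service for services created with the CODE_DEPLOY deployment controller, so this method applies only to services that use the rolling-update (ECS) controller.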

Database Rollback Considerations

Forward-Only Migrations:
- Never write destructive migrations (DROP COLUMN, DROP TABLE)
- Deploy schema changes separately from code changes
- Use feature flags to enable new features

Two-Phase Deployment:

Phase 1: Schema Change

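# Migration: Add nullable column
from django.db import migrations, models

class Migration(migrations.Migration):
    # Placeholder dependency: point this at the app's previous migration
    dependencies = [("users", "0001_initial")]

    operations = [
        migrations.AddField(
            model_name='user',
            name='new_field',
            field=models.CharField(max_length=100, null=True, blank=True),
        ),
    ]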

Phase 2 (after code deployed): Make Required

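# Migration: Make column non-nullable (after data backfilled)
from django.db import migrations, models

class Migration(migrations.Migration):
    # Placeholder dependency: point this at the migration that added the column
    dependencies = [("users", "0002_add_new_field")]

    operations = [
        migrations.AlterField(
            model_name='user',
            name='new_field',
            field=models.CharField(max_length=100),
        ),
    ]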

Rollback-Safe Pattern:
- Old code works with new schema (column nullable)
- New code works with new schema
- Rollback to old code still works
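Phase 2 assumes the existing rows were backfilled first, since making the column non-nullable fails while NULL values remain. One way to sketch that intermediate step is a data migration function; the `"users"` app label and the empty-string default are assumptions, and the function is written so it can be tried with stand-in objects outside Django.

```python
def backfill_new_field(apps, schema_editor):
    """Fill NULL values so the column can later be made non-nullable."""
    # "users" is a placeholder app label; use the app that owns the User model
    User = apps.get_model("users", "User")
    # update() issues a single UPDATE statement and returns the number of rows changed
    return User.objects.filter(new_field__isnull=True).update(new_field="")


# Wired into a migration file (requires Django) roughly as:
#   operations = [
#       migrations.RunPython(backfill_new_field, migrations.RunPython.noop),
#   ]
```

Using `RunPython.noop` as the reverse operation keeps the migration reversible without touching data, which matches the forward-only approach above. On large tables the update may need to be batched to avoid long locks.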

Complete Deployment Example

End-to-End Deployment Workflow

#!/bin/bash
set -euo pipefail

# Configuration
CLUSTER="production-cluster"
SERVICE="django-web"
TASK_FAMILY="django-webapp"
APP_NAME="django-app"
DEPLOY_GROUP="production"
ECR_REGISTRY="123456789012.dkr.ecr.us-east-1.amazonaws.com"
IMAGE_NAME="django-app"
COMMIT_SHA=$(git rev-parse --short HEAD)

echo "🚀 Starting deployment of ${COMMIT_SHA}"

# Step 1: Build and push image
echo "📦 Building Docker image..."
docker build -t ${IMAGE_NAME}:${COMMIT_SHA} .

echo "🔐 Logging into ECR..."
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin ${ECR_REGISTRY}

echo "⬆️ Pushing image to ECR..."
docker tag ${IMAGE_NAME}:${COMMIT_SHA} ${ECR_REGISTRY}/${IMAGE_NAME}:${COMMIT_SHA}
docker push ${ECR_REGISTRY}/${IMAGE_NAME}:${COMMIT_SHA}

# Step 2: Update task definition
echo "📝 Updating task definition..."
TASK_DEF=$(aws ecs describe-task-definition --task-definition ${TASK_FAMILY})

# Select the "web" container by name: log_router is the first entry in containerDefinitions
NEW_TASK_DEF=$(echo "$TASK_DEF" | \
    jq --arg IMAGE "${ECR_REGISTRY}/${IMAGE_NAME}:${COMMIT_SHA}" \
       '.taskDefinition | (.containerDefinitions[] | select(.name == "web")).image = $IMAGE' | \
    jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)')

NEW_TASK_ARN=$(aws ecs register-task-definition \
    --cli-input-json "$NEW_TASK_DEF" \
    --query 'taskDefinition.taskDefinitionArn' \
    --output text)

echo "✅ Registered new task definition: ${NEW_TASK_ARN}"

# Step 3: Run database migrations (before deployment)
echo "🗄️ Running database migrations..."
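MIGRATION_TASK_ARN=$(aws ecs run-task \
    --cluster ${CLUSTER} \
    --task-definition ${NEW_TASK_ARN} \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[subnet-abc123],securityGroups=[sg-migration123],assignPublicIp=DISABLED}" \
    --overrides '{
        "containerOverrides": [{
            "name": "web",
            "command": ["python", "manage.py", "migrate", "--noinput"]
        }]
    }' \
    --query 'tasks[0].taskArn' \
    --output text)

echo "⏳ Waiting for migrations to complete..."
aws ecs wait tasks-stopped --cluster ${CLUSTER} --tasks ${MIGRATION_TASK_ARN}

# Abort the deployment if the migration container exited with a non-zero code
MIGRATION_EXIT_CODE=$(aws ecs describe-tasks \
    --cluster ${CLUSTER} \
    --tasks ${MIGRATION_TASK_ARN} \
    --query 'tasks[0].containers[?name==`web`] | [0].exitCode' \
    --output text)
if [ "${MIGRATION_EXIT_CODE}" != "0" ]; then
    echo "❌ Migrations failed (exit code: ${MIGRATION_EXIT_CODE})"
    exit 1
fi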

# Step 4: Create CodeDeploy deployment
echo "🚢 Creating CodeDeploy deployment..."

APPSPEC=$(cat <<EOF
{
  "version": 1,
  "Resources": [{
    "TargetService": {
      "Type": "AWS::ECS::Service",
      "Properties": {
        "TaskDefinition": "${NEW_TASK_ARN}",
        "LoadBalancerInfo": {
          "ContainerName": "web",
          "ContainerPort": 8000
        }
      }
    }
  }]
}
EOF
)

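# Wrap the AppSpec in a JSON revision with jq so quotes and newlines are escaped correctly
REVISION=$(jq -n --arg content "${APPSPEC}" \
    '{revisionType: "AppSpecContent", appSpecContent: {content: $content}}')

DEPLOYMENT_ID=$(aws deploy create-deployment \
    --application-name ${APP_NAME} \
    --deployment-group-name ${DEPLOY_GROUP} \
    --description "Deploy ${COMMIT_SHA}" \
    --revision "${REVISION}" \
    --query 'deploymentId' \
    --output text)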

echo "📊 Deployment ID: ${DEPLOYMENT_ID}"
echo "🔗 https://console.aws.amazon.com/codesuite/codedeploy/deployments/${DEPLOYMENT_ID}"

# Step 5: Wait for deployment
echo "⏳ Waiting for deployment to complete..."
aws deploy wait deployment-successful --deployment-id ${DEPLOYMENT_ID}

echo "✅ Deployment successful!"
echo "🎉 Version ${COMMIT_SHA} is now live!"

Next Steps

After understanding ECS deployment:

CI/CD Overview: Review complete pipeline architecture
GitHub Actions: Automate deployments in CI/CD
Monitoring: Implement comprehensive monitoring
Configuration: Manage secrets and parameters

Start Simple

Begin with basic ECS service deployment, then add CodeDeploy, then implement blue-green deployments. Each layer adds safety but also complexity.

Internal Documentation:
- Container Building: Optimize Docker images for ECS
- Environment Variables: Configuration management
- AWS SSM Parameters: Secrets storage
- 12-Factor App: Design principles

External Resources:
- ECS Best Practices Guide
- ECS Documentation
- CodeDeploy Documentation
- Fargate Pricing
- ECS Workshop