AWS ECS Deployment

Amazon Elastic Container Service (ECS) orchestrates Docker containers on AWS infrastructure. ECS handles container scheduling, networking, load balancing, and health monitoring, providing a managed platform for running containerized applications.

This guide covers ECS deployment patterns for Django applications, from task definitions to blue-green deployments with CodeDeploy.

Philosophy

ECS should be invisible infrastructure. Your application shouldn't know it's running on ECS. Configuration lives in task definitions and environment variables, not in code. Deployments should be zero-downtime by default.

ECS Fundamentals

Core Concepts

ECS Cluster:

- Logical grouping of tasks and services
- Can run on Fargate (serverless) or EC2 (self-managed)
- Isolated networking and IAM boundaries
- Example: production-cluster, staging-cluster

Task Definition:

- Blueprint for running containers
- Specifies: image, CPU, memory, environment variables, networking
- Versioned (revision number increments on changes)
- Immutable (create new revision to change)

Task:

- Running instance of a task definition
- One or more containers running together
- Can be long-running (service) or one-time (scheduled job)
- Has unique task ID

Service:

- Maintains desired number of tasks running
- Integrates with load balancer for web traffic
- Handles rolling deployments and health checks
- Automatically replaces failed tasks

Container:

- Single Docker container within a task
- Defined in task definition
- Can be essential (task fails if container stops) or non-essential

ECS Architecture for Django

graph TB
    ALB[Application Load Balancer]
    TG1[Target Group Blue]
    TG2[Target Group Green]

    subgraph ECS_Cluster[ECS Cluster]
        subgraph Service[ECS Service]
            T1[Task 1<br/>web + worker]
            T2[Task 2<br/>web + worker]
            T3[Task 3<br/>web + worker]
        end
    end

    ECR[ECR Repository]
    RDS[(RDS Database)]
    REDIS[(ElastiCache Redis)]

    ALB --> TG1
    ALB --> TG2
    TG1 --> T1
    TG1 --> T2
    TG2 --> T3

    T1 --> RDS
    T2 --> RDS
    T3 --> RDS

    T1 --> REDIS
    T2 --> REDIS
    T3 --> REDIS

    ECR -.Pull Image.-> T1
    ECR -.Pull Image.-> T2
    ECR -.Pull Image.-> T3

    style ALB fill:#ff9900
    style ECS_Cluster fill:#f0f0f0
    style Service fill:#e1f5ff

Traffic Flow:

1. User request hits Application Load Balancer
2. ALB routes to target group (blue or green)
3. Target group distributes across healthy tasks
4. Task's web container processes request
5. Container connects to RDS/Redis as needed

Task Definition Structure

Web Application Task Definition

A complete task definition for a Django web application:

{
    "family": "django-webapp",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "1024",
    "memory": "2048",
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::123456789012:role/djangoAppTaskRole",
    "containerDefinitions": [
        {
            "name": "log_router",
            "image": "grafana/fluent-bit-plugin-loki:latest",
            "essential": true,
            "firelensConfiguration": {
                "type": "fluentbit",
                "options": {
                    "enable-ecs-log-metadata": "true"
                }
            },
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/log-router",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "firelens"
                }
            }
        },
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/django-app:abc123",
            "essential": true,
            "portMappings": [
                {
                    "containerPort": 8000,
                    "protocol": "tcp"
                }
            ],
            "environment": [
                {
                    "name": "DJANGO_SETTINGS_MODULE",
                    "value": "myproject.settings.production"
                },
                {
                    "name": "ENVIRONMENT",
                    "value": "production"
                },
                {
                    "name": "AWS_REGION",
                    "value": "us-east-1"
                }
            ],
            "secrets": [
                {
                    "name": "DATABASE_URL",
                    "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/database-url"
                },
                {
                    "name": "SECRET_KEY",
                    "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/django-secret-key"
                }
            ],
            "logConfiguration": {
                "logDriver": "awsfirelens",
                "options": {
                    "Name": "grafana-loki",
                    "Url": "https://logs.example.com/loki/api/v1/push",
                    "Labels": "{job=\"django-web\",environment=\"production\"}",
                    "RemoveKeys": "container_id,ecs_task_arn"
                }
            },
            "healthCheck": {
                "command": [
                    "CMD-SHELL",
                    "curl -f http://localhost:8000/health || exit 1"
                ],
                "interval": 30,
                "timeout": 5,
                "retries": 3,
                "startPeriod": 60
            },
            "dependsOn": [
                {
                    "containerName": "log_router",
                    "condition": "START"
                }
            ]
        }
    ]
}

Key Components:

Resource Allocation:

- cpu: Task-level CPU units (1024 = 1 vCPU)
- memory: Task-level memory in MiB
- The pair must be one of the valid Fargate CPU/memory combinations

IAM Roles:

- executionRoleArn: Role ECS itself assumes to pull images and fetch secrets
- taskRoleArn: Role the application assumes for its own AWS API calls

Container Configuration:

- essential: true: Task fails if container stops
- portMappings: Expose container ports
- environment: Plain-text environment variables
- secrets: Sensitive data from SSM/Secrets Manager

Logging:

- logConfiguration: Where logs go
- awsfirelens: Route logs to custom destinations
- awslogs: CloudWatch Logs (default)

Health Checks:

- command: Shell command to test health
- interval: Seconds between checks
- timeout: Max seconds for check to complete
- retries: Consecutive failed checks before the container is marked unhealthy
- startPeriod: Grace period for app startup

Scheduled Task Definition

For background jobs and cron-like tasks:

{
    "family": "django-scheduled-job",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "512",
    "memory": "1024",
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::123456789012:role/djangoAppTaskRole",
    "containerDefinitions": [
        {
            "name": "worker",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/django-app:abc123",
            "essential": true,
            "command": [
                "python",
                "manage.py",
                "sync_external_data"
            ],
            "environment": [
                {
                    "name": "DJANGO_SETTINGS_MODULE",
                    "value": "myproject.settings.production"
                }
            ],
            "secrets": [
                {
                    "name": "DATABASE_URL",
                    "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/database-url"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/django-scheduled-job",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "worker"
                }
            }
        }
    ]
}

Scheduled Task Characteristics:

- No port mappings (doesn't serve traffic)
- No health checks (runs to completion)
- Different command than web containers
- Lower resource allocation
- Triggered by EventBridge schedule

Fargate vs EC2 Decision

Fargate (Serverless Containers)

How Fargate Works:

- AWS manages underlying infrastructure
- Pay per-second for CPU and memory
- No server management required
- Automatic scaling

When to Use Fargate:

- Small to medium workloads (most Django apps)
- Variable traffic patterns
- Don't want to manage EC2 instances
- Cost-effective for <100 tasks
- Need rapid scaling

Fargate Pricing Example:

- 1 vCPU, 2 GB memory
- $0.04048 per vCPU-hour
- $0.004445 per GB-hour
- Total: ~$36/month per task if running 24/7: (1 × $0.04048 + 2 × $0.004445) × 730 hours
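The pricing arithmetic can be reproduced with a few lines of Python. This is a minimal sketch that assumes the per-hour rates quoted in the example and a 730-hour month; the function name and defaults are illustrative, and current prices should be checked on the Fargate pricing page.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours / 12


def fargate_monthly_cost(vcpu, memory_gb,
                         vcpu_rate=0.04048, gb_rate=0.004445,
                         hours=HOURS_PER_MONTH):
    """Return the monthly USD cost of one always-on Fargate task.

    The default rates are the ones quoted in this guide (assumed, not live).
    """
    hourly = vcpu * vcpu_rate + memory_gb * gb_rate
    return hourly * hours


if __name__ == "__main__":
    cost = fargate_monthly_cost(vcpu=1, memory_gb=2)
    print(f"1 vCPU / 2 GB, 24/7: ${cost:.2f} per month")
```

Both the vCPU charge and the memory charge count toward the total; leaving out the memory term understates the bill.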

Fargate CPU/Memory Combinations:

| vCPU | Memory Options (GB) |
|------|---------------------|
| 0.25 | 0.5, 1, 2 |
| 0.5 | 1, 2, 3, 4 |
| 1 | 2, 3, 4, 5, 6, 7, 8 |
| 2 | 4-16 (1 GB increments) |
| 4 | 8-30 (1 GB increments) |
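A task definition with an unsupported CPU/memory pair is rejected at registration time, so it can help to check the pair in CI first. The sketch below transcribes only the combinations listed in the table above; the helper name is hypothetical, and larger sizes that AWS may offer are not covered.

```python
# Valid memory sizes (GB) per vCPU value, transcribed from the table above.
FARGATE_MEMORY_GB = {
    0.25: [0.5, 1, 2],
    0.5: [1, 2, 3, 4],
    1: list(range(2, 9)),    # 2-8 GB
    2: list(range(4, 17)),   # 4-16 GB
    4: list(range(8, 31)),   # 8-30 GB
}


def is_valid_fargate_size(cpu_units, memory_mib):
    """Check task-level "cpu" (1024 = 1 vCPU) and "memory" (MiB) values.

    Accepts the string form used in task definition JSON or plain ints.
    """
    vcpu = int(cpu_units) / 1024
    memory_gb = int(memory_mib) / 1024
    return memory_gb in FARGATE_MEMORY_GB.get(vcpu, [])


if __name__ == "__main__":
    print(is_valid_fargate_size("1024", "2048"))  # web task from this guide
    print(is_valid_fargate_size("256", "4096"))   # too much memory for 0.25 vCPU
```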

EC2 Launch Type

How EC2 Launch Type Works:

- You provision and manage EC2 instances
- ECS schedules tasks on your instances
- Bin-packing to maximize instance usage
- Pay for EC2 instances (not per task)

When to Use EC2:

- Large workloads (>100 tasks)
- Steady, predictable traffic
- Need specialized instance types (for example, GPU instances)
- Cost-effective for high utilization
- Require specific instance features

EC2 Cost Example:

- m5.xlarge: 4 vCPU, 16 GB memory
- $0.192/hour = ~$140/month
- Can run 8 tasks (0.5 vCPU each) = $17.50/task per month
- Cheaper than Fargate at high utilization
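The per-task figure only holds when the instance is fully packed. The following sketch shows how the cost per task rises as utilization drops. It assumes CPU is the binding constraint and ignores memory, the ECS agent, and OS overhead; the function and parameter names are illustrative.

```python
def ec2_cost_per_task(instance_monthly_cost, instance_vcpu, task_vcpu,
                      utilization=1.0):
    """Monthly cost per running task on one EC2 instance.

    `utilization` is the fraction of available task slots actually in use.
    Simplification: only vCPU limits how many tasks fit on the instance.
    """
    slots = int(instance_vcpu / task_vcpu)
    running_tasks = max(1, int(slots * utilization))
    return instance_monthly_cost / running_tasks


if __name__ == "__main__":
    for utilization in (1.0, 0.75, 0.5):
        cost = ec2_cost_per_task(140, instance_vcpu=4, task_vcpu=0.5,
                                 utilization=utilization)
        print(f"{utilization:.0%} packed: ${cost:.2f} per task per month")
```

At half utilization the per-task cost doubles, which is why EC2 tends to pay off only for steady, densely packed workloads.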

Decision Matrix

| Factor | Fargate | EC2 |
|--------|---------|-----|
| Management | AWS manages | You manage |
| Scaling | Instant | Launch instances first |
| Cost (low scale) | Lower | Higher |
| Cost (high scale) | Higher | Lower |
| Flexibility | Limited configs | Full control |
| Startup time | 30-60 seconds | Instant (if instances ready) |
| Recommendation | Default choice | High-scale workloads |

For Most Django Apps: Use Fargate

- Simpler operations
- Better for variable traffic
- No instance management overhead
- Cost-effective up to ~50-100 tasks

Service Configuration

Creating an ECS Service

Service Definition:

aws ecs create-service \
    --cluster production-cluster \
    --service-name django-web \
    --task-definition django-webapp:15 \
    --desired-count 3 \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={
        subnets=[subnet-abc123,subnet-def456],
        securityGroups=[sg-web123],
        assignPublicIp=DISABLED
    }" \
    --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/django-web/abc123,
                      containerName=web,
                      containerPort=8000" \
    --health-check-grace-period-seconds 60 \
    --deployment-configuration "maximumPercent=200,minimumHealthyPercent=100" \
    --enable-execute-command

Service Parameters:

Desired Count:

- Number of tasks to keep running
- 3 minimum for high availability
- Auto-scaling adjusts this value

Network Configuration:

- subnets: Private subnets for tasks
- securityGroups: Firewall rules
- assignPublicIp: DISABLED for private subnets (use NAT gateway)

Load Balancer:

- targetGroupArn: ALB target group
- containerName: Which container receives traffic
- containerPort: Container's listening port

Deployment Configuration:

- maximumPercent: Maximum % of desired during deployment (200 = double)
- minimumHealthyPercent: Minimum % that must stay healthy (100 = no downtime)

Health Check Grace Period:

- Seconds before failing health checks count
- Allows application startup time
- Should be longer than actual startup time
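To see what the two deployment percentages mean in task counts, the sketch below converts them into the lower and upper bounds ECS works within during a rolling deployment. It assumes the rounding rules described in the ECS documentation (minimum rounded up, maximum rounded down); the function name is illustrative.

```python
def rolling_deployment_bounds(desired_count, minimum_healthy_percent,
                              maximum_percent):
    """Return (min_running, max_running) task counts during a deployment.

    Integer arithmetic avoids floating-point rounding surprises:
    the minimum is rounded up, the maximum is rounded down.
    """
    min_running = (desired_count * minimum_healthy_percent + 99) // 100
    max_running = (desired_count * maximum_percent) // 100
    return min_running, max_running


if __name__ == "__main__":
    # Settings used in the create-service example: 3 tasks, 100% / 200%
    print(rolling_deployment_bounds(3, 100, 200))
    # A tighter budget: fewer extra tasks, some capacity loss allowed
    print(rolling_deployment_bounds(3, 50, 150))
```

With 100/200, ECS can start a full second set of tasks before stopping any old ones, which is what makes the deployment zero-downtime at the cost of temporarily doubled capacity.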

Service Auto-Scaling

Target Tracking Scaling:

# Register scalable target
aws application-autoscaling register-scalable-target \
    --service-namespace ecs \
    --resource-id service/production-cluster/django-web \
    --scalable-dimension ecs:service:DesiredCount \
    --min-capacity 3 \
    --max-capacity 20

# Create scaling policy
aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --resource-id service/production-cluster/django-web \
    --scalable-dimension ecs:service:DesiredCount \
    --policy-name django-cpu-scaling \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300
    }'

Scaling Metrics:

| Metric | Target Value | Use Case |
|--------|--------------|----------|
| CPU Utilization | 70% | CPU-bound tasks |
| Memory Utilization | 80% | Memory-intensive apps |
| Request Count | 1000/task | Request-driven scaling |
| Custom Metric | Variable | Business logic |

Cooldown Periods:

- ScaleOutCooldown: Seconds before another scale-out (60s)
- ScaleInCooldown: Seconds before another scale-in (300s)
- Prevents rapid scaling oscillation
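For intuition about how a target value translates into task counts, here is a simplified proportional model. It is an assumption made for illustration, not the exact algorithm Application Auto Scaling runs, and it ignores cooldowns and alarm evaluation periods.

```python
import math


def proposed_desired_count(current_count, metric_value, target_value,
                           min_capacity, max_capacity):
    """Estimate the task count that would bring the metric back to target.

    Simplified model: assumes load spreads evenly, so the per-task metric
    scales inversely with the number of tasks.
    """
    raw = math.ceil(current_count * metric_value / target_value)
    return max(min_capacity, min(max_capacity, raw))


if __name__ == "__main__":
    # 3 tasks at 90% average CPU against a 70% target, capacity 3-20
    print(proposed_desired_count(3, 90, 70, 3, 20))
    # Low load never drops below min-capacity
    print(proposed_desired_count(3, 35, 70, 3, 20))
```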

CodeDeploy Integration

Blue-Green Deployment Architecture

CodeDeploy orchestrates blue-green deployments on ECS:

graph TB
    Start[Start Deployment] --> Deploy[Deploy to Green]
    Deploy --> Health[Health Checks]
    Health --> Pass{Healthy?}
    Pass -->|Yes| Shift[Shift Traffic]
    Pass -->|No| Rollback1[Auto Rollback]
    Shift --> Monitor[Monitor Metrics]
    Monitor --> Stable{Stable?}
    Stable -->|Yes| Terminate[Terminate Blue]
    Stable -->|No| Rollback2[Auto Rollback]
    Rollback1 --> Alert[Alert Team]
    Rollback2 --> Alert
    Terminate --> Complete[Deployment Complete]

    style Start fill:#e1f5ff
    style Complete fill:#d4edda
    style Rollback1 fill:#f8d7da
    style Rollback2 fill:#f8d7da

Deployment Process:

  1. Provision Green Environment:
     - Create new tasks with new image
     - Wait for tasks to reach healthy state
     - Validate health checks pass

  2. Traffic Shift:
     - Gradually route traffic from blue to green
     - Monitor error rates during shift
     - Can be instant, linear, or canary

  3. Validation Period:
     - Monitor CloudWatch metrics
     - Check error rate, latency, CPU
     - Auto-rollback if alarms trigger

  4. Cleanup:
     - After successful validation, terminate blue tasks
     - Blue environment remains briefly for rapid rollback

CodeDeploy Application Setup

Create CodeDeploy Application:

# Create application
aws deploy create-application \
    --application-name django-app \
    --compute-platform ECS

# Create deployment group
aws deploy create-deployment-group \
    --application-name django-app \
    --deployment-group-name production \
    --service-role-arn arn:aws:iam::123456789012:role/CodeDeployServiceRole \
    --ecs-services "clusterName=production-cluster,serviceName=django-web" \
    --load-balancer-info '{
        "targetGroupPairInfoList": [{
            "targetGroups": [
                {"name": "django-web-blue"},
                {"name": "django-web-green"}
            ],
            "prodTrafficRoute": {
                "listenerArns": ["arn:aws:elasticloadbalancing:..."]
            }
        }]
    }' \
    --deployment-config-name CodeDeployDefault.ECSAllAtOnce \
    --auto-rollback-configuration '{
        "enabled": true,
        "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
    }' \
    --alarm-configuration '{
        "enabled": true,
        "alarms": [
            {"name": "django-5xx-errors"},
            {"name": "django-cpu-high"}
        ]
    }'

Key Configuration:

Target Group Pair:

- Blue and green target groups
- Load balancer switches between them
- Both attached to same ALB listener

Deployment Configuration:

- ECSAllAtOnce: Instant traffic shift
- ECSLinear10PercentEvery1Minutes: Gradual shift
- ECSCanary10Percent5Minutes: Canary then full shift

Auto-Rollback:

- Triggers on deployment failure
- Triggers on CloudWatch alarm
- Instantly reverts to blue environment
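The three deployment configurations differ only in how quickly traffic moves to green. The sketch below models each as a list of (minute, percent-on-green) steps. The strategy labels are hypothetical shorthand for the CodeDeploy configurations named above, and the timing is idealized (the first increment is assumed to happen at minute 0).

```python
def traffic_schedule(strategy):
    """Return (minute, green_percent) steps for a traffic shift strategy."""
    if strategy == "all_at_once":
        # ECSAllAtOnce: everything moves in a single step
        return [(0, 100)]
    if strategy == "linear_10_every_1_min":
        # ECSLinear10PercentEvery1Minutes: ten equal increments
        return [(minute, 10 * (minute + 1)) for minute in range(10)]
    if strategy == "canary_10_for_5_min":
        # ECSCanary10Percent5Minutes: small canary, then the remainder
        return [(0, 10), (5, 100)]
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for name in ("all_at_once", "linear_10_every_1_min", "canary_10_for_5_min"):
        print(name, traffic_schedule(name))
```

Slower schedules give alarms more time to fire while only part of the traffic is exposed to the new version.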

AppSpec File

AppSpec for ECS (embedded in deployment):

{
    "version": 1,
    "Resources": [
        {
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:15",
                    "LoadBalancerInfo": {
                        "ContainerName": "web",
                        "ContainerPort": 8000
                    },
                    "PlatformVersion": "LATEST",
                    "NetworkConfiguration": {
                        "AwsvpcConfiguration": {
                            "Subnets": ["subnet-abc123", "subnet-def456"],
                            "SecurityGroups": ["sg-web123"],
                            "AssignPublicIp": "DISABLED"
                        }
                    }
                }
            }
        }
    ],
    "Hooks": [
        {
            "BeforeInstall": "LambdaFunctionToRunDatabaseMigrations"
        },
        {
            "AfterInstall": "LambdaFunctionToWarmUpCache"
        },
        {
            "AfterAllowTestTraffic": "LambdaFunctionToRunSmokeTests"
        }
    ]
}

Deployment Hooks:

| Hook | Timing | Use Case |
|------|--------|----------|
| BeforeInstall | Before new tasks start | Run migrations |
| AfterInstall | After new tasks healthy | Warm cache |
| AfterAllowTestTraffic | After test traffic routed | Smoke tests |
| BeforeAllowTraffic | Before production traffic | Final validation |
| AfterAllowTraffic | After production traffic | Post-deploy tasks |
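Each hook names a Lambda function, and that function must report its result back to CodeDeploy or the deployment waits until the hook times out. A minimal sketch of such a handler follows. `run_smoke_tests` is a hypothetical placeholder, and the optional `codedeploy_client` argument is an addition for testability rather than part of the Lambda contract.

```python
def run_smoke_tests():
    """Hypothetical placeholder; replace with real checks against green."""
    return True


def handler(event, context, codedeploy_client=None):
    """CodeDeploy lifecycle hook: run checks, then report Succeeded/Failed."""
    if codedeploy_client is None:
        import boto3  # imported lazily so the module loads without boto3
        codedeploy_client = boto3.client("codedeploy")

    try:
        status = "Succeeded" if run_smoke_tests() else "Failed"
    except Exception:
        # Any unexpected error should fail the hook rather than hang it
        status = "Failed"

    # CodeDeploy passes both IDs in the invocation event
    codedeploy_client.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return status
```

Reporting `Failed` from any hook stops the deployment, which in turn triggers rollback when `DEPLOYMENT_FAILURE` is listed in the auto-rollback events.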

Creating a Deployment

CLI Deployment:

# Create deployment
aws deploy create-deployment \
    --application-name django-app \
    --deployment-group-name production \
    --description "Deploy version abc123" \
    --revision '{
        "revisionType": "AppSpecContent",
        "appSpecContent": {
            "content": "{\"version\":1,\"Resources\":[{\"TargetService\":{\"Type\":\"AWS::ECS::Service\",\"Properties\":{\"TaskDefinition\":\"arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:15\",\"LoadBalancerInfo\":{\"ContainerName\":\"web\",\"ContainerPort\":8000}}}}]}"
        }
    }'
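Hand-escaping the AppSpec as a JSON string inside JSON is easy to get wrong. One way to avoid it is to generate the `--revision` value programmatically. The sketch below is one possible approach; the function name is illustrative, and the container name and port default to the values used in this guide.

```python
import json


def build_revision(task_definition_arn, container_name="web",
                   container_port=8000):
    """Build the --revision payload with the AppSpec embedded as a string."""
    appspec = {
        "version": 1,
        "Resources": [{
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": task_definition_arn,
                    "LoadBalancerInfo": {
                        "ContainerName": container_name,
                        "ContainerPort": container_port,
                    },
                },
            },
        }],
    }
    return {
        "revisionType": "AppSpecContent",
        # json.dumps handles the nested quoting that the CLI example escapes by hand
        "appSpecContent": {"content": json.dumps(appspec)},
    }


if __name__ == "__main__":
    arn = "arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:15"
    print(json.dumps(build_revision(arn)))
```

The printed JSON can be passed as the value of `--revision`, or the dictionary can be handed to an SDK call directly.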

Deployment via GitHub Actions:

- name: Deploy to ECS
  run: |
    # Update the "web" container's image, selected by name
    # (index 0 in this guide's task definition is the log_router sidecar)
    TASK_DEFINITION=$(aws ecs describe-task-definition \
      --task-definition django-webapp \
      --query 'taskDefinition' \
      --output json \
      | jq --arg IMAGE "${{ env.ECR_REGISTRY }}/django-app:${{ github.sha }}" \
          '(.containerDefinitions[] | select(.name == "web") | .image) = $IMAGE' \
      | jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)')

    # Register new task definition revision
    NEW_TASK_DEF=$(aws ecs register-task-definition \
      --cli-input-json "$TASK_DEFINITION" \
      --query 'taskDefinition.taskDefinitionArn' \
      --output text)

    # Create AppSpec
    APPSPEC=$(cat <<EOF
    {
      "version": 1,
      "Resources": [{
        "TargetService": {
          "Type": "AWS::ECS::Service",
          "Properties": {
            "TaskDefinition": "$NEW_TASK_DEF",
            "LoadBalancerInfo": {
              "ContainerName": "web",
              "ContainerPort": 8000
            }
          }
        }
      }]
    }
    EOF
    )

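    # Create deployment (build the revision as JSON so the multi-line
    # AppSpec survives quoting; CLI shorthand syntax would mangle it)
    REVISION=$(jq -n --arg content "$APPSPEC" \
      '{revisionType: "AppSpecContent", appSpecContent: {content: $content}}')

    aws deploy create-deployment \
      --application-name django-app \
      --deployment-group-name production \
      --revision "$REVISION"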

Monitoring Deployment:

# Get deployment ID from create-deployment output
DEPLOYMENT_ID="d-XXXXXXXXX"

# Watch deployment status
aws deploy get-deployment \
    --deployment-id $DEPLOYMENT_ID \
    --query 'deploymentInfo.status' \
    --output text

# Wait for completion
aws deploy wait deployment-successful \
    --deployment-id $DEPLOYMENT_ID

# If failed, get failure reason
aws deploy get-deployment \
    --deployment-id $DEPLOYMENT_ID \
    --query 'deploymentInfo.errorInformation'
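The same watch-and-wait logic can be written as a polling loop when a script needs to react to failure states rather than only wait for success. In this sketch `get_status` is a hypothetical callable that returns the current `deploymentInfo.status` string (for example by wrapping `get-deployment`), so the loop itself makes no AWS calls.

```python
import time

# Statuses after which a CodeDeploy deployment will not change again
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_deployment(get_status, poll_seconds=15, timeout_seconds=1800,
                        sleep=time.sleep):
    """Poll until the deployment reaches a terminal status or times out."""
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"deployment still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated status sequence instead of real AWS calls
    fake_statuses = iter(["Created", "InProgress", "Succeeded"])
    print(wait_for_deployment(lambda: next(fake_statuses), poll_seconds=0))
```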

Health Checks and Monitoring

Container Health Checks

Health Check in Task Definition:

"healthCheck": {
    "command": [
        "CMD-SHELL",
        "curl -f http://localhost:8000/health || exit 1"
    ],
    "interval": 30,
    "timeout": 5,
    "retries": 3,
    "startPeriod": 60
}

Django Health Endpoint:

# myproject/views/health.py
from django.http import JsonResponse
from django.db import connection
from django.core.cache import cache

def health_check(request):
    """Health check endpoint for ECS."""
    checks = {
        "database": check_database(),
        "cache": check_cache(),
    }

    # Derive the status from the checks instead of hard-coding "healthy"
    all_healthy = all(checks.values())
    status_code = 200 if all_healthy else 503

    payload = {**checks, "status": "healthy" if all_healthy else "unhealthy"}
    return JsonResponse(payload, status=status_code)

def check_database():
    """Verify database connectivity."""
    try:
        connection.ensure_connection()
        return True
    except Exception:
        return False

def check_cache():
    """Verify cache connectivity."""
    try:
        cache.set("health_check", "ok", 10)
        return cache.get("health_check") == "ok"
    except Exception:
        return False

Health Check Best Practices:

- Return 200 for healthy, 503 for unhealthy
- Check critical dependencies (database, cache)
- Respond faster than the ECS health check timeout
- Don't perform expensive operations
- Cache health check results briefly
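Caching results briefly keeps frequent probes from the ALB and from ECS from hitting the database on every request. One way to do that is a small time-to-live decorator, sketched below. The `cached` helper is hypothetical, and the cache is per process, so each worker process keeps its own copy.

```python
import time


def cached(ttl_seconds, clock=time.monotonic):
    """Cache a zero-argument check's result for ttl_seconds."""
    def decorator(check):
        state = {"expires": None, "value": None}

        def wrapper():
            now = clock()
            # Re-run the real check only when the cached result has expired
            if state["expires"] is None or now >= state["expires"]:
                state["value"] = check()
                state["expires"] = now + ttl_seconds
            return state["value"]

        return wrapper
    return decorator


@cached(ttl_seconds=5)
def check_database_cached():
    """Stand-in for check_database(); returns True for illustration."""
    return True


if __name__ == "__main__":
    print(check_database_cached())
```

Keep the TTL shorter than the health check interval so that a real outage is still reported within one or two probes.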

Load Balancer Health Checks

Target Group Health Check:

aws elbv2 modify-target-group \
    --target-group-arn arn:aws:elasticloadbalancing:... \
    --health-check-protocol HTTP \
    --health-check-path /health \
    --health-check-interval-seconds 30 \
    --health-check-timeout-seconds 5 \
    --healthy-threshold-count 2 \
    --unhealthy-threshold-count 3

Health Check Parameters:

| Parameter | Recommended | Purpose |
|-----------|-------------|---------|
| interval | 30 seconds | Time between checks |
| timeout | 5 seconds | Max response time |
| healthy-threshold | 2 | Consecutive successes to be healthy |
| unhealthy-threshold | 3 | Consecutive failures to be unhealthy |

Health Check Flow:

1. Task starts and enters the startPeriod grace period, during which failed checks do not count against it
2. Health checks run once per interval
3. The task must pass healthy-threshold consecutive checks before it receives traffic
4. After unhealthy-threshold consecutive failures, the task is stopped and replaced
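The threshold behaviour amounts to a small state machine over consecutive results. The sketch below models it with the recommended thresholds. It is a simplified model for intuition only: it ignores the grace period and timing, and the state names are illustrative.

```python
def health_states(results, healthy_threshold=2, unhealthy_threshold=3):
    """Return the target's state after each check result (True = passed)."""
    state = "initial"
    successes = failures = 0
    states = []
    for passed in results:
        if passed:
            successes += 1
            failures = 0  # any success resets the failure streak
            if successes >= healthy_threshold:
                state = "healthy"
        else:
            failures += 1
            successes = 0  # any failure resets the success streak
            if failures >= unhealthy_threshold:
                state = "unhealthy"
        states.append(state)
    return states


if __name__ == "__main__":
    print(health_states([True, True, False, False, False, True]))
```

A single failed check does not take a healthy task out of rotation; only an unbroken run of failures does.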

CloudWatch Alarms

CPU Alarm:

aws cloudwatch put-metric-alarm \
    --alarm-name django-cpu-high \
    --alarm-description "Alert when CPU exceeds 80%" \
    --metric-name CPUUtilization \
    --namespace AWS/ECS \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --dimensions Name=ServiceName,Value=django-web Name=ClusterName,Value=production-cluster

Error Rate Alarm:

aws cloudwatch put-metric-alarm \
    --alarm-name django-5xx-errors \
    --alarm-description "Alert on 5xx errors" \
    --metric-name HTTPCode_Target_5XX_Count \
    --namespace AWS/ApplicationELB \
    --statistic Sum \
    --period 60 \
    --evaluation-periods 2 \
    --threshold 10 \
    --comparison-operator GreaterThanThreshold \
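    --dimensions Name=TargetGroup,Value=targetgroup/django-web/abc123 \
                 Name=LoadBalancer,Value=app/django-alb/def456  # hypothetical ALB ID; target-group metrics are published together with the LoadBalancer dimension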

Recommended Alarms:

| Alarm | Metric | Threshold | Action |
|-------|--------|-----------|--------|
| CPU High | CPUUtilization | >80% | Scale out or rollback |
| Memory High | MemoryUtilization | >85% | Scale out or rollback |
| 5xx Errors | HTTPCode_Target_5XX_Count | >10/min | Rollback deployment |
| Slow Response | TargetResponseTime | >2 seconds | Investigate performance |
| Unhealthy Hosts | UnHealthyHostCount | >0 | Alert on-call |

Rollback Strategies

Automatic Rollback

CodeDeploy automatically rolls back when:

Deployment Failures:

- New tasks fail to reach healthy state
- Health check failures exceed threshold
- Deployment times out

CloudWatch Alarms:

- CPU exceeds threshold during deployment
- Error rate spikes during deployment
- Custom alarm triggers

Configuration:

{
    "autoRollbackConfiguration": {
        "enabled": true,
        "events": [
            "DEPLOYMENT_FAILURE",
            "DEPLOYMENT_STOP_ON_ALARM"
        ]
    },
    "alarmConfiguration": {
        "enabled": true,
        "alarms": [
            {"name": "django-5xx-errors"},
            {"name": "django-cpu-high"}
        ],
        "ignorePollAlarmFailure": false
    }
}

Rollback Process:

1. CodeDeploy detects failure condition
2. Stops routing traffic to green environment
3. Routes all traffic back to blue environment
4. Terminates green tasks
5. Sends SNS notification (if a notification trigger is configured)

Rollback Time: 2-5 minutes (faster than new deployment)

Manual Rollback

When to Manually Rollback:

- Business logic errors discovered after deployment
- Data consistency issues
- Performance degradation not caught by alarms
- User-reported critical bugs

Manual Rollback Methods:

Method 1: Redeploy Previous Task Definition

# List recent task definitions
aws ecs list-task-definitions \
    --family-prefix django-webapp \
    --sort DESC \
    --max-items 5

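# Create deployment with previous revision
# (JSON form avoids CLI shorthand quoting problems; LoadBalancerInfo is required for ECS)
aws deploy create-deployment \
    --application-name django-app \
    --deployment-group-name production \
    --revision '{
        "revisionType": "AppSpecContent",
        "appSpecContent": {
            "content": "{\"version\":1,\"Resources\":[{\"TargetService\":{\"Type\":\"AWS::ECS::Service\",\"Properties\":{\"TaskDefinition\":\"arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:14\",\"LoadBalancerInfo\":{\"ContainerName\":\"web\",\"ContainerPort\":8000}}}}]}"
        }
    }'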
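When scripting this rollback, the previous revision's ARN can be derived from the current one. The sketch below is a hypothetical helper that assumes the immediately preceding revision is the one to restore and that it is still ACTIVE. Neither is guaranteed, so confirm against the `list-task-definitions` output first.

```python
def previous_revision(task_definition_arn):
    """Return the ARN of the revision just before the given one.

    Assumption: revisions are contiguous and the previous one still exists.
    """
    prefix, _, revision = task_definition_arn.rpartition(":")
    number = int(revision)
    if number <= 1:
        raise ValueError("no earlier revision to roll back to")
    return f"{prefix}:{number - 1}"


if __name__ == "__main__":
    current = "arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:15"
    print(previous_revision(current))
```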

Method 2: Stop Active Deployment

# Stop current deployment (triggers auto-rollback)
aws deploy stop-deployment \
    --deployment-id d-XXXXXXXXX \
    --auto-rollback-enabled

Method 3: Update Service Directly (Emergency)

# Bypass CodeDeploy and update service directly
aws ecs update-service \
    --cluster production-cluster \
    --service django-web \
    --task-definition django-webapp:14 \
    --force-new-deployment

Emergency Only

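Direct service updates bypass CodeDeploy safety checks. Use only in emergencies when CodeDeploy itself is failing. Note that ECS rejects task definition changes made through update-service on services that use the CODE_DEPLOY deployment controller, so this method only works for services on the default rolling-update (ECS) controller.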

Database Rollback Considerations

Forward-Only Migrations:

- Never write destructive migrations (DROP COLUMN, DROP TABLE)
- Deploy schema changes separately from code changes
- Use feature flags to enable new features

Two-Phase Deployment:

Phase 1: Schema Change

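# Migration: Add nullable column
from django.db import migrations, models


class Migration(migrations.Migration):
    # Hypothetical app label and parent migration; use your app's latest migration
    dependencies = [("users", "0001_initial")]

    operations = [
        migrations.AddField(
            model_name='user',
            name='new_field',
            field=models.CharField(max_length=100, null=True, blank=True),
        ),
    ]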

Phase 2 (after code deployed): Make Required

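# Migration: Make column non-nullable (after data backfilled)
from django.db import migrations, models


class Migration(migrations.Migration):
    # Hypothetical app label and parent migration; point this at the Phase 1 migration
    dependencies = [("users", "0002_user_new_field")]

    operations = [
        migrations.AlterField(
            model_name='user',
            name='new_field',
            field=models.CharField(max_length=100),
        ),
    ]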

Rollback-Safe Pattern:

- Old code works with new schema (column nullable)
- New code works with new schema
- Rollback to old code still works

Complete Deployment Example

End-to-End Deployment Workflow

#!/bin/bash
# Exit on errors, unset variables, and failures anywhere in a pipeline
set -euo pipefail

# Configuration
CLUSTER="production-cluster"
SERVICE="django-web"
TASK_FAMILY="django-webapp"
APP_NAME="django-app"
DEPLOY_GROUP="production"
ECR_REGISTRY="123456789012.dkr.ecr.us-east-1.amazonaws.com"
IMAGE_NAME="django-app"
COMMIT_SHA=$(git rev-parse --short HEAD)

echo "🚀 Starting deployment of ${COMMIT_SHA}"

# Step 1: Build and push image
echo "📦 Building Docker image..."
docker build -t ${IMAGE_NAME}:${COMMIT_SHA} .

echo "🔐 Logging into ECR..."
aws ecr get-login-password --region us-east-1 | \
    docker login --username AWS --password-stdin ${ECR_REGISTRY}

echo "⬆️ Pushing image to ECR..."
docker tag ${IMAGE_NAME}:${COMMIT_SHA} ${ECR_REGISTRY}/${IMAGE_NAME}:${COMMIT_SHA}
docker push ${ECR_REGISTRY}/${IMAGE_NAME}:${COMMIT_SHA}

# Step 2: Update task definition
echo "📝 Updating task definition..."
TASK_DEF=$(aws ecs describe-task-definition --task-definition ${TASK_FAMILY})

# Update the "web" container by name; index 0 is the log_router sidecar
NEW_TASK_DEF=$(echo "$TASK_DEF" | \
    jq --arg IMAGE "${ECR_REGISTRY}/${IMAGE_NAME}:${COMMIT_SHA}" \
       '.taskDefinition | (.containerDefinitions[] | select(.name == "web") | .image) = $IMAGE' | \
    jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)')

NEW_TASK_ARN=$(aws ecs register-task-definition \
    --cli-input-json "$NEW_TASK_DEF" \
    --query 'taskDefinition.taskDefinitionArn' \
    --output text)

echo "✅ Registered new task definition: ${NEW_TASK_ARN}"

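# Step 3: Run database migrations (before deployment)
echo "🗄️ Running database migrations..."
MIGRATION_TASK_ARN=$(aws ecs run-task \
    --cluster ${CLUSTER} \
    --task-definition ${NEW_TASK_ARN} \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[subnet-abc123],securityGroups=[sg-migration123],assignPublicIp=DISABLED}" \
    --overrides '{
        "containerOverrides": [{
            "name": "web",
            "command": ["python", "manage.py", "migrate", "--noinput"]
        }]
    }' \
    --query 'tasks[0].taskArn' \
    --output text)

# Wait for the task to stop instead of sleeping for a fixed time
echo "⏳ Waiting for migrations to complete..."
aws ecs wait tasks-stopped --cluster ${CLUSTER} --tasks ${MIGRATION_TASK_ARN}

# Abort the deployment if the migration container exited non-zero
MIGRATION_EXIT_CODE=$(aws ecs describe-tasks \
    --cluster ${CLUSTER} \
    --tasks ${MIGRATION_TASK_ARN} \
    --query "tasks[0].containers[?name=='web'].exitCode | [0]" \
    --output text)
if [ "${MIGRATION_EXIT_CODE}" != "0" ]; then
    echo "❌ Migrations failed (exit code: ${MIGRATION_EXIT_CODE})"
    exit 1
fi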

# Step 4: Create CodeDeploy deployment
echo "🚢 Creating CodeDeploy deployment..."

APPSPEC=$(cat <<EOF
{
  "version": 1,
  "Resources": [{
    "TargetService": {
      "Type": "AWS::ECS::Service",
      "Properties": {
        "TaskDefinition": "${NEW_TASK_ARN}",
        "LoadBalancerInfo": {
          "ContainerName": "web",
          "ContainerPort": 8000
        }
      }
    }
  }]
}
EOF
)

# Build the revision as JSON so the multi-line AppSpec is passed intact
REVISION=$(jq -n --arg content "${APPSPEC}" \
    '{revisionType: "AppSpecContent", appSpecContent: {content: $content}}')

DEPLOYMENT_ID=$(aws deploy create-deployment \
    --application-name ${APP_NAME} \
    --deployment-group-name ${DEPLOY_GROUP} \
    --description "Deploy ${COMMIT_SHA}" \
    --revision "${REVISION}" \
    --query 'deploymentId' \
    --output text)

echo "📊 Deployment ID: ${DEPLOYMENT_ID}"
echo "🔗 https://console.aws.amazon.com/codesuite/codedeploy/deployments/${DEPLOYMENT_ID}"

# Step 5: Wait for deployment
echo "⏳ Waiting for deployment to complete..."
aws deploy wait deployment-successful --deployment-id ${DEPLOYMENT_ID}

echo "✅ Deployment successful!"
echo "🎉 Version ${COMMIT_SHA} is now live!"

Next Steps

After understanding ECS deployment:

  1. CI/CD Overview: Review complete pipeline architecture
  2. GitHub Actions: Automate deployments in CI/CD
  3. Monitoring: Implement comprehensive monitoring
  4. Configuration: Manage secrets and parameters

Start Simple

Begin with basic ECS service deployment, then add CodeDeploy, then implement blue-green deployments. Each layer adds safety but also complexity.

Internal Documentation:

- Container Building: Optimize Docker images for ECS
- Environment Variables: Configuration management
- AWS SSM Parameters: Secrets storage
- 12-Factor App: Design principles

External Resources:

- ECS Best Practices Guide
- ECS Documentation
- CodeDeploy Documentation
- Fargate Pricing
- ECS Workshop