AWS ECS Deployment¶
Amazon Elastic Container Service (ECS) orchestrates Docker containers on AWS infrastructure. ECS handles container scheduling, networking, load balancing, and health monitoring, providing a managed platform for running containerized applications.
This guide covers ECS deployment patterns for Django applications, from task definitions to blue-green deployments with CodeDeploy.
Philosophy
ECS should be invisible infrastructure. Your application shouldn't know it's running on ECS. Configuration lives in task definitions and environment variables, not in code. Deployments should be zero-downtime by default.
ECS Fundamentals¶
Core Concepts¶
ECS Cluster:
- Logical grouping of tasks and services
- Can run on Fargate (serverless) or EC2 (self-managed)
- Isolated networking and IAM boundaries
- Example: production-cluster, staging-cluster
Task Definition:
- Blueprint for running containers
- Specifies: image, CPU, memory, environment variables, networking
- Versioned (revision number increments on changes)
- Immutable (create new revision to change)
Task:
- Running instance of a task definition
- One or more containers running together
- Can be long-running (service) or one-time (scheduled job)
- Has unique task ID
Service:
- Maintains desired number of tasks running
- Integrates with load balancer for web traffic
- Handles rolling deployments and health checks
- Automatically replaces failed tasks
Container:
- Single Docker container within a task
- Defined in task definition
- Can be essential (task fails if container stops) or non-essential
ECS Architecture for Django¶
graph TB
ALB[Application Load Balancer]
TG1[Target Group Blue]
TG2[Target Group Green]
subgraph ECS_Cluster[ECS Cluster]
subgraph Service[ECS Service]
T1[Task 1<br/>web + worker]
T2[Task 2<br/>web + worker]
T3[Task 3<br/>web + worker]
end
end
ECR[ECR Repository]
RDS[(RDS Database)]
REDIS[(ElastiCache Redis)]
ALB --> TG1
ALB --> TG2
TG1 --> T1
TG1 --> T2
TG2 --> T3
T1 --> RDS
T2 --> RDS
T3 --> RDS
T1 --> REDIS
T2 --> REDIS
T3 --> REDIS
ECR -.Pull Image.-> T1
ECR -.Pull Image.-> T2
ECR -.Pull Image.-> T3
style ALB fill:#ff9900
style ECS_Cluster fill:#f0f0f0
style Service fill:#e1f5ff
Traffic Flow:
1. User request hits Application Load Balancer
2. ALB routes to target group (blue or green)
3. Target group distributes across healthy tasks
4. Task's web container processes request
5. Container connects to RDS/Redis as needed
Task Definition Structure¶
Web Application Task Definition¶
A complete task definition for Django web application:
{
"family": "django-webapp",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "1024",
"memory": "2048",
"executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"taskRoleArn": "arn:aws:iam::123456789012:role/djangoAppTaskRole",
"containerDefinitions": [
{
"name": "log_router",
"image": "grafana/fluent-bit-plugin-loki:latest",
"essential": true,
"firelensConfiguration": {
"type": "fluentbit",
"options": {
"enable-ecs-log-metadata": "true"
}
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/log-router",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "firelens"
}
}
},
{
"name": "web",
"image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/django-app:abc123",
"essential": true,
"portMappings": [
{
"containerPort": 8000,
"protocol": "tcp"
}
],
"environment": [
{
"name": "DJANGO_SETTINGS_MODULE",
"value": "myproject.settings.production"
},
{
"name": "ENVIRONMENT",
"value": "production"
},
{
"name": "AWS_REGION",
"value": "us-east-1"
}
],
"secrets": [
{
"name": "DATABASE_URL",
"valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/database-url"
},
{
"name": "SECRET_KEY",
"valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/django-secret-key"
}
],
"logConfiguration": {
"logDriver": "awsfirelens",
"options": {
"Name": "grafana-loki",
"Url": "https://logs.example.com/loki/api/v1/push",
"Labels": "{job=\"django-web\",environment=\"production\"}",
"RemoveKeys": "container_id,ecs_task_arn"
}
},
"healthCheck": {
"command": [
"CMD-SHELL",
"curl -f http://localhost:8000/health || exit 1"
],
"interval": 30,
"timeout": 5,
"retries": 3,
"startPeriod": 60
},
"dependsOn": [
{
"containerName": "log_router",
"condition": "START"
}
]
}
]
}
Key Components:
Resource Allocation:
- cpu: Task-level CPU (1024 = 1 vCPU)
- memory: Task-level memory in MiB
- Must match Fargate valid combinations
IAM Roles:
- executionRoleArn: Role that ECS itself assumes to pull images, fetch secrets, and ship logs
- taskRoleArn: Role that the application code assumes when it calls AWS APIs
Container Configuration:
- essential: true: Task fails if container stops
- portMappings: Expose container ports
- environment: Plain-text environment variables
- secrets: Sensitive data from SSM/Secrets Manager
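On the application side, the values injected through `environment` and `secrets` arrive as ordinary environment variables. The following is a minimal sketch of how a settings module might read them. The helper names `database_config_from_url` and `load_settings` are hypothetical, and the URL format for `DATABASE_URL` is an assumption (the source does not show its value).

```python
import os
from urllib.parse import unquote, urlparse

# Illustrative subset of URL schemes mapped to Django database backends.
ENGINES = {
    "postgres": "django.db.backends.postgresql",
    "postgresql": "django.db.backends.postgresql",
    "mysql": "django.db.backends.mysql",
}


def database_config_from_url(url: str) -> dict:
    """Convert a DATABASE_URL string into a Django DATABASES entry."""
    parsed = urlparse(url)
    if parsed.scheme not in ENGINES:
        raise ValueError(f"Unsupported database scheme: {parsed.scheme!r}")
    return {
        "ENGINE": ENGINES[parsed.scheme],
        "NAME": parsed.path.lstrip("/"),
        "USER": unquote(parsed.username or ""),
        "PASSWORD": unquote(parsed.password or ""),  # decode %-escapes
        "HOST": parsed.hostname or "",
        "PORT": str(parsed.port or ""),
    }


def load_settings(environ=os.environ) -> dict:
    """Read the variables that the task definition injects."""
    return {
        # Injected via "secrets"; fail fast if missing.
        "SECRET_KEY": environ["SECRET_KEY"],
        # Injected via "environment"; default is an assumption for local runs.
        "ENVIRONMENT": environ.get("ENVIRONMENT", "development"),
        "DATABASES": {
            "default": database_config_from_url(environ["DATABASE_URL"])
        },
    }
```

Because the code only reads environment variables, it stays unaware of whether the values came from SSM, Secrets Manager, or a local `.env` file.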
Logging:
- logConfiguration: Where logs go
- awsfirelens: Route logs to custom destinations
- awslogs: CloudWatch Logs (default)
Health Checks:
- command: Shell command to test health
- interval: Seconds between checks
- timeout: Max seconds for check to complete
- retries: Failed checks before unhealthy
- startPeriod: Grace period for app startup
Scheduled Task Definition¶
For background jobs and cron-like tasks:
{
"family": "django-scheduled-job",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "512",
"memory": "1024",
"executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"taskRoleArn": "arn:aws:iam::123456789012:role/djangoAppTaskRole",
"containerDefinitions": [
{
"name": "worker",
"image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/django-app:abc123",
"essential": true,
"command": [
"python",
"manage.py",
"sync_external_data"
],
"environment": [
{
"name": "DJANGO_SETTINGS_MODULE",
"value": "myproject.settings.production"
}
],
"secrets": [
{
"name": "DATABASE_URL",
"valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/database-url"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/django-scheduled-job",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "worker"
}
}
}
]
}
Scheduled Task Characteristics:
- No port mappings (doesn't serve traffic)
- No health checks (runs to completion)
- Different command than web containers
- Lower resource allocation
- Triggered by EventBridge schedule
Fargate vs EC2 Decision¶
Fargate (Serverless Containers)¶
How Fargate Works:
- AWS manages underlying infrastructure
- Pay per-second for CPU and memory
- No server management required
- Automatic scaling
When to Use Fargate:
- Small to medium workloads (most Django apps)
- Variable traffic patterns
- Don't want to manage EC2 instances
- Cost-effective for <100 tasks
- Need rapid scaling
Fargate Pricing Example:
- 1 vCPU, 2 GB memory
- $0.04048 per vCPU-hour
- $0.004445 per GB-hour
- Total: (1 × $0.04048) + (2 × $0.004445) ≈ $0.0494/hour, or ~$36/month per task (if running 24/7)
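The arithmetic behind the pricing example can be sketched as a small function. The rates are the ones quoted above; the 730-hour month (8760 hours / 12) is an assumption, and actual rates vary by region and change over time.

```python
VCPU_HOUR_USD = 0.04048   # per vCPU-hour, as quoted above
GB_HOUR_USD = 0.004445    # per GB-hour, as quoted above
HOURS_PER_MONTH = 730     # assumption: average month length


def fargate_monthly_cost(vcpu: float, memory_gb: float, tasks: int = 1,
                         hours: float = HOURS_PER_MONTH) -> float:
    """Estimate the monthly cost of always-on Fargate tasks in USD."""
    hourly_per_task = vcpu * VCPU_HOUR_USD + memory_gb * GB_HOUR_USD
    return hourly_per_task * hours * tasks


# One task with 1 vCPU and 2 GB, running 24/7
print(f"${fargate_monthly_cost(1, 2):.2f} per month")
```

Note that both the CPU and the memory charge contribute to the total; for this task size the memory charge is roughly a fifth of the bill.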
Fargate CPU/Memory Combinations:
| vCPU | Memory Options (GB) |
|---|---|
| 0.25 | 0.5, 1, 2 |
| 0.5 | 1, 2, 3, 4 |
| 1 | 2, 3, 4, 5, 6, 7, 8 |
| 2 | 4-16 (1 GB increments) |
| 4 | 8-30 (1 GB increments) |
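A pre-deployment check can catch an invalid size before `register-task-definition` rejects it. This sketch encodes only the combinations listed in the table above; the function name `is_valid_fargate_size` is hypothetical, and Fargate may support sizes that the table does not list.

```python
# Valid memory sizes in GB for each vCPU value, copied from the table above.
VALID_MEMORY_GB = {
    0.25: [0.5, 1, 2],
    0.5: [1, 2, 3, 4],
    1: list(range(2, 9)),    # 2 through 8
    2: list(range(4, 17)),   # 4 through 16
    4: list(range(8, 31)),   # 8 through 30
}


def is_valid_fargate_size(cpu_units: int, memory_mib: int) -> bool:
    """Check task-level cpu/memory values as written in a task definition.

    cpu_units uses the task definition scale (1024 units = 1 vCPU) and
    memory_mib is in MiB, matching the "cpu" and "memory" fields.
    """
    vcpu = cpu_units / 1024
    memory_gb = memory_mib / 1024
    return memory_gb in VALID_MEMORY_GB.get(vcpu, [])


print(is_valid_fargate_size(1024, 2048))  # the web task definition
print(is_valid_fargate_size(512, 1024))   # the scheduled task definition
```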
EC2 Launch Type¶
How EC2 Launch Type Works:
- You provision and manage EC2 instances
- ECS schedules tasks on your instances
- Bin-packing to maximize instance usage
- Pay for EC2 instances (not per task)
When to Use EC2:
- Large workloads (>100 tasks)
- Steady, predictable traffic
- Need specialized instance types (GPU, ARM)
- Cost-effective for high utilization
- Require specific instance features
EC2 Cost Example:
- m5.xlarge: 4 vCPU, 16 GB memory
- $0.192/hour = ~$140/month
- Can run 8 tasks (0.5 vCPU each) = $17.50/task
- Cheaper than Fargate at high utilization
Decision Matrix¶
| Factor | Fargate | EC2 |
|---|---|---|
| Management | AWS manages | You manage |
| Scaling | Instant | Launch instances first |
| Cost (low scale) | Lower | Higher |
| Cost (high scale) | Higher | Lower |
| Flexibility | Limited configs | Full control |
| Startup time | 30-60 seconds | Instant (if instances ready) |
| Recommendation | Default choice | High-scale workloads |
For Most Django Apps: Use Fargate
- Simpler operations
- Better for variable traffic
- No instance management overhead
- Cost-effective up to ~50-100 tasks
Service Configuration¶
Creating an ECS Service¶
Service Definition:
aws ecs create-service \
--cluster production-cluster \
--service-name django-web \
--task-definition django-webapp:15 \
--desired-count 3 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={
subnets=[subnet-abc123,subnet-def456],
securityGroups=[sg-web123],
assignPublicIp=DISABLED
}" \
--load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/django-web/abc123,
containerName=web,
containerPort=8000" \
--health-check-grace-period-seconds 60 \
--deployment-configuration "maximumPercent=200,minimumHealthyPercent=100" \
--enable-execute-command
Service Parameters:
Desired Count:
- Number of tasks to keep running
- 3 minimum for high availability
- Auto-scaling adjusts this value
Network Configuration:
- subnets: Private subnets for tasks
- securityGroups: Firewall rules
- assignPublicIp: DISABLED for private subnets (use NAT gateway)
Load Balancer:
- targetGroupArn: ALB target group
- containerName: Which container receives traffic
- containerPort: Container's listening port
Deployment Configuration:
- maximumPercent: Maximum % of desired during deployment (200 = double)
- minimumHealthyPercent: Minimum % that must stay healthy (100 = no downtime)
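To see what the two percentages mean for a concrete service, the bounds can be computed directly. This is a sketch under the assumption that ECS rounds the minimum up and the maximum down to whole tasks; the function name is hypothetical.

```python
import math


def rolling_deployment_bounds(desired: int, minimum_healthy_percent: int,
                              maximum_percent: int) -> tuple:
    """Return (min_running, max_running) task counts during a deployment."""
    # Lower bound: tasks that must stay running while old tasks are replaced.
    min_running = math.ceil(desired * minimum_healthy_percent / 100)
    # Upper bound: old plus new tasks that may run at the same time.
    max_running = math.floor(desired * maximum_percent / 100)
    return min_running, max_running


# The service above: 3 tasks, minimumHealthyPercent=100, maximumPercent=200
print(rolling_deployment_bounds(3, 100, 200))
```

With the values from the `create-service` command, the scheduler may start a full second set of tasks before it stops any old ones, which is what makes the rollout zero-downtime.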
Health Check Grace Period:
- Seconds before failing health checks count
- Allows application startup time
- Should be longer than actual startup time
Service Auto-Scaling¶
Target Tracking Scaling:
# Register scalable target
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--resource-id service/production-cluster/django-web \
--scalable-dimension ecs:service:DesiredCount \
--min-capacity 3 \
--max-capacity 20
# Create scaling policy
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id service/production-cluster/django-web \
--scalable-dimension ecs:service:DesiredCount \
--policy-name django-cpu-scaling \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
},
"ScaleOutCooldown": 60,
"ScaleInCooldown": 300
}'
Scaling Metrics:
| Metric | Target Value | Use Case |
|---|---|---|
| CPU Utilization | 70% | CPU-bound tasks |
| Memory Utilization | 80% | Memory-intensive apps |
| Request Count | 1000/task | Request-driven scaling |
| Custom Metric | Variable | Business logic |
Cooldown Periods:
- ScaleOutCooldown: Seconds before another scale-out (60s)
- ScaleInCooldown: Seconds before another scale-in (300s)
- Prevents rapid scaling oscillation
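As a rough mental model, target tracking adjusts the task count in proportion to how far the metric is from its target, clamped to the registered capacity range. The sketch below is a simplification for intuition only and is not the service's actual algorithm, which also applies cooldowns and its own alarm evaluation; the function name is hypothetical.

```python
import math


def estimate_desired_count(current_tasks: int, metric_value: float,
                           target_value: float, min_capacity: int,
                           max_capacity: int) -> int:
    """Estimate the task count needed to bring the metric back to target."""
    # Proportional estimate: the same load spread over more (or fewer) tasks.
    proposed = math.ceil(current_tasks * metric_value / target_value)
    # Never leave the range registered with register-scalable-target.
    return max(min_capacity, min(max_capacity, proposed))


# 3 tasks averaging 90% CPU against the 70% target, capacity range 3-20
print(estimate_desired_count(3, 90.0, 70.0, 3, 20))
```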
CodeDeploy Integration¶
Blue-Green Deployment Architecture¶
CodeDeploy orchestrates blue-green deployments on ECS:
graph TB
Start[Start Deployment] --> Deploy[Deploy to Green]
Deploy --> Health[Health Checks]
Health --> Pass{Healthy?}
Pass -->|Yes| Shift[Shift Traffic]
Pass -->|No| Rollback1[Auto Rollback]
Shift --> Monitor[Monitor Metrics]
Monitor --> Stable{Stable?}
Stable -->|Yes| Terminate[Terminate Blue]
Stable -->|No| Rollback2[Auto Rollback]
Rollback1 --> Alert[Alert Team]
Rollback2 --> Alert
Terminate --> Complete[Deployment Complete]
style Start fill:#e1f5ff
style Complete fill:#d4edda
style Rollback1 fill:#f8d7da
style Rollback2 fill:#f8d7da
Deployment Process:
1. Provision Green Environment:
   - Create new tasks with new image
   - Wait for tasks to reach healthy state
   - Validate health checks pass
2. Traffic Shift:
   - Gradually route traffic from blue to green
   - Monitor error rates during shift
   - Can be instant, linear, or canary
3. Validation Period:
   - Monitor CloudWatch metrics
   - Check error rate, latency, CPU
   - Auto-rollback if alarms trigger
4. Cleanup:
   - After successful validation, terminate blue tasks
   - Blue environment remains briefly for rapid rollback
CodeDeploy Application Setup¶
Create CodeDeploy Application:
# Create application
aws deploy create-application \
--application-name django-app \
--compute-platform ECS
# Create deployment group
aws deploy create-deployment-group \
--application-name django-app \
--deployment-group-name production \
--service-role-arn arn:aws:iam::123456789012:role/CodeDeployServiceRole \
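--ecs-services "clusterName=production-cluster,serviceName=django-web" \
--deployment-style deploymentType=BLUE_GREEN,deploymentOption=WITH_TRAFFIC_CONTROL \
--blue-green-deployment-configuration '{
"terminateBlueInstancesOnDeploymentSuccess": {"action": "TERMINATE", "terminationWaitTimeInMinutes": 5},
"deploymentReadyOption": {"actionOnTimeout": "CONTINUE_DEPLOYMENT"}
}' \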
--load-balancer-info '{
"targetGroupPairInfoList": [{
"targetGroups": [
{"name": "django-web-blue"},
{"name": "django-web-green"}
],
"prodTrafficRoute": {
"listenerArns": ["arn:aws:elasticloadbalancing:..."]
}
}]
}' \
--deployment-config-name CodeDeployDefault.ECSAllAtOnce \
--auto-rollback-configuration '{
"enabled": true,
"events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"]
}' \
--alarm-configuration '{
"enabled": true,
"alarms": [
{"name": "django-5xx-errors"},
{"name": "django-cpu-high"}
]
}'
Key Configuration:
Target Group Pair:
- Blue and green target groups
- Load balancer switches between them
- Both attached to same ALB listener
Deployment Configuration:
- ECSAllAtOnce: Instant traffic shift
- ECSLinear10PercentEvery1Minutes: Gradual shift
- ECSCanary10Percent5Minutes: Canary then full shift
Auto-Rollback:
- Triggers on deployment failure
- Triggers on CloudWatch alarm
- Instantly reverts to blue environment
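The difference between the linear and canary configurations is easiest to see as a schedule of how much traffic the green environment receives over time. The sketch below models that schedule; the function names are hypothetical, and the assumption that the first increment happens at minute 0 is a simplification of how CodeDeploy times the shift.

```python
def linear_schedule(step_percent: int, interval_minutes: int) -> list:
    """Return (minute, percent_on_green) pairs for a linear shift."""
    schedule = []
    shifted, minute = 0, 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


def canary_schedule(canary_percent: int, wait_minutes: int) -> list:
    """Return (minute, percent_on_green) pairs for a canary shift."""
    # A small slice first, then everything after the wait period.
    return [(0, canary_percent), (wait_minutes, 100)]


# ECSLinear10PercentEvery1Minutes
print(linear_schedule(10, 1))
# ECSCanary10Percent5Minutes
print(canary_schedule(10, 5))
```

A linear shift exposes problems gradually across the whole rollout, while a canary concentrates the observation window on a small slice before committing.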
AppSpec File¶
AppSpec for ECS (embedded in deployment):
{
"version": 1,
"Resources": [
{
"TargetService": {
"Type": "AWS::ECS::Service",
"Properties": {
"TaskDefinition": "arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:15",
"LoadBalancerInfo": {
"ContainerName": "web",
"ContainerPort": 8000
},
"PlatformVersion": "LATEST",
"NetworkConfiguration": {
"AwsvpcConfiguration": {
"Subnets": ["subnet-abc123", "subnet-def456"],
"SecurityGroups": ["sg-web123"],
"AssignPublicIp": "DISABLED"
}
}
}
}
}
],
"Hooks": [
{
"BeforeInstall": "LambdaFunctionToRunDatabaseMigrations"
},
{
"AfterInstall": "LambdaFunctionToWarmUpCache"
},
{
"AfterAllowTestTraffic": "LambdaFunctionToRunSmokeTests"
}
]
}
Deployment Hooks:
| Hook | Timing | Use Case |
|---|---|---|
| BeforeInstall | Before new tasks start | Run migrations |
| AfterInstall | After new tasks healthy | Warm cache |
| AfterAllowTestTraffic | After test traffic routed | Smoke tests |
| BeforeAllowTraffic | Before production traffic | Final validation |
| AfterAllowTraffic | After production traffic | Post-deploy tasks |
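Each hook names a Lambda function that must report back to CodeDeploy, otherwise the deployment waits until the hook times out. A minimal sketch of such a function follows. `run_smoke_tests` is a hypothetical placeholder for the real checks, and the reporting logic is split into `report_hook_result` so it can be exercised without AWS access.

```python
def report_hook_result(codedeploy, event: dict, succeeded: bool) -> None:
    """Tell CodeDeploy whether this lifecycle hook passed."""
    codedeploy.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status="Succeeded" if succeeded else "Failed",
    )


def run_smoke_tests() -> bool:
    """Hypothetical placeholder: call the test listener and check responses."""
    return True


def handler(event, context):
    """Lambda entry point for a hook such as AfterAllowTestTraffic."""
    import boto3  # imported lazily so the module loads without AWS libraries

    codedeploy = boto3.client("codedeploy")
    try:
        ok = run_smoke_tests()
    except Exception:
        ok = False  # treat any unexpected error as a failed hook
    report_hook_result(codedeploy, event, ok)
```

Reporting `Failed` stops the deployment, which in turn triggers a rollback when the deployment group enables one for `DEPLOYMENT_FAILURE`.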
Creating a Deployment¶
CLI Deployment:
# Create deployment
aws deploy create-deployment \
--application-name django-app \
--deployment-group-name production \
--description "Deploy version abc123" \
--revision '{
"revisionType": "AppSpecContent",
"appSpecContent": {
"content": "{\"version\":1,\"Resources\":[{\"TargetService\":{\"Type\":\"AWS::ECS::Service\",\"Properties\":{\"TaskDefinition\":\"arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:15\",\"LoadBalancerInfo\":{\"ContainerName\":\"web\",\"ContainerPort\":8000}}}}]}"
}
}'
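Hand-escaping the AppSpec inside the revision JSON is error-prone, because the `content` field is a JSON document serialized as a string. One way to avoid that is to generate the value, as in this sketch; the function name `build_revision` is hypothetical, and the output is meant to be passed to `--revision`.

```python
import json


def build_revision(task_definition_arn: str, container_name: str = "web",
                   container_port: int = 8000) -> str:
    """Build the JSON string expected by `aws deploy create-deployment --revision`."""
    appspec = {
        "version": 1,
        "Resources": [{
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": task_definition_arn,
                    "LoadBalancerInfo": {
                        "ContainerName": container_name,
                        "ContainerPort": container_port,
                    },
                },
            },
        }],
    }
    revision = {
        "revisionType": "AppSpecContent",
        # The AppSpec is embedded as a string, so it is serialized twice.
        "appSpecContent": {"content": json.dumps(appspec)},
    }
    return json.dumps(revision)


print(build_revision(
    "arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:15"
))
```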
Deployment via GitHub Actions:
- name: Deploy to ECS
  run: |
    # Update the image of the "web" container, selected by name because
    # containerDefinitions[0] is the log_router sidecar, then drop the
    # read-only fields that register-task-definition does not accept
    TASK_DEFINITION=$(aws ecs describe-task-definition \
      --task-definition django-webapp \
      --query 'taskDefinition' \
      | jq '(.containerDefinitions[] | select(.name == "web")).image = "${{ env.ECR_REGISTRY }}/django-app:${{ github.sha }}"' \
      | jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)')
    # Register new task definition revision
    NEW_TASK_DEF=$(aws ecs register-task-definition \
      --cli-input-json "$TASK_DEFINITION" \
      --query 'taskDefinition.taskDefinitionArn' \
      --output text)
    # Create AppSpec
    APPSPEC=$(cat <<EOF
    {
      "version": 1,
      "Resources": [{
        "TargetService": {
          "Type": "AWS::ECS::Service",
          "Properties": {
            "TaskDefinition": "$NEW_TASK_DEF",
            "LoadBalancerInfo": {
              "ContainerName": "web",
              "ContainerPort": 8000
            }
          }
        }
      }]
    }
    EOF
    )
    # Wrap the AppSpec in the revision JSON; jq escapes the embedded document
    REVISION=$(jq -n --arg content "$APPSPEC" \
      '{revisionType: "AppSpecContent", appSpecContent: {content: $content}}')
    # Create deployment
    aws deploy create-deployment \
      --application-name django-app \
      --deployment-group-name production \
      --revision "$REVISION"
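The task definition rewrite in the workflow step can also be expressed in Python, which makes the two operations explicit: strip the fields that `describe-task-definition` returns but `register-task-definition` rejects, and change the image of one named container. This is a sketch with a hypothetical function name; it selects the container by name because the web task definition shown earlier lists the `log_router` sidecar first.

```python
import copy

# Fields returned by describe-task-definition that cannot be re-registered.
READ_ONLY_FIELDS = (
    "taskDefinitionArn", "revision", "status", "requiresAttributes",
    "compatibilities", "registeredAt", "registeredBy",
)


def prepare_new_revision(task_definition: dict, container_name: str,
                         image: str) -> dict:
    """Return a copy of the task definition ready for re-registration."""
    new_def = copy.deepcopy(task_definition)  # leave the input untouched
    for field in READ_ONLY_FIELDS:
        new_def.pop(field, None)
    matches = [c for c in new_def["containerDefinitions"]
               if c["name"] == container_name]
    if not matches:
        raise ValueError(f"No container named {container_name!r}")
    matches[0]["image"] = image
    return new_def
```

Selecting by index instead would silently retag whichever container happens to be first in the list.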
Monitoring Deployment:
# Get deployment ID from create-deployment output
DEPLOYMENT_ID="d-XXXXXXXXX"
# Watch deployment status
aws deploy get-deployment \
--deployment-id $DEPLOYMENT_ID \
--query 'deploymentInfo.status' \
--output text
# Wait for completion
aws deploy wait deployment-successful \
--deployment-id $DEPLOYMENT_ID
# If failed, get failure reason
aws deploy get-deployment \
--deployment-id $DEPLOYMENT_ID \
--query 'deploymentInfo.errorInformation'
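When the monitoring runs from a script rather than a terminal, the same status check can be wrapped in a polling loop that distinguishes success from failure instead of only waiting for success. This sketch takes the status lookup as a callable so it stays independent of any AWS client; the function name, poll interval, and timeout are assumptions.

```python
import time

# Statuses after which a CodeDeploy deployment no longer changes.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_deployment(get_status, poll_seconds: int = 15,
                        timeout_seconds: int = 1800,
                        sleep=time.sleep) -> str:
    """Poll get_status() until the deployment reaches a terminal status."""
    waited = 0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Deployment still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

In practice `get_status` could wrap the `get-deployment` call shown above and return the value of `deploymentInfo.status`.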
Health Checks and Monitoring¶
Container Health Checks¶
Health Check in Task Definition:
"healthCheck": {
"command": [
"CMD-SHELL",
"curl -f http://localhost:8000/health || exit 1"
],
"interval": 30,
"timeout": 5,
"retries": 3,
"startPeriod": 60
}
Django Health Endpoint:
# myproject/views/health.py
from django.core.cache import cache
from django.db import connection
from django.http import JsonResponse


def health_check(request):
    """Health check endpoint for ECS."""
    checks = {
        "database": check_database(),
        "cache": check_cache(),
    }
    all_healthy = all(checks.values())
    # Report the real outcome rather than a hard-coded "healthy"
    checks["status"] = "healthy" if all_healthy else "unhealthy"
    status_code = 200 if all_healthy else 503
    return JsonResponse(checks, status=status_code)


def check_database():
    """Verify database connectivity."""
    try:
        connection.ensure_connection()
        return True
    except Exception:
        return False


def check_cache():
    """Verify cache connectivity with a short-lived round trip."""
    try:
        cache.set("health_check", "ok", 10)
        return cache.get("health_check") == "ok"
    except Exception:
        return False
Health Check Best Practices:
- Return 200 for healthy, 503 for unhealthy
- Check critical dependencies (database, cache)
- Timeout faster than ECS health check timeout
- Don't perform expensive operations
- Cache health check results briefly
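The last practice, caching results briefly, keeps frequent probes from the container health check and the load balancer from each hitting the database. A minimal sketch of a time-based cache for a zero-argument check function follows; the decorator name `cached_for` is hypothetical, and the cache is per process, so each worker process keeps its own copy.

```python
import time


def cached_for(seconds: float, clock=time.monotonic):
    """Cache the result of a zero-argument function for a short time."""
    def decorator(func):
        state = {"expires": None, "value": None}

        def wrapper():
            now = clock()
            # Recompute on first use or once the cached value has expired.
            if state["expires"] is None or now >= state["expires"]:
                state["value"] = func()
                state["expires"] = now + seconds
            return state["value"]

        return wrapper
    return decorator


@cached_for(5)
def dependencies_healthy() -> bool:
    """Hypothetical stand-in for the real database and cache checks."""
    return True
```

Keep the cache window shorter than the health check interval so that a failing dependency is still noticed within one probe cycle.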
Load Balancer Health Checks¶
Target Group Health Check:
aws elbv2 modify-target-group \
--target-group-arn arn:aws:elasticloadbalancing:... \
--health-check-protocol HTTP \
--health-check-path /health \
--health-check-interval-seconds 30 \
--health-check-timeout-seconds 5 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3
Health Check Parameters:
| Parameter | Recommended | Purpose |
|---|---|---|
| interval | 30 seconds | Time between checks |
| timeout | 5 seconds | Max response time |
| healthy-threshold | 2 | Consecutive successes to be healthy |
| unhealthy-threshold | 3 | Consecutive failures to be unhealthy |
Health Check Flow:
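1. Task starts and enters the startPeriod grace period, during which failed container health checks do not count toward retries
2. After the grace period, failed checks count against the task
3. The task must pass healthy-threshold consecutive load balancer checks before it receives traffic
4. After unhealthy-threshold consecutive failures, ECS stops the task and the service starts a replacement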
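These settings translate directly into how long it takes to admit a new task or to detect a broken one. The arithmetic below is an approximation that assumes each check completes at the start of its interval and ignores the timeout; the function names are hypothetical.

```python
def target_group_detection_seconds(interval: int, threshold: int) -> int:
    """Approximate time for a target to change state on the load balancer."""
    return interval * threshold


def container_worst_case_unhealthy_seconds(start_period: int, interval: int,
                                           retries: int) -> int:
    """Approximate upper bound before a container that never becomes
    healthy is marked unhealthy by the container health check."""
    return start_period + interval * retries


# Values used in this guide
print(target_group_detection_seconds(30, 2))                # time to healthy
print(target_group_detection_seconds(30, 3))                # time to unhealthy
print(container_worst_case_unhealthy_seconds(60, 30, 3))    # container check
```

Shorter intervals detect failures faster but multiply the number of probes that every task must answer.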
CloudWatch Alarms¶
CPU Alarm:
aws cloudwatch put-metric-alarm \
--alarm-name django-cpu-high \
--alarm-description "Alert when CPU exceeds 80%" \
--metric-name CPUUtilization \
--namespace AWS/ECS \
--statistic Average \
--period 300 \
--evaluation-periods 2 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=ServiceName,Value=django-web Name=ClusterName,Value=production-cluster
Error Rate Alarm:
aws cloudwatch put-metric-alarm \
--alarm-name django-5xx-errors \
--alarm-description "Alert on 5xx errors" \
--metric-name HTTPCode_Target_5XX_Count \
--namespace AWS/ApplicationELB \
--statistic Sum \
--period 60 \
--evaluation-periods 2 \
--threshold 10 \
--comparison-operator GreaterThanThreshold \
--dimensions Name=TargetGroup,Value=targetgroup/django-web/abc123
Recommended Alarms:
| Alarm | Metric | Threshold | Action |
|---|---|---|---|
| CPU High | CPUUtilization | >80% | Scale out or rollback |
| Memory High | MemoryUtilization | >85% | Scale out or rollback |
| 5xx Errors | HTTPCode_Target_5XX_Count | >10/min | Rollback deployment |
| Slow Response | TargetResponseTime | >2 seconds | Investigate performance |
| Unhealthy Hosts | UnHealthyHostCount | >0 | Alert on-call |
Rollback Strategies¶
Automatic Rollback¶
CodeDeploy automatically rolls back when:
Deployment Failures:
- New tasks fail to reach healthy state
- Health check failures exceed threshold
- Deployment times out
CloudWatch Alarms:
- CPU exceeds threshold during deployment
- Error rate spikes during deployment
- Custom alarm triggers
Configuration:
{
"autoRollbackConfiguration": {
"enabled": true,
"events": [
"DEPLOYMENT_FAILURE",
"DEPLOYMENT_STOP_ON_ALARM"
]
},
"alarmConfiguration": {
"enabled": true,
"alarms": [
{"name": "django-5xx-errors"},
{"name": "django-cpu-high"}
],
"ignorePollAlarmFailure": false
}
}
Rollback Process:
1. CodeDeploy detects failure condition
2. Stops routing traffic to green environment
3. Routes all traffic back to blue environment
4. Terminates green tasks
5. Sends SNS notification
Rollback Time: 2-5 minutes (faster than new deployment)
Manual Rollback¶
When to Manually Rollback:
- Business logic errors discovered after deployment
- Data consistency issues
- Performance degradation not caught by alarms
- User-reported critical bugs
Manual Rollback Methods:
Method 1: Redeploy Previous Task Definition
# List recent task definitions
aws ecs list-task-definitions \
--family-prefix django-webapp \
--sort DESC \
--max-items 5
# Create deployment with previous revision
aws deploy create-deployment \
  --application-name django-app \
  --deployment-group-name production \
  --description "Roll back to revision 14" \
  --revision '{
    "revisionType": "AppSpecContent",
    "appSpecContent": {
      "content": "{\"version\":1,\"Resources\":[{\"TargetService\":{\"Type\":\"AWS::ECS::Service\",\"Properties\":{\"TaskDefinition\":\"arn:aws:ecs:us-east-1:123456789012:task-definition/django-webapp:14\",\"LoadBalancerInfo\":{\"ContainerName\":\"web\",\"ContainerPort\":8000}}}}]}"
    }
  }'
Method 2: Stop Active Deployment
# Stop current deployment (triggers auto-rollback)
aws deploy stop-deployment \
--deployment-id d-XXXXXXXXX \
--auto-rollback-enabled
Method 3: Update Service Directly (Emergency)
# Bypass CodeDeploy and update service directly
aws ecs update-service \
--cluster production-cluster \
--service django-web \
--task-definition django-webapp:14 \
--force-new-deployment
Emergency Only
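Direct service updates bypass CodeDeploy safety checks. Use only in emergencies when CodeDeploy itself is failing. Note that ECS rejects task definition changes made through update-service while a service uses the CODE_DEPLOY deployment controller, so this method applies only to services on the default rolling-update (ECS) controller.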
Database Rollback Considerations¶
Forward-Only Migrations:
- Never write destructive migrations (DROP COLUMN, DROP TABLE)
- Deploy schema changes separately from code changes
- Use feature flags to enable new features
Two-Phase Deployment:
Phase 1: Schema Change
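# Migration: Add nullable column
from django.db import migrations, models


class Migration(migrations.Migration):
    # dependencies omitted here; point them at the app's previous migration
    operations = [
        migrations.AddField(
            model_name='user',
            name='new_field',
            field=models.CharField(max_length=100, null=True, blank=True),
        ),
    ]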
Phase 2 (after code deployed): Make Required
# Migration: Make column non-nullable (after data backfilled)
from django.db import migrations, models


class Migration(migrations.Migration):
    # dependencies omitted here; point them at the Phase 1 migration
    operations = [
        migrations.AlterField(
            model_name='user',
            name='new_field',
            field=models.CharField(max_length=100),
        ),
    ]
Rollback-Safe Pattern:
- Old code works with new schema (column nullable)
- New code works with new schema
- Rollback to old code still works
Complete Deployment Example¶
End-to-End Deployment Workflow¶
#!/bin/bash
# Exit on errors, unset variables, and failures inside pipelines
set -euo pipefail
# Configuration
CLUSTER="production-cluster"
SERVICE="django-web"
TASK_FAMILY="django-webapp"
APP_NAME="django-app"
DEPLOY_GROUP="production"
ECR_REGISTRY="123456789012.dkr.ecr.us-east-1.amazonaws.com"
IMAGE_NAME="django-app"
COMMIT_SHA=$(git rev-parse --short HEAD)
echo "🚀 Starting deployment of ${COMMIT_SHA}"
# Step 1: Build and push image
echo "📦 Building Docker image..."
docker build -t ${IMAGE_NAME}:${COMMIT_SHA} .
echo "🔐 Logging into ECR..."
aws ecr get-login-password --region us-east-1 | \
docker login --username AWS --password-stdin ${ECR_REGISTRY}
echo "⬆️ Pushing image to ECR..."
docker tag ${IMAGE_NAME}:${COMMIT_SHA} ${ECR_REGISTRY}/${IMAGE_NAME}:${COMMIT_SHA}
docker push ${ECR_REGISTRY}/${IMAGE_NAME}:${COMMIT_SHA}
# Step 2: Update task definition
echo "📝 Updating task definition..."
TASK_DEF=$(aws ecs describe-task-definition --task-definition ${TASK_FAMILY})
# Select the "web" container by name; containerDefinitions[0] is the log_router sidecar
NEW_TASK_DEF=$(echo "$TASK_DEF" | \
  jq --arg IMAGE "${ECR_REGISTRY}/${IMAGE_NAME}:${COMMIT_SHA}" \
  '.taskDefinition | (.containerDefinitions[] | select(.name == "web")).image = $IMAGE' | \
  jq 'del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy)')
NEW_TASK_ARN=$(aws ecs register-task-definition \
--cli-input-json "$NEW_TASK_DEF" \
--query 'taskDefinition.taskDefinitionArn' \
--output text)
echo "✅ Registered new task definition: ${NEW_TASK_ARN}"
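# Step 3: Run database migrations (before deployment)
echo "🗄️ Running database migrations..."
# Run the migration with the newly registered revision and capture the task ARN
MIGRATION_TASK_ARN=$(aws ecs run-task \
  --cluster ${CLUSTER} \
  --task-definition ${NEW_TASK_ARN} \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc123],securityGroups=[sg-migration123],assignPublicIp=DISABLED}" \
  --overrides '{
    "containerOverrides": [{
      "name": "web",
      "command": ["python", "manage.py", "migrate", "--noinput"]
    }]
  }' \
  --query 'tasks[0].taskArn' \
  --output text)
echo "⏳ Waiting for migrations to complete..."
# Block until the task stops instead of sleeping for a fixed interval
aws ecs wait tasks-stopped --cluster ${CLUSTER} --tasks ${MIGRATION_TASK_ARN}
# Abort the deployment if the migration container exited non-zero
EXIT_CODE=$(aws ecs describe-tasks --cluster ${CLUSTER} --tasks ${MIGRATION_TASK_ARN} \
  --query "tasks[0].containers[?name=='web'].exitCode | [0]" --output text)
if [ "${EXIT_CODE}" != "0" ]; then
  echo "❌ Migrations failed (exit code: ${EXIT_CODE})"
  exit 1
fi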
# Step 4: Create CodeDeploy deployment
echo "🚢 Creating CodeDeploy deployment..."
APPSPEC=$(cat <<EOF
{
"version": 1,
"Resources": [{
"TargetService": {
"Type": "AWS::ECS::Service",
"Properties": {
"TaskDefinition": "${NEW_TASK_ARN}",
"LoadBalancerInfo": {
"ContainerName": "web",
"ContainerPort": 8000
}
}
}
}]
}
EOF
)
# Wrap the AppSpec in the revision JSON; jq escapes the embedded document
REVISION=$(jq -n --arg content "$APPSPEC" \
  '{revisionType: "AppSpecContent", appSpecContent: {content: $content}}')
DEPLOYMENT_ID=$(aws deploy create-deployment \
  --application-name ${APP_NAME} \
  --deployment-group-name ${DEPLOY_GROUP} \
  --description "Deploy ${COMMIT_SHA}" \
  --revision "$REVISION" \
  --query 'deploymentId' \
  --output text)
echo "📊 Deployment ID: ${DEPLOYMENT_ID}"
echo "🔗 https://console.aws.amazon.com/codesuite/codedeploy/deployments/${DEPLOYMENT_ID}"
# Step 5: Wait for deployment
echo "⏳ Waiting for deployment to complete..."
aws deploy wait deployment-successful --deployment-id ${DEPLOYMENT_ID}
echo "✅ Deployment successful!"
echo "🎉 Version ${COMMIT_SHA} is now live!"
Next Steps¶
After understanding ECS deployment:
- CI/CD Overview: Review complete pipeline architecture
- GitHub Actions: Automate deployments in CI/CD
- Monitoring: Implement comprehensive monitoring
- Configuration: Manage secrets and parameters
Start Simple
Begin with basic ECS service deployment, then add CodeDeploy, then implement blue-green deployments. Each layer adds safety but also complexity.
Related Resources¶
Internal Documentation:
- Container Building: Optimize Docker images for ECS
- Environment Variables: Configuration management
- AWS SSM Parameters: Secrets storage
- 12-Factor App: Design principles
External Resources:
- ECS Best Practices Guide
- ECS Documentation
- CodeDeploy Documentation
- Fargate Pricing
- ECS Workshop