Secrets Management
Overview
This document establishes the theoretical foundations and practical strategies for managing secrets in production Django applications. Secrets management encompasses the entire lifecycle: generation, storage, access, rotation, and emergency response. The approach detailed here reflects production best practices for small development teams deploying to AWS ECS.
Secrets Management Strategy
What Is a Secret?
A secret is any piece of information that, if disclosed, could compromise system security, data integrity, or privacy. The classification of what constitutes a secret is broader than many developers initially assume.
Obvious Secrets:
- Database passwords
- API keys and tokens
- OAuth client secrets
- Encryption keys
- Private keys (SSH, SSL/TLS)
Less Obvious Secrets:
- Django SECRET_KEY (signs sessions, signed cookies, and password reset tokens)
- Session cookies with sensitive data
- JWT signing keys
- Third-party service tokens (SendGrid, Auth0)
- Webhook signing secrets
- Internal service-to-service tokens
Not Secrets:
- Public API endpoints
- Database hostnames (without credentials)
- Public OAuth client IDs
- Feature flag names (but not their values if used for security)
- Application configuration (timeout values, pool sizes)
Theory: The test for whether something is a secret: "If this were published on GitHub, would it compromise security or privacy?" If yes, it's a secret.
Secrets Lifecycle
Secrets have a lifecycle that must be managed:
- Generation: Creating cryptographically secure secrets
- Storage: Encrypting and storing secrets securely
- Access: Granting applications access with minimal privileges
- Rotation: Periodically changing secrets to limit exposure window
- Revocation: Immediately invalidating compromised secrets
Theory: Most security breaches stem from failures in one lifecycle phase. Weak generation creates guessable secrets. Poor storage exposes them at rest. Overly broad access gives attackers lateral movement. Lack of rotation means old compromises remain exploitable. Slow revocation allows attackers to maintain access.
Defense in Depth
No single security mechanism is perfect. Effective secrets management employs multiple overlapping layers:
- Never in version control - Secrets don't belong in Git
- Encrypted at rest - SecureString in SSM, KMS encryption
- Encrypted in transit - TLS for all network communication
- Least privilege access - IAM policies limit who/what can access
- Audit logging - CloudTrail tracks all secret access
- Time-limited exposure - Secrets rotate regularly
- Application isolation - Secrets loaded once at startup, not persisted
Theory: Defense in depth assumes any single layer can fail. Even if secrets leak from one layer (e.g., memory dump), other layers (encryption, access controls) limit damage. The goal is to make secret compromise require multiple simultaneous failures.
SSM Parameter Store vs AWS Secrets Manager
Architectural Differences
Both services store secrets, but they serve different use cases with different features:
SSM Parameter Store:
- Design goal: General-purpose configuration and secrets storage
- Pricing: Free for standard parameters (up to 10,000)
- Features: Simple key-value, versioning, hierarchical names
- Rotation: Manual (application-managed)
- Best for: Static configuration, service tokens, API keys
AWS Secrets Manager:
- Design goal: Database credentials and auto-rotating secrets
- Pricing: $0.40/secret/month + $0.05/10k API calls
- Features: JSON secrets, automatic rotation, RDS integration
- Rotation: Automatic with Lambda functions
- Best for: Database passwords, OAuth tokens, rotating credentials
Theory: SSM Parameter Store is a general-purpose configuration store that happens to support encrypted secrets. Secrets Manager is purpose-built for secret lifecycle management with first-class rotation support. Choose based on whether you need automatic rotation.
When to Use Each
Use SSM Parameter Store when:
- Secrets don't need automatic rotation
- You manage rotation through deployment
- Cost is a primary concern
- You need hierarchical organization
- Secrets are relatively static (API keys, service tokens)
Use AWS Secrets Manager when:
- You store database credentials that should rotate automatically
- You manage OAuth tokens with refresh flows
- You want built-in RDS integration
- Compliance requires automatic rotation
- Secrets change frequently
Hybrid Approach (recommended for small teams):
- SSM Parameter Store for most secrets (Django SECRET_KEY, API keys)
- Secrets Manager for database credentials (if using automatic rotation)
- Consistent access patterns (boto3 works for both)
Theory: For small teams on Django/ECS, SSM Parameter Store is usually sufficient. Database password rotation happens during deployment (new password, restart tasks). The savings grow with the number of secrets: at $0.40 per secret per month, 500 secrets cost $2,400 a year in Secrets Manager storage alone, before API charges. Add Secrets Manager selectively when automatic rotation provides clear value.
Cost Comparison Example
Scenario: 50 secrets, 10 million API calls/month
SSM Parameter Store:
- Parameter storage: $0 (standard parameters carry no charge, up to 10,000 per account and Region)
- API calls: $0 (standard throughput carries no charge; its default limit is 40 transactions per second, and 10 million calls/month averages about 3.9 calls/second)
- Total: $0/month
AWS Secrets Manager:
- Secret storage: 50 × $0.40 = $20/month
- API calls: 10,000,000 ÷ 10,000 × $0.05 = $50/month
- Total: $70/month
Theory: At scale, SSM Parameter Store's free tier represents substantial savings. For a small team, $70/month may be acceptable for the convenience of automatic rotation. Evaluate based on your rotation requirements, not just cost.
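The arithmetic in this example can be rerun with your own numbers using a small helper. This is a sketch: the rates and the 40 TPS limit are the figures quoted above, treated here as assumptions to re-check against current AWS pricing, and the function names are hypothetical.

```python
# Rates mirror the figures quoted above (assumed; re-check current AWS pricing).
SECRETS_MANAGER_PER_SECRET = 0.40     # USD per secret per month
SECRETS_MANAGER_PER_10K_CALLS = 0.05  # USD per 10,000 API calls
SSM_STANDARD_TPS_LIMIT = 40           # default standard-throughput limit


def secrets_manager_monthly_cost(num_secrets: int, api_calls: int) -> float:
    """Storage plus API-call charges for AWS Secrets Manager, in USD."""
    storage = num_secrets * SECRETS_MANAGER_PER_SECRET
    calls = api_calls / 10_000 * SECRETS_MANAGER_PER_10K_CALLS
    return round(storage + calls, 2)


def average_calls_per_second(api_calls: int, days: int = 30) -> float:
    """Average request rate, to compare with the SSM throughput limit.

    Peaks matter too: an average well under the limit can still burst above it.
    """
    return round(api_calls / (days * 24 * 60 * 60), 2)


print(secrets_manager_monthly_cost(50, 10_000_000))  # 70.0
rate = average_calls_per_second(10_000_000)
print(rate, rate < SSM_STANDARD_TPS_LIMIT)            # 3.86 True
```

Because the SSM side costs nothing at standard throughput, the comparison mostly comes down to whether your peak request rate stays under the throughput limit.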
Decision Matrix
graph TD
A[Need to store secret] --> B{Requires automatic rotation?}
B -->|Yes| C{Database credential?}
B -->|No| D[SSM Parameter Store]
C -->|Yes| E[Secrets Manager with RDS integration]
C -->|No| F{High rotation frequency?}
F -->|Yes - daily/weekly| G[Secrets Manager with Lambda]
F -->|No - monthly/on-deployment| H[SSM Parameter Store]
D --> I[Cost: $0]
H --> I
E --> J[Cost: $0.40/secret/month]
G --> J
style I fill:#90EE90
style J fill:#FFE4B5
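The same decision tree can be encoded as a function, which is handy in an infrastructure script that provisions secrets. A minimal sketch with one branch per diamond in the diagram; the enum and function names are hypothetical.

```python
from enum import Enum


class Store(Enum):
    SSM_PARAMETER_STORE = "SSM Parameter Store"
    SECRETS_MANAGER_RDS = "Secrets Manager with RDS integration"
    SECRETS_MANAGER_LAMBDA = "Secrets Manager with Lambda"


def choose_store(needs_auto_rotation: bool,
                 is_database_credential: bool = False,
                 rotates_daily_or_weekly: bool = False) -> Store:
    """Walk the decision matrix from top to bottom."""
    if not needs_auto_rotation:
        return Store.SSM_PARAMETER_STORE
    if is_database_credential:
        return Store.SECRETS_MANAGER_RDS
    if rotates_daily_or_weekly:
        return Store.SECRETS_MANAGER_LAMBDA
    # Monthly or on-deployment rotation does not justify the per-secret cost
    return Store.SSM_PARAMETER_STORE


print(choose_store(True, is_database_credential=True).value)
```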
Rotation Policies
Why Rotate Secrets?
Secret rotation limits the window of exposure if a secret is compromised. An attacker who gains access to a secret has limited time to exploit it before rotation invalidates their access.
Theory: Security assumes eventual compromise. The question is not "if" but "when" secrets are exposed. Rotation reduces the value of old compromises. A secret compromised 6 months ago is worthless if rotated monthly.
Rotation Frequency Guidelines
Different secret types have different rotation schedules:
High-Frequency Rotation (Weekly to Monthly):
- Database passwords (if using automatic rotation)
- OAuth access tokens
- Service-to-service authentication tokens
- Secrets exposed to many systems
Medium-Frequency Rotation (Quarterly to Semiannually):
- API keys for third-party services
- Django SECRET_KEY (without SECRET_KEY_FALLBACKS, added in Django 4.1, rotation invalidates all sessions and signed data)
- Application service tokens
- Webhook signing secrets
Low-Frequency Rotation (Annually or on events):
- Private keys for SSL/TLS (certificate renewal)
- Root encryption keys
- Secrets with high change cost
- Secrets requiring coordination across systems
Event-Driven Rotation (Immediate):
- Known or suspected compromise
- Employee departure
- Service breach notification
- Audit findings
Theory: Rotation frequency balances security benefit against operational cost. More frequent rotation reduces exposure window but increases complexity and risk of outages. The goal is to rotate often enough to limit damage but not so often that rotation itself becomes unreliable.
Rotation Strategies
Zero-Downtime Rotation Pattern:
1. Generate new secret
2. Deploy new secret alongside old secret
3. Update application to use new secret
4. Verify application works with new secret
5. Remove old secret
Theory: This pattern ensures continuous availability during rotation. Both old and new secrets are valid during transition. Only after confirming the new secret works is the old one removed.
Implementation:
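```python
# Django 4.1+ supports dual secrets during rotation: SECRET_KEY signs
# new data, and keys in SECRET_KEY_FALLBACKS still verify old data.
SECRET_KEY = pconfig.get_param('DJANGO_SECRET_KEY_NEW')
SECRET_KEY_FALLBACKS = [
    pconfig.get_param('DJANGO_SECRET_KEY_OLD'),
]

# Conceptually, verification tries each key in order
# (illustration only; Django does this internally)
from django.core import signing

def load_signed(value, keys):
    for key in keys:
        try:
            return signing.loads(value, key=key)
        except signing.BadSignature:
            continue
    raise signing.BadSignature('No key matched')
```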
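The dual-key idea can be demonstrated without Django at all. This stdlib-only sketch signs values with HMAC-SHA256 and verifies them against the current key plus any fallback keys. It is a simplified illustration, not Django's actual signing format, and `sign`/`unsign` are hypothetical names.

```python
import hashlib
import hmac
from typing import Sequence


def _mac(value: str, key: str) -> str:
    return hmac.new(key.encode(), value.encode(), hashlib.sha256).hexdigest()


def sign(value: str, key: str) -> str:
    """Append an HMAC-SHA256 signature to value (simplified format)."""
    return f"{value}:{_mac(value, key)}"


def unsign(signed: str, current_key: str, fallback_keys: Sequence[str]) -> str:
    """Accept signatures made with the current key or any fallback key."""
    value, _, mac = signed.rpartition(":")
    for key in [current_key, *fallback_keys]:
        # Constant-time comparison avoids leaking how much of the MAC matched
        if hmac.compare_digest(mac, _mac(value, key)):
            return value
    raise ValueError("bad signature")


old_cookie = sign("user=42", "old-key")
print(unsign(old_cookie, "new-key", ["old-key"]))  # still valid mid-rotation
```

Once every value signed with the old key has expired, removing it from the fallback list completes step 5 of the pattern.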
Database Password Rotation:
- Create new password in database
- Store new password in SSM/Secrets Manager
- Deploy application update (reads new password)
- Verify application connectivity
- Revoke old password from database
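Theory: Database rotation is more complex because it involves two systems (SSM and the database). The database must accept both the old and new credentials during the transition (for example, by putting the new password on a second database user), or tasks still running with the old password will fail to connect until they are replaced.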
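The ordering of these steps is what prevents outages, so it can help to encode it once. The sketch below takes each step as a callable you would wire to your own database, SSM, and ECS tooling; the function and parameter names are hypothetical. The key property is that the old credential is revoked only after verification succeeds.

```python
from typing import Callable


def rotate_credential(
    create_new: Callable[[], str],
    store_new: Callable[[str], None],
    deploy: Callable[[], None],
    verify: Callable[[], bool],
    revoke_old: Callable[[], None],
) -> bool:
    """Run the database rotation steps in order; return True on success."""
    new_password = create_new()  # 1. create new credential in the database
    store_new(new_password)      # 2. store it in SSM / Secrets Manager
    deploy()                     # 3. roll out tasks that read the new value
    if not verify():             # 4. confirm connectivity with the new value
        return False             #    old credential stays valid; nothing breaks
    revoke_old()                 # 5. only now revoke the old credential
    return True
```

If verification fails, both credentials remain valid, so you can investigate without an outage and rerun the rotation later.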
Automated vs Manual Rotation
Automated Rotation (Secrets Manager):
- Lambda function generates new secret
- Lambda updates database password
- Lambda updates secret in Secrets Manager
- Application automatically uses new value
- No human intervention required
Manual Rotation (SSM Parameter Store):
- Human generates new secret
- Human updates database/service
- Human updates SSM parameter
- Deploy application to use new secret
- Human verifies rotation succeeded
Theory: Automated rotation reduces human error and ensures rotation happens on schedule. Manual rotation provides more control and suits less frequent rotation schedules. For small teams, manual rotation during deployments is often sufficient.
Access Control Patterns
Principle of Least Privilege
Every application and service should have access to only the secrets it needs, and no more.
Theory: Least privilege limits lateral movement in case of compromise. If an attacker compromises the web application, they shouldn't automatically gain access to database admin credentials, billing API keys, or other services' secrets.
IAM Role-Based Access
ECS Task Role Pattern:
Each ECS service gets its own task role with specific parameter access:
// Web application task role
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["ssm:GetParameter*"],
"Resource": [
"arn:aws:ssm:*:*:parameter/prod/web/*",
"arn:aws:ssm:*:*:parameter/prod/shared/*"
]
},
{
"Effect": "Allow",
"Action": ["kms:Decrypt"],
"Resource": "arn:aws:kms:*:*:key/web-kms-key-id"
}
]
}
// Background worker task role
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["ssm:GetParameter*"],
"Resource": [
"arn:aws:ssm:*:*:parameter/prod/worker/*",
"arn:aws:ssm:*:*:parameter/prod/shared/*"
]
},
{
"Effect": "Allow",
"Action": ["kms:Decrypt"],
"Resource": "arn:aws:kms:*:*:key/worker-kms-key-id"
}
]
}
Theory: Each service has its own namespace (/prod/web/*, /prod/worker/*) plus access to shared resources (/prod/shared/*). If the web app is compromised, workers' secrets remain protected.
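The isolation claim can be checked with a deliberately simplified model of IAM resource matching. This sketch only looks at Allow resource patterns; real IAM evaluation also considers explicit denies, conditions, and the separate kms:Decrypt permission. `can_read` is a hypothetical helper, the parameter names are made up, and `fnmatch.fnmatchcase` stands in for IAM's `*` wildcard (matching any characters, including `/`).

```python
from fnmatch import fnmatchcase
from typing import Sequence

# Resource patterns from the web application task role above
WEB_RESOURCES = [
    "arn:aws:ssm:*:*:parameter/prod/web/*",
    "arn:aws:ssm:*:*:parameter/prod/shared/*",
]


def can_read(resource_patterns: Sequence[str], parameter_arn: str) -> bool:
    """True if any Allow resource pattern matches the parameter ARN."""
    return any(fnmatchcase(parameter_arn, p) for p in resource_patterns)


arn = "arn:aws:ssm:us-east-1:123456789012:parameter/prod/{}"
print(can_read(WEB_RESOURCES, arn.format("web/DJANGO_SECRET_KEY")))    # True
print(can_read(WEB_RESOURCES, arn.format("shared/DATABASE_PASSWORD")))  # True
print(can_read(WEB_RESOURCES, arn.format("worker/QUEUE_TOKEN")))        # False
```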
Path-Based Organization
Organize parameters to enable path-based access control:
/prod/web/ ← Web app secrets
/prod/worker/ ← Background worker secrets
/prod/shared/ ← Shared secrets (database, cache)
/prod/admin/ ← Administrative secrets (higher security)
IAM Policy:
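{
  "Effect": "Allow",
  "Action": ["ssm:GetParameter*"],
  "Resource": "arn:aws:ssm:*:*:parameter/prod/${aws:PrincipalTag/service}/*"
}
Theory: Path-based organization maps directly to IAM resource patterns. IAM resolves policy variables such as ${aws:PrincipalTag/service} at request time, so tagging each task role with its service name (for example service=web or service=worker) lets a single policy give every service its own path. Policy variables only work in policies whose Version is "2012-10-17".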
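Another way to keep every service on the same layout is to generate each task role policy at deploy time. A sketch, assuming you feed the output into your infrastructure tooling; `task_role_policy` is a hypothetical helper, and the KMS key ARN is the same placeholder used in the task role examples above.

```python
import json


def task_role_policy(service: str, kms_key_arn: str, env: str = "prod") -> dict:
    """Per-service policy: own namespace plus the shared namespace."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ssm:GetParameter*"],
                "Resource": [
                    f"arn:aws:ssm:*:*:parameter/{env}/{service}/*",
                    f"arn:aws:ssm:*:*:parameter/{env}/shared/*",
                ],
            },
            {
                # Needed to decrypt SecureString values with this key
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": kms_key_arn,
            },
        ],
    }


print(json.dumps(task_role_policy("worker", "arn:aws:kms:*:*:key/worker-kms-key-id"), indent=2))
```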
Time-Limited Access
For human access to secrets (debugging, incident response), use time-limited credentials:
# Assume role with 1-hour session
aws sts assume-role \
--role-arn "arn:aws:iam::123456789012:role/emergency-access" \
--role-session-name "incident-response-2025-10-03" \
--duration-seconds 3600
Theory: Time-limited credentials reduce the window of exposure if credentials are leaked. After expiration, the credentials are worthless. This is especially important for human access, which is more likely to be logged or shared insecurely.
Service-to-Service Authentication
When one service needs to call another, use dedicated service tokens:
/prod/web/SERVICE_TOKEN_FOR_API ← Web app uses this to call API
/prod/api/SERVICE_TOKEN_SECRET ← API validates against this
Theory: Service tokens are scoped to a single purpose (web → API). If compromised, an attacker can only impersonate that specific service relationship, not gain broader access.
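The API-side check can be as small as a constant-time comparison. A sketch, assuming the calling service sends the token value unchanged in a request header and that both SSM parameters hold the same value; the function names are hypothetical. `hmac.compare_digest` avoids leaking, through response timing, how many leading characters of a guess were correct.

```python
import hmac
import secrets
from typing import Optional


def issue_service_token() -> str:
    """Generate a token to store at both parameter paths (32 random bytes)."""
    return secrets.token_urlsafe(32)


def is_valid_service_token(presented: Optional[str], expected: str) -> bool:
    """Compare in constant time; a missing or empty token is rejected."""
    if not presented:
        return False
    return hmac.compare_digest(presented.encode(), expected.encode())


token = issue_service_token()
print(is_valid_service_token(token, token))    # True
print(is_valid_service_token("guess", token))  # False
```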
Local Development Approaches
The Development Secrets Problem
Production secrets must never be used in development. But development still needs working secrets for testing integrations.
Theory: Development secrets should be:
1. Non-production values - Never real API keys or passwords
2. Clearly marked - Obviously not production
3. Low-security - No encryption needed locally
4. Version-controlled structure - Parameter names in .env.example
5. Locally generated - Each developer creates their own
Approach 1: LocalStack SSM
Use LocalStack to simulate SSM Parameter Store locally with development-safe values:
```python
# scripts/init-localstack-ssm.py
import boto3

ssm = boto3.client(
    'ssm',
    endpoint_url='http://localhost:4566',
    region_name='us-east-1',  # boto3 needs a region; LocalStack accepts any
    aws_access_key_id='test',
    aws_secret_access_key='test',
)

# Create development secrets (fake values, safe for local use only)
dev_secrets = {
    '/dev/django/SECRET_KEY': 'dev-secret-not-for-production',
    '/dev/database/PASSWORD': 'postgres',
    '/dev/auth0/CLIENT_SECRET': 'dev-auth0-secret',
    '/dev/sendgrid/API_KEY': 'SG.fake-development-key',
}

for name, value in dev_secrets.items():
    ssm.put_parameter(
        Name=name,
        Value=value,
        Type='SecureString',
        Overwrite=True,
    )
```
Theory: LocalStack provides the same API as production SSM, allowing identical code to run in both environments. Development secrets are clearly fake but allow full testing of the integration logic.
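The application can read the seeded values back with the same calls it would make in production. A sketch, assuming the LocalStack endpoint and dummy credentials from the seed script; `load_parameters` is a hypothetical helper built on boto3's `get_parameters_by_path` paginator, which handles result pages for you.

```python
from typing import Dict


def load_parameters(ssm, path: str) -> Dict[str, str]:
    """Return every parameter under `path`, decrypted, keyed by full name."""
    params: Dict[str, str] = {}
    paginator = ssm.get_paginator("get_parameters_by_path")
    pages = paginator.paginate(Path=path, Recursive=True, WithDecryption=True)
    for page in pages:
        for param in page["Parameters"]:
            params[param["Name"]] = param["Value"]
    return params


if __name__ == "__main__":
    import boto3

    ssm = boto3.client(
        "ssm",
        endpoint_url="http://localhost:4566",
        region_name="us-east-1",  # assumption: any region works with LocalStack
        aws_access_key_id="test",
        aws_secret_access_key="test",
    )
    for name in sorted(load_parameters(ssm, "/dev")):
        print(name)  # print names only, never values
```

Passing the client in as an argument keeps the function testable with a stub and lets the same code point at LocalStack or real SSM.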
Approach 2: .env Fallback
For secrets that don't need SSM integration locally, use .env files:
```python
# settings/development.py
import os

# Try SSM first (e.g. LocalStack), fall back to an environment variable
try:
    from poseidon.commons.config.ps_config import pconfig
    SECRET_KEY = pconfig.get_param('DJANGO_SECRET_KEY')
except Exception:
    # Fallback to .env for local development
    SECRET_KEY = os.environ.get('DJANGO_SECRET_KEY', 'dev-fallback-key')
```
Theory: This hybrid approach uses SSM when available (LocalStack) but falls back to .env for simpler local development. Developers can work without running LocalStack for quick testing.
Approach 3: Development-Specific Values
Some third-party services provide development modes or sandbox environments:
```python
# settings/development.py
# Use Auth0 development tenant
AUTH0_DOMAIN = 'dev-tenant.auth0.com'  # Not production
AUTH0_CLIENT_ID = 'dev-client-id'  # Development app
# Use Stripe test mode
STRIPE_API_KEY = 'sk_test_...'  # Test mode key (check Stripe's guidance before committing)
# Skip SendGrid locally: print outgoing email to the console instead
EMAIL_BACKEND = 'django.core.mail.backends.console.EmailBackend'
```
Theory: When services provide development modes, use them. Test-mode keys typically work only against test data, and some services treat them as safe to share or even commit; check each service's documentation before doing so. This reduces friction for new developers.
Local Secret Generation
For secrets that must be unique per developer:
# Generate Django SECRET_KEY locally
python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"
# Generate random token
python -c "import secrets; print(secrets.token_urlsafe(64))"
Add the generated values to .env, for example DJANGO_SECRET_KEY=<generated value>.
Theory: Each developer generates their own local secrets. These are unique to their environment and never shared. The .env.example file documents what to generate but doesn't provide values.
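Generating values by hand gets tedious as the number of variables grows. The sketch below copies .env.example and fills every key that has an empty value with a fresh random token. It assumes .env.example lists secrets as KEY= with no value and keeps non-secret defaults (like DEBUG=True) filled in; `fill_env_template` is a hypothetical helper.

```python
import secrets


def fill_env_template(template: str) -> str:
    """Copy .env.example text, generating values for keys left empty.

    Comments, blank lines, and keys that already have a value are kept.
    """
    out = []
    for line in template.splitlines():
        stripped = line.strip()
        key, sep, value = stripped.partition("=")
        if stripped and not stripped.startswith("#") and sep and not value:
            out.append(f"{key}={secrets.token_urlsafe(64)}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Mode "x" refuses to overwrite an existing .env
    with open(".env.example") as src, open(".env", "x") as dst:
        dst.write(fill_env_template(src.read()))
```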
Emergency Procedures
Suspected Secret Compromise
When you suspect a secret has been compromised:
Immediate Actions (First 15 Minutes):
- Confirm compromise - Verify the secret is actually exposed
- Assess scope - Which secret? Which systems use it?
- Revoke access - If possible, immediately invalidate the secret
- Alert team - Notify relevant team members
- Monitor for abuse - Check logs for unauthorized access
Short-Term Response (First Hour):
- Rotate the compromised secret - Generate new value
- Update all systems - Deploy new secret to all consumers
- Verify rotation - Confirm new secret works, old is revoked
- Review access logs - Look for evidence of exploitation
- Document incident - Record what happened and response
Long-Term Response (First Day):
- Root cause analysis - How was the secret exposed?
- Prevent recurrence - Add safeguards against similar exposure
- Broader audit - Check for other potential exposures
- Update procedures - Improve rotation and access controls
- Post-mortem - Share learnings with team
Theory: Speed is critical. The goal is to minimize the window where an attacker can exploit the compromised secret. Pre-defined procedures reduce decision time during incidents.
Secret Committed to Git
If secret was just committed but not pushed:
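```bash
# Undo the last commit, keeping its changes staged
# (skip this if the file was only staged, never committed)
git reset --soft HEAD~1
# Unstage the file that contains the secret
git reset HEAD <file-with-secret>
# Edit the file to remove the secret, then re-stage and re-commit
git add <file-with-secret>
git commit
```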
If secret was pushed to remote:
```bash
# 1. Immediately rotate the secret (most important): assume it is compromised

# 2. Remove the file from Git history (disruptive; coordinate with the team).
#    Git's own documentation now recommends git filter-repo over filter-branch.
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch <file-with-secret>" \
  --prune-empty --tag-name-filter cat -- --all

# 3. Force push rewritten branches and tags (requires team coordination)
git push origin --force --all
git push origin --force --tags
```
Theory: Even after removing from history, assume the secret is compromised. GitHub/GitLab may cache commits. Anyone who pulled before the force push has the secret. Rotation is mandatory, not optional.
Database Credential Compromise
Immediate:
- Identify compromised credentials - Which user/password?
- Check active connections - Query database for active sessions
- Revoke old credentials - REVOKE ALL or drop user
- Create new credentials - New user with appropriate grants
- Update SSM Parameter - Store new password
- Rolling deployment - Update ECS tasks with new credentials
Database-Specific Commands:
```sql
-- MySQL: revoke access first so the user cannot reconnect
REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'compromised_user'@'%';
DROP USER 'compromised_user'@'%';
-- DROP USER does not close existing sessions: find and kill them
SELECT id, host, db, command, time FROM information_schema.processlist
WHERE user = 'compromised_user';
KILL <process_id>;
-- Create new user
CREATE USER 'new_user'@'%' IDENTIFIED BY 'new_secure_password';
GRANT SELECT, INSERT, UPDATE, DELETE ON mydb.* TO 'new_user'@'%';
```
Theory: Database compromise is especially serious because it provides direct data access. The priority is to revoke database access immediately, even before updating the application. Brief application downtime is preferable to ongoing data exfiltration.
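Creating the replacement password and storing it can be scripted so nobody types or pastes it. A sketch, assuming the SecureString layout used elsewhere in this document; the helper names and the parameter name in the usage block are hypothetical. The 32-character default matches the minimum length recommended under Security Best Practices.

```python
import secrets
import string

# Letters and digits only, so the password needs no escaping in SQL
# statements, connection URLs, or shell commands.
ALPHABET = string.ascii_letters + string.digits


def generate_db_password(length: int = 32) -> str:
    """Cryptographically random password built with the secrets module."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def store_db_password(ssm, name: str, password: str) -> int:
    """Overwrite a SecureString parameter and return its new version number."""
    response = ssm.put_parameter(
        Name=name,
        Value=password,
        Type="SecureString",
        Overwrite=True,
    )
    return response["Version"]


if __name__ == "__main__":
    import boto3

    password = generate_db_password()
    # Use this value in CREATE USER ... IDENTIFIED BY, then store it.
    # Hypothetical parameter name; follow your own path layout.
    version = store_db_password(
        boto3.client("ssm"), "/prod/shared/DATABASE_PASSWORD", password
    )
    print(f"Stored new password as version {version}")  # never print the value
```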
API Key Compromise (Third-Party Services)
Immediate:
- Log into service dashboard - Auth0, SendGrid, Stripe, etc.
- Revoke compromised key - Immediately invalidate
- Generate new key - Create replacement
- Update SSM Parameter - Store new key
- Deploy update - Rolling deployment with new key
- Monitor service logs - Check for unauthorized usage
Service-Specific Procedures:
Auth0:
1. Dashboard → Applications → [Your App] → Settings
2. Rotate client secret
3. Copy new secret to SSM
4. Deploy application update
SendGrid:
1. Dashboard → Settings → API Keys
2. Delete compromised key
3. Create new key with same permissions
4. Update SSM parameter
5. Deploy
Stripe:
1. Dashboard → Developers → API Keys
2. Roll key (creates new, old remains valid temporarily)
3. Update application to use new key
4. After verification, delete old key
Theory: Third-party services often provide API key management interfaces. Use them. Don't try to be clever with database updates or configuration hacks. Follow the service's documented rotation procedure.
What to Never Commit
Explicit Blocklist
The following must never be committed to version control:
Environment Files:
- .env
- .env.local
- .env.production
- Any other .env.* or *.env file, except .env.example
Credential Files:
- credentials.json
- service-account-key.json
- *.pem (private keys)
- *.key (private keys)
- *.p12 (certificate bundles)
- id_rsa or id_ed25519 (SSH keys)
Configuration with Secrets:
- config.production.yml (if contains secrets)
- secrets.yml
- database.yml (if contains passwords)
- Any file with "secret", "credential", or "password" in the name
Application Secrets:
- Django SECRET_KEY in settings files
- Database passwords in settings
- API keys in code
- OAuth client secrets in code
- Session signing keys
Gitignore Configuration
Essential .gitignore entries:
# Environment variables
.env
.env.*
!.env.example
*.env
*.local
# Credentials
credentials*.json
*-credentials.json
service-account*.json
*.pem
*.key
*.p12
# SSH keys
id_rsa
id_ed25519
*.ppk
# Secret directories
secrets/
.secrets/
# Database files with credentials
database.yml
config/database.yml
# LocalStack data
.localstack/
localstack-data/
# IDE-specific (may contain credentials)
.vscode/settings.json
.idea/dataSources.xml
Theory: .gitignore is the first line of defense but not foolproof. Files can be force-added. Pre-commit hooks provide automated scanning. Code review provides human oversight. Use all three layers.
Pre-commit Hook Configuration
# .pre-commit-config.yaml
repos:
- repo: https://github.com/Yelp/detect-secrets
rev: v1.4.0
hooks:
- id: detect-secrets
args: ['--baseline', '.secrets.baseline']
exclude: package-lock.json
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: check-added-large-files
- id: detect-private-key
- id: check-yaml
- id: check-json
Theory: detect-secrets scans for high-entropy strings and known secret patterns. It compares findings against a committed baseline file of expected findings (e.g., example secrets in tests), which you create once with detect-secrets scan > .secrets.baseline. New secrets trigger a hook failure, preventing the commit.
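To make "high-entropy strings" concrete, the sketch below computes Shannon entropy per character and flags long token-like runs above a threshold. It illustrates the general idea only; detect-secrets' own plugins, character sets, and thresholds differ, and the function names and the 4.0 bits/char default are assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def shannon_entropy(s: str) -> float:
    """Average information content of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def high_entropy_tokens(text: str, min_length: int = 20,
                        threshold: float = 4.0) -> List[str]:
    """Return long base64/hex-like runs whose entropy meets the threshold."""
    candidates = re.findall(r"[A-Za-z0-9+/=_-]{%d,}" % min_length, text)
    return [t for t in candidates if shannon_entropy(t) >= threshold]


print(shannon_entropy("abcd"))  # 2.0
```

Repetitive strings score low even when long, which is why a real key like a random token stands out while a string such as "hunter2hunter2hunter2" does not.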
Code Review Checklist
When reviewing pull requests:
- No hardcoded secrets in code
- .env.example updated for new variables
- .env.example contains only placeholders
- Configuration files use environment variables
- No database passwords in code
- No API keys in code
- Test fixtures use fake secrets
- Comments don't contain real secrets
Theory: Automated tools catch common patterns, but humans catch context-specific issues. A developer might add a "temporary" API key for testing. Tools might miss it if it doesn't match secret patterns. Code review is the human layer in defense-in-depth.
Security Best Practices
Secret Generation
Use Cryptographically Secure Randomness:
# Good: Cryptographically secure
import secrets
secret_key = secrets.token_urlsafe(64)
# Bad: Not cryptographically secure
import random
secret_key = ''.join(random.choices('abc123', k=50))
Theory: Standard random number generators are predictable given enough output. Cryptographically secure generators (like secrets) are designed to be unpredictable even to attackers who observe many outputs.
Minimum Secret Length:
- Django SECRET_KEY: 50+ characters
- Database passwords: 32+ characters
- API tokens: Service-specific (check documentation)
- Private keys: 2048+ bits RSA, 256+ bits EC
Theory: Longer secrets have more entropy, making brute force attacks computationally infeasible. A 50-character string drawn uniformly at random from the 62 letters and digits has about 298 bits of entropy (50 × log2 62), vastly more than needed to prevent brute force.
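The entropy figure follows from length × log2(alphabet size), which holds only when every character is chosen uniformly and independently, as the secrets module does. Human-chosen passwords have far less entropy than this formula suggests. A short sketch of the calculation; `entropy_bits` is a hypothetical helper name.

```python
import math
import string


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy of `length` symbols chosen uniformly and independently."""
    return length * math.log2(alphabet_size)


alphanumeric = len(string.ascii_letters + string.digits)  # 62 symbols
print(round(entropy_bits(50, alphanumeric)))  # 298
# secrets.token_urlsafe(64) draws 64 random bytes (256 possible values each)
print(entropy_bits(64, 256))                  # 512.0
```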
Secret Storage
Never in Code:
# Bad: Hardcoded secret
SECRET_KEY = 'django-insecure-abc123xyz789'
# Good: Loaded from SSM Parameter Store at runtime
SECRET_KEY = pconfig.get_param('DJANGO_SECRET_KEY')
Never as Plaintext Values in Production Task Definitions:
# Bad: Secret in task definition
environment:
- name: SECRET_KEY
value: "hardcoded-secret-here"
# Good: From SSM Parameter Store
secrets:
- name: SECRET_KEY
valueFrom: "arn:aws:ssm:region:account:parameter/prod/SECRET_KEY"
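Theory: Plaintext values in a task definition's environment block are visible in CloudFormation, the console, and API responses, so they are effectively public to anyone with read access. Use the secrets key instead: ECS fetches the value from SSM or Secrets Manager when the container starts, using the task execution role (which needs ssm:GetParameters, plus kms:Decrypt if the parameter uses a customer managed KMS key). The value still reaches the container as an environment variable, but it never appears in the task definition itself.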
Secret Transmission
Always Use TLS:
- API calls to SSM: TLS enforced by AWS
- Database connections: Use SSL/TLS
- Internal service calls: Use TLS even in VPC
- Admin panels: HTTPS only
Theory: Secrets transmitted over plain HTTP can be intercepted via man-in-the-middle attacks. TLS encrypts the connection, preventing eavesdropping. This applies even within a VPC - defense in depth assumes network compromise.
Logging and Monitoring
Never Log Secrets:
# Bad: Logs secret
logger.info(f"Using API key: {api_key}")
# Good: Logs sanitized
logger.info("Using API key: ***REDACTED***")
# Better: Don't log at all
logger.info("API key configured successfully")
Sanitize Error Messages:
# Bad: Exception exposes secret
raise Exception(f"Auth failed with key: {api_key}")
# Good: Exception doesn't expose secret
raise Exception("Auth failed - check API key configuration")
Theory: Logs are often stored in plaintext, indexed by search systems, and accessible to many people. A secret in a log entry is a leaked secret. Log that secrets were configured, not their values.
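As a safety net for mistakes like the first bad example, a logging.Filter attached to each handler can scrub known secret values before records are written. A sketch with a hypothetical class name: it only catches exact values you register, it does not rewrite tracebacks formatted from exc_info, and it is no substitute for not logging secrets in the first place. Attaching it to handlers rather than loggers matters, because logger-level filters do not apply to records propagated from child loggers.

```python
import logging
from typing import Iterable

REDACTED = "***REDACTED***"


class RedactSecretsFilter(logging.Filter):
    """Replace known secret values in log messages with a placeholder."""

    def __init__(self, secret_values: Iterable[str]) -> None:
        super().__init__()
        # Ignore empty strings, which would otherwise match everywhere
        self._secret_values = [v for v in secret_values if v]

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # msg % args, fully rendered
        for value in self._secret_values:
            message = message.replace(value, REDACTED)
        # Store the redacted text and drop args so it is not re-rendered
        record.msg = message
        record.args = None
        return True  # keep the record, just sanitized


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    api_key = "SG.fake-development-key"  # dev value from the LocalStack example
    for handler in logging.getLogger().handlers:
        handler.addFilter(RedactSecretsFilter([api_key]))
    logging.getLogger(__name__).info("Using API key: %s", api_key)
```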
Audit and Compliance
CloudTrail Logging:
- Enable CloudTrail for SSM API calls
- Monitor GetParameter calls for unusual patterns
- Alert on access to sensitive parameter paths
- Retain logs for compliance periods (often 1+ years)
Access Reviews:
- Quarterly review of who/what has access to secrets
- Remove access for departed team members
- Audit service account permissions
- Document why each principal needs access
Theory: Audit logs provide forensics after incidents and deter insider threats. Regular access reviews prevent permission creep - where principals accumulate unnecessary access over time.
Mermaid Diagrams
Secrets Lifecycle
graph LR
A[Generate] --> B[Store Encrypted]
B --> C[Grant Access]
C --> D[Application Uses]
D --> E{Rotation Event?}
E -->|Time-based| F[Generate New]
E -->|Compromise| F
E -->|Normal use| D
F --> G[Deploy New]
G --> H[Revoke Old]
H --> B
style A fill:#E3F2FD
style H fill:#FFEBEE
Access Control Layers
graph TD
A[Application Request] --> B[IAM Task Role]
B --> C{Has ssm:GetParameter?}
C -->|No| D[Access Denied]
C -->|Yes| E{Resource matches?}
E -->|No| D
E -->|Yes| F[SSM Parameter Store]
F --> G{Parameter exists?}
G -->|No| H[Not Found]
G -->|Yes| I{Has kms:Decrypt?}
I -->|No| D
I -->|Yes| J[KMS Decrypt]
J --> K[Return Secret]
style K fill:#90EE90
style D fill:#FFB6C1
style H fill:#FFE4B5
Emergency Response Flow
graph TD
A[Secret Compromise Detected] --> B[Immediate Revocation]
B --> C[Generate New Secret]
C --> D[Update SSM/Secrets Manager]
D --> E[Deploy to All Systems]
E --> F{All Systems Updated?}
F -->|No| G[Continue Deployment]
F -->|Yes| H[Verify New Secret Works]
G --> F
H --> I[Remove Old Secret]
I --> J[Monitor for Abuse]
J --> K[Post-Mortem Analysis]
style A fill:#FFEBEE
style I fill:#90EE90
style K fill:#E3F2FD
Service-to-Service Authentication
sequenceDiagram
participant Web as Web Service
participant SSM as SSM Parameter Store
participant API as API Service
participant APIStore as API's SSM Store
Note over Web,APIStore: Service Token Setup
Web->>SSM: Get SERVICE_TOKEN_FOR_API
SSM-->>Web: Return token value
Web->>Web: Cache token in memory
Note over Web,APIStore: API Request
Web->>API: Request with token header
API->>APIStore: Get SERVICE_TOKEN_SECRET
APIStore-->>API: Return secret
API->>API: Validate token matches secret
API-->>Web: Response (if valid)
Note over Web,API: If tokens don't match, request denied
Related Documentation
- Environment Variables - Local development configuration
- SSM Parameters - Production parameter storage
- Django Settings - Settings organization
Next Steps
- Audit your current codebase for hardcoded secrets
- Implement pre-commit hooks to prevent future commits
- Document your rotation schedule for all secrets
- Create emergency response runbook for secret compromise
- Set up CloudTrail monitoring for SSM access
- Schedule quarterly access reviews
- Test your secret rotation procedures in staging
Secrets Management Philosophy
Assume eventual compromise. Design your secrets management strategy to minimize damage when (not if) secrets are exposed. Rotation, least privilege, and defense in depth are your primary tools.
Common Mistakes
- Using production secrets in development
- Putting secret values directly in ECS task definition environment blocks
- Never rotating secrets
- Logging secret values
- Giving all services access to all secrets
- Forgetting to revoke access for departed team members
Critical Security Requirements
- Never commit secrets to version control - ever
- Always use SecureString for secrets in SSM
- Always use TLS for secret transmission
- Always rotate compromised secrets immediately
- Always use least privilege IAM policies
- Always audit who has access to secrets