Secrets Management

Overview

This document establishes the theoretical foundations and practical strategies for managing secrets in production Django applications. Secrets management encompasses the entire lifecycle: generation, storage, access, rotation, and emergency response. The approach detailed here reflects production best practices for small development teams deploying to AWS ECS.

Secrets Management Strategy

What Is a Secret?

A secret is any piece of information that, if disclosed, could compromise system security, data integrity, or privacy. The classification of what constitutes a secret is broader than many developers initially assume.

Obvious Secrets:

  • Database passwords
  • API keys and tokens
  • OAuth client secrets
  • Encryption keys
  • Private keys (SSH, SSL/TLS)

Less Obvious Secrets:

  • Django SECRET_KEY (session signing, CSRF protection)
  • Session cookies with sensitive data
  • JWT signing keys
  • Third-party service tokens (SendGrid, Auth0)
  • Webhook signing secrets
  • Internal service-to-service tokens

Not Secrets:

  • Public API endpoints
  • Database hostnames (without credentials)
  • Public OAuth client IDs
  • Feature flag names (their values can still be secrets if they gate security behavior)
  • Application configuration (timeout values, pool sizes)

Theory: The test for whether something is a secret: "If this were published on GitHub, would it compromise security or privacy?" If yes, it's a secret.

Secrets Lifecycle

Secrets have a lifecycle that must be managed:

Generation → Storage → Access → Rotation → Revocation
  1. Generation: Creating cryptographically secure secrets
  2. Storage: Encrypting and storing secrets securely
  3. Access: Granting applications access with minimal privileges
  4. Rotation: Periodically changing secrets to limit exposure window
  5. Revocation: Immediately invalidating compromised secrets

Theory: A failure in any single lifecycle phase is enough to cause a breach. Weak generation creates guessable secrets. Poor storage exposes them at rest. Overly broad access gives attackers lateral movement. Lack of rotation means old compromises remain exploitable. Slow revocation allows attackers to maintain access.

Defense in Depth

No single security mechanism is perfect. Effective secrets management employs multiple overlapping layers:

  1. Never in version control - Secrets don't belong in Git
  2. Encrypted at rest - SecureString in SSM, KMS encryption
  3. Encrypted in transit - TLS for all network communication
  4. Least privilege access - IAM policies limit who/what can access
  5. Audit logging - CloudTrail tracks all secret access
  6. Time-limited exposure - Secrets rotate regularly
  7. Application isolation - Secrets loaded once at startup, not persisted

Theory: Defense in depth assumes any single layer can fail. Even if secrets leak from one layer (e.g., memory dump), other layers (encryption, access controls) limit damage. The goal is to make secret compromise require multiple simultaneous failures.

SSM Parameter Store vs AWS Secrets Manager

Architectural Differences

Both services store secrets, but they serve different use cases with different features:

SSM Parameter Store:

  • Design goal: General-purpose configuration and secrets storage
  • Pricing: Free for standard parameters (up to 10,000)
  • Features: Simple key-value, versioning, hierarchical names
  • Rotation: Manual (application-managed)
  • Best for: Static configuration, service tokens, API keys

AWS Secrets Manager:

  • Design goal: Database credentials and auto-rotating secrets
  • Pricing: $0.40/secret/month + $0.05/10k API calls
  • Features: JSON secrets, automatic rotation, RDS integration
  • Rotation: Automatic with Lambda functions
  • Best for: Database passwords, OAuth tokens, rotating credentials

Theory: SSM Parameter Store is a general-purpose configuration store that happens to support encrypted secrets. Secrets Manager is purpose-built for secret lifecycle management with first-class rotation support. Choose based on whether you need automatic rotation.

When to Use Each

Use SSM Parameter Store when:

  • Secrets don't need automatic rotation
  • You manage rotation through deployment
  • Cost is a primary concern
  • You need hierarchical organization
  • Secrets are relatively static (API keys, service tokens)

Use AWS Secrets Manager when:

  • Database credentials should rotate automatically
  • OAuth tokens use refresh flows
  • You want built-in RDS integration
  • Compliance requires automatic rotation
  • Secrets change frequently

Hybrid Approach (recommended for small teams):

  • SSM Parameter Store for most secrets (Django SECRET_KEY, API keys)
  • Secrets Manager for database credentials (if using automatic rotation)
  • Consistent access patterns (boto3 works for both)

Theory: For small teams on Django/ECS, SSM Parameter Store is usually sufficient. Database password rotation happens during deployment (new password, restart tasks). The cost savings are significant (thousands of dollars annually for large parameter sets). Add Secrets Manager selectively when automatic rotation provides clear value.

Cost Comparison Example

Scenario: 50 secrets, 10 million API calls/month

SSM Parameter Store:

  • Parameter storage: $0 (free tier covers 10,000 standard parameters)
  • API calls: $0 (free tier covers 40 standard API calls/second)
  • Total: $0/month

AWS Secrets Manager:

  • Secret storage: 50 × $0.40 = $20/month
  • API calls: 10,000,000 ÷ 10,000 × $0.05 = $50/month
  • Total: $70/month

Theory: At scale, SSM Parameter Store's free tier represents substantial savings. For a small team, $70/month may be acceptable for the convenience of automatic rotation. Evaluate based on your rotation requirements, not just cost.
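
The arithmetic above can be reproduced for other workloads with a small helper. This is a minimal sketch that hard-codes the prices quoted in this document; the function name and constants are illustrative, and current AWS pricing should be checked before relying on the result.

```python
# Prices are the figures quoted in this document, not fetched from AWS.
SECRET_PRICE_PER_MONTH = 0.40    # USD per secret per month
API_PRICE_PER_10K_CALLS = 0.05   # USD per 10,000 API calls


def secrets_manager_monthly_cost(num_secrets: int, api_calls: int) -> float:
    """Estimate the monthly Secrets Manager bill in USD."""
    storage = num_secrets * SECRET_PRICE_PER_MONTH
    api = (api_calls / 10_000) * API_PRICE_PER_10K_CALLS
    return round(storage + api, 2)


if __name__ == "__main__":
    # The scenario above: 50 secrets, 10 million API calls per month.
    print(secrets_manager_monthly_cost(50, 10_000_000))
```

Caching secrets in memory after startup reduces the API-call term, which dominates the total in this scenario.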

Decision Matrix

graph TD
    A[Need to store secret] --> B{Requires automatic rotation?}
    B -->|Yes| C{Database credential?}
    B -->|No| D[SSM Parameter Store]

    C -->|Yes| E[Secrets Manager with RDS integration]
    C -->|No| F{High rotation frequency?}

    F -->|Yes - daily/weekly| G[Secrets Manager with Lambda]
    F -->|No - monthly/on-deployment| H[SSM Parameter Store]

    D --> I[Cost: $0]
    H --> I
    E --> J[Cost: $0.40/secret/month]
    G --> J

    style I fill:#90EE90
    style J fill:#FFE4B5

Rotation Policies

Why Rotate Secrets?

Secret rotation limits the window of exposure if a secret is compromised. An attacker who gains access to a secret has limited time to exploit it before rotation invalidates their access.

Theory: Security assumes eventual compromise. The question is not "if" but "when" secrets are exposed. Rotation reduces the value of old compromises. A secret compromised 6 months ago is worthless if rotated monthly.

Rotation Frequency Guidelines

Different secret types have different rotation schedules:

High-Frequency Rotation (Weekly-Monthly):

  • Database passwords (if using automatic rotation)
  • OAuth access tokens
  • Service-to-service authentication tokens
  • Secrets exposed to many systems

Medium-Frequency Rotation (Quarterly-Biannually):

  • API keys for third-party services
  • Django SECRET_KEY (complex; invalidates existing sessions unless the old key is kept as a fallback)
  • Application service tokens
  • Webhook signing secrets

Low-Frequency Rotation (Annually or on events):

  • Private keys for SSL/TLS (certificate renewal)
  • Root encryption keys
  • Secrets with high change cost
  • Secrets requiring coordination across systems

Event-Driven Rotation (Immediate):

  • Known or suspected compromise
  • Employee departure
  • Service breach notification
  • Audit findings

Theory: Rotation frequency balances security benefit against operational cost. More frequent rotation reduces exposure window but increases complexity and risk of outages. The goal is to rotate often enough to limit damage but not so often that rotation itself becomes unreliable.
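
A rotation schedule is easier to follow when something checks it. The sketch below flags secrets that are past a maximum age; the tier names and day counts are illustrative choices within the ranges above, not values prescribed by this document.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Illustrative maximum ages, picked from the ranges in the guidelines above.
MAX_AGE_DAYS = {"high": 30, "medium": 180, "low": 365}


def rotation_due(last_rotated: datetime, tier: str,
                 now: Optional[datetime] = None) -> bool:
    """Return True when a secret in the given tier has reached its maximum age."""
    now = now or datetime.now(timezone.utc)
    return now - last_rotated >= timedelta(days=MAX_AGE_DAYS[tier])
```

In practice the `last_rotated` timestamp could come from the parameter's last-modified date, so a scheduled job can report overdue secrets without keeping a separate inventory.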

Rotation Strategies

Zero-Downtime Rotation Pattern:

1. Generate new secret
2. Deploy new secret alongside old secret
3. Update application to use new secret
4. Verify application works with new secret
5. Remove old secret

Theory: This pattern ensures continuous availability during rotation. Both old and new secrets are valid during transition. Only after confirming the new secret works is the old one removed.

Implementation:

# Django 4.1+ supports dual secrets during rotation via SECRET_KEY_FALLBACKS
SECRET_KEY = pconfig.get_param('DJANGO_SECRET_KEY_NEW')  # Signs all new values

SECRET_KEY_FALLBACKS = [
    pconfig.get_param('DJANGO_SECRET_KEY_OLD'),  # Still accepted when verifying
]

# Django's signing code checks SECRET_KEY first, then each fallback in order,
# so sessions signed with the old key remain valid during the transition.
# Remove the old key from SECRET_KEY_FALLBACKS once rotation is verified.
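
The try-new-then-old idea can also be shown with only the standard library. This is a conceptual sketch using HMAC-SHA256, not Django's actual signing implementation; the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional, Sequence


def sign(value: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of value under key."""
    return hmac.new(key, value, hashlib.sha256).hexdigest()


def verify_with_fallbacks(value: bytes, signature: str,
                          keys: Sequence[bytes]) -> Optional[int]:
    """Return the index of the first key that validates the signature, else None.

    Put the new key first so that index 0 means "already on the new key".
    """
    for index, key in enumerate(keys):
        # compare_digest avoids leaking information through timing differences.
        if hmac.compare_digest(sign(value, key), signature):
            return index
    return None
```

A non-zero index means the value was signed with an old key, which is a natural point to re-sign it with the new one.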

Database Password Rotation:

  1. Create new password in database
  2. Store new password in SSM/Secrets Manager
  3. Deploy application update (reads new password)
  4. Verify application connectivity
  5. Revoke old password from database

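Theory: Database rotation is more complex because it involves two systems (the secret store and the database). Both the old and the new credentials must remain valid during the transition, for example by alternating between two database users, so that running tasks keep their connectivity until they are redeployed.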

Automated vs Manual Rotation

Automated Rotation (Secrets Manager):

  • Lambda function generates new secret
  • Lambda updates database password
  • Lambda updates secret in Secrets Manager
  • Application automatically uses new value
  • No human intervention required

Manual Rotation (SSM Parameter Store):

  • Human generates new secret
  • Human updates database/service
  • Human updates SSM parameter
  • Deploy application to use new secret
  • Human verifies rotation succeeded

Theory: Automated rotation reduces human error and ensures rotation happens on schedule. Manual rotation provides more control and suits less frequent rotation schedules. For small teams, manual rotation during deployments is often sufficient.
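
The "generate" and "update SSM parameter" steps of manual rotation are easy to script. The sketch below assumes `ssm_client` is a boto3 SSM client (for example `boto3.client("ssm")`); the function name is hypothetical, and updating the database or third-party service remains a separate step.

```python
import secrets


def rotate_parameter(ssm_client, name: str, nbytes: int = 48) -> int:
    """Write a freshly generated value to a SecureString parameter.

    Returns the new parameter version reported by SSM.
    """
    new_value = secrets.token_urlsafe(nbytes)
    response = ssm_client.put_parameter(
        Name=name,
        Value=new_value,
        Type="SecureString",
        Overwrite=True,  # Creates a new version of the existing parameter
    )
    return response["Version"]
```

Passing the client in as an argument keeps the helper testable and lets the same code target LocalStack or production.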

Access Control Patterns

Principle of Least Privilege

Every application and service should have access to only the secrets it needs, and no more.

Theory: Least privilege limits lateral movement in case of compromise. If an attacker compromises the web application, they shouldn't automatically gain access to database admin credentials, billing API keys, or other services' secrets.

IAM Role-Based Access

ECS Task Role Pattern:

Each ECS service gets its own task role with specific parameter access:

// Web application task role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ssm:GetParameter*"],
      "Resource": [
        "arn:aws:ssm:*:*:parameter/prod/web/*",
        "arn:aws:ssm:*:*:parameter/prod/shared/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt"],
      "Resource": "arn:aws:kms:*:*:key/web-kms-key-id"
    }
  ]
}

// Background worker task role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ssm:GetParameter*"],
      "Resource": [
        "arn:aws:ssm:*:*:parameter/prod/worker/*",
        "arn:aws:ssm:*:*:parameter/prod/shared/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt"],
      "Resource": "arn:aws:kms:*:*:key/worker-kms-key-id"
    }
  ]
}

Theory: Each service has its own namespace (/prod/web/*, /prod/worker/*) plus access to shared resources (/prod/shared/*). If the web app is compromised, workers' secrets remain protected.

Path-Based Organization

Organize parameters to enable path-based access control:

/prod/web/          ← Web app secrets
/prod/worker/       ← Background worker secrets
/prod/shared/       ← Shared secrets (database, cache)
/prod/admin/        ← Administrative secrets (higher security)

IAM Policy:

{
  "Effect": "Allow",
  "Action": ["ssm:GetParameter*"],
  "Resource": "arn:aws:ssm:*:*:parameter/prod/${service}/*"
}

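Theory: Path-based organization maps directly to IAM resource patterns. Here ${service} is a placeholder that your infrastructure templates fill in for each service; it is not a built-in IAM policy variable. Substituting the service name into the path gives every service the same policy shape with a different scope.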
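
One way to fill in the service name is to generate the policy document in code. This is a minimal sketch; the function name is hypothetical, it follows the /prod/<service>/ and /prod/shared/ layout used above, and it omits the kms:Decrypt statement for brevity.

```python
import json


def task_role_policy(service: str, env: str = "prod") -> dict:
    """Build a read-only SSM policy scoped to one service path plus the shared path."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ssm:GetParameter*"],
                "Resource": [
                    f"arn:aws:ssm:*:*:parameter/{env}/{service}/*",
                    f"arn:aws:ssm:*:*:parameter/{env}/shared/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(task_role_policy("worker"), indent=2))
```

Generating policies from one function keeps every service's access consistent and makes deviations easy to spot in review.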

Time-Limited Access

For human access to secrets (debugging, incident response), use time-limited credentials:

# Assume role with 1-hour session
aws sts assume-role \
  --role-arn "arn:aws:iam::123456789012:role/emergency-access" \
  --role-session-name "incident-response-2025-10-03" \
  --duration-seconds 3600

Theory: Time-limited credentials reduce the window of exposure if credentials are leaked. After expiration, the credentials are worthless. This is especially important for human access, which is more likely to be logged or shared insecurely.

Service-to-Service Authentication

When one service needs to call another, use dedicated service tokens:

/prod/web/SERVICE_TOKEN_FOR_API      ← Web app uses this to call API
/prod/api/SERVICE_TOKEN_SECRET       ← API validates against this

Theory: Service tokens are scoped to a single purpose (web → API). If compromised, an attacker can only impersonate that specific service relationship, not gain broader access.
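
On the receiving side, the token check should use a constant-time comparison so that response timing does not reveal how much of a guess was correct. A minimal sketch, with a hypothetical function name:

```python
import hmac


def token_is_valid(presented: str, expected: str) -> bool:
    """Compare the presented token to the expected secret in constant time."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Here `expected` would be the value the API loaded from its own parameter path, and `presented` the value taken from the request header.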

Local Development Approaches

The Development Secrets Problem

Production secrets must never be used in development. But development still needs working secrets for testing integrations.

Theory: Development secrets should be:

  1. Non-production values - Never real API keys or passwords
  2. Clearly marked - Obviously not production
  3. Low-security - No encryption needed locally
  4. Version-controlled structure - Parameter names in .env.example
  5. Locally generated - Each developer creates their own

Approach 1: LocalStack SSM

Use LocalStack to simulate SSM Parameter Store locally with development-safe values:

# scripts/init-localstack-ssm.py
import boto3

ssm = boto3.client(
    'ssm',
    endpoint_url='http://localhost:4566',
    region_name='us-east-1',  # boto3 needs a region when none is configured
    aws_access_key_id='test',
    aws_secret_access_key='test'
)

# Create development secrets
dev_secrets = {
    '/dev/django/SECRET_KEY': 'dev-secret-not-for-production',
    '/dev/database/PASSWORD': 'postgres',
    '/dev/auth0/CLIENT_SECRET': 'dev-auth0-secret',
    '/dev/sendgrid/API_KEY': 'SG.fake-development-key',
}

for name, value in dev_secrets.items():
    ssm.put_parameter(
        Name=name,
        Value=value,
        Type='SecureString',
        Overwrite=True
    )

Theory: LocalStack provides the same API as production SSM, allowing identical code to run in both environments. Development secrets are clearly fake but allow full testing of the integration logic.

Approach 2: .env Fallback

For secrets that don't need SSM integration locally, use .env files:

# settings/development.py
import os

# Try SSM first, fall back to environment variable
try:
    from poseidon.commons.config.ps_config import pconfig
    SECRET_KEY = pconfig.get_param('DJANGO_SECRET_KEY')
except Exception:
    # Fallback to .env for local development
    SECRET_KEY = os.environ.get('DJANGO_SECRET_KEY', 'dev-fallback-key')

Theory: This hybrid approach uses SSM when available (LocalStack) but falls back to .env for simpler local development. Developers can work without running LocalStack for quick testing.

Approach 3: Development-Specific Values

Some third-party services provide development modes or sandbox environments:

# settings/development.py

# Use Auth0 development tenant
AUTH0_DOMAIN = 'dev-tenant.auth0.com'  # Not production
AUTH0_CLIENT_ID = 'dev-client-id'       # Development app

# Use Stripe test mode
STRIPE_API_KEY = 'sk_test_...'          # Test mode key (no access to live data)

# Send email to the console instead of a real provider
EMAIL_BACKEND = 'django.core.mail.backends.console.EmailBackend'  # Logs only

Theory: When services provide development modes, use them. Test-mode keys cannot touch production data, and some services document them as safe to share; check the service documentation before committing one, and keep it in .env when in doubt. This reduces friction for new developers.

Local Secret Generation

For secrets that must be unique per developer:

# Generate Django SECRET_KEY locally
python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"

# Generate random token
python -c "import secrets; print(secrets.token_urlsafe(64))"

Add to .env:

DJANGO_SECRET_KEY=<generated-value>

Theory: Each developer generates their own local secrets. These are unique to their environment and never shared. The .env.example file documents what to generate but doesn't provide values.

Emergency Procedures

Suspected Secret Compromise

When you suspect a secret has been compromised:

Immediate Actions (First 15 Minutes):

  1. Confirm compromise - Verify the secret is actually exposed
  2. Assess scope - Which secret? Which systems use it?
  3. Revoke access - If possible, immediately invalidate the secret
  4. Alert team - Notify relevant team members
  5. Monitor for abuse - Check logs for unauthorized access

Short-Term Response (First Hour):

  1. Rotate the compromised secret - Generate new value
  2. Update all systems - Deploy new secret to all consumers
  3. Verify rotation - Confirm new secret works, old is revoked
  4. Review access logs - Look for evidence of exploitation
  5. Document incident - Record what happened and response

Long-Term Response (First Day):

  1. Root cause analysis - How was the secret exposed?
  2. Prevent recurrence - Add safeguards against similar exposure
  3. Broader audit - Check for other potential exposures
  4. Update procedures - Improve rotation and access controls
  5. Post-mortem - Share learnings with team

Theory: Speed is critical. The goal is to minimize the window where an attacker can exploit the compromised secret. Pre-defined procedures reduce decision time during incidents.

Secret Committed to Git

If secret was just committed but not pushed:

# Undo the last commit but keep its changes staged
git reset --soft HEAD~1

# Unstage the file that contains the secret
git reset HEAD <file-with-secret>

# Edit file to remove secret
# Re-commit without secret

If secret was pushed to remote:

# Immediately rotate the secret (most important)
# Assume it is compromised

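# Remove from Git history (disruptive - coordinate with team)
# Note: Git warns that filter-branch is deprecated; the separate
# git-filter-repo tool is the recommended replacement where available.
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch <file-with-secret>" \
  --prune-empty --tag-name-filter cat -- --all

# Force push branches and rewritten tags (requires team coordination)
git push origin --force --all
git push origin --force --tags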

Theory: Even after removing from history, assume the secret is compromised. GitHub/GitLab may cache commits. Anyone who pulled before the force push has the secret. Rotation is mandatory, not optional.

Database Credential Compromise

Immediate:

  1. Identify compromised credentials - Which user/password?
  2. Check active connections - Query database for active sessions
  3. Revoke old credentials - REVOKE ALL or drop user
  4. Create new credentials - New user with appropriate grants
  5. Update SSM Parameter - Store new password
  6. Rolling deployment - Update ECS tasks with new credentials

Database-Specific Commands:

-- MySQL: View active connections
SELECT * FROM information_schema.processlist WHERE user = 'compromised_user';

-- Kill suspicious sessions
KILL <process_id>;

-- Revoke access
REVOKE ALL PRIVILEGES ON *.* FROM 'compromised_user'@'%';
DROP USER 'compromised_user'@'%';

-- Create new user
CREATE USER 'new_user'@'%' IDENTIFIED BY 'new_secure_password';
GRANT SELECT, INSERT, UPDATE, DELETE ON mydb.* TO 'new_user'@'%';

Theory: Database compromise is especially serious because it provides direct data access. The priority is to revoke database access immediately, even before updating the application. Brief application downtime is preferable to ongoing data exfiltration.

API Key Compromise (Third-Party Services)

Immediate:

  1. Log into service dashboard - Auth0, SendGrid, Stripe, etc.
  2. Revoke compromised key - Immediately invalidate
  3. Generate new key - Create replacement
  4. Update SSM Parameter - Store new key
  5. Deploy update - Rolling deployment with new key
  6. Monitor service logs - Check for unauthorized usage

Service-Specific Procedures:

Auth0:
1. Dashboard → Applications → [Your App] → Settings
2. Rotate client secret
3. Copy new secret to SSM
4. Deploy application update

SendGrid:
1. Dashboard → Settings → API Keys
2. Delete compromised key
3. Create new key with same permissions
4. Update SSM parameter
5. Deploy

Stripe:
1. Dashboard → Developers → API Keys
2. Roll key (creates new, old remains valid temporarily)
3. Update application to use new key
4. After verification, delete old key

Theory: Third-party services often provide API key management interfaces. Use them. Don't try to be clever with database updates or configuration hacks. Follow the service's documented rotation procedure.

What to Never Commit

Explicit Blocklist

The following must never be committed to version control:

Environment Files:

  • .env
  • .env.local
  • .env.production
  • Any file named *.env except .env.example

Credential Files:

  • credentials.json
  • service-account-key.json
  • *.pem (private keys)
  • *.key (private keys)
  • *.p12 (certificate bundles)
  • id_rsa or id_ed25519 (SSH keys)

Configuration with Secrets:

  • config.production.yml (if contains secrets)
  • secrets.yml
  • database.yml (if contains passwords)
  • Any file with "secret", "credential", or "password" in the name

Application Secrets:

  • Django SECRET_KEY in settings files
  • Database passwords in settings
  • API keys in code
  • OAuth client secrets in code
  • Session signing keys

Gitignore Configuration

Essential .gitignore entries:

# Environment variables
.env
.env.*
!.env.example
*.env
*.local

# Credentials
credentials*.json
*-credentials.json
service-account*.json
*.pem
*.key
*.p12

# SSH keys
id_rsa
id_ed25519
*.ppk

# Secret directories
secrets/
.secrets/

# Database files with credentials
database.yml
config/database.yml

# LocalStack data
.localstack/
localstack-data/

# IDE-specific (may contain credentials)
.vscode/settings.json
.idea/dataSources.xml

Theory: .gitignore is the first line of defense but not foolproof. Files can be force-added. Pre-commit hooks provide automated scanning. Code review provides human oversight. Use all three layers.

Pre-commit Hook Configuration

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
        exclude: package-lock.json

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-added-large-files
      - id: detect-private-key
      - id: check-yaml
      - id: check-json

Theory: detect-secrets scans for high-entropy strings and known secret patterns. It generates a baseline file of expected findings (e.g., example secrets in tests). New secrets trigger hook failure, preventing commit.
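
The high-entropy idea can be illustrated with a few lines of Python. This is a simplified sketch of the general technique, not detect-secrets' actual implementation; the function names, threshold, and minimum length are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(text: str, threshold: float = 4.0, min_length: int = 20) -> bool:
    """Flag long strings whose characters are close to uniformly distributed."""
    return len(text) >= min_length and shannon_entropy(text) >= threshold
```

A heuristic like this also flags harmless strings such as hashes or long identifiers, which is why scanners keep a baseline of accepted findings.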

Code Review Checklist

When reviewing pull requests:

  • No hardcoded secrets in code
  • .env.example updated for new variables
  • .env.example contains only placeholders
  • Configuration files use environment variables
  • No database passwords in code
  • No API keys in code
  • Test fixtures use fake secrets
  • Comments don't contain real secrets

Theory: Automated tools catch common patterns, but humans catch context-specific issues. A developer might add a "temporary" API key for testing. Tools might miss it if it doesn't match secret patterns. Code review is the human layer in defense-in-depth.

Security Best Practices

Secret Generation

Use Cryptographically Secure Randomness:

# Good: Cryptographically secure
import secrets
secret_key = secrets.token_urlsafe(64)

# Bad: Not cryptographically secure
import random
secret_key = ''.join(random.choices('abc123', k=50))

Theory: Standard random number generators are predictable given enough output. Cryptographically secure generators (like secrets) are designed to be unpredictable even to attackers who observe many outputs.

Minimum Secret Length:

  • Django SECRET_KEY: 50+ characters
  • Database passwords: 32+ characters
  • API tokens: Service-specific (check documentation)
  • Private keys: 2048+ bits RSA, 256+ bits EC

Theory: Longer secrets have more entropy, making brute force attacks computationally infeasible. A 50-character random string drawn from a 62-character alphanumeric alphabet has about 298 bits of entropy (50 × log2 62 ≈ 297.7) - vastly more than needed to prevent brute force.
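
The entropy of a uniformly random string is its length multiplied by log2 of the alphabet size. A small sketch of that calculation, with a hypothetical function name:

```python
import math


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Entropy in bits of a uniformly random string over the given alphabet."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(round(entropy_bits(50, 62)))  # 50 characters, letters and digits
    print(round(entropy_bits(50, 50)))  # 50 characters, 50-symbol alphabet
```

The formula only holds when every character is chosen independently and uniformly, which is why a cryptographically secure generator matters more than raw length.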

Secret Storage

Never in Code:

# Bad: Hardcoded secret
SECRET_KEY = 'django-insecure-abc123xyz789'

# Good: Loaded from SSM Parameter Store at startup
SECRET_KEY = pconfig.get_param('DJANGO_SECRET_KEY')

Never as Plaintext Values in the Task Definition:

# Bad: Secret in task definition
environment:
  - name: SECRET_KEY
    value: "hardcoded-secret-here"

# Good: From SSM Parameter Store
secrets:
  - name: SECRET_KEY
    valueFrom: "arn:aws:ssm:region:account:parameter/prod/SECRET_KEY"

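Theory: Values under the environment key are stored in the task definition itself, so they are visible in CloudFormation, the console, and API responses. They're effectively plaintext. Use the secrets key so ECS fetches the value from SSM/Secrets Manager when the container starts. This fetch is performed with the task execution role, which therefore needs read access to the parameter (and kms:Decrypt for a customer-managed key).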

Secret Transmission

Always Use TLS:

  • API calls to SSM: TLS enforced by AWS
  • Database connections: Use SSL/TLS
  • Internal service calls: Use TLS even in VPC
  • Admin panels: HTTPS only

Theory: Secrets transmitted over plain HTTP can be intercepted via man-in-the-middle attacks. TLS encrypts the connection, preventing eavesdropping. This applies even within a VPC - defense in depth assumes network compromise.

Logging and Monitoring

Never Log Secrets:

# Bad: Logs secret
logger.info(f"Using API key: {api_key}")

# Good: Logs sanitized
logger.info("Using API key: ***REDACTED***")

# Better: Don't log at all
logger.info("API key configured successfully")

Sanitize Error Messages:

# Bad: Exception exposes secret
raise Exception(f"Auth failed with key: {api_key}")

# Good: Exception doesn't expose secret
raise Exception("Auth failed - check API key configuration")

Theory: Logs are often stored in plaintext, indexed by search systems, and accessible to many people. A secret in a log entry is a leaked secret. Log that secrets were configured, not their values.
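
As a safety net for accidental logging, a `logging.Filter` can scrub known secret values before records are written. This is a minimal sketch; the class name is hypothetical, and it only catches exact matches of the values it is given.

```python
import logging


class RedactSecretsFilter(logging.Filter):
    """Replace known secret values with a placeholder before a record is emitted."""

    def __init__(self, secret_values):
        super().__init__()
        # Skip empty strings: replacing "" would corrupt every message.
        self._secrets = [s for s in secret_values if s]

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # Applies %-style args first
        for secret in self._secrets:
            message = message.replace(secret, "***REDACTED***")
        record.msg = message
        record.args = ()  # Args are already merged into the message
        return True  # Always keep the record, only its text changes
```

Attach the filter to each handler with `handler.addFilter(...)`. It complements, rather than replaces, the rule of not logging secrets in the first place.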

Audit and Compliance

CloudTrail Logging:

  • Enable CloudTrail for SSM API calls
  • Monitor GetParameter calls for unusual patterns
  • Alert on access to sensitive parameter paths
  • Retain logs for compliance periods (often 1+ years)

Access Reviews:

  • Quarterly review of who/what has access to secrets
  • Remove access for departed team members
  • Audit service account permissions
  • Document why each principal needs access

Theory: Audit logs provide forensics after incidents and deter insider threats. Regular access reviews prevent permission creep - where principals accumulate unnecessary access over time.

Mermaid Diagrams

Secrets Lifecycle

graph LR
    A[Generate] --> B[Store Encrypted]
    B --> C[Grant Access]
    C --> D[Application Uses]
    D --> E{Rotation Event?}
    E -->|Time-based| F[Generate New]
    E -->|Compromise| F
    E -->|Normal use| D
    F --> G[Deploy New]
    G --> H[Revoke Old]
    H --> B

    style A fill:#E3F2FD
    style H fill:#FFEBEE

Access Control Layers

graph TD
    A[Application Request] --> B[IAM Task Role]
    B --> C{Has ssm:GetParameter?}
    C -->|No| D[Access Denied]
    C -->|Yes| E{Resource matches?}
    E -->|No| D
    E -->|Yes| F[SSM Parameter Store]
    F --> G{Parameter exists?}
    G -->|No| H[Not Found]
    G -->|Yes| I{Has kms:Decrypt?}
    I -->|No| D
    I -->|Yes| J[KMS Decrypt]
    J --> K[Return Secret]

    style K fill:#90EE90
    style D fill:#FFB6C1
    style H fill:#FFE4B5

Emergency Response Flow

graph TD
    A[Secret Compromise Detected] --> B[Immediate Revocation]
    B --> C[Generate New Secret]
    C --> D[Update SSM/Secrets Manager]
    D --> E[Deploy to All Systems]
    E --> F{All Systems Updated?}
    F -->|No| G[Continue Deployment]
    F -->|Yes| H[Verify New Secret Works]
    G --> F
    H --> I[Remove Old Secret]
    I --> J[Monitor for Abuse]
    J --> K[Post-Mortem Analysis]

    style A fill:#FFEBEE
    style I fill:#90EE90
    style K fill:#E3F2FD

Service-to-Service Authentication

sequenceDiagram
    participant Web as Web Service
    participant SSM as SSM Parameter Store
    participant API as API Service
    participant APIStore as API's SSM Store

    Note over Web,APIStore: Service Token Setup

    Web->>SSM: Get SERVICE_TOKEN_FOR_API
    SSM-->>Web: Return token value
    Web->>Web: Cache token in memory

    Note over Web,APIStore: API Request

    Web->>API: Request with token header
    API->>APIStore: Get SERVICE_TOKEN_SECRET
    APIStore-->>API: Return secret
    API->>API: Validate token matches secret
    API-->>Web: Response (if valid)

    Note over Web,API: If tokens don't match, request denied

Next Steps

  1. Audit your current codebase for hardcoded secrets
  2. Implement pre-commit hooks to prevent future commits
  3. Document your rotation schedule for all secrets
  4. Create emergency response runbook for secret compromise
  5. Set up CloudTrail monitoring for SSM access
  6. Schedule quarterly access reviews
  7. Test your secret rotation procedures in staging

Secrets Management Philosophy

Assume eventual compromise. Design your secrets management strategy to minimize damage when (not if) secrets are exposed. Rotation, least privilege, and defense in depth are your primary tools.

Common Mistakes

  • Using production secrets in development
  • Storing secrets as plaintext environment values in ECS task definitions
  • Never rotating secrets
  • Logging secret values
  • Giving all services access to all secrets
  • Forgetting to revoke access for departed team members

Critical Security Requirements

  • Never commit secrets to version control - ever
  • Always use SecureString for secrets in SSM
  • Always use TLS for secret transmission
  • Always rotate compromised secrets immediately
  • Always use least privilege IAM policies
  • Always audit who has access to secrets