AWS Well-Architected Framework — The 6 Pillars

The six pillars AWS uses to evaluate cloud architectures: operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability.

Overview

The AWS Well-Architected Framework is a structured set of architectural best practices organized around six pillars. Each pillar addresses a distinct concern — operations, security, reliability, performance, cost, and environmental impact — and provides design principles, specific questions to evaluate your architecture against, and recommended AWS services and patterns.

The Framework is not a prescriptive blueprint. It is a checklist for evaluating trade-offs. No architecture perfectly satisfies all six pillars simultaneously. The process of working through the Framework surfaces gaps, prioritizes remediation, and forces explicit decisions about which trade-offs are acceptable for a given system.

AWS provides the Well-Architected Tool in the console, which codifies the Framework into a structured review process. This article covers the substance of each pillar and the key design decisions each one drives.


Pillar 1 — Operational Excellence

Operational excellence is the ability to run and monitor systems to deliver business value and to continuously improve processes and procedures. It is not a destination — it is a practice that evolves as the system and the team evolve.

Design Principles

Key Services and Tools

| Service | Role in Operational Excellence |
| --- | --- |
| AWS CloudFormation | Infrastructure as code. Define stacks in JSON or YAML. Stack updates are controlled, reviewable, and reversible. Drift detection identifies out-of-band changes. |
| AWS CDK | Infrastructure as code using languages such as TypeScript, Python, Java, or Go. Generates CloudFormation. Higher abstraction and better IDE support than raw CloudFormation. |
| Amazon CloudWatch | Metrics, logs, alarms, and dashboards. The observability backbone. CloudWatch Alarms trigger Auto Scaling, SNS notifications, and Systems Manager automation. |
| AWS X-Ray | Distributed tracing. Visualizes request flows through microservices. Identifies latency bottlenecks and errors across service boundaries. |
| AWS Systems Manager | Operational toolbox: Run Command, patch management (Patch Manager), session management (Session Manager, which gives shell access without SSH keys), Parameter Store for configuration, and automation documents. |
| AWS Config | Records the configuration history of AWS resources. Config Rules flag non-compliant configurations, which can then be remediated automatically or queued for review. |
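
As a minimal sketch of the infrastructure-as-code idea in the table above, the following Python builds a CloudFormation template as a plain dictionary and serializes it to JSON. The logical IDs (`AlertTopic`, `HighCpuAlarm`), the choice of metric, and the 80% threshold are assumptions made for illustration, not recommendations from the Framework.

```python
import json


def build_template(cpu_threshold: float = 80.0) -> dict:
    """Return a CloudFormation template with an SNS topic and a CPU alarm.

    Logical IDs and the threshold are hypothetical, chosen for illustration.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative alarm stack (sketch, not production-ready)",
        "Resources": {
            "AlertTopic": {"Type": "AWS::SNS::Topic"},
            "HighCpuAlarm": {
                "Type": "AWS::CloudWatch::Alarm",
                "Properties": {
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Statistic": "Average",
                    "Period": 300,            # seconds per datapoint
                    "EvaluationPeriods": 2,   # consecutive breaching periods
                    "Threshold": cpu_threshold,
                    "ComparisonOperator": "GreaterThanThreshold",
                    # Ref resolves to the topic ARN when the stack is created.
                    "AlarmActions": [{"Ref": "AlertTopic"}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template(), indent=2))
```

Because the template lives in code, a change to the threshold goes through the same review and version history as any other change, which is the property the CloudFormation and CDK rows describe.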

Example Design Decisions


Pillar 2 — Security

Security encompasses identity and access management, detection, infrastructure protection, data protection, and incident response. Security must be addressed at every layer, not as a perimeter around the outside.

Design Principles

Key Services and Tools

| Service | Role in Security |
| --- | --- |
| AWS KMS | Managed encryption key service. Create and rotate customer managed keys. Enforces key usage policies. Integrated with S3, EBS, RDS, Secrets Manager, and most storage services. |
| AWS Certificate Manager (ACM) | Provision, manage, and auto-renew TLS certificates for ALBs, CloudFront, and API Gateway, including custom domains. Public certificates are free when used with integrated AWS services. |
| Amazon GuardDuty | Continuous threat detection using ML on CloudTrail, VPC Flow Logs, and DNS logs. Identifies compromised instances, unauthorized API calls, unusual network patterns, and cryptocurrency mining. |
| AWS Security Hub | Aggregates findings from GuardDuty, Inspector, Macie, Firewall Manager, and third-party tools. Provides compliance posture scoring against CIS AWS Foundations Benchmark, PCI DSS, and NIST standards. |
| AWS WAF | Web application firewall at the ALB, CloudFront, or API Gateway layer. Blocks common OWASP Top 10 attack patterns such as SQL injection and XSS, filters bad bots, and supports custom rule sets. AWS Managed Rules provide baseline coverage for known threats. |
| Amazon Inspector | Automated vulnerability scanning for EC2 instances (OS packages and CVEs), ECR container images, and Lambda functions. Continuously reassesses as new vulnerabilities are published. |
| Amazon Macie | ML-based sensitive data discovery in S3. Identifies PII, financial data, and credentials. Generates findings for unencrypted or publicly accessible buckets containing sensitive data. |
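
Identity and access management is the first concern this pillar lists, and one common check is whether a policy grants more than it needs to. The sketch below scans an IAM policy document for `Allow` statements that use a bare `*` for `Action` or `Resource`. The function name, the example policy, and the bucket name are hypothetical; this is a simple illustration, not a replacement for a policy analysis service.

```python
from typing import Any, List


def _as_list(value: Any) -> list:
    """IAM accepts either a single value or a list for these fields."""
    return value if isinstance(value, list) else [value]


def find_wildcard_statements(policy: dict) -> List[int]:
    """Return indexes of Allow statements whose Action or Resource is '*'."""
    flagged = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if "*" in actions or "*" in resources:
            flagged.append(index)
    return flagged


# Hypothetical policy: the bucket name and actions are placeholders.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}

if __name__ == "__main__":
    print(find_wildcard_statements(example_policy))
```

The check only matches a bare `*`; scoped patterns such as `arn:aws:s3:::example-bucket/*` are left alone on the assumption that they are intentional.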

Example Design Decisions


Pillar 3 — Reliability

Reliability is the ability of a workload to perform its intended function correctly and consistently when expected to. It encompasses recovery from infrastructure or service disruptions, the ability to scale to meet demand, and the ability to mitigate disruptions.

Design Principles

Key Services and Tools

| Service | Role in Reliability |
| --- | --- |
| EC2 Auto Scaling | Replaces failed instances, scales out under load, and scales in during low demand. Works with Launch Templates and Target Tracking policies for automated capacity management. |
| Multi-AZ deployments | RDS, ElastiCache, and OpenSearch Service Multi-AZ configurations keep standby or replica nodes in another Availability Zone and fail over automatically if an AZ fails. RDS Multi-AZ replicates synchronously to its standby; ElastiCache replication is asynchronous, so a failover can lose the most recent writes. |
| Amazon Route 53 | Health checks monitor endpoints and trigger DNS failover. Weighted, latency-based, failover, and geolocation routing policies provide flexible traffic management. |
| AWS Backup | Centralized backup service covering EBS, RDS, DynamoDB, EFS, FSx, and EC2. Define backup plans with retention policies. Cross-region and cross-account backup copies support disaster recovery. |
| Elastic Load Balancing | Distributes traffic across healthy targets in multiple AZs. Connection draining (called deregistration delay on ALB and NLB) lets in-flight requests finish when a target is deregistered. |
| Amazon SQS | Decouples producers from consumers. Messages are durably stored (up to 14 days) so a consumer failure does not lose data. Dead-letter queues capture unprocessable messages for investigation. |
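
To make the SQS row concrete, here is a small sketch that builds the attribute map for a queue with the maximum 14-day retention and a dead-letter queue. The attribute names `MessageRetentionPeriod` and `RedrivePolicy` are the ones SQS uses; the function name, the default `max_receive_count` of 5, and any ARN passed in are assumptions for illustration.

```python
import json

MAX_RETENTION_SECONDS = 14 * 24 * 60 * 60  # SQS maximum retention: 14 days


def queue_attributes(dlq_arn: str, max_receive_count: int = 5,
                     retention_seconds: int = MAX_RETENTION_SECONDS) -> dict:
    """Build the Attributes map for a source queue with a dead-letter queue.

    SQS expects attribute values as strings, and RedrivePolicy as a
    JSON-encoded string.
    """
    if not 60 <= retention_seconds <= MAX_RETENTION_SECONDS:
        raise ValueError("retention must be between 60 seconds and 14 days")
    return {
        "MessageRetentionPeriod": str(retention_seconds),
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            # After this many failed receives, SQS moves the message to the DLQ.
            "maxReceiveCount": max_receive_count,
        }),
    }
```

The returned dictionary has the shape that boto3's `create_queue(QueueName=..., Attributes=...)` accepts, so the same function can feed a real deployment script or a unit test.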

Example Design Decisions


Pillar 4 — Performance Efficiency

Performance efficiency is the ability to use computing resources efficiently to meet system requirements and to maintain that efficiency as demand changes and technologies evolve.

Design Principles

Key Services and Tools

| Service | Role in Performance Efficiency |
| --- | --- |
| Amazon CloudFront | CDN with a global edge network. Serves cached content from edge locations close to users, so cache-hit latency does not depend on the origin region. Reduces latency for static assets, SPAs, and cacheable API responses. |
| Amazon ElastiCache | Managed in-memory caching. Redis (ElastiCache for Redis) supports data structures, pub/sub, Lua scripting, and persistence. Memcached is simpler and multi-threaded. Reduces database load for read-heavy workloads. |
| Amazon Aurora | MySQL- and PostgreSQL-compatible relational database. AWS cites up to 5× the throughput of MySQL and 3× the throughput of PostgreSQL. Distributed storage keeps 6 copies across 3 AZs. Aurora Serverless v2 auto-scales in fine-grained increments. |
| AWS Lambda | Serverless compute. Scales automatically from zero to thousands of concurrent executions. Eliminates idle capacity cost. Compute time is billed in 1 ms increments. |
| AWS Graviton | ARM-based processors designed by AWS. Available for EC2 (M7g, C7g, R7g families), Lambda, RDS, ElastiCache, and others. AWS cites up to 40% better price/performance than comparable x86 instances, depending on the workload. |
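
The ElastiCache row says caching reduces database load for read-heavy workloads. A minimal sketch of the cache-aside read path shows why: only a miss reaches the database. `TtlCache` is an in-process stand-in written for this example, not a client for ElastiCache, and the `load` callback represents whatever database query the application would run.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Tiny in-process stand-in for a cache such as ElastiCache."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._items[key] = (self._clock() + self._ttl, value)


def cache_aside_get(cache: TtlCache, key: str,
                    load: Callable[[str], str]) -> str:
    """Try the cache first; on a miss, load from the source and populate."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load(key)       # the expensive call, e.g. a database query
    cache.set(key, value)
    return value
```

The TTL bounds how stale a cached value can be; choosing it is a trade-off between database load and freshness that depends on the workload.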

Example Design Decisions


Pillar 5 — Cost Optimization

Cost optimization is the ability to run systems that deliver business value at the lowest price point. This is not simply about spending less — it is about understanding where money goes, eliminating waste, and choosing the right consumption model for each workload.

Design Principles

Key Services and Tools

| Service | Role in Cost Optimization |
| --- | --- |
| AWS Savings Plans | Commit to a fixed hourly spend ($/hr) for 1 or 3 years in exchange for a discount: up to 66% with Compute Savings Plans and up to 72% with EC2 Instance Savings Plans. Compute Savings Plans apply to EC2, Lambda, and Fargate regardless of instance family, size, or region. More flexible than Reserved Instances. |
| Spot Instances | Spare EC2 capacity at up to 90% discount. Interruptible with a 2-minute warning. Best for fault-tolerant, stateless, batch workloads. Use Spot Fleet or Auto Scaling Groups with mixed instance policies to maintain capacity across multiple Spot pools. |
| AWS Cost Explorer | Visualize cost and usage over time at monthly or daily granularity, with hourly granularity available as an opt-in setting. Filter by service, region, tag, or linked account. Cost Explorer also provides Savings Plans and Reserved Instance recommendations based on actual usage history. |
| AWS Trusted Advisor | Checks across cost optimization, performance, security, fault tolerance, and service limits. Cost optimization checks include idle load balancers, underutilized EC2 instances, unassociated Elastic IPs, and idle RDS DB instances. |
| AWS Compute Optimizer | ML-based recommendations for right-sizing EC2 instances, ECS services on Fargate, Lambda function memory, EBS volumes, and Auto Scaling Groups. Analyzes 14 days of CloudWatch metrics by default and recommends configurations with projected cost and performance impact. |
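
The Savings Plans row describes an hourly commitment, and the arithmetic behind it is worth working through once. The sketch below uses a deliberately simplified model: a single uniform discount applies to all covered usage, whereas real Savings Plans rates vary by instance type and region. The function and parameter names are hypothetical.

```python
def hourly_cost(on_demand_usage: float, commitment: float,
                discount: float) -> float:
    """Hourly bill in dollars under a simplified Savings Plan model.

    on_demand_usage: what the hour's usage would cost at on-demand rates.
    commitment:      committed spend in $/hr, paid whether or not it is used.
    discount:        fractional discount on covered usage (0.30 means 30%).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    # Each committed dollar pays for 1 / (1 - discount) dollars of
    # on-demand-equivalent usage.
    covered = commitment / (1 - discount)
    # Usage beyond what the commitment covers is billed at on-demand rates.
    overflow = max(0.0, on_demand_usage - covered)
    return commitment + overflow
```

With a 50% discount, a $5/hr commitment fully covers $10/hr of on-demand usage. If usage drops to $2/hr the bill is still $5, which is why commitments are usually sized against steady baseline usage and not against peaks.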

Example Design Decisions


Pillar 6 — Sustainability

Sustainability is the newest pillar, addressing the environmental impact of running cloud workloads. The goal is to minimize the energy consumption and carbon footprint of cloud infrastructure without sacrificing the other five pillars.

Design Principles

Key Services and Tools

| Service | Role in Sustainability |
| --- | --- |
| AWS Graviton | ARM-based processors. AWS states that Graviton3-based instances use up to 60% less energy than comparable EC2 instances for the same performance. |
| AWS Compute Optimizer | Right-sizing recommendations reduce idle compute capacity. Fewer running instances means lower energy consumption. |
| Amazon S3 Intelligent-Tiering | Automatically moves objects that have not been accessed to lower-cost access tiers. Colder tiers are generally expected to use less energy per byte stored than hot storage. |
| Managed services broadly | RDS, Aurora, DynamoDB, Lambda, and Fargate all run at higher infrastructure utilization rates than equivalent self-managed deployments, translating to less energy per unit of work. |
| AWS Customer Carbon Footprint Tool | Monthly carbon emissions data by service, region, and account. Available in the AWS Billing and Cost Management console. |
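
The Compute Optimizer row ties sustainability to removing idle capacity. As a rough sketch of what a right-sizing check looks at, the code below flags an instance whose 95th-percentile CPU utilization stays under a threshold. The 40% threshold and the nearest-rank percentile are assumptions chosen for this example; they are not how Compute Optimizer itself produces recommendations.

```python
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sequence (pct is 1..100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def is_overprovisioned(cpu_percent: Sequence[float],
                       threshold: float = 40.0) -> bool:
    """True when p95 CPU utilization is below the (hypothetical) threshold."""
    return percentile(cpu_percent, 95) < threshold
```

Using a high percentile instead of the average keeps short bursts from being sized away: an instance that idles most of the day but spikes regularly will not be flagged.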

Example Design Decisions


Pillar Summary

| Pillar | Core Question | Key AWS Services | Example Design Decision |
| --- | --- | --- | --- |
| Operational Excellence | Can we run and improve this system sustainably? | CloudFormation, CloudWatch, X-Ray, SSM, Config | All infrastructure defined as CDK code; runbooks as SSM documents |
| Security | Is the data and system protected at every layer? | IAM, KMS, GuardDuty, Security Hub, WAF, Inspector | GuardDuty + Security Hub enabled in all regions; WAF on all public ALBs |
| Reliability | Will it recover from failures without human intervention? | Auto Scaling, Multi-AZ, Route 53, AWS Backup, SQS | Three-AZ deployment with automatic EC2 and RDS failover |
| Performance Efficiency | Are we using the right tools at the right size? | CloudFront, ElastiCache, Aurora, Lambda, Graviton | Migrate batch jobs to Graviton3; cache read-heavy data in ElastiCache |
| Cost Optimization | Are we spending efficiently and attributing cost correctly? | Savings Plans, Spot, Cost Explorer, Trusted Advisor, Compute Optimizer | 70% Savings Plan coverage; Spot for batch; enforced resource tagging |
| Sustainability | Are we minimizing environmental impact? | Graviton, Compute Optimizer, managed services, Customer Carbon Footprint Tool | All compute migrated to Graviton3; self-managed clusters replaced with managed services |

The AWS Well-Architected Tool

The AWS Well-Architected Tool is a free self-service questionnaire in the AWS Management Console that walks through the Framework systematically.

How a review works:

  1. Define a workload: name it, describe it, select the applicable lenses (the standard Framework, plus specialty lenses for serverless, SaaS, IoT, analytics, ML, financial services, and government).
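  2. Answer questions for each pillar. Each question is phrased as “How do you [achieve a specific outcome]?” and lists the relevant best practices as checkboxes. Select the practices you follow, or mark “None of these” or that the question does not apply to the workload.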
  3. The tool generates a gap report: which best practices are not implemented, categorized by risk (High, Medium).
  4. Improvement plan: recommendations for each gap, linked to documentation and relevant AWS services.
  5. Track progress over time by saving milestone snapshots of each review.

The Well-Architected Tool is most valuable when used as a recurring practice — not a one-time review. Run it quarterly on critical workloads, and whenever a workload undergoes significant architectural change.
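
One way to act on the gap report from step 3 is to turn each finding into a record and sort the backlog. The sketch below orders findings by risk and then by estimated effort. The `Finding` fields and the 1 to 5 effort scale are hypothetical and are not part of the Well-Architected Tool's own data model.

```python
from dataclasses import dataclass
from typing import List

RISK_ORDER = {"HIGH": 0, "MEDIUM": 1}


@dataclass(frozen=True)
class Finding:
    pillar: str
    practice: str
    risk: str    # "HIGH" or "MEDIUM", mirroring the tool's risk categories
    effort: int  # hypothetical 1 (small) to 5 (large) estimate by the team


def prioritize(findings: List[Finding]) -> List[Finding]:
    """Order findings by risk first, then by lowest implementation effort."""
    return sorted(findings, key=lambda f: (RISK_ORDER[f.risk], f.effort))


if __name__ == "__main__":
    backlog = prioritize([
        Finding("Cost Optimization", "No Savings Plans", "MEDIUM", 1),
        Finding("Reliability", "No Multi-AZ for RDS", "HIGH", 3),
        Finding("Security", "GuardDuty not enabled", "HIGH", 1),
    ])
    for item in backlog:
        print(item.risk, item.pillar, item.practice)
```

Sorting high-risk, low-effort items to the top gives the team a short list of changes to close before the next milestone snapshot.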

[Diagram: review cycle between the engineering team and the Well-Architected Tool]

  1. Define workload, select lenses. Example scope: production order service, multi-AZ, 3 tiers.
  2. Answer pillar questions (roughly 50-70 questions across the 6 pillars).
  3. Gap report generated. Example high-risk findings: no GuardDuty, no Multi-AZ for RDS, no Savings Plans.
  4. Create tickets for each high-risk finding, prioritized by risk and implementation effort.
  5. Save a milestone snapshot as the baseline for measuring progress in the next review.
  6. Quarterly re-review: confirm improvements are closed and check for new gaps.

AWS partners can also conduct Well-Architected Reviews as a formal engagement, providing external perspective and access to AWS field team support for addressing findings.