Cost Optimization Guide for Bedrock Budgeteer
Overview
Comprehensive cost optimization strategies implemented in the production environment to minimize AWS costs while maintaining performance and reliability.
Production Architecture Cost Strategy
Focus
Predictable costs with auto-scaling performance for enterprise workloads
DynamoDB Optimization
- Billing Mode: PROVISIONED (predictable baseline costs)
- Auto-Scaling: Enabled with table-specific configurations
- Encryption: AWS-managed encryption (SSE) by default, optional customer-managed KMS
- Point-in-Time Recovery: Enabled (data protection)
- Log Retention: 6 months (compliance and troubleshooting)
Auto-Scaling Configuration
# Table-specific scaling limits
scaling_configs = {
"user_budgets": {
"min_read": 5, "max_read": 40,
"min_write": 5, "max_write": 40
},
"usage_tracking": {
"min_read": 10, "max_read": 100,
"min_write": 20, "max_write": 200 # Higher for usage events
},
"budget_alerts": {
"min_read": 3, "max_read": 25,
"min_write": 3, "max_write": 25
},
"audit_logs": {
"min_read": 10, "max_read": 100,
"min_write": 25, "max_write": 250 # Higher for audit events
}
}
Auto-Scaling Strategy
Performance Targets
- Target Utilization: 70% (optimal cost/performance balance)
- Scale-Out Cooldown: 60 seconds (quick response to load)
- Scale-In Cooldown: 60 seconds (prevent thrashing)
Scaling Behavior
- Read Capacity: Scales based on consumed read capacity units
- Write Capacity: Scales based on consumed write capacity units
- Global Secondary Indexes: Auto-scaled independently
Cost Benefits
- Baseline Costs: Predictable minimum capacity for budgeting
- Peak Handling: Automatic scaling for traffic spikes
- Cost Control: Maximum limits prevent runaway costs
- Efficiency: 70% target utilization optimizes cost per operation
Cost Monitoring and Alerts
System Budget
Production Environment: $5000/month
- No automatic cleanup (compliance requirement)
- Alert at 80% ($4000)
- Critical alert at 95% ($4750)
- Emergency controls at 100% ($5000)
Cost Allocation Tags
All resources tagged for cost tracking:
- Environment: production
- Component: data-storage, security, monitoring, event-ingestion, workflow-orchestration
- CostCenter: engineering-ops
- BillingProject: bedrock-budget-control
Implementation Details
DynamoDB Cost Optimization
# Production billing mode
def _get_billing_mode(self) -> dynamodb.BillingMode:
# Production environment uses provisioned capacity with auto-scaling
return dynamodb.BillingMode.PROVISIONED
# Auto-scaling always enabled for production
self._add_auto_scaling(table, table_type)
Point-in-Time Recovery
def _get_point_in_time_recovery(self) -> bool:
# Production environment always enables PITR for data protection
return True
Encryption Configuration
def _get_encryption_config(self) -> Dict[str, Any]:
"""Get encryption configuration based on available KMS key"""
if self.kms_key:
# Use customer-managed KMS key if provided
return {
"encryption": dynamodb.TableEncryption.CUSTOMER_MANAGED,
"encryption_key": self.kms_key
}
else:
# Default to AWS-managed encryption (SSE)
return {
"encryption": dynamodb.TableEncryption.AWS_MANAGED
}
Cost Optimization Best Practices
1. Right-Sizing Resources
- Production: Provisioned capacity with auto-scaling for predictable workloads
- Baseline: Conservative minimum capacity to handle normal load
- Scaling: Automatic adjustment for traffic spikes and valleys
2. Data Lifecycle Management
- Log Retention: 6-month retention for compliance and troubleshooting
- Backup Strategy: PITR enabled for all tables
- Archive Strategy: S3 lifecycle policies for long-term storage
3. Feature Configuration
- Encryption: AWS-managed encryption by default, optional customer-managed KMS keys
- Monitoring: Comprehensive monitoring for production reliability
- Backup: Multi-layer backup strategy with PITR and automated snapshots
4. Resource Tagging
- Cost Allocation: Clear attribution of costs to components and projects
- Lifecycle Management: Automated management based on tags
- Budget Tracking: Component-level cost monitoring
S3 Cost Optimization
Lifecycle Management
Active Logs (0-30 days): Standard Storage
Archive Transition (30-90 days): Infrequent Access
Cold Archive (90-365 days): Glacier
Deep Archive (365+ days): Deep Archive
Storage Classes
- Standard: Active logs and frequently accessed data
- Infrequent Access: Monthly reports and historical data
- Glacier: Long-term compliance archives
- Deep Archive: Regulatory compliance data (7+ years)
Monitoring Cost Optimization
Key Metrics
- DynamoDB Consumed Capacity: Track utilization vs provisioned
- Auto-Scaling Events: Monitor scaling frequency and triggers
- S3 Storage Costs: Track storage class transitions and access patterns
- Lambda Duration: Monitor function efficiency and cold starts
Optimization Opportunities
- Unused Provisioned Capacity: Reduce baseline where consistently under-utilized
- Inefficient Query Patterns: Optimize access patterns to reduce RCU/WCU
- Over-Provisioned Auto-Scaling: Adjust max limits based on actual usage
- S3 Lifecycle Policies: Optimize transition timing based on access patterns
Cost Savings Strategies
Auto-Scaling Benefits
- Baseline Optimization: 20-40% savings vs peak-provisioned capacity
- Optimal Utilization: 70% target maximizes cost efficiency
- Burst Handling: Automatic scaling prevents over-provisioning
Storage Optimization
- Lifecycle Policies: 50-80% savings on long-term storage costs
- Compression: GZIP compression reduces storage and transfer costs
- Data Partitioning: Optimized partition keys reduce query costs
Regular Optimization Tasks
Weekly
- Review DynamoDB capacity utilization
- Check for unused provisioned capacity
- Monitor cost trends and anomalies
- Analyze auto-scaling patterns
Monthly
- Optimize baseline capacity settings
- Review and adjust budget alerts
- Analyze S3 storage class distribution
- Review Lambda function performance metrics
Quarterly
- Evaluate new AWS cost optimization features
- Review access patterns for optimization opportunities
- Update capacity planning based on growth trends
- Assess Reserved Instance opportunities
Annually
- Comprehensive cost optimization review
- Reserved capacity planning for predictable workloads
- Architecture review for cost efficiency improvements
- Cost allocation model updates
Emergency Cost Controls
The system includes built-in cost protection mechanisms:
- Budget Thresholds: Automatic alerts at 80%, 90%, and 95%
- Emergency Stop: System-wide halt capability at 100% budget
- Suspension Controls: Automatic user suspension for budget violations
- Admin Override: Manual controls for emergency situations
Cost Monitoring Integration
- Real-time Alerts: SNS notifications for budget threshold violations
- Dashboard Monitoring: CloudWatch dashboards for cost tracking
- Automated Responses: Step Functions workflows for cost control actions
- Audit Trail: Complete audit log of all cost-related actions