1. Background
This study note documents a hands-on learning exercise with Amazon ECS on Fargate, focusing on:
How ECS actually runs workloads
How auto scaling really behaves (not just how it’s configured)
Why scaling alone does not distribute traffic
How observability affects what you can and cannot see
Why ALB is required for a production-ready elastic service
This was not a theoretical exercise: all conclusions are based on behavior observed in a running ECS service.
2. ECS Core Concepts (Clarified)
ECS Hierarchy
Cluster
└── Service
    └── Task
        └── Container(s)
Key distinctions
Task
The atomic scheduling unit in ECS.
A task defines:
CPU & memory limits
Networking (IP, ENI)
Lifecycle (start/stop together)
One or more containers
Container
A runtime process inside a task (e.g. a JVM, Node app, sidecar).
ECS schedules tasks, not containers.
This is conceptually equivalent to:
ECS Task ≈ Kubernetes Pod
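To make the "task as the scheduling unit" idea concrete, here is a minimal sketch of a Fargate task definition payload in which two containers share one task's CPU, memory, and network. The field names follow the shape of the ECS RegisterTaskDefinition request; the family name, image names, port, and sizes are hypothetical placeholders, not values from this exercise.

```python
def build_task_definition(family, cpu, memory, containers):
    """Return a dict shaped like an ECS RegisterTaskDefinition request."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # Fargate tasks use awsvpc: one ENI per task
        "cpu": cpu,               # task-level CPU units, passed as a string
        "memory": memory,         # task-level memory in MiB, passed as a string
        "containerDefinitions": containers,
    }


# Hypothetical example: an app container plus a sidecar in the same task.
task_definition = build_task_definition(
    family="demo-web",
    cpu="512",
    memory="1024",
    containers=[
        {
            "name": "app",
            "image": "example/app:latest",
            "essential": True,
            "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
        },
        {
            "name": "log-sidecar",
            "image": "example/log-sidecar:latest",
            "essential": False,
        },
    ],
)
```

The CPU and memory limits sit at the task level, and both containers are listed inside the same definition, which is why they start, stop, and get scheduled together.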
3. ECS Fargate vs ECS EC2 vs EC2 Auto Scaling Group
Fargate is serverless compute:
No instance types
No AMIs
No OS patching
No capacity planning
You declare the desired number of tasks, and AWS provisions and manages the underlying compute.
4. ECS Service Auto Scaling: What Really Happens
Important realization
ECS Service Auto Scaling does NOT move load between tasks.
It only changes the number of running tasks.
Scaling logic:
High average CPU
→ Increase Service DesiredCount
→ Start new tasks
It does not:
Rebalance existing workload
Migrate threads
Redistribute requests
This explains a critical observation:
One task can be busy
Newly created tasks can remain idle
Service CPU average still drops → scaling stops
This behavior is correct and expected.
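The observation above can be reproduced with a small simulation. This is only a sketch: the proportional rule in `desired_count` is a simplifying assumption standing in for the real scaling policy (AWS does not publish the exact calculation), and the CPU numbers and the 50% target are hypothetical.

```python
import math


def service_average(task_cpu):
    """Service-level metric: the plain average of per-task CPU."""
    return sum(task_cpu) / len(task_cpu)


def desired_count(task_cpu, target_cpu):
    # Assumed simplified rule: scale the task count in proportion to
    # how far the service average is from the target.
    ratio = service_average(task_cpu) / target_cpu
    return max(1, math.ceil(len(task_cpu) * ratio))


def simulate(hot_cpu=90.0, idle_cpu=2.0, target_cpu=50.0, max_steps=5):
    """One busy task; every task added by scaling stays idle."""
    tasks = [hot_cpu]
    history = []
    for _ in range(max_steps):
        avg = service_average(tasks)
        want = desired_count(tasks, target_cpu)
        history.append((len(tasks), round(avg, 1), want))
        if want <= len(tasks):
            break  # average is at or below target: no further scale-out
        # New tasks start, but the busy task's load is never moved to them.
        tasks += [idle_cpu] * (want - len(tasks))
    return history


history = simulate()
for running, avg, want in history:
    print(f"running={running} avg_cpu={avg} desired={want}")
```

In this run the service goes from 1 to 2 tasks and then stops, because the idle task pulls the average below the target even though the original task is still at 90% CPU.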
5. Metrics: Service vs Task vs Container
Default ECS Metrics (without Container Insights)
Cluster-level CPU / memory
Service-level average CPU / memory
❌ No task-level visibility
This is why initially it was impossible to confirm whether:
New tasks were actually doing work
Load was concentrated on a single task
6. Container Insights with Enhanced Observability
To see per-task and per-container metrics, ECS requires:
Container Insights with enhanced observability
Once enabled:
New CloudWatch namespace: ECS/ContainerInsights
Metrics become available:
task_cpu_utilized
container_cpu_utilized
task_memory_utilized
This immediately revealed the truth:
One task had high CPU
Other tasks were mostly idle
This visibility is essential for:
Debugging scaling behavior
Understanding real workload distribution
Production-grade observability
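Once per-task numbers are available, spotting this pattern is simple arithmetic. The sketch below assumes the per-task CPU values have already been fetched into a plain dict; the task IDs and percentages are hypothetical, and `summarize` is an illustrative helper, not part of any AWS SDK.

```python
from statistics import mean


def summarize(task_cpu):
    """Compare the service-level average with the hottest single task."""
    avg = mean(task_cpu.values())
    hottest_task, hottest_cpu = max(task_cpu.items(), key=lambda kv: kv[1])
    return {
        "service_avg": round(avg, 1),
        "hottest_task": hottest_task,
        "hottest_cpu": hottest_cpu,
        # Skew > 1 means one task carries more than its share of the load.
        "skew": round(hottest_cpu / avg, 2) if avg else 0.0,
    }


# Hypothetical per-task CPU percentages.
summary = summarize({"task-a": 88.0, "task-b": 3.0, "task-c": 2.0})
print(summary)
```

Here the service average is 31%, which looks healthy on its own, while the skew of 2.84 shows that one task is doing almost all of the work.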
7. Why Auto Scaling Alone Is Not Enough
Auto scaling increases capacity, not traffic distribution.
If clients connect directly to:
A task IP
A cached endpoint
Then:
All requests keep hitting the same task
New tasks receive no traffic
Scaling appears “ineffective”
This is not an ECS issue; it is an architecture issue.
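A toy model of the direct-connection case, assuming a client that resolves its endpoint once and keeps reusing it. The IP addresses and request counts are hypothetical, and `PinnedClient` is an illustrative class, not a real library.

```python
class PinnedClient:
    """A client that picks an endpoint once and caches it forever."""

    def __init__(self, running_tasks):
        self.endpoint = running_tasks[0]  # resolved at startup, never refreshed

    def send(self, hits):
        hits[self.endpoint] += 1


def run_pinned_demo():
    tasks = ["10.0.1.10"]
    hits = {ip: 0 for ip in tasks}
    client = PinnedClient(tasks)

    for _ in range(100):
        client.send(hits)

    # Auto scaling adds two tasks, but the client never learns about them.
    for ip in ["10.0.1.11", "10.0.1.12"]:
        tasks.append(ip)
        hits[ip] = 0

    for _ in range(100):
        client.send(hits)
    return hits


pinned_hits = run_pinned_demo()
print(pinned_hits)
```

All 200 requests land on the first task, and the two tasks added by scaling receive none.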
8. Why Application Load Balancer (ALB) Is Required
A production ECS service requires a stable ingress layer.
With ALB:
Client
↓
ALB (stable DNS)
↓
Target Group (IP mode)
↓
ECS Tasks
Benefits:
Stable entry point
Automatic registration of new tasks
Automatic deregistration of stopped tasks
True request-level load distribution
Only with ALB + ECS Service Auto Scaling does elastic compute become elastic traffic.
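The effect of the target group can be sketched with a toy round-robin router. This is a simplification under stated assumptions: `TargetGroup` is a hypothetical stand-in for the ALB target group, it ignores health checks and connection draining, and the IPs and request counts are made up.

```python
class TargetGroup:
    """Toy stand-in for an ALB target group with round-robin routing."""

    def __init__(self):
        self.targets = []
        self._cursor = 0

    def register(self, ip):
        # In ECS, the service registers each new task's IP automatically.
        if ip not in self.targets:
            self.targets.append(ip)

    def deregister(self, ip):
        # Stopped tasks are removed so they no longer receive requests.
        if ip in self.targets:
            self.targets.remove(ip)

    def route(self):
        if not self.targets:
            raise RuntimeError("no registered targets")
        target = self.targets[self._cursor % len(self.targets)]
        self._cursor += 1
        return target


def run_alb_demo():
    group = TargetGroup()
    hits = {}

    def send(n):
        for _ in range(n):
            ip = group.route()
            hits[ip] = hits.get(ip, 0) + 1

    group.register("10.0.1.10")
    send(30)                      # only one task exists so far
    group.register("10.0.1.11")   # scale-out: new tasks are registered
    group.register("10.0.1.12")
    send(300)                     # requests now spread across all three
    return hits


alb_hits = run_alb_demo()
print(alb_hits)
```

Because clients only ever talk to the stable entry point, the 300 requests sent after scale-out are split evenly, 100 per task, without any client-side change.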
9. Networking Insight (Important)
In Fargate:
Each task gets its own ENI with a dynamically assigned private IP
A replacement task gets a new IP, so addresses change across restarts
Direct IP access is not production-safe
Best practice:
ALB in public subnets
Tasks in private subnets
No public IP on tasks
Security group allows: ALB → task port
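The best-practice list above maps onto two small pieces of configuration. The sketch below only builds the request payloads: the first matches the `networkConfiguration` shape used by ECS services, the second matches the parameters of EC2's AuthorizeSecurityGroupIngress. The helper function names, subnet IDs, security group IDs, and port are hypothetical.

```python
def task_network_configuration(private_subnet_ids, task_security_group_id):
    """networkConfiguration for an ECS service: private subnets, no public IP."""
    return {
        "awsvpcConfiguration": {
            "subnets": list(private_subnet_ids),
            "securityGroups": [task_security_group_id],
            "assignPublicIp": "DISABLED",
        }
    }


def alb_to_task_ingress(task_security_group_id, alb_security_group_id, port):
    """Ingress rule on the task security group: allow only the ALB's group."""
    return {
        "GroupId": task_security_group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # Source is the ALB's security group, not a CIDR range.
                "UserIdGroupPairs": [{"GroupId": alb_security_group_id}],
            }
        ],
    }


# Hypothetical IDs for illustration only.
network_config = task_network_configuration(
    ["subnet-private-a", "subnet-private-b"], "sg-task"
)
ingress_rule = alb_to_task_ingress("sg-task", "sg-alb", 8080)
```

Referencing the ALB's security group as the source, rather than an IP range, keeps the rule valid even as task and load balancer addresses change.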
10. Observed Scaling Behavior (Real)
From hands-on testing:
CPU spike on a single task
Service auto scaled from 1 → 3 tasks
New tasks started quickly (seconds)
Service CPU average dropped
No further scaling triggered
Load remained uneven without ALB
This validates:
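ECS auto scaling is a feedback control loop, not a one-shot trigger: it adjusts the task count until the service-level metric is back at its target, and then it stops.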
11. How to Safely Stop Everything (Cost Control)
To stop all compute costs without deleting infrastructure:
Update ECS Service
Set Desired count = 0
(Ensure auto scaling min capacity = 0)
Result:
All Fargate tasks stop
No Fargate compute cost (other resources, such as a load balancer, continue to bill)
Service & task definitions preserved
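The same steps can be scripted. This is a minimal sketch that assumes the service already has a scalable target registered with Application Auto Scaling and that boto3-style clients are passed in; `scale_service_to_zero` and the cluster and service names are hypothetical.

```python
def scale_service_to_zero(ecs_client, autoscaling_client, cluster, service):
    """Stop all tasks of a service while keeping the service definition."""
    # Lower the auto scaling floor first, so a minimum capacity above 0
    # cannot bring tasks back after the desired count is changed.
    autoscaling_client.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId=f"service/{cluster}/{service}",
        ScalableDimension="ecs:service:DesiredCount",
        MinCapacity=0,
    )
    # Then set the desired count to 0; ECS stops the running tasks.
    ecs_client.update_service(cluster=cluster, service=service, desiredCount=0)


# Usage with real clients (requires boto3 and AWS credentials):
#   import boto3
#   scale_service_to_zero(
#       boto3.client("ecs"),
#       boto3.client("application-autoscaling"),
#       cluster="demo-cluster",
#       service="demo-service",
#   )
```

Passing the clients in as arguments keeps the function easy to dry-run against stub objects before pointing it at a real account.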
12. Final Takeaways
ECS scales tasks, not workload
Fargate removes server management, not architecture responsibility
Auto scaling without ALB only adds capacity
Container Insights is mandatory for real understanding
ALB completes the elastic loop
13. Personal Reflection
This exercise bridged the gap between:
“Knowing ECS”
And understanding how ECS behaves under real load
It also clarified why:
Many production systems look “over-provisioned”
Load appears uneven even with auto scaling
Observability is not optional in distributed systems