Administrator
Published on 2026-01-19

DevOps Study Note | Understanding ECS Fargate, Auto Scaling, and Load Balancing in Practice

1. Background

In this study note, I documented a complete hands-on learning journey of Amazon ECS with Fargate, focusing on:

  • How ECS actually runs workloads

  • How auto scaling really behaves (not just how it’s configured)

  • Why scaling alone does not distribute traffic

  • How observability affects what you can and cannot see

  • Why ALB is required for a production-ready elastic service

This was not a theoretical exercise — all conclusions are based on real behavior observed in a running ECS service.


2. ECS Core Concepts (Clarified)

ECS Hierarchy

Cluster
 └── Service
      └── Task
           └── Container(s)

Key distinctions

  • Task

    The atomic scheduling unit in ECS.

    A task defines:

    • CPU & memory limits

    • Networking (IP, ENI)

    • Lifecycle (start/stop together)

    • One or more containers

  • Container

    A runtime process inside a task (e.g. a JVM, Node app, sidecar).

ECS schedules tasks, not containers.

This is conceptually equivalent to:

  • ECS Task ≈ Kubernetes Pod
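The task attributes above (CPU/memory limits, awsvpc networking, one or more containers) all live in a task definition. A minimal Fargate task definition might look like the following sketch; the family, image, and sizes are illustrative assumptions, not values from this exercise:

```json
{
  "family": "web",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "public.ecr.aws/docker/library/nginx:latest",
      "portMappings": [{ "containerPort": 80 }],
      "essential": true
    }
  ]
}
```

Note that `cpu` and `memory` are set at the task level: every container in the task shares that envelope and starts and stops with it, which is exactly why the task, not the container, is the scheduling unit.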


3. ECS Fargate vs ECS EC2 vs EC2 Auto Scaling Group

| Model | What scales | Who manages servers |
| --- | --- | --- |
| EC2 ASG | EC2 instances | You |
| ECS on EC2 | EC2 instances + tasks | You |
| ECS Fargate | Tasks only | AWS |

Fargate is serverless compute:

  • No instance types

  • No AMIs

  • No OS patching

  • No capacity planning

You declare the desired number of tasks; AWS handles everything else.


4. ECS Service Auto Scaling: What Really Happens

Important realization

ECS Service Auto Scaling does NOT move load between tasks.

It only changes the number of running tasks.

Scaling logic:

High average CPU
→ Increase Service DesiredCount
→ Start new tasks

It does not:

  • Rebalance existing workload

  • Migrate threads

  • Redistribute requests

This explains a critical observation:

  • One task can be busy

  • Newly created tasks can remain idle

  • Service CPU average still drops → scaling stops

This behavior is correct and expected.
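AWS's real target-tracking algorithm involves CloudWatch alarms, cooldowns, and warm-up windows, but its core proportionality can be sketched in a few lines. This is a simplified model, not AWS's implementation; the 50% target and the CPU numbers are assumptions chosen to mirror the observation above:

```python
import math

TARGET_CPU = 50.0  # target-tracking setpoint (percent), an assumed value


def desired_count(running: int, avg_cpu: float) -> int:
    """Simplified target-tracking rule: scale the task count in
    proportion to how far the service AVERAGE sits from the target."""
    return max(1, math.ceil(running * avg_cpu / TARGET_CPU))


# One busy task at 90% CPU: average (90%) is above target, so scale out.
print(desired_count(running=1, avg_cpu=90.0))  # -> 2

# The new task stays idle (~2% CPU). The average falls to
# (90 + 2) / 2 = 46%, below target -- scaling stops, even though
# the original task is still pinned at 90%.
print(desired_count(running=2, avg_cpu=(90 + 2) / 2))  # -> 2
```

The model makes the key point concrete: the controller only sees the service-level average, so one hot task plus idle new tasks looks "healthy" and no further scaling fires.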


5. Metrics: Service vs Task vs Container

Default ECS Metrics (without Container Insights)

  • Cluster-level CPU / memory

  • Service-level average CPU / memory

  • ❌ No task-level visibility

This is why it was initially impossible to confirm whether:

  • New tasks were actually doing work

  • Load was concentrated on a single task


6. Container Insights with Enhanced Observability

To see per-task and per-container metrics, ECS requires:

Container Insights with enhanced observability
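Enabling it is a single cluster setting; a sketch via the AWS CLI, where `my-cluster` is a placeholder name:

```shell
# Turn on Container Insights with enhanced observability for a cluster.
# Valid values for containerInsights are: enabled, disabled, enhanced.
aws ecs update-cluster \
  --cluster my-cluster \
  --settings name=containerInsights,value=enhanced
```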

Once enabled:

  • New CloudWatch namespace: ECS/ContainerInsights

  • Metrics become available:

    • task_cpu_utilized

    • container_cpu_utilized

    • task_memory_utilized

This immediately revealed the truth:

  • One task had high CPU

  • Other tasks were mostly idle

This visibility is essential for:

  • Debugging scaling behavior

  • Understanding real workload distribution

  • Production-grade observability


7. Why Auto Scaling Alone Is Not Enough

Auto scaling increases capacity, not traffic distribution.

If clients connect directly to:

  • A task IP

  • A cached endpoint

Then:

  • All requests keep hitting the same task

  • New tasks receive no traffic

  • Scaling appears “ineffective”

This is not an ECS issue — it’s an architecture issue.
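The effect is easy to demonstrate with a toy simulation (task names and request counts are invented for illustration): a client pinned to one cached task endpoint versus a round-robin distribution across all registered tasks.

```python
from collections import Counter

tasks = ["task-a", "task-b", "task-c"]  # service after scale-out: 3 tasks


def pinned(requests: int) -> Counter:
    # Client resolved one task IP and cached it: every request
    # lands on the same task; the new tasks receive nothing.
    return Counter(tasks[0] for _ in range(requests))


def round_robin(requests: int) -> Counter:
    # What an ALB-style distribution gives: requests are spread
    # across all registered targets.
    return Counter(tasks[i % len(tasks)] for i in range(requests))


print(pinned(9))       # all 9 requests hit task-a; task-b/c stay idle
print(round_robin(9))  # 3 requests per task
```

Same capacity in both cases; only the ingress layer decides whether scaling actually relieves the hot task.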


8. Why Application Load Balancer (ALB) Is Required

A production ECS service requires a stable ingress layer.

With ALB:

Client
  ↓
ALB (stable DNS)
  ↓
Target Group (IP mode)
  ↓
ECS Tasks

Benefits:

  • Stable entry point

  • Automatic registration of new tasks

  • Automatic deregistration of stopped tasks

  • True request-level load distribution

Only with ALB + ECS Service Auto Scaling does elastic compute become elastic traffic.
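Wiring this up comes down to a target group in IP mode plus a service that registers into it. A CLI sketch, where names, ARNs, subnets, security groups, and the port are placeholders:

```shell
# Target group in IP mode -- required for Fargate (awsvpc) tasks,
# since each task gets its own ENI/IP rather than an instance ID.
aws elbv2 create-target-group \
  --name web-tg \
  --protocol HTTP --port 8080 \
  --vpc-id vpc-0123456789abcdef0 \
  --target-type ip \
  --health-check-path /health

# Service that registers its tasks into the target group; ECS then
# handles registration/deregistration automatically on scale events.
aws ecs create-service \
  --cluster my-cluster \
  --service-name web \
  --task-definition web:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-aaa111],securityGroups=[sg-task111],assignPublicIp=DISABLED}" \
  --load-balancers targetGroupArn=arn:aws:elasticloadbalancing:...:targetgroup/web-tg/xyz,containerName=app,containerPort=8080
```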


9. Networking Insight (Important)

In Fargate:

  • Tasks can have dynamic IPs

  • IPs can change on restart

  • Direct IP access is not production-safe

Best practice:

  • ALB in public subnets

  • Tasks in private subnets

  • No public IP on tasks

  • Security group allows: ALB → task port
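The last rule, ALB → task port, is expressed by referencing the ALB's security group as the source instead of an IP range. A sketch with placeholder group IDs and port:

```shell
# Allow inbound traffic to the task port ONLY from the ALB's
# security group -- no CIDR-based or public access to tasks.
aws ec2 authorize-security-group-ingress \
  --group-id sg-task111 \
  --protocol tcp --port 8080 \
  --source-group sg-alb222
```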


10. Observed Scaling Behavior (Real)

From hands-on testing:

  • CPU spike on a single task

  • Service auto scaled from 1 → 3 tasks

  • New tasks started quickly (seconds)

  • Service CPU average dropped

  • No further scaling triggered

  • Load remained uneven without ALB

This validates:

ECS auto scaling is a control system, not a trigger.
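A scale-out like the 1 → 3 above is produced by a target-tracking policy registered through Application Auto Scaling. A configuration sketch, where the cluster/service names, capacity bounds, and 50% target are assumptions:

```shell
# Register the service's DesiredCount as a scalable target.
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/web \
  --min-capacity 1 --max-capacity 3

# Target-tracking on the service-level AVERAGE CPU -- the controller
# compares this average (not per-task CPU) against the setpoint.
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/web \
  --policy-name cpu-target-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration \
    '{"TargetValue":50.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"ECSServiceAverageCPUUtilization"}}'
```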


11. How to Safely Stop Everything (Cost Control)

To stop all compute costs without deleting infrastructure:

  • Update ECS Service

  • Set Desired count = 0

  • (Ensure auto scaling min capacity = 0)

Result:

  • All Fargate tasks stop

  • No compute cost

  • Service & task definitions preserved
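The two steps above as CLI commands (cluster and service names are placeholders). Order matters: lower the scaling floor first, or the policy will scale the service back up:

```shell
# 1. Lower the auto scaling floor to zero so the policy
#    cannot restore tasks after you scale down.
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/web \
  --min-capacity 0 --max-capacity 3

# 2. Stop all tasks; the service and task definitions remain.
aws ecs update-service \
  --cluster my-cluster \
  --service web \
  --desired-count 0
```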


12. Final Takeaways

  1. ECS scales tasks, not workload

  2. Fargate removes server management, not architecture responsibility

  3. Auto scaling without ALB only adds capacity

  4. Container Insights is mandatory for real understanding

  5. ALB completes the elastic loop


13. Personal Reflection

This exercise bridged the gap between:

  • “Knowing ECS”

  • And understanding how ECS behaves under real load

It also clarified why:

  • Many production systems look “over-provisioned”

  • Load appears uneven even with auto scaling

  • Observability is not optional in distributed systems

