Administrator
Published on 2026-01-19

DevOps Study Note | Understanding ECS Fargate, Auto Scaling, and Load Balancing in Practice

1. Background

In this study note, I documented a complete hands-on learning journey of Amazon ECS with Fargate, focusing on:

  • How ECS actually runs workloads

  • How auto scaling really behaves (not just how it’s configured)

  • Why scaling alone does not distribute traffic

  • How observability affects what you can and cannot see

  • Why ALB is required for a production-ready elastic service

This was not a theoretical exercise — all conclusions are based on real behavior observed in a running ECS service.


2. ECS Core Concepts (Clarified)

ECS Hierarchy

Cluster
 └── Service
      └── Task
           └── Container(s)

Key distinctions

  • Task

    The atomic scheduling unit in ECS.

    A task defines:

    • CPU & memory limits

    • Networking (IP, ENI)

    • Lifecycle (start/stop together)

    • One or more containers

  • Container

    A runtime process inside a task (e.g. a JVM, Node app, sidecar).

ECS schedules tasks, not containers.

This is conceptually equivalent to:

  • ECS Task ≈ Kubernetes Pod
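The task attributes above (CPU/memory limits, awsvpc networking, one or more containers) all live in a task definition. A minimal Fargate task definition might look like the following sketch; the family, image, and sizes are illustrative assumptions, not values from this exercise:

```json
{
  "family": "web",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "public.ecr.aws/docker/library/nginx:latest",
      "portMappings": [{ "containerPort": 80 }],
      "essential": true
    }
  ]
}
```

Note that `cpu` and `memory` are set at the task level: every container in the task shares that envelope and starts and stops with it, which is exactly why the task, not the container, is the scheduling unit.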


3. ECS Fargate vs ECS EC2 vs EC2 Auto Scaling Group

| Model | What scales | Who manages servers |
| --- | --- | --- |
| EC2 ASG | EC2 instances | You |
| ECS on EC2 | EC2 instances + tasks | You |
| ECS Fargate | Tasks only | AWS |

Fargate is serverless compute:

  • No instance types

  • No AMIs

  • No OS patching

  • No capacity planning

You declare the desired number of tasks; AWS handles everything else.


4. ECS Service Auto Scaling: What Really Happens

Important realization

ECS Service Auto Scaling does NOT move load between tasks.

It only changes the number of running tasks.

Scaling logic:

High average CPU
→ Increase Service DesiredCount
→ Start new tasks

It does not:

  • Rebalance existing workload

  • Migrate threads

  • Redistribute requests

This explains a critical observation:

  • One task can be busy

  • Newly created tasks can remain idle

  • Service CPU average still drops → scaling stops

This behavior is correct and expected.
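AWS's real target-tracking algorithm involves CloudWatch alarms, cooldowns, and warm-up windows, but its core proportionality can be sketched in a few lines. This is a simplified model, not AWS's implementation; the 50% target and the CPU numbers are assumptions chosen to mirror the observation above:

```python
import math

TARGET_CPU = 50.0  # target-tracking setpoint (percent), an assumed value


def desired_count(running: int, avg_cpu: float) -> int:
    """Simplified target-tracking rule: scale the task count in
    proportion to how far the service AVERAGE sits from the target."""
    return max(1, math.ceil(running * avg_cpu / TARGET_CPU))


# One busy task at 90% CPU: average (90%) is above target, so scale out.
print(desired_count(running=1, avg_cpu=90.0))  # -> 2

# The new task stays idle (~2% CPU). The average falls to
# (90 + 2) / 2 = 46%, below target -- scaling stops, even though
# the original task is still pinned at 90%.
print(desired_count(running=2, avg_cpu=(90 + 2) / 2))  # -> 2
```

The model makes the key point concrete: the controller only sees the service-level average, so one hot task plus idle new tasks looks "healthy" and no further scaling fires.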


5. Metrics: Service vs Task vs Container

Default ECS Metrics (without Container Insights)

  • Cluster-level CPU / memory

  • Service-level average CPU / memory

  • ❌ No task-level visibility

This is why it was initially impossible to confirm whether:

  • New tasks were actually doing work

  • Load was concentrated on a single task


6. Container Insights with Enhanced Observability

To see per-task and per-container metrics, ECS requires:

Container Insights with enhanced observability
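Enabling it is a single cluster setting; a sketch via the AWS CLI, where `my-cluster` is a placeholder name:

```shell
# Turn on Container Insights with enhanced observability for a cluster.
# Valid values for containerInsights are: enabled, disabled, enhanced.
aws ecs update-cluster \
  --cluster my-cluster \
  --settings name=containerInsights,value=enhanced
```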

Once enabled:

  • New CloudWatch namespace: ECS/ContainerInsights

  • Metrics become available:

    • task_cpu_utilized

    • container_cpu_utilized

    • task_memory_utilized

This immediately revealed the truth:

  • One task had high CPU

  • Other tasks were mostly idle

This visibility is essential for:

  • Debugging scaling behavior

  • Understanding real workload distribution

  • Production-grade observability


7. Why Auto Scaling Alone Is Not Enough

Auto scaling increases capacity, not traffic distribution.

If clients connect directly to:

  • A task IP

  • A cached endpoint

Then:

  • All requests keep hitting the same task

  • New tasks receive no traffic

  • Scaling appears “ineffective”

This is not an ECS issue — it’s an architecture issue.
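The effect is easy to demonstrate with a toy simulation (task names and request counts are invented for illustration): a client pinned to one cached task endpoint versus a round-robin distribution across all registered tasks.

```python
from collections import Counter

tasks = ["task-a", "task-b", "task-c"]  # service after scale-out: 3 tasks


def pinned(requests: int) -> Counter:
    # Client resolved one task IP and cached it: every request
    # lands on the same task; the new tasks receive nothing.
    return Counter(tasks[0] for _ in range(requests))


def round_robin(requests: int) -> Counter:
    # What an ALB-style distribution gives: requests are spread
    # across all registered targets.
    return Counter(tasks[i % len(tasks)] for i in range(requests))


print(pinned(9))       # all 9 requests hit task-a; task-b/c stay idle
print(round_robin(9))  # 3 requests per task
```

Same capacity in both cases; only the ingress layer decides whether scaling actually relieves the hot task.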


8. Why Application Load Balancer (ALB) Is Required

A production ECS service requires a stable ingress layer.

With ALB:

Client
  ↓
ALB (stable DNS)
  ↓
Target Group (IP mode)
  ↓
ECS Tasks

Benefits:

  • Stable entry point

  • Automatic registration of new tasks

  • Automatic deregistration of stopped tasks

  • True request-level load distribution

Only with ALB + ECS Service Auto Scaling does elastic compute become elastic traffic.
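Wiring this up comes down to a target group in IP mode plus a service that registers into it. A CLI sketch, where names, ARNs, subnets, security groups, and the port are placeholders:

```shell
# Target group in IP mode -- required for Fargate (awsvpc) tasks,
# since each task gets its own ENI/IP rather than an instance ID.
aws elbv2 create-target-group \
  --name web-tg \
  --protocol HTTP --port 8080 \
  --vpc-id vpc-0123456789abcdef0 \
  --target-type ip \
  --health-check-path /health

# Service that registers its tasks into the target group; ECS then
# handles registration/deregistration automatically on scale events.
aws ecs create-service \
  --cluster my-cluster \
  --service-name web \
  --task-definition web:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-aaa111],securityGroups=[sg-task111],assignPublicIp=DISABLED}" \
  --load-balancers targetGroupArn=arn:aws:elasticloadbalancing:...:targetgroup/web-tg/xyz,containerName=app,containerPort=8080
```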


9. Networking Insight (Important)

In Fargate:

  • Tasks can have dynamic IPs

  • IPs can change on restart

  • Direct IP access is not production-safe

Best practice:

  • ALB in public subnets

  • Tasks in private subnets

  • No public IP on tasks

  • Security group allows: ALB → task port
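The last rule, ALB → task port, is expressed by referencing the ALB's security group as the source instead of an IP range. A sketch with placeholder group IDs and port:

```shell
# Allow inbound traffic to the task port ONLY from the ALB's
# security group -- no CIDR-based or public access to tasks.
aws ec2 authorize-security-group-ingress \
  --group-id sg-task111 \
  --protocol tcp --port 8080 \
  --source-group sg-alb222
```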


10. Observed Scaling Behavior (Real)

From hands-on testing:

  • CPU spike on a single task

  • Service auto scaled from 1 → 3 tasks

  • New tasks started quickly (seconds)

  • Service CPU average dropped

  • No further scaling triggered

  • Load remained uneven without ALB

This validates:

ECS auto scaling is a control system, not a trigger.
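A scale-out like the 1 → 3 above is produced by a target-tracking policy registered through Application Auto Scaling. A configuration sketch, where the cluster/service names, capacity bounds, and 50% target are assumptions:

```shell
# Register the service's DesiredCount as a scalable target.
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/web \
  --min-capacity 1 --max-capacity 3

# Target-tracking on the service-level AVERAGE CPU -- the controller
# compares this average (not per-task CPU) against the setpoint.
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/web \
  --policy-name cpu-target-50 \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration \
    '{"TargetValue":50.0,"PredefinedMetricSpecification":{"PredefinedMetricType":"ECSServiceAverageCPUUtilization"}}'
```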


11. How to Safely Stop Everything (Cost Control)

To stop all compute costs without deleting infrastructure:

  • Update ECS Service

  • Set Desired count = 0

  • (Ensure auto scaling min capacity = 0)

Result:

  • All Fargate tasks stop

  • No compute cost

  • Service & task definitions preserved
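The two steps above as CLI commands (cluster and service names are placeholders). Order matters: lower the scaling floor first, or the policy will scale the service back up:

```shell
# 1. Lower the auto scaling floor to zero so the policy
#    cannot restore tasks after you scale down.
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/my-cluster/web \
  --min-capacity 0 --max-capacity 3

# 2. Stop all tasks; the service and task definitions remain.
aws ecs update-service \
  --cluster my-cluster \
  --service web \
  --desired-count 0
```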


12. Final Takeaways

  1. ECS scales tasks, not workload

  2. Fargate removes server management, not architecture responsibility

  3. Auto scaling without ALB only adds capacity

  4. Container Insights is mandatory for real understanding

  5. ALB completes the elastic loop


13. Personal Reflection

This exercise bridged the gap between:

  • “Knowing ECS”

  • And understanding how ECS behaves under real load

It also clarified why:

  • Many production systems look “over-provisioned”

  • Load appears uneven even with auto scaling

  • Observability is not optional in distributed systems

