CannotPullContainerError on ECS Fargate: 7 Root Causes and Fixes

Seven causes of CannotPullContainerError on ECS Fargate: IAM, networking, image tag, architecture, disk, Docker Hub, VPC endpoints. Match the cause and fix it.

By Muhammad Moeed · May 20, 2026 · Updated June 12, 2026 · 16 min read

CannotPullContainerError is the error you see when an ECS task stops before its container even starts. It shows up in the stopped task details in the ECS console, in CloudWatch, and at the bottom of your CI logs. The frustrating part is the message is generic. The underlying cause is not generic — there are seven distinct reasons this fires, and the fix for one will not help with the others.

This guide takes every variant the official AWS troubleshooting page documents, groups them by root cause, and gives you the exact diagnosis and fix for each. Open your stopped task error message in another tab, match it to one of the seven, and skip to that section.

CannotPullContainerError has seven causes: IAM role missing ECR permissions, network can’t reach ECR (no NAT/VPC endpoints, no public IP, blocked security groups), the image or tag does not exist, architecture mismatch (ARM vs AMD), disk space or ephemeral storage too small, Docker Hub rate limit, or a missing S3 gateway endpoint when using VPC endpoints. The fix depends on which one. Read the exact error message first, then match it to the right section.

Quick error message lookup

Open the stopped task, copy the text after CannotPullContainerError, and match it to the row below. The cause column points to the section with the fix.

If your error message contains…	The root cause is	Jump to
`pull access denied` or “role has the permissions”	IAM role missing ECR permissions	Cause 1 below
`i/o timeout`, `dial tcp`, `connection issue`, `request canceled`	Network can’t reach ECR	Cause 2 below
`failed to resolve ref`, `image manifest`, `manifest unknown`	Image or tag does not exist	Cause 3 below
`unexpected status code`, `failed to copy`, or you build on Apple Silicon	Architecture mismatch	Cause 4 below
`no space left on device`	Disk or ephemeral storage too small	Cause 5 below
`toomanyrequests`, `pull rate limit`	Docker Hub rate limit	Cause 6 below
`Context canceled` after VPC endpoints look correct	Missing S3 gateway endpoint	Cause 7 below

Most users find their fix in a single scan of this table.
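If you would rather match the message in a script (for example in a CI step that prints a hint), the table can be sketched as a small shell function. This is a minimal sketch: classify_pull_error is a hypothetical helper name, and the substrings are only the ones listed in the table above, so real messages that use other wording fall through to the unknown case.

```shell
# Map a stopped-reason string to one of the seven causes in the lookup table.
# classify_pull_error is a hypothetical helper, not part of the AWS CLI.
classify_pull_error() {
  # Lowercase the message so matching is case-insensitive.
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"no space left on device"*)
      echo "cause 5: disk or ephemeral storage too small" ;;
    *toomanyrequests*|*"pull rate limit"*)
      echo "cause 6: Docker Hub rate limit" ;;
    *"pull access denied"*|*"role has the permissions"*)
      echo "cause 1: IAM role missing ECR permissions" ;;
    *"i/o timeout"*|*"dial tcp"*|*"connection issue"*|*"request canceled"*)
      echo "cause 2: network cannot reach ECR" ;;
    *"failed to copy"*|*"unexpected status code"*)
      echo "cause 4: architecture mismatch" ;;
    *"failed to resolve ref"*|*"manifest unknown"*|*"image manifest"*)
      echo "cause 3: image or tag does not exist" ;;
    *"context canceled"*)
      echo "cause 7: missing S3 gateway endpoint" ;;
    *)
      echo "unknown: read the full message" ;;
  esac
}

# Example: classify_pull_error "pull access denied"
```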


First, get the exact error message

Before you change anything, find the full message. The error category is CannotPullContainerError, but the text after it is what tells you which of the seven causes you have.

Open the ECS console, go to your cluster, then your service.
Open the Tasks tab and switch the filter to Stopped.
Click the task. The Stopped reason field has the full message.

On Fargate platform version 1.4, the console can truncate the message. The stoppedReason field in the output of aws ecs describe-tasks has the full text. Copy that whole line and keep it nearby.
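A minimal sketch of the CLI route, wrapped in a function so the cluster and task are parameters. The function name show_stopped_reason and the example cluster and task values are placeholders; the --query expression picks out the task-level stoppedReason plus each container's reason field, which is where the pull error text usually appears.

```shell
# Print the task-level stopped reason and each container's reason.
# Usage: show_stopped_reason <cluster> <task-id-or-arn>
show_stopped_reason() {
  aws ecs describe-tasks \
    --cluster "$1" \
    --tasks "$2" \
    --query 'tasks[].{stoppedReason: stoppedReason, containerReasons: containers[].reason}' \
    --output json
}

# Example (placeholder names):
#   show_stopped_reason my-cluster 0123456789abcdef0123456789abcdef
```

Run it soon after the failure. ECS only keeps stopped tasks around for a limited time, so an old task ID may return nothing.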

Cause 1: The task execution role is missing ECR permissions

You see:

The task can’t pull the image. Check that the role has the permissions to pull images from the registry.

Or sometimes:

pull access denied

This is the most common cause. Pulling from ECR is an authenticated action, and ECS needs an IAM role with the right permissions to do it on your behalf.

Fix on Fargate. Attach the AWS managed policy AmazonECSTaskExecutionRolePolicy to your task execution role. This policy contains the minimum ECR permissions plus the CloudWatch Logs permissions ECS needs at startup. If the task definition references no task execution role at all, create one and attach this policy.

Fix on EC2 launch type. The container instance role, not the task execution role, is what pulls the image. Attach AmazonEC2ContainerRegistryReadOnly to the EC2 instance profile your container instances use.

The minimum ECR permissions, if you write a custom policy:

ecr:GetAuthorizationToken on Resource: "*"
ecr:BatchCheckLayerAvailability
ecr:GetDownloadUrlForLayer
ecr:BatchGetImage

If you restrict access to dev and prod ECR repositories separately, scope BatchCheckLayerAvailability, GetDownloadUrlForLayer, and BatchGetImage to the specific repository ARNs. GetAuthorizationToken does not support resource-level permissions, so it always needs Resource: "*".
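As a sketch, the four permissions above can be written as a policy document scoped to one repository. The function name ecr_pull_policy, the role name, and the repository ARN in the example are placeholders. Note that this covers only the ECR side; the managed policy also carries the CloudWatch Logs permissions, so a custom policy needs those added separately if the task uses the awslogs driver.

```shell
# Print a least-privilege ECR pull policy for a single repository.
# Usage: ecr_pull_policy <repository-arn>
ecr_pull_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "$1"
    }
  ]
}
EOF
}

# Example (placeholder ARN and role name):
#   ecr_pull_policy arn:aws:ecr:us-east-1:123456789012:repository/my-app > policy.json
#   aws iam put-role-policy --role-name ecsTaskExecutionRole \
#     --policy-name ecr-pull --policy-document file://policy.json
```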

After updating the role, restart the service so it picks up a fresh task with the new permissions.
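One way to do that restart from the CLI is to force a new deployment, which launches replacement tasks without changing the task definition. The wrapper name redeploy_service and the example names are placeholders.

```shell
# Launch fresh tasks for a service without changing its task definition.
# Usage: redeploy_service <cluster> <service>
redeploy_service() {
  aws ecs update-service \
    --cluster "$1" \
    --service "$2" \
    --force-new-deployment
}

# Example (placeholder names):
#   redeploy_service my-cluster my-service
```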

Cause 2: The task cannot reach ECR over the network

You see:

The task cannot pull ’…’ from the Amazon ECR repository ’…’. There is a connection issue between the task and Amazon ECR. Check your task network configuration.

Or:

API error (500): Get https://aws_account.dkr.ecr.us-east-1.amazonaws.com/v2/: net/http: request canceled while waiting for connection

This is the second most common cause and the most painful, because the configuration is split across several AWS services and there is no single page that shows it. The fix depends on which subnet your task runs in.

Task in a public subnet. Set assignPublicIp to ENABLED in the network configuration. Without a public IP, the task has no way out to the ECR endpoint. The subnet route table must also have a default route (0.0.0.0/0) pointing to an internet gateway.

Task in a private subnet with internet egress. The subnet’s default route must point to a NAT gateway in a public subnet, and the NAT gateway must have an Elastic IP. Confirm by inspecting the private subnet’s route table — there should be a 0.0.0.0/0 row whose target is the NAT gateway.

Task in a private subnet without internet egress (VPC endpoints). Create three endpoints in the VPC, all attached to the private subnets:

com.amazonaws.{region}.ecr.api (Interface endpoint)
com.amazonaws.{region}.ecr.dkr (Interface endpoint)
com.amazonaws.{region}.s3 (Gateway endpoint, with a route entry added to the route table)

All three are required on Fargate platform version 1.4.0 and later, as documented in the official Amazon ECR interface VPC endpoints page. The S3 gateway endpoint is the one most people forget — ECR stores image layers in S3, so without that endpoint the layers can never be pulled. The interface endpoints also need a security group that allows inbound TCP 443 from the task’s security group.
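To check which of the three endpoints a VPC is missing, you can list the endpoint service names and compare them against the required set. A minimal sketch: missing_ecr_endpoints is a hypothetical helper that reads one service name per line on stdin, and the VPC ID in the example is a placeholder. It only checks that the endpoints exist, not that their security groups or route tables are correct.

```shell
# Print the required endpoint services that are NOT in the list on stdin.
# Usage: <list of service names, one per line> | missing_ecr_endpoints <region>
missing_ecr_endpoints() {
  present=$(cat)
  for suffix in ecr.api ecr.dkr s3; do
    svc="com.amazonaws.$1.$suffix"
    # -x matches the whole line, -F treats the name as a fixed string.
    printf '%s\n' "$present" | grep -qxF "$svc" || echo "$svc"
  done
}

# Example (placeholder VPC ID):
#   aws ec2 describe-vpc-endpoints \
#     --filters Name=vpc-id,Values=vpc-0123456789abcdef0 \
#     --query 'VpcEndpoints[].ServiceName' --output text \
#     | tr '\t' '\n' | missing_ecr_endpoints us-east-1
```

Empty output means all three endpoints exist in the VPC.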

Last check: outbound from the task’s security group. Whichever network model you use, the task’s security group must allow outbound TCP 443 to the ECR or VPC endpoint addresses. Most teams that hit this issue have restricted egress so that only traffic to the load balancer is allowed.

Cause 3: The image or tag does not exist in ECR

You see:

CannotPullContainerError: pull image manifest has been retried 5 time(s): failed to resolve ref

This is usually one of three things: a typo in the task definition, a tag that was overwritten, or a tag that was deleted. ECS asked ECR for a specific reference and ECR said it does not exist.

Diagnose:

In the task definition, copy the full image value.
Run aws ecr describe-images --repository-name {repo} --image-ids imageTag={tag} for that tag.
If the command errors with ImageNotFoundException, the tag is the problem. If it returns metadata, you have a permissions or network problem instead, not this cause.
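The diagnosis steps can be sketched as a helper that splits the image value into repository and tag before calling describe-images. The function names are hypothetical, and the parsing assumes a tag reference (name:tag) on an ECR URI; a digest reference (name@sha256:...) needs imageDigest instead and is not handled here.

```shell
# Repository name: everything after the registry host, without the tag.
ecr_repo_of() {
  path=${1#*/}
  echo "${path%%:*}"
}

# Tag: everything after the last colon, or "latest" when no tag is given.
ecr_tag_of() {
  case "$1" in
    *:*) echo "${1##*:}" ;;
    *)   echo "latest" ;;
  esac
}

# Ask ECR whether the tag from the task definition exists.
# Usage: check_ecr_tag <full-image-uri>
check_ecr_tag() {
  aws ecr describe-images \
    --repository-name "$(ecr_repo_of "$1")" \
    --image-ids imageTag="$(ecr_tag_of "$1")"
}

# Example (placeholder URI):
#   check_ecr_tag 123456789012.dkr.ecr.us-east-1.amazonaws.com/team/my-app:1.4.2
```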

Fix:

Push the image again under that tag, or update the task definition to a tag that actually exists.
Stop using :latest for production. ECS resolves a tag to its image digest when a deployment starts and pins the tasks in that deployment to that digest. If :latest is later overwritten and the old digest is deleted, new tasks in that deployment can no longer pull it. Use semantic tags or the image digest (@sha256:...) instead.

Cause 4: The image architecture does not match the task

You see:

ref pull has been retried 1 time(s): failed to copy: httpReaderSeeker: failed open: unexpected status code

This often points to an architecture mismatch. The image is built for one CPU architecture (commonly ARM64) and the task is scheduled on another (commonly AMD64), or it is a Linux image being scheduled on a Windows task, or vice versa.

This used to be rare. It became common as more developers moved to Apple Silicon (M1, M2, M3, M4), which builds ARM64 images by default. If you docker build on a Mac and push without a platform flag, you push an ARM64 image. ECS Fargate defaults to AMD64 unless you tell it otherwise.

Fix. Either build for the right platform, or build a multi-architecture image so it works on both.

For a single platform that matches your ECS runtime:

docker buildx build --platform linux/amd64 -t {repo}:{tag} . --push

For a multi-architecture image:

docker buildx build --platform linux/amd64,linux/arm64 -t {repo}:{tag} . --push

The multi-arch version is the safer default because the same image works on Fargate AMD64, Fargate ARM (Graviton), and your local laptop.
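To pick the platform flag from what the task definition actually asks for, you can read its runtimePlatform.cpuArchitecture value and map it to a Docker platform string. A minimal sketch: docker_platform_for is a hypothetical helper, the task definition name in the example is a placeholder, and an unset value (the CLI prints None) is treated as the AMD64 default described above.

```shell
# Map an ECS cpuArchitecture value to a docker --platform value.
# Usage: docker_platform_for <X86_64|ARM64|None>
docker_platform_for() {
  case "$1" in
    ""|None|X86_64) echo "linux/amd64" ;;   # unset means the default
    ARM64)          echo "linux/arm64" ;;   # Graviton
    *)
      echo "unsupported cpuArchitecture: $1" >&2
      return 1 ;;
  esac
}

# Example (placeholder task definition name):
#   arch=$(aws ecs describe-task-definition --task-definition my-task \
#     --query 'taskDefinition.runtimePlatform.cpuArchitecture' --output text)
#   docker buildx build --platform "$(docker_platform_for "$arch")" \
#     -t {repo}:{tag} --push .
```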

Cause 5: Disk space or ephemeral storage is too small

You see (EC2 launch type):

write /var/lib/docker/tmp/GetImageBlob{xxxxxxxxxx}: no space left on device

You see (Fargate):

ref pull has been retried 1 time(s): failed to extract layer no space left on device: unknown

The image is larger than the disk available to extract it. On EC2, this means the container instance is full. On Fargate, this means the task’s ephemeral storage (20 GiB by default) is too small for the image.

Fix on EC2. Identify the largest consumers with du -Sh / | sort -rh | head -20 on the instance. Usually it is old container logs in /var/lib/docker/containers/. Set a max-size on the json-file log driver in your task definition, or switch to the awslogs driver so logs go to CloudWatch instead of disk.

Fix on Fargate. Increase ephemeral storage in the task definition. The field is ephemeralStorage.sizeInGiB and the maximum is 200 GiB. Most images do not need that much, but if you are pulling a 5 GiB image with a Node node_modules layer or a multi-stage build that does not stage properly, the default 20 GiB can be tight. Slim the image first if you can — a .dockerignore file and multi-stage builds usually cut image size by half or more.
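As a sketch, this is the fragment that goes at the top level of the task definition JSON, generated by a small helper that rejects out-of-range sizes. The helper name is hypothetical, the 200 GiB ceiling is the one stated above, and the lower bound of 21 is an assumption based on the 20 GiB default (anything at or below the default does not need the field).

```shell
# Print the ephemeralStorage fragment for a Fargate task definition.
# Usage: ephemeral_storage_fragment <size-in-GiB>
ephemeral_storage_fragment() {
  # Assumed valid range: 21 to 200 GiB. Non-numeric input fails the first test.
  if [ "$1" -ge 21 ] 2>/dev/null && [ "$1" -le 200 ]; then
    printf '"ephemeralStorage": { "sizeInGiB": %d }\n' "$1"
  else
    echo "sizeInGiB must be an integer from 21 to 200" >&2
    return 1
  fi
}

# Example:
#   ephemeral_storage_fragment 50
```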

Cause 6: Docker Hub rate limit

You see:

ERROR: toomanyrequests: Too Many Requests

Or:

You have reached your pull rate limit.

The image lives on Docker Hub and you have hit the anonymous pull rate limit (100 pulls per 6 hours from an IP) or the authenticated free tier (200 pulls per 6 hours). This usually shows up in CI environments where many tasks share an outbound IP via NAT.

Fix, in order of effort:

Authenticate the pull. Store Docker Hub credentials in AWS Secrets Manager and reference the secret in your task definition’s repositoryCredentials block. This doubles your limit and de-anonymizes the request.
Use ECR pull through cache. Create a pull-through cache rule that proxies Docker Hub images through ECR. After the first pull, subsequent ones come from ECR with no Docker Hub limit. Change the task definition image to the ECR URI of the cached repo.
Pay for Docker Hub. A paid Docker plan increases the rate limit. Worth it only if you cannot move off Docker Hub for organizational reasons.

The pull-through cache is the right answer for most teams.
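A minimal sketch of the pull-through cache route. The helper cached_image_uri is hypothetical; it rewrites a Docker Hub reference into the ECR URI you would put in the task definition, adding the library/ segment that Docker Hub official images need. The account ID, region, docker-hub prefix, and secret ARN in the example are placeholders.

```shell
# Rewrite a Docker Hub image reference to its ECR pull-through cache URI.
# Usage: cached_image_uri <account-id> <region> <prefix> <image>
cached_image_uri() {
  case "$4" in
    */*) image=$4 ;;             # user or org image, e.g. grafana/grafana:11.0.0
    *)   image="library/$4" ;;   # official image, e.g. nginx:1.27
  esac
  echo "$1.dkr.ecr.$2.amazonaws.com/$3/$image"
}

# Creating the rule itself (placeholder secret ARN; Docker Hub upstreams
# need credentials stored in Secrets Manager):
#   aws ecr create-pull-through-cache-rule \
#     --ecr-repository-prefix docker-hub \
#     --upstream-registry-url registry-1.docker.io \
#     --credential-arn arn:aws:secretsmanager:us-east-1:123456789012:secret:ecr-pullthroughcache/docker-hub
#
# Then point the task definition at:
#   cached_image_uri 123456789012 us-east-1 docker-hub nginx:1.27
```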

Cause 7: Missing the S3 gateway endpoint (subtle VPC case)

You see:

Context canceled

Or:

pull command failed: panic: runtime error: invalid memory address or nil pointer dereference

This is the trap that catches teams who set up the two ECR interface endpoints but forgot the S3 gateway endpoint. ECR stores its image layers in S3. The interface endpoints handle the API calls; the S3 gateway endpoint handles the actual layer downloads. Without the S3 endpoint, the API authenticates fine and then the pull hangs.

Fix:

Add a com.amazonaws.{region}.s3 Gateway endpoint to the VPC.
In the route table for the private subnets, add a route whose destination is the prefix list for S3 in that region and whose target is the gateway endpoint. The console adds this automatically if you select the right route tables when creating the endpoint; check it is actually there.

The same fix applies if your security group rules look correct and the ECR API responds quickly, but pulls still time out.
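To confirm the route is actually there, you can list the prefix-list routes in the private subnet's route table. This is a sketch under assumptions: both function names are hypothetical, the route table ID is a placeholder, and the check only proves that some prefix-list route targets a gateway endpoint. Compare the prefix list ID against the S3 one for your region to be sure it is not a different gateway service such as DynamoDB.

```shell
# Print "<prefix-list-id> <gateway-id>" for every prefix-list route.
# Usage: show_prefix_list_routes <route-table-id>
show_prefix_list_routes() {
  aws ec2 describe-route-tables \
    --route-table-ids "$1" \
    --query 'RouteTables[].Routes[] | [?DestinationPrefixListId].[DestinationPrefixListId, GatewayId]' \
    --output text
}

# Succeed if stdin has a prefix-list route whose target is a VPC endpoint.
has_gateway_route() {
  grep -Eq '^pl-[0-9a-f]+[[:space:]]+vpce-[0-9a-f]+'
}

# Example (placeholder route table ID):
#   show_prefix_list_routes rtb-0123456789abcdef0 | has_gateway_route \
#     && echo "gateway endpoint route present" \
#     || echo "no gateway endpoint route"
```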

A faster way to diagnose, in order

If you do not yet know which cause it is, do these in this order. Each step rules out the most common case before the rarer ones.

Read the full error message. The text after CannotPullContainerError matches one of the seven sections above almost word-for-word.
Check the task execution role attachments. If AmazonECSTaskExecutionRolePolicy is not there for Fargate (or AmazonEC2ContainerRegistryReadOnly for EC2), you found it. Fix and re-run.
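From any working machine in the same VPC (ideally the same subnet as the task), run aws ecr get-login-password --region {region}, pipe the output into docker login --username AWS --password-stdin {registry}, and then attempt a docker pull against the same image URI. If this fails, the network is the issue, not ECS.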
Run aws ecr describe-images to confirm the tag exists. If it does not, you are in cause 3.
Check the image manifest with docker buildx imagetools inspect {image}. If it does not list the platform your task runs on (linux/amd64 unless you chose ARM), you are in cause 4.

In most production cases the answer is found in step one. The exact wording of the stopped reason is the actual lookup key.
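The manual pull test from the list above can be sketched as one function. The names ecr_registry_of and try_pull are hypothetical, and the image URI in the example is a placeholder; the registry host is taken as everything before the first slash of the image URI. Run it from a machine that shares the task's network path, otherwise a successful pull proves nothing about the task.

```shell
# Registry host: everything before the first slash of the image URI.
ecr_registry_of() {
  echo "${1%%/*}"
}

# Log in to ECR and pull the image the task is failing on.
# Usage: try_pull <full-image-uri> <region>
try_pull() {
  registry=$(ecr_registry_of "$1")
  aws ecr get-login-password --region "$2" \
    | docker login --username AWS --password-stdin "$registry" \
    && docker pull "$1"
}

# Example (placeholder URI):
#   try_pull 123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:1.4.2 us-east-1
```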

Prevention: a short checklist for new ECS services

If you do these once when you set up the service, six of the seven causes never happen:

The task execution role exists and has AmazonECSTaskExecutionRolePolicy attached.
The image tag in the task definition is a real, immutable tag (not :latest).
The build pipeline uses docker buildx build --platform linux/amd64 (or multi-arch).
The VPC has either NAT internet egress or the three VPC endpoints (ecr.api, ecr.dkr, s3).
The task security group allows outbound TCP 443.
For Fargate, the task’s ephemeral storage is sized for the image rather than left at the default 20 GiB when the image is large.

If you also use a base image from Docker Hub, set up the ECR pull-through cache. That covers the last case.

Frequently asked questions

Why does CannotPullContainerError say “no space left on device”?

The task ran out of disk while extracting the image. On Fargate, your ephemeral storage (20 GiB by default) is too small for the image layers. Raise ephemeralStorage.sizeInGiB in the task definition (up to 200 GiB), or rebuild the image smaller with .dockerignore and multi-stage builds. On EC2, the container instance disk is full — clear /var/lib/docker/containers/ logs or switch the log driver to awslogs.

Why does CannotPullContainerError say “pull access denied”?

The task execution role does not have permission to read from ECR, or the image is private and no credentials are attached. Attach the AWS managed policy AmazonECSTaskExecutionRolePolicy to the task execution role for Fargate, or AmazonEC2ContainerRegistryReadOnly to the EC2 instance profile. For a private Docker Hub image, add repositoryCredentials to the task definition pointing at a Secrets Manager secret.

Why does CannotPullContainerError say “toomanyrequests”?

You hit the Docker Hub anonymous pull rate limit, which is 100 pulls per 6 hours per outbound IP. Fix it in order of effort: authenticate the pull with repositoryCredentials (doubles the limit), set up an ECR pull-through cache so subsequent pulls come from ECR instead of Docker Hub, or move the base image into ECR entirely. The pull-through cache is the right answer for most teams.

Why does CannotPullContainerError say “i/o timeout” or “dial tcp”?

The task cannot reach the ECR endpoint over the network. If the task is in a private subnet, you need either a default route (0.0.0.0/0) that points to a NAT gateway, or three VPC endpoints: com.amazonaws.<region>.ecr.api, com.amazonaws.<region>.ecr.dkr, and the com.amazonaws.<region>.s3 gateway endpoint. Check the route tables and the task’s security group egress on TCP 443.

Why does CannotPullContainerError say “failed to resolve ref” or “image manifest not found”?

The tag in your task definition does not exist in the repository. The most common causes are a typo in the image field, a :latest tag that was overwritten, or a tag that was deleted by a lifecycle policy. Verify the tag with aws ecr describe-images --repository-name <repo> --image-ids imageTag=<tag>. If the command returns ImageNotFoundException, push the image again or change the task definition to a tag that exists.

Why does CannotPullContainerError say “Context canceled” in a VPC endpoint setup?

You set up the two ECR interface endpoints but forgot the S3 gateway endpoint. ECR stores image layers in S3, so the API authenticates fine and then the layer download hangs and times out. Add a com.amazonaws.<region>.s3 gateway endpoint and confirm the route table for the private subnets points at the gateway endpoint for the S3 prefix list.

What is CannotPullContainerError in ECS?

It is the error ECS raises when a task stops before its container starts, because the image could not be retrieved from the registry. The cause is in the message after the error category — there are seven distinct causes documented by AWS, each with its own fix.

How do I see the full CannotPullContainerError message?

Open the ECS console, go to your service’s stopped tasks, click the task, and read the Stopped reason field. On Fargate 1.4 the message can be truncated in the console; aws ecs describe-tasks returns the full text.

Why does CannotPullContainerError happen in a private subnet?

The task has no route to the ECR endpoint. Either add a NAT gateway with a default route, or three VPC endpoints — ecr.api, ecr.dkr, and the S3 gateway endpoint. Missing the S3 endpoint is the most commonly overlooked piece.

Does CannotPullContainerError mean my image is bad?

Not necessarily. The image might be perfect and the task still fails to pull it for permission, network, or rate-limit reasons. Match the exact text after CannotPullContainerError to one of the seven causes above before changing the image.

What is the difference between CannotPullContainerError and CannotStartContainerError?

CannotPullContainerError happens before the container exists locally — ECS could not get the image. CannotStartContainerError happens after the image is pulled, when ECS tries to run it. Different stages, different causes, different fixes.

Should I use `:latest` as my image tag in ECS?

No. ECS pins a deployment to the digest that the tag resolved to when the deployment started. If :latest is then overwritten and the old digest is removed, replacement tasks fail to pull. Use semantic tags or the image digest in production task definitions.

Where to go next

If you are still setting up ECS, the ECS introduction covers how task definitions, services, and clusters fit together. If the image registry side is new to you, the ECR guide walks through how repositories, tags, and permissions work. If you are starting a new service from scratch and want the network and IAM defaults already done for you, ECS Express Mode handles all of cause 1 and cause 2 automatically. And if you have been deploying with AWS Copilot CLI, note that the tool reaches end of support on June 12, 2026 — the migration paths are in that article. If the pull error showed up during a rollout, the deployment strategies guide covers how each rollout style (rolling, blue-green, canary) reacts when an image is unpullable, which can change which root cause shows up first.

Save the seven-cause map. The next time CannotPullContainerError shows up, you will spend a minute matching the message, not an afternoon guessing.
