Learn How to Optimize Docker Hub Costs With Our Usage Dashboards

By: Melissa Sussmann

13 November 2024 at 21:46

Effective infrastructure management is crucial for organizations using Docker Hub. Without a clear understanding of resource consumption, unexpected usage can emerge and skyrocket. This is particularly true if pulls and storage needs are not budgeted and forecasted correctly. By implementing proactive post controls and monitoring usage patterns, development teams can sustain their Docker Hub usage while keeping expenses under control.

To support these goals, we’ve introduced new Docker Hub Usage dashboards, offering organizations the ability to access and analyze their usage patterns for storage and pulls.

Docker Hub’s Usage dashboards put you in control, giving visibility into every pull and image your Docker systems request. Each pull and cache becomes a deliberate choice — not a random event — so you can make every byte count. With clear insights into what’s happening and why, you can design more efficient, optimized systems.

Reclaim control and manage technical resources by kicking bad habits

hub usage f1 — Figure 1: Docker Hub Usage dashboards.

The Docker Hub Usage dashboards (Figure 1) provide valuable insights, allowing teams to track peaks and valleys, detect high usage periods, and identify the images and repositories driving the most consumption. This visibility not only aids in managing usage but also strengthens continuous improvement efforts across your software supply chain, helping teams build applications more efficiently and sustainably.

This information helps development teams to stay on top of challenges, such as:

Redundant pulls and misconfigured repositories: These can quickly and quietly drive up technical expenses while falling out of scope of the most relevant or critical use cases. Docker Hub’s Usage dashboards can help development teams identify patterns and optimize accordingly. They let you view usage trends across IPs and users as well, which helps with pinpointing high consumption areas and ensuring accountability in an organization when it comes to resource management.
Poor caching management: Repository insights and image tagging helps customers assess internal usage patterns, such as frequently accessed images, where there might be an opportunity to improve caching. With proper governance models, organizations can also establish policies and processes that reduce the variability of resource usage as a whole. This goal goes beyond keeping track of seasonality usage patterns to help you design more predictable usage patterns so you can budget accordingly.
Accidental automation: Accidental automated system activities can really hurt your usage. Let’s say you are using a CI/CD pipeline or automated scripts configured to pull images more often than they should. They may pull on every build instead of the actual version change, for example.

Usage dashboards can help you identify these inefficiencies by showing detailed pull data associated with automated tooling. This information can help your teams quickly identify and adjust misconfigured systems, fine-tune automations to only pull when needed, and ultimately focus on the most relevant use cases for your organization, avoiding accidental overuse of resources:

Details of Docker Hub Usage Dashboard with columns for Date/hour, Username, Repository library, IPs, Version checks, pulls, and more. — Figure 2: Details from the Usage dashboards.

Docker Hub’s Usage dashboards offer a comprehensive view of your usage data, including downloadable CSV reports that include metrics such as pull counts, repository names, IP addresses, and version checks (Figure 2). This granular approach allows your organization to gain valuable insights and trend data to help optimize your team’s workflows and inform policies.

Integrate robust operational principles into your development pipeline by leveraging these data-driven reports and maintain control over resource consumption and operational efficiency with Docker Hub.

Learn more

Subscribe to the Docker Newsletter.
View Docker Hub Usage dashboards.
Take steps to optimize and manage your usage.
Get the latest release of Docker Desktop.
Have questions? The Docker community is here to help.
New to Docker? Get started.

Docker Best Practices: Using Tags and Labels to Manage Docker Image Sprawl

Docker

By: Jay Schmidt

1 October 2024 at 19:51

With many organizations moving to container-based workflows, keeping track of the different versions of your images can become a problem. Even smaller organizations can have hundreds of container images spanning from one-off development tests, through emergency variants to fix problems, all the way to core production images. This leads us to the question: How can we tame our image sprawl while still rapidly iterating our images?

A common misconception is that by using the “latest” tag, you are guaranteeing that you are pulling the “latest” version of the image. Unfortunately, this assumption is wrong — all latest means is “the last image pushed to this registry.”

Read on to learn more about how to avoid this pitfall when using Docker and how to get a handle on your Docker images.

Using tags

One way to address this issue is to use tags when creating an image. Adding one or more tags to an image helps you remember what it is intended for and helps others as well. One approach is always to tag images with their semantic versioning (semver), which lets you know what version you are deploying. This sounds like a great approach, and, to some extent, it is, but there is a wrinkle.

Unless you’ve configured your registry for immutable tags, tags can be changed. For example, you could tag my-great-app as v1.0.0 and push the image to the registry. However, nothing stops your colleague from pushing their updated version of the app with tag v1.0.0 as well. Now that tag points to their image, not yours. If you add in the convenience tag latest, things get a bit more murky.

Let’s look at an example:

FROM busybox:stable-glibc

# Create a script that outputs the version
RUN echo -e "#!/bin/sh\n" > /test.sh && \
    echo "echo \"This is version 1.0.0\"" >> /test.sh && \
    chmod +x /test.sh

# Set the entrypoint to run the script
ENTRYPOINT ["/bin/sh", "/test.sh"]

We build the above with docker build -t tagexample:1.0.0 . and run it.

$ docker run --rm tagexample:1.0.0
This is version 1.0.0

What if we run it without a tag specified?

$ docker run --rm tagexample
Unable to find image 'tagexample:latest' locally
docker: Error response from daemon: pull access denied for tagexample, repository does not exist or may require 'docker login'.
See 'docker run --help'.

Now we build with docker build . without specifying a tag and run it.

$ docker run --rm tagexample
This is version 1.0.0

The latest tag is always applied to the most recent push that did not specify a tag. So, in our first test, we had one image in the repository with a tag of 1.0.0, but because we did not have any pushes without a tag, the latest tag did not point to an image. However, once we push an image without a tag, the latest tag is automatically applied to it.

Although it is tempting to always pull the latest tag, it’s rarely a good idea. The logical assumption — that this points to the most recent version of the image — is flawed. For example, another developer can update the application to version 1.0.1, build it with the tag 1.0.1, and push it. This results in the following:

$ docker run --rm tagexample:1.0.1
This is version 1.0.1

$ docker run --rm tagexample:latest
This is version 1.0.0

If you made the assumption that latest pointed to the highest version, you’d now be running an out-of-date version of the image.

The other issue is that there is no mechanism in place to prevent someone from inadvertently pushing with the wrong tag. For example, we could create another update to our code bringing it up to 1.0.2. We update the code, build the image, and push it — but we forget to change the tag to reflect the new version. Although it’s a small oversight, this action results in the following:

$ docker run --rm tagexample:1.0.1
This is version 1.0.2

Unfortunately, this happens all too frequently.

Using labels

Because we can’t trust tags, how should we ensure that we are able to identify our images? This is where the concept of adding metadata to our images becomes important.

The first attempt at using metadata to help manage images was the MAINTAINER instruction. This instruction sets the “Author” field (org.opencontainers.image.authors) in the generated image. However, this instruction has been deprecated in favor of the more powerful LABEL instruction. Unlike MAINTAINER, the LABEL instruction allows you to set arbitrary key/value pairs that can then be read with docker inspect as well as other tooling.

Unlike with tags, labels become part of the image, and when implemented properly, can provide a much better way to determine the version of an image. To return to our example above, let’s see how the use of a label would have made a difference.

To do this, we add the LABEL instruction to the Dockerfile, along with the key version and value 1.0.2.

FROM busybox:stable-glibc

LABEL version="1.0.2"

# Create a script that outputs the version
RUN echo -e "#!/bin/sh\n" > /test.sh && \
    echo "echo \"This is version 1.0.2\"" >> /test.sh && \
    chmod +x /test.sh

# Set the entrypoint to run the script
ENTRYPOINT ["/bin/sh", "/test.sh"]

Now, even if we make the same mistake above where we mistakenly tag the image as version 1.0.1, we have a way to check that does not involve running the container to see which version we are using.

$ docker inspect --format='{{json .Config.Labels}}' tagexample:1.0.1
{"version":"1.0.2"}

Best practices

Although you can use any key/value as a LABEL, there are recommendations. The OCI provides a set of suggested labels within the org.opencontainers.image namespace, as shown in the following table:

Label	Content
org.opencontainers.image.created	The date and time on which the image was built (string, RFC 3339 date-time).
org.opencontainers.image.authors	Contact details of the people or organization responsible for the image (freeform string).
org.opencontainers.image.url	URL to find more information on the image (string).
org.opencontainers.image.documentation	URL to get documentation on the image (string).
org.opencontainers.image.source	URL to the source code for building the image (string).
org.opencontainers.image.version	Version of the packaged software (string).
org.opencontainers.image.revision	Source control revision identifier for the image (string).
org.opencontainers.image.vendor	Name of the distributing entity, organization, or individual (string).
org.opencontainers.image.licenses	License(s) under which contained software is distributed (string, SPDX License List).
org.opencontainers.image.ref.name	Name of the reference for a target (string).
org.opencontainers.image.title	Human-readable title of the image (string).
org.opencontainers.image.description	Human-readable description of the software packaged in the image (string).

Because LABEL takes any key/value, it is also possible to create custom labels. For example, labels specific to a team within a company could use the com.myorg.myteam namespace. Isolating these to a specific namespace ensures that they can easily be related back to the team that created the label.

Final thoughts

Image sprawl is a real problem for organizations, and, if not addressed, it can lead to confusion, rework, and potential production problems. By using tags and labels in a consistent manner, it is possible to eliminate these issues and provide a well-documented set of images that make work easier and not harder.

Learn more

Read the Docker Best Practices collection.
Subscribe to the Docker Newsletter.
Get the latest release of Docker Desktop.
Vote on what’s next! Check out our public roadmap.
Have questions? The Docker community is here to help.
New to Docker? Get started.

Reading view

Reclaim control and manage technical resources by kicking bad habits

Learn more

Using tags

Using labels

Best practices

Final thoughts

Learn more