About the Docker registry and the company behind it

Hi all,

last month there were two important pieces of news regarding the default Docker registry that I believe should not fly under the radar.

First, stale images will be deleted from Docker Hub. A stale image is an image that has not received either a pull or a push in the last six months.

Second, image pulls will be limited to 200 pulls every 6 hours.
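
For reference, the registry itself exposes these limits as HTTP headers. Here is a minimal sketch of how one could check them, assuming the ratelimitpreview/test endpoint and ratelimit-* headers that Docker documents for this purpose (illustrative only, not something we run in production):

```python
# Minimal sketch: query Docker Hub's pull rate-limit headers with an anonymous token.
# The ratelimitpreview/test repository and the ratelimit-* headers are the ones Docker
# documents for checking the new limits; adapt the scope for a real repository.
import requests

AUTH_URL = ("https://auth.docker.io/token"
            "?service=registry.docker.io"
            "&scope=repository:ratelimitpreview/test:pull")
MANIFEST_URL = "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"

token = requests.get(AUTH_URL).json()["token"]
resp = requests.head(MANIFEST_URL, headers={"Authorization": f"Bearer {token}"})

# Values look like "100;w=21600", i.e. a limit per 21600-second (6-hour) window.
print("ratelimit-limit:    ", resp.headers.get("ratelimit-limit"))
print("ratelimit-remaining:", resp.headers.get("ratelimit-remaining"))
```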

These changes are more than acceptable from a developer's or small company's perspective. However, I am afraid that for our needs we require something more.

For instance, DUCC and unpacked.cern.ch are definitely above the limit of 200 pulls every 6 hours, and unfortunately I don't see any way to bring them down without increasing the latency between publication on Docker Hub and publication on unpacked.cern.ch.
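
Just to make the order of magnitude concrete, a back-of-the-envelope sketch. The wishlist size and the one-pull-per-update-check assumption below are made-up illustrative numbers, not actual unpacked.cern.ch figures:

```python
# Illustrative only: how often could a DUCC-style sync sweep its wishlist while
# staying under the Docker Hub limit? All numbers here are hypothetical.
WISHLIST_IMAGES = 500      # hypothetical number of images kept in sync
PULLS_PER_CHECK = 1        # assume each update check counts as one rate-limited pull
LIMIT_PER_WINDOW = 200     # announced limit: 200 pulls per 6-hour window
WINDOW_HOURS = 6

pulls_per_sweep = WISHLIST_IMAGES * PULLS_PER_CHECK
min_sweep_hours = pulls_per_sweep / LIMIT_PER_WINDOW * WINDOW_HOURS
print(f"minimum time between full sweeps: {min_sweep_hours:.1f} h")  # -> 15.0 h
```

With numbers anywhere in that range, the only way to respect the limit is to sweep the wishlist much less often, which is exactly the publication latency I would like to avoid.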

I am also afraid that a lot of scientific code is just forgotten in some Docker Hub image, and that code will eventually be lost.

Gitlab registries provide a nice solution to this problem, but relying on them should be a conscious and deliberate choice, not just the result of inertia and of using the most convenient tool of the moment.

CERN has the capacity to operate an image registry for our needs and not doing so should be our deliberate choice.

I see a risk in the use of Gitlab registries. They are not the core business of Gitlab but a complementary service, so there is no long-term guarantee on their availability and support. Moreover, if the financial situation of Gitlab changes, they could decide to move them into another price tier that is too expensive for CERN.

I would like to know your opinion on the matter.

Cheers,
Simone

We already have registry.cern.ch, which is based on Harbor (a CNCF graduated project). It is not used for container images today but for other OCI artifacts like helm charts; gitlab remains the recommended image registry.

There's ongoing discussion and work to improve the registry functionality we have at CERN, including vulnerability scans, artifact signing and many other features, as well as how to integrate gitlab with an external registry.

I don't think we need to worry about gitlab's commitment to its registry; there's a clear and easy transition path if it should ever become necessary.

At least for the time being, registry.cern.ch seems to be available only internally. What we have in mind is a cvmfs-enabled container registry for the benefit of WLCG sites.

Yes, and it's not used for docker images. I was pointing out that the transition from gitlab wouldn't be hard if ever needed. I didn't get from the previous message that this was a request for a cvmfs-backed registry.

Not so much a request :slightly_smiling_face:
But the Docker announcement might be a good motivation to seriously discuss the registry question.

A Docker Pro plan is only $60 per year, so wouldn't that be a lot easier/cheaper than switching to a separate registry?

Yep, that's our secret backup plan, at least for the time being. Docker might raise the prices eventually…

Our small-ish k8s cluster (~500 cores) was often getting 504 gateway timeouts when pulling images from CERN gitlab. This is with the Docker graph driver plugin, so each "image" is really only a few KB of JSON.

Changing from the 'latest' tag to an explicit tag sidestepped the problem by avoiding the need to check for image updates, which is best practice for production anyway. But it doesn't make me feel confident about large-scale use of the gitlab registry.

Hi,

In ATLAS we are talking about organising a meeting to discuss this. It's clear that if we move stuff to CERN we need to ask for resources. Gitlab, as Ryan says, is not powerful enough, not even for small images, so we need to either get gitlab into shape or have a different registry, and of course it needs to be discussed with you guys for the interaction with CVMFS. It's likely one meeting will not be enough.

cc @lheinric with whom we were discussing.


Hi,

we raised the issue in the IT-ATLAS meeting and it was brought to the attention of CERN-IT. It needs an executive decision from CERN-IT management. The more groups make noise, the more likely it is that we get resources.

The Docker company has done so much for us that I don't think we should begrudge giving them $60 per year. Think of it as community support. Even a 1-hour meeting with CERN-IT will cost CERN way more than that in people's time. It's not even worth the paperwork to get reimbursed for that amount. I volunteer to personally create and pay for an account if no one else wants to do it.

Hi Dave,

they did indeed do a lot for us, but after looking at the pricing my understanding is that with $60 a month only the authenticated owner has unlimited downloads, which I don't think is workable.

cheers
alessandra

Docker pricing for the Pro plan is $5 per month. I am assuming that containers will be distributed to the grid through cvmfs, so we should mainly care about the cost of downloading to unpacked.cern.ch and to its successor for distributing per-user containers. gitlab.cern.ch would never be able to reasonably sustain the rate of downloading individual user containers to grid nodes.

OK, the separate issue is long-term archival of containers. I agree it makes sense to use gitlab for that, and as long as users only upload there the containers they want to keep long term, it's not clear that gitlab isn't already adequate.

Dave

I agree. Their current pricing still makes some tasks possible, e.g. converting images to unpacked.cern.ch. But we cannot have the grid pulling images from Docker Hub (which is not a good idea anyway). For archival, having them in cvmfs is an option too.

ATLAS images are also downloaded from places other than the grid. Currently the bulk of pulls does not come from the grid.