Consuming packages from a private Azure Pipelines Python artifact feed


June 12, 2019

Consuming Azure Pipelines Python artifact feeds in Docker

Recently, I was building out a set of Python packages and services and needed to find a way to pull down packages from an Azure Artifacts feed into a Docker image. It was straightforward to use the tasks to package an artifact, authenticate to the feed, and publish.

I had to do a bit more digging to piece together a flow I was comfortable with for building container images. This post describes some of the challenges involved and how I solved for them.

What’s the problem?

The PipAuthenticate task is great - it authenticates with your artifacts feed and, per the docs, sets the PYPIRC_PATH environment variable to the location of a config file that can be used to connect.

That said - by design, containers run in an isolated environment, so we can’t directly access that config file on the build host while building a container image. We need a way to get that config into the build phase so that our calls to python -m pip install succeed. You are using a virtual environment & python -m pip install to install packages, right?

Challenge 1: No volumes at build time!

Docker doesn’t currently support* mounting volumes at build time. So we can’t just mount our PYPIRC_PATH file from the Azure Pipelines host into the build.

It would be much easier to pass a string as a --build-arg to Docker and then consume it. Azure Pipelines tasks are open source on GitHub, so I thought I’d take a look to see how the task worked and possibly extend it. It turns out that the PipAuthenticate task has some undocumented behavior, er, bonus features, and it already does what I want! It populates the PIP_EXTRA_INDEX_URL environment variable, which is automatically picked up by pip.
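Here is a minimal sketch of consuming that value as a build argument. It assumes the pipeline runs docker build in a step after PipAuthenticate; the base image tag, the image name, and requirements.txt are placeholders of mine, not taken from the original pipeline.

```dockerfile
# Assumed invocation on the build agent, after PipAuthenticate has run:
#   docker build --build-arg PIP_EXTRA_INDEX_URL="$PIP_EXTRA_INDEX_URL" -t myimage .

# Placeholder base image
FROM python:3.7-alpine

# Declare the build argument. Docker exposes ARG values to RUN
# instructions in this stage as environment variables.
ARG PIP_EXTRA_INDEX_URL

# Placeholder dependency list
COPY requirements.txt .

# pip maps PIP_EXTRA_INDEX_URL to its --extra-index-url option,
# so no extra flags are needed here.
RUN python -m pip install -r requirements.txt
```

Keep in mind that build argument values are visible to anyone who can run docker history on an image built from this stage, so this is only safe if the stage itself is never shipped.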

*Well, sort of! You can solve this with --mount=type=secret when you enable BuildKit. If this were a personal project, I’d have stopped there and said #shipit! In this case, I was really looking for something that works for all users and isn’t explicitly marked “experimental”.
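For reference, the BuildKit route might look roughly like the sketch below. The secret id, the text file holding the index URL, the base image, and requirements.txt are all hypothetical, and depending on your Docker version the syntax line may need to point at a different frontend tag.

```dockerfile
# syntax=docker/dockerfile:1
# Assumed invocation (file name is hypothetical; it holds the index URL):
#   DOCKER_BUILDKIT=1 docker build \
#     --secret id=pip_extra_index_url,src=./pip_extra_index_url.txt .

# Placeholder base image
FROM python:3.7-alpine

COPY requirements.txt .

# The secret is mounted at /run/secrets/<id> for this RUN instruction only
# and is not written to any image layer. The variable is set just for
# the pip process.
RUN --mount=type=secret,id=pip_extra_index_url \
    PIP_EXTRA_INDEX_URL="$(cat /run/secrets/pip_extra_index_url)" \
    python -m pip install -r requirements.txt
```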

Challenge 2: Keep it secret, keep it safe!

Great! We pass in our build arg, set ENV PIP_EXTRA_INDEX_URL=$PIP_EXTRA_INDEX_URL and call it a day, right? Right…?

Not so fast - we want to have PIP_EXTRA_INDEX_URL available when we pull packages, but we don’t want secret environment variables baked into any of the layers of a runtime image. So we’ll combine what we’ve learned so far with a multi-stage build and we’re off to the races!
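A sketch of how the two stages could fit together, assuming a virtual environment at /opt/venv and a hypothetical myservice module as the entry point (neither name comes from the original pipeline):

```dockerfile
# Stage 1: builder. The authenticated feed URL only exists in this stage.
FROM python:3.7-alpine AS builder
ARG PIP_EXTRA_INDEX_URL

# Install into a virtual environment so everything lives in one directory
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Placeholder dependency list
COPY requirements.txt .
RUN python -m pip install -r requirements.txt

# Stage 2: runtime. No ARG or ENV carrying the feed URL is declared here,
# so none of this image's layers record it.
FROM python:3.7-alpine

# Same base image and same path, so the copied environment stays valid
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app
COPY . /app

# Hypothetical entry point
CMD ["python", "-m", "myservice"]
```

Only the final stage ends up in the image you tag and push. The builder stage stays in the local build cache on the agent, so its history still contains the build argument there.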

Bonus!

In my real container build, I needed to install gcc, musl-dev, python3-dev and a bunch of other things to pull down my dependencies & build wheels - so a multi-stage build drops my final image size from >1GB down to ~100MB anyway.
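One way to keep that toolchain out of the runtime image is to build wheels in the first stage and install them offline in the second. This is a sketch under my own assumptions (Alpine-based images, a /wheels directory, a placeholder requirements.txt), not the exact layout of the original build:

```dockerfile
# Stage 1: builder with the compiler toolchain
FROM python:3.7-alpine AS builder
ARG PIP_EXTRA_INDEX_URL

# Build-only packages (names taken from the list above)
RUN apk add --no-cache gcc musl-dev python3-dev

COPY requirements.txt .

# Download dependencies and build a wheel for each one into /wheels
RUN python -m pip wheel --wheel-dir /wheels -r requirements.txt

# Stage 2: runtime without gcc or development headers
FROM python:3.7-alpine

COPY --from=builder /wheels /wheels
COPY requirements.txt .

# Install strictly from the prebuilt wheels; no index access is needed,
# so the feed URL is never declared in this stage.
RUN python -m pip install --no-index --find-links /wheels -r requirements.txt
```

The trade-off is that the wheel files remain in the final image's COPY layer; copying a virtual environment from the builder instead avoids that.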

Wrapping up

I’ve attached a few sample files that I pulled from my working pipeline to get you started with this approach. I hope this helps, and I plan for this post to be obsolete soon, once I complete a few pull requests to the Microsoft docs! :)

Dockerfile

azure-pipelines-docker.yml

azure-pipelines-package.yml