Building a Python Docker Image Using a Private PyPI Repository

One of the greatest advantages of our Orquestra Platform is that it’s hardware agnostic. That means users can run their machine learning and other application workflows on both classical and quantum hardware using cloud computing resources. This abstraction is great for running experiments, benchmarking different hardware backends, and ultimately choosing the best hardware configuration for your given use case — with the flexibility to swap in more powerful hardware down the line.

To build this abstraction into the platform, we’re using the Ray framework on Kubernetes. A Ray cluster consists of multiple nodes, where each node corresponds to a Kubernetes pod. To use our proprietary Python code on Ray nodes, we needed to build a custom Docker image that contains Python modules stored in our private PyPI repository. In this post, I will describe how we implemented this Docker image and avoided possible security issues along the way.

The Naive Approach

To build the Docker image and install the Python dependencies, I first tried using basic authentication in the index URL passed to pip:

RUN pip install --index-url https://username:password@nexus.example.io orquestra-runtime

This works. And since the Dockerfile is stored in a private version control repository, you might think that’s OK. However, it’s not a good idea to keep credentials in clear text: if your version control system is breached, anyone who gains access to the repository also gains access to your PyPI repository.

Using Environment Variables

A common technique we use when we don’t want to hard-code sensitive information is to read it from an environment variable that is populated from a secure source such as Vault. That didn’t work for me here, because pip has no dedicated environment variables for a username and password. The practical options were basic authentication in the index URL or a .netrc file.

The .netrc file needs to be stored in your user’s home directory and contains the credentials for the server you’re using.

machine nexus.example.io

login emre-aydin

password qwerty
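To make the file format concrete, here is a minimal Python sketch that parses a .netrc file with the standard-library netrc module and looks up the entry for a host. It only illustrates the format; pip’s own lookup may differ in details. The helper name lookup_credentials and the temporary file are assumptions made for this example, and the host and credentials are the placeholder values from above.

```python
import netrc
import tempfile
from pathlib import Path


def lookup_credentials(netrc_path, host):
    """Return (login, password) for host, or None if the file has no entry.

    Hypothetical helper for illustration; it is not part of pip.
    """
    parsed = netrc.netrc(str(netrc_path))
    entry = parsed.authenticators(host)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


# Write a throwaway .netrc with the placeholder values used in this post.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".netrc"
    path.write_text(
        "machine nexus.example.io\n"
        "login emre-aydin\n"
        "password qwerty\n"
    )
    path.chmod(0o600)  # keep the file private, as you would for a real .netrc
    creds = lookup_credentials(path, "nexus.example.io")
    missing = lookup_credentials(path, "pypi.org")
```

Each machine entry is matched by host name, so a host that is not listed (pypi.org here) yields no credentials.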

For the Docker build process to use .netrc, we need to make it available to the pip process that installs the dependency. To do that, we have to COPY the file into the image before executing pip. To make sure that user code running in a container based on this image doesn’t have access to the credentials, we delete .netrc afterwards.

FROM python:3.9

COPY .netrc /root/.netrc

RUN pip install --index-url https://nexus.example.io orquestra-runtime && rm /root/.netrc

Knowing that Docker images are made of file system layers stacked on top of each other, I wondered whether the file was still readable somehow. To inspect the layers of the image, I used the dive tool. In the following screenshot, you can see that the .netrc file we intended to keep secret is still stored in one of the layers of the image.

If someone gains access to the Docker agent that runs on our clusters, or gets hold of the Docker image in another way, they will have access to the credentials for our Nexus repository as well.
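To illustrate why deleting the file in a later step does not help, here is a small Python sketch that models image layers as in-memory tar archives. It is a simplified simulation, not a reader for real images: the helper names are hypothetical, and the second layer uses the `.wh.` whiteout marker that the OCI image layer format uses to record a deletion.

```python
import io
import tarfile


def make_layer(files):
    """Build an in-memory tar archive standing in for one image layer."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def layers_containing(layers, path):
    """Return the indexes of all layers whose archive still holds path."""
    hits = []
    for index, blob in enumerate(layers):
        with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
            if path in tar.getnames():
                hits.append(index)
    return hits


# Layer 0 models the COPY step, layer 1 models the later `rm`.
copy_layer = make_layer(
    {"root/.netrc": b"machine nexus.example.io login emre-aydin password qwerty\n"}
)
rm_layer = make_layer({"root/.wh..netrc": b""})  # whiteout marker only

image_layers = [copy_layer, rm_layer]
leaked_in = layers_containing(image_layers, "root/.netrc")
```

The deletion only adds a marker on top; the earlier layer is unchanged, so anyone who can read the layers can still extract the file from it.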

Enter Docker Secret Mounts

Docker BuildKit has a neat feature called secret mounts that solves this problem. It lets you mount a local file containing secrets into a build step without copying it into the image. This way, the secret information that we store in the .netrc file doesn’t end up in the final image or any of its layers.

FROM python:3.9

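# Mount the secret directly at /root/.netrc for this step only.
# Copying it with cp would write it into the layer and leak it again.
RUN --mount=type=secret,id=netrc,target=/root/.netrc \
    pip install --index-url https://nexus.example.io orquestra-runtime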

The secret with ID netrc above is injected into the GitHub Actions workflow from the repository secrets, which means it comes from a secure source and is not accessible in plain text to anyone.

- name: Build and push
  uses: docker/build-push-action@v3
  with:
    context: .
    push: true
    secrets: |
      "netrc=${{ secrets.NET_RC }}"
    tags: nexus.example.io/ray
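For a local build outside GitHub Actions, the same secret ID could be supplied from a file with BuildKit’s --secret flag on docker build. The following Python sketch shows one way to wrap that call. It is an assumption about a possible local setup, not part of our pipeline: the function names and the default path of ~/.netrc are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def docker_build_command(tag, netrc_path, context="."):
    """Assemble a `docker build` call that exposes netrc_path as secret `netrc`."""
    return [
        "docker", "build",
        "--secret", f"id=netrc,src={netrc_path}",
        "--tag", tag,
        context,
    ]


def build_image(tag, netrc_path=None):
    """Run the build; requires Docker with BuildKit support on this machine."""
    if netrc_path is None:
        netrc_path = Path.home() / ".netrc"  # assumed location of the credentials
    # DOCKER_BUILDKIT=1 enables BuildKit on older Docker versions where it is
    # not the default builder; newer versions use BuildKit already.
    env = dict(os.environ, DOCKER_BUILDKIT="1")
    subprocess.run(docker_build_command(tag, netrc_path), check=True, env=env)
```

Calling build_image("nexus.example.io/ray") would then build the image with the credentials available only while the pip step runs.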

Conclusion

It might be tempting and easy to pass secret information directly into your build process, but doing so can have unforeseen security implications. In this post, we looked at one such example and arrived at a solution by improving it incrementally.