Autoscale large images faster using Longhorn (distributed storage)
While running ML models in production is already a difficult task, doing so on Kubernetes takes the game to a whole new level. The biggest issue it introduces is enormous image sizes, which in my case ranged from 20GB to more than 40GB. Images that large take an absurd amount of time to autoscale.
Understanding the problem
The model data baked into our images was clearly the cause of their large size. As a result, we had to extract the model data from the image so that the image could be pulled from the registry, and therefore scaled, faster. There are two obvious solutions to this problem:
Downloading model data from GS Bucket
This reduces the size of the Docker images, but our models will not be able to start until the data is available to our code, and remember that the network is the bottleneck here.
PS: You can quickly copy data from cloud storage by using some handy utilities, which I’ll link to at the bottom. This would have reduced image size as well as model spin-up time, but we were looking for a better solution.
Using an NFS in Kubernetes
We decided to use a persistent volume to store all of our models and then use a claim to mount the volume in the pod. This solves the network bandwidth issue: the data lives inside our cluster, so we only have to download it once.
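As a sketch, a persistent volume claim backed by Longhorn’s storage class could look like the following. The claim name and size are placeholders; ReadWriteMany lets several model pods mount the same data.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-data            # hypothetical claim name
spec:
  accessModes:
    - ReadWriteMany           # many pods read the same model files
  storageClassName: longhorn  # storage class installed by Longhorn
  resources:
    requests:
      storage: 100Gi          # size this to your model set
```
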
Compared with Filestore, Longhorn theoretically provided the same benefits at roughly one-sixth the cost.
So let’s get started.
- Fire a pipeline to download model data from Google Cloud Storage
- The pipeline connects to a bastion host, which execs into a pod in the cluster dedicated to updating the persistent volume claim
- The pod downloads data from Google Cloud Storage
- The pod then syncs the data to the persistent volume claim
- Once the data has been updated in the volume, a fresh deployment is launched to create new pods that will use the new data. Because the model files are so large, we wanted a manual trigger for switching to the new model, to make sure the download is clean.
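The steps above can be sketched as a small script run inside the updater pod. The bucket name, mount path, and deployment name are all assumptions for illustration:

```shell
#!/usr/bin/env bash
set -euo pipefail

MODEL_BUCKET="gs://example-models"   # hypothetical bucket
MOUNT_PATH="/mnt/models"             # where the PVC is mounted in this pod
DEPLOYMENT="model-server"            # hypothetical deployment to roll

# Sync new model data from Cloud Storage into the persistent volume.
sync_models() {
  gsutil -m rsync -r "$MODEL_BUCKET" "$MOUNT_PATH"
}

# Manually triggered second stage: restart the deployment so fresh
# pods pick up the new model data from the volume.
roll_deployment() {
  kubectl rollout restart deployment "$DEPLOYMENT"
}
```

In practice the sync and the rollout are two separately gated stages, matching the manual trigger described above.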
We have the GitLab agent syncing all the YAML from the GitLab repository to our cluster. Hence we can just push the code to our codebase while the GitLab agent does the heavy lifting of syncing it to the cluster.
GKE requires a user to manually claim themselves as cluster admin to enable role-based access control. Before installing Longhorn, run the following command:
kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=<email@example.com>
Creation of node pool
Create a new node pool in your GKE cluster just for Longhorn. For CSI driver compatibility, it is critical to change the image type to Ubuntu. You can also change the boot disk to an SSD for faster I/O.
Download the official installation YAML into a local file, since you will need to edit it before applying it:
curl -LO https://raw.githubusercontent.com/longhorn/longhorn/v1.3.1/deploy/longhorn.yaml
kubectl apply -f longhorn.yaml
Add the node affinity block below to every Deployment and DaemonSet in the manifest. Longhorn will then only be scheduled on the specialised node pool you’ve created for it (replace longhorn-pool with the name of your node pool):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: cloud.google.com/gke-nodepool
              operator: In
              values:
                - longhorn-pool
Once you have created a volume and a PVC, you can mount the claim in your pods.
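For illustration, mounting the claim in a pod spec might look like the fragment below. The container, image, and claim names are placeholders:

```yaml
# Pod spec fragment: mount the Longhorn-backed claim read-only
containers:
  - name: model-server            # hypothetical container
    image: example/model:latest   # hypothetical image
    volumeMounts:
      - name: models
        mountPath: /models        # path the model code reads from
        readOnly: true
volumes:
  - name: models
    persistentVolumeClaim:
      claimName: model-data       # hypothetical claim name
```
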
Other hacks to improve performance by over 50%
- If you’re on GKE, container image streaming is a must. Even for large images like ours, GKE is able to spin pods up blazingly fast.
- Use the gcloud alpha storage command to take advantage of multi-threading while downloading. This easily improves download performance by 79%–94%. On AWS you can use s5cmd.
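As a quick sketch, the two downloads above look like this; the bucket names and paths are placeholders, wrapped in functions for clarity:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Multi-threaded download from Google Cloud Storage.
download_gcs() {
  gcloud alpha storage cp -r "gs://example-models/resnet" /models/
}

# Equivalent on AWS using s5cmd, which parallelises S3 transfers.
download_s3() {
  s5cmd cp "s3://example-models/resnet/*" /models/
}
```
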