A lot goes into training a model: cleaning your data, versioning it, splitting it into training and validation sets, and then the painstaking process of training the model and sharing the findings with your team.
If you work at an NLP-focused company like mine, where researchers constantly collaborate on shared data and ML models, managing this entire lifecycle can quickly become a disaster. It did for us, until we arrived at Weights & Biases.
Marketing jargon aside, let me quickly walk you through how to deploy it on your private cloud.
Infra required:
1. Redis
2. Pub/Sub
3. Cloud Storage
4. MySQL
5. Kubernetes cluster
Note: We will not be focusing on the security of these resources here; make sure you deploy them with the security and compliance controls your company requires.
Deploying Redis
Head over to Memorystore, where you should be able to create a Redis instance. Enable AUTH for the instance.
Then head over to the security section of the instance, where you will find the AUTH string for your Redis instance.
REDIS: redis://:<REDIS_PASSWORD>@<REDIS_IP>:<REDIS_PORT>/0
Save this connection string; we will need it later.
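If you prefer the CLI over the console, the Redis setup above looks roughly like this (the instance name, region, and size are placeholders, not values from this guide):

```shell
# Create a Memorystore (Redis) instance with AUTH enabled.
# Instance name, region, and size are illustrative placeholders.
gcloud redis instances create wandb-redis \
  --size=1 \
  --region=us-central1 \
  --enable-auth

# Retrieve the AUTH string (the <REDIS_PASSWORD> part of the URL).
gcloud redis instances get-auth-string wandb-redis --region=us-central1

# Retrieve the instance IP and port for the URL.
gcloud redis instances describe wandb-redis --region=us-central1 \
  --format="value(host,port)"
```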
Creating a service account with required permissions
Go to Service Accounts and create a service account with the permissions required to use Pub/Sub and Cloud Logging.
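Assuming a CLI workflow, a minimal sketch of this step might look as follows; the account name and the exact roles you grant (Pub/Sub admin, log writer) are assumptions you should adapt to your own least-privilege policy:

```shell
# Create the service account (the name is a placeholder).
gcloud iam service-accounts create wandb-local

# Grant Pub/Sub and Cloud Logging access at the project level.
gcloud projects add-iam-policy-binding <GOOGLE_PROJECT_NAME> \
  --member="serviceAccount:wandb-local@<GOOGLE_PROJECT_NAME>.iam.gserviceaccount.com" \
  --role="roles/pubsub.admin"
gcloud projects add-iam-policy-binding <GOOGLE_PROJECT_NAME> \
  --member="serviceAccount:wandb-local@<GOOGLE_PROJECT_NAME>.iam.gserviceaccount.com" \
  --role="roles/logging.logWriter"

# Download a JSON key; the Deployment later mounts it into the pod.
gcloud iam service-accounts keys create key.json \
  --iam-account="wandb-local@<GOOGLE_PROJECT_NAME>.iam.gserviceaccount.com"
```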
Creating a Pub/Sub queue and Cloud Storage bucket
Create a Pub/Sub topic and a Cloud Storage bucket for W&B to access.
Give the service account you created admin rights on the bucket.
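On the command line, those two steps could look like this (the service-account email is a placeholder for the account you created earlier):

```shell
# Create the topic and the bucket (names are placeholders).
gcloud pubsub topics create <TOPIC_NAME>
gcloud storage buckets create gs://<BUCKET_NAME>

# Grant the service account admin rights on the bucket.
gcloud storage buckets add-iam-policy-binding gs://<BUCKET_NAME> \
  --member="serviceAccount:<SERVICE_ACCOUNT_EMAIL>" \
  --role="roles/storage.admin"
```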
We then need to set up a notification channel on Pub/Sub so that W&B is informed about changes in the bucket, which you can do by running the following command:
gcloud storage buckets notifications create gs://BUCKET_NAME --topic=TOPIC_NAME
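The config map later in this post also references a Pub/Sub subscription name; assuming you have not created one yet, you can add it and sanity-check the bucket notification like so:

```shell
# Create a subscription on the notification topic.
gcloud pubsub subscriptions create <SUBSCRIPTION_NAME> --topic=<TOPIC_NAME>

# Verify that the bucket notification was created.
gcloud storage buckets notifications list gs://<BUCKET_NAME>
```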
Creating your SQL server
Go ahead and create a Cloud SQL (MySQL) instance for W&B to use. Make sure to change the network settings so that the pods in your Kubernetes cluster can communicate with the SQL server.
You should then have a user with a password to access this SQL instance, giving you the connection string:
MYSQL: mysql://<MYSQL_USER>:<MYSQL_PASSWORD>@<MYSQL_HOST>/<DATABASE_NAME>
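Via the CLI, this step could be sketched as follows; the instance name, tier, region, and MySQL version are assumptions (check W&B's documentation for the MySQL version your W&B release supports):

```shell
# Create a MySQL instance (name, tier, region, and version are placeholders;
# confirm the MySQL version supported by your wandb/local release).
gcloud sql instances create wandb-mysql \
  --database-version=MYSQL_5_7 \
  --tier=db-n1-standard-1 \
  --region=us-central1

# Create the database and a user for W&B.
gcloud sql databases create <DATABASE_NAME> --instance=wandb-mysql
gcloud sql users create <MYSQL_USER> \
  --instance=wandb-mysql \
  --password=<MYSQL_PASSWORD>
```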
Deploying your Kubernetes service
apiVersion: v1
kind: ConfigMap
metadata:
  name: wandb-config
data:
  LICENSE: <W&B_ENTERPRISE_LICENSE>
  MYSQL: mysql://<MYSQL_USER>:<MYSQL_PASSWORD>@<MYSQL_HOST>/<DATABASE_NAME>
  BUCKET: gs://<BUCKET_NAME>
  HOST: "https://wandb.quillbot.dev"
  REDIS: redis://:<REDIS_PASSWORD>@<REDIS_IP>:<REDIS_PORT>/0
  BUCKET_QUEUE: pubsub:/<GOOGLE_PROJECT_NAME>/<TOPIC_NAME>/<SUBSCRIPTION_NAME>
  LOGGING_ENABLED: "true"
  SLACK_CLIENT_ID: "<SLACK_CLIENT_ID>"
  SLACK_SECRET: <SLACK_SECRET>
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wandb
spec:
  selector:
    matchLabels:
      app: wandb
  template:
    metadata:
      labels:
        app: wandb
    spec:
      volumes:
        - name: google-cloud-key
          secret:
            secretName: service-account-credentials
      containers:
        - name: wandb
          volumeMounts:
            - mountPath: /var/secrets/google
              name: google-cloud-key
          image: wandb/local
          envFrom:
            - configMapRef:
                name: wandb-config
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /var/secrets/google/key.json
          resources:
            limits:
              memory: "1G"
              cpu: "500m"
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: wandb
spec:
  type: ClusterIP
  selector:
    app: wandb
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: dev-gateway
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: tls-certs
      hosts:
        - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: wandb-ingress
spec:
  hosts:
    - "*"
  gateways:
    - dev-gateway
  http:
    - route:
        - destination:
            host: wandb
            port:
              number: 80
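With the manifest saved (say, as wandb.yaml, a name I am assuming here), the remaining pieces are the two secrets it references and the apply itself. The file paths are placeholders for your own key and certificate files:

```shell
# The service-account key mounted by the Deployment.
kubectl create secret generic service-account-credentials \
  --from-file=key.json=./key.json

# The TLS certificate referenced by the Istio Gateway; for SIMPLE TLS,
# the secret typically lives in the istio-system namespace.
kubectl create secret tls tls-certs \
  --cert=./tls.crt --key=./tls.key \
  -n istio-system

# Apply everything.
kubectl apply -f wandb.yaml
```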