Introduction To Machine Learning Deployment Using Docker and Kubernetes


Deployment is perhaps one of the most overlooked topics in the Machine Learning world. But it most certainly is important, if you want to get into the industry as a Machine Learning Engineer (MLE). In this article, we will take a sober look at how painless this process can be, if you just know the small ins and outs of the technologies involved in deployment.
All the files for this project are available on GitHub, and you can perhaps use this project as a Hello World application, such that you have something running and later on replace it with something more complex.
Table of Contents (Click To Scroll)
- Setup & Installation
- What Are Docker And Kubernetes?
- A Sample Project
- The Deployment With Docker And Kubernetes
Setup & Installation
These are all the steps to set up your environment. We are going to be using Google Cloud Platform (GCP), as they are largely the leader in Kubernetes, and they are also the ones who developed and open-sourced the originally internal project. The main benefit of using GCP is that it has a great UI/UX and is easy to set up. The same cannot be said for their competitors.
- You probably already have a Google account. Sign up for Google Cloud Platform and get $300 free credits. You have to enter your credit card, but it won’t be charged unless you give them permission. Remember to enable billing.
- Download the Google Cloud SDK.
- Export the SDK to PATH by finding the path to the SDK bin folder.
a. Windows: Add an environment variable. 1) Open search and search for “Edit the environment variables”, 2) Click on the “Environment Variables” button at the bottom, 3) Under “User variables for <user>”, double click Path, or click New if it does not exist (the variable name is “Path”). Inside the editor, click New, enter the path to the SDK bin folder and save.
b. MacOS: In Terminal, type nano ~/.bash_profile and add the path to your bin folder as a new line in the file: export PATH="$PATH:/Applications/google-cloud-sdk/bin"
Save by doing CTRL+O, press Enter/Return and press CTRL+X.
c. Linux: Depending on the distribution and what you have installed, there could be different profile files for Terminal. Check nano ~/.bash_profile, nano ~/.bash_login, nano ~/.profile or nano ~/.bashrc. Add the path to the bin folder of the SDK as a new line: export PATH="$PATH:/path/to/google-cloud-sdk/bin"
Save by doing CTRL+O, press Enter/Return and press CTRL+X.
- Download Docker. Note that you cannot use Docker with Windows Home – consider using a local Ubuntu VM by using Hyper-V on Windows instead.
a. Windows Pro/Enterprise/Education: Sign up for a Docker account and download Docker Desktop for Windows (you can download it once logged in). Make sure the program is running and that you are logged in locally.
b. MacOS: Sign up for a Docker account and download Docker Desktop for Mac (you can download it once logged in). Make sure the program is running and that you are logged in locally.
c. Linux: Use the following three commands to download and start Docker: 1) apt install docker.io, 2) systemctl start docker, and 3) systemctl enable docker. You can log in with docker login if you have a registry you want to log in to.
- After this is done, you should be able to type gcloud init and configure the SDK for the setup.
a. Type Y, press enter and log into your account when gcloud displays: “You must log in to continue. Would you like to log in (Y/n)?”
b. Type the number of your project (mine was 1) and press enter when gcloud displays: “Please enter numeric choice or text value (must exactly match list item)”
c. Type n and press enter when gcloud displays: “Do you want to configure a default Compute Region and Zone? (Y/n)?”
- You need to run gcloud auth configure-docker in the Terminal to be able to push your containers later on. If this does not run, try restarting Terminal.
After you have prepared the tools, we need to create a cluster, such that we can go through using these tools to deploy a Machine Learning application.
Creating Your Cluster
Creating your cluster depends entirely on your needs. For this tutorial, I simply need 1 node, since we are just deploying a model on a toy dataset, and it only needs to be reachable remotely. If you want to replicate my cluster, adjust the number of nodes to 1, set the machine type to g1-small and create your cluster – these are the only steps I took for this article.
We will go through the options, but before please go ahead and create your cluster in the Kubernetes Engine.
Step 1: Choosing The Cluster Type For Your Clusters
Consider the resources your application needs. Are you doing Deep Learning? Then you will need GPUs, so you need to use the “GPU Accelerated Computing” cluster template in the left sidebar.
If you want high availability, you should make use of the cluster template “Highly Available” in the left sidebar. If your application has a need for CPU power or lots of RAM, then choose those cluster templates.
Remember that you can actually combine these templates, so you can have highly available GPU nodes by using the location type of regional.

Step 2: Automatic Scaling, Automatic Upgrades And Automatic Repairs
While creating the cluster, you should pay attention to the menu under “Machine Configuration” > “More Options”.
What if your application hits the front page of Hacker News or some big blog or magazine? Are you sure that your application can handle that amount of traffic? You should enable autoscaling, such that Kubernetes automatically scales your application up and down when needed. Just enter the maximum number of nodes you are prepared to pay for if your application experiences a huge load.

Another great pair of options is auto-upgrading and auto-repairing. The nodes can actually be automatically upgraded with no downtime, just by enabling auto-upgrade – this ensures there are fewer security flaws and that you get the latest features of the stable version.
If a node is somehow failing or not ready to be used, Kubernetes can take care of automatically restoring and making the node work again. Auto-repairing does this, by monitoring all nodes with health checks, multiple times an hour.

After Your Cluster Is Created
After you have created your cluster, you want to connect to your cluster in your local Terminal. You can connect to the cluster by clicking on “Connect”, and copy the line of code that pops up under “Command-line access”.

What Are Docker And Kubernetes?
Normally, I don’t recommend YouTube videos. But the following YouTube video is a very comprehensive and great explanation of Docker and Kubernetes. It even answers why we want to use Docker and Kubernetes, and why we shifted away from VMs to containers.
The basic idea of Kubernetes consists of Ingresses, Services, Deployments and Pods. We can think of an Ingress as a load balancer between multiple Services, and a Service as a load balancer between the Pods of a specific Deployment. For this article, we will not be using an Ingress, since we just have a single container, but you should look into it, since you will likely need it for larger applications or when doing a microservice architecture.

Making A Sample Project
We are deploying a machine learning model on the Auto MPG dataset, which is a toy dataset. Let’s walk through the steps of training a Random Forest model.
Training A Random Forest Model
To train a model, we don’t have to do much work with the chosen dataset. We start by importing the packages we need, as well as the classes from the core folder.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import joblib
from core.clean_data import CleanData
from core.predict_data import PredictData
from core.load_data import LoadData
Then we instantiate the classes, such that we can call all the functions from the classes later on.
loader = LoadData()
cleaner = CleanData()
predicter = PredictData()
The next step is loading the dataset into a dataframe, for which we provided a function in the LoadData class. When we call the function, it uses Pandas to read the data table and then sets the column names specified in the class. Remember to check out the GitHub repository to see the full code.
df = loader.load_dataset_as_df()
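The exact implementation lives in the repository's LoadData class, but a minimal sketch of what load_dataset_as_df roughly does could look like this – assuming the raw auto-mpg.data file from the UCI repository (whitespace-separated, no header row, car names in quotes); the file path and column list are my assumptions:
import pandas as pd
# Minimal sketch: read the whitespace-separated UCI Auto MPG data table
# and name the columns ourselves, since the raw file has no header row.
columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
           'acceleration', 'model_year', 'origin', 'car_name']
df = pd.read_csv('data/auto-mpg.data', delim_whitespace=True,
                 quotechar='"', names=columns)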
Now that we have the dataset loaded, we want to think about what preprocessing steps we need to take, to be able to make a model. I found that there are some question marks in the horsepower feature of the dataset, so we need to get rid of those rows.
The CleanData class has a function for this, where we keep only the rows where horsepower does not contain a question mark – in other words, it removes the rows where horsepower has a question mark instead of a number. In a similar fashion to the loader, we call the cleaner class with the following line of code.
df = cleaner.clear_question_marks(df)
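The exact code is in the CleanData class on GitHub, but a rough sketch of what clear_question_marks might do is just a boolean filter; the cast to float afterwards is my assumption, since the column is read as text when it contains question marks:
# Keep only the rows where horsepower is an actual number, not a question mark
df = df[df['horsepower'] != '?']
# Assumption: convert the now-clean column to a numeric type for the model
df['horsepower'] = df['horsepower'].astype(float)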
The next and last step before training our model is specifying which feature we want to predict. Naturally, for this dataset at least, we want to predict the mpg feature, which is miles per gallon for a car.
What happens here is that we split the dataset into training and testing, where y is the feature we are trying to predict, and X is the features we are using to predict y.
y = df['mpg']
X = cleaner.drop_unused_columns(df)
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42
)
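The drop_unused_columns helper used above also lives in the repository; presumably it just drops the target and the non-numeric car name column, along these lines:
# Rough guess at drop_unused_columns: remove the target and the text column
X = df.drop(columns=['mpg', 'car_name'])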
Now we are finally ready to fit a random forest model to the dataset, since it has been cleaned and prepared for the algorithm.
We start off by instantiating the Random Forest with default parameters, and then we tell scikit-learn to train a random forest model with the training data. After that is done, we can predict on the testing data, and we can also score how well the predictions went.
rf = RandomForestRegressor()
rf.fit(X_train, y_train)
pred = predicter.predict(X_test, rf)
score = predicter.score_r2(y_test)
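The score_r2 helper comes from the repository's PredictData class; my guess is that it simply wraps scikit-learn's R² metric on the stored predictions, roughly like this:
from sklearn.metrics import r2_score
# Compare the held-out targets against the predictions from above
score = r2_score(y_test, pred)
print(score)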
The absolute last step is “dumping” the model, which means exporting it to a file that we can load into Python at another point in time. For this, we use joblib.
joblib.dump(rf, "models/rf_model.pkl")
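If you want to sanity-check the export, you can load the file back with joblib and predict on a few rows of the test set:
# Load the exported model again and verify it still predicts
loaded_rf = joblib.load("models/rf_model.pkl")
print(loaded_rf.predict(X_test[:5]))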
Making Models Accessible Through A Service
Now that we have a model, we need a way to expose it as a service. Flask is the most common way to do this, and it will scale easily. The first step is to create the Flask app with app = Flask(__name__), and then define a function (def) decorated with @app.route, specifying a route (/) and a methods parameter (methods=[]).
Upon running the script, the following code imports the random forest model we exported earlier. Then, whenever someone sends a request to the specified route, we make a prediction and return it.
from core.clean_data import CleanData
from core.predict_data import PredictData
from core.load_data import LoadData
from flask import Flask, jsonify, request
app = Flask(__name__)
loader = LoadData()
cleaner = CleanData()
model = loader.load_model_from_path('./models/rf_model.pkl')
predicter = PredictData(model)
@app.route("/", methods=['POST'])
def do_prediction():
    json = request.get_json()
    df = loader.json_to_df(json)
    df = cleaner.clear_question_marks(df)
    X = cleaner.drop_car_name(df)
    prediction = predicter.predict(X)
    return jsonify(prediction[0])

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000)
When we run this code, we have told Flask to run the method do_prediction() whenever someone queries the machine's IP on port 5000 (e.g. localhost:5000 locally). This does not expose it to the world yet, though, since we need a cloud provider for that. This is simply the entrypoint to our application, which we will package and ship off to a cloud provider.
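Before packaging anything, it is worth testing the endpoint locally. With the Flask app running on port 5000, a quick sanity check could look like the sketch below; the field names follow the dataset's columns, and the exact JSON format depends on the json_to_df helper in the repository:
import requests

sample = {
    "cylinders": 8, "displacement": 307.0, "horsepower": 130.0,
    "weight": 3504, "acceleration": 12.0, "model_year": 70,
    "origin": 1, "car_name": "chevrolet chevelle malibu"
}

# POST the sample to the local Flask server and print the prediction
r = requests.post("http://localhost:5000/", json=sample)
print(r.status_code, r.text)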
The Deployment With Docker And Kubernetes
This section combines Docker, Kubernetes and Machine Learning to expose your application to the world. The following information was not easy to learn, although it seems easy now.
Making A Dockerfile
The very first thing we always do is create a Dockerfile. With a Dockerfile, we take only the necessary files from our project and package them into an image. We want the image to be as small as possible, so we don't clutter it with tons of unnecessary files.
The syntax is not hard to learn. Let me give you a brief overview of the commands used in this Dockerfile.
Command | What it does |
---|---|
FROM | This specifies the base image, which is usually a Python image for Machine Learning. You can browse base images on Docker Hub. |
WORKDIR | This command changes (and creates) the directory within the image to the specified path. |
RUN | This runs a command in Terminal inside the image. It could be anything, but don’t clutter it up. |
ADD | This specifies the files to add from your directory to a directory in the image (also creates directory if it does not exist). |
EXPOSE | This command documents that the container listens on a specific port number, like port 5000. |
CMD | Takes the argument for running the application. |
Now, the following is the Dockerfile I created for this project. We install the packages directly by specifying their names, instead of going through a requirements.txt file like one would usually do.
After having installed all the packages and added the necessary files, we tell Docker to run the command gunicorn --bind 0.0.0.0:5000 main:app, which is the syntax for using Gunicorn. We always want to use Gunicorn on top of Flask, because it is suited for production, whereas Flask's built-in server is suited for development purposes. The application will work exactly the same in development and production.
main is the filename without the extension (main.py), and app is what we called the Flask app in the file. So if you change the name, you also have to change it here.
FROM python:3.7
WORKDIR /app
RUN pip install pandas scikit-learn flask gunicorn
ADD ./core ./core
ADD ./models ./models
ADD main.py main.py
EXPOSE 5000
CMD [ "gunicorn", "--bind", "0.0.0.0:5000", "main:app" ]
The image does not create itself, though. The Dockerfile just lists the specifics of what the image should look like – so we need a way to build the image before we can use it.
Building & Pushing An Image To Google Cloud
To be able to push the image to Google Cloud, we have to build it, tag it and push it. Luckily for you, I have mostly automated this process for Google Cloud, to the point where you just have to enter your details once and change the version number when you want to update the images.
First, you need to find your project id. You can find your project in the top left and you can find your ID by clicking on the arrow on the right.

All you have to enter is the address, project id, repository and version, and then this script will build and push the image. Use that ID as the PROJECT_ID.
#!/bin/bash
ADDRESS=gcr.io
PROJECT_ID=macro-authority-266522
REPOSITORY=auto
VERSION=0.17
# Build the image and tag it locally with the repository name
docker build -t ${REPOSITORY}:${VERSION} .
# Grab the id of the image we just built
ID="$(docker images | grep ${REPOSITORY} | head -n 1 | awk '{print $3}')"
# Tag the image for the Google Container Registry and push it
docker tag ${ID} $ADDRESS/${PROJECT_ID}/${REPOSITORY}:${VERSION}
docker push $ADDRESS/${PROJECT_ID}/${REPOSITORY}:${VERSION}
You can run the script with Git Bash or a terminal: sh build_push_image.sh. You should run this script instead of manually running these commands each time.
Alternatively, you can experiment with using docker build, docker tag and docker push yourself.
Deploying With Kubernetes
The image is now built and pushed to the cloud. To be able to run it, we need to create a deployment file and a service file, which are defined in YAML syntax.
I’m going to show you how to deploy the product now. The general procedure is the following steps.
- Create/Update the actual containers.
- Update the details in the bash script and run it. If you want to update the container, you just have to update the version in the script.
- Update the details of the image in the deployment YAML file. If you want to update the container, you just have to update the version in the image.
- Run the kubectl apply -f <filename>.yaml command on a deployment or service.
Creating A Deployment
Below, you can see the deployment.yaml file that I used – it works by naming your application (rename mpg if you like) and specifying the URL for the image. When we apply this deployment, we have told Kubernetes to create 3 replicas, which are called Pods – a Pod runs one instance of the container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mpg
  labels:
    app: mpg
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mpg
  template:
    metadata:
      labels:
        app: mpg
    spec:
      containers:
      - name: auto
        image: gcr.io/macro-authority-266522/auto:0.17
        imagePullPolicy: Always
        ports:
        - containerPort: 5000
The containerPort is set to 5000, because that is the port we set in our Flask application. The image is set to gcr.io/<project_id>/<repository>:<version>, which uses the same variable names as in the bash file from earlier.
When you have such a deployment file, you very simply run the below line of code.
kubectl apply -f deployment.yaml
It should say deployment.apps/mpg created when you create it for the first time, and it will also give you another message if you update the image.
Remember we can always go back and reapply this deployment file, if we have made changes to the container. It really is as simple as running the apply command once again, after having changed the version number in your bash script to push to the cloud and in the deployment file.
Though, we are not quite done yet. The application you made is running in the cloud, but it’s not exposed yet. Let me introduce services.
Creating A Service
A service is also defined in a YAML file, just like the deployment, and you use the same command to create and update it. This is why some DevOps engineers are called YAML engineers: most of the configuration for deployments is done using YAML files just like the ones presented here.
Below is the service.yaml file used for this tutorial. You want to specify the type to be LoadBalancer, since that will distribute traffic equally amongst the available Pods from your deployment. This enables immense scaling capacity, since you can just create more nodes in your cluster and create more Pods.
We called our deployment mpg, so we specify the app in the selector to be the same app name, such that this service is linked to our deployment. The port is set to 80 because that is the default HTTP port, so you can reach the service on just the external IP address. The targetPort is set to 5000, since that is what we specified in the deployment file and the Flask application.
apiVersion: v1
kind: Service
metadata:
  name: mlfromscratch
spec:
  type: LoadBalancer
  selector:
    app: mpg
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5000
Quite simply as before, we apply the service in the same way as the deployment.
kubectl apply -f service.yaml
And we get back the response service/mlfromscratch created.
Ok, now we know that everything is created. Let me give you a rundown of how you can access your deployment.
How Do I Access My Application?
First things first, let's take a look at what we just created, shall we? The very first command to run is the following (note that “service” can be shortened to “svc”).
kubectl get service
We get back a response. Your EXTERNAL-IP might still say <pending>, but the actual public IP will come through rather quickly. Mine took 2 minutes to come online and another 2 minutes to give me the response I was expecting, instead of endlessly loading. Note that we don't care about the service named kubernetes, but about the name specified in your service, which in this case was mlfromscratch.
NAME | TYPE | CLUSTER-IP | EXTERNAL-IP | PORT(S) | AGE |
---|---|---|---|---|---|
kubernetes | ClusterIP | 10.12.0.1 | <none> | 443/TCP | 20m |
mlfromscratch | LoadBalancer | 10.12.15.238 | 35.225.0.74 | 80:30408/TCP | 5m14s |
All we have to do to access the application is hit the EXTERNAL-IP from the service, and that's it! We just made a Machine Learning model ready to serve predictions at an endpoint. Ideally, we would query this IP address from another service/product, such that we could continually use these predictions. You could even make your own SaaS by turning this into a microservice and deploying it using this Flask template I integrated with Stripe.
Diagnosing And Checking In On Our Application
Sometimes you want to check what the console prints, or just the status of the application, to see if it's running and if there are any errors.
Just like we used the kubectl get service command to get information about our services earlier, we can use it to get information on other pieces of our Kubernetes cluster, or get it all at once with the command below. Optionally, we can add -o wide at the end for even more information.
kubectl get svc,deployment,pods -o wide
The above command will give us a look at the pods. They should say Running under STATUS; otherwise, your application is not working as expected and you need to go back and make sure that it runs locally.
After getting a specific pod name, we can see the logs for that specific pod and find out what happened. The --tail flag takes a number of lines; use --since=1h instead if you want the last hour of logs rather than just the last 20 lines.
kubectl logs -f pod/mpg-768578c99c-lt6j2 --tail=20
Similar to the get command, there is a whole bunch of other commands, which you can list by typing kubectl in your Terminal. Most interestingly, you can delete your deployments and services, e.g. kubectl delete -f service.yaml, if you just want to start over.
Basic Commands (Beginner):
create Create a resource from a file or from stdin.
expose Take a replication controller, service, deployment or pod and expose it as a new Kubernetes Service
run Run a particular image on the cluster
set Set specific features on objects
Basic Commands (Intermediate):
explain Documentation of resources
get Display one or many resources
edit Edit a resource on the server
delete Delete resources by filenames, stdin, resources and names, or by resources and label selector
Deploy Commands:
rollout Manage the rollout of a resource
scale Set a new size for a Deployment, ReplicaSet, Replication Controller, or Job
autoscale Auto-scale a Deployment, ReplicaSet, or ReplicationController
Cluster Management Commands:
certificate Modify certificate resources.
cluster-info Display cluster info
top Display Resource (CPU/Memory/Storage) usage.
cordon Mark node as unschedulable
uncordon Mark node as schedulable
drain Drain node in preparation for maintenance
taint Update the taints on one or more nodes
Troubleshooting and Debugging Commands:
describe Show details of a specific resource or group of resources
logs Print the logs for a container in a pod
attach Attach to a running container
exec Execute a command in a container
port-forward Forward one or more local ports to a pod
proxy Run a proxy to the Kubernetes API server
cp Copy files and directories to and from containers.
auth Inspect authorization
Advanced Commands:
diff Diff live version against would-be applied version
apply Apply a configuration to a resource by filename or stdin
patch Update field(s) of a resource using strategic merge patch
replace Replace a resource by filename or stdin
wait Experimental: Wait for a specific condition on one or many resources.
convert Convert config files between different API versions
kustomize Build a kustomization target from a directory or a remote url.
Settings Commands:
label Update the labels on a resource
annotate Update the annotations on a resource
completion Output shell completion code for the specified shell (bash or zsh)
Other Commands:
api-resources Print the supported API resources on the server
api-versions Print the supported API versions on the server, in the form of "group/version"
config Modify kubeconfig files
plugin Provides utilities for interacting with plugins.
version Print the client and server version information
Making A Request To Our Application
We have the external IP from earlier, which we are going to reuse here. The following JSON request can now be sent with an HTTP POST method, and we will receive the expected response.
{
"cylinders": 8,
"displacement": 307.0,
"horsepower": 130.0,
"weight": 3504,
"acceleration": 12.0,
"model_year": 70,
"origin": 1,
"car_name": "chevrolet chevelle malibu"
}
In just 0.29 seconds, we received the prediction for this car: our machine learning model predicted 16.383.

Or from a Python script.
import json
import requests
data = {
"cylinders": 8,
"displacement": 307.0,
"horsepower": 130.0,
"weight": 3504,
"acceleration": 12.0,
"model_year": 70,
"origin": 1,
"car_name": "chevrolet chevelle malibu"
}
r = requests.post('http://35.225.0.74', json=data)
Printing r.text gives us the prediction 16.383, and printing r.status_code gives us an HTTP status code of 200.
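If you want the script to fail loudly when the service is unhealthy, you can add a small check before using the prediction (a sketch, not part of the original script):
# Raise an exception for any non-2xx response, otherwise print the prediction
r.raise_for_status()
print(r.status_code, r.text)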