How to Install Kubeflow on Various Operating Systems

As more and more machine learning models get deployed to production, companies are realizing the complexity that comes with maintaining their machine learning pipeline. Coordinating retraining, deployment, and inference can be a logistical nightmare. Fortunately, Kubeflow is making it easier to ensure models remain predictive and accurate.

What is Kubeflow?

 

Kubeflow is a collection of tools designed to simplify machine learning pipelines. It runs on top of Kubernetes, the container orchestration platform.

 

Users define their pipelines declaratively with Kubernetes manifests, then sit back as Kubernetes handles the data processing and scales pods to complete the job. This lets DevOps engineers, SREs, data scientists, and data engineers get pipelines up and running with much less effort and coordination than previously required.

 

Kubeflow is available on Google Cloud Platform as a fully managed solution that is much easier to set up, but you may want to prototype locally without signing up for a cloud service. This tutorial shows you how to deploy Kubeflow on your laptop or local workstation so you can begin prototyping right away.

 

In this tutorial we will go over the installation options available for various OS platforms. Once the install is successful, we will show you how to launch a model on your local Kubeflow cluster for training and inference.

 

Option 1: Microk8s

 

Microk8s is a wonderful tool for easily launching Kubernetes. It comes from the fine folks at Canonical, the company behind the popular Linux distribution platform, Ubuntu.

 

Canonical has stripped away much of the complexity that comes with standing up a Kubernetes cluster. With the release of Kubeflow 1.0, Kubeflow is now integrated with Microk8s, making it easier than ever to get started.

 

To ensure that this solution works properly, your system should meet the following minimum requirements:

 

  • 4 CPU

  • 50 GB storage

  • 14 GB memory
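
Before going further, it can help to check a machine against these minimums. The following is a minimal sketch and not part of the original setup: the function name `meets_minimums` and its argument order are illustrative assumptions, and the thresholds are simply the three values listed above.

```shell
#!/bin/sh
# Hypothetical helper: compare a machine's resources with the minimums
# listed above (4 CPUs, 14 GB memory, 50 GB storage).
# Usage: meets_minimums <cpus> <memory_gb> <free_disk_gb>
meets_minimums() {
    cpus=$1 mem_gb=$2 disk_gb=$3
    ok=0
    [ "$cpus" -ge 4 ]     || { echo "need 4 CPUs, found $cpus"; ok=1; }
    [ "$mem_gb" -ge 14 ]  || { echo "need 14 GB memory, found $mem_gb"; ok=1; }
    [ "$disk_gb" -ge 50 ] || { echo "need 50 GB storage, found $disk_gb"; ok=1; }
    [ "$ok" -eq 0 ] && echo "ok"
    return "$ok"
}
```

On Ubuntu, `nproc`, `free -g`, and `df -BG /` report the three values you would pass in. Keep in mind that `free -g` rounds down, so a machine with exactly 14 GB may report 13.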

 

The installation process should work whether you are on a virtual machine or with Ubuntu installed directly on your machine.

 

For Windows and macOS users, we recommend using Multipass to install Microk8s because it's the simplest way to get an Ubuntu VM up and running. macOS users can also install Microk8s directly with Homebrew, as described later in this tutorial.

 

Note: Because Multipass is distributed as a Snap package on Linux, you cannot install it through Windows Subsystem for Linux, where Snap is not supported.

 

How to Install Microk8s on Windows

 

Canonical recently made it easy to install Multipass on Windows. Download the .exe installer from the Multipass website and run it with the default options.

 

The next steps follow the base installation described on the Kubeflow website.
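
The commands that create the Ubuntu VM are not shown here, so the following is a minimal sketch of what that first step could look like. It assumes a VM named `kubeflow` sized to the minimum requirements above; the VM name, the function name `create_vm`, and the `MULTIPASS` override variable are illustrative assumptions rather than part of the original instructions.

```shell
#!/bin/sh
# Run on the Windows or macOS host: create an Ubuntu VM with 4 CPUs,
# 14 GB of memory, and a 50 GB disk. The default name "kubeflow" is an
# assumption; pass a different name as the first argument if you prefer.
# MULTIPASS can be overridden (e.g. MULTIPASS=echo) to preview the command
# without launching anything.
create_vm() {
    "${MULTIPASS:-multipass}" launch -n "${1:-kubeflow}" -c 4 -m 14G -d 50G
}
```

After `create_vm` finishes, `multipass shell kubeflow` opens a shell inside the VM, where `sudo snap install microk8s --classic` installs Microk8s.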

 

Wait for Microk8s to be ready before moving on to the next step. Use the following command to confirm that the install was successful:



$ microk8s.status --wait-ready

 

 

Optional: If you do not have a version of the Kubernetes command line utility kubectl installed, you can create an alias to make life easier. 



$ sudo snap alias microk8s.kubectl kubectl

 

 

You can also add your current user to the microk8s group to avoid needing root privileges, and take ownership of the .kube caching directory.

 

$ sudo usermod -a -G microk8s $USER

$ sudo chown -f -R $USER ~/.kube

 

 

Step 2: Enable features to make sure your Kubernetes cluster runs properly.

 

$ microk8s.enable dns storage dashboard

 

 

Optional: Enable GPU functionality if a GPU is available.

 

$ microk8s.enable gpu

 

 

Step 3: Get an access token to allow external connectivity to your VM.

 

$ token=$(microk8s.kubectl -n kube-system get secret | grep default-token | cut -d " " -f1)
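
The command above stores only the name of the secret, not the token itself. As a minimal sketch of what the pipeline does, the hypothetical helper below (the name `default_token_name` is an assumption) reads `kubectl get secret` output and prints the first matching secret name. It adds `head -n 1` so that the variable never holds more than one name.

```shell
#!/bin/sh
# Read `kubectl get secret` output on stdin and print the name of the
# first secret whose line mentions "default-token".
default_token_name() {
    grep default-token | cut -d " " -f1 | head -n 1
}
```

You could then run `token=$(microk8s.kubectl -n kube-system get secret | default_token_name)` followed by `microk8s.kubectl -n kube-system describe secret "$token"` to print the token value.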

 

 

Step 4: Enable Kubeflow

 

$ microk8s.enable kubeflow

 

 

 

 

Note: This command can take up to 30 minutes to complete. Be patient. 

 

Once that command completes, you should see a success message with a username and password, which you can use to access your Kubeflow dashboard.

 

 

Step 5: Use port forwarding to allow external access to your VM.

 

$ microk8s.kubectl port-forward -n kube-system service/kubernetes-dashboard 1234:443 --address 0.0.0.0

 

 

 

Step 6: Navigate to your Kubeflow dashboard and enter the username and password from Step 4.

 

Get your VM IP address.

 

$ multipass list

 

Navigate to your Kubeflow dashboard in your browser by connecting to your Multipass VM's IP address on the port you configured in Step 5.
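
If you would rather not copy the address by hand, the lookup can be scripted. This is a minimal sketch that assumes the default `multipass list` table layout (Name, State, IPv4, Image) and the local port 1234 from Step 5; the function names `vm_ip` and `dashboard_url` are illustrative.

```shell
#!/bin/sh
# Read `multipass list` output on stdin and print the IPv4 address of the
# named VM. Assumes the default table layout: Name, State, IPv4, Image.
vm_ip() {
    awk -v name="$1" '$1 == name { print $3; exit }'
}

# Build the dashboard URL from an IP address and the local port chosen in
# Step 5 (1234 by default). HTTPS is used because that port forwards to 443.
dashboard_url() {
    printf 'https://%s:%s\n' "$1" "${2:-1234}"
}
```

For example, `dashboard_url "$(multipass list | vm_ip kubeflow)"` prints the address to open, assuming your VM is named `kubeflow`.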

 

 

How to Install Microk8s on macOS

 

Step 1: Install Homebrew

 

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"



Step 2: Install Microk8s

 

$ brew install ubuntu/microk8s/microk8s
$ microk8s install



Step 3: Enable features to make sure your Kubernetes cluster runs properly.



$ microk8s enable dns storage dashboard

 

Optional: Enable GPU functionality if a GPU is available.

 

$ microk8s enable gpu

 

Step 4: Enable Kubeflow

 

$ microk8s enable kubeflow

 

Note: This command can take up to 30 minutes to complete. Be patient. 

 

Once that command completes, you should see a success message with a username and password, which you can use to access your Kubeflow dashboard.

 

 

 

After the Kubeflow install is complete, navigate to the URL provided by the output.

 

 

Enter your credentials in the appropriate fields and click Login.

 

 

 

Option 2: Vagrant and VirtualBox

Overview

Vagrant is a tool for creating development environments that can be easily deployed on other workstations, regardless of the host operating system. VirtualBox is mature open-source software that lets users create and launch custom VMs.

 

Step 1: Download and install Vagrant and VirtualBox

To start, you will need to install the Vagrant and VirtualBox software. 

 

Note: If you would rather not install or use either of these programs, you can skip to Option 3, where you will use VMWare and a custom script to accomplish the same thing.

 

Step 2: Install MiniKF

To ensure that this solution works properly, your system should meet the following minimum requirements:

 

  • 4 CPU

  • 50 GB storage

  • 12 GB memory

 

Open a terminal and create a new directory. On Windows, I suggest running PowerShell as administrator because we will be running a PowerShell command soon.

 

PowerShell:



PS > mkdir minikf-demo; cd minikf-demo

 

Bash:

 

$ mkdir minikf-demo && cd minikf-demo
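
With the directory in place, the MiniKF box still has to be downloaded and booted. Here is a minimal sketch, assuming the `arrikto/minikf` Vagrant box name published by the MiniKF project; the function name `minikf_up` and the `VAGRANT` override variable are illustrative and not part of the original instructions.

```shell
#!/bin/sh
# Create a Vagrantfile for the MiniKF box and boot it with VirtualBox.
# Run from inside the minikf-demo directory created above.
# VAGRANT can be overridden (e.g. VAGRANT=echo) to preview the commands
# without downloading or starting anything.
minikf_up() {
    "${VAGRANT:-vagrant}" init arrikto/minikf &&
    "${VAGRANT:-vagrant}" up
}
```

The first boot downloads a large box image, so expect `minikf_up` to take a while on a slow connection.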

 

 

Note to Windows Users Only: If you receive a message stating that there was an error while executing `VBoxManage`, you will need to disable Hyper-V and complete Step 2 again before continuing.

 

 

To disable Hyper-V open CMD prompt as Admin and run:

 

> bcdedit /set hypervisorlaunchtype off

 

Reboot your computer for the change to take effect. To turn Hyper-V back on, which also takes effect only after the next reboot, run:

 

> bcdedit /set hypervisorlaunchtype on

 

If that command returns an error, run the following command instead:

 

> bcdedit /set hypervisorlaunchtype auto



 

Step 4: Navigate to the given URL

You should have been provided a URL to navigate to once your command from Step 3 completes.

 

 

Navigate to the given URL and wait for your MiniKF terminal to start. 

Once you see this screen, click OK and Kubeflow will begin provisioning. This process can take up to 30 minutes, so be patient.

 

Step 5: Log in to the Kubeflow Console

 

Once the process finishes, you will see a completion screen with a username and password.

 

Click connect to Kubeflow.

 

 

Enter your credentials in the appropriate fields and click Login.

Option 3: Full Install on Ubuntu

 

This solution is meant to run on Ubuntu. You can provision an Ubuntu VM with a hypervisor such as VMWare or VirtualBox, or on Google Cloud Platform, which offers a free tier. Alternatively, you can run it directly on your local machine.

 

If you do provision a VM on GCP, remember that each vCPU is a single hardware thread, so 2 vCPUs = 1 CPU. For your system to perform correctly, we suggest provisioning a VM with at least 32 vCPUs. You also need to enable nested virtualization for your GCE instance, which is covered in the Compute Engine documentation.

 

Installing using Google Compute Engine

 

Step 1: Create an Ubuntu boot disk.

Go to your GCP Console, open Cloud Shell, and run the following command to create a boot disk.

 

$ gcloud compute disks create kfdisk --size 50G --image-project ubuntu-os-cloud --image-family ubuntu-minimal-1804-lts --zone us-central1-b

 

Step 2: Create an image with a license key required for nested virtualization.

 

$ gcloud compute images create kfimage \
  --source-disk kfdisk --source-disk-zone us-central1-b \
  --licenses "https://compute.googleapis.com/compute/v1/projects/vm-options/global/licenses/enable-vmx"

 

 

Step 3: Create a GCE instance that uses the custom image we created.

 

$ gcloud compute instances create kf-demo-vm --zone us-central1-b \
  --machine-type n1-standard-32 \
  --min-cpu-platform "Intel Haswell" \
  --image kfimage

 

 

Step 4: Verify that nested virtualization is enabled on your image.

 

Once your VM instance finishes initializing, check to make sure that virtualization is enabled. SSH into your instance and run the following command. 



$ grep -cw vmx /proc/cpuinfo

 

 

If you receive a non-zero response then you are good to go.
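
If you plan to script the rest of the setup, the same check can act as a guard. This is a minimal sketch; the function name `has_vmx` and its optional file argument are illustrative assumptions that make the check easy to try against a sample file.

```shell
#!/bin/sh
# Succeed if the given cpuinfo file (default /proc/cpuinfo) lists the
# Intel VT-x flag "vmx" as a whole word on any line.
has_vmx() {
    grep -qw vmx "${1:-/proc/cpuinfo}"
}
```

A script could then stop early with `has_vmx || { echo "nested virtualization is not enabled" >&2; exit 1; }` before attempting the install.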

 

Step 5: Download and run Kubeflow install script.

 

SSH into your VM instance through the GCP console.

 

 

Once your shell is connected, run the following command to download and run the Kubeflow install script.

 

$ gsutil cp gs://manceps-public/lab-scripts/kf-install.sh .
$ chmod +x kf-install.sh
$ ./kf-install.sh

 

This script will take about 30 minutes to configure your environment. Be sure to save the provided IP address and port. You will need them in the following step.

 

Step 6: Create the proper firewall rules

 

To connect to your Kubeflow dashboard, you need to create the proper firewall rules. Replace [KF_INGRESS_PORT] and [KF_SECURE_INGRESS_PORT] with the ports that you copied from the output of the previous step.



$ export INGRESS_PORT=[KF_INGRESS_PORT]
$ export SECURE_INGRESS_PORT=[KF_SECURE_INGRESS_PORT]

 

 

$ gcloud compute firewall-rules create allow-gateway-http --allow tcp:$INGRESS_PORT
$ gcloud compute firewall-rules create allow-gateway-https --allow tcp:$SECURE_INGRESS_PORT
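
An easy mistake here is to leave a placeholder such as [KF_INGRESS_PORT] unreplaced, which makes the firewall command fail. As a minimal sketch, the hypothetical helper `valid_port` below checks that a value is a plain number in the valid TCP port range before you pass it to gcloud.

```shell
#!/bin/sh
# Succeed only if the argument is a whole number between 1 and 65535.
# Rejects empty values and anything containing non-digit characters,
# including an unreplaced placeholder like [KF_INGRESS_PORT].
valid_port() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}
```

For example, `valid_port "$INGRESS_PORT" || echo "INGRESS_PORT is not set to a port number"` flags the problem before any rule is created.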

 

Step 7: Get an access token to allow external connectivity to your VM.

 

$ token=$(kubectl -n kube-system get secret | grep default-token | cut -d " " -f1)

 

 

Step 8: Use port forwarding to allow external access to your VM.

 

$ kubectl port-forward -n kube-system service/kubernetes-dashboard 1234:443 --address 0.0.0.0

 

 

Step 9: Navigate to your Kubeflow Console

 

You are now ready to navigate to your Kubeflow console. Paste the external IP address of your instance, along with the port number from the previous step, into your browser.

 

Enter your credentials in the appropriate fields and click login.

Conclusion

This tutorial showed you several ways to get a Kubeflow environment up and running on your laptop or in the cloud. Kubeflow is one of the technologies leading the way in MLOps, and mastering it will be an asset for anyone looking to step into a role as a Data Scientist or Data Engineer. Now that you have a working Kubeflow playground, go forth and build something awesome!

 

09.04.2020
