I still remember a painful period when my teammate and I were working on a machine learning (ML) project.


Tediously and studiously, we were manually transferring the results of our countless experiments to a Google Sheet and organizing our saved models in folders. Of course, we did try to automate the process as much as possible, but managing our ML experiments was still a messy affair.

If this situation sounds familiar, this article should help you out and reduce your pain.


As one of the best open-source solutions for managing ML experiments (see other tools here), MLflow will greatly improve your well-being (as a data scientist, machine learning specialist, etc.) and help you and your team stay sane while keeping track of your models 💪.


How can MLflow help me?

With just a few lines of code integrated into your script, you can auto-log your model parameters and metrics into an organized dashboard as shown below.


MLflow dashboard

Clicking into each of the table rows will show you more details, including the path of the model saved for that run (one run is basically one model training).


MLflow dashboard showing path to saved model

As mentioned earlier, all of this can be automated with just a few additional lines of code in your script.


In our example code snippet below, we have placed comments above all the lines of code relating to MLflow.


import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

# data_processing() stands for your own code that returns the train/test split (not shown here)
X_train, X_test, y_train, y_test = data_processing()

#################### 1. Setup Experiment ###########################
# set path to log data, e.g., mlruns local folder
# (done first so that the experiment is looked up in the right store)
mlflow.set_tracking_uri('./mlruns')

# set experiment name to organize runs
mlflow.set_experiment('New Experiment Name')
experiment = mlflow.get_experiment_by_name('New Experiment Name')

# launch new run under the experiment name
with mlflow.start_run(experiment_id=experiment.experiment_id):

    #################### 2. Normal Model Training ######################
    hyperparams = {'max_depth': 10, 'max_samples': 0.8, 'max_features': 'sqrt'}
    clf = RandomForestClassifier(**hyperparams, random_state=0)
    clf.fit(X_train, y_train)
    accuracy = clf.score(X_test, y_test)

    ################ 3. Log params, metrics and model ##################
    # log model params
    mlflow.log_params(hyperparams)
    # log model metric
    mlflow.log_metric('accuracy', accuracy)
    # log model
    mlflow.sklearn.log_model(clf, "model")

In general, there are three main sections in our example:


1. Setup experiment: Here we set an experiment name (mlflow.set_experiment()) and path (mlflow.set_tracking_uri()) to log our run, before starting our run with mlflow.start_run().

2. Train model: Nothing special here, just normal model training.

3. Logging: Log parameters (mlflow.log_params()), metrics (mlflow.log_metric()) and model (mlflow.sklearn.log_model()).
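
Our example logs a single metric, but a run often produces several. The sketch below shows one way to log them in a single call with mlflow.log_metrics(). It is only an illustration: classification_metrics and log_run are hypothetical helper names that are not part of our example script, and the metrics are computed in plain Python for binary labels so that the idea is visible without scikit-learn.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def log_run(hyperparams, metrics):
    """Log a dict of params and a dict of metrics to a new MLflow run."""
    import mlflow  # imported lazily so the helper above works without MLflow

    with mlflow.start_run():
        mlflow.log_params(hyperparams)
        mlflow.log_metrics(metrics)  # one call for the whole dict


# Toy labels, purely for illustration
metrics = classification_metrics([1, 0, 1, 1], [1, 0, 0, 1])
print(metrics)
```

Calling log_run(hyperparams, metrics) inside your training script would then record every entry of the dict as its own column in the dashboard.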

After running the code, you can execute mlflow ui in your terminal and there will be a link to your MLflow dashboard.
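
The dashboard is not the only way to look at your runs. As a rough sketch, the logged runs can also be pulled into a pandas DataFrame with mlflow.search_runs(). The helper names order_by_metric and best_runs are our own invention for this illustration, and we assume the metric was logged under the name 'accuracy' as in our example.

```python
def order_by_metric(metric_name, higher_is_better=True):
    """Build the order_by clause that mlflow.search_runs expects."""
    direction = "DESC" if higher_is_better else "ASC"
    return [f"metrics.{metric_name} {direction}"]


def best_runs(experiment_name, metric_name="accuracy", top_n=5):
    """Return the top runs of an experiment as a pandas DataFrame."""
    import mlflow  # imported lazily; requires MLflow to be installed

    experiment = mlflow.get_experiment_by_name(experiment_name)
    if experiment is None:
        raise ValueError(f"No experiment named {experiment_name!r}")
    return mlflow.search_runs(
        experiment_ids=[experiment.experiment_id],
        order_by=order_by_metric(metric_name),
        max_results=top_n,
    )


print(order_by_metric("accuracy"))
```

With the tracking URI pointing at your mlruns folder, best_runs('New Experiment Name') should return the five most accurate runs, one row per run.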

Simple and neat, right? 😎

However, everything we have shown so far runs in the local environment. What if we would like to collaborate with other teammates? This is where a remote server comes into play, and the next section of the article shows you the steps to set one up.

Steps to Deploy MLflow on Google Cloud

We first list the general steps before detailing each of them with screenshots. Having a Google Cloud account is the only prerequisite for following the steps. Do note that Google Cloud has a free trial for new signups, so you can experiment at no cost.

  1. Set up virtual machine to serve MLflow


  2. Create Cloud Storage bucket to store our models

  3. Launch MLflow server


  4. Add user authentication through reverse-proxy with Nginx


  5. Modify code to allow access to MLflow server


1. Set Up Virtual Machine (VM)

Our first step is to set up a Compute Engine VM instance through Google Cloud console.

a) Enable the Compute Engine API after logging in to your Google Cloud console

Enable the Compute Engine API

b) Start Google Cloud Shell

You should see a button similar to the one in the red box below in the top right corner of your console page. Click on it and a terminal will pop up. We shall be using this terminal to launch our VM.

Click on button in red box to start Google Cloud Shell

c) Create a Compute Engine VM instance

Enter the following in Google Cloud Shell to create a VM instance named mlflow-server.

gcloud compute instances create mlflow-server \
--machine-type n1-standard-1 \
--zone us-central1-a \
--tags mlflow-server \
--metadata startup-script='#! /bin/bash
sudo apt update
sudo apt-get -y install tmux
echo Installing python3-pip
sudo apt install -y python3-pip
export PATH="$HOME/.local/bin:$PATH"
echo Installing mlflow and google_cloud_storage
pip3 install mlflow google-cloud-storage'

A brief description of the parameters in the code above:


  • machine-type specifies the amount of CPU and RAM for our VM. You can choose other types from this list.

  • zone refers to the data center zone that your VM resides in. Choose one that is not too far away from your users.

  • tags allow us to identify the instances when adding network firewall rules later.


  • metadata startup-script provides a bash script that will be executed when our instance boots up, installing various packages required.
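
If you expect to create the VM more than once, say with a different zone or machine type, the same command can be assembled from Python. The sketch below only rebuilds the gcloud call from step 1c as an argument list. The function name create_instance_command is hypothetical, the startup script is a condensed copy of ours, and the zone in the usage line is just an example value.

```python
import shlex

# Condensed version of the startup script from step 1c
STARTUP_SCRIPT = """#! /bin/bash
sudo apt update
sudo apt-get -y install tmux
sudo apt install -y python3-pip
pip3 install mlflow google-cloud-storage"""


def create_instance_command(name="mlflow-server",
                            machine_type="n1-standard-1",
                            zone="us-central1-a"):
    """Return the gcloud invocation from step 1c as an argument list."""
    return [
        "gcloud", "compute", "instances", "create", name,
        "--machine-type", machine_type,
        "--zone", zone,
        "--tags", name,  # the tag is reused by the firewall rule later
        "--metadata", f"startup-script={STARTUP_SCRIPT}",
    ]


cmd = create_instance_command(zone="asia-southeast1-a")
print(shlex.join(cmd))  # shell-quoted form, handy for copy and paste
```

On a machine where gcloud is installed and authenticated, the list could be handed to subprocess.run(cmd, check=True) instead of being printed.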


d) Create firewall rule


This is to allow access on port 5000 to our MLflow server.


gcloud compute firewall-rules create mlflow-server \
--direction=INGRESS --priority=999 --network=default \
--action=ALLOW --rules=tcp:5000 --source-ranges= \

2. Create Cloud Storage Bucket

Run the code below in the Google Cloud Shell, replacing <BUCKET_NAME> with a unique name of your choice. This bucket will be where we will store our models later.

gsutil mb gs://<BUCKET_NAME>
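
If you prefer to stay in Python, the bucket can also be created with the google-cloud-storage client. The following is a minimal sketch, assuming your credentials are already available to the client. The helper names are hypothetical, and the name check covers only the basic naming rules (length and allowed characters), not every restriction that Cloud Storage enforces.

```python
import re

# Basic rules only: 3-63 characters, lowercase letters, digits, '.', '_' and '-',
# starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Cheap local check before asking Google Cloud to create the bucket."""
    return _BUCKET_NAME.fullmatch(name) is not None


def create_bucket(name):
    """Create the bucket, the Python counterpart of `gsutil mb gs://<BUCKET_NAME>`."""
    from google.cloud import storage  # provided by the google-cloud-storage package

    if not looks_like_valid_bucket_name(name):
        raise ValueError(f"{name!r} does not look like a valid bucket name")
    client = storage.Client()
    return client.create_bucket(name)


print(looks_like_valid_bucket_name("my-mlflow-models-123"))
```

Bucket names are global, so the call can still fail if someone else already owns the name you picked.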

3. Launch MLflow Server

We shall now SSH into our mlflow-server instance.


Go to the Compute Engine page and click on the SSH button for your instance. A terminal for your VM instance should pop up.

While the terminal gets ready, take note of the internal and external IPs for your mlflow-server instance that are shown on the Compute Engine page. We will need them later.

Before launching our MLflow server, let’s do a quick check to ensure that everything has been installed. As our startup script takes a few minutes to finish, the packages may not all be installed yet if you SSH in too quickly. To check that MLflow has been installed, enter in the terminal:

mlflow --version

You should see the version of MLflow if it has been installed. If not, no worries: either wait a little longer or execute the commands from our bash script in step 1c to install the packages manually.
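
The same check can be done for both packages at once from Python. This is just a sketch using the standard library's importlib.metadata (Python 3.8+). The helper name installed_version is our own.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The two packages our startup script installs
for package in ("mlflow", "google-cloud-storage"):
    print(package, installed_version(package) or "not installed yet")
```

Run it with python3 on the VM. If either line says "not installed yet", the startup script is probably still busy.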

Check that MLflow has been installed with version showing

If MLflow has been installed, we can now open a new tmux session by executing:

tmux

Then launch our MLflow server by running the code below, replacing <BUCKET_NAME> and <INTERNAL_IP> with the bucket name from step 2 and the internal IP address noted earlier.


mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root gs://<BUCKET_NAME> \
--host <INTERNAL_IP>

If you see something similar to the screenshot below, congratulations, your MLflow server is up and running 😄. You can now visit <EXTERNAL_IP>:5000 in your browser to view your MLflow dashboard.
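
If you would rather check from a script than from the browser, something like the sketch below could work. It assumes that your MLflow version exposes a /health endpoint on the tracking server (recent versions do, but please verify on yours). The function names are hypothetical and the IP address in the usage line is a placeholder.

```python
import urllib.request


def health_url(host, port=5000):
    """URL of the (assumed) health endpoint of the MLflow tracking server."""
    return f"http://{host}:{port}/health"


def server_is_up(host, port=5000, timeout=3.0):
    """Return True if the server answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except OSError:  # connection refused, timeout, HTTP error, ...
        return False


print(health_url("10.128.0.2"))  # placeholder internal IP
```

Calling server_is_up('<EXTERNAL_IP>') from your laptop tests the firewall rule as well as the server itself.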

MLflow server up and running

4. Add User Authentication

If you don’t mind letting anyone who has your external IP address view your MLflow dashboard, then you can skip this step. But I am guessing you are not such an exhibitionist, right? Or are you? 😱

To add user authentication, first stop our MLflow server for now by pressing Ctrl+c. Then say the Terminator’s famous line “I’ll be back” before detaching from the tmux session with Ctrl+b d.

a) Install Nginx and Apache Utilities


In our terminal’s main window, execute:


sudo apt-get install nginx apache2-utils

Nginx will serve as our web server, while Apache Utilities gives us access to the htpasswd command, which we will use next to create a password file.

b) Add password file


Run the following, replacing <USERNAME> with a cool name.


sudo htpasswd -c /etc/nginx/.htpasswd <USERNAME>

Then set your nobody-can-decipher password.


If you need a party, just leave out the -c argument to add additional users:


sudo htpasswd /etc/nginx/.htpasswd <ANOTHER_USER>

c) Enable password and reverse-proxy


We need to configure Nginx to let our password file take effect and set up reverse-proxy to our MLflow server. We do this by modifying the default server block file:

sudo nano /etc/nginx/sites-enabled/default

Modify the file by replacing the content under location with the three lines shown inside the location block:


server {
    location / {
        proxy_pass http://localhost:5000;
        auth_basic "Restricted Content";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }
}


Press Ctrl+x, then y, then Enter to save changes and exit the editor.

Restart Nginx for the changes to take effect:


sudo service nginx restart

Create a new session with tmux or re-attach to our earlier tmux session:


tmux attach-session -t 0

Launch our MLflow server again, this time with the host set to localhost so that it is only reachable through Nginx:


mlflow server \
--backend-store-uri sqlite:///mlflow.db \
--default-artifact-root gs://<BUCKET_NAME> \
--host localhost

d) Enable HTTP traffic


Lastly, we enable HTTP traffic for our instance to allow access to our Nginx web server by following the steps in this link. Essentially, when you click on our mlflow-server instance on the Compute Engine page, you can edit and select Allow HTTP traffic and Allow HTTPS traffic under the Firewall section.

Now if you visit your external IP (leave out :5000, just the external IP), you should be prompted for credentials. Enter the username and password that you set earlier and, “Open Sesame”, your MLflow dashboard is back before your eyes again.
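
To convince yourself that the password prompt really guards the dashboard, you can send one request without credentials and one with them. The sketch below uses only the standard library. The function names are hypothetical, and the user name and password in the demo line are dummy values.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(username, password):
    """Build the value of the HTTP Authorization header for basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def dashboard_status(base_url, username=None, password=None, timeout=5.0):
    """Return the HTTP status code that the Nginx proxy answers with."""
    request = urllib.request.Request(base_url)
    if username is not None:
        request.add_header("Authorization", basic_auth_header(username, password))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 401 when credentials are missing or wrong


print(basic_auth_header("user", "pass"))  # dummy credentials
```

We would expect dashboard_status('http://<EXTERNAL_IP>') to return 401 and the same call with your username and password to return 200.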

5. Modify Code to Access Server

In order for our scripts to log to the server, we need to modify our code by providing some credentials as environment variables.


a) Create and download the service account JSON


Follow the steps here to create a new service account key.


b) Pip install google-cloud-storage locally


The google-cloud-storage package must be installed on both the client and the server in order to access Google Cloud Storage. We installed the package on the server through our startup script, so you just need to install it locally.

c) Set up credentials as environment variables


In your code, add the following in order for your script to access the server, replacing each of them accordingly:


  • <GOOGLE_APPLICATION_CREDENTIALS> : Path of downloaded service account key



import os

# Set path to service account json file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '<GOOGLE_APPLICATION_CREDENTIALS>'

# Set username and password if authentication was added
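
For the username and password, MLflow reads the environment variables MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD when it talks to a server behind HTTP basic authentication. The sketch below wraps all three variables in one hypothetical helper, set_mlflow_credentials. The path, user name and password in the usage line are placeholders to replace with your own.

```python
import os


def set_mlflow_credentials(key_path, username=None, password=None):
    """Expose the service account key and the Nginx login to MLflow."""
    # Used by google-cloud-storage to upload the model artifacts
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = key_path
    # Only needed if user authentication (step 4) was added
    if username is not None and password is not None:
        os.environ["MLFLOW_TRACKING_USERNAME"] = username
        os.environ["MLFLOW_TRACKING_PASSWORD"] = password


# Placeholder values, for illustration only
set_mlflow_credentials("path/to/service-account.json", "alice", "s3cret")
print(os.environ["MLFLOW_TRACKING_USERNAME"])
```

In a shared repository it is safer to read these values from a local, untracked file or from your shell than to hard-code them in the script.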

d) Set external IP as MLflow tracking URI


Earlier in our example code, mlflow.set_tracking_uri() was set to a local folder path. Now set it to your external IP on port 80, in the form http://<EXTERNAL_IP>:80.
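
Put together, the change to the client script is small. The sketch below is one possible shape for it: tracking_uri and point_client_at_server are hypothetical helper names, and 203.0.113.10 is a documentation-only address standing in for your external IP.

```python
def tracking_uri(external_ip, port=80):
    """Build the tracking URI for the Nginx proxy in front of MLflow."""
    return f"http://{external_ip}:{port}"


def point_client_at_server(external_ip, experiment_name):
    """Send all subsequent runs to the remote server instead of ./mlruns."""
    import mlflow  # imported lazily; requires MLflow to be installed

    mlflow.set_tracking_uri(tracking_uri(external_ip))
    mlflow.set_experiment(experiment_name)


print(tracking_uri("203.0.113.10"))  # placeholder external IP
```

After point_client_at_server('<EXTERNAL_IP>', 'New Experiment Name'), the rest of the training code from our first example can stay as it is.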


You can now easily collaborate with your teammate and log your models to the server. 👏 👏 👏

Our full example code can be found here for your testing convenience.


Through the guide above, we hope that you are now able to deploy MLflow both locally and on Google Cloud to manage your ML experiments. In addition, after your experimentation, MLflow will remain useful for monitoring your model after you have deployed it into production.

Thanks for reading and I hope the article was useful :) Please also feel free to comment with any questions or suggestions that you may have.


Translated from: https://towardsdatascience.com/managing-your-machine-learning-experiments-with-mlflow-1cd6ee21996e




