Introduction to Advanced Machine Learning, Week 1, week01_pa (hse-aml/intro-to-dl, brief notes, answers, figures)

This post walks through the Week 1 programming assignment week01_pa of Introduction to Advanced Machine Learning (hse-aml/intro-to-dl), with brief notes, answers, and figures. Hopefully it provides a useful reference for developers working through the same assignment.

This is the first course of the HSE (Higher School of Economics) series, Introduction to Advanced Machine Learning; this post covers the Week 1 programming assignment.
The assignment consists of six tasks, all fairly easy:
1. Compute the probability
2. Compute the loss function
3. Compute the stochastic gradient
4. Compute the mini-batch gradient
5. Compute the momentum gradient update
6. Compute the RMSprop gradient update
From tasks 3 to 6, convergence should become progressively faster and more stable.

Programming assignment (Linear models, Optimization)

In this programming assignment you will implement a linear classifier and train it using stochastic gradient descent modifications and numpy.

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import sys
sys.path.append("..")
import grading

Two-dimensional classification

To make things more intuitive, let’s solve a 2D classification problem with synthetic data.

with open('train.npy', 'rb') as fin:
    X = np.load(fin)
with open('target.npy', 'rb') as fin:
    y = np.load(fin)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired, s=20)
plt.show()

(figure: scatter plot of the synthetic 2D data, colored by class)

Task

Features

As you can notice, the data above is not linearly separable, so we should add features (or use a non-linear model). Note that the decision boundary between the two classes has the form of a circle, so we can add quadratic features to make the problem linearly separable. The idea behind this is the quadratic feature expansion implemented below:

def expand(X):
    """
    Adds quadratic features.
    This expansion allows your linear model to make non-linear separation.

    For each sample (row in matrix), compute an expanded row:
    [feature0, feature1, feature0^2, feature1^2, feature0*feature1, 1]

    :param X: matrix of features, shape [n_samples, 2]
    :returns: expanded features of shape [n_samples, 6]
    """
    X_expanded = np.ones((X.shape[0], 6))
    X_expanded[:, 0] = X[:, 0]
    X_expanded[:, 1] = X[:, 1]
    X_expanded[:, 2] = X[:, 0] * X[:, 0]
    X_expanded[:, 3] = X[:, 1] * X[:, 1]
    X_expanded[:, 4] = X[:, 0] * X[:, 1]
    return X_expanded
X_expanded = expand(X)
print(X_expanded)
[[ 1.20798057  0.0844994   1.45921706  0.00714015  0.10207364  1.        ]
 [ 0.76121787  0.72510869  0.57945265  0.52578261  0.5519657   1.        ]
 [ 0.55256189  0.51937292  0.30532464  0.26974823  0.28698568  1.        ]
 ...,
 [-1.22224754  0.45743421  1.49388906  0.20924606 -0.55909785  1.        ]
 [ 0.43973452 -1.47275142  0.19336645  2.16899674 -0.64761963  1.        ]
 [ 1.4928118   1.15683375  2.22848708  1.33826433  1.72693508  1.        ]]

Here are some tests for your implementation of the expand function.

# simple test on random numbers
dummy_X = np.array([[0, 0],
                    [1, 0],
                    [2.61, -1.28],
                    [-0.59, 2.1]])

# call your expand function
dummy_expanded = expand(dummy_X)

# what it should have returned:   x0       x1       x0^2     x1^2     x0*x1    1
dummy_expanded_ans = np.array([[ 0.    ,  0.    ,  0.    ,  0.    ,  0.    ,  1.    ],
                               [ 1.    ,  0.    ,  1.    ,  0.    ,  0.    ,  1.    ],
                               [ 2.61  , -1.28  ,  6.8121,  1.6384, -3.3408,  1.    ],
                               [-0.59  ,  2.1   ,  0.3481,  4.41  , -1.239 ,  1.    ]])

# tests
assert isinstance(dummy_expanded, np.ndarray), "please make sure you return numpy array"
assert dummy_expanded.shape == dummy_expanded_ans.shape, "please make sure your shape is correct"
assert np.allclose(dummy_expanded, dummy_expanded_ans, 1e-3), "Something's out of order with features"

print("Seems legit!")
Seems legit!

Logistic regression

To classify objects we will obtain the probability that an object belongs to class '1'. To predict this probability we will use the output of a linear model and the logistic function:

$$a(x; w) = \langle w, x \rangle$$

$$P(y=1 \mid x, w) = \frac{1}{1 + \exp(-\langle w, x \rangle)} = \sigma(\langle w, x \rangle)$$

def probability(X, w):
    """
    Given input features and weights,
    return predicted probabilities of y==1 given x, P(y=1|x), see description above.

    Don't forget to use expand(X) function (where necessary) in this and subsequent functions.

    :param X: feature matrix X of shape [n_samples, 6] (expanded)
    :param w: weight vector w of shape [6] for each of the expanded features
    :returns: an array of predicted probabilities in [0,1] interval.
    """
    # sigmoid of the linear model output
    prob = 1 / (1 + np.exp(-np.dot(X, w)))
    return prob
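As a side note, the plain 1/(1+exp(-z)) form can overflow and emit warnings for strongly negative logits. A minimal optional sketch of a more numerically stable variant, using SciPy (not required by the assignment, and the name probability_stable is ours):

# Optional numerically stable variant (illustrative only; not part of the graded solution).
# scipy.special.expit computes the sigmoid without overflowing for large |X @ w|.
from scipy.special import expit

def probability_stable(X, w):
    """Same interface as probability(X, w), using SciPy's stable sigmoid."""
    return expit(np.dot(X, w))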
dummy_weights = np.linspace(-1, 1, 6)
ans_part1 = probability(X_expanded[:1, :], dummy_weights)[0]
## GRADED PART, DO NOT CHANGE!
grader.set_answer("xU7U4", ans_part1)
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)

In logistic regression the optimal parameters $w$ are found by cross-entropy minimization:

$$L(w) = -\frac{1}{\ell} \sum_{i=1}^{\ell} \left[ y_i \log P(y_i \mid x_i, w) + (1 - y_i) \log\left(1 - P(y_i \mid x_i, w)\right) \right]$$

def compute_loss(X, y, w):
    """
    Given feature matrix X [n_samples,6], target vector [n_samples] of 1/0,
    and weight vector w [6], compute scalar loss function using formula above.
    """
    prob = probability(X, w)
    n_samples = X.shape[0]
    # average cross-entropy over all samples
    loss = -np.sum(y * np.log(prob) + (1 - y) * np.log(1 - prob)) / n_samples
    return loss
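As an aside, if the classifier ever becomes extremely confident, np.log(prob) can hit log(0) and produce warnings. A small optional variant (illustrative only, not the graded answer; compute_loss_clipped is our own name) clips the probabilities first:

# Optional variant with clipping to avoid log(0) (illustrative only; not the graded answer).
def compute_loss_clipped(X, y, w, eps=1e-12):
    prob = np.clip(probability(X, w), eps, 1 - eps)
    return -np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob))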
# use output of this cell to fill answer field 
ans_part2 = compute_loss(X_expanded, y, dummy_weights)
## GRADED PART, DO NOT CHANGE!
grader.set_answer("HyTF6", ans_part2)
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)

Since we train our model with gradient descent, we should compute gradients.

To be specific, we need the derivative of the loss function with respect to each weight [6 of them].

$$\nabla_w L = \ldots$$

We won’t be giving you the exact formula this time — instead, try figuring out a derivative with pen and paper.

As usual, we’ve made a small test for you, but if you need more, feel free to check your math against finite differences (estimate how $L$ changes if you shift $w$ by $10^{-5}$ or so); a quick sketch of such a check is included after the grading cell below.

def compute_grad(X, y, w):
    """
    Given feature matrix X [n_samples,6], target vector [n_samples] of 1/0,
    and weight vector w [6], compute vector [6] of derivatives of L over each weight.
    """
    # X [n, d]: n examples, d features
    # y [n,]:   n examples, targets
    # w [d,]:   d feature weights
    # np.dot(X.T, dz) has shape [d, n] x [n,] = [d,]
    a = probability(X, w)
    dz = a - y                                  # [n,]
    grad = -1.0 / X.shape[0] * np.dot(X.T, dz)  # note the minus sign: this is -grad(L),
    return grad                                 # so the updates below ADD it instead of subtracting
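For reference, the pen-and-paper derivation that the code above implements: with $a_i = \sigma(\langle w, x_i \rangle)$ and $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, the gradient of the cross-entropy loss is

$$\nabla_w L = \frac{1}{\ell} \sum_{i=1}^{\ell} (a_i - y_i)\, x_i = \frac{1}{\ell} X^\top (a - y),$$

which matches np.dot(X.T, dz) / X.shape[0] above; the function returns its negative, which is why the training loops below add the returned value to w.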
# use output of this cell to fill answer field 
ans_part3 = np.linalg.norm(compute_grad(X_expanded, y, dummy_weights))
## GRADED PART, DO NOT CHANGE!
grader.set_answer("uNidL", ans_part3)
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)
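Following the finite-difference hint above, here is a minimal self-check sketch (the helper name numeric_grad and the tolerance are ours, not part of the assignment). It compares compute_grad against a central-difference estimate of the gradient, remembering that compute_grad returns the negative gradient:

# Minimal finite-difference check of compute_grad (a sketch for self-checking only).
def numeric_grad(X, y, w, eps=1e-5):
    grad = np.zeros_like(w, dtype=float)
    for j in range(len(w)):
        w_plus = w.astype(float).copy()
        w_minus = w.astype(float).copy()
        w_plus[j] += eps
        w_minus[j] -= eps
        # central-difference estimate of dL/dw_j
        grad[j] = (compute_loss(X, y, w_plus) - compute_loss(X, y, w_minus)) / (2 * eps)
    return grad

# compute_grad returns the negative gradient (see its comments), hence the minus sign here
print(np.allclose(compute_grad(X_expanded, y, dummy_weights),
                  -numeric_grad(X_expanded, y, dummy_weights), atol=1e-4))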

Here’s an auxiliary function that visualizes the predictions:

from IPython import display

h = 0.01
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

def visualize(X, y, w, history):
    """draws classifier prediction with matplotlib magic"""
    Z = probability(expand(np.c_[xx.ravel(), yy.ravel()]), w)
    Z = Z.reshape(xx.shape)
    plt.subplot(1, 2, 1)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.subplot(1, 2, 2)
    plt.plot(history)
    plt.grid()
    ymin, ymax = plt.ylim()
    plt.ylim(0, ymax)
    display.clear_output(wait=True)
    plt.show()
visualize(X, y, dummy_weights, [0.5, 0.5, 0.25])

Training

In this section we’ll use the functions you wrote to train our classifier using stochastic gradient descent.

You can try changing hyperparameters like the batch size, learning rate and so on to find the best setting, but use the given hyperparameters when filling in the answers.

Mini-batch SGD

Stochastic gradient descent just takes a random batch of examples on each iteration, calculates the gradient of the loss on that batch, and makes a step:

$$w_t = w_{t-1} - \eta \frac{1}{m} \sum_{j=1}^{m} \nabla_w L(w_t, x_{i_j}, y_{i_j})$$

# please use np.random.seed(42), eta=0.1, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)

w = np.array([0, 0, 0, 0, 0, 1])
eta = 0.1  # learning rate

n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)

    # compute_grad returns the negative gradient on the mini-batch, so we add it
    grad = compute_grad(X_expanded[ind, :], y[ind], w)
    w = w + eta * grad
visualize(X, y, w, loss)
plt.clf()
# use output of this cell to fill answer field
ans_part4 = compute_loss(X_expanded, y, w)

(figure: decision boundary and loss curve after mini-batch SGD training)

## GRADED PART, DO NOT CHANGE!
grader.set_answer("ToK7N", ans_part4)
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)

SGD with momentum

Momentum is a method that helps accelerate SGD in the relevant direction and dampens oscillations. It does this by adding a fraction $\alpha$ of the update vector of the past time step to the current update vector.


$$\nu_t = \alpha \nu_{t-1} + \eta \frac{1}{m} \sum_{j=1}^{m} \nabla_w L(w_t, x_{i_j}, y_{i_j})$$

$$w_t = w_{t-1} - \nu_t$$


# please use np.random.seed(42), eta=0.05, alpha=0.9, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)
w = np.array([0, 0, 0, 0, 0, 1])
eta = 0.05   # learning rate
alpha = 0.9  # momentum

nu = np.zeros_like(w)
n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)

    # accumulate velocity; compute_grad returns the negative gradient, so we add nu to w
    nu = alpha * nu + eta * compute_grad(X_expanded[ind, :], y[ind], w)
    w = w + nu
visualize(X, y, w, loss)
plt.clf()
# use output of this cell to fill answer field
ans_part5 = compute_loss(X_expanded, y, w)

(figure: decision boundary and loss curve after SGD with momentum)

## GRADED PART, DO NOT CHANGE!
grader.set_answer("GBdgZ", ans_part5)
# you can make submission with answers so far to check yourself at this stage
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)

RMSprop

Implement the RMSprop algorithm, which uses squared gradients to adjust the learning rate:

$$G_j^t = \alpha G_j^{t-1} + (1 - \alpha)\, g_{t,j}^2$$

$$w_j^t = w_j^{t-1} - \frac{\eta}{\sqrt{G_j^t + \varepsilon}}\, g_{t,j}$$

# please use np.random.seed(42), eta=0.1, alpha=0.9, n_iter=100 and batch_size=4 for deterministic results
np.random.seed(42)

w = np.array([0, 0, 0, 0, 0, 1.])
eta = 0.1    # learning rate
alpha = 0.9  # moving average of gradient norm squared
g2 = np.zeros_like(w)
eps = 1e-8

n_iter = 100
batch_size = 4
loss = np.zeros(n_iter)
plt.figure(figsize=(12, 5))

for i in range(n_iter):
    ind = np.random.choice(X_expanded.shape[0], batch_size)
    loss[i] = compute_loss(X_expanded, y, w)
    if i % 10 == 0:
        visualize(X_expanded[ind, :], y[ind], w, loss)

    # moving average of squared gradients, then a per-weight scaled step;
    # compute_grad returns the negative gradient, so we add the step
    grad = compute_grad(X_expanded[ind, :], y[ind], w)
    g2 = alpha * g2 + (1 - alpha) * grad ** 2
    w = w + eta / np.sqrt(g2 + eps) * grad
visualize(X, y, w, loss)
plt.clf()

(figure: decision boundary and loss curve after RMSprop)

# use output of this cell to fill answer field 
ans_part6 = compute_loss(X_expanded, y, w)
## GRADED PART, DO NOT CHANGE!
grader.set_answer("dLdHG", ans_part6)
grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)

That concludes this walkthrough of Introduction to Advanced Machine Learning, Week 1, week01_pa (hse-aml/intro-to-dl, brief notes, answers, figures); hopefully it is helpful to fellow developers!


