SVM and Gradient Descent: Solving SVM with Stochastic Gradient Descent and Hinge Loss

2023-10-20 18:59

Introduction

This post considers the very basics of the SVM problem in the context of hard margin classification and linearly separable data. There is minimal discussion of soft margins and no discussion of kernel tricks, the dual formulation, or more advanced solving techniques. Some prior knowledge of linear algebra, calculus, and machine learning objectives is necessary.


1. Support Vector Machines

The first thing to understand about SVMs is what exactly a “support vector” is. To understand this, it is also important to understand what the goal of SVM is, as it differs slightly from that of Logistic Regression and other classification techniques.


SVM aims to draw a decision boundary through linearly separable classes such that the boundary is as robust as possible. This means that the position of the boundary is determined by those points which lie nearest to it. The decision boundary is a line or hyperplane that has as large a distance as possible from the nearest training instance of either class, as shown in the plot below.


While there is an infinite number of lines that could divide the red and blue classes, the line drawn here has been calculated by finding the largest margin (distance between the line and the nearest points) that exists. The observations that are used to define this margin can be seen where the dashed lines intersect red or blue points. These are referred to as support vectors because they can be represented as geometric vectors that are “supporting”, or holding in place, the boundary line. These points can be visualized as geometric vectors on the following plot:


[Figure: the supporting vectors]

Support vector is just another way of referring to a point that lies nearest to the boundary. Note that geometric vectors are drawn from the origin of the vector space. The use of support vectors in drawing decision boundaries is effective because they maximize the decision margin (i.e. define a margin such that the nearest point in either class is as far away from the dividing hyperplane as possible). This helps the boundary generalize to new data, assuming the training data is representative.


This case of SVM classification would be referred to as “hard margin” because there are no points that fall inside the decision margin. In many cases of SVM classification, it makes more sense to use a “soft margin”, which would give “slack” to some points and allow them to encroach on the dividing region. This is useful because it is less sensitive to anomalies and class instances that overlap one another.


Now that we understand what support vectors are, let’s use them. The SVM algorithm is phrased as a constrained optimization problem; specifically one that uses inequality constraints. The problem takes the following form:


$$ \min_{\beta}\ \frac{1}{2}\lVert \beta \rVert^{2} \qquad \text{subject to} \qquad y_i\,(\beta^{T} x_i) \ge 1 \quad \text{for all } i $$

Here, β is a vector of coefficients and y ∈ {−1, 1}. The dependent variable takes the form −1 or 1 instead of the usual 0 or 1 here so that we may formulate the “hinge” loss function used in solving the problem:


$$ L(\beta) = \frac{1}{2}\lVert \beta \rVert^{2} + C \sum_{i=1}^{n} \max\!\left(0,\ 1 - y_i\,\beta^{T} x_i\right) $$

Here, the constraint has been moved into the objective function and is being regularized by the parameter C. Generally, a lower value of C will give a softer margin. Minimizing this loss function will yield a set of coefficients β = [β0, β1, …, βk] that resembles the coefficients of a linear regression equation. Predictions are made according to the criteria:


$$ \hat{y} = \begin{cases} +1 & \text{if } \beta^{T} x \ge 0 \\ -1 & \text{if } \beta^{T} x < 0 \end{cases} $$

Put simply: if the predicted value is less than 0, the observation lies on one side of the decision boundary; if it is greater than or equal to 0, the observation lies on the other side of the boundary.

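To make these two pieces concrete, here is a minimal Python sketch of the regularized hinge loss and the prediction rule. The function names and the use of NumPy are my own choices, not something taken from the original post.

```python
import numpy as np

def hinge_loss(beta, X, y, C=1.0):
    """0.5*||beta||^2 + C * sum(max(0, 1 - y_i * beta.x_i)) over all instances."""
    margins = y * (X @ beta)                      # y_i * beta^T x_i for every row of X
    return 0.5 * np.dot(beta, beta) + C * np.sum(np.maximum(0.0, 1.0 - margins))

def predict(beta, X):
    """Assign +1 or -1 depending on which side of the hyperplane a point falls on."""
    return np.where(X @ beta >= 0, 1, -1)
```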

Now that we have defined the hinge loss function and the SVM optimization problem, let’s discuss one way of solving it.


2. Stochastic Gradient Descent

There are several ways of solving optimization problems. One method commonly used in machine learning, mainly for its fast implementation, is called Gradient Descent. Gradient Descent is an iterative learning process in which an objective function is minimized by repeatedly stepping against the direction of steepest ascent (that is, along the negative gradient), so that the coefficients converge toward their best values for modeling. While there are different versions of Gradient Descent that vary depending on how many training instances are used at each iteration, we will discuss Stochastic Gradient Descent (SGD).


A gradient is a vector that contains every partial derivative of a function. A partial derivative describes how the function changes in response to an infinitesimal change in one variable, holding the others fixed. Collecting all of the partial derivatives into one vector therefore describes how the function changes with respect to each variable at once, and taken together they indicate the direction in which the function has the steepest slope.

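As a tiny worked example of this idea (my own illustration, not from the post), take f(x, y) = x² + 3y²; its gradient simply stacks the two partial derivatives:

$$ \nabla f(x, y) = \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right) = (2x,\ 6y) $$

At the point (1, 1) the gradient is (2, 6), so the function climbs fastest in that direction, and a descent step moves the opposite way, toward (−2, −6).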

SGD works by initializing a set of coefficients with random values, calculating the gradient of the loss function at a given point in the dataset, and updating those coefficients by taking a “step” of a defined size (given by the parameter η) in the opposite direction of the gradient. The algorithm iteratively updates the coefficients such that they are moving opposite the direction of steepest ascent (away from the maximum of the loss function) and toward the minimum, approximating a solution for the optimization problem. The algorithm follows this general form:


$$ \beta \leftarrow \beta - \eta\,\nabla_{\beta} f(\beta,\ y_i,\ x_i) \qquad \text{repeated over } n\_epochs \text{ passes, one randomly drawn instance } (x_i, y_i) \text{ at a time} $$

Here, η is the specified learning rate, n_epochs is the number of times the algorithm looks over the full dataset, f(β, y_i, x_i) is the loss function, and the gradient is the collection of partial derivatives with respect to every β_i in the loss function, evaluated at random instances of X and y.

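A bare-bones sketch of that loop might look as follows; this is my own illustration, with grad_f standing in for the gradient of whichever loss function is being minimized.

```python
import numpy as np

def sgd(grad_f, X, y, eta=0.01, n_epochs=100, seed=0):
    """Generic SGD: update beta using the gradient at one random instance at a time."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta = rng.normal(size=k)                 # random initial coefficients
    for _ in range(n_epochs):                 # n_epochs passes over the data
        for i in rng.permutation(n):          # one randomly ordered instance per update
            beta = beta - eta * grad_f(beta, y[i], X[i])
    return beta
```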

SGD operates by using one randomly selected observation from the dataset at a time (different from regular Gradient Descent, which uses the entire dataset together). This is often preferred because it reduces the chances of the algorithm getting stuck in local minima instead of finding the global minimum. The stochastic nature of the points at which the function is evaluated allows the algorithm to “jump” out of any local minima that may exist. The figure below plots the trace of β as the algorithm iterates, eventually centering around the true value of the coefficients (denoted as a black point).


[Figure: constant learning rate vs. time-decaying learning rate]

With a constant learning rate (the left-hand plot), each step the algorithm takes toward the minimum is of the same length. For this reason, the algorithm gets close to but never converges on the true value. With a time decaying learning rate (the right-hand plot), the steps get shorter as the algorithm continues to iterate. As the steps get smaller and smaller, the algorithm converges closer and closer to the true value. Centering and scaling data beforehand is another important step for convergence.

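One simple way to implement a time-decaying rate (the constants below are placeholders of my own choosing, not the exact schedule used later in this post) is to divide a constant by the iteration number:

```python
def learning_schedule(t, t0=5.0, t1=50.0):
    """Return a learning rate that shrinks as the iteration count t grows."""
    return t0 / (t + t1)
```

Early iterations then take large steps while later iterations take ever smaller ones, which is what lets the trace settle onto the solution.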

With the SVM objective function in place and the process of SGD defined, we may now put the two together to perform classification.


3. Optimizing the SVM with SGD

To use Stochastic Gradient Descent on Support Vector Machines, we must find the gradient of the hinge loss function. This will be easier if we rewrite the equation like this:


$$ L(\beta) = \frac{1}{2}\lVert \beta \rVert^{2} + C \sum_{i=1}^{n} \max\!\left(0,\ 1 - y_i\,\beta^{T} x_i\right) $$

$$ L(\beta) = \frac{1}{2}\lVert \beta \rVert^{2} + C \sum_{i=1}^{n} \begin{cases} 1 - y_i\,\beta^{T} x_i & \text{if } y_i\,\beta^{T} x_i < 1 \\ 0 & \text{otherwise} \end{cases} $$

Taking the partial derivative of the bottom equation with respect to β yields the gradient:


$$ \nabla_{\beta} L = \beta - C \sum_{i=1}^{n} \begin{cases} y_i\, x_i & \text{if } y_i\,\beta^{T} x_i < 1 \\ 0 & \text{otherwise} \end{cases} $$

Now, because we are only calculating one instance at a time with Stochastic Gradient Descent, the gradient used in the algorithm is:


$$ \nabla_{\beta} L_i = \begin{cases} \beta - C\, y_i\, x_i & \text{if } y_i\,\beta^{T} x_i < 1 \\ \beta & \text{otherwise} \end{cases} $$

Here, β = [β0, β1, …, βk]. With this gradient, the SGD algorithm described in section 2 must be modified slightly:


$$ \beta \leftarrow \begin{cases} \beta - \eta\,(\beta - C\, y_i\, x_i) & \text{if } y_i\,\beta^{T} x_i < 1 \\ \beta - \eta\,\beta & \text{otherwise} \end{cases} $$

Here, C is the regularization parameter, η is the learning rate, and β is initialized as a vector of random values for coefficients. The figure below plots the results of this algorithm run on the data presented in section 1. The left-hand plot shows the trace of the two β coefficients as SGD iterates, and the right-hand plot shows the associated hinge loss of the model at each iteration.


[Figure: trace of the coefficients and associated hinge loss]

Note how on the first several iterations, the trace is far away from its convergent point and the loss is extremely high. As the algorithm continues, the trace converges and the loss falls to near 0. This SGD process has C set to 100, n_epochs set to 1000, and a learning schedule defined below, where t is the iteration number:


[Image: the time-decaying learning schedule η(t) used for this run]
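Putting the pieces together, a sketch of the whole procedure might look like the code below. The decaying schedule and the toy data are placeholders of my own (the post's exact schedule and dataset are not reproduced here); C = 100 and n_epochs = 1000 follow the values quoted above.

```python
import numpy as np

def fit_svm_sgd(X, y, C=100.0, n_epochs=1000, seed=0):
    """Fit a linear SVM by SGD on the regularized hinge loss; y must be coded as -1/+1."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta = rng.normal(size=k)
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 5.0 / (t + 50.0)                  # placeholder time-decaying rate
            if y[i] * (X[i] @ beta) < 1:            # margin violated: hinge term is active
                beta = beta - eta * (beta - C * y[i] * X[i])
            else:                                   # only the regularizer pulls on beta
                beta = beta - eta * beta
    return beta

# Illustrative usage on two linearly separable blobs (not the article's data):
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=[2, 2], size=(50, 2)),
               rng.normal(loc=[-2, -2], size=(50, 2))])
y = np.r_[np.ones(50), -np.ones(50)]
beta_hat = fit_svm_sgd(X, y)
accuracy = np.mean(np.where(X @ beta_hat >= 0, 1, -1) == y)
```

Shuffling the instances each epoch and letting η decay with t are the two choices that drive the convergence behaviour shown in the plots above.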

The estimated decision boundary is displayed below.


[Figure: estimated decision boundary]

With this dataset, the SGD process appears to have done an excellent job of estimating the SVM decision boundary. We can confirm this solution by checking if the first and second order conditions of the optimization problem are satisfied. First, we define the objective function of the 2-variable hinge loss used:


[Image: the two-variable hinge loss objective function used for this dataset]

The first order conditions of this function are:


[Image: the first order conditions (partial derivatives with respect to β1 and β2)]

And the second order conditions are:


[Image: the second order conditions (second partial derivatives)]

In evaluating the second order conditions at the coefficients determined by SGD (β1 = 1.252 and β2 = −0.464), we find that our coefficients do minimize the objective function:


[Image: the second order conditions evaluated at β1 = 1.252, β2 = −0.464]

Evaluating the first order conditions at our coefficients shows that we actually have not found a perfect solution to the optimization problem:


[Image: the first order conditions evaluated at the estimated coefficients]

The failure of our coefficients to satisfy the first order conditions is due to the nature of SGD. The algorithm does not always converge on the true solution, especially if the parameters are not tuned perfectly. Almost always, SGD only approximates the true values of the solution.

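As an illustration of that kind of check (my own sketch; the specific numbers above cannot be reproduced without the article's dataset), the full-data gradient can be evaluated at the SGD estimates and compared against zero:

```python
import numpy as np

def full_hinge_gradient(beta, X, y, C=100.0):
    """Gradient of 0.5*||beta||^2 + C*sum(max(0, 1 - y_i*beta.x_i)) over the whole dataset."""
    active = y * (X @ beta) < 1                    # instances where the hinge term is active
    return beta - C * (y[active][:, None] * X[active]).sum(axis=0)

# At an exact minimizer this would be numerically zero; at SGD estimates such as
# beta_hat from the earlier sketch, it is typically close to zero but not exact.
# first_order_check = full_hinge_gradient(beta_hat, X, y)
```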

Looking at the first order conditions and the plot of the decision boundary, we can conclude that the process used here still does a fairly good job — all things considered. In the context of machine learning, our solution has high accuracy and the decision boundary is robust. Despite an imperfect solution in the mathematical sense, SGD has derived a solid model.


Conclusion

Many people have written very impressive blog posts about Support Vector Machines, Stochastic Gradient Descent, and different types of loss functions. As an individual studying data science myself, I find it extremely helpful that such a wealth of information is offered by those eager to share their knowledge. Whenever discouraged by one post that seems too complicated, there are always others to turn to that remedy my confusion.


That being said, I recognize that I am far from the first on Towards Data Science to write about SVM problems and likely will not be the last. However, I’m throwing my hat in the ring in hopes that my perspective may be just what another aspiring data scientist is looking for.


While this post does not touch on very advanced topics in SVM, exploration of such topics can be found in countless other blog posts, textbooks, and scholarly articles. I hope that my contribution to the expanse of resources on Support Vector Machines has been helpful. If not, I have linked more resources that I have found to be useful below.


See Also

http://www.mlfactor.com/svm.html

https://towardsdatascience.com/svm-implementation-from-scratch-python-2db2fc52e5c2

https://svivek.com/teaching/lectures/slides/svm/svm-sgd.pdf

https://towardsdatascience.com/support-vector-machine-python-example-d67d9b63f1c8

https://web.mit.edu/6.034/wwwbob/svm.pdf

Translated from: https://towardsdatascience.com/solving-svm-stochastic-gradient-descent-and-hinge-loss-8e8b4dd91f5b
