Why is L1 regularization more likely to lead to sparsity than L2?
In short: because of the geometric shape of the ℓ1 ball. Here is some intentionally non-rigorous intuition. Suppose you have a linear system Ax = b for which you know there exists a sparse solution x∗, and that you want to recover x∗ by solving the optimization problem min ∥x∥1 subject to Ax = b. All solutions of the linear system can be written as x∗ + v, where v is any vector in the null space of A. You can think of the optimization problem as expanding the ℓ1 ball until it first touches the affine subspace {x∗ + v} and then returning any point in the intersection (such a point is guaranteed to have minimal cost with respect to the ∥x∥1 objective).

But recall that the ℓ1 ball looks roughly like a diamond, with its vertices on the coordinate axes. Intuitively, as you expand the ℓ1 ball, the only way to get a non-sparse point when the ball meets the solution subspace is for a "side" (a flat face) of the diamond to touch the subspace. That happens only when the subspace is parallel to that face, which occurs with very low probability for a generic A, so the ball almost always touches the subspace first at a vertex or a low-dimensional edge, i.e. at a sparse point. (Of course, if the ball is an actual round ball, as with the ℓ2 objective, this argument no longer applies: a sphere has no preferred directions, so there is no bias toward the coordinate axes.)
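This intuition is easy to check numerically. Below is a minimal sketch (not from the original answer; the sizes, seed, and sparsity pattern are arbitrary choices) that solves min ∥x∥1 subject to Ax = b as a linear program via the standard split x = x⁺ − x⁻, using `scipy.optimize.linprog`:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 15, 30
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[3, 17, 25]] = [1.5, -2.0, 0.7]   # sparse ground truth
b = A @ x_true

# Basis pursuit: min ||x||_1  s.t.  Ax = b.
# Split x = x_plus - x_minus with x_plus, x_minus >= 0, so that
# ||x||_1 = sum(x_plus) + sum(x_minus) and the problem becomes an LP.
c = np.ones(2 * n)                 # objective: sum of all split variables
A_eq = np.hstack([A, -A])          # A @ (x_plus - x_minus) = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]

# With enough measurements the minimizer is typically sparse,
# often matching the support of x_true.
print("support:", np.flatnonzero(np.abs(x_hat) > 1e-6))
```

With the ℓ2 objective instead (minimum-norm solution `A.T @ np.linalg.solve(A @ A.T, b)`), the recovered vector is generically dense in all n coordinates.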
Consider an optimization problem:
min J(x) subject to Ax=b.
(Ax = b is a system of equations; it is through this system that the data samples enter the formulation.)
If you use the L0 norm (the number of nonzero entries of x) as J(x), you get the sparsest solution.
But the L0 norm is not convex, which makes the optimization problem combinatorially hard (NP-hard in general).
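To see why L0 minimization is hard, here is a brute-force sketch (an illustration I added, not part of the original answer): it enumerates supports of growing size until the system becomes solvable, so the cost grows with the number of subsets, which is exponential in n.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
m, n = 4, 8
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[1, 6]] = [2.0, -1.0]      # sparse ground truth
b = A @ x_true

def l0_min(A, b, tol=1e-9):
    """Brute-force L0 minimization: try every support of size k = 1, 2, ...
    The loop visits up to C(n,1) + C(n,2) + ... subsets -- combinatorial cost."""
    n = A.shape[1]
    if np.linalg.norm(b) < tol:
        return np.zeros(n)
    for k in range(1, n + 1):
        for S in itertools.combinations(range(n), k):
            cols = list(S)
            xs, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.linalg.norm(A[:, cols] @ xs - b) < tol:
                x = np.zeros(n)
                x[cols] = xs
                return x
    return None

x0 = l0_min(A, b)
print("support:", np.flatnonzero(x0))
```

This is fine for n = 8, but already at n = 100 the number of subsets is astronomically large, which is why a convex surrogate is needed.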
One may instead want to use an Lp norm with 0 < p < 1. The Lp norm also generates sparse solutions, though it is still non-convex. To understand why the L1 norm leads to sparse solutions, it may be helpful to look at the behavior of the kernel |x|^p of these norms:
As p tends to zero, |x|^p approaches the indicator function, which is 0 for x = 0 and 1 elsewhere.
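Since the graph itself is not reproduced here, a few numeric values show the same behavior (my own illustration, with arbitrarily chosen sample points):

```python
import numpy as np

# Evaluate the kernel |x|^p at a few points for decreasing p.
xs = np.array([0.0, 0.1, 0.5, 1.0, 2.0])
for p in [2.0, 1.0, 0.5, 0.1]:
    print(f"p={p}:", np.round(np.abs(xs) ** p, 3))
```

For p = 2 the kernel is flat near zero (small coefficients are barely penalized), while for p = 0.1 even a tiny nonzero value such as x = 0.1 already pays almost the full unit cost (0.1^0.1 ≈ 0.79), approaching the 0/1 indicator.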
L1 is the tightest convex surrogate of L0 that still leads to sparse solutions, so one can use convex-optimization tools to solve the resulting problem.
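In the regularized (rather than constrained) setting, the same contrast shows up between ridge (L2) and lasso (L1). A self-contained sketch, again my own illustration with arbitrary sizes and λ: ridge has a closed form and merely shrinks coefficients, while the L1 problem can be solved by ISTA (iterative soft-thresholding), whose prox step sets small coefficients exactly to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 20
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[:4] = [3.0, -2.0, 1.5, 2.5]       # only 4 relevant features
b = A @ x_true + 0.01 * rng.standard_normal(m)

lam = 5.0

# Ridge (L2): closed form; shrinks coefficients but never zeros them out.
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Lasso (L1) via ISTA: gradient step followed by soft-thresholding,
# the proximal operator of t * lam * ||.||_1.
L = np.linalg.eigvalsh(A.T @ A).max()    # Lipschitz constant of the gradient
t = 1.0 / L
x_lasso = np.zeros(n)
for _ in range(2000):
    z = x_lasso - t * (A.T @ (A @ x_lasso - b))
    x_lasso = np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)

print("ridge zeros:", np.sum(np.abs(x_ridge) < 1e-8))
print("lasso zeros:", np.sum(np.abs(x_lasso) < 1e-8))
```

The ridge solution typically has no exact zeros at all, whereas the lasso solution zeros out most of the irrelevant coordinates, which is the sparsity the geometric argument above predicts.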