l0-Norm, l1-Norm, l2-Norm, … , l-infinity Norm


I’ve been working with norms a lot lately, so it is time to talk about them. In this post we are going to discuss a whole family of norms.

What is a norm?

Mathematically, a norm is a measure of the total size or length of a vector in a vector space (or of a matrix). For simplicity, we can say that the higher the norm, the larger the (values in the) matrix or vector. Norms come in many forms and under many names, including these popular ones: Euclidean distance, Mean-Squared Error, etc.

Most of the time you will see a norm appear in an expression like this:

\left \| x \right \| where x can be a vector or a matrix.

For example, the Euclidean norm of the vector a = \begin{bmatrix}  3  \\  -2  \\  1  \end{bmatrix} is \left \| a \right \|_2=\sqrt{3^2+(-2)^2+1^2} \approx 3.742, which is the length of the vector a.

The above example shows how to compute a Euclidean norm, formally called the l_2-norm. There are many other types of norms beyond our explanation here; in fact, for every single real number p there is a corresponding l_p form (notice the emphasised words real number: it is not limited to integers), although strictly speaking only p \geq 1 yields a true norm satisfying the triangle inequality.

Formally the l_p-norm of x is defined as:

\left \| x \right \|_p = \sqrt[p]{\sum_{i}\left | x_i \right |^p}  where p \in \mathbb{R}

That’s it: the p-th root of the sum of all elements’ absolute values raised to the p-th power is what we call a norm.
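To make this concrete, here is a minimal sketch, assuming NumPy is available (the helper name lp_norm is my own, not a standard function), that evaluates the l_p-norm straight from this definition:

```python
import numpy as np

def lp_norm(x, p):
    """p-th root of the sum of absolute element values raised to the p-th power."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

a = np.array([3.0, -2.0, 1.0])
print(lp_norm(a, 1))         # 6.0
print(lp_norm(a, 2))         # ~3.742, matches the Euclidean example above
print(np.linalg.norm(a, 2))  # NumPy's built-in norm gives the same value
```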

The interesting point is that even though all l_p-norms look very similar to each other, their mathematical properties are very different, and thus their applications are dramatically different too. We are now going to look into some of these norms in detail.

l0-norm 

The first norm we are going to discuss is the l_0-norm. By definition, the l_0-norm of x is

\left \| x \right \|_0 = \sqrt[0]{\sum_{i} \left | x_i \right |^0}

Strictly speaking, the l_0-norm is not actually a norm. It is a cardinality function that borrows the form of the l_p-norm definition, though many people call it a norm anyway. It is a bit tricky to work with because of the zeroth power and zeroth root in it. Obviously any non-zero x_i raised to the zeroth power becomes one, but the ambiguities in defining the zeroth power of zero, and especially the zeroth root, mess things up here. So in practice, most mathematicians and engineers use this definition of the l_0-norm instead:

\left \| x \right \|_0 = \#\{i : x_i \neq 0\}

that is, the total number of non-zero elements in the vector.
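As a small sketch, assuming NumPy, this practical l_0 count is a one-liner (np.linalg.norm also accepts ord=0 for vectors with exactly this counting meaning):

```python
import numpy as np

x = np.array([0.0, 5.0, 0.0, -2.0, 0.0])
print(np.count_nonzero(x))   # 2: the number of non-zero entries
print(np.linalg.norm(x, 0))  # 2.0: same count via the norm interface
```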

Because it counts the non-zero elements, there are many applications that use the l_0-norm. Lately it has come even more into focus with the rise of the Compressive Sensing scheme, which tries to find the sparsest solution of an under-determined linear system. The sparsest solution means the solution with the fewest non-zero entries, i.e. the lowest l_0-norm. This problem is usually posed as an optimisation problem over the l_0-norm, or l_0-optimisation.

l0-optimisation

Many applications, including Compressive Sensing, try to minimise the l_0-norm of a vector subject to some constraints, hence the name “l_0-minimisation”. The standard minimisation problem is formulated as:

\min \left \| x \right \|_0 subject to Ax = b

However, doing so is not an easy task. Because the l_0-norm lacks a convenient mathematical representation (it is non-convex and discontinuous), l_0-minimisation is regarded by computer scientists as an NP-hard problem; simply put, it is too complex and practically impossible to solve exactly.

In many cases, the l_0-minimisation problem is therefore relaxed into a higher-order norm problem such as l_1-minimisation or l_2-minimisation.

l1-norm

Following the definition of the norm, the l_1-norm of x is defined as

\left \| x \right \|_1 = \sum_{i} \left | x_i \right |

This norm is quite common in the norm family. It has many names and many forms across various fields; the “Manhattan norm” is its best-known nickname. If the l_1-norm is computed for the difference between two vectors or matrices, that is

SAD(x_1,x_2) = \left \| x_1-x_2 \right \|_1 = \sum \left | x_{1_i}-x_{2_i} \right |

it is called Sum of Absolute Difference (SAD) among computer vision scientists.

In the more general case of signal difference measurement, it may be normalised by the size of the vector:

MAE(x_1,x_2) = \frac{1}{n} \left \| x_1-x_2 \right \|_1 = \frac {1} {n} \sum \left | x_{1_i} - x_{2_i} \right | where n is the size of x.

which is known as Mean-Absolute Error (MAE).
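A brief sketch of both measures, assuming NumPy (the example vectors are arbitrary):

```python
import numpy as np

x1 = np.array([1.0, -4.0, 2.0])
x2 = np.array([0.0, -1.0, 5.0])

sad = np.sum(np.abs(x1 - x2))      # Sum of Absolute Difference: 7.0
mae = sad / x1.size                # Mean-Absolute Error: ~2.333
print(sad, mae)
print(np.linalg.norm(x1 - x2, 1))  # same SAD via the built-in l1-norm
```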

l2-norm

The most popular of all norms is the l_2-norm. It is used in almost every field of engineering and science. Following the basic definition, the l_2-norm is defined as

\left \| x \right \|_2 = \sqrt{\sum_{i}x_i^2}

The l_2-norm is well known as the Euclidean norm, which is used as a standard quantity for measuring vector differences. As with the l_1-norm, if the Euclidean norm is computed for a vector difference, it is known as the Euclidean distance:

\left \| x_1-x_2 \right \|_2 = \sqrt{\sum_i (x_{1_i}-x_{2_i})^2}

or in its squared form, known as a Sum of Squared Difference (SSD) among Computer Vision scientists:

SSD(x_1,x_2) = \left \| x_1-x_2 \right \|_2^2 = \sum_i (x_{1_i}-x_{2_i})^2

Its most well-known application in the signal processing field is the Mean-Squared Error (MSE) measurement, which is used to compute the similarity, quality, or correlation between two signals. The MSE is

MSE(x_1,x_2) = \frac{1}{n} \left \| x_1-x_2 \right \|_2^2 = \frac{1}{n} \sum_i (x_{1_i}-x_{2_i})^2
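The corresponding sketch for the l_2-based measures, assuming NumPy and reusing the same arbitrary vectors:

```python
import numpy as np

x1 = np.array([1.0, -4.0, 2.0])
x2 = np.array([0.0, -1.0, 5.0])

d = x1 - x2
ssd = np.sum(d ** 2)         # Sum of Squared Difference: 19.0
mse = ssd / d.size           # Mean-Squared Error: ~6.333
print(np.linalg.norm(d, 2))  # Euclidean distance: sqrt(19) ~ 4.359
print(ssd, mse)
```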

As previously discussed in the l_0-optimisation section, because of many issues from both the computational and the mathematical points of view, l_0-optimisation problems are often relaxed into l_1- and l_2-optimisation instead. Because of this, we will now discuss the optimisation of l_2.

l2-optimisation

As in the l_0-optimisation case, the problem of minimising the l_2-norm is formulated as

\min \left \| x \right \|_2 subject to Ax = b

Assuming that the constraint matrix A has full rank, this problem is an under-determined system with infinitely many solutions. The goal in this case is to pick out the best solution, i.e. the one with the lowest l_2-norm, from these infinitely many candidates. This could be very tedious work if it had to be computed directly. Luckily, there is a mathematical trick that helps us a lot here.

By using the trick of Lagrange multipliers, we can define the Lagrangian

\mathfrak{L}(\boldsymbol{x}) = \left \| \boldsymbol{x} \right \|_2^2+\lambda^{T}(\boldsymbol{Ax}-\boldsymbol{b})

where \lambda is the vector of Lagrange multipliers. Setting the derivative of this expression with respect to \boldsymbol{x} to zero yields the optimal solution

\hat{\boldsymbol{x}}_{opt} = -\frac{1}{2} \boldsymbol{A}^{T} \lambda

Plugging this solution into the constraint gives

\boldsymbol{A}\hat{\boldsymbol{x}}_{opt} = -\frac{1}{2}\boldsymbol{AA}^{T}\lambda=\boldsymbol{b}

\lambda=-2(\boldsymbol{AA}^{T})^{-1}\boldsymbol{b}

and finally

\hat{\boldsymbol{x}}_{opt}=\boldsymbol{A}^{T} (\boldsymbol{AA}^{T})^{-1} \boldsymbol{b}=\boldsymbol{A}^{+} \boldsymbol{b}

By using this equation, we can now instantly compute an optimal solution of the l_2-optimisation problem. This equation is well known as the Moore-Penrose pseudoinverse, and the problem itself is usually known as the Least Squares problem, Least Squares regression, or Least Squares optimisation.
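Here is a minimal sketch, assuming NumPy (the random A and b are purely illustrative), that computes this minimum-l_2-norm solution both from the closed form above and via the built-in pseudoinverse:

```python
import numpy as np

# Under-determined system: 2 equations, 4 unknowns; A is assumed full row rank.
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)

# Closed form: x = A^T (A A^T)^{-1} b
x_opt = A.T @ np.linalg.solve(A @ A.T, b)

# Same answer through the Moore-Penrose pseudoinverse
x_pinv = np.linalg.pinv(A) @ b

print(np.allclose(A @ x_opt, b))   # True: the constraint Ax = b is satisfied
print(np.allclose(x_opt, x_pinv))  # True: both routes agree
```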

However, even though the Least Squares solution is easy to compute, it is not necessarily the best one. Because of the smooth nature of the l_2-norm itself, it is hard to single out one distinctly best solution for the problem.

On the contrary, l_1-optimisation can provide a much better result than this solution.

l1-optimisation

As usual, the l_1-minimisation problem is formulated as

\min \left \| x \right \|_1 subject to Ax = b

Because the l_1-norm, unlike the l_2-norm, is not smooth, the solution of this problem is often much better (typically sparser) and closer to unique than the l_2-optimisation solution.

However, even though the l_1-minimisation problem has almost the same form as l_2-minimisation, it is much harder to solve. Because the objective is not a smooth function, the derivative trick we used for the l_2-problem is no longer valid. What remains is to search for the solution more directly, which in the worst case means examining candidate solutions one by one to find the best from a pool of “infinitely many” possibilities.

Since there is no easy closed-form way to find the solution to this problem, the usefulness of l_1-optimisation was very limited for decades. Only recently have advances in computers with high computational power allowed us to “sweep” through the solution space. With the help of many algorithms, namely convex optimisation methods such as linear programming or non-linear programming, it is now possible to find the best solution to this question. Many applications that rely on l_1-optimisation, including Compressive Sensing, are now possible.
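One classical way to carry this out is to recast l_1-minimisation as a linear program by splitting x into non-negative parts, x = u - v. The sketch below, assuming SciPy's linprog (the random A and the sparse x_true are purely illustrative, and exact recovery is not guaranteed for every instance), shows the idea:

```python
import numpy as np
from scipy.optimize import linprog

# min ||x||_1 subject to Ax = b, rewritten with x = u - v, u >= 0, v >= 0:
# minimise sum(u) + sum(v) subject to A(u - v) = b.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 10))
x_true = np.zeros(10)
x_true[[2, 7]] = [2.0, -1.5]  # sparse ground truth
b = A @ x_true

m, n = A.shape
c = np.ones(2 * n)            # objective: sum(u) + sum(v)
A_eq = np.hstack([A, -A])     # A u - A v = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
x_l1 = res.x[:n] - res.x[n:]
print(np.round(x_l1, 4))      # often recovers the sparse x_true
```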

There are many toolboxes for l_1-optimisation available nowadays. These toolboxes usually use different approaches and/or algorithms to solve the same question. Examples of such toolboxes are l1-magic, SparseLab, and ISAL1, among others.

Now that we have discussed several members of the norm family, starting with the l_0-norm, l_1-norm, and l_2-norm, it is time to move on to the next one. As we discussed at the very beginning, there can be an l-whatever norm following the same basic definition, so it would take a lot of time to talk about all of them. Fortunately, apart from the l_0-, l_1-, and l_2-norms, the rest of them are usually uncommon and don’t have many interesting properties to look at. So we are going to look at the extreme case of the norm, the l_{\infty}-norm (l-infinity norm).

l-infinity norm

As always, the definition for l_{\infty}-norm is

\left \| x \right \|_{\infty} = \sqrt[\infty]{\sum_i \left | x_i \right |^{\infty}}

Now this definition looks tricky again, but actually it is quite straightforward. Consider the vector \boldsymbol{x}, and suppose x_j is the entry of \boldsymbol{x} with the largest magnitude. By the property of infinity itself, we can say that

\left | x_j \right |^{\infty} \gg \left | x_i \right |^{\infty} \quad \forall \, i \neq j

then

\sum_i \left | x_i \right |^{\infty} = \left | x_j \right |^{\infty}

then

\left \| x \right \|_{\infty} = \sqrt[\infty]{\sum_i \left | x_i \right |^{\infty}} = \sqrt[\infty]{\left | x_j \right |^{\infty}} = \left | x_j \right |

Now we can simply say that the l_{\infty}-norm is

\left \| x \right \|_{\infty} = \max_i \left | x_i \right |

that is, the maximum magnitude among the entries of that vector. That surely demystifies the meaning of the l_{\infty}-norm.
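As a tiny sketch, assuming NumPy, this is again a one-liner (ord=np.inf selects the max-magnitude norm):

```python
import numpy as np

x = np.array([3.0, -7.0, 5.0])
print(np.max(np.abs(x)))          # 7.0
print(np.linalg.norm(x, np.inf))  # 7.0: the l-infinity norm
```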

Now that we have discussed the whole family of norms from l_0 to l_{\infty}, I hope this discussion helps in understanding the meaning of the norm, its mathematical properties, and its real-world implications.

Reference and further reading:

Mathematical Norm – Wikipedia

Mathematical Norm – MathWorld

Michael Elad – “Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing”, Springer, 2010.

Linear Programming – MathWorld

Compressive Sensing – Rice University
