MaxPooling的作用 and some tips about CNN

2023-10-22 13:58
文章标签 作用 cnn tips maxpooling

本文主要是介绍MaxPooling的作用 and some tips about CNN,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

摘抄记录:
MaxPooling
Another important concept of CNNs is max-pooling, which is a form of non-linear down-sampling. Max-pooling partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum value.

Max-pooling is useful in vision for two reasons:
By eliminating non-maximal values, it reduces computation for upper layers.

It provides a form of translation invariance. Imagine cascading a max-pooling layer with a convolutional layer. There are 8 directions in which one can translate the input image by a single pixel. If max-pooling is done over a 2x2 region, 3 out of these 8 possible configurations will produce exactly the same output at the convolutional layer. For max-pooling over a 3x3 window, this jumps to 5/8.

Since it provides additional robustness to position, max-pooling is a “smart” way of reducing the dimensionality of intermediate representations.

Tips and Tricks
Choosing Hyperparameters
CNNs are especially tricky to train, as they add even more hyper-parameters than a standard MLP. While the usual rules of thumb for learning rates and regularization constants still apply, the following should be kept in mind when optimizing CNNs.

Number of filters

When choosing the number of filters per layer, keep in mind that computing the activations of a single convolutional filter is much more expensive than with traditional MLPs !

Assume layer (l-1) contains K^{l-1} feature maps and M \times N pixel positions (i.e., number of positions times number of feature maps), and there are K^l filters at layer l of shape m \times n. Then computing a feature map (applying an m \times n filter at all (M-m) \times (N-n) pixel positions where the filter can be applied) costs (M-m) \times (N-n) \times m \times n \times K^{l-1}. The total cost is K^l times that. Things may be more complicated if not all features at one level are connected to all features at the previous one.

For a standard MLP, the cost would only be K^l \times K^{l-1} where there are K^l different neurons at level l. As such, the number of filters used in CNNs is typically much smaller than the number of hidden units in MLPs and depends on the size of the feature maps (itself a function of input image size and filter shapes).

Since feature map size decreases with depth, layers near the input layer will tend to have fewer filters while layers higher up can have much more. In fact, to equalize computation at each layer, the product of the number of features and the number of pixel positions is typically picked to be roughly constant across layers. To preserve the information about the input would require keeping the total number of activations (number of feature maps times number of pixel positions) to be non-decreasing from one layer to the next (of course we could hope to get away with less when we are doing supervised learning). The number of feature maps directly controls capacity and so that depends on the number of available examples and the complexity of the task.

Filter Shape

Common filter shapes found in the litterature vary greatly, usually based on the dataset. Best results on MNIST-sized images (28x28) are usually in the 5x5 range on the first layer, while natural image datasets (often with hundreds of pixels in each dimension) tend to use larger first-layer filters of shape 12x12 or 15x15.

The trick is thus to find the right level of “granularity” (i.e. filter shapes) in order to create abstractions at the proper scale, given a particular dataset.

Max Pooling Shape

Typical values are 2x2 or no max-pooling. Very large input images may warrant 4x4 pooling in the lower-layers. Keep in mind however, that this will reduce the dimension of the signal by a factor of 16, and may result in throwing away too much information.

Footnotes

[1] For clarity, we use the word “unit” or “neuron” to refer to the artificial neuron and “cell” to refer to the biological neuron.
Tips

If you want to try this model on a new dataset, here are a few tips that can help you get better results:

Whitening the data (e.g. with PCA)
Decay the learning rate in each epoch

http://deeplearning.net/tutorial/lenet.html

这篇关于MaxPooling的作用 and some tips about CNN的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/261879

相关文章

Android fill_parent、match_parent、wrap_content三者的作用及区别

这三个属性都是用来适应视图的水平或者垂直大小,以视图的内容或尺寸为基础的布局,比精确的指定视图的范围更加方便。 1、fill_parent 设置一个视图的布局为fill_parent将强制性的使视图扩展至它父元素的大小 2、match_parent 和fill_parent一样,从字面上的意思match_parent更贴切一些,于是从2.2开始,两个属性都可以使用,但2.3版本以后的建议使

maven发布项目到私服-snapshot快照库和release发布库的区别和作用及maven常用命令

maven发布项目到私服-snapshot快照库和release发布库的区别和作用及maven常用命令 在日常的工作中由于各种原因,会出现这样一种情况,某些项目并没有打包至mvnrepository。如果采用原始直接打包放到lib目录的方式进行处理,便对项目的管理带来一些不必要的麻烦。例如版本升级后需要重新打包并,替换原有jar包等等一些额外的工作量和麻烦。为了避免这些不必要的麻烦,通常我们

未来工作趋势:零工小程序在共享经济中的作用

经济在不断发展的同时,科技也在飞速发展。零工经济作为一种新兴的工作模式,正在全球范围内迅速崛起。特别是在中国,随着数字经济的蓬勃发展和共享经济模式的深入推广,零工小程序在促进就业、提升资源利用效率方面显示出了巨大的潜力和价值。 一、零工经济的定义及现状 零工经济是指通过临时性、自由职业或项目制的工作形式,利用互联网平台快速匹配供需双方的新型经济模式。这种模式打破了传统全职工作的界限,为劳动

Science|癌症中三级淋巴结构的免疫调节作用与治疗潜力|顶刊精析·24-09-08

小罗碎碎念 Science文献精析 今天精析的这一篇综述,于2022-01-07发表于Science,主要讨论了癌症中的三级淋巴结构(Tertiary Lymphoid Structures, TLS)及其在肿瘤免疫反应中的作用。 作者类型作者姓名单位名称(中文)通讯作者介绍第一作者Ton N. Schumacher荷兰癌症研究所通讯作者之一通讯作者Daniela S. Thomm

深度学习实战:如何利用CNN实现人脸识别考勤系统

1. 何为CNN及其在人脸识别中的应用 卷积神经网络(CNN)是深度学习中的核心技术之一,擅长处理图像数据。CNN通过卷积层提取图像的局部特征,在人脸识别领域尤其适用。CNN的多个层次可以逐步提取面部的特征,最终实现精确的身份识别。对于考勤系统而言,CNN可以自动从摄像头捕捉的视频流中检测并识别出员工的面部。 我们在该项目中采用了 RetinaFace 模型,它基于CNN的结构实现高效、精准的

j2EE通用jar包的作用

原文:http://blog.sina.com.cn/s/blog_610901710101kx37.html IKIKAnalyzer3.2.8.jar // 分词器 ant-junit4.jar // ant junit antlr-2.7.6.jar // 没有此包,hibernate不会执行hql语句。并且会报NoClassDefFoundError: antlr

【vue3|第28期】 Vue3 + Vue Router:探索路由重定向的使用与作用

日期:2024年9月8日 作者:Commas 签名:(ง •_•)ง 积跬步以致千里,积小流以成江海…… 注释:如果您觉在这里插入代码片得有所帮助,帮忙点个赞,也可以关注我,我们一起成长;如果有不对的地方,还望各位大佬不吝赐教,谢谢^ - ^ 1.01365 = 37.7834;0.99365 = 0.0255 1.02365 = 1377.4083;0.98365 = 0.0006 说

请解释Java Web应用中的前后端分离是什么?它有哪些好处?什么是Java Web中的Servlet过滤器?它有什么作用?

请解释Java Web应用中的前后端分离是什么?它有哪些好处? Java Web应用中的前后端分离 在Java Web应用中,前后端分离是一种开发模式,它将传统Web开发中紧密耦合的前端(用户界面)和后端(服务器端逻辑)代码进行分离,使得它们能够独立开发、测试、部署和维护。在这种模式下,前端通常通过HTTP请求与后端进行数据交换,后端则负责业务逻辑处理、数据库交互以及向前端提供RESTful

PRN(20201231):驾驶人驾驶决策机制遵循最小作用量原理

王建强, 郑讯佳, 黄荷叶. 驾驶人驾驶决策机制遵循最小作用量原理[J]. 中国公路学报, 2020, v.33;No.200(04):159-172. 观点: 为提升智能汽车的自主决策能力,使其能够学习人的决策智慧以适应复杂多变的道路交通环境,需要揭示驾驶人决策机制。 依据: 物理学中常用最小作用量原理解释自然界(包括物理和生物行为)极值现象。同时,最小作用量原理还用于解释蚂蚁在觅

glPushMatrix()和glPopMatrix()的作用

当你做了一些移动或旋转等变换后,使用glPushMatrix(); OpenGL 会把这个变换后的位置和角度保存起来。 然后你再随便做第二次移动或旋转变换,再用glPopMatrix(); OpenGL 就把刚刚保存的那个位置和角度恢复。 比如: glLoadIdentity(); glTranslatef(1,0,0);//向右移动(1,0,0) glPushMatrix(