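A box plot is shown below:
How it is computed:
(1) Compute the upper quartile (Q3), the median, and the lower quartile (Q1).
(2) Compute the difference between the upper and lower quartiles, the interquartile range: IQR = Q3 - Q1.
(3) Draw the box. Its top edge sits at the upper quartile and its bottom edge at the lower quartile. Draw a horizontal line inside the box at the median.
(4) Values greater than Q3 + 1.5 × IQR or less than Q1 - 1.5 × IQR are classed as outliers.
(5) Among the values that are not outliers, draw a horizontal line at the one closest to the upper limit and at the one closest to the lower limit. These lines are the ends of the whiskers.
(6) Extreme outliers, which lie more than 3 × IQR beyond a quartile, are drawn as solid points. Mild outliers, which lie between 1.5 × IQR and 3 × IQR beyond a quartile, are drawn as hollow points.
(7) Add a title, axes, and so on.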
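The numeric steps (1) to (6) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the plotting library's implementation: the function name `box_stats`, the sample values, and the choice of linear interpolation for the quartiles (`method="inclusive"`) are assumptions, and other quartile conventions give slightly different numbers.

```python
from statistics import quantiles


def box_stats(values, k=1.5):
    """Compute the quantities from steps (1)-(6) for one sample."""
    data = sorted(values)
    # Steps 1-2: quartiles and IQR. method="inclusive" interpolates
    # linearly between neighbouring data points (an assumed convention).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Step 4: anything beyond k * IQR from a quartile is an outlier.
    low, high = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low <= x <= high]
    outliers = [x for x in data if x < low or x > high]
    # Step 6: more than 3 * IQR beyond a quartile counts as extreme.
    extreme = [x for x in outliers if x < q1 - 3 * iqr or x > q3 + 3 * iqr]
    mild = [x for x in outliers if q1 - 3 * iqr <= x <= q3 + 3 * iqr]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Step 5: whiskers end at the most extreme non-outlier values.
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "mild": mild, "extreme": extreme,
    }


# Hypothetical sample, chosen so that both kinds of outlier appear.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40])
print(stats["q1"], stats["median"], stats["q3"], stats["iqr"])  # 3.5 6.0 8.5 5.0
print(stats["whisker_low"], stats["whisker_high"])              # 1 9
print(stats["mild"], stats["extreme"])                          # [20] [40]
```

Here the upper limit is 8.5 + 1.5 × 5 = 16, so 20 and 40 are outliers, and only 40 lies beyond 8.5 + 3 × 5 = 23.5.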
Outliers can be handled either by removing them or by treating them as missing values.
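Both options can be sketched as follows, assuming the 1.5 × IQR rule is used to decide what counts as an outlier. The helper names (`outlier_bounds`, `drop_outliers`, `mask_outliers`) and the sample list are hypothetical, and `None` stands in for a missing value.

```python
from statistics import quantiles


def outlier_bounds(values, k=1.5):
    """Return (low, high); values outside this range are outliers."""
    # quantiles() sorts its input; "inclusive" is an assumed convention.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Option 1: remove the outliers, keeping the original order."""
    low, high = outlier_bounds(values, k)
    return [x for x in values if low <= x <= high]


def mask_outliers(values, k=1.5):
    """Option 2: keep every position but mark outliers as missing."""
    low, high = outlier_bounds(values, k)
    return [x if low <= x <= high else None for x in values]


sample = [3, 1, 40, 5, 2, 4, 6, 20, 7, 9, 8]  # hypothetical values
print(drop_outliers(sample))  # [3, 1, 5, 2, 4, 6, 7, 9, 8]
print(mask_outliers(sample))  # [3, 1, None, 5, 2, 4, 6, None, 7, 9, 8]
```

Masking keeps the rows aligned with the other columns, so the gaps can later be filled or skipped like any other missing value. Dropping changes the length of the series.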
The code follows. Its whis parameter sets the threshold for outliers: values greater than Q3 + whis × IQR or less than Q1 - whis × IQR are drawn as outlier points.
    """
    Data quality analysis:
    outlier detection with a box plot
    """
    import pandas as pd
    import matplotlib.pyplot as plt


    def box_plot():
        """
        Draw a box plot.
        :return:
        """
        feature = pd.read_csv('./dataset/feature')
        data = feature['licheng']
        # plt.boxplot(data, sym="o", whis=1.5)
        # plt.boxplot(data, sym="o", whis=0.01)
        plt.boxplot(data, sym="o", whis=1)
        plt.show()


    if __name__ == '__main__':
        box_plot()
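To see how the three whis values tried in the code above change which points are flagged, the rule can be applied by hand. This is a sketch using only the standard library, assuming the Q1 - whis × IQR and Q3 + whis × IQR limits and linearly interpolated quartiles. The function name `fliers` and the sample list are hypothetical and are not part of matplotlib.

```python
from statistics import quantiles


def fliers(values, whis=1.5):
    """Return the values that fall outside the whis * IQR limits."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return [x for x in values if x < low or x > high]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 40]  # hypothetical values
for whis in (1.5, 1, 0.01):
    print(whis, fliers(sample, whis))
# 1.5 [40]
# 1 [15, 40]
# 0.01 [1, 2, 3, 9, 15, 40]
```

A smaller whis shortens the whiskers, so more points are shown as outliers. With whis=0.01 almost everything outside the box itself is flagged.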
The data file (CSV):
    licheng,youxi,bingbang
    40920,8.326976,0.953952
    14488,7.153469,1.673904
    26052,1.441871,0.805124
    75136,13.147394,0.428964
    38344,1.669788,0.134296
    72993,10.14174,1.0329549999999998
    35948,6.830792,1.213192
    42666,13.276369,0.54388
    67497,8.631577,0.749278
    35483,12.273169,1.508053
    50242,3.723498,0.831917
    63275,8.385879,1.669485
    5569,4.875435,0.728658
    51052,4.680098,0.625224
    77372,15.29957,0.331351
    43673,1.889461,0.191283
    61364,7.516754,1.269164
    69673,14.239195,0.261333
    15669,0.0,1.250185
    28488,10.528555,1.304844
    6487,3.540265,0.822483
    37708,2.991551,0.8339200000000001
    22620,5.297865,0.638306
    28782,6.593803,0.187108
    19739,2.81676,1.686209
    36788,12.458258,0.649617
    5741,0.0,1.656418
    28567,9.968648,0.731232
    6808,1.364838,0.640103
    41611,0.230453,1.151996
    36661,11.865402,0.88281
    43605,0.12046,1.352013
    15360,8.545204,1.340429
    63796,5.856649,0.16000599999999998
    10743,9.665617999999998,0.778626
    70808,9.778763,1.084103
    72011,4.932976,0.632026
    5914,2.216246,0.587095
    14851,14.305635999999998,0.632317
    33553,12.591889,0.686581
    44952,3.424649,1.004504
    17934,0.0,0.147573
    27738,8.533823,0.205324
    29290,9.829528,0.23862
    42330,11.492186,0.263499
    36429,3.570968,0.832254
    39623,1.7712279999999998,0.207612
    32404,3.513921,0.991854
    27268,4.398172,0.975024
    5477,4.276823,1.174874
    14254,5.946014,1.614244
    68613,13.79897,0.724375
    41539,10.393591,1.6637240000000002
    7917,3.007577,0.297302
    21331,1.031938,0.486174
    8338,4.7512120000000015,0.064693
    5176,3.692269,1.655113
    18983,10.448091,0.267652
    68837,10.585786,0.329557
    13438,1.604501,0.069064
    48849,3.679497,0.961466
    12285,3.795146,0.6966939999999999
    7826,2.531885,1.659173
    5565,9.73334,0.977746
    10346,6.093067,1.413798
    1823,7.712960000000002,1.054927
    9744,11.470364,0.7604609999999999
    16857,2.886529,0.934416
    39336,10.054373,1.138351
    65230,9.97247,0.881876
    2463,2.335785,1.3661450000000002
    27353,11.375155,1.528626
    16191,0.0,0.6056189999999999
    12258,4.126787,0.357501
    42377,6.319522,1.058602
    25607,8.680527,0.08695499999999999
    77450,14.856391,1.129823
    58732,2.454285,0.22238
    46426,7.292202,0.548607
    32688,8.745137,0.857348
    64890,8.579001,0.683048
    8554,2.507302,0.8691770000000001
    28861,11.415476,1.505466
    42050,4.83854,1.6808919999999998
    32193,10.339507,0.583646
    64895,6.573742,1.151433
    2355,6.539397,0.462065
    0,2.209159,0.723567
    70406,11.196378,0.836326
    57399,4.229595,0.128253
    41732,9.505944,0.005273
    11429,8.652725,1.348934
    75270,17.101108,0.490712
    5459,7.871839,0.717662
    73520,8.262131,1.361646
    40279,9.015635,1.658555
    21540,9.215351,0.806762
    17694,6.375007,0.033678
    22329,2.262014,1.022169
    46570,5.67711,0.7094689999999999
    42403,11.293017,0.207976
    33654,6.590043,1.353117
    9171,4.71196,0.194167
    28122,8.768099000000001,1.108041
    34095,11.502519,0.545097
    1774,4.682812,0.578112
    40131,12.446578,0.300754
    13994,12.908384,1.657722
    77064,12.601108,0.974527
    11210,3.929456,0.025466
    6122,9.751503,1.18205
    15341,3.043767,0.8881680000000001
    44373,4.391522,0.8071
    28454,11.695276,0.6790149999999999
    63771,7.879742,0.154263
    9217,5.613163,0.933632
    69076,9.140172,0.8513
    24489,4.258644,0.206892
    16871,6.799831,1.221171
    39776,8.752758,0.484418
    5901,1.123033,1.180352
    40987,10.833248,1.585426
    7479,3.051618,0.026781
    38768,5.308408999999998,0.030683
    4933,1.841792,0.028099
    32311,2.261978,1.605603
    26501,11.573696,1.061347
    37433,8.038764,1.08391
    23503,10.734007,0.10371500000000003
    68607,9.661909,0.350772
    27742,9.00585,0.548737
    11303,0.0,0.539131
    0,5.75714,1.062373
    32729,9.164656,1.624565
    24619,1.31834,1.436243
    42414,14.075597,0.695934
    20210,10.10755,1.308398
    33225,7.960292999999999,1.21976
    54483,6.317292,0.018209
    18475,12.664194,0.595653
    33926,2.906644,0.581657
    43865,2.388241,0.913938
    26547,6.024471,0.486215
    44404,7.226764,1.255329
    16674,4.183997,1.27529
    8123,11.850211,1.096981
    42747,11.661797,1.167935
    56054,3.574967,0.494666
    10933,0.0,0.107475
    18121,7.9376570000000015