Printing the first rows, last rows, and random samples of a pandas DataFrame

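When doing exploratory analysis of structured data with pandas, we often print a few sample rows to check that the data was read and processed correctly. Besides selecting a range of rows by position with the iloc indexer, pandas provides the simpler head and tail functions. Their usage and output are shown below.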
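
For comparison, the position-based iloc approach mentioned above looks like this. It is a minimal sketch assuming a 10×5 random DataFrame like the ones used throughout this article; the seeded generator `np.random.default_rng` is an addition here, used only so the rows are reproducible.

```python
import numpy as np
import pandas as pd

# 10 rows x 5 columns of random floats in [0, 1), seeded for reproducibility
rng = np.random.default_rng(0)
df_data = pd.DataFrame(rng.random((10, 5)))

# Position-based slicing: rows 0, 1, 2 (the end position 3 is excluded)
first_three = df_data.iloc[0:3]
# Any interval works, e.g. rows 4, 5, 6 from the middle of the frame
middle = df_data.iloc[4:7]

print(first_three)
print(middle)
```

Unlike head and tail, iloc can pull rows from anywhere in the frame, at the cost of writing the interval yourself.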

1. The head function

head is used as follows:

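import numpy as np
import pandas as pd

# 10 rows x 5 columns of uniform random numbers in [0, 1)
df_data = pd.DataFrame(np.random.rand(10, 5))
# Print the whole DataFrame
print(df_data)
# Print the first 5 rows (the default)
print(df_data.head())
# Print the first 2 rows
print(df_data.head(2))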

Output:

          0         1         2         3         4
0  0.718092  0.899116  0.125582  0.978291  0.906551
1  0.173210  0.433652  0.620770  0.824463  0.037136
2  0.012881  0.683509  0.449453  0.859218  0.464844
3  0.243671  0.436372  0.518020  0.091002  0.631860
4  0.044998  0.688442  0.487132  0.230478  0.963189
5  0.453796  0.276635  0.458563  0.165412  0.763591
6  0.975923  0.890063  0.462410  0.168376  0.375301
7  0.694347  0.104572  0.511853  0.987440  0.852707
8  0.290416  0.476767  0.009371  0.360272  0.420603
9  0.813110  0.623151  0.358813  0.697292  0.004572

          0         1         2         3         4
0  0.718092  0.899116  0.125582  0.978291  0.906551
1  0.173210  0.433652  0.620770  0.824463  0.037136
2  0.012881  0.683509  0.449453  0.859218  0.464844
3  0.243671  0.436372  0.518020  0.091002  0.631860
4  0.044998  0.688442  0.487132  0.230478  0.963189

          0         1         2         3         4
0  0.718092  0.899116  0.125582  0.978291  0.906551
1  0.173210  0.433652  0.620770  0.824463  0.037136

Looking at the pandas source for head, we can see that it is also built on iloc and returns the first 5 rows by default:

    def head(self, n=5):
        """
        Return the first `n` rows.

        This function returns the first `n` rows for the object based
        on position. It is useful for quickly testing if your object
        has the right type of data in it.

        Parameters
        ----------
        n : int, default 5
            Number of rows to select.

        Returns
        -------
        obj_head : same type as caller
            The first `n` rows of the caller object.

        See Also
        --------
        DataFrame.tail: Returns the last `n` rows.

        Examples
        --------
        >>> df = pd.DataFrame({'animal':['alligator', 'bee', 'falcon', 'lion',
        ...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
        >>> df
              animal
        0  alligator
        1        bee
        2     falcon
        3       lion
        4     monkey
        5     parrot
        6      shark
        7      whale
        8      zebra

        Viewing the first 5 lines

        >>> df.head()
              animal
        0  alligator
        1        bee
        2     falcon
        3       lion
        4     monkey

        Viewing the first `n` lines (three in this case)

        >>> df.head(3)
              animal
        0  alligator
        1        bee
        2     falcon
        """
        return self.iloc[:n]
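
Since head is just `self.iloc[:n]`, the two select exactly the same rows, and a negative n follows ordinary slice semantics: head(-2) returns every row except the last two. A quick check, reusing the animal frame from the docstring above:

```python
import pandas as pd

df = pd.DataFrame({"animal": ["alligator", "bee", "falcon", "lion", "monkey",
                              "parrot", "shark", "whale", "zebra"]})

# head(n) selects by position, exactly like iloc[:n]
assert df.head(3).equals(df.iloc[:3])

# Default n=5 -> first five rows
print(df.head()["animal"].tolist())
# ['alligator', 'bee', 'falcon', 'lion', 'monkey']

# Negative n -> all rows except the last 2, like iloc[:-2]
print(df.head(-2)["animal"].tolist())
# ['alligator', 'bee', 'falcon', 'lion', 'monkey', 'parrot', 'shark']
```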

2. The tail function

tail is used as follows:

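import numpy as np
import pandas as pd

# 10 rows x 5 columns of uniform random numbers in [0, 1)
df_data = pd.DataFrame(np.random.rand(10, 5))
# Print the whole DataFrame
print(df_data)
# Print the last 5 rows (the default)
print(df_data.tail())
# Print the last 2 rows
print(df_data.tail(2))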

Output:

          0         1         2         3         4
0  0.400922  0.146871  0.141041  0.839072  0.113124
1  0.885119  0.519281  0.571275  0.304061  0.965502
2  0.309809  0.061705  0.911406  0.954084  0.296767
3  0.965902  0.351461  0.398504  0.664548  0.764347
4  0.602855  0.582827  0.534116  0.226877  0.539045
5  0.736614  0.200998  0.170951  0.200885  0.623913
6  0.410535  0.347231  0.934425  0.130389  0.104412
7  0.871398  0.788983  0.210943  0.519613  0.133114
8  0.353736  0.986401  0.385541  0.691156  0.025777
9  0.978579  0.827868  0.074246  0.846744  0.063719

          0         1         2         3         4
5  0.736614  0.200998  0.170951  0.200885  0.623913
6  0.410535  0.347231  0.934425  0.130389  0.104412
7  0.871398  0.788983  0.210943  0.519613  0.133114
8  0.353736  0.986401  0.385541  0.691156  0.025777
9  0.978579  0.827868  0.074246  0.846744  0.063719

          0         1         2         3         4
8  0.353736  0.986401  0.385541  0.691156  0.025777
9  0.978579  0.827868  0.074246  0.846744  0.063719

The source shows that tail works like head: it is also implemented with iloc and returns the last 5 rows by default.
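
As a sanity check of that claim, the sketch below uses a seeded 10×5 random frame like the one above (the seed is only for reproducibility) and confirms that tail(n) returns the same rows as `iloc[-n:]`. It also shows one edge case: in Python `-0` equals `0`, so `iloc[-0:]` would return every row, which is why tail(0) is handled separately and returns an empty frame. A negative n returns all rows except the first |n|.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df_data = pd.DataFrame(rng.random((10, 5)))

# tail(n) is position-based, like iloc[-n:]
assert df_data.tail(2).equals(df_data.iloc[-2:])
print(df_data.tail(2).index.tolist())  # [8, 9]

# n=0: iloc[-0:] is iloc[0:] (all 10 rows), but tail(0) is empty
print(len(df_data.iloc[-0:]), len(df_data.tail(0)))  # 10 0

# Negative n: all rows except the first 7, like iloc[7:]
print(df_data.tail(-7).index.tolist())  # [7, 8, 9]
```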

3. The sample function

We can also use the sample function to draw a specified number of random rows from a DataFrame:

import numpy as np
import pandas as pd

# 10 rows x 5 columns of uniform random numbers in [0, 1)
df_data = pd.DataFrame(np.random.rand(10, 5))
# Print the whole DataFrame
print(df_data)
# Print 3 randomly chosen rows
print(df_data.sample(n=3))

Output:

          0         1         2         3         4
0  0.939711  0.620093  0.246614  0.399083  0.683863
1  0.432783  0.514398  0.764729  0.734619  0.546725
2  0.602358  0.731698  0.329452  0.413731  0.483912
3  0.035878  0.473099  0.938656  0.438246  0.719304
4  0.639476  0.168669  0.886065  0.422071  0.108447
5  0.508343  0.838977  0.768282  0.155232  0.706890
6  0.963683  0.492637  0.890227  0.742109  0.058080
7  0.534936  0.163335  0.582532  0.519570  0.833517
8  0.574580  0.088736  0.331792  0.954629  0.896857
9  0.626263  0.933672  0.348024  0.383196  0.777881

          0         1         2         3         4
5  0.508343  0.838977  0.768282  0.155232  0.706890
7  0.534936  0.163335  0.582532  0.519570  0.833517
8  0.574580  0.088736  0.331792  0.954629  0.896857
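
Because sample picks rows at random, each run prints different rows. A few of its other parameters are useful for inspecting data: random_state fixes the seed so the draw is reproducible, frac samples a fraction of the rows instead of a count, and replace=True allows the same row to be drawn more than once. A minimal sketch on a seeded 10×5 random frame:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df_data = pd.DataFrame(rng.random((10, 5)))

# Same random_state -> the same 3 rows on every run
s1 = df_data.sample(n=3, random_state=1)
s2 = df_data.sample(n=3, random_state=1)
assert s1.equals(s2)

# frac=0.5 -> half of the 10 rows
half = df_data.sample(frac=0.5, random_state=1)
print(len(half))  # 5

# Without replacement (the default) a row is never drawn twice
print(half.index.is_unique)  # True

# With replace=True, n may exceed the number of rows
boot = df_data.sample(n=20, replace=True, random_state=1)
print(len(boot))  # 20
```

Without replace=True, asking for more rows than the frame contains (n=20 here) raises a ValueError.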
