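I have recently been writing a blog series: learning Pandas by following Stack Overflow.
Series address: http://blog.csdn.net/column/details/16726.html
Search Stack Overflow with pandas as the keyword, then sort the results by vote count: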
https://stackoverflow.com/questions/tagged/pandas?sort=votes&pageSize=15
How do I get the row count of a Pandas dataframe - getting the number of rows in a DataFrame
### Data preparation
import pandas as pd
import numpy as np
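# 1000 rows x 3 columns of standard-normal random values
df = pd.DataFrame(np.random.randn(1000, 3), columns=['col1', 'col2', 'col3'])
# Set every other row of col1 (rows 0, 2, 4, ...) to NaN, leaving 500 non-null values
df.iloc[::2, 0] = np.nan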
### Getting the row count
df.shape  # number of rows and columns of df
# (1000, 3)
df['col1'].count()  # counts only the non-NaN values
# 500
len(df.index)
# 1000
len(df)
# 1000
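To make the difference between the row count and the non-null count explicit, here is a minimal sketch that rebuilds the same example data and compares the two. The variable names `n_rows`, `n_non_null` and `n_missing` are chosen here for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd

# Same layout as the example data: 1000 rows, NaN in every other row of col1.
df = pd.DataFrame(np.random.randn(1000, 3), columns=['col1', 'col2', 'col3'])
df.iloc[::2, 0] = np.nan

n_rows = df.shape[0]                 # first element of the (rows, columns) tuple
n_non_null = df['col1'].count()      # non-NaN values in col1 only
n_missing = df['col1'].isna().sum()  # NaN values in col1

print(n_rows, n_non_null, n_missing)  # 1000 500 500

# DataFrame.count() reports the non-null count for each column,
# so only col1 falls short of the row count here.
print(df.count())
```

In this data the non-null count and the missing count add up to the row count, which is a quick way to check whether `count()` is being mistaken for the number of rows.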
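### Timing
Because of caching (note the %timeit warnings in the output), the measured times are not very precise, but they are still reasonably representative.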
%timeit df.shape
# The slowest run took 169.99 times longer than the fastest. This could mean that an intermediate result is being cached.
# 1000000 loops, best of 3: 947 ns per loop
%timeit df['col1'].count()
# The slowest run took 50.63 times longer than the fastest. This could mean that an intermediate result is being cached.
# 10000 loops, best of 3: 22.6 µs per loop
%timeit len(df.index)
# The slowest run took 14.11 times longer than the fastest. This could mean that an intermediate result is being cached.
# 1000000 loops, best of 3: 490 ns per loop
%timeit len(df)
# The slowest run took 18.61 times longer than the fastest. This could mean that an intermediate result is being cached.
# 1000000 loops, best of 3: 653 ns per loop
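The fastest approach is len(df.index) at 490 ns, followed by len(df) at 653 ns and df.shape at 947 ns. The slowest is df['col1'].count() at 22.6 µs, because it has to exclude the NaN values. Its result also differs from the others (500 rather than 1000), so take extra care when using it.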
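%timeit is only available in IPython or Jupyter. As a rough sketch, the same comparison can be reproduced in a plain Python script with the standard-library timeit module. The helper `best_time` and the `candidates` dictionary are hypothetical names introduced here, and wrapping each expression in a lambda adds a small call overhead, so the absolute numbers will not match the %timeit output above.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 3), columns=['col1', 'col2', 'col3'])
df.iloc[::2, 0] = np.nan

# The four expressions compared in the post, wrapped as callables.
candidates = {
    "df.shape": lambda: df.shape,
    "df['col1'].count()": lambda: df['col1'].count(),
    "len(df.index)": lambda: len(df.index),
    "len(df)": lambda: len(df),
}


def best_time(func, number=10000, repeat=3):
    """Return the best per-call time in seconds over `repeat` runs."""
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return min(runs) / number


timings = {name: best_time(func) for name, func in candidates.items()}

# Print the candidates from fastest to slowest.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:<20} {seconds * 1e6:8.3f} µs per call")
```

Taking the minimum over several repeats follows the "best of 3" convention in the output above and reduces the influence of occasional slow runs.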