本文主要是介绍0 --- 前情提要(pandas),希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
【问题1】分组聚合-----非时间类型
import pandas as pd
from matplotlib import pyplot as pltdf = pd.read_csv('./books-Copy1.csv')# step1:去掉"年份"中的缺失值
df1 = df[ pd.notnull(df['original_publication_year']) ]# step2:按照”年份“分组,再求评分的平均值
'''
注意:下面这3种方式是一样的。推荐第二种
(1)应该先groupby完后,再选“rating”列,最后求均值mean()。这个顺序更好
'''
data1 = df1.groupby( by=df['original_publication_year'] )['average_rating'].mean()data1 = df1.groupby( by=df['original_publication_year'] ).mean()['average_rating']data1 = df1['average_rating'].groupby( by=df['original_publication_year'] ).mean()print(data1)
original_publication_year
-1750.0 3.630000
-762.0 4.030000
-750.0 4.005000
-720.0 3.730000
-560.0 4.050000... 2013.0 4.0122972014.0 3.9853782015.0 3.9546412016.0 4.0275762017.0 4.100909
Name: average_rating, Length: 293, dtype: float64
【问题2】 分组聚合------时间类型步骤:
(1)先将时间字符串转化为时间类型 df['timeStamp'] = pd.to_datetime(df['timeStamp'])(2)再将该列设置为索引 df.set_index('timeStamp', inplace=True) # 默认删除原列。inplace=True,对df原地修改注意:只有设置为索引后,才能对时间类型进行分组聚合 (3)对时间序列进行分组聚合例如:按照“月”计数。取“title”列 count_by_month = df.resample('M').count()['title'] # 时间类型的分组聚合例如:按照“cate”列计数。取“title”列 grouped = df.groupby(by='cate').count()['title'] # 非时间类型的分组聚合
# 统计不同月份的电话次数的变化情况
import pandas as pd
from matplotlib import pyplot as pltdf = pd.read_csv('./911-Copy1.csv')# step1
df['timeStamp'] = pd.to_datetime(df['timeStamp']) # (1)将时间字符串转化为时间类型------- pd.to_datetime()
df.set_index('timeStamp', inplace=True) # (2)将该列设置为索引--------df.set_index ,默认删除原列
count_by_month = df.resample('M').count()['title'] # (3)时间类型的分组聚合------取“title”列 “title”-----表示不同类型的事件print(df.head())
print(count_by_month)# step2
_x = count_by_month.index
_y = count_by_month.values_x = [i.strftime('%Y%m%d') for i in _x] # 将_x格式化:只有“xx年xx月xx日”plt.figure(figsize=(20,8), dpi=80)
plt.plot(range(len(_x)), _y, label='title')
plt.xticks(range(len(_x)), _x, rotation=45)
plt.legend(loc='best') # plt.legend()-----设置图例。与参数label=“title”相对应
plt.show()
lat lng \
timeStamp
2015-12-10 17:10:52 40.297876 -75.581294
2015-12-10 17:29:21 40.258061 -75.264680
2015-12-10 14:39:21 40.121182 -75.351975
2015-12-10 16:47:36 40.116153 -75.343513
2015-12-10 16:56:52 40.251492 -75.603350 desc \
timeStamp
2015-12-10 17:10:52 REINDEER CT & DEAD END; NEW HANOVER; Station ...
2015-12-10 17:29:21 BRIAR PATH & WHITEMARSH LN; HATFIELD TOWNSHIP...
2015-12-10 14:39:21 HAWS AVE; NORRISTOWN; 2015-12-10 @ 14:39:21-St...
2015-12-10 16:47:36 AIRY ST & SWEDE ST; NORRISTOWN; Station 308A;...
2015-12-10 16:56:52 CHERRYWOOD CT & DEAD END; LOWER POTTSGROVE; S... zip title twp \
timeStamp
2015-12-10 17:10:52 19525.0 EMS: BACK PAINS/INJURY NEW HANOVER
2015-12-10 17:29:21 19446.0 EMS: DIABETIC EMERGENCY HATFIELD TOWNSHIP
2015-12-10 14:39:21 19401.0 Fire: GAS-ODOR/LEAK NORRISTOWN
2015-12-10 16:47:36 19401.0 EMS: CARDIAC EMERGENCY NORRISTOWN
2015-12-10 16:56:52 NaN EMS: DIZZINESS LOWER POTTSGROVE addr e
timeStamp
2015-12-10 17:10:52 REINDEER CT & DEAD END 1
2015-12-10 17:29:21 BRIAR PATH & WHITEMARSH LN 1
2015-12-10 14:39:21 HAWS AVE 1
2015-12-10 16:47:36 AIRY ST & SWEDE ST 1
2015-12-10 16:56:52 CHERRYWOOD CT & DEAD END 1
timeStamp
2015-12-31 7916
2016-01-31 13096
2016-02-29 11396
2016-03-31 11059
2016-04-30 11287
2016-05-31 11374
2016-06-30 11732
2016-07-31 12088
2016-08-31 11904
2016-09-30 11669
2016-10-31 12502
2016-11-30 12091
2016-12-31 12162
2017-01-31 11605
2017-02-28 10267
2017-03-31 11684
2017-04-30 11056
2017-05-31 11719
2017-06-30 12333
2017-07-31 11768
2017-08-31 11753
2017-09-30 11332
2017-10-31 12337
2017-11-30 11548
2017-12-31 12941
2018-01-31 13123
2018-02-28 11165
2018-03-31 14923
2018-04-30 11240
2018-05-31 12551
2018-06-30 12106
2018-07-31 12549
2018-08-31 12315
2018-09-30 12338
2018-10-31 12976
2018-11-30 14097
2018-12-31 12144
2019-01-31 12304
2019-02-28 11556
2019-03-31 12441
2019-04-30 11845
2019-05-31 12823
2019-06-30 12322
2019-07-31 13166
2019-08-31 12387
2019-09-30 11874
2019-10-31 13425
2019-11-30 12446
2019-12-31 12529
2020-01-31 12208
2020-02-29 11043
2020-03-31 9920
2020-04-30 8243
2020-05-31 7220
Freq: M, Name: title, dtype: int64
【问题3】关于索引
若“one”为列 :则a['one']
若“one”为索引:则a.loc['one']
这篇关于0 --- 前情提要(pandas)的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!