pandas 33: reshaping with pivot (tcy)
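1. Function
DataFrame.pivot(index=None, columns=None, values=None)
Purpose:
# Reshape data based on column values (produce a "pivot table").
# Uses the unique values of the specified index / columns to form the axes of the resulting DataFrame.
# This function does not support data aggregation; multiple value columns result in a MultiIndex in the columns.
Returns:
# A reshaped DataFrame organized by the given index / column values.
Note:
# pivot is just a shortcut: build a hierarchical index with set_index, then reshape with unstack.
# columns must always be supplied, even though the signature above shows a default of None. In recent pandas versions (2.0 and later) all three arguments are keyword-only.
Parameters:
index: str or object, optional
# Column used to make the new frame's index. If None, the existing index is used.
columns: str or object
# Column used to make the new frame's columns.
values: str, object, or a list of the previous (a list of column names); optional
# Column(s) used to populate the new frame's values. If not specified, all remaining columns are used and the result has hierarchically indexed columns.
Raises ValueError:
# When any index / columns combination has multiple values. Use DataFrame.pivot_table when aggregation is needed.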
2. Examples
# Example 1:
import pandas as pd

df = pd.DataFrame({'s1': ['ss1', 'ss1', 'ss1', 'ss2', 'ss2', 'ss2'],
                   's2': ['A', 'B', 'C', 'A', 'B', 'C'],
                   's3': [1, 2, 3, 4, 5, 6],
                   's4': ['a1', 'a2', 'a3', 'a4', 'a5', 'a6']})
# df.pivot(index='s1')  # error: the columns argument must be specified
df.pivot(columns='s2')  # index omitted, so the existing index 0..5 is kept

# df
    s1 s2  s3  s4
0  ss1  A   1  a1
1  ss1  B   2  a2
2  ss1  C   3  a3
3  ss2  A   4  a4
4  ss2  B   5  a5
5  ss2  C   6  a6

# result of df.pivot(columns='s2')
     s1             s3             s4
s2    A    B    C    A    B    C    A    B    C
0   ss1  NaN  NaN  1.0  NaN  NaN   a1  NaN  NaN
1   NaN  ss1  NaN  NaN  2.0  NaN  NaN   a2  NaN
2   NaN  NaN  ss1  NaN  NaN  3.0  NaN  NaN   a3
3   ss2  NaN  NaN  4.0  NaN  NaN   a4  NaN  NaN
4   NaN  ss2  NaN  NaN  5.0  NaN  NaN   a5  NaN
5   NaN  NaN  ss2  NaN  NaN  6.0  NaN  NaN   a6

result1 = df.pivot(index='s1', columns='s2')
result2 = df.pivot(index='s1', columns='s2', values='s3')
result2 = df.pivot(index='s1', columns='s2')['s3']  # equivalent
result3 = df.pivot(index='s1', columns='s2', values=['s3', 's4'])

# result1
    s3       s4
s2   A  B  C   A   B   C
s1
ss1  1  2  3  a1  a2  a3
ss2  4  5  6  a4  a5  a6

# result2
s2   A  B  C
s1
ss1  1  2  3
ss2  4  5  6

# result3
    s3       s4
s2   A  B  C   A   B   C
s1
ss1  1  2  3  a1  a2  a3
ss2  4  5  6  a4  a5  a6
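As a small check of the "equivalent" comment in Example 1, the sketch below rebuilds the same frame and compares the two ways of getting the s3 values. The variable name s3_from_result1 is introduced here for illustration only and is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'s1': ['ss1', 'ss1', 'ss1', 'ss2', 'ss2', 'ss2'],
                   's2': ['A', 'B', 'C', 'A', 'B', 'C'],
                   's3': [1, 2, 3, 4, 5, 6],
                   's4': ['a1', 'a2', 'a3', 'a4', 'a5', 'a6']})

# Without values, every remaining column is pivoted and the columns form a MultiIndex.
result1 = df.pivot(index='s1', columns='s2')
# With a single values column, the columns are a flat Index.
result2 = df.pivot(index='s1', columns='s2', values='s3')

# Selecting the top-level key 's3' drops that level of the column MultiIndex.
s3_from_result1 = result1['s3']

print(result1.columns.nlevels, result2.columns.nlevels)
print(s3_from_result1.to_numpy().tolist() == result2.to_numpy().tolist())
```

Passing values up front avoids building the columns that would be thrown away by the later selection, which may matter for wide frames.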
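# Example 2: pivot raises ValueError if any (index, columns) pair is duplicated.
df = pd.DataFrame({"s1": ['ss1', 'ss1', 'ss2', 'ss2'],
                   "s2": ['A', 'A', 'B', 'C'],
                   "s3": [1, 2, 3, 4]})
df

    s1 s2  s3
0  ss1  A   1
1  ss1  A   2
2  ss2  B   3
3  ss2  C   4

df.pivot(index='s1', columns='s2', values='s3')  # ValueError
# The error is raised because the first two rows have the same index and columns values ('ss1', 'A').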
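One possible way to handle the duplicate case is sketched below: locate the offending rows with DataFrame.duplicated, then fall back to DataFrame.pivot_table with an aggregation. The choice of aggfunc='sum' and the names dup_rows, pivot_failed and table are assumptions made for this illustration; the right aggregation depends on the data.

```python
import pandas as pd

df = pd.DataFrame({"s1": ['ss1', 'ss1', 'ss2', 'ss2'],
                   "s2": ['A', 'A', 'B', 'C'],
                   "s3": [1, 2, 3, 4]})

# Rows whose (s1, s2) pair occurs more than once; these are what make pivot fail.
dup_rows = df[df.duplicated(subset=['s1', 's2'], keep=False)]
print(dup_rows)

try:
    df.pivot(index='s1', columns='s2', values='s3')
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print('pivot failed:', exc)

# pivot_table aggregates the duplicated pairs instead of raising.
# aggfunc='sum' is an assumption for this sketch, not a recommendation.
table = df.pivot_table(index='s1', columns='s2', values='s3', aggfunc='sum')
print(table)
```

Here the two ('ss1', 'A') rows are summed into a single cell, and combinations that never occur are left as NaN.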
# Example 3: df.pivot(index='date_time', columns='item') == df.set_index(['date_time', 'item']).unstack('item')
# pivot is just a shortcut: build a hierarchical index with set_index, then reshape with unstack.
from io import StringIO
import numpy as np
import pandas as pd

data = ('date item value\n'
        '2019-03-01 00:00:01 s2 11\n'
        '2019-03-02 00:00:02 s1 12\n'
        '2019-03-03 00:00:03 s2 13\n'
        '2019-03-04 00:00:04 s1 14\n'
        '2019-03-05 00:00:05 s3 15\n'
        '2019-03-06 00:00:06 s3 16\n'
        '2019-03-07 00:00:07 s2 17')
# The header row is skipped and explicit names are given, because date and time are separate whitespace-delimited fields.
df = pd.read_csv(StringIO(data), sep=r'\s+', header=None, skiprows=1,
                 names=['date', 'time', 'item', 'value'])
# Combine date and time explicitly; parse_dates=[[0, 1]] and infer_datetime_format are deprecated in recent pandas.
df.insert(0, 'date_time', pd.to_datetime(df['date'] + ' ' + df['time']))
df = df.drop(columns=['date', 'time'])
df

            date_time item  value
0 2019-03-01 00:00:01   s2     11
1 2019-03-02 00:00:02   s1     12
2 2019-03-03 00:00:03   s2     13
3 2019-03-04 00:00:04   s1     14
4 2019-03-05 00:00:05   s3     15
5 2019-03-06 00:00:06   s3     16
6 2019-03-07 00:00:07   s2     17

# The arguments are the column names for the row index, the column index, and the data column.
df.pivot(index='date_time', columns='item', values='value')

item                   s1    s2    s3
date_time
2019-03-01 00:00:01   NaN  11.0   NaN
2019-03-02 00:00:02  12.0   NaN   NaN
2019-03-03 00:00:03   NaN  13.0   NaN
2019-03-04 00:00:04  14.0   NaN   NaN
2019-03-05 00:00:05   NaN   NaN  15.0
2019-03-06 00:00:06   NaN   NaN  16.0
2019-03-07 00:00:07   NaN  17.0   NaN

df['value2'] = np.arange(90, 90 + len(df))
df

            date_time item  value  value2
0 2019-03-01 00:00:01   s2     11      90
1 2019-03-02 00:00:02   s1     12      91
2 2019-03-03 00:00:03   s2     13      92
3 2019-03-04 00:00:04   s1     14      93
4 2019-03-05 00:00:05   s3     15      94
5 2019-03-06 00:00:06   s3     16      95
6 2019-03-07 00:00:07   s2     17      96

# If the values argument is omitted, the result has hierarchically indexed columns.
df.pivot(index='date_time', columns='item')

                    value             value2
item                   s1    s2    s3     s1    s2    s3
date_time
2019-03-01 00:00:01   NaN  11.0   NaN    NaN  90.0   NaN
2019-03-02 00:00:02  12.0   NaN   NaN   91.0   NaN   NaN
2019-03-03 00:00:03   NaN  13.0   NaN    NaN  92.0   NaN
2019-03-04 00:00:04  14.0   NaN   NaN   93.0   NaN   NaN
2019-03-05 00:00:05   NaN   NaN  15.0    NaN   NaN  94.0
2019-03-06 00:00:06   NaN   NaN  16.0    NaN   NaN  95.0
2019-03-07 00:00:07   NaN  17.0   NaN    NaN  96.0   NaN

df.set_index(['date_time', 'item']).unstack('item')  # same result as above
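The claimed equivalence between pivot and set_index followed by unstack can be checked programmatically. The sketch below uses a shortened, hand-built version of the Example 3 data (three rows instead of seven, an assumption made to keep it brief) and compares the two results with pandas.testing.assert_frame_equal; the names via_pivot and via_unstack are illustrative.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Shortened stand-in for the Example 3 frame.
df = pd.DataFrame({
    'date_time': pd.to_datetime(['2019-03-01 00:00:01',
                                 '2019-03-02 00:00:02',
                                 '2019-03-03 00:00:03']),
    'item': ['s2', 's1', 's2'],
    'value': [11, 12, 13],
    'value2': [90, 91, 92],
})

via_pivot = df.pivot(index='date_time', columns='item')
via_unstack = df.set_index(['date_time', 'item']).unstack('item')

# Raises AssertionError if the two frames differ in values, dtypes, or labels.
assert_frame_equal(via_pivot, via_unstack)
print(via_pivot)
```

assert_frame_equal is stricter than comparing printed output, since it also checks dtypes and index and column names.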