Pandas Advanced Processing (2): Concatenation and Patching [concat (parameters: axis, join, keys), combine_first (missing values in df1 are filled from df2 by index)]
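I. Concatenation (concat): joining objects along an axis
pd.concat([data1, data2], axis=1) concatenates objects along either the rows or the columns:
- axis=0 (the default) concatenates along the row index: the objects are stacked vertically, so rows are appended;
- axis=1 concatenates along the columns: the objects are placed side by side and aligned on their index;
For example, the one-hot encoded columns prepared earlier can be joined to the original data this way.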
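The one-hot use case mentioned above can be sketched as follows. This is a minimal illustration, not the data from the earlier part of the series: the DataFrame and its column names (`city`, `sales`) are made up for the example.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration only.
df = pd.DataFrame({"city": ["Beijing", "Shanghai", "Beijing"],
                   "sales": [10, 20, 30]})

# One-hot encode the categorical column.
dummies = pd.get_dummies(df["city"], prefix="city")

# axis=1 places the new columns next to the original ones;
# rows are matched by index label, not by position.
combined = pd.concat([df, dummies], axis=1)
print(combined)
```

Because alignment is by index, both objects should share the same index; otherwise the result contains NaN in the rows that do not match.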
1. Parameter: axis
import pandas as pd

# Concatenation: concat
s1 = pd.Series([1, 2, 3])
s2 = pd.Series([2, 3, 4])
s3 = pd.Series([1, 2, 3], index=['a', 'c', 'h'])
s4 = pd.Series([2, 3, 4], index=['b', 'e', 'd'])
print("s1 = \n", s1)
print('-' * 50)
print("s2 = \n", s2)
print('-' * 50)
print("s3 = \n", s3)
print('-' * 50)
print("s4 = \n", s4)
print('-' * 200)

# Default axis=0: rows are appended to rows
data1 = pd.concat([s1, s2])
print("data1 = pd.concat([s1,s2]) = \n", data1)
print('-' * 200)

# Sort the concatenated result by index label
data2 = pd.concat([s3, s4]).sort_index()
print("data2 = pd.concat([s3,s4]).sort_index() = \n", data2)
print('-' * 200)

# axis=1: columns are placed next to columns, and the result is a DataFrame
data3 = pd.concat([s3, s4], axis=1)
print("data3 = pd.concat([s3,s4], axis=1) = \n", data3)
print('-' * 200)

Output:
s1 = 
 0    1
1    2
2    3
dtype: int64
--------------------------------------------------
s2 = 
 0    2
1    3
2    4
dtype: int64
--------------------------------------------------
s3 = 
 a    1
c    2
h    3
dtype: int64
--------------------------------------------------
s4 = 
 b    2
e    3
d    4
dtype: int64
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data1 = pd.concat([s1,s2]) = 
 0    1
1    2
2    3
0    2
1    3
2    4
dtype: int64
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data2 = pd.concat([s3,s4]).sort_index() = 
 a    1
b    2
c    2
d    4
e    3
h    3
dtype: int64
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data3 = pd.concat([s3,s4], axis=1) = 
      0    1
a  1.0  NaN
c  2.0  NaN
h  3.0  NaN
b  NaN  2.0
e  NaN  3.0
d  NaN  4.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
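The same axis behaviour applies to DataFrames. The sketch below uses two small made-up frames (`left` and `right` are hypothetical names) to show how the shape changes in each direction. It also uses `ignore_index=True`, a `pd.concat` option not covered in the walkthrough above, which renumbers the result when repeated labels such as 0, 1, 2, 0, 1, 2 in data1 are not wanted.

```python
import pandas as pd

# Two small frames with the same columns; the values are arbitrary.
left = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
right = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# axis=0: rows are appended, original index labels are kept (0, 1, 0, 1).
stacked = pd.concat([left, right], axis=0)

# axis=1: columns are appended, rows are aligned on the index.
side_by_side = pd.concat([left, right], axis=1)

# ignore_index=True discards the original labels and renumbers from 0.
renumbered = pd.concat([left, right], ignore_index=True)

print(stacked.shape)            # (4, 2)
print(side_by_side.shape)       # (2, 4)
print(list(renumbered.index))   # [0, 1, 2, 3]
```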
2. Parameter: join
import pandas as pd

# How to join: join
s5 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s6 = pd.Series([2, 3, 4], index=['b', 'c', 'd'])
print("s5 = \n", s5)
print('-' * 50)
print("s6 = \n", s6)
print('-' * 200)

# Default join='outer': the result keeps the union of the index labels
data1 = pd.concat([s5, s6], axis=1)
print("data1 = pd.concat([s5,s6], axis= 1) = \n", data1)
print('-' * 200)

# join: {'inner', 'outer'}, default 'outer'. Controls how the index on the
# other axis is handled: 'outer' takes the union, 'inner' the intersection.
data3 = pd.concat([s5, s6], axis=1, join='inner')
print("data3 = pd.concat([s5,s6], axis= 1, join='inner') = \n", data3)
print('-' * 200)

Output:
s5 = 
 a    1
b    2
c    3
dtype: int64
--------------------------------------------------
s6 = 
 b    2
c    3
d    4
dtype: int64
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data1 = pd.concat([s5,s6], axis= 1) = 
      0    1
a  1.0  NaN
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data3 = pd.concat([s5,s6], axis= 1, join='inner') = 
    0  1
b  2  2
c  3  3
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
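Since join always acts on the axis that is not being concatenated, stacking DataFrames with axis=0 makes it operate on the column labels instead of the index. A small sketch with made-up frames (`a`, `b` and the column names `k1` to `k3` are assumptions for illustration):

```python
import pandas as pd

# Two frames that share only the column "k2".
a = pd.DataFrame({"k1": [1, 2], "k2": [3, 4]})
b = pd.DataFrame({"k2": [5, 6], "k3": [7, 8]})

# Rows are stacked; 'outer' keeps every column and fills the gaps with NaN.
outer = pd.concat([a, b], axis=0, join="outer")

# 'inner' keeps only the columns present in both frames.
inner = pd.concat([a, b], axis=0, join="inner")

print(outer)
print(inner)
```

With 'inner', no NaN values are introduced, so integer columns keep their integer dtype, which matches the data3 output above.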
3. Parameter: keys
import pandas as pd

# Labelling the inputs: keys
s5 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s6 = pd.Series([2, 3, 4], index=['b', 'c', 'd'])
print("s5 = \n", s5)
print('-' * 50)
print("s6 = \n", s6)
print('-' * 200)

# keys: sequence, default None. With axis=0 the passed keys become the
# outermost level of a hierarchical index (MultiIndex).
sre1 = pd.concat([s5, s6], keys=['one', 'two'])
print("sre1 = \n{0} \ntype(sre1) = {1}".format(sre1, type(sre1)))
print('-' * 50)
print("sre.index = \n", sre1.index)
print('-' * 200)

# axis=1 without keys: the columns get the default labels 0 and 1
sre2 = pd.concat([s5, s6], axis=1)
print("sre2 = \n{0} \ntype(sre2) = {1}".format(sre2, type(sre2)))
print('-' * 50)

# axis=1 with keys: the keys replace the default column names
sre3 = pd.concat([s5, s6], axis=1, keys=['one', 'two'])
print("sre3 = \n{0} \ntype(sre3) = {1}".format(sre3, type(sre3)))

Output:
s5 = 
 a    1
b    2
c    3
dtype: int64
--------------------------------------------------
s6 = 
 b    2
c    3
d    4
dtype: int64
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sre1 = 
one  a    1
     b    2
     c    3
two  b    2
     c    3
     d    4
dtype: int64 
type(sre1) = <class 'pandas.core.series.Series'>
--------------------------------------------------
sre.index = 
 MultiIndex([('one', 'a'),
            ('one', 'b'),
            ('one', 'c'),
            ('two', 'b'),
            ('two', 'c'),
            ('two', 'd')],
           )
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sre2 = 
     0    1
a  1.0  NaN
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0 
type(sre2) = <class 'pandas.core.frame.DataFrame'>
--------------------------------------------------
sre3 = 
   one  two
a  1.0  NaN
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0 
type(sre3) = <class 'pandas.core.frame.DataFrame'>
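One practical use of keys is that each input can be recovered from the result through the outer index level. The sketch below reuses s5 and s6 from this section; the variable names `part_one`, `single` and `wide` are made up for illustration.

```python
import pandas as pd

s5 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s6 = pd.Series([2, 3, 4], index=["b", "c", "d"])

sre1 = pd.concat([s5, s6], keys=["one", "two"])

# Selecting an outer key returns the piece that was passed in under that key.
part_one = sre1.loc["one"]

# A full (outer, inner) tuple selects a single value.
single = sre1.loc[("two", "d")]

# Moving the outer level to the columns gives the same layout as the
# axis=1, keys=['one', 'two'] result.
wide = sre1.unstack(level=0)

print(part_one)
print(single)
print(wide)
```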
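II. Patching (combine_first)
- df1.combine_first(df2) aligns the two objects on their index and fills the missing values of df1 with the values at the same position in df2;
- index labels that exist only in df2 are added to the result as new rows, for example with index=['a', 1];
- update: df1.update(df2) modifies df1 in place, overwriting it with the non-NaN values of df2 at matching index positions.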
import numpy as np
import pandas as pd

# Patching: DataFrame.combine_first()
df1 = pd.DataFrame([[np.nan, 3., 5.], [-4.6, np.nan, np.nan], [np.nan, 7., np.nan]])
df2 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]], index=[1, 2])
df3 = pd.DataFrame([[-42.6, np.nan, -8.2], [-5., 1.6, 4]], index=['a', 1])
print("df1 = \n", df1)
print('-' * 50)
print("df2 = \n", df2)
print('-' * 200)

# Aligned on index, the missing values of df1 are filled from df2
data1 = df1.combine_first(df2)
print("data1 = \n", data1)
print('-' * 200)

# Index labels found only in the other frame are added as new rows,
# here with df3 and index=['a', 1]
data2 = df1.combine_first(df3)
print("data2 = \n", data2)
print('-' * 200)

# update: df2 overwrites df1 in place at matching index positions
df1.update(df2)
print("df1 = \n", df1)

Output:
df1 = 
      0    1    2
0  NaN  3.0  5.0
1 -4.6  NaN  NaN
2  NaN  7.0  NaN
--------------------------------------------------
df2 = 
       0    1    2
1 -42.6  NaN -8.2
2  -5.0  1.6  4.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data1 = 
      0    1    2
0  NaN  3.0  5.0
1 -4.6  NaN -8.2
2 -5.0  7.0  4.0
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
data2 = 
       0    1    2
0   NaN  3.0  5.0
1  -4.6  1.6  4.0
2   NaN  7.0  NaN
a -42.6  NaN -8.2
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
df1 = 
       0    1    2
0   NaN  3.0  5.0
1 -42.6  NaN -8.2
2  -5.0  1.6  4.0
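The difference between the two methods is easy to miss in the output above: combine_first returns a new object and only fills gaps, while update changes the caller in place and returns None. A minimal sketch with smaller, made-up frames (the values are arbitrary):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[np.nan, 3.0], [-4.6, np.nan]])
df2 = pd.DataFrame([[1.0, np.nan], [2.0, 9.0]])

before = df1.copy()

# combine_first: df1 keeps every value it already has; only its NaN cells
# are filled from df2. df1 itself is left unchanged.
patched = df1.combine_first(df2)
unchanged = df1.equals(before)

# update: every non-NaN cell of df2 overwrites df1 in place.
# NaN cells in df2 are skipped, so 3.0 in df1 survives.
result = df1.update(df2)

print(patched)
print(unchanged)   # True
print(result)      # None
print(df1)
```

In short, combine_first gives priority to the calling frame, while update gives priority to the frame passed in.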