本文主要是介绍机器学习 - Pandas 练习, 常见功能查阅,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
机器学习记录
Pandas
Pandas 官网
安装 pandas 库:
conda install -y pandas openpyxl
pd.show_versions()
查看版本信息
import os
import pandas as pd
import requestsPATH = r"/your_local_dir"r = requests.get('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data')with open(PATH + 'iris.data', 'w') as f:f.write(r.text)os.chdir(PATH)# 读取 csv 文件
df = pd.read_csv(PATH + 'iris.data', names=['sepal length', 'sepal width', 'petal length', 'petal width', 'class'])# 读取前几条数据
print(df.head())# 读取指定列
print(df['sepal length'])# 读取指定行和列
print(df.loc[3:5, ['sepal length', 'petal length', 'class']])# 读取指定行和列
print(df.iloc[3:5, [0, 2, 4]])# 读取列名
print(df.columns)# class 列去重后的值
print(df['class'].unique())# 统计每列的数量
print(df.count())# 根据 class 进行筛选, 序号还是原来的
print(df[df['class'] == 'Iris-versicolor'])print(df[df['class'] == 'Iris-versicolor'].count())# 根据 class 进行筛选, 重新编号, 从 0 开始
versicolor = df[df['class'] == 'Iris-versicolor'].reset_index(drop=True)
print(versicolor)# 筛选数据
print(df[(df['class'] == 'Iris-versicolor') & (df['petal width'] > 1.5)])# 各列统计信息
print(df.describe())# 指定统计的百分比
print(df.describe(percentiles=[.20, .40, .80, .90, .95]))# Pearson 相关系数
print(df.corr())print(df.corr(method="spearman"))print(df.corr(method="kendall"))
.describe()
对数据集进行统计描述, 包括计数, 均值, 标准差, 最小值, 最大值和四分位数等.
只对数值型和时间型数据有效, 对于文本型数据, 只会给出计数, 唯一值, 最大频率和数据类型等信息.
返回值是一个 DataFrame, 每列是一个统计指标, 每行是数据集中的一种数值型或时间型变量.
.corr()
用于计算 DataFrame 中列之间的相关系数(correlation coefficient)的方法. 相关系数可以衡量两个变量之间的线性关系强度, 取值范围是-1到1, 值越接近1表示两个变量越正相关, 值越接近-1表示两个变量越负相关, 值为0表示两个变量之间没有线性关系.
pandas corr 方法语法如下:
DataFrame.corr(method='pearson', min_periods=1)
其中, 参数 method 指定所使用的相关系数计算方法, 可以是 pearson(默认), spearman(Spearman’s) 或 kendall(Kendall’s). 参数 min_periods 为最小的非空值数量, 当非空值数量小于 min_periods 时, 返回 NaN.
皮尔逊积矩相关系数公式(Pearson product-moment correlation coefficient, PPMCC)
该公式计算两个变量之间的线性关系, 其值介于 -1 和 +1 之间, 表示两个变量之间的相关性强度和方向. 具体公式为:
r = (n * Σ(xy) - Σ x * Σ y) / sqrt((n * Σ(x^2) - (Σ x)^2) * (n * Σ(y^2) - (Σ y)^2))
其中, x 和 y 分别表示两个变量的值, Σ 表示求和符号, n 表示样本个数, r 表示相关系数.
Kendall
Kendall 相关系统是用于度量数据集中的相关性和相似性的数学和统计工具. 其中最常用的是 Kendall 相关系数和 Kendall tau 相关系数. 这两种相关系数都是用于比较两组顺序或排名数据的相似性的方法.
- Kendall 相关系数是一种度量两组数据(X 和 Y)之间的相关性的方法. 它的范围在 -1 和 1 之间, 其中 -1 表示完全的反相关, 0 表示无关, 1 表示完全相关.
- Kendall tau 相关系数是一种度量两组数据(X 和 Y)之间的相似性的方法. 它的范围在 -1 和 1 之间, 其中 -1 表示完全的相反, 0 表示无关, 1 表示完全相同.
Kendall’s tau
可以用来测试排名数据中变量之间的关系. 它的公式为:
τ = (2/ n(n-1)) Σ Σ sign(x_i - x_j) sign(y_i - y_j)
其中, n 是样本大小, x_i 和 y_i 是第 i 个观察值的排名, i ≠ j. sign()是符号函数, 如果其参数为正则返回1, 否则返回-1.
该公式计算每个组合的符号和符号积, 并将它们加起来. 如果两个变量是同向的(即它们都按相同的方向移动), 则它们的符号积将是1, 否则它将是-1. 最终的结果将在-1到1之间, 其中-1表示完全相反的排名, 而1表示完全相同的排名.
Spearman’s
一种衡量两个变量相关性的方法, 也称为 Spearman’s 秩相关系数.
在 Spearman’s 中, 每个变量的值都被转换成它们的排序值(例如, 从最小值到最大值的排名). 然后, 比较这些排名, 以确定变量之间的关系强度.
Spearman’s 秩相关系数的公式如下:
ρ = 1 - (6Σ d² / n(n²-1))
其中,
- ρ 是 Spearman’s 秩相关系数
- d 是排名之间的差异
- n 是双方比较的对象数量
Spearman’s 秩相关系数的范围是 -1 到 1, 其值的符号表示相关性的方向.
- 如果 Spearman’s 秩相关系数等于 1, 则两个变量完全正相关;
- 如果等于 -1, 则两个变量完全负相关;
- 如果等于 0, 则两个变量没有相关性.
Map
适用于序列数据
修改 class 的内容, 使其值为 map 后的简称:
df['class'] = df['class'].map({'Iris-setosa': 'SET', 'Iris-virginica': 'VIR', 'Iris-versicolor': 'VER'})
选择一个不同的名称, 就可以保留原来的 class, 同时增加新的一列, 新的一列的值就是 class 的简称:
df['class_short'] = df['class'].map({'Iris-setosa': 'SET', 'Iris-virginica': 'VIR', 'Iris-versicolor': 'VER'})
通过函数来完成:
df['wide length'] = df['petal length'].map(lambda v: 2 if v >= 2 else 0)
Apply
可用于数据框 和 序列
通过函数来完成, 根据一列的数据来创建新的一列:
df['wide petal'] = df['petal width'].apply(lambda v: 1 if v >= 1.3 else 0)
在数据框上使用 apply (map 没有的功能):
df['petal area'] = df.apply(lambda r: r['petal length'] * r['petal width'], axis=1)
ApplyMap
对数据框里所有的数据单元执行一个函数.
常见用法: 根据一定的条件标准来转变或格式化每一个单元.
对于整个数据框的数据, 如果是 float 类型, 则取其对数, 否则使用原值:
df.applymap(lambda v: np.log(v) if isinstance(v, float) else v)
Groupby
基于某些选择的类别对数据进行分组
根据 class 进行分组, 求平均值:
df.groupby('class').mean(numeric_only=True)
numeric_only
- default: False- False: 只能使用数字
- True: 允许非数字
根据 class 进行分组, 统计信息:
df.groupby('class').describe()
根据 petal width 分类, 类别去重后得到数组.
df.groupby('petal width')['class'].unique().to_frame()
根据 class 分类, 计算 petal width 的最小值, 最大值, 以及间距
df.groupby('class')['petal width'].agg([('max', np.max), ('min', np.min),('delta', lambda x: x.max() - x.min()),
])
数据
git clone https://github.com/KeithGalli/pandas.git
练习
import pandas as pd
data_dir = "/data_dir"
df = pd.read_csv(f'{data_dir}/pandas/pokemon_data.csv')
df.shape
(800, 12)
df.head()
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
# 查看列的数据类型
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 795 to 799
Data columns (total 12 columns):# Column Non-Null Count Dtype
--- ------ -------------- ----- 0 # 5 non-null int64 1 Name 5 non-null object2 Type 1 5 non-null object3 Type 2 5 non-null object4 HP 5 non-null int64 5 Attack 5 non-null int64 6 Defense 5 non-null int64 7 Sp. Atk 5 non-null int64 8 Sp. Def 5 non-null int64 9 Speed 5 non-null int64 10 Generation 5 non-null int64 11 Legendary 5 non-null bool
dtypes: bool(1), int64(8), object(3)
memory usage: 577.0+ bytes
df = pd.read_excel(f'{data_dir}/pandas/pokemon_data.xlsx')
df.shape
(800, 12)
df.head()
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
def read():global dfdf = pd.read_csv(f'{data_dir}/pandas/pokemon_data.csv')
read()
df.columns
Index(['#', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense', 'Sp. Atk','Sp. Def', 'Speed', 'Generation', 'Legendary'],dtype='object')
df.Name
0 Bulbasaur
1 Ivysaur
2 Venusaur
3 VenusaurMega Venusaur
4 Charmander...
795 Diancie
796 DiancieMega Diancie
797 HoopaHoopa Confined
798 HoopaHoopa Unbound
799 Volcanion
Name: Name, Length: 800, dtype: object
df['Name']
0 Bulbasaur
1 Ivysaur
2 Venusaur
3 VenusaurMega Venusaur
4 Charmander...
795 Diancie
796 DiancieMega Diancie
797 HoopaHoopa Confined
798 HoopaHoopa Unbound
799 Volcanion
Name: Name, Length: 800, dtype: object
df[['Name', 'HP', 'Speed']]
Name | HP | Speed | |
---|---|---|---|
0 | Bulbasaur | 45 | 45 |
1 | Ivysaur | 60 | 60 |
2 | Venusaur | 80 | 80 |
3 | VenusaurMega Venusaur | 80 | 80 |
4 | Charmander | 39 | 65 |
... | ... | ... | ... |
795 | Diancie | 50 | 50 |
796 | DiancieMega Diancie | 50 | 110 |
797 | HoopaHoopa Confined | 80 | 70 |
798 | HoopaHoopa Unbound | 80 | 80 |
799 | Volcanion | 80 | 70 |
800 rows × 3 columns
# 第 [n] 行的数据
df.iloc[1]
# 2
Name Ivysaur
Type 1 Grass
Type 2 Poison
HP 60
Attack 62
Defense 63
Sp. Atk 80
Sp. Def 80
Speed 60
Generation 1
Legendary False
Name: 1, dtype: object
# 第 [n, m) 行的数据
df.iloc[0:4]
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
# 第 [n] 行, [m] 列的数据
df.iloc[2, 2]
'Grass'
# 行遍历
# for index, row in df.iterrows():
# # print(index, row)
# print(index, row['Name'])
0 Bulbasaur
1 Ivysaur
2 Venusaur
3 VenusaurMega Venusaur
4 Charmander
5 Charmeleon
...
796 DiancieMega Diancie
797 HoopaHoopa Confined
798 HoopaHoopa Unbound
799 Volcanion
# 根据字段过滤数据
df.loc[df['Type 1'] == "Grass"]
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
48 | 43 | Oddish | Grass | Poison | 45 | 50 | 55 | 75 | 65 | 30 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
718 | 650 | Chespin | Grass | NaN | 56 | 61 | 65 | 48 | 45 | 38 | 6 | False |
719 | 651 | Quilladin | Grass | NaN | 61 | 78 | 95 | 56 | 58 | 57 | 6 | False |
720 | 652 | Chesnaught | Grass | Fighting | 88 | 107 | 122 | 74 | 75 | 64 | 6 | False |
740 | 672 | Skiddo | Grass | NaN | 66 | 65 | 48 | 62 | 57 | 52 | 6 | False |
741 | 673 | Gogoat | Grass | NaN | 123 | 100 | 62 | 97 | 81 | 68 | 6 | False |
70 rows × 12 columns
# 统计, 只对"数值"类型统计
# count:非空值的数量。
# mean:平均值。
# std:标准差。
# min:最小值。
# 25%:下四分位数。
# 50%:中位数(下四分位数和上四分位数的平均值)。
# 75%:上四分位数。
# max:最大值。
df.describe()
# | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | |
---|---|---|---|---|---|---|---|---|
count | 800.000000 | 800.000000 | 800.000000 | 800.000000 | 800.000000 | 800.000000 | 800.000000 | 800.00000 |
mean | 362.813750 | 69.258750 | 79.001250 | 73.842500 | 72.820000 | 71.902500 | 68.277500 | 3.32375 |
std | 208.343798 | 25.534669 | 32.457366 | 31.183501 | 32.722294 | 27.828916 | 29.060474 | 1.66129 |
min | 1.000000 | 1.000000 | 5.000000 | 5.000000 | 10.000000 | 20.000000 | 5.000000 | 1.00000 |
25% | 184.750000 | 50.000000 | 55.000000 | 50.000000 | 49.750000 | 50.000000 | 45.000000 | 2.00000 |
50% | 364.500000 | 65.000000 | 75.000000 | 70.000000 | 65.000000 | 70.000000 | 65.000000 | 3.00000 |
75% | 539.250000 | 80.000000 | 100.000000 | 90.000000 | 95.000000 | 90.000000 | 90.000000 | 5.00000 |
max | 721.000000 | 255.000000 | 190.000000 | 230.000000 | 194.000000 | 230.000000 | 180.000000 | 6.00000 |
# 排序
df.sort_values('Name')
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
510 | 460 | Abomasnow | Grass | Ice | 90 | 92 | 75 | 92 | 85 | 60 | 4 | False |
511 | 460 | AbomasnowMega Abomasnow | Grass | Ice | 90 | 132 | 105 | 132 | 105 | 30 | 4 | False |
68 | 63 | Abra | Psychic | NaN | 25 | 20 | 15 | 105 | 55 | 90 | 1 | False |
392 | 359 | Absol | Dark | NaN | 65 | 130 | 60 | 75 | 60 | 75 | 3 | False |
393 | 359 | AbsolMega Absol | Dark | NaN | 65 | 150 | 60 | 115 | 60 | 115 | 3 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
632 | 571 | Zoroark | Dark | NaN | 60 | 105 | 60 | 120 | 60 | 105 | 5 | False |
631 | 570 | Zorua | Dark | NaN | 40 | 65 | 40 | 80 | 40 | 65 | 5 | False |
46 | 41 | Zubat | Poison | Flying | 40 | 45 | 35 | 30 | 40 | 55 | 1 | False |
695 | 634 | Zweilous | Dark | Dragon | 72 | 85 | 70 | 65 | 70 | 58 | 5 | False |
794 | 718 | Zygarde50% Forme | Dragon | Ground | 108 | 100 | 121 | 81 | 95 | 95 | 6 | True |
800 rows × 12 columns
# 排序: 倒序
df.sort_values('Name', ascending=False)
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
794 | 718 | Zygarde50% Forme | Dragon | Ground | 108 | 100 | 121 | 81 | 95 | 95 | 6 | True |
695 | 634 | Zweilous | Dark | Dragon | 72 | 85 | 70 | 65 | 70 | 58 | 5 | False |
46 | 41 | Zubat | Poison | Flying | 40 | 45 | 35 | 30 | 40 | 55 | 1 | False |
631 | 570 | Zorua | Dark | NaN | 40 | 65 | 40 | 80 | 40 | 65 | 5 | False |
632 | 571 | Zoroark | Dark | NaN | 60 | 105 | 60 | 120 | 60 | 105 | 5 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
393 | 359 | AbsolMega Absol | Dark | NaN | 65 | 150 | 60 | 115 | 60 | 115 | 3 | False |
392 | 359 | Absol | Dark | NaN | 65 | 130 | 60 | 75 | 60 | 75 | 3 | False |
68 | 63 | Abra | Psychic | NaN | 25 | 20 | 15 | 105 | 55 | 90 | 1 | False |
511 | 460 | AbomasnowMega Abomasnow | Grass | Ice | 90 | 132 | 105 | 132 | 105 | 30 | 4 | False |
510 | 460 | Abomasnow | Grass | Ice | 90 | 92 | 75 | 92 | 85 | 60 | 4 | False |
800 rows × 12 columns
# 排序: 多字段排序
df.sort_values(['Type 1', 'HP'])
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
316 | 292 | Shedinja | Bug | Ghost | 1 | 90 | 45 | 30 | 30 | 40 | 3 | False |
230 | 213 | Shuckle | Bug | Rock | 20 | 10 | 230 | 10 | 230 | 5 | 2 | False |
462 | 415 | Combee | Bug | Flying | 30 | 30 | 42 | 30 | 42 | 70 | 4 | False |
603 | 543 | Venipede | Bug | Poison | 30 | 45 | 59 | 30 | 39 | 57 | 5 | False |
314 | 290 | Nincada | Bug | Ground | 31 | 45 | 90 | 30 | 30 | 40 | 3 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
142 | 131 | Lapras | Water | Ice | 130 | 85 | 80 | 85 | 95 | 60 | 1 | False |
145 | 134 | Vaporeon | Water | NaN | 130 | 65 | 60 | 110 | 95 | 65 | 1 | False |
350 | 320 | Wailmer | Water | NaN | 130 | 70 | 35 | 70 | 35 | 60 | 3 | False |
655 | 594 | Alomomola | Water | NaN | 165 | 75 | 80 | 40 | 45 | 65 | 5 | False |
351 | 321 | Wailord | Water | NaN | 170 | 90 | 45 | 90 | 45 | 60 | 3 | False |
800 rows × 12 columns
# 排序: 多字段排序, 指定每个字段的排序顺序
df.sort_values(['Type 1', 'HP'], ascending=[0, 1])
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
139 | 129 | Magikarp | Water | NaN | 20 | 10 | 55 | 15 | 20 | 80 | 1 | False |
381 | 349 | Feebas | Water | NaN | 20 | 15 | 20 | 10 | 55 | 80 | 3 | False |
97 | 90 | Shellder | Water | NaN | 30 | 65 | 100 | 45 | 25 | 40 | 1 | False |
106 | 98 | Krabby | Water | NaN | 30 | 105 | 90 | 25 | 25 | 50 | 1 | False |
125 | 116 | Horsea | Water | NaN | 30 | 40 | 70 | 70 | 25 | 60 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
232 | 214 | HeracrossMega Heracross | Bug | Fighting | 80 | 185 | 115 | 40 | 105 | 75 | 2 | False |
678 | 617 | Accelgor | Bug | NaN | 80 | 70 | 40 | 100 | 60 | 145 | 5 | False |
734 | 666 | Vivillon | Bug | Flying | 80 | 52 | 50 | 90 | 50 | 89 | 6 | False |
698 | 637 | Volcarona | Bug | Fire | 85 | 60 | 65 | 135 | 105 | 100 | 5 | False |
520 | 469 | Yanmega | Bug | Flying | 86 | 76 | 86 | 116 | 56 | 95 | 4 | False |
800 rows × 12 columns
df['Total'] = df['HP'] + df['Attack'] + df['Defense'] + df['Sp. Atk'] + df['Sp. Def'] + df['Speed']
df
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Total | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 318 |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 405 |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 525 |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 625 |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 309 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
795 | 719 | Diancie | Rock | Fairy | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True | 600 |
796 | 719 | DiancieMega Diancie | Rock | Fairy | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True | 700 |
797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True | 600 |
798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True | 680 |
799 | 721 | Volcanion | Fire | Water | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True | 600 |
800 rows × 13 columns
df['Total'] = df['HP'] + df['Attack'] + df['Defense'] + df['Sp. Atk'] + df['Sp. Def'] + df['Speed']
# df.drop 不会修改 df, 需要赋值. 如果删除的 列 不存在, 则会报错
df = df.drop(columns=['Total'])df.head()
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
# 当 axis=0 时,对每一列进行求和,返回一行结果。
# 当 axis=1 时,对每一行进行求和,返回一列结果。
df['Total'] = df.iloc[:, 4:10].sum(axis=1)df.head()
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Total | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | 318 |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | 405 |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | 525 |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | 625 |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | 309 |
read()df['Total'] = df.iloc[:, 4:10].sum(axis=1)# 输出指定列
df = df[['Total', 'HP', 'Defense']]df.head(5)
Total | HP | Defense | |
---|---|---|---|
0 | 318 | 45 | 49 |
1 | 405 | 60 | 63 |
2 | 525 | 80 | 83 |
3 | 625 | 80 | 123 |
4 | 309 | 39 | 43 |
read()df['Total'] = df.iloc[:, 4:10].sum(axis=1)cols = list(df.columns)
print(cols)# 输出指定列
df = df[cols[0:4] + [cols[-1]] + cols[4:-1]]df.head(5)
['#', 'Name', 'Type 1', 'Type 2', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary', 'Total']
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
# 写文件
# df.to_csv('/tmp/modified.csv', index=False)
# df.to_excel('/tmp/modified.xlsx', index=False)
# 多个字段过滤
df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison')]
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
48 | 43 | Oddish | Grass | Poison | 320 | 45 | 50 | 55 | 75 | 65 | 30 | 1 | False |
49 | 44 | Gloom | Grass | Poison | 395 | 60 | 65 | 70 | 85 | 75 | 40 | 1 | False |
50 | 45 | Vileplume | Grass | Poison | 490 | 75 | 80 | 85 | 110 | 90 | 50 | 1 | False |
75 | 69 | Bellsprout | Grass | Poison | 300 | 50 | 75 | 35 | 70 | 30 | 40 | 1 | False |
76 | 70 | Weepinbell | Grass | Poison | 390 | 65 | 90 | 50 | 85 | 45 | 55 | 1 | False |
77 | 71 | Victreebel | Grass | Poison | 490 | 80 | 105 | 65 | 100 | 70 | 70 | 1 | False |
344 | 315 | Roselia | Grass | Poison | 400 | 50 | 60 | 45 | 100 | 80 | 65 | 3 | False |
451 | 406 | Budew | Grass | Poison | 280 | 40 | 30 | 35 | 50 | 70 | 55 | 4 | False |
452 | 407 | Roserade | Grass | Poison | 515 | 60 | 70 | 65 | 125 | 105 | 90 | 4 | False |
651 | 590 | Foongus | Grass | Poison | 294 | 69 | 55 | 45 | 55 | 55 | 15 | 5 | False |
652 | 591 | Amoonguss | Grass | Poison | 464 | 114 | 85 | 70 | 85 | 80 | 30 | 5 | False |
# 多个字段过滤
df.loc[(df['Type 1'] == 'Grass') | (df['Type 2'] == 'Poison')]
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
16 | 13 | Weedle | Bug | Poison | 195 | 40 | 35 | 30 | 20 | 20 | 50 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
718 | 650 | Chespin | Grass | NaN | 313 | 56 | 61 | 65 | 48 | 45 | 38 | 6 | False |
719 | 651 | Quilladin | Grass | NaN | 405 | 61 | 78 | 95 | 56 | 58 | 57 | 6 | False |
720 | 652 | Chesnaught | Grass | Fighting | 530 | 88 | 107 | 122 | 74 | 75 | 64 | 6 | False |
740 | 672 | Skiddo | Grass | NaN | 350 | 66 | 65 | 48 | 62 | 57 | 52 | 6 | False |
741 | 673 | Gogoat | Grass | NaN | 531 | 123 | 100 | 62 | 97 | 81 | 68 | 6 | False |
89 rows × 13 columns
new_df = df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)]# new_df.to_csv('/tmp/filtered.csv', index=False)# "index" 是 df 的 index. 会重新为 new_df 生成index
new_df = new_df.reset_index()new_df
index | # | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
1 | 3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
2 | 50 | 45 | Vileplume | Grass | Poison | 490 | 75 | 80 | 85 | 110 | 90 | 50 | 1 | False |
3 | 77 | 71 | Victreebel | Grass | Poison | 490 | 80 | 105 | 65 | 100 | 70 | 70 | 1 | False |
4 | 652 | 591 | Amoonguss | Grass | Poison | 464 | 114 | 85 | 70 | 85 | 80 | 30 | 5 | False |
new_df = df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)]# 会重新为 new_df 生成index, 删除 df 保留下来的 index 列
new_df = new_df.reset_index(drop=True)new_df
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
1 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
2 | 45 | Vileplume | Grass | Poison | 490 | 75 | 80 | 85 | 110 | 90 | 50 | 1 | False |
3 | 71 | Victreebel | Grass | Poison | 490 | 80 | 105 | 65 | 100 | 70 | 70 | 1 | False |
4 | 591 | Amoonguss | Grass | Poison | 464 | 114 | 85 | 70 | 85 | 80 | 30 | 5 | False |
df.loc[df['Name'].str.contains('Mega')]
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
7 | 6 | CharizardMega Charizard X | Fire | Dragon | 634 | 78 | 130 | 111 | 130 | 85 | 100 | 1 | False |
8 | 6 | CharizardMega Charizard Y | Fire | Flying | 634 | 78 | 104 | 78 | 159 | 115 | 100 | 1 | False |
12 | 9 | BlastoiseMega Blastoise | Water | NaN | 630 | 79 | 103 | 120 | 135 | 115 | 78 | 1 | False |
19 | 15 | BeedrillMega Beedrill | Bug | Poison | 495 | 65 | 150 | 40 | 15 | 80 | 145 | 1 | False |
23 | 18 | PidgeotMega Pidgeot | Normal | Flying | 579 | 83 | 80 | 80 | 135 | 80 | 121 | 1 | False |
71 | 65 | AlakazamMega Alakazam | Psychic | NaN | 590 | 55 | 50 | 65 | 175 | 95 | 150 | 1 | False |
87 | 80 | SlowbroMega Slowbro | Water | Psychic | 590 | 95 | 75 | 180 | 130 | 80 | 30 | 1 | False |
102 | 94 | GengarMega Gengar | Ghost | Poison | 600 | 60 | 65 | 80 | 170 | 95 | 130 | 1 | False |
124 | 115 | KangaskhanMega Kangaskhan | Normal | NaN | 590 | 105 | 125 | 100 | 60 | 100 | 100 | 1 | False |
137 | 127 | PinsirMega Pinsir | Bug | Flying | 600 | 65 | 155 | 120 | 65 | 90 | 105 | 1 | False |
141 | 130 | GyaradosMega Gyarados | Water | Dark | 640 | 95 | 155 | 109 | 70 | 130 | 81 | 1 | False |
154 | 142 | AerodactylMega Aerodactyl | Rock | Flying | 615 | 80 | 135 | 85 | 70 | 95 | 150 | 1 | False |
163 | 150 | MewtwoMega Mewtwo X | Psychic | Fighting | 780 | 106 | 190 | 100 | 154 | 100 | 130 | 1 | True |
164 | 150 | MewtwoMega Mewtwo Y | Psychic | NaN | 780 | 106 | 150 | 70 | 194 | 120 | 140 | 1 | True |
168 | 154 | Meganium | Grass | NaN | 525 | 80 | 82 | 100 | 83 | 100 | 80 | 2 | False |
196 | 181 | AmpharosMega Ampharos | Electric | Dragon | 610 | 90 | 95 | 105 | 165 | 110 | 45 | 2 | False |
224 | 208 | SteelixMega Steelix | Steel | Ground | 610 | 75 | 125 | 230 | 55 | 95 | 30 | 2 | False |
229 | 212 | ScizorMega Scizor | Bug | Steel | 600 | 70 | 150 | 140 | 65 | 100 | 75 | 2 | False |
232 | 214 | HeracrossMega Heracross | Bug | Fighting | 600 | 80 | 185 | 115 | 40 | 105 | 75 | 2 | False |
248 | 229 | HoundoomMega Houndoom | Dark | Fire | 600 | 75 | 90 | 90 | 140 | 90 | 115 | 2 | False |
268 | 248 | TyranitarMega Tyranitar | Rock | Dark | 700 | 100 | 164 | 150 | 95 | 120 | 71 | 2 | False |
275 | 254 | SceptileMega Sceptile | Grass | Dragon | 630 | 70 | 110 | 75 | 145 | 85 | 145 | 3 | False |
279 | 257 | BlazikenMega Blaziken | Fire | Fighting | 630 | 80 | 160 | 80 | 130 | 80 | 100 | 3 | False |
283 | 260 | SwampertMega Swampert | Water | Ground | 635 | 100 | 150 | 110 | 95 | 110 | 70 | 3 | False |
306 | 282 | GardevoirMega Gardevoir | Psychic | Fairy | 618 | 68 | 85 | 65 | 165 | 135 | 100 | 3 | False |
327 | 302 | SableyeMega Sableye | Dark | Ghost | 480 | 50 | 85 | 125 | 85 | 115 | 20 | 3 | False |
329 | 303 | MawileMega Mawile | Steel | Fairy | 480 | 50 | 105 | 125 | 55 | 95 | 50 | 3 | False |
333 | 306 | AggronMega Aggron | Steel | NaN | 630 | 70 | 140 | 230 | 60 | 80 | 50 | 3 | False |
336 | 308 | MedichamMega Medicham | Fighting | Psychic | 510 | 60 | 100 | 85 | 80 | 85 | 100 | 3 | False |
339 | 310 | ManectricMega Manectric | Electric | NaN | 575 | 70 | 75 | 80 | 135 | 80 | 135 | 3 | False |
349 | 319 | SharpedoMega Sharpedo | Water | Dark | 560 | 70 | 140 | 70 | 110 | 65 | 105 | 3 | False |
354 | 323 | CameruptMega Camerupt | Fire | Ground | 560 | 70 | 120 | 100 | 145 | 105 | 20 | 3 | False |
366 | 334 | AltariaMega Altaria | Dragon | Fairy | 590 | 75 | 110 | 110 | 110 | 105 | 80 | 3 | False |
387 | 354 | BanetteMega Banette | Ghost | NaN | 555 | 64 | 165 | 75 | 93 | 83 | 75 | 3 | False |
393 | 359 | AbsolMega Absol | Dark | NaN | 565 | 65 | 150 | 60 | 115 | 60 | 115 | 3 | False |
397 | 362 | GlalieMega Glalie | Ice | NaN | 580 | 80 | 120 | 80 | 120 | 80 | 100 | 3 | False |
409 | 373 | SalamenceMega Salamence | Dragon | Flying | 700 | 95 | 145 | 130 | 120 | 90 | 120 | 3 | False |
413 | 376 | MetagrossMega Metagross | Steel | Psychic | 700 | 80 | 145 | 150 | 105 | 110 | 110 | 3 | False |
418 | 380 | LatiasMega Latias | Dragon | Psychic | 700 | 80 | 100 | 120 | 140 | 150 | 110 | 3 | True |
420 | 381 | LatiosMega Latios | Dragon | Psychic | 700 | 80 | 130 | 100 | 160 | 120 | 110 | 3 | True |
426 | 384 | RayquazaMega Rayquaza | Dragon | Flying | 780 | 105 | 180 | 100 | 180 | 100 | 115 | 3 | True |
476 | 428 | LopunnyMega Lopunny | Normal | Fighting | 580 | 65 | 136 | 94 | 54 | 96 | 135 | 4 | False |
494 | 445 | GarchompMega Garchomp | Dragon | Ground | 700 | 108 | 170 | 115 | 120 | 95 | 92 | 4 | False |
498 | 448 | LucarioMega Lucario | Fighting | Steel | 625 | 70 | 145 | 88 | 140 | 70 | 112 | 4 | False |
511 | 460 | AbomasnowMega Abomasnow | Grass | Ice | 594 | 90 | 132 | 105 | 132 | 105 | 30 | 4 | False |
527 | 475 | GalladeMega Gallade | Psychic | Fighting | 618 | 68 | 165 | 95 | 65 | 115 | 110 | 4 | False |
591 | 531 | AudinoMega Audino | Normal | Fairy | 545 | 103 | 60 | 126 | 80 | 126 | 50 | 5 | False |
796 | 719 | DiancieMega Diancie | Rock | Fairy | 700 | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True |
import re
df.loc[df['Type 1'].str.contains('Fire|grass', regex=True, flags=re.I)]
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 318 | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 405 | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 525 | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 625 | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 309 | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
735 | 667 | Litleo | Fire | Normal | 369 | 62 | 50 | 58 | 73 | 54 | 72 | 6 | False |
736 | 668 | Pyroar | Fire | Normal | 507 | 86 | 68 | 72 | 109 | 66 | 106 | 6 | False |
740 | 672 | Skiddo | Grass | NaN | 350 | 66 | 65 | 48 | 62 | 57 | 52 | 6 | False |
741 | 673 | Gogoat | Grass | NaN | 531 | 123 | 100 | 62 | 97 | 81 | 68 | 6 | False |
799 | 721 | Volcanion | Fire | Water | 600 | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |
122 rows × 13 columns
import re
df.loc[df['Name'].str.contains('^pi[a-z]', regex=True, flags=re.I)]
# | Name | Type 1 | Type 2 | Total | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20 | 16 | Pidgey | Normal | Flying | 251 | 40 | 45 | 40 | 35 | 35 | 56 | 1 | False |
21 | 17 | Pidgeotto | Normal | Flying | 349 | 63 | 60 | 55 | 50 | 50 | 71 | 1 | False |
22 | 18 | Pidgeot | Normal | Flying | 479 | 83 | 80 | 75 | 70 | 70 | 101 | 1 | False |
23 | 18 | PidgeotMega Pidgeot | Normal | Flying | 579 | 83 | 80 | 80 | 135 | 80 | 121 | 1 | False |
30 | 25 | Pikachu | Electric | NaN | 320 | 35 | 55 | 40 | 50 | 50 | 90 | 1 | False |
136 | 127 | Pinsir | Bug | NaN | 500 | 65 | 125 | 100 | 55 | 70 | 85 | 1 | False |
137 | 127 | PinsirMega Pinsir | Bug | Flying | 600 | 65 | 155 | 120 | 65 | 90 | 105 | 1 | False |
186 | 172 | Pichu | Electric | NaN | 205 | 20 | 40 | 15 | 35 | 35 | 60 | 2 | False |
219 | 204 | Pineco | Bug | NaN | 290 | 50 | 65 | 90 | 35 | 35 | 15 | 2 | False |
239 | 221 | Piloswine | Ice | Ground | 450 | 100 | 100 | 80 | 60 | 60 | 50 | 2 | False |
438 | 393 | Piplup | Water | NaN | 314 | 53 | 51 | 53 | 61 | 56 | 40 | 4 | False |
558 | 499 | Pignite | Fire | Fighting | 418 | 90 | 93 | 55 | 70 | 55 | 55 | 5 | False |
578 | 519 | Pidove | Normal | Flying | 264 | 50 | 55 | 50 | 36 | 30 | 43 | 5 | False |
read()
# 第一个参数满足条件的, 设置第二个参数的列为后面的值
df.loc[df['Type 1'] == 'Grass', 'Type 1'] = 'Flamer'df
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Flamer | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Flamer | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Flamer | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Flamer | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
795 | 719 | Diancie | Rock | Fairy | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True |
796 | 719 | DiancieMega Diancie | Rock | Fairy | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True |
797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True |
798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True |
799 | 721 | Volcanion | Fire | Water | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |
800 rows × 12 columns
read()
# 第一个参数满足条件的, 设置第二个参数的列为后面的值
df.loc[df['Type 1'] == 'Grass', ['Type 1', 'Lendary']] = 'TEST VALUE'df
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Lendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | TEST VALUE | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | TEST VALUE |
1 | 2 | Ivysaur | TEST VALUE | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | TEST VALUE |
2 | 3 | Venusaur | TEST VALUE | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | TEST VALUE |
3 | 3 | VenusaurMega Venusaur | TEST VALUE | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | TEST VALUE |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
795 | 719 | Diancie | Rock | Fairy | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True | NaN |
796 | 719 | DiancieMega Diancie | Rock | Fairy | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True | NaN |
797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True | NaN |
798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True | NaN |
799 | 721 | Volcanion | Fire | Water | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True | NaN |
800 rows × 13 columns
read()
# 第一个参数满足条件的, 设置第二个参数的列为后面的值
df.loc[df['Type 1'] == 'Grass', ['Type 1', 'Lendary']] = ['TEST 1', 'TEST 2']df
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | Lendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | TEST 1 | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False | TEST 2 |
1 | 2 | Ivysaur | TEST 1 | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False | TEST 2 |
2 | 3 | Venusaur | TEST 1 | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False | TEST 2 |
3 | 3 | VenusaurMega Venusaur | TEST 1 | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False | TEST 2 |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
795 | 719 | Diancie | Rock | Fairy | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True | NaN |
796 | 719 | DiancieMega Diancie | Rock | Fairy | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True | NaN |
797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True | NaN |
798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True | NaN |
799 | 721 | Volcanion | Fire | Water | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True | NaN |
800 rows × 13 columns
read()# 分组 求平均数
df.groupby(['Type 1']).mean(numeric_only=True).sort_values('Defense', ascending=False)
# | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|
Type 1 | |||||||||
Steel | 442.851852 | 65.222222 | 92.703704 | 126.370370 | 67.518519 | 80.629630 | 55.259259 | 3.851852 | 0.148148 |
Rock | 392.727273 | 65.363636 | 92.863636 | 100.795455 | 63.340909 | 75.477273 | 55.909091 | 3.454545 | 0.090909 |
Dragon | 474.375000 | 83.312500 | 112.125000 | 86.375000 | 96.843750 | 88.843750 | 83.031250 | 3.875000 | 0.375000 |
Ground | 356.281250 | 73.781250 | 95.750000 | 84.843750 | 56.468750 | 62.750000 | 63.906250 | 3.156250 | 0.125000 |
Ghost | 486.500000 | 64.437500 | 73.781250 | 81.187500 | 79.343750 | 76.468750 | 64.343750 | 4.187500 | 0.062500 |
Water | 303.089286 | 72.062500 | 74.151786 | 72.946429 | 74.812500 | 70.517857 | 65.964286 | 2.857143 | 0.035714 |
Ice | 423.541667 | 72.000000 | 72.750000 | 71.416667 | 77.541667 | 76.291667 | 63.458333 | 3.541667 | 0.083333 |
Grass | 344.871429 | 67.271429 | 73.214286 | 70.800000 | 77.500000 | 70.428571 | 61.928571 | 3.357143 | 0.042857 |
Bug | 334.492754 | 56.884058 | 70.971014 | 70.724638 | 53.869565 | 64.797101 | 61.681159 | 3.217391 | 0.000000 |
Dark | 461.354839 | 66.806452 | 88.387097 | 70.225806 | 74.645161 | 69.516129 | 76.161290 | 4.032258 | 0.064516 |
Poison | 251.785714 | 67.250000 | 74.678571 | 68.821429 | 60.428571 | 64.392857 | 63.571429 | 2.535714 | 0.000000 |
Fire | 327.403846 | 69.903846 | 84.769231 | 67.769231 | 88.980769 | 72.211538 | 74.442308 | 3.211538 | 0.096154 |
Psychic | 380.807018 | 70.631579 | 71.456140 | 67.684211 | 98.403509 | 86.280702 | 81.491228 | 3.385965 | 0.245614 |
Electric | 363.500000 | 59.795455 | 69.090909 | 66.295455 | 90.022727 | 73.704545 | 84.500000 | 3.272727 | 0.090909 |
Flying | 677.750000 | 70.750000 | 78.750000 | 66.250000 | 94.250000 | 72.500000 | 102.500000 | 5.500000 | 0.500000 |
Fighting | 363.851852 | 69.851852 | 96.777778 | 65.925926 | 53.111111 | 64.703704 | 66.074074 | 3.370370 | 0.000000 |
Fairy | 449.529412 | 74.117647 | 61.529412 | 65.705882 | 78.529412 | 84.705882 | 48.588235 | 4.117647 | 0.058824 |
Normal | 319.173469 | 77.275510 | 73.469388 | 59.846939 | 55.816327 | 63.724490 | 71.551020 | 3.051020 | 0.020408 |
read()# 分组 求和
df.groupby(['Type 1']).sum(numeric_only=True).sort_values('HP', ascending=False)
# | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|
Type 1 | |||||||||
Water | 33946 | 8071 | 8305 | 8170 | 8379 | 7898 | 7388 | 320 | 4 |
Normal | 31279 | 7573 | 7200 | 5865 | 5470 | 6245 | 7012 | 299 | 2 |
Grass | 24141 | 4709 | 5125 | 4956 | 5425 | 4930 | 4335 | 235 | 3 |
Psychic | 21706 | 4026 | 4073 | 3858 | 5609 | 4918 | 4645 | 193 | 14 |
Bug | 23080 | 3925 | 4897 | 4880 | 3717 | 4471 | 4256 | 222 | 0 |
Fire | 17025 | 3635 | 4408 | 3524 | 4627 | 3755 | 3871 | 167 | 5 |
Rock | 17280 | 2876 | 4086 | 4435 | 2787 | 3321 | 2460 | 152 | 4 |
Dragon | 15180 | 2666 | 3588 | 2764 | 3099 | 2843 | 2657 | 124 | 12 |
Electric | 15994 | 2631 | 3040 | 2917 | 3961 | 3243 | 3718 | 144 | 4 |
Ground | 11401 | 2361 | 3064 | 2715 | 1807 | 2008 | 2045 | 101 | 4 |
Dark | 14302 | 2071 | 2740 | 2177 | 2314 | 2155 | 2361 | 125 | 2 |
Ghost | 15568 | 2062 | 2361 | 2598 | 2539 | 2447 | 2059 | 134 | 2 |
Fighting | 9824 | 1886 | 2613 | 1780 | 1434 | 1747 | 1784 | 91 | 0 |
Poison | 7050 | 1883 | 2091 | 1927 | 1692 | 1803 | 1780 | 71 | 0 |
Steel | 11957 | 1761 | 2503 | 3412 | 1823 | 2177 | 1492 | 104 | 4 |
Ice | 10165 | 1728 | 1746 | 1714 | 1861 | 1831 | 1523 | 85 | 2 |
Fairy | 7642 | 1260 | 1046 | 1117 | 1335 | 1440 | 826 | 70 | 1 |
Flying | 2711 | 283 | 315 | 265 | 377 | 290 | 410 | 22 | 2 |
read()# 分组 求数量
df.groupby(['Type 1']).count().sort_values('HP', ascending=False)
# | Name | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|
Type 1 | |||||||||||
Water | 112 | 112 | 53 | 112 | 112 | 112 | 112 | 112 | 112 | 112 | 112 |
Normal | 98 | 98 | 37 | 98 | 98 | 98 | 98 | 98 | 98 | 98 | 98 |
Grass | 70 | 70 | 37 | 70 | 70 | 70 | 70 | 70 | 70 | 70 | 70 |
Bug | 69 | 69 | 52 | 69 | 69 | 69 | 69 | 69 | 69 | 69 | 69 |
Psychic | 57 | 57 | 19 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 |
Fire | 52 | 52 | 24 | 52 | 52 | 52 | 52 | 52 | 52 | 52 | 52 |
Electric | 44 | 44 | 17 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 |
Rock | 44 | 44 | 35 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 |
Ghost | 32 | 32 | 22 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 |
Ground | 32 | 32 | 19 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 |
Dragon | 32 | 32 | 21 | 32 | 32 | 32 | 32 | 32 | 32 | 32 | 32 |
Dark | 31 | 31 | 21 | 31 | 31 | 31 | 31 | 31 | 31 | 31 | 31 |
Poison | 28 | 28 | 13 | 28 | 28 | 28 | 28 | 28 | 28 | 28 | 28 |
Fighting | 27 | 27 | 7 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 |
Steel | 27 | 27 | 22 | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 27 |
Ice | 24 | 24 | 11 | 24 | 24 | 24 | 24 | 24 | 24 | 24 | 24 |
Fairy | 17 | 17 | 2 | 17 | 17 | 17 | 17 | 17 | 17 | 17 | 17 |
Flying | 4 | 4 | 2 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
read()# 每行都设置 count = 1
df['count'] = 1# 分组 求数量
df.groupby(['Type 1']).count()['count']
Type 1
Bug 69
Dark 31
Dragon 32
Electric 44
Fairy 17
Fighting 27
Fire 52
Flying 4
Ghost 32
Grass 70
Ground 32
Ice 24
Normal 98
Poison 28
Psychic 57
Rock 44
Steel 27
Water 112
Name: count, dtype: int64
read()# 每行都设置 count = 1
df['count'] = 1# 分组 求数量
df.groupby(['Type 1', 'Type 2']).count()['count']
Type 1 Type 2
Bug Electric 2Fighting 2Fire 2Flying 14Ghost 1..
Water Ice 3Poison 3Psychic 5Rock 4Steel 1
Name: count, Length: 136, dtype: int64
# 读取大文件
# 分批读取数据, chunksize 行数
for df in pd.read_csv(f'{data_dir}/pandas/pokemon_data.csv', chunksize=5):
# print('CHUNK DF:::')
# print(df['Name']);
# 读取大文件
# 分批读取数据, chunksize 行数new_df = pd.DataFrame()
for df in pd.read_csv(f'{data_dir}/pandas/pokemon_data.csv', chunksize=5):new_df = pd.concat([new_df, df])
# results = df.groupby(['Type 1']).count()
# new_df = pd.concat([new_df, results])new_df
# | Name | Type 1 | Type 2 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed | Generation | Legendary | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Bulbasaur | Grass | Poison | 45 | 49 | 49 | 65 | 65 | 45 | 1 | False |
1 | 2 | Ivysaur | Grass | Poison | 60 | 62 | 63 | 80 | 80 | 60 | 1 | False |
2 | 3 | Venusaur | Grass | Poison | 80 | 82 | 83 | 100 | 100 | 80 | 1 | False |
3 | 3 | VenusaurMega Venusaur | Grass | Poison | 80 | 100 | 123 | 122 | 120 | 80 | 1 | False |
4 | 4 | Charmander | Fire | NaN | 39 | 52 | 43 | 60 | 50 | 65 | 1 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
795 | 719 | Diancie | Rock | Fairy | 50 | 100 | 150 | 100 | 150 | 50 | 6 | True |
796 | 719 | DiancieMega Diancie | Rock | Fairy | 50 | 160 | 110 | 160 | 110 | 110 | 6 | True |
797 | 720 | HoopaHoopa Confined | Psychic | Ghost | 80 | 110 | 60 | 150 | 130 | 70 | 6 | True |
798 | 720 | HoopaHoopa Unbound | Psychic | Dark | 80 | 160 | 60 | 170 | 130 | 80 | 6 | True |
799 | 721 | Volcanion | Fire | Water | 80 | 110 | 120 | 130 | 90 | 70 | 6 | True |
800 rows × 12 columns
read()# 分组 求平均数
df.groupby(['Type 1'])['Defense'].mean().plot()
read()# 分组 求平均数
df.groupby(['Type 1'])['Defense'].mean().plot(kind='bar')
这篇关于机器学习 - Pandas 练习, 常见功能查阅的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!