[Deep Dive into Python] Data Science with Python (11): pandas Data Processing (Part 2)
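Preface
For setting up the data science environment, see my earlier post:
[Deep Dive into Python] Data Science with Python (1): Environment Setup
Previous posts in this data science series:
[Deep Dive into Python] Data Science with Python (2): jupyter-lab and NumPy Arrays
[Deep Dive into Python] Data Science with Python (3): NumPy Constants, Functions, and Linear Spaces
[Deep Dive into Python] Data Science with Python (4): Exercises and Solutions (Book Page 337)
[Deep Dive into Python] Data Science with Python (5): Matplotlib Visualization (1)
[Deep Dive into Python] Data Science with Python (6): Matplotlib Visualization (2)
[Deep Dive into Python] Data Science with Python (7): Exercises from Book Page 352
[Deep Dive into Python] Data Science with Python (8): pandas Data Structures: Series and DataFrame
[Deep Dive into Python] Data Science with Python (9): Exercises from Book Page 361
[Deep Dive into Python] Data Science with Python (10): pandas Data Processing (Part 1)
A note on the code: because the snippets were run in a live session, some import statements may have been omitted.
This installment continues the analysis of the Nobel laureates data set (laureates.csv).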
Python Code Snippet 1
The award records of Chen Ning Yang and Tsung-Dao Lee, who won in 1957.
print(nobel.loc[nobel["surname"].str.contains("Yang", na=False)])
print(nobel.loc[nobel["surname"].str.contains("Lee", na=False)])
# Chen Ning Yang
    id  firstname surname        born        died bornCountry bornCountryCode  \
68  68  Chen Ning    Yang  1922-09-22  0000-00-00       China              CN

        bornCity diedCountry diedCountryCode diedCity gender  year category  \
68  Hofei Anhwei         NaN             NaN      NaN   male  1957  physics

   overallMotivation  share  \
68               NaN      2

                                           motivation  \
68  "for their penetrating investigation of the so...

                            name          city country
68  Institute for Advanced Study  Princeton NJ     USA

# Tsung-Dao Lee (row 69); the substring match also returns two other laureates with the surname Lee
      id  firstname surname        born        died bornCountry  \
69    69  Tsung-Dao     Lee  1926-11-24  0000-00-00       China
148  149   David M.     Lee  1931-01-20  0000-00-00         USA
263  265    Yuan T.     Lee  1936-11-19  0000-00-00      Taiwan

    bornCountryCode  bornCity diedCountry diedCountryCode diedCity gender  \
69               CN  Shanghai         NaN             NaN      NaN   male
148              US    Rye NY         NaN             NaN      NaN   male
263              TW   Hsinchu         NaN             NaN      NaN   male

     year   category overallMotivation  share  \
69   1957    physics               NaN      2
148  1996    physics               NaN      3
263  1986  chemistry               NaN      3

                                            motivation  \
69   "for their penetrating investigation of the so...
148  "for their discovery of superfluidity in heliu...
263  "for their contributions concerning the dynami...

                         name         city country
69        Columbia University  New York NY     USA
148        Cornell University    Ithaca NY     USA
263  University of California  Berkeley CA     USA
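The lookups above pass na=False to str.contains. A minimal sketch of what that argument does, assuming the surname column can contain missing values (the toy DataFrame and its "Some Organization" row are made up for illustration and are not taken from laureates.csv):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the laureates table; the last row has no surname.
toy = pd.DataFrame(
    {
        "firstname": ["Chen Ning", "Tsung-Dao", "David M.", "Some Organization"],
        "surname": ["Yang", "Lee", "Lee", np.nan],
    }
)

# na=False treats a missing surname as "no match", so the mask is purely
# boolean and can be passed straight to .loc.
mask = toy["surname"].str.contains("Lee", na=False)

lee_rows = toy.loc[mask]
print(lee_rows)
```

Without na=False, the result for a missing surname depends on the pandas version and string dtype; in versions where it comes back as NaN, using the mask in .loc raises an error. Passing na=False explicitly avoids relying on that behavior.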
Python Code Snippet 2
The award records of Richard Feynman and of the Curies (Marie Curie and Pierre Curie):
print(nobel.loc[nobel["surname"].str.contains("Feynman", na=False)])
print(len(nobel.loc[nobel["surname"].str.contains("Feynman", na=False)]))
curies = nobel.loc[nobel["surname"].str.contains("Curie", na=False)]
print(curies)
print(curies[["firstname", "surname"]])
# Award record for Richard Feynman
    id   firstname  surname        born        died bornCountry  \
86  86  Richard P.  Feynman  1918-05-11  1988-02-15         USA

   bornCountryCode     bornCity diedCountry diedCountryCode        diedCity  \
86              US  New York NY         USA              US  Los Angeles CA

   gender  year category overallMotivation  share  \
86   male  1965  physics               NaN      3

                                           motivation  \
86  "for their fundamental work in quantum electro...

                                            name         city country
86  California Institute of Technology (Caltech)  Pasadena CA     USA

# Only one laureate has the surname Feynman
1

# The Curies
      id firstname       surname        born        died  \
4      5    Pierre         Curie  1859-05-15  1906-04-19
5      6     Marie         Curie  1867-11-07  1934-07-04
6      6     Marie         Curie  1867-11-07  1934-07-04
191  194     Irène  Joliot-Curie  1897-09-12  1956-03-17

                     bornCountry bornCountryCode bornCity diedCountry  \
4                         France              FR    Paris      France
5    Russian Empire (now Poland)              PL   Warsaw      France
6    Russian Empire (now Poland)              PL   Warsaw      France
191                       France              FR    Paris      France

    diedCountryCode    diedCity  gender  year   category overallMotivation  \
4                FR       Paris    male  1903    physics               NaN
5                FR  Sallanches  female  1903    physics               NaN
6                FR  Sallanches  female  1911  chemistry               NaN
191              FR       Paris  female  1935  chemistry               NaN

     share                                         motivation  \
4        4  "in recognition of the extraordinary services ...
5        4  "in recognition of the extraordinary services ...
6        1  "in recognition of her services to the advance...
191      2  "in recognition of their synthesis of new radi...

                                                  name   city country
4    École municipale de physique et de chimie indu...  Paris  France
5                                                  NaN    NaN     NaN
6                                  Sorbonne University  Paris  France
191                                 Institut du Radium  Paris  France

# Only the first name and surname columns
    firstname       surname
4      Pierre         Curie
5       Marie         Curie
6       Marie         Curie
191     Irène  Joliot-Curie
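Because str.contains matches substrings, the search for "Curie" also returns Irène Joliot-Curie, and Marie Curie appears once per prize. The sketch below shows one possible way to narrow the result, using an exact comparison and drop_duplicates; this is an alternative for illustration, not the approach used in the post. The toy DataFrame copies only the id, name, and year values from the output above.

```python
import pandas as pd

# Small table built from the id, name, and year values shown in the output above.
toy = pd.DataFrame(
    {
        "id": [5, 6, 6, 194],
        "firstname": ["Pierre", "Marie", "Marie", "Irène"],
        "surname": ["Curie", "Curie", "Curie", "Joliot-Curie"],
        "year": [1903, 1903, 1911, 1935],
    }
)

# Substring match: also picks up "Joliot-Curie".
substring_hits = toy.loc[toy["surname"].str.contains("Curie", na=False)]

# Exact match: keeps only rows whose surname is exactly "Curie".
exact_hits = toy.loc[toy["surname"] == "Curie"]

# One row per person: select the identifying columns, then drop repeated rows.
people = exact_hits[["id", "firstname", "surname"]].drop_duplicates()
print(people)
```

The double brackets in exact_hits[["id", "firstname", "surname"]] select a list of columns and return a DataFrame, which is the same idiom as curies[["firstname", "surname"]] in the snippet above.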
Python Code Snippet 3
Counting how many prizes each laureate has won.
print(nobel.groupby(["firstname", "surname"]).size())
print(nobel.groupby(["firstname", "surname"]).size().sort_values())
laureates = nobel.groupby(["id", "firstname", "surname"])
sizes = laureates.size()
print(sizes[sizes > 1])
# The third column is the number of prizes won by that person
firstname   surname
A. Michael  Spence         1
Aage N.     Bohr           1
Aaron       Ciechanover    1
            Klug           1
Abdulrazak  Gurnah         1
                          ..
Youyou      Tu             1
Yuan T.     Lee            1
Yves        Chauvin        1
Zhores      Alferov        1
Élie        Ducommun       1
Length: 941, dtype: int64
# 941 name groups in total; the counts have dtype int64

# Output after sorting the counts in ascending order
firstname     surname
A. Michael    Spence      1
Nicolay G.    Basov       1
Niels         Bohr        1
Niels K.      Jerne       1
Niels Ryberg  Finsen      1
                         ..
Élie          Ducommun    1
Linus         Pauling     2
John          Bardeen     2
Frederick     Sanger      2
Marie         Curie       2
Length: 941, dtype: int64

# Only laureates with more than one prize (the data runs through 2021; in 2022 K. Barry Sharpless became another two-time laureate)
id   firstname  surname
6    Marie      Curie      2    # Physics and Chemistry
66   John       Bardeen    2    # Physics, twice
217  Linus      Pauling    2    # Chemistry and Peace
222  Frederick  Sanger     2    # Chemistry, twice
dtype: int64
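The last grouping adds id to the group keys. One plausible reason, sketched below, is that grouping by name alone would merge two different people who happen to share a name, while id keeps them apart. The two "Alex Example" rows and the ids 901 and 902 are hypothetical and exist only to show the effect; the Curie and Bardeen ids come from the output above.

```python
import pandas as pd

# Two real repeat winners (ids from the output above) plus two hypothetical
# people who share a name but have different ids.
toy = pd.DataFrame(
    {
        "id": [6, 6, 66, 66, 901, 902],
        "firstname": ["Marie", "Marie", "John", "John", "Alex", "Alex"],
        "surname": ["Curie", "Curie", "Bardeen", "Bardeen", "Example", "Example"],
    }
)

# Grouping by name only: the two different "Alex Example" rows are counted together.
by_name = toy.groupby(["firstname", "surname"]).size()
repeat_by_name = by_name[by_name > 1]

# Grouping by id as well: each person is counted separately.
by_id = toy.groupby(["id", "firstname", "surname"]).size()
repeat_by_id = by_id[by_id > 1]

print(repeat_by_name)
print(repeat_by_id)
```

In this toy table the name-only grouping reports three apparent repeat winners, while the grouping that includes id reports only the two people who really appear twice.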
Reference
Michael Hartl, Learn Enough Python to Be Dangerous: Software Development, Flask Web Apps, and Beginning Data Science with Python. Boston: Pearson, 2023.