[Kaggle Data Analysis Practice] World University Rankings
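Dataset overview
The dataset used in this analysis is cwurData.csv from Kaggle's World University Rankings. The file has 2,201 lines (including the header row) and 14 columns. The official description of each column is as follows: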
- world_rank: world rank for university
- institution: name of university
- country: country of each university
- national_rank: rank of university within its country
- quality_of_education: rank for quality of education
- alumni_employment: rank for alumni employment
- quality_of_faculty: rank for quality of faculty
- publications: rank for publications
- influence: rank for influence
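- citations: rank for citations (the official description reads "number of students at the university", but the values in this column are ranks, from 1 to 812)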
- broad_impact: rank for broad impact (only available for 2014 and 2015)
- patents: rank for patents
- score: total score, used for determining world rank
- year: year of ranking (2012 to 2015)
Initial exploration of the dataset
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
df = pd.read_csv('./data/cwurData.csv', encoding='utf-8')
df.describe()
| | world_rank | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score | year |
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 2200.000000 | 2200.000000 | 2200.000000 | 2200.000000 | 2200.000000 | 2200.000000 | 2200.000000 | 2200.000000 | 2000.000000 | 2200.000000 | 2200.000000 | 2200.000000 |
mean | 459.590909 | 40.278182 | 275.100455 | 357.116818 | 178.888182 | 459.908636 | 459.797727 | 413.417273 | 496.699500 | 433.346364 | 47.798395 | 2014.318182 |
std | 304.320363 | 51.740870 | 121.935100 | 186.779252 | 64.050885 | 303.760352 | 303.331822 | 264.366549 | 286.919755 | 273.996525 | 7.760806 | 0.762130 |
min | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 43.360000 | 2012.000000 |
25% | 175.750000 | 6.000000 | 175.750000 | 175.750000 | 175.750000 | 175.750000 | 175.750000 | 161.000000 | 250.500000 | 170.750000 | 44.460000 | 2014.000000 |
50% | 450.500000 | 21.000000 | 355.000000 | 450.500000 | 210.000000 | 450.500000 | 450.500000 | 406.000000 | 496.000000 | 426.000000 | 45.100000 | 2014.000000 |
75% | 725.250000 | 49.000000 | 367.000000 | 478.000000 | 218.000000 | 725.000000 | 725.250000 | 645.000000 | 741.000000 | 714.250000 | 47.545000 | 2015.000000 |
max | 1000.000000 | 229.000000 | 367.000000 | 567.000000 | 218.000000 | 1000.000000 | 991.000000 | 812.000000 | 1000.000000 | 871.000000 | 100.000000 | 2015.000000 |
df.head()
| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score | year |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Harvard University | USA | 1 | 7 | 9 | 1 | 1 | 1 | 1 | NaN | 5 | 100.00 | 2012 |
1 | 2 | Massachusetts Institute of Technology | USA | 2 | 9 | 17 | 3 | 12 | 4 | 4 | NaN | 1 | 91.67 | 2012 |
2 | 3 | Stanford University | USA | 3 | 17 | 11 | 5 | 4 | 2 | 2 | NaN | 15 | 89.50 | 2012 |
3 | 4 | University of Cambridge | United Kingdom | 1 | 10 | 24 | 4 | 16 | 16 | 11 | NaN | 50 | 86.17 | 2012 |
4 | 5 | California Institute of Technology | USA | 4 | 2 | 29 | 7 | 37 | 22 | 22 | NaN | 18 | 85.21 | 2012 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2200 entries, 0 to 2199
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   world_rank            2200 non-null   int64
 1   institution           2200 non-null   object
 2   country               2200 non-null   object
 3   national_rank         2200 non-null   int64
 4   quality_of_education  2200 non-null   int64
 5   alumni_employment     2200 non-null   int64
 6   quality_of_faculty    2200 non-null   int64
 7   publications          2200 non-null   int64
 8   influence             2200 non-null   int64
 9   citations             2200 non-null   int64
 10  broad_impact          2000 non-null   float64
 11  patents               2200 non-null   int64
 12  score                 2200 non-null   float64
 13  year                  2200 non-null   int64
dtypes: float64(2), int64(10), object(2)
memory usage: 240.8+ KB
set(df['year'])
{2012, 2013, 2014, 2015}
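As the output shows, broad_impact is the only column with missing values, and only 200 of its 2,200 entries are missing. The year column covers 2012 to 2015, so next we analyze the data year by year.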
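Before splitting by year, it can help to see where the missing values sit. The following is a minimal sketch on a tiny hand-made frame (the name `toy` and its values are illustrative stand-ins, not rows from cwurData.csv); the same two calls should work on `df`.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for cwurData.csv (illustrative values only).
toy = pd.DataFrame({
    "year": [2012, 2012, 2013, 2014, 2014, 2015],
    "broad_impact": [np.nan, np.nan, np.nan, 1.0, 4.0, 2.0],
    "score": [100.0, 91.67, 100.0, 100.0, 99.09, 100.0],
})

# Number of missing values in each column over the whole frame.
missing_per_column = toy.isna().sum()

# Number of missing broad_impact values, broken down by year.
missing_by_year = toy["broad_impact"].isna().groupby(toy["year"]).sum()

print(missing_per_column)
print(missing_by_year)
```

On the real data, a breakdown like `missing_by_year` would show directly which years lack broad_impact.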
Analysis of the 2012 data
First, select the rows where year is 2012 and store them in data_2012.
data_2012 = df[df['year'] == 2012].drop('year', axis=1)
data_2012
| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Harvard University | USA | 1 | 7 | 9 | 1 | 1 | 1 | 1 | NaN | 5 | 100.00 |
1 | 2 | Massachusetts Institute of Technology | USA | 2 | 9 | 17 | 3 | 12 | 4 | 4 | NaN | 1 | 91.67 |
2 | 3 | Stanford University | USA | 3 | 17 | 11 | 5 | 4 | 2 | 2 | NaN | 15 | 89.50 |
3 | 4 | University of Cambridge | United Kingdom | 1 | 10 | 24 | 4 | 16 | 16 | 11 | NaN | 50 | 86.17 |
4 | 5 | California Institute of Technology | USA | 4 | 2 | 29 | 7 | 37 | 22 | 22 | NaN | 18 | 85.21 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | 96 | University of Texas MD Anderson Cancer Center | USA | 58 | 101 | 101 | 101 | 95 | 46 | 66 | NaN | 100 | 43.88 |
96 | 97 | University of Nottingham | United Kingdom | 6 | 101 | 101 | 87 | 101 | 101 | 101 | NaN | 92 | 43.79 |
97 | 98 | University of Bristol | United Kingdom | 7 | 101 | 101 | 78 | 75 | 81 | 86 | NaN | 101 | 43.77 |
98 | 99 | Utrecht University | Netherlands | 2 | 100 | 101 | 101 | 65 | 101 | 60 | NaN | 101 | 43.47 |
99 | 100 | Mines ParisTech | France | 5 | 44 | 4 | 101 | 101 | 101 | 101 | NaN | 101 | 43.36 |
100 rows × 13 columns
data_2012.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   world_rank            100 non-null    int64
 1   institution           100 non-null    object
 2   country               100 non-null    object
 3   national_rank         100 non-null    int64
 4   quality_of_education  100 non-null    int64
 5   alumni_employment     100 non-null    int64
 6   quality_of_faculty    100 non-null    int64
 7   publications          100 non-null    int64
 8   influence             100 non-null    int64
 9   citations             100 non-null    int64
 10  broad_impact          0 non-null      float64
 11  patents               100 non-null    int64
 12  score                 100 non-null    float64
dtypes: float64(2), int64(9), object(2)
memory usage: 10.9+ KB
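The output above shows that the broad_impact column is entirely missing for 2012, so it can be dropped in the later analysis; the other columns have no missing values.
Next, look at how the top 100 universities of 2012 are distributed by country.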
data_2012['country'].value_counts()
USA 58
United Kingdom 8
Japan 5
France 5
Israel 4
Switzerland 4
Canada 3
Germany 3
Australia 2
Netherlands 2
Italy 1
Norway 1
Denmark 1
South Korea 1
Finland 1
Sweden 1
Name: country, dtype: int64
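In the 2012 top-100 list, US universities take 58 places, far ahead of the United Kingdom in second place (8 universities). No university from China appears in the list; it is unclear whether this is because Chinese universities were not covered by the ranking that year.
Drop the broad_impact column to simplify the later analysis.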
data_2012_copy = data_2012.copy()
data_2012_copy.drop('broad_impact', axis=1, inplace=True)
data_2012 = data_2012_copy
data_2012
| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | patents | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Harvard University | USA | 1 | 7 | 9 | 1 | 1 | 1 | 1 | 5 | 100.00 |
1 | 2 | Massachusetts Institute of Technology | USA | 2 | 9 | 17 | 3 | 12 | 4 | 4 | 1 | 91.67 |
2 | 3 | Stanford University | USA | 3 | 17 | 11 | 5 | 4 | 2 | 2 | 15 | 89.50 |
3 | 4 | University of Cambridge | United Kingdom | 1 | 10 | 24 | 4 | 16 | 16 | 11 | 50 | 86.17 |
4 | 5 | California Institute of Technology | USA | 4 | 2 | 29 | 7 | 37 | 22 | 22 | 18 | 85.21 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | 96 | University of Texas MD Anderson Cancer Center | USA | 58 | 101 | 101 | 101 | 95 | 46 | 66 | 100 | 43.88 |
96 | 97 | University of Nottingham | United Kingdom | 6 | 101 | 101 | 87 | 101 | 101 | 101 | 92 | 43.79 |
97 | 98 | University of Bristol | United Kingdom | 7 | 101 | 101 | 78 | 75 | 81 | 86 | 101 | 43.77 |
98 | 99 | Utrecht University | Netherlands | 2 | 100 | 101 | 101 | 65 | 101 | 60 | 101 | 43.47 |
99 | 100 | Mines ParisTech | France | 5 | 44 | 4 | 101 | 101 | 101 | 101 | 101 | 43.36 |
100 rows × 12 columns
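As a side note, the copy-then-drop-in-place pattern is not strictly needed, because `drop` returns a new DataFrame by default. A minimal sketch, assuming a small stand-in frame named `toy` (hypothetical values):

```python
import pandas as pd

toy = pd.DataFrame({
    "world_rank": [1, 2],
    "broad_impact": [float("nan"), float("nan")],
    "score": [100.0, 91.67],
})

# drop() returns a new DataFrame and leaves the original untouched.
trimmed = toy.drop(columns=["broad_impact"])

# Alternative: drop every column whose values are all missing.
trimmed_auto = toy.dropna(axis="columns", how="all")

print(list(trimmed.columns))
print(list(toy.columns))
```

The second form avoids naming the column at all, which may be convenient when the same step is repeated for several years.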
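Analyzing how national_rank and score relate to world_rank is not very meaningful (score is what world_rank is determined from, and national_rank is a within-country ranking), so drop these two columns.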
data_2012_world_rank = data_2012.copy()
data_2012_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2012_world_rank
| | world_rank | institution | country | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | patents |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Harvard University | USA | 7 | 9 | 1 | 1 | 1 | 1 | 5 |
1 | 2 | Massachusetts Institute of Technology | USA | 9 | 17 | 3 | 12 | 4 | 4 | 1 |
2 | 3 | Stanford University | USA | 17 | 11 | 5 | 4 | 2 | 2 | 15 |
3 | 4 | University of Cambridge | United Kingdom | 10 | 24 | 4 | 16 | 16 | 11 | 50 |
4 | 5 | California Institute of Technology | USA | 2 | 29 | 7 | 37 | 22 | 22 | 18 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | 96 | University of Texas MD Anderson Cancer Center | USA | 101 | 101 | 101 | 95 | 46 | 66 | 100 |
96 | 97 | University of Nottingham | United Kingdom | 101 | 101 | 87 | 101 | 101 | 101 | 92 |
97 | 98 | University of Bristol | United Kingdom | 101 | 101 | 78 | 75 | 81 | 86 | 101 |
98 | 99 | Utrecht University | Netherlands | 100 | 101 | 101 | 65 | 101 | 60 | 101 |
99 | 100 | Mines ParisTech | France | 44 | 4 | 101 | 101 | 101 | 101 | 101 |
100 rows × 10 columns
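Use matplotlib and seaborn to draw a heatmap of the correlations between the variables. Only the numeric columns are passed to corr(), since recent pandas versions raise an error when text columns such as institution are included.
plt.figure(figsize=(10, 10))
sns.heatmap(data_2012_world_rank.select_dtypes('number').corr(), annot=True)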
<AxesSubplot:>
In the 2012 list, the factors most strongly associated with World Rank are Quality of Faculty, Influence, and Citations.
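Reading the strongest factors off a heatmap by eye is error-prone, so it can be useful to sort the correlations with world_rank directly. The sketch below uses a made-up six-row frame (`toy` and its values are assumptions for illustration); on the real data the same chain could be applied to data_2012_world_rank.

```python
import pandas as pd

toy = pd.DataFrame({
    "world_rank": [1, 2, 3, 4, 5, 6],
    "institution": ["A", "B", "C", "D", "E", "F"],
    "quality_of_faculty": [1, 3, 2, 4, 6, 5],
    "patents": [5, 1, 6, 2, 4, 3],
})

# Keep numeric columns only, then take the world_rank column of the
# correlation matrix and sort it from strongest positive downwards.
numeric = toy.select_dtypes("number")
corr_with_rank = (
    numeric.corr()["world_rank"]
    .drop("world_rank")
    .sort_values(ascending=False)
)
print(corr_with_rank)
```

A high coefficient here only indicates that the two rankings move together; it does not by itself show how the ranking body weights each indicator.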
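Analysis of the 2013 data
The 2013 list has nearly the same structure as the 2012 list and the analysis follows the same steps, so the details are not repeated.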
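Since the same steps are repeated for each year, they could also be wrapped in a small helper and applied per year with groupby. This is only a sketch: `summarize_year` is a hypothetical helper, and `toy` is a made-up frame standing in for `df`.

```python
import pandas as pd

toy = pd.DataFrame({
    "year": [2012, 2012, 2012, 2013, 2013, 2013],
    "country": ["USA", "USA", "Japan", "USA", "Russia", "Japan"],
    "world_rank": [1, 2, 3, 1, 2, 3],
})

def summarize_year(frame):
    """Return the number of listed universities per country."""
    return frame["country"].value_counts()

# One summary per year, keyed by the year value.
summaries = {
    year: summarize_year(group.drop(columns="year"))
    for year, group in toy.groupby("year")
}
print(summaries[2013])
```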
data_2013 = df[df['year'] == 2013].drop('year', axis=1)
data_2013
| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
100 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NaN | 7 | 100.00 |
101 | 2 | Stanford University | USA | 2 | 11 | 2 | 4 | 6 | 2 | 2 | NaN | 11 | 93.94 |
102 | 3 | University of Oxford | United Kingdom | 1 | 7 | 12 | 10 | 11 | 7 | 13 | NaN | 15 | 92.54 |
103 | 4 | Massachusetts Institute of Technology | USA | 3 | 2 | 16 | 2 | 16 | 3 | 3 | NaN | 1 | 91.45 |
104 | 5 | University of Cambridge | United Kingdom | 2 | 3 | 15 | 5 | 9 | 11 | 10 | NaN | 39 | 90.24 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
195 | 96 | Australian National University | Australia | 2 | 101 | 101 | 43 | 101 | 101 | 101 | NaN | 101 | 44.50 |
196 | 97 | University of Alberta | Canada | 4 | 101 | 101 | 101 | 68 | 101 | 92 | NaN | 81 | 44.50 |
197 | 98 | University of Helsinki | Finland | 1 | 69 | 101 | 81 | 74 | 79 | 71 | NaN | 101 | 44.39 |
198 | 99 | Paris Diderot University - Paris 7 | France | 5 | 28 | 101 | 72 | 101 | 87 | 101 | NaN | 101 | 44.36 |
199 | 100 | Georgia Institute of Technology | USA | 57 | 101 | 85 | 101 | 97 | 101 | 43 | NaN | 32 | 44.26 |
100 rows × 13 columns
data_2013.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 100 to 199
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   world_rank            100 non-null    int64
 1   institution           100 non-null    object
 2   country               100 non-null    object
 3   national_rank         100 non-null    int64
 4   quality_of_education  100 non-null    int64
 5   alumni_employment     100 non-null    int64
 6   quality_of_faculty    100 non-null    int64
 7   publications          100 non-null    int64
 8   influence             100 non-null    int64
 9   citations             100 non-null    int64
 10  broad_impact          0 non-null      float64
 11  patents               100 non-null    int64
 12  score                 100 non-null    float64
dtypes: float64(2), int64(9), object(2)
memory usage: 10.9+ KB
data_2013['country'].value_counts()
USA 57
United Kingdom 7
Japan 6
France 5
Switzerland 4
Canada 4
Israel 4
Australia 2
Germany 2
South Korea 1
Russia 1
Denmark 1
Singapore 1
Netherlands 1
Finland 1
Norway 1
Sweden 1
Italy 1
Name: country, dtype: int64
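In 2013 the top 100 universities come from 18 countries and regions. Singapore and Russia, neither of which had a university in the 2012 top 100, each have one in the 2013 list. Again, no university from China appears in the list.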
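The newcomers can be found with a plain set difference rather than by comparing the two printed lists by eye. The sketch below types in the country names from the two value_counts outputs above; in the notebook, `set(data_2012['country'])` and `set(data_2013['country'])` would presumably give the same sets.

```python
# Countries in the 2012 and 2013 top-100 lists, copied from the outputs above.
countries_2012 = {
    "USA", "United Kingdom", "Japan", "France", "Israel", "Switzerland",
    "Canada", "Germany", "Australia", "Netherlands", "Italy", "Norway",
    "Denmark", "South Korea", "Finland", "Sweden",
}
countries_2013 = countries_2012 | {"Russia", "Singapore"}

# Present in 2013 but not in 2012, and the other way round.
newcomers = countries_2013 - countries_2012
dropped = countries_2012 - countries_2013

print(sorted(newcomers))
print(len(countries_2013))
```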
data_2013_copy = data_2013.copy()
data_2013_copy.drop('broad_impact', axis=1, inplace=True)
data_2013 = data_2013_copy
data_2013
| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | patents | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|
100 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 7 | 100.00 |
101 | 2 | Stanford University | USA | 2 | 11 | 2 | 4 | 6 | 2 | 2 | 11 | 93.94 |
102 | 3 | University of Oxford | United Kingdom | 1 | 7 | 12 | 10 | 11 | 7 | 13 | 15 | 92.54 |
103 | 4 | Massachusetts Institute of Technology | USA | 3 | 2 | 16 | 2 | 16 | 3 | 3 | 1 | 91.45 |
104 | 5 | University of Cambridge | United Kingdom | 2 | 3 | 15 | 5 | 9 | 11 | 10 | 39 | 90.24 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
195 | 96 | Australian National University | Australia | 2 | 101 | 101 | 43 | 101 | 101 | 101 | 101 | 44.50 |
196 | 97 | University of Alberta | Canada | 4 | 101 | 101 | 101 | 68 | 101 | 92 | 81 | 44.50 |
197 | 98 | University of Helsinki | Finland | 1 | 69 | 101 | 81 | 74 | 79 | 71 | 101 | 44.39 |
198 | 99 | Paris Diderot University - Paris 7 | France | 5 | 28 | 101 | 72 | 101 | 87 | 101 | 101 | 44.36 |
199 | 100 | Georgia Institute of Technology | USA | 57 | 101 | 85 | 101 | 97 | 101 | 43 | 32 | 44.26 |
100 rows × 12 columns
data_2013_world_rank = data_2013.copy()
data_2013_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2013_world_rank
| | world_rank | institution | country | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | patents |
---|---|---|---|---|---|---|---|---|---|---|
100 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 7 |
101 | 2 | Stanford University | USA | 11 | 2 | 4 | 6 | 2 | 2 | 11 |
102 | 3 | University of Oxford | United Kingdom | 7 | 12 | 10 | 11 | 7 | 13 | 15 |
103 | 4 | Massachusetts Institute of Technology | USA | 2 | 16 | 2 | 16 | 3 | 3 | 1 |
104 | 5 | University of Cambridge | United Kingdom | 3 | 15 | 5 | 9 | 11 | 10 | 39 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
195 | 96 | Australian National University | Australia | 101 | 101 | 43 | 101 | 101 | 101 | 101 |
196 | 97 | University of Alberta | Canada | 101 | 101 | 101 | 68 | 101 | 92 | 81 |
197 | 98 | University of Helsinki | Finland | 69 | 101 | 81 | 74 | 79 | 71 | 101 |
198 | 99 | Paris Diderot University - Paris 7 | France | 28 | 101 | 72 | 101 | 87 | 101 | 101 |
199 | 100 | Georgia Institute of Technology | USA | 101 | 85 | 101 | 97 | 101 | 43 | 32 |
100 rows × 10 columns
plt.figure(figsize=(10, 10))
sns.heatmap(data_2013_world_rank.select_dtypes('number').corr(), annot=True)
<AxesSubplot:>
In the 2013 list, the factors most strongly associated with World Rank are Quality of Faculty, Influence, Citations, and Publications, broadly consistent with the 2012 list.
Analysis of the 2014 data
As before, start by selecting the 2014 rows.
data_2014 = df[df['year'] == 2014].drop('year', axis=1)
data_2014
| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
200 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1.0 | 2 | 100.00 |
201 | 2 | Stanford University | USA | 2 | 11 | 2 | 4 | 5 | 3 | 3 | 4.0 | 6 | 99.09 |
202 | 3 | Massachusetts Institute of Technology | USA | 3 | 3 | 11 | 2 | 15 | 2 | 2 | 2.0 | 1 | 98.69 |
203 | 4 | University of Cambridge | United Kingdom | 1 | 2 | 10 | 5 | 10 | 9 | 12 | 13.0 | 48 | 97.64 |
204 | 5 | University of Oxford | United Kingdom | 2 | 7 | 12 | 10 | 11 | 12 | 11 | 12.0 | 16 | 97.51 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1195 | 996 | National Dong Hwa University | Taiwan | 24 | 355 | 478 | 210 | 901 | 934 | 800 | 989.0 | 737 | 44.24 |
1196 | 997 | National Taipei University of Technology | Taiwan | 25 | 355 | 478 | 210 | 867 | 987 | 800 | 994.0 | 737 | 44.24 |
1197 | 998 | Shaanxi Normal University | China | 82 | 355 | 478 | 210 | 956 | 965 | 800 | 994.0 | 737 | 44.23 |
1198 | 999 | National University of Defense Technology | China | 83 | 355 | 478 | 210 | 860 | 973 | 800 | 999.0 | 637 | 44.21 |
1199 | 1000 | Yanbian University | China | 84 | 355 | 478 | 210 | 890 | 790 | 800 | 1000.0 | 737 | 44.18 |
1000 rows × 13 columns
data_2014.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 200 to 1199
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   world_rank            1000 non-null   int64
 1   institution           1000 non-null   object
 2   country               1000 non-null   object
 3   national_rank         1000 non-null   int64
 4   quality_of_education  1000 non-null   int64
 5   alumni_employment     1000 non-null   int64
 6   quality_of_faculty    1000 non-null   int64
 7   publications          1000 non-null   int64
 8   influence             1000 non-null   int64
 9   citations             1000 non-null   int64
 10  broad_impact          1000 non-null   float64
 11  patents               1000 non-null   int64
 12  score                 1000 non-null   float64
dtypes: float64(2), int64(9), object(2)
memory usage: 109.4+ KB
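Compared with the previous two years, the 2014 list grows from 100 to 1,000 ranked universities, and the broad_impact column no longer has missing values. It remains to be seen whether the addition of this column changes which factors are most associated with world rank.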
data_2014['country'].value_counts()
USA 229
China 84
Japan 74
United Kingdom 64
Germany 55
France 50
Italy 47
Spain 41
South Korea 34
Canada 32
Australia 27
Taiwan 25
Brazil 18
India 15
Netherlands 13
Austria 12
Sweden 11
Belgium 10
Turkey 10
Finland 9
Poland 9
Switzerland 9
Ireland 8
Iran 8
Greece 7
Portugal 7
Israel 7
New Zealand 6
Hong Kong 6
Hungary 6
Denmark 5
South Africa 5
Norway 5
Czech Republic 5
Chile 4
Argentina 4
Egypt 4
Saudi Arabia 4
Thailand 3
Russia 3
Malaysia 3
Slovenia 2
Singapore 2
Mexico 2
Colombia 2
Estonia 1
Cyprus 1
United Arab Emirates 1
Uganda 1
Lebanon 1
Romania 1
Croatia 1
Serbia 1
Lithuania 1
Bulgaria 1
Uruguay 1
Slovak Republic 1
Iceland 1
Puerto Rico 1
Name: country, dtype: int64
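In 2014, US universities still account for a large share of the top 1,000 (22.9%). China (mainland) is second with 84 universities, followed by Japan, the United Kingdom, and Germany in third to fifth place. Hong Kong, China also has 6 universities in the list.
Next, let us see which universities from China (mainland) made the 2014 list and where they rank worldwide.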
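The 22.9% figure can be obtained directly with `value_counts(normalize=True)` instead of dividing by hand. A minimal sketch, assuming a stand-in Series in which every country other than the top three is lumped under a hypothetical "Other" label:

```python
import pandas as pd

# Stand-in for data_2014['country']: the top three counts come from the
# output above; the remaining 613 rows are lumped together as "Other".
countries = pd.Series(
    ["USA"] * 229 + ["China"] * 84 + ["Japan"] * 74 + ["Other"] * 613,
    name="country",
)

# Share of each country in percent, rounded to one decimal place.
share = countries.value_counts(normalize=True).mul(100).round(1)
print(share)
```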
data_2014_China = data_2014[data_2014['country'] == 'China']
data_2014_China
| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
254 | 55 | Peking University | China | 1 | 355 | 35 | 210 | 65 | 155 | 250 | 155.0 | 7 | 55.30 |
286 | 87 | Tsinghua University | China | 2 | 294 | 63 | 210 | 79 | 192 | 134 | 162.0 | 16 | 52.60 |
388 | 189 | Fudan University | China | 3 | 355 | 126 | 210 | 120 | 264 | 310 | 230.0 | 100 | 48.14 |
394 | 195 | Shanghai Jiao Tong University | China | 4 | 325 | 149 | 210 | 102 | 250 | 250 | 234.0 | 138 | 48.02 |
405 | 206 | Zhejiang University | China | 5 | 355 | 293 | 210 | 86 | 318 | 493 | 290.0 | 94 | 47.76 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1189 | 990 | Harbin Medical University | China | 80 | 355 | 478 | 210 | 900 | 862 | 800 | 979.0 | 737 | 44.26 |
1190 | 991 | Zhejiang Normal University | China | 81 | 355 | 478 | 210 | 905 | 932 | 800 | 979.0 | 737 | 44.26 |
1197 | 998 | Shaanxi Normal University | China | 82 | 355 | 478 | 210 | 956 | 965 | 800 | 994.0 | 737 | 44.23 |
1198 | 999 | National University of Defense Technology | China | 83 | 355 | 478 | 210 | 860 | 973 | 800 | 999.0 | 637 | 44.21 |
1199 | 1000 | Yanbian University | China | 84 | 355 | 478 | 210 | 890 | 790 | 800 | 1000.0 | 737 | 44.18 |
84 rows × 13 columns
for rank, name in zip(data_2014_China['world_rank'], data_2014_China['institution']):
    print('{:4d}: {}'.format(rank, name))
  55: Peking University
  87: Tsinghua University
 189: Fudan University
 195: Shanghai Jiao Tong University
 206: Zhejiang University
 217: Nanjing University
 242: Dalian University of Technology
 270: University of Science and Technology of China
 292: Sun Yat-sen University
 349: Nankai University
 354: Xiamen University
 370: Tongji University
 385: Beijing Normal University
 394: Tianjin University
 399: Xi'an Jiaotong University
 408: Huazhong University of Science and Technology
 412: Southeast University
 435: Jilin University
 438: Wuhan University
 457: Shandong University
 465: Central South University
 475: Harbin Institute of Technology
 498: South China University of Technology
 515: Sichuan University
 528: East China University of Science and Technology
 560: Beihang University
 571: Lanzhou University
 588: Hunan University
 599: University of Science and Technology Beijing
 610: Central China Normal University
 669: Northeast Normal University
 672: East China Normal University
 685: Beijing University of Chemical Technology
 700: China Agricultural University
 710: Nanjing University of Science and Technology
 713: Soochow University (Suzhou)
 715: Fuzhou University
 722: Shanghai University
 731: Donghua University
 751: Wuhan University of Technology
 761: Peking Union Medical College
 782: Second Military Medical University
 790: University of Electronic Science and Technology of China
 815: Chongqing University
 816: Beijing Institute of Technology
 818: Zhengzhou University
 824: Capital Medical University
 842: Nanjing Normal University
 846: Jinan University
 865: Nanjing University of Technology
 870: Ocean University of China
 877: Jiangnan University
 883: Huazhong Agricultural University
 886: Shanghai Normal University
 896: Nanjing Agricultural University
 907: Northeastern University (China)
 912: Northwest University (China)
 917: Nanjing Medical University
 921: Southwest University
 932: Fourth Military Medical University
 935: Nanjing University of Aeronautics and Astronautics
 941: Jiangsu University
 942: Third Military Medical University
 943: South China Normal University
 945: Xiangtan University
 946: Yangzhou University
 947: Northwestern Polytechnical University
 955: Southern Medical University
 960: Beijing Jiaotong University
 968: Hunan Normal University
 973: South China Agricultural University
 975: China Pharmaceutical University
 980: Xidian University
 981: Zhejiang University of Technology
 982: China Medical University (PRC)
 985: Beijing University of Technology
 986: Guangxi University
 987: Northwest A&F University
 988: Shanxi University
 990: Harbin Medical University
 991: Zhejiang Normal University
 998: Shaanxi Normal University
 999: National University of Defense Technology
1000: Yanbian University
plt.figure(figsize=(10, 10))
plt.hist(data_2014_China['world_rank'].tolist(), 10)
plt.xlabel('World Rank (2014)')
plt.ylabel('Count')
plt.show()
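As the histogram shows, the 84 universities from China (mainland) in the list are concentrated toward the lower end: 41 of them rank between 800 and 1000, while only 4 are in the top 200.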
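The histogram can be backed up with exact counts per rank band using `pd.cut`. The sketch below types in the 84 world ranks printed above; the band edges and labels are a choice made here for illustration, and in the notebook `data_2014_China['world_rank']` could be passed instead of the literal list.

```python
import pandas as pd

# World ranks of the 84 universities from China (mainland) in the 2014 list,
# copied from the printed output above.
ranks_2014 = pd.Series([
    55, 87, 189, 195, 206, 217, 242, 270, 292, 349, 354, 370, 385, 394,
    399, 408, 412, 435, 438, 457, 465, 475, 498, 515, 528, 560, 571, 588,
    599, 610, 669, 672, 685, 700, 710, 713, 715, 722, 731, 751, 761, 782,
    790, 815, 816, 818, 824, 842, 846, 865, 870, 877, 883, 886, 896, 907,
    912, 917, 921, 932, 935, 941, 942, 943, 945, 946, 947, 955, 960, 968,
    973, 975, 980, 981, 982, 985, 986, 987, 988, 990, 991, 998, 999, 1000,
])

# Assign each rank to a band of 200 places (right edge inclusive).
bands = pd.cut(
    ranks_2014,
    bins=[0, 200, 400, 600, 800, 1000],
    labels=["1-200", "201-400", "401-600", "601-800", "801-1000"],
)
band_counts = bands.value_counts(sort=False)
print(band_counts)
```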
data_2014_world_rank = data_2014.copy()
data_2014_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2014_world_rank
| | world_rank | institution | country | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents |
---|---|---|---|---|---|---|---|---|---|---|---|
200 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1.0 | 2 |
201 | 2 | Stanford University | USA | 11 | 2 | 4 | 5 | 3 | 3 | 4.0 | 6 |
202 | 3 | Massachusetts Institute of Technology | USA | 3 | 11 | 2 | 15 | 2 | 2 | 2.0 | 1 |
203 | 4 | University of Cambridge | United Kingdom | 2 | 10 | 5 | 10 | 9 | 12 | 13.0 | 48 |
204 | 5 | University of Oxford | United Kingdom | 7 | 12 | 10 | 11 | 12 | 11 | 12.0 | 16 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1195 | 996 | National Dong Hwa University | Taiwan | 355 | 478 | 210 | 901 | 934 | 800 | 989.0 | 737 |
1196 | 997 | National Taipei University of Technology | Taiwan | 355 | 478 | 210 | 867 | 987 | 800 | 994.0 | 737 |
1197 | 998 | Shaanxi Normal University | China | 355 | 478 | 210 | 956 | 965 | 800 | 994.0 | 737 |
1198 | 999 | National University of Defense Technology | China | 355 | 478 | 210 | 860 | 973 | 800 | 999.0 | 637 |
1199 | 1000 | Yanbian University | China | 355 | 478 | 210 | 890 | 790 | 800 | 1000.0 | 737 |
1000 rows × 11 columns
plt.figure(figsize=(10, 10))
sns.heatmap(data_2014_world_rank.select_dtypes('number').corr(), annot=True)
<AxesSubplot:>
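In the 2014 list, the factors most strongly associated with World Rank are Broad Impact, Publications, and Influence. With the broad_impact column now populated, the leading factors have changed, and the correlations of the other indicators also differ from the previous two years. This may also be related to the sample growing from 100 to 1,000 universities.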
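One way to probe the sample-size explanation would be to recompute the correlations on the top 100 rows only and compare them with the full 1,000. The sketch below uses synthetic data, not the real ranking: `publications` is a hypothetical indicator that tracks rank with noise, and `quality_of_faculty` is a hypothetical indicator capped at a tie value (the real 2014 column has many rows at 210). It only illustrates how the range of ranks and capped values can shift correlations.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
world_rank = np.arange(1, 1001)

# Synthetic stand-in for a 1000-row yearly table (not real data).
toy = pd.DataFrame({
    "world_rank": world_rank,
    # Tracks rank over the whole range, with fairly large noise.
    "publications": world_rank + rng.normal(0, 100, size=1000),
    # Tracks rank closely near the top, then saturates at 210.
    "quality_of_faculty": np.minimum(
        world_rank + rng.normal(0, 20, size=1000), 210
    ),
})

def corr_with_rank(frame):
    """Correlation of every numeric column with world_rank."""
    return frame.corr()["world_rank"].drop("world_rank")

comparison = pd.DataFrame({
    "top_100": corr_with_rank(toy[toy["world_rank"] <= 100]),
    "all_1000": corr_with_rank(toy),
})
print(comparison.round(2))
```

On the real data, the same comparison could be made with `data_2014_world_rank[data_2014_world_rank['world_rank'] <= 100]`, keeping numeric columns only.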
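Analysis of the 2015 data
The 2015 data has a similar structure to 2014: 1,000 rows and no missing values. The analysis follows essentially the same steps as for 2014, so the details are not repeated.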
data_2015 = df[df['year'] == 2015].drop('year', axis=1)
data_2015
| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1200 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1.0 | 3 | 100.00 |
1201 | 2 | Stanford University | USA | 2 | 9 | 2 | 4 | 5 | 3 | 3 | 4.0 | 10 | 98.66 |
1202 | 3 | Massachusetts Institute of Technology | USA | 3 | 3 | 11 | 2 | 15 | 2 | 2 | 2.0 | 1 | 97.54 |
1203 | 4 | University of Cambridge | United Kingdom | 1 | 2 | 10 | 5 | 11 | 6 | 12 | 13.0 | 48 | 96.81 |
1204 | 5 | University of Oxford | United Kingdom | 2 | 7 | 13 | 10 | 7 | 12 | 7 | 9.0 | 15 | 96.46 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2195 | 996 | University of the Algarve | Portugal | 7 | 367 | 567 | 218 | 926 | 845 | 812 | 969.0 | 816 | 44.03 |
2196 | 997 | Alexandria University | Egypt | 4 | 236 | 566 | 218 | 997 | 908 | 645 | 981.0 | 871 | 44.03 |
2197 | 998 | Federal University of Ceará | Brazil | 18 | 367 | 549 | 218 | 830 | 823 | 812 | 975.0 | 824 | 44.03 |
2198 | 999 | University of A Coruña | Spain | 40 | 367 | 567 | 218 | 886 | 974 | 812 | 975.0 | 651 | 44.02 |
2199 | 1000 | China Pharmaceutical University | China | 83 | 367 | 567 | 218 | 861 | 991 | 812 | 981.0 | 547 | 44.02 |
1000 rows × 13 columns
data_2015.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 1200 to 2199
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   world_rank            1000 non-null   int64
 1   institution           1000 non-null   object
 2   country               1000 non-null   object
 3   national_rank         1000 non-null   int64
 4   quality_of_education  1000 non-null   int64
 5   alumni_employment     1000 non-null   int64
 6   quality_of_faculty    1000 non-null   int64
 7   publications          1000 non-null   int64
 8   influence             1000 non-null   int64
 9   citations             1000 non-null   int64
 10  broad_impact          1000 non-null   float64
 11  patents               1000 non-null   int64
 12  score                 1000 non-null   float64
dtypes: float64(2), int64(9), object(2)
memory usage: 109.4+ KB
data_2015['country'].value_counts()
USA 229
China 83
Japan 74
United Kingdom 65
Germany 55
France 49
Italy 47
Spain 40
South Korea 36
Canada 33
Australia 27
Taiwan 21
Brazil 18
India 16
Netherlands 13
Austria 12
Sweden 11
Belgium 10
Turkey 10
Poland 9
Switzerland 9
Finland 9
Iran 8
Ireland 8
Portugal 7
Greece 7
Israel 7
Hong Kong 6
New Zealand 6
Hungary 6
South Africa 5
Czech Republic 5
Denmark 5
Russia 5
Norway 5
Egypt 4
Saudi Arabia 4
Chile 4
Thailand 3
Argentina 3
Malaysia 3
Romania 2
Slovenia 2
Mexico 2
Colombia 2
Singapore 2
Cyprus 1
Estonia 1
United Arab Emirates 1
Uganda 1
Lebanon 1
Croatia 1
Serbia 1
Lithuania 1
Bulgaria 1
Uruguay 1
Slovak Republic 1
Iceland 1
Puerto Rico 1
Name: country, dtype: int64
The top five are still the USA, China (mainland), Japan, the United Kingdom, and Germany, the same as in 2014.
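To see which countries gained or lost places between the two years, the two value_counts results can be subtracted. The sketch below types in a handful of counts from the outputs above; in the notebook the two Series would presumably come from `data_2014['country'].value_counts()` and `data_2015['country'].value_counts()`.

```python
import pandas as pd

# A few per-country counts copied from the 2014 and 2015 outputs above.
counts_2014 = pd.Series(
    {"USA": 229, "China": 84, "Taiwan": 25, "Russia": 3, "Romania": 1}
)
counts_2015 = pd.Series(
    {"USA": 229, "China": 83, "Taiwan": 21, "Russia": 5, "Romania": 2}
)

# fill_value=0 treats a country that is absent in one year as having 0 entries.
change = counts_2015.sub(counts_2014, fill_value=0).astype(int)

# Keep only the countries whose count actually changed.
changed = change[change != 0].sort_values()
print(changed)
```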
data_2015_China = data_2015[data_2015['country'] == 'China']
data_2015_China
| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1255 | 56 | Peking University | China | 1 | 182 | 38 | 218 | 52 | 131 | 182 | 125.0 | 20 | 54.26 |
1277 | 78 | Tsinghua University | China | 2 | 309 | 73 | 218 | 63 | 158 | 65 | 156.0 | 30 | 52.21 |
1379 | 180 | Shanghai Jiao Tong University | China | 3 | 335 | 143 | 218 | 81 | 267 | 212 | 209.0 | 198 | 47.96 |
1390 | 191 | Zhejiang University | China | 4 | 367 | 309 | 218 | 71 | 290 | 368 | 265.0 | 106 | 47.66 |
1394 | 195 | Fudan University | China | 5 | 367 | 203 | 218 | 112 | 252 | 368 | 204.0 | 123 | 47.56 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2183 | 984 | Zhejiang University of Technology | China | 79 | 367 | 567 | 218 | 858 | 991 | 812 | 958.0 | 672 | 44.04 |
2189 | 990 | Henan Normal University | China | 80 | 367 | 567 | 218 | 959 | 991 | 812 | 958.0 | 871 | 44.04 |
2190 | 991 | Xidian University | China | 81 | 367 | 542 | 218 | 830 | 974 | 812 | 984.0 | 434 | 44.03 |
2192 | 993 | Southwest Jiaotong University | China | 82 | 367 | 327 | 218 | 937 | 962 | 812 | 998.0 | 861 | 44.03 |
2199 | 1000 | China Pharmaceutical University | China | 83 | 367 | 567 | 218 | 861 | 991 | 812 | 981.0 | 547 | 44.02 |
83 rows × 13 columns
for rank, name in zip(data_2015_China['world_rank'], data_2015_China['institution']):
    print('{:4d}: {}'.format(rank, name))
  56: Peking University
  78: Tsinghua University
 180: Shanghai Jiao Tong University
 191: Zhejiang University
 195: Fudan University
 239: University of Science and Technology of China
 244: Nanjing University
 277: Sun Yat-sen University
 305: Dalian University of Technology
 313: Xiamen University
 315: Nankai University
 331: Tianjin University
 389: Xi'an Jiaotong University
 391: Jilin University
 396: Beijing Normal University
 400: Huazhong University of Science and Technology
 410: Harbin Institute of Technology
 413: Shandong University
 415: Wuhan University
 420: Tongji University
 423: Peking Union Medical College
 438: Central South University
 439: South China University of Technology
 443: Southeast University
 477: East China University of Science and Technology
 479: Sichuan University
 496: Renmin University of China
 514: Lanzhou University
 530: University of Science and Technology Beijing
 553: Hunan University
 572: Central China Normal University
 586: East China Normal University
 615: Soochow University (Suzhou)
 627: China Agricultural University
 640: Northeast Normal University
 642: Beijing University of Chemical Technology
 659: Donghua University
 677: Wuhan University of Technology
 684: Shanghai University
 705: Beihang University
 706: Fuzhou University
 749: Beijing Institute of Technology
 774: Chongqing University
 776: Second Military Medical University
 783: University of Electronic Science and Technology of China
 788: Capital Medical University
 795: Zhengzhou University
 804: Nanjing University of Science and Technology
 810: Nanjing University of Technology
 828: Jiangnan University
 846: Ocean University of China
 858: Huazhong Agricultural University
 864: Nanjing Normal University
 866: Nanjing Agricultural University
 871: Northeastern University (China)
 877: Northwest University (China)
 883: Hefei University of Technology
 886: Nanjing Medical University
 888: Shanghai Normal University
 889: Southwest University
 895: Nanjing University of Aeronautics and Astronautics
 899: Fourth Military Medical University
 904: China University of Geosciences (Wuhan)
 924: South China Normal University
 943: Yangzhou University
 947: Jiangsu University
 948: Northwestern Polytechnical University
 949: Xiangtan University
 953: Harbin Engineering University
 955: Jinan University
 957: Third Military Medical University
 963: Southern Medical University
 969: South China Agricultural University
 972: Hunan Normal University
 975: Shenzhen University
 976: Tianjin Medical University
 977: Beijing University of Technology
 982: China Medical University (PRC)
 984: Zhejiang University of Technology
 990: Henan Normal University
 991: Xidian University
 993: Southwest Jiaotong University
1000: China Pharmaceutical University
plt.figure(figsize=(10, 10))
plt.hist(data_2015_China['world_rank'].tolist(), 10)
plt.xlabel('World Rank (2015)')
plt.ylabel('Count')
plt.show()
plt.figure(figsize=(10, 10))
sns.histplot(data_2014_China['world_rank'].tolist(), color='blue', label='2014')
sns.histplot(data_2015_China['world_rank'].tolist(), color='green', label='2015')
plt.legend()
plt.show()
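The 2015 list contains 83 universities from China (mainland), one fewer than in 2014. Judging from the charts above, the largest group still sits between 800 and 1000 (36 of 83, down from 41 of 84 in 2014), and 5 universities are now in the top 200 (up from 4), so overall the rankings of universities from China (mainland) moved up slightly compared with 2014.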
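The overall trend can also be checked per institution by joining the two years on the institution name and taking the difference in rank. The sketch below uses four universities whose ranks appear in the tables above; the column name `change` and the sign convention (positive means the university moved up) are choices made here for illustration.

```python
import pandas as pd

# World ranks copied from the 2014 and 2015 tables above.
y2014 = pd.DataFrame({
    "institution": ["Peking University", "Tsinghua University",
                    "Fudan University", "Zhejiang University"],
    "world_rank": [55, 87, 189, 206],
})
y2015 = pd.DataFrame({
    "institution": ["Peking University", "Tsinghua University",
                    "Fudan University", "Zhejiang University"],
    "world_rank": [56, 78, 195, 191],
})

# Inner join keeps only institutions that are listed in both years.
merged = y2014.merge(y2015, on="institution", suffixes=("_2014", "_2015"))

# Positive change = smaller rank number in 2015 = moved up.
merged["change"] = merged["world_rank_2014"] - merged["world_rank_2015"]
print(merged.sort_values("change", ascending=False))
```

In the notebook, data_2014_China and data_2015_China could be merged the same way; institutions present in only one year would drop out of the inner join.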
data_2015_world_rank = data_2015.copy()
data_2015_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2015_world_rank
| | world_rank | institution | country | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents |
---|---|---|---|---|---|---|---|---|---|---|---|
1200 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1.0 | 3 |
1201 | 2 | Stanford University | USA | 9 | 2 | 4 | 5 | 3 | 3 | 4.0 | 10 |
1202 | 3 | Massachusetts Institute of Technology | USA | 3 | 11 | 2 | 15 | 2 | 2 | 2.0 | 1 |
1203 | 4 | University of Cambridge | United Kingdom | 2 | 10 | 5 | 11 | 6 | 12 | 13.0 | 48 |
1204 | 5 | University of Oxford | United Kingdom | 7 | 13 | 10 | 7 | 12 | 7 | 9.0 | 15 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2195 | 996 | University of the Algarve | Portugal | 367 | 567 | 218 | 926 | 845 | 812 | 969.0 | 816 |
2196 | 997 | Alexandria University | Egypt | 236 | 566 | 218 | 997 | 908 | 645 | 981.0 | 871 |
2197 | 998 | Federal University of Ceará | Brazil | 367 | 549 | 218 | 830 | 823 | 812 | 975.0 | 824 |
2198 | 999 | University of A Coruña | Spain | 367 | 567 | 218 | 886 | 974 | 812 | 975.0 | 651 |
2199 | 1000 | China Pharmaceutical University | China | 367 | 567 | 218 | 861 | 991 | 812 | 981.0 | 547 |
1000 rows × 11 columns
plt.figure(figsize=(10, 10))
sns.heatmap(data_2015_world_rank.select_dtypes('number').corr(), annot=True)
<AxesSubplot:>
In the 2015 list, the factors most strongly associated with World Rank are Broad Impact, Publications, and Influence, consistent with 2014.