[Kaggle Data Analysis Practice] World University Rankings

2024-01-09 21:50


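Dataset Overview

The data for this analysis is cwurData.csv from the World University Rankings dataset on Kaggle. The file has 2,201 rows (including the header row) and 14 columns. The official description of each column is as follows: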

  • world_rank: world rank for university
  • institution: name of university
  • country: country of each university
  • national_rank: rank of university within its country
  • quality_of_education: rank for quality of education
  • alumni_employment: rank for alumni employment
  • quality_of_faculty: rank for quality of faculty
  • publications: rank for publications
  • influence: rank for influence
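  • citations: rank for citations (the official description reads "number of students at the university", which appears to be a copy error, since the column holds rank values)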
  • broad_impact: rank for broad impact (only available for 2014 and 2015)
  • patents: rank for patents
  • score: total score, used for determining world rank
  • year: year of ranking (2012 to 2015)

Initial Exploration

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
df = pd.read_csv('./data/cwurData.csv', encoding='utf-8')
df.describe()
| | world_rank | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2200.000000 | 2200.000000 | 2200.000000 | 2200.000000 | 2200.000000 | 2200.000000 | 2200.000000 | 2200.000000 | 2000.000000 | 2200.000000 | 2200.000000 | 2200.000000 |
| mean | 459.590909 | 40.278182 | 275.100455 | 357.116818 | 178.888182 | 459.908636 | 459.797727 | 413.417273 | 496.699500 | 433.346364 | 47.798395 | 2014.318182 |
| std | 304.320363 | 51.740870 | 121.935100 | 186.779252 | 64.050885 | 303.760352 | 303.331822 | 264.366549 | 286.919755 | 273.996525 | 7.760806 | 0.762130 |
| min | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 43.360000 | 2012.000000 |
| 25% | 175.750000 | 6.000000 | 175.750000 | 175.750000 | 175.750000 | 175.750000 | 175.750000 | 161.000000 | 250.500000 | 170.750000 | 44.460000 | 2014.000000 |
| 50% | 450.500000 | 21.000000 | 355.000000 | 450.500000 | 210.000000 | 450.500000 | 450.500000 | 406.000000 | 496.000000 | 426.000000 | 45.100000 | 2014.000000 |
| 75% | 725.250000 | 49.000000 | 367.000000 | 478.000000 | 218.000000 | 725.000000 | 725.250000 | 645.000000 | 741.000000 | 714.250000 | 47.545000 | 2015.000000 |
| max | 1000.000000 | 229.000000 | 367.000000 | 567.000000 | 218.000000 | 1000.000000 | 991.000000 | 812.000000 | 1000.000000 | 871.000000 | 100.000000 | 2015.000000 |
df.head()
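| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Harvard University | USA | 1 | 7 | 9 | 1 | 1 | 1 | 1 | NaN | 5 | 100.00 | 2012 |
| 1 | 2 | Massachusetts Institute of Technology | USA | 2 | 9 | 17 | 3 | 12 | 4 | 4 | NaN | 1 | 91.67 | 2012 |
| 2 | 3 | Stanford University | USA | 3 | 17 | 11 | 5 | 4 | 2 | 2 | NaN | 15 | 89.50 | 2012 |
| 3 | 4 | University of Cambridge | United Kingdom | 1 | 10 | 24 | 4 | 16 | 16 | 11 | NaN | 50 | 86.17 | 2012 |
| 4 | 5 | California Institute of Technology | USA | 4 | 2 | 29 | 7 | 37 | 22 | 22 | NaN | 18 | 85.21 | 2012 |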
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2200 entries, 0 to 2199
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   world_rank            2200 non-null   int64  
 1   institution           2200 non-null   object 
 2   country               2200 non-null   object 
 3   national_rank         2200 non-null   int64  
 4   quality_of_education  2200 non-null   int64  
 5   alumni_employment     2200 non-null   int64  
 6   quality_of_faculty    2200 non-null   int64  
 7   publications          2200 non-null   int64  
 8   influence             2200 non-null   int64  
 9   citations             2200 non-null   int64  
 10  broad_impact          2000 non-null   float64
 11  patents               2200 non-null   int64  
 12  score                 2200 non-null   float64
 13  year                  2200 non-null   int64  
dtypes: float64(2), int64(10), object(2)
memory usage: 240.8+ KB
set(df['year'])
{2012, 2013, 2014, 2015}

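The output shows that broad_impact is the only column with missing values, and only 200 of its 2,200 entries are missing. The year column covers 2012 to 2015, so the analysis below treats each year separately.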
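To see where the missing broad_impact values sit, they can be counted per year. The following is a minimal sketch on a small made-up frame (the `toy` data is an assumption for illustration, not taken from cwurData.csv); the same expression should work on `df` from the code above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for cwurData.csv: made-up values, same column names.
toy = pd.DataFrame({
    'year': [2012, 2012, 2013, 2014, 2014, 2015],
    'broad_impact': [np.nan, np.nan, np.nan, 1.0, 4.0, 2.0],
})

# Flag missing values as 0/1, then add them up within each year.
missing_by_year = (
    toy['broad_impact'].isna().astype(int).groupby(toy['year']).sum()
)
print(missing_by_year)
```

Grouping the missing-value flags by year makes it easy to tell whether the gaps are spread evenly or confined to particular years.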

2012 Data Analysis

First, select the rows where year is 2012 and store them in data_2012.

data_2012 = df[df['year'] == 2012].drop('year', axis=1)
data_2012
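| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Harvard University | USA | 1 | 7 | 9 | 1 | 1 | 1 | 1 | NaN | 5 | 100.00 |
| 1 | 2 | Massachusetts Institute of Technology | USA | 2 | 9 | 17 | 3 | 12 | 4 | 4 | NaN | 1 | 91.67 |
| 2 | 3 | Stanford University | USA | 3 | 17 | 11 | 5 | 4 | 2 | 2 | NaN | 15 | 89.50 |
| 3 | 4 | University of Cambridge | United Kingdom | 1 | 10 | 24 | 4 | 16 | 16 | 11 | NaN | 50 | 86.17 |
| 4 | 5 | California Institute of Technology | USA | 4 | 2 | 29 | 7 | 37 | 22 | 22 | NaN | 18 | 85.21 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | University of Texas MD Anderson Cancer Center | USA | 58 | 101 | 101 | 101 | 95 | 46 | 66 | NaN | 100 | 43.88 |
| 96 | 97 | University of Nottingham | United Kingdom | 6 | 101 | 101 | 87 | 101 | 101 | 101 | NaN | 92 | 43.79 |
| 97 | 98 | University of Bristol | United Kingdom | 7 | 101 | 101 | 78 | 75 | 81 | 86 | NaN | 101 | 43.77 |
| 98 | 99 | Utrecht University | Netherlands | 2 | 100 | 101 | 101 | 65 | 101 | 60 | NaN | 101 | 43.47 |
| 99 | 100 | Mines ParisTech | France | 5 | 44 | 4 | 101 | 101 | 101 | 101 | NaN | 101 | 43.36 |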

100 rows × 13 columns

data_2012.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   world_rank            100 non-null    int64  
 1   institution           100 non-null    object 
 2   country               100 non-null    object 
 3   national_rank         100 non-null    int64  
 4   quality_of_education  100 non-null    int64  
 5   alumni_employment     100 non-null    int64  
 6   quality_of_faculty    100 non-null    int64  
 7   publications          100 non-null    int64  
 8   influence             100 non-null    int64  
 9   citations             100 non-null    int64  
 10  broad_impact          0 non-null      float64
 11  patents               100 non-null    int64  
 12  score                 100 non-null    float64
dtypes: float64(2), int64(9), object(2)
memory usage: 10.9+ KB

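The output above shows that broad_impact is entirely missing for 2012, so that column can be dropped for the rest of the analysis. The other columns have no missing values.

Next, look at how the 2012 top 100 universities are distributed by country.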

data_2012['country'].value_counts()
USA               58
United Kingdom     8
Japan              5
France             5
Israel             4
Switzerland        4
Canada             3
Germany            3
Australia          2
Netherlands        2
Italy              1
Norway             1
Denmark            1
South Korea        1
Finland            1
Sweden             1
Name: country, dtype: int64

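Of the 100 universities on the 2012 list, 58 are in the USA, far ahead of the United Kingdom in second place with 8. No Chinese university appears on the list; it is unclear whether this is because Chinese universities were not covered by the ranking that year.

Drop the broad_impact column to simplify the later steps.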

data_2012_copy = data_2012.copy()
data_2012_copy.drop('broad_impact', axis=1, inplace=True)
data_2012 = data_2012_copy
data_2012
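| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | patents | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Harvard University | USA | 1 | 7 | 9 | 1 | 1 | 1 | 1 | 5 | 100.00 |
| 1 | 2 | Massachusetts Institute of Technology | USA | 2 | 9 | 17 | 3 | 12 | 4 | 4 | 1 | 91.67 |
| 2 | 3 | Stanford University | USA | 3 | 17 | 11 | 5 | 4 | 2 | 2 | 15 | 89.50 |
| 3 | 4 | University of Cambridge | United Kingdom | 1 | 10 | 24 | 4 | 16 | 16 | 11 | 50 | 86.17 |
| 4 | 5 | California Institute of Technology | USA | 4 | 2 | 29 | 7 | 37 | 22 | 22 | 18 | 85.21 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | University of Texas MD Anderson Cancer Center | USA | 58 | 101 | 101 | 101 | 95 | 46 | 66 | 100 | 43.88 |
| 96 | 97 | University of Nottingham | United Kingdom | 6 | 101 | 101 | 87 | 101 | 101 | 101 | 92 | 43.79 |
| 97 | 98 | University of Bristol | United Kingdom | 7 | 101 | 101 | 78 | 75 | 81 | 86 | 101 | 43.77 |
| 98 | 99 | Utrecht University | Netherlands | 2 | 100 | 101 | 101 | 65 | 101 | 60 | 101 | 43.47 |
| 99 | 100 | Mines ParisTech | France | 5 | 44 | 4 | 101 | 101 | 101 | 101 | 101 | 43.36 |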

100 rows × 12 columns

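Analyzing how national_rank and score relate to world_rank is not very meaningful, since score is the total used to determine world_rank and national_rank is the same ordering restricted to one country. Both columns are therefore dropped.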

data_2012_world_rank = data_2012.copy()
data_2012_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2012_world_rank
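| | world_rank | institution | country | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | patents |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Harvard University | USA | 7 | 9 | 1 | 1 | 1 | 1 | 5 |
| 1 | 2 | Massachusetts Institute of Technology | USA | 9 | 17 | 3 | 12 | 4 | 4 | 1 |
| 2 | 3 | Stanford University | USA | 17 | 11 | 5 | 4 | 2 | 2 | 15 |
| 3 | 4 | University of Cambridge | United Kingdom | 10 | 24 | 4 | 16 | 16 | 11 | 50 |
| 4 | 5 | California Institute of Technology | USA | 2 | 29 | 7 | 37 | 22 | 22 | 18 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 96 | University of Texas MD Anderson Cancer Center | USA | 101 | 101 | 101 | 95 | 46 | 66 | 100 |
| 96 | 97 | University of Nottingham | United Kingdom | 101 | 101 | 87 | 101 | 101 | 101 | 92 |
| 97 | 98 | University of Bristol | United Kingdom | 101 | 101 | 78 | 75 | 81 | 86 | 101 |
| 98 | 99 | Utrecht University | Netherlands | 100 | 101 | 101 | 65 | 101 | 60 | 101 |
| 99 | 100 | Mines ParisTech | France | 44 | 4 | 101 | 101 | 101 | 101 | 101 |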

100 rows × 10 columns

Plot a heatmap with matplotlib and seaborn to see how strongly the variables are correlated.

plt.figure(figsize=(10, 10))
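# Keep only numeric columns: recent pandas versions raise an error if corr() sees text columns.
sns.heatmap(data_2012_world_rank.select_dtypes('number').corr(), annot=True)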
<AxesSubplot:>

[Figure: correlation heatmap of the 2012 variables]

In the 2012 list, the main factors associated with World Rank, judging by the correlations, include Quality of Faculty, Influence, and Citations.
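Reading the strongest factors off a heatmap by eye is error-prone, so the correlations with world_rank can also be listed in sorted order. The sketch below uses a tiny made-up frame (`toy` and its values are assumptions for illustration only); applied to data_2012_world_rank, the same chain would list the real coefficients.

```python
import pandas as pd

# Made-up ranks for six universities; only the column names mirror the dataset.
toy = pd.DataFrame({
    'world_rank':         [1, 2, 3, 4, 5, 6],
    'quality_of_faculty': [1, 3, 2, 4, 6, 5],
    'patents':            [5, 1, 6, 2, 4, 3],
    'institution':        ['A', 'B', 'C', 'D', 'E', 'F'],
})

# Correlate numeric columns only, keep the world_rank column of the matrix,
# drop its self-correlation, and sort from strongest positive downward.
corr_with_rank = (
    toy.select_dtypes('number')
       .corr()['world_rank']
       .drop('world_rank')
       .sort_values(ascending=False)
)
print(corr_with_rank)
```

Because every column in this dataset is itself a rank, a high positive coefficient means the indicator rank tends to rise and fall together with the world rank; it does not by itself show that the indicator determines the ranking.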

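2013 Data Analysis

The 2013 list has nearly the same structure as the 2012 list, and the analysis follows the same steps, so the process is not explained again.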
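Since the same filtering steps repeat for every year, they could be wrapped in a small helper. This is only a sketch: `prepare_year` is a hypothetical name not used in the article, and the `toy` frame is made-up data with the dataset's column names.

```python
import numpy as np
import pandas as pd

def prepare_year(df, year):
    """Hypothetical helper: rows for one year, minus 'year' and any all-NaN column."""
    subset = df[df['year'] == year].drop(columns='year')
    # Columns that are entirely missing for this year (e.g. broad_impact) are dropped.
    return subset.dropna(axis=1, how='all')

# Made-up data: broad_impact is missing in 2012 and present in 2014.
toy = pd.DataFrame({
    'world_rank':   [1, 2, 1, 2],
    'broad_impact': [np.nan, np.nan, 1.0, 4.0],
    'year':         [2012, 2012, 2014, 2014],
})

print(prepare_year(toy, 2012).columns.tolist())
print(prepare_year(toy, 2014).columns.tolist())
```

With a helper like this, the per-year sections differ only in the year passed in, which makes it harder for the four analyses to drift apart.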

data_2013 = df[df['year'] == 2013].drop('year', axis=1)
data_2013
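| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | NaN | 7 | 100.00 |
| 101 | 2 | Stanford University | USA | 2 | 11 | 2 | 4 | 6 | 2 | 2 | NaN | 11 | 93.94 |
| 102 | 3 | University of Oxford | United Kingdom | 1 | 7 | 12 | 10 | 11 | 7 | 13 | NaN | 15 | 92.54 |
| 103 | 4 | Massachusetts Institute of Technology | USA | 3 | 2 | 16 | 2 | 16 | 3 | 3 | NaN | 1 | 91.45 |
| 104 | 5 | University of Cambridge | United Kingdom | 2 | 3 | 15 | 5 | 9 | 11 | 10 | NaN | 39 | 90.24 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 96 | Australian National University | Australia | 2 | 101 | 101 | 43 | 101 | 101 | 101 | NaN | 101 | 44.50 |
| 196 | 97 | University of Alberta | Canada | 4 | 101 | 101 | 101 | 68 | 101 | 92 | NaN | 81 | 44.50 |
| 197 | 98 | University of Helsinki | Finland | 1 | 69 | 101 | 81 | 74 | 79 | 71 | NaN | 101 | 44.39 |
| 198 | 99 | Paris Diderot University - Paris 7 | France | 5 | 28 | 101 | 72 | 101 | 87 | 101 | NaN | 101 | 44.36 |
| 199 | 100 | Georgia Institute of Technology | USA | 57 | 101 | 85 | 101 | 97 | 101 | 43 | NaN | 32 | 44.26 |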

100 rows × 13 columns

data_2013.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 100 to 199
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   world_rank            100 non-null    int64  
 1   institution           100 non-null    object 
 2   country               100 non-null    object 
 3   national_rank         100 non-null    int64  
 4   quality_of_education  100 non-null    int64  
 5   alumni_employment     100 non-null    int64  
 6   quality_of_faculty    100 non-null    int64  
 7   publications          100 non-null    int64  
 8   influence             100 non-null    int64  
 9   citations             100 non-null    int64  
 10  broad_impact          0 non-null      float64
 11  patents               100 non-null    int64  
 12  score                 100 non-null    float64
dtypes: float64(2), int64(9), object(2)
memory usage: 10.9+ KB
data_2013['country'].value_counts()
USA               57
United Kingdom     7
Japan              6
France             5
Switzerland        4
Canada             4
Israel             4
Australia          2
Germany            2
South Korea        1
Russia             1
Denmark            1
Singapore          1
Netherlands        1
Finland            1
Norway             1
Sweden             1
Italy              1
Name: country, dtype: int64

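The 2013 top 100 come from 18 countries and regions. Singapore and Russia, neither of which had a university in the 2012 top 100, each have one on the 2013 list. There is still no university from China on the list.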
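Which countries enter or leave the list between two years can be checked with a set difference rather than by comparing the two value_counts outputs by eye. A minimal sketch, assuming a made-up `toy` frame with only year and country columns:

```python
import pandas as pd

# Made-up rows; in the article the equivalent data would come from df.
toy = pd.DataFrame({
    'year':    [2012, 2012, 2012, 2013, 2013, 2013],
    'country': ['USA', 'Japan', 'Germany', 'USA', 'Japan', 'Singapore'],
})

countries_2012 = set(toy.loc[toy['year'] == 2012, 'country'])
countries_2013 = set(toy.loc[toy['year'] == 2013, 'country'])

# Countries present in one year but not the other.
newcomers = sorted(countries_2013 - countries_2012)
dropouts = sorted(countries_2012 - countries_2013)
print('new in 2013:', newcomers)
print('gone in 2013:', dropouts)
```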

data_2013_copy = data_2013.copy()
data_2013_copy.drop('broad_impact', axis=1, inplace=True)
data_2013 = data_2013_copy
data_2013
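| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | patents | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 7 | 100.00 |
| 101 | 2 | Stanford University | USA | 2 | 11 | 2 | 4 | 6 | 2 | 2 | 11 | 93.94 |
| 102 | 3 | University of Oxford | United Kingdom | 1 | 7 | 12 | 10 | 11 | 7 | 13 | 15 | 92.54 |
| 103 | 4 | Massachusetts Institute of Technology | USA | 3 | 2 | 16 | 2 | 16 | 3 | 3 | 1 | 91.45 |
| 104 | 5 | University of Cambridge | United Kingdom | 2 | 3 | 15 | 5 | 9 | 11 | 10 | 39 | 90.24 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 96 | Australian National University | Australia | 2 | 101 | 101 | 43 | 101 | 101 | 101 | 101 | 44.50 |
| 196 | 97 | University of Alberta | Canada | 4 | 101 | 101 | 101 | 68 | 101 | 92 | 81 | 44.50 |
| 197 | 98 | University of Helsinki | Finland | 1 | 69 | 101 | 81 | 74 | 79 | 71 | 101 | 44.39 |
| 198 | 99 | Paris Diderot University - Paris 7 | France | 5 | 28 | 101 | 72 | 101 | 87 | 101 | 101 | 44.36 |
| 199 | 100 | Georgia Institute of Technology | USA | 57 | 101 | 85 | 101 | 97 | 101 | 43 | 32 | 44.26 |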

100 rows × 12 columns

data_2013_world_rank = data_2013.copy()
data_2013_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2013_world_rank
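| | world_rank | institution | country | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | patents |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 7 |
| 101 | 2 | Stanford University | USA | 11 | 2 | 4 | 6 | 2 | 2 | 11 |
| 102 | 3 | University of Oxford | United Kingdom | 7 | 12 | 10 | 11 | 7 | 13 | 15 |
| 103 | 4 | Massachusetts Institute of Technology | USA | 2 | 16 | 2 | 16 | 3 | 3 | 1 |
| 104 | 5 | University of Cambridge | United Kingdom | 3 | 15 | 5 | 9 | 11 | 10 | 39 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | 96 | Australian National University | Australia | 101 | 101 | 43 | 101 | 101 | 101 | 101 |
| 196 | 97 | University of Alberta | Canada | 101 | 101 | 101 | 68 | 101 | 92 | 81 |
| 197 | 98 | University of Helsinki | Finland | 69 | 101 | 81 | 74 | 79 | 71 | 101 |
| 198 | 99 | Paris Diderot University - Paris 7 | France | 28 | 101 | 72 | 101 | 87 | 101 | 101 |
| 199 | 100 | Georgia Institute of Technology | USA | 101 | 85 | 101 | 97 | 101 | 43 | 32 |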

100 rows × 10 columns

plt.figure(figsize=(10, 10))
sns.heatmap(data_2013_world_rank.select_dtypes('number').corr(), annot=True)
<AxesSubplot:>

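[Figure: correlation heatmap of the 2013 variables]

In the 2013 list, the main factors associated with World Rank include Quality of Faculty, Influence, Citations, and Publications, broadly the same as in 2012.

2014 Data Analysis

As before, start by selecting the 2014 rows.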

data_2014 = df[df['year'] == 2014].drop('year', axis=1)
data_2014
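| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1.0 | 2 | 100.00 |
| 201 | 2 | Stanford University | USA | 2 | 11 | 2 | 4 | 5 | 3 | 3 | 4.0 | 6 | 99.09 |
| 202 | 3 | Massachusetts Institute of Technology | USA | 3 | 3 | 11 | 2 | 15 | 2 | 2 | 2.0 | 1 | 98.69 |
| 203 | 4 | University of Cambridge | United Kingdom | 1 | 2 | 10 | 5 | 10 | 9 | 12 | 13.0 | 48 | 97.64 |
| 204 | 5 | University of Oxford | United Kingdom | 2 | 7 | 12 | 10 | 11 | 12 | 11 | 12.0 | 16 | 97.51 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1195 | 996 | National Dong Hwa University | Taiwan | 24 | 355 | 478 | 210 | 901 | 934 | 800 | 989.0 | 737 | 44.24 |
| 1196 | 997 | National Taipei University of Technology | Taiwan | 25 | 355 | 478 | 210 | 867 | 987 | 800 | 994.0 | 737 | 44.24 |
| 1197 | 998 | Shaanxi Normal University | China | 82 | 355 | 478 | 210 | 956 | 965 | 800 | 994.0 | 737 | 44.23 |
| 1198 | 999 | National University of Defense Technology | China | 83 | 355 | 478 | 210 | 860 | 973 | 800 | 999.0 | 637 | 44.21 |
| 1199 | 1000 | Yanbian University | China | 84 | 355 | 478 | 210 | 890 | 790 | 800 | 1000.0 | 737 | 44.18 |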

1000 rows × 13 columns

data_2014.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 200 to 1199
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   world_rank            1000 non-null   int64  
 1   institution           1000 non-null   object 
 2   country               1000 non-null   object 
 3   national_rank         1000 non-null   int64  
 4   quality_of_education  1000 non-null   int64  
 5   alumni_employment     1000 non-null   int64  
 6   quality_of_faculty    1000 non-null   int64  
 7   publications          1000 non-null   int64  
 8   influence             1000 non-null   int64  
 9   citations             1000 non-null   int64  
 10  broad_impact          1000 non-null   float64
 11  patents               1000 non-null   int64  
 12  score                 1000 non-null   float64
dtypes: float64(2), int64(9), object(2)
memory usage: 109.4+ KB

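Compared with the previous two years, the 2014 list grows from 100 ranked universities to 1,000, and broad_impact no longer has missing values. It remains to be seen whether adding this column changes which factors are most associated with world rank.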

data_2014['country'].value_counts()
USA                     229
China                    84
Japan                    74
United Kingdom           64
Germany                  55
France                   50
Italy                    47
Spain                    41
South Korea              34
Canada                   32
Australia                27
Taiwan                   25
Brazil                   18
India                    15
Netherlands              13
Austria                  12
Sweden                   11
Belgium                  10
Turkey                   10
Finland                   9
Poland                    9
Switzerland               9
Ireland                   8
Iran                      8
Greece                    7
Portugal                  7
Israel                    7
New Zealand               6
Hong Kong                 6
Hungary                   6
Denmark                   5
South Africa              5
Norway                    5
Czech Republic            5
Chile                     4
Argentina                 4
Egypt                     4
Saudi Arabia              4
Thailand                  3
Russia                    3
Malaysia                  3
Slovenia                  2
Singapore                 2
Mexico                    2
Colombia                  2
Estonia                   1
Cyprus                    1
United Arab Emirates      1
Uganda                    1
Lebanon                   1
Romania                   1
Croatia                   1
Serbia                    1
Lithuania                 1
Bulgaria                  1
Uruguay                   1
Slovak Republic           1
Iceland                   1
Puerto Rico               1
Name: country, dtype: int64

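In 2014, the USA still accounts for a large share of the top 1,000 (229 universities, or 22.9%). China (mainland) is second with 84 universities, followed by Japan, the United Kingdom, and Germany in third to fifth place. Hong Kong also has 6 universities on the list.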
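The 22.9% figure is simply 229 out of 1,000. A share like this can be computed for every country at once, as in the minimal sketch below; the `countries` series is made-up illustration data standing in for data_2014['country'].

```python
import pandas as pd

# Made-up country column with five universities.
countries = pd.Series(
    ['USA', 'USA', 'USA', 'United Kingdom', 'Japan'], name='country'
)

counts = countries.value_counts()
# Share of the list held by each country, as a percentage with one decimal.
share_pct = (counts / counts.sum() * 100).round(1)

summary = pd.DataFrame({'count': counts, 'share_pct': share_pct})
print(summary)
```

Keeping the count and the share side by side avoids recomputing percentages by hand when the list size changes from 100 to 1,000.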

Next, list the universities from China (mainland) on the 2014 list along with their world ranks.

data_2014_China = data_2014[data_2014['country'] == 'China']
data_2014_China
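| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 254 | 55 | Peking University | China | 1 | 355 | 35 | 210 | 65 | 155 | 250 | 155.0 | 7 | 55.30 |
| 286 | 87 | Tsinghua University | China | 2 | 294 | 63 | 210 | 79 | 192 | 134 | 162.0 | 16 | 52.60 |
| 388 | 189 | Fudan University | China | 3 | 355 | 126 | 210 | 120 | 264 | 310 | 230.0 | 100 | 48.14 |
| 394 | 195 | Shanghai Jiao Tong University | China | 4 | 325 | 149 | 210 | 102 | 250 | 250 | 234.0 | 138 | 48.02 |
| 405 | 206 | Zhejiang University | China | 5 | 355 | 293 | 210 | 86 | 318 | 493 | 290.0 | 94 | 47.76 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1189 | 990 | Harbin Medical University | China | 80 | 355 | 478 | 210 | 900 | 862 | 800 | 979.0 | 737 | 44.26 |
| 1190 | 991 | Zhejiang Normal University | China | 81 | 355 | 478 | 210 | 905 | 932 | 800 | 979.0 | 737 | 44.26 |
| 1197 | 998 | Shaanxi Normal University | China | 82 | 355 | 478 | 210 | 956 | 965 | 800 | 994.0 | 737 | 44.23 |
| 1198 | 999 | National University of Defense Technology | China | 83 | 355 | 478 | 210 | 860 | 973 | 800 | 999.0 | 637 | 44.21 |
| 1199 | 1000 | Yanbian University | China | 84 | 355 | 478 | 210 | 890 | 790 | 800 | 1000.0 | 737 | 44.18 |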

84 rows × 13 columns

# Print each university's world rank and name, without hard-coding the row count.
for rank, name in zip(data_2014_China['world_rank'].tolist(), data_2014_China['institution'].tolist()):
    print('{:4d}: {}'.format(rank, name))
  55: Peking University
  87: Tsinghua University
 189: Fudan University
 195: Shanghai Jiao Tong University
 206: Zhejiang University
 217: Nanjing University
 242: Dalian University of Technology
 270: University of Science and Technology of China
 292: Sun Yat-sen University
 349: Nankai University
 354: Xiamen University
 370: Tongji University
 385: Beijing Normal University
 394: Tianjin University
 399: Xi'an Jiaotong University
 408: Huazhong University of Science and Technology
 412: Southeast University
 435: Jilin University
 438: Wuhan University
 457: Shandong University
 465: Central South University
 475: Harbin Institute of Technology
 498: South China University of Technology
 515: Sichuan University
 528: East China University of Science and Technology
 560: Beihang University
 571: Lanzhou University
 588: Hunan University
 599: University of Science and Technology Beijing
 610: Central China Normal University
 669: Northeast Normal University
 672: East China Normal University
 685: Beijing University of Chemical Technology
 700: China Agricultural University
 710: Nanjing University of Science and Technology
 713: Soochow University (Suzhou)
 715: Fuzhou University
 722: Shanghai University
 731: Donghua University
 751: Wuhan University of Technology
 761: Peking Union Medical College
 782: Second Military Medical University
 790: University of Electronic Science and Technology of China
 815: Chongqing University
 816: Beijing Institute of Technology
 818: Zhengzhou University
 824: Capital Medical University
 842: Nanjing Normal University
 846: Jinan University
 865: Nanjing University of Technology
 870: Ocean University of China
 877: Jiangnan University
 883: Huazhong Agricultural University
 886: Shanghai Normal University
 896: Nanjing Agricultural University
 907: Northeastern University (China)
 912: Northwest University (China)
 917: Nanjing Medical University
 921: Southwest University
 932: Fourth Military Medical University
 935: Nanjing University of Aeronautics and Astronautics
 941: Jiangsu University
 942: Third Military Medical University
 943: South China Normal University
 945: Xiangtan University
 946: Yangzhou University
 947: Northwestern Polytechnical University
 955: Southern Medical University
 960: Beijing Jiaotong University
 968: Hunan Normal University
 973: South China Agricultural University
 975: China Pharmaceutical University
 980: Xidian University
 981: Zhejiang University of Technology
 982: China Medical University (PRC)
 985: Beijing University of Technology
 986: Guangxi University
 987: Northwest A&F University
 988: Shanxi University
 990: Harbin Medical University
 991: Zhejiang Normal University
 998: Shaanxi Normal University
 999: National University of Defense Technology
1000: Yanbian University
plt.figure(figsize=(10, 10))
plt.hist(data_2014_China['world_rank'].tolist(), 10)
plt.xlabel('World Rank (2014)')
plt.ylabel('Count')
plt.show()

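[Figure: histogram of world ranks of China (mainland) universities, 2014]

The histogram shows that the 84 universities from China (mainland) cluster toward the bottom of the list: 41 of them rank between 800 and 1000, while only 4 are in the top 200.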
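The distribution shown by the histogram can also be expressed as explicit counts per rank band. A minimal sketch using pd.cut on a short made-up list of ranks (the `ranks` values are illustrative, not the full 84-row column):

```python
import pandas as pd

# Made-up sample of world ranks.
ranks = pd.Series([55, 87, 189, 206, 815, 870, 990, 1000], name='world_rank')

# Assign each rank to a 200-wide band: (0, 200], (200, 400], ..., (800, 1000].
bands = pd.cut(ranks, bins=[0, 200, 400, 600, 800, 1000])

# Count per band, ordered from the best band to the worst.
band_counts = bands.value_counts().sort_index()
print(band_counts)
```

Explicit band counts make statements such as "most universities sit in the 800-1000 range" checkable, and the same bins can be reused to compare two years.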

data_2014_world_rank = data_2014.copy()
data_2014_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2014_world_rank
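| | world_rank | institution | country | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 200 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1.0 | 2 |
| 201 | 2 | Stanford University | USA | 11 | 2 | 4 | 5 | 3 | 3 | 4.0 | 6 |
| 202 | 3 | Massachusetts Institute of Technology | USA | 3 | 11 | 2 | 15 | 2 | 2 | 2.0 | 1 |
| 203 | 4 | University of Cambridge | United Kingdom | 2 | 10 | 5 | 10 | 9 | 12 | 13.0 | 48 |
| 204 | 5 | University of Oxford | United Kingdom | 7 | 12 | 10 | 11 | 12 | 11 | 12.0 | 16 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1195 | 996 | National Dong Hwa University | Taiwan | 355 | 478 | 210 | 901 | 934 | 800 | 989.0 | 737 |
| 1196 | 997 | National Taipei University of Technology | Taiwan | 355 | 478 | 210 | 867 | 987 | 800 | 994.0 | 737 |
| 1197 | 998 | Shaanxi Normal University | China | 355 | 478 | 210 | 956 | 965 | 800 | 994.0 | 737 |
| 1198 | 999 | National University of Defense Technology | China | 355 | 478 | 210 | 860 | 973 | 800 | 999.0 | 637 |
| 1199 | 1000 | Yanbian University | China | 355 | 478 | 210 | 890 | 790 | 800 | 1000.0 | 737 |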

1000 rows × 11 columns

plt.figure(figsize=(10, 10))
sns.heatmap(data_2014_world_rank.select_dtypes('number').corr(), annot=True)
<AxesSubplot:>

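[Figure: correlation heatmap of the 2014 variables]

In the 2014 list, the main factors associated with World Rank include Broad Impact, Publications, and Influence. Adding the broad_impact column changed which factors stand out, and the correlations of the other variables also differ from the previous two years. This may also be related to the sample growing from 100 to 1,000 universities.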
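One way to probe the sample-size explanation is to compute the same correlation on the full list and on the top 100 only. The sketch below does this on purely synthetic data (a made-up `indicator` that follows world_rank plus random noise), so it only illustrates the effect of restricting the rank range and says nothing about the real coefficients.

```python
import numpy as np
import pandas as pd

# Synthetic data only: an indicator that follows world_rank plus random noise.
rng = np.random.default_rng(0)
toy = pd.DataFrame({'world_rank': np.arange(1, 1001)})
toy['indicator'] = toy['world_rank'] + rng.normal(0, 150, size=len(toy))

# Correlation over all 1,000 rows versus over the top 100 rows only.
full_corr = toy['world_rank'].corr(toy['indicator'])
top100 = toy[toy['world_rank'] <= 100]
top100_corr = top100['world_rank'].corr(top100['indicator'])

print('all 1000 rows:', round(full_corr, 3))
print('top 100 rows :', round(top100_corr, 3))
```

With the same underlying relationship, the coefficient over a narrow rank range comes out weaker than over the full range, so correlations from the 100-row lists of 2012 and 2013 are not directly comparable with those from the 1,000-row lists. Repeating the 2014 heatmap on rows with world_rank of 100 or less would be a fairer comparison.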

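2015 Data Analysis

The 2015 data has a structure similar to 2014, with 1,000 rows and no missing values. The analysis follows essentially the same steps as for 2014 and is not explained again.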

data_2015 = df[df['year'] == 2015].drop('year', axis=1)
data_2015
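| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1200 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1.0 | 3 | 100.00 |
| 1201 | 2 | Stanford University | USA | 2 | 9 | 2 | 4 | 5 | 3 | 3 | 4.0 | 10 | 98.66 |
| 1202 | 3 | Massachusetts Institute of Technology | USA | 3 | 3 | 11 | 2 | 15 | 2 | 2 | 2.0 | 1 | 97.54 |
| 1203 | 4 | University of Cambridge | United Kingdom | 1 | 2 | 10 | 5 | 11 | 6 | 12 | 13.0 | 48 | 96.81 |
| 1204 | 5 | University of Oxford | United Kingdom | 2 | 7 | 13 | 10 | 7 | 12 | 7 | 9.0 | 15 | 96.46 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2195 | 996 | University of the Algarve | Portugal | 7 | 367 | 567 | 218 | 926 | 845 | 812 | 969.0 | 816 | 44.03 |
| 2196 | 997 | Alexandria University | Egypt | 4 | 236 | 566 | 218 | 997 | 908 | 645 | 981.0 | 871 | 44.03 |
| 2197 | 998 | Federal University of Ceará | Brazil | 18 | 367 | 549 | 218 | 830 | 823 | 812 | 975.0 | 824 | 44.03 |
| 2198 | 999 | University of A Coruña | Spain | 40 | 367 | 567 | 218 | 886 | 974 | 812 | 975.0 | 651 | 44.02 |
| 2199 | 1000 | China Pharmaceutical University | China | 83 | 367 | 567 | 218 | 861 | 991 | 812 | 981.0 | 547 | 44.02 |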

1000 rows × 13 columns

data_2015.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 1200 to 2199
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   world_rank            1000 non-null   int64  
 1   institution           1000 non-null   object 
 2   country               1000 non-null   object 
 3   national_rank         1000 non-null   int64  
 4   quality_of_education  1000 non-null   int64  
 5   alumni_employment     1000 non-null   int64  
 6   quality_of_faculty    1000 non-null   int64  
 7   publications          1000 non-null   int64  
 8   influence             1000 non-null   int64  
 9   citations             1000 non-null   int64  
 10  broad_impact          1000 non-null   float64
 11  patents               1000 non-null   int64  
 12  score                 1000 non-null   float64
dtypes: float64(2), int64(9), object(2)
memory usage: 109.4+ KB
data_2015['country'].value_counts()
USA                     229
China                    83
Japan                    74
United Kingdom           65
Germany                  55
France                   49
Italy                    47
Spain                    40
South Korea              36
Canada                   33
Australia                27
Taiwan                   21
Brazil                   18
India                    16
Netherlands              13
Austria                  12
Sweden                   11
Belgium                  10
Turkey                   10
Poland                    9
Switzerland               9
Finland                   9
Iran                      8
Ireland                   8
Portugal                  7
Greece                    7
Israel                    7
Hong Kong                 6
New Zealand               6
Hungary                   6
South Africa              5
Czech Republic            5
Denmark                   5
Russia                    5
Norway                    5
Egypt                     4
Saudi Arabia              4
Chile                     4
Thailand                  3
Argentina                 3
Malaysia                  3
Romania                   2
Slovenia                  2
Mexico                    2
Colombia                  2
Singapore                 2
Cyprus                    1
Estonia                   1
United Arab Emirates      1
Uganda                    1
Lebanon                   1
Croatia                   1
Serbia                    1
Lithuania                 1
Bulgaria                  1
Uruguay                   1
Slovak Republic           1
Iceland                   1
Puerto Rico               1
Name: country, dtype: int64

The top five are still the USA, China (mainland), Japan, the United Kingdom, and Germany, the same as in 2014.

data_2015_China = data_2015[data_2015['country'] == 'China']
data_2015_China
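| | world_rank | institution | country | national_rank | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1255 | 56 | Peking University | China | 1 | 182 | 38 | 218 | 52 | 131 | 182 | 125.0 | 20 | 54.26 |
| 1277 | 78 | Tsinghua University | China | 2 | 309 | 73 | 218 | 63 | 158 | 65 | 156.0 | 30 | 52.21 |
| 1379 | 180 | Shanghai Jiao Tong University | China | 3 | 335 | 143 | 218 | 81 | 267 | 212 | 209.0 | 198 | 47.96 |
| 1390 | 191 | Zhejiang University | China | 4 | 367 | 309 | 218 | 71 | 290 | 368 | 265.0 | 106 | 47.66 |
| 1394 | 195 | Fudan University | China | 5 | 367 | 203 | 218 | 112 | 252 | 368 | 204.0 | 123 | 47.56 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2183 | 984 | Zhejiang University of Technology | China | 79 | 367 | 567 | 218 | 858 | 991 | 812 | 958.0 | 672 | 44.04 |
| 2189 | 990 | Henan Normal University | China | 80 | 367 | 567 | 218 | 959 | 991 | 812 | 958.0 | 871 | 44.04 |
| 2190 | 991 | Xidian University | China | 81 | 367 | 542 | 218 | 830 | 974 | 812 | 984.0 | 434 | 44.03 |
| 2192 | 993 | Southwest Jiaotong University | China | 82 | 367 | 327 | 218 | 937 | 962 | 812 | 998.0 | 861 | 44.03 |
| 2199 | 1000 | China Pharmaceutical University | China | 83 | 367 | 567 | 218 | 861 | 991 | 812 | 981.0 | 547 | 44.02 |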

83 rows × 13 columns

# Print each university's world rank and name, without hard-coding the row count.
for rank, name in zip(data_2015_China['world_rank'].tolist(), data_2015_China['institution'].tolist()):
    print('{:4d}: {}'.format(rank, name))
  56: Peking University
  78: Tsinghua University
 180: Shanghai Jiao Tong University
 191: Zhejiang University
 195: Fudan University
 239: University of Science and Technology of China
 244: Nanjing University
 277: Sun Yat-sen University
 305: Dalian University of Technology
 313: Xiamen University
 315: Nankai University
 331: Tianjin University
 389: Xi'an Jiaotong University
 391: Jilin University
 396: Beijing Normal University
 400: Huazhong University of Science and Technology
 410: Harbin Institute of Technology
 413: Shandong University
 415: Wuhan University
 420: Tongji University
 423: Peking Union Medical College
 438: Central South University
 439: South China University of Technology
 443: Southeast University
 477: East China University of Science and Technology
 479: Sichuan University
 496: Renmin University of China
 514: Lanzhou University
 530: University of Science and Technology Beijing
 553: Hunan University
 572: Central China Normal University
 586: East China Normal University
 615: Soochow University (Suzhou)
 627: China Agricultural University
 640: Northeast Normal University
 642: Beijing University of Chemical Technology
 659: Donghua University
 677: Wuhan University of Technology
 684: Shanghai University
 705: Beihang University
 706: Fuzhou University
 749: Beijing Institute of Technology
 774: Chongqing University
 776: Second Military Medical University
 783: University of Electronic Science and Technology of China
 788: Capital Medical University
 795: Zhengzhou University
 804: Nanjing University of Science and Technology
 810: Nanjing University of Technology
 828: Jiangnan University
 846: Ocean University of China
 858: Huazhong Agricultural University
 864: Nanjing Normal University
 866: Nanjing Agricultural University
 871: Northeastern University (China)
 877: Northwest University (China)
 883: Hefei University of Technology
 886: Nanjing Medical University
 888: Shanghai Normal University
 889: Southwest University
 895: Nanjing University of Aeronautics and Astronautics
 899: Fourth Military Medical University
 904: China University of Geosciences (Wuhan)
 924: South China Normal University
 943: Yangzhou University
 947: Jiangsu University
 948: Northwestern Polytechnical University
 949: Xiangtan University
 953: Harbin Engineering University
 955: Jinan University
 957: Third Military Medical University
 963: Southern Medical University
 969: South China Agricultural University
 972: Hunan Normal University
 975: Shenzhen University
 976: Tianjin Medical University
 977: Beijing University of Technology
 982: China Medical University (PRC)
 984: Zhejiang University of Technology
 990: Henan Normal University
 991: Xidian University
 993: Southwest Jiaotong University
1000: China Pharmaceutical University
plt.figure(figsize=(10, 10))
plt.hist(data_2015_China['world_rank'].tolist(), 10)
plt.xlabel('World Rank (2015)')
plt.ylabel('Count')
plt.show()

[Figure: histogram of world ranks of China (mainland) universities, 2015]

plt.figure(figsize=(10, 10))
sns.histplot(data_2014_China['world_rank'].tolist(), color='blue', label='2014')
sns.histplot(data_2015_China['world_rank'].tolist(), color='green', label='2015')
plt.legend()
plt.show()

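The 2015 list includes 83 universities from China (mainland), one fewer than in 2014. The 800-1000 range is still the most crowded, but it now holds 36 universities instead of 41, and 5 universities are in the top 200 instead of 4, so the ranks of China (mainland) universities moved up overall compared with 2014.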
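The overall trend can also be checked per institution by joining the two years on the university name. A minimal sketch, using only the five top-ranked China (mainland) universities with the world ranks shown in the tables above; the frame names `r2014` and `r2015` are assumptions for illustration.

```python
import pandas as pd

# World ranks copied from the 2014 and 2015 tables above (five rows only).
names = ['Peking University', 'Tsinghua University', 'Fudan University',
         'Shanghai Jiao Tong University', 'Zhejiang University']
r2014 = pd.DataFrame({'institution': names, 'world_rank': [55, 87, 189, 195, 206]})
r2015 = pd.DataFrame({'institution': names, 'world_rank': [56, 78, 195, 180, 191]})

# Inner join keeps only universities present in both years.
both = r2014.merge(r2015, on='institution', suffixes=('_2014', '_2015'))

# Positive change means the university moved up the list in 2015.
both['change'] = both['world_rank_2014'] - both['world_rank_2015']
print(both)
print('median change:', both['change'].median())
```

On the full data, the same merge of data_2014_China and data_2015_China would also show which universities appear in only one of the two years, since an inner join drops them.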

data_2015_world_rank = data_2015.copy()
data_2015_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2015_world_rank
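| | world_rank | institution | country | quality_of_education | alumni_employment | quality_of_faculty | publications | influence | citations | broad_impact | patents |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1200 | 1 | Harvard University | USA | 1 | 1 | 1 | 1 | 1 | 1 | 1.0 | 3 |
| 1201 | 2 | Stanford University | USA | 9 | 2 | 4 | 5 | 3 | 3 | 4.0 | 10 |
| 1202 | 3 | Massachusetts Institute of Technology | USA | 3 | 11 | 2 | 15 | 2 | 2 | 2.0 | 1 |
| 1203 | 4 | University of Cambridge | United Kingdom | 2 | 10 | 5 | 11 | 6 | 12 | 13.0 | 48 |
| 1204 | 5 | University of Oxford | United Kingdom | 7 | 13 | 10 | 7 | 12 | 7 | 9.0 | 15 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2195 | 996 | University of the Algarve | Portugal | 367 | 567 | 218 | 926 | 845 | 812 | 969.0 | 816 |
| 2196 | 997 | Alexandria University | Egypt | 236 | 566 | 218 | 997 | 908 | 645 | 981.0 | 871 |
| 2197 | 998 | Federal University of Ceará | Brazil | 367 | 549 | 218 | 830 | 823 | 812 | 975.0 | 824 |
| 2198 | 999 | University of A Coruña | Spain | 367 | 567 | 218 | 886 | 974 | 812 | 975.0 | 651 |
| 2199 | 1000 | China Pharmaceutical University | China | 367 | 567 | 218 | 861 | 991 | 812 | 981.0 | 547 |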

1000 rows × 11 columns

plt.figure(figsize=(10, 10))
sns.heatmap(data_2015_world_rank.select_dtypes('number').corr(), annot=True)
<AxesSubplot:>

[Figure: correlation heatmap of the 2015 variables]

In the 2015 list, the main factors associated with World Rank include Broad Impact, Publications, and Influence, the same as in 2014.




http://www.chinasem.cn/article/588549
