[Kaggle Data Analysis Practice] World University Rankings

2024-01-09 21:50


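Dataset Overview

The dataset for this analysis is cwurData.csv from the World University Rankings dataset on Kaggle. The file has 2,201 lines including the header row (2,200 data rows) and 14 columns. The official description of each column is as follows: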

  • world_rank: world rank for university
  • institution: name of university
  • country: country of each university
  • national_rank: rank of university within its country
  • quality_of_education: rank for quality of education
  • alumni_employment: rank for alumni employment
  • quality_of_faculty: rank for quality of faculty
  • publications: rank for publications
  • influence: rank for influence
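  • citations: rank for citations (the official description reads "number of students at the university", which looks like a mistake, since the values behave like ranks in the outputs below)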
  • broad_impact: rank for broad impact (only available for 2014 and 2015)
  • patents: rank for patents
  • score: total score, used for determining world rank
  • year: year of ranking (2012 to 2015)

Initial Exploration of the Dataset

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
df = pd.read_csv('./data/cwurData.csv', encoding='utf-8')
df.describe()
        world_rank  national_rank  quality_of_education  alumni_employment  quality_of_faculty  publications    influence    citations  broad_impact      patents        score         year
count  2200.000000    2200.000000           2200.000000        2200.000000         2200.000000   2200.000000  2200.000000  2200.000000   2000.000000  2200.000000  2200.000000  2200.000000
mean    459.590909      40.278182            275.100455         357.116818          178.888182    459.908636   459.797727   413.417273    496.699500   433.346364    47.798395  2014.318182
std     304.320363      51.740870            121.935100         186.779252           64.050885    303.760352   303.331822   264.366549    286.919755   273.996525     7.760806     0.762130
min       1.000000       1.000000              1.000000           1.000000            1.000000      1.000000     1.000000     1.000000      1.000000     1.000000    43.360000  2012.000000
25%     175.750000       6.000000            175.750000         175.750000          175.750000    175.750000   175.750000   161.000000    250.500000   170.750000    44.460000  2014.000000
50%     450.500000      21.000000            355.000000         450.500000          210.000000    450.500000   450.500000   406.000000    496.000000   426.000000    45.100000  2014.000000
75%     725.250000      49.000000            367.000000         478.000000          218.000000    725.000000   725.250000   645.000000    741.000000   714.250000    47.545000  2015.000000
max    1000.000000     229.000000            367.000000         567.000000          218.000000   1000.000000   991.000000   812.000000   1000.000000   871.000000   100.000000  2015.000000
df.head()
   world_rank  institution                            country         national_rank  quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  broad_impact  patents   score  year
0           1  Harvard University                     USA                         1                     7                  9                   1             1          1          1           NaN        5  100.00  2012
1           2  Massachusetts Institute of Technology  USA                         2                     9                 17                   3            12          4          4           NaN        1   91.67  2012
2           3  Stanford University                    USA                         3                    17                 11                   5             4          2          2           NaN       15   89.50  2012
3           4  University of Cambridge                United Kingdom              1                    10                 24                   4            16         16         11           NaN       50   86.17  2012
4           5  California Institute of Technology     USA                         4                     2                 29                   7            37         22         22           NaN       18   85.21  2012
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2200 entries, 0 to 2199
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   world_rank            2200 non-null   int64
 1   institution           2200 non-null   object
 2   country               2200 non-null   object
 3   national_rank         2200 non-null   int64
 4   quality_of_education  2200 non-null   int64
 5   alumni_employment     2200 non-null   int64
 6   quality_of_faculty    2200 non-null   int64
 7   publications          2200 non-null   int64
 8   influence             2200 non-null   int64
 9   citations             2200 non-null   int64
 10  broad_impact          2000 non-null   float64
 11  patents               2200 non-null   int64
 12  score                 2200 non-null   float64
 13  year                  2200 non-null   int64
dtypes: float64(2), int64(10), object(2)
memory usage: 240.8+ KB
set(df['year'])
{2012, 2013, 2014, 2015}

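As we can see, broad_impact is the only column with missing values, and only 200 of its 2,200 entries are missing. The year column ranges from 2012 to 2015, so next we analyze the data one year at a time.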
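The missing-value check can also be made explicit instead of read off df.info(). The following is a minimal sketch on a tiny hand-made DataFrame, not on cwurData.csv itself; the `sample` frame and the `report` name are illustrative assumptions, and in the notebook the same calls would be applied to df.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for cwurData.csv (a handful of hand-typed rows).
sample = pd.DataFrame({
    "world_rank": [1, 2, 1, 2, 1, 2],
    "broad_impact": [np.nan, np.nan, np.nan, np.nan, 1.0, 4.0],
    "score": [100.00, 91.67, 100.00, 93.94, 100.00, 99.09],
    "year": [2012, 2012, 2013, 2013, 2014, 2014],
})

# Missing values per column, as a count and as a share of all rows.
missing = sample.isna().sum()
missing_share = sample.isna().mean()

report = pd.DataFrame({"missing": missing, "share": missing_share})
print(report[report["missing"] > 0])  # only the columns that have gaps
```

On the real data this kind of report would list broad_impact alone, which matches what df.info() shows above.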

Analysis of the 2012 Data

First, filter the rows whose year is 2012 into data_2012.

data_2012 = df[df['year'] == 2012].drop('year', axis=1)
data_2012
    world_rank  institution                                    country         national_rank  quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  broad_impact  patents   score
 0           1  Harvard University                             USA                         1                     7                  9                   1             1          1          1           NaN        5  100.00
 1           2  Massachusetts Institute of Technology          USA                         2                     9                 17                   3            12          4          4           NaN        1   91.67
 2           3  Stanford University                            USA                         3                    17                 11                   5             4          2          2           NaN       15   89.50
 3           4  University of Cambridge                        United Kingdom              1                    10                 24                   4            16         16         11           NaN       50   86.17
 4           5  California Institute of Technology             USA                         4                     2                 29                   7            37         22         22           NaN       18   85.21
..
95          96  University of Texas MD Anderson Cancer Center  USA                        58                   101                101                 101            95         46         66           NaN      100   43.88
96          97  University of Nottingham                       United Kingdom              6                   101                101                  87           101        101        101           NaN       92   43.79
97          98  University of Bristol                          United Kingdom              7                   101                101                  78            75         81         86           NaN      101   43.77
98          99  Utrecht University                             Netherlands                 2                   100                101                 101            65        101         60           NaN      101   43.47
99         100  Mines ParisTech                                France                      5                    44                  4                 101           101        101        101           NaN      101   43.36

100 rows × 13 columns

data_2012.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   world_rank            100 non-null    int64
 1   institution           100 non-null    object
 2   country               100 non-null    object
 3   national_rank         100 non-null    int64
 4   quality_of_education  100 non-null    int64
 5   alumni_employment     100 non-null    int64
 6   quality_of_faculty    100 non-null    int64
 7   publications          100 non-null    int64
 8   influence             100 non-null    int64
 9   citations             100 non-null    int64
 10  broad_impact          0 non-null      float64
 11  patents               100 non-null    int64
 12  score                 100 non-null    float64
dtypes: float64(2), int64(9), object(2)
memory usage: 10.9+ KB

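The output above shows that the broad_impact column is entirely missing for 2012, so it can be dropped in the later analysis; the other columns have no missing values.

Next, look at how the 2012 top 100 universities are distributed by country.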

data_2012['country'].value_counts()
USA               58
United Kingdom     8
Japan              5
France             5
Israel             4
Switzerland        4
Canada             3
Germany            3
Australia          2
Netherlands        2
Italy              1
Norway             1
Denmark            1
South Korea        1
Finland            1
Sweden             1
Name: country, dtype: int64

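In the 2012 top 100, US universities take 58 places, far ahead of the United Kingdom in second place with 8. No Chinese university appears in the list; it is unclear whether this is because Chinese universities were not included in the ranking.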
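To turn these counts into shares, value_counts accepts normalize=True. The sketch below rebuilds a country column from the counts printed above, so the numbers are the article's own; the names `counts_2012`, `country`, and `share` are illustrative, and in the notebook the call would simply be data_2012['country'].value_counts(normalize=True).

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts_2012 = {
    "USA": 58, "United Kingdom": 8, "Japan": 5, "France": 5,
    "Israel": 4, "Switzerland": 4, "Canada": 3, "Germany": 3,
    "Australia": 2, "Netherlands": 2, "Italy": 1, "Norway": 1,
    "Denmark": 1, "South Korea": 1, "Finland": 1, "Sweden": 1,
}

# Rebuild a country column with one entry per university.
country = pd.Series(
    [name for name, n in counts_2012.items() for _ in range(n)],
    name="country",
)

# normalize=True returns fractions of the total instead of raw counts.
share = country.value_counts(normalize=True)
print(share.head(3))
```

With 100 universities in the list, the shares are just the counts divided by 100, so the USA comes out at 0.58.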

Drop the broad_impact column to simplify the later analysis.

# drop() returns a new DataFrame, so no explicit copy is needed
data_2012 = data_2012.drop('broad_impact', axis=1)
data_2012
    world_rank  institution                                    country         national_rank  quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  patents   score
 0           1  Harvard University                             USA                         1                     7                  9                   1             1          1          1        5  100.00
 1           2  Massachusetts Institute of Technology          USA                         2                     9                 17                   3            12          4          4        1   91.67
 2           3  Stanford University                            USA                         3                    17                 11                   5             4          2          2       15   89.50
 3           4  University of Cambridge                        United Kingdom              1                    10                 24                   4            16         16         11       50   86.17
 4           5  California Institute of Technology             USA                         4                     2                 29                   7            37         22         22       18   85.21
..
95          96  University of Texas MD Anderson Cancer Center  USA                        58                   101                101                 101            95         46         66      100   43.88
96          97  University of Nottingham                       United Kingdom              6                   101                101                  87           101        101        101       92   43.79
97          98  University of Bristol                          United Kingdom              7                   101                101                  78            75         81         86      101   43.77
98          99  Utrecht University                             Netherlands                 2                   100                101                 101            65        101         60      101   43.47
99         100  Mines ParisTech                                France                      5                    44                  4                 101           101        101        101      101   43.36

100 rows × 12 columns

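Analyzing how national_rank and score affect world_rank is not very meaningful (score is the value world_rank is determined from), so these two columns are dropped.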

data_2012_world_rank = data_2012.copy()
data_2012_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2012_world_rank
    world_rank  institution                                    country         quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  patents
 0           1  Harvard University                             USA                                7                  9                   1             1          1          1        5
 1           2  Massachusetts Institute of Technology          USA                                9                 17                   3            12          4          4        1
 2           3  Stanford University                            USA                               17                 11                   5             4          2          2       15
 3           4  University of Cambridge                        United Kingdom                    10                 24                   4            16         16         11       50
 4           5  California Institute of Technology             USA                                2                 29                   7            37         22         22       18
..
95          96  University of Texas MD Anderson Cancer Center  USA                              101                101                 101            95         46         66      100
96          97  University of Nottingham                       United Kingdom                   101                101                  87           101        101        101       92
97          98  University of Bristol                          United Kingdom                   101                101                  78            75         81         86      101
98          99  Utrecht University                             Netherlands                      100                101                 101            65        101         60      101
99         100  Mines ParisTech                                France                            44                  4                 101           101        101        101      101

100 rows × 10 columns

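Use matplotlib and seaborn to draw a heatmap of the correlations between the variables.

plt.figure(figsize=(10, 10))
# numeric_only=True skips the text columns (institution, country); pandas 2.0+ raises an error without it
sns.heatmap(data_2012_world_rank.corr(numeric_only=True), annot=True)
<AxesSubplot:>

[Figure: correlation heatmap of the 2012 indicators]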

In the 2012 list, the main factors behind world rank, judged by which indicators correlate most strongly with it in the heatmap, are Quality of Faculty, Influence, and Citations, among others.
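Reading the strongest correlations off a heatmap by eye is error-prone, so it can help to pull out the world_rank column of the correlation matrix and sort it. The sketch below uses synthetic data (the `toy` frame, its noise levels, and the name `corr_with_rank` are all assumptions made for illustration); in the notebook the same expression would be applied to data_2012_world_rank.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100
world_rank = np.arange(1, n + 1)

# Synthetic indicators with different degrees of agreement with world_rank.
toy = pd.DataFrame({
    "world_rank": world_rank,
    "institution": [f"University {i}" for i in world_rank],   # text column
    "quality_of_faculty": world_rank + rng.normal(0, 10, n),  # tracks rank closely
    "patents": world_rank + rng.normal(0, 60, n),             # tracks rank loosely
    "alumni_employment": rng.permutation(n) + 1,              # unrelated to rank
})

# Keep numeric columns only, then rank the indicators by their
# Pearson correlation with world_rank.
corr_with_rank = (
    toy.select_dtypes(include="number")
    .corr()["world_rank"]
    .drop("world_rank")
    .sort_values(ascending=False)
)
print(corr_with_rank)
```

select_dtypes(include="number") is used here instead of numeric_only=True only because it behaves the same on older and newer pandas versions; the two give the same matrix.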

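Analysis of the 2013 Data

The 2013 list has nearly the same structure as the 2012 list and the analysis follows the same steps, so the process is not described again.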
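Since the same steps repeat for every year, they could be wrapped in one function. This is a sketch only: `prepare_year` is a hypothetical helper that does not appear in the original notebook, and the four-row `toy` frame is a hand-typed stand-in for df.

```python
import numpy as np
import pandas as pd


def prepare_year(df, year):
    """Hypothetical helper: one year's rows, ready for the correlation heatmap."""
    subset = df[df["year"] == year].drop(columns="year")
    # Drop columns that are entirely NaN for this year (broad_impact in 2012 and 2013).
    subset = subset.dropna(axis=1, how="all")
    # national_rank and score are left out of the correlation analysis.
    return subset.drop(columns=["national_rank", "score"])


# Four rows modeled on the tables in this article.
toy = pd.DataFrame({
    "world_rank": [1, 2, 1, 2],
    "institution": ["Harvard University", "Stanford University"] * 2,
    "country": ["USA"] * 4,
    "national_rank": [1, 2, 1, 2],
    "quality_of_education": [1, 11, 1, 11],
    "broad_impact": [np.nan, np.nan, 1.0, 4.0],
    "score": [100.00, 93.94, 100.00, 99.09],
    "year": [2013, 2013, 2014, 2014],
})

data_2013_toy = prepare_year(toy, 2013)
data_2014_toy = prepare_year(toy, 2014)
print(list(data_2013_toy.columns))
print(list(data_2014_toy.columns))
```

Using dropna(axis=1, how="all") rather than naming broad_impact means the same function keeps that column for 2014 and 2015, where it is populated.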

data_2013 = df[df['year'] == 2013].drop('year', axis=1)
data_2013
     world_rank  institution                            country         national_rank  quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  broad_impact  patents   score
100           1  Harvard University                     USA                         1                     1                  1                   1             1          1          1           NaN        7  100.00
101           2  Stanford University                    USA                         2                    11                  2                   4             6          2          2           NaN       11   93.94
102           3  University of Oxford                   United Kingdom              1                     7                 12                  10            11          7         13           NaN       15   92.54
103           4  Massachusetts Institute of Technology  USA                         3                     2                 16                   2            16          3          3           NaN        1   91.45
104           5  University of Cambridge                United Kingdom              2                     3                 15                   5             9         11         10           NaN       39   90.24
...
195          96  Australian National University         Australia                   2                   101                101                  43           101        101        101           NaN      101   44.50
196          97  University of Alberta                  Canada                      4                   101                101                 101            68        101         92           NaN       81   44.50
197          98  University of Helsinki                 Finland                     1                    69                101                  81            74         79         71           NaN      101   44.39
198          99  Paris Diderot University - Paris 7     France                      5                    28                101                  72           101         87        101           NaN      101   44.36
199         100  Georgia Institute of Technology        USA                        57                   101                 85                 101            97        101         43           NaN       32   44.26

100 rows × 13 columns

data_2013.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 100 to 199
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   world_rank            100 non-null    int64
 1   institution           100 non-null    object
 2   country               100 non-null    object
 3   national_rank         100 non-null    int64
 4   quality_of_education  100 non-null    int64
 5   alumni_employment     100 non-null    int64
 6   quality_of_faculty    100 non-null    int64
 7   publications          100 non-null    int64
 8   influence             100 non-null    int64
 9   citations             100 non-null    int64
 10  broad_impact          0 non-null      float64
 11  patents               100 non-null    int64
 12  score                 100 non-null    float64
dtypes: float64(2), int64(9), object(2)
memory usage: 10.9+ KB
data_2013['country'].value_counts()
USA               57
United Kingdom     7
Japan              6
France             5
Switzerland        4
Canada             4
Israel             4
Australia          2
Germany            2
South Korea        1
Russia             1
Denmark            1
Singapore          1
Netherlands        1
Finland            1
Norway             1
Sweden             1
Italy              1
Name: country, dtype: int64

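The 2013 top 100 universities come from 18 countries and regions. Singapore and Russia, neither of which had a university in the 2012 top 100, each have one university in the 2013 list. Once again, no Chinese university appears in the list.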
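The newcomers can be found with a set difference rather than by comparing the two printouts by eye. In this sketch the country names are copied from the two value_counts() outputs in this article; the variable names are illustrative, and in the notebook the two sets would come from set(data_2012['country']) and set(data_2013['country']).

```python
import pandas as pd

# Country names copied from the 2012 and 2013 value_counts() outputs.
countries_2012 = pd.Series([
    "USA", "United Kingdom", "Japan", "France", "Israel", "Switzerland",
    "Canada", "Germany", "Australia", "Netherlands", "Italy", "Norway",
    "Denmark", "South Korea", "Finland", "Sweden",
])
countries_2013 = pd.Series([
    "USA", "United Kingdom", "Japan", "France", "Switzerland", "Canada",
    "Israel", "Australia", "Germany", "South Korea", "Russia", "Denmark",
    "Singapore", "Netherlands", "Finland", "Norway", "Sweden", "Italy",
])

# Countries present in one year's top 100 but not in the other's.
new_in_2013 = sorted(set(countries_2013) - set(countries_2012))
gone_in_2013 = sorted(set(countries_2012) - set(countries_2013))
print("new in 2013:", new_in_2013)
print("no longer present in 2013:", gone_in_2013)
```

The second difference comes back empty: every country represented in 2012 is still represented in 2013.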

# drop() returns a new DataFrame, so no explicit copy is needed
data_2013 = data_2013.drop('broad_impact', axis=1)
data_2013
     world_rank  institution                            country         national_rank  quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  patents   score
100           1  Harvard University                     USA                         1                     1                  1                   1             1          1          1        7  100.00
101           2  Stanford University                    USA                         2                    11                  2                   4             6          2          2       11   93.94
102           3  University of Oxford                   United Kingdom              1                     7                 12                  10            11          7         13       15   92.54
103           4  Massachusetts Institute of Technology  USA                         3                     2                 16                   2            16          3          3        1   91.45
104           5  University of Cambridge                United Kingdom              2                     3                 15                   5             9         11         10       39   90.24
...
195          96  Australian National University         Australia                   2                   101                101                  43           101        101        101      101   44.50
196          97  University of Alberta                  Canada                      4                   101                101                 101            68        101         92       81   44.50
197          98  University of Helsinki                 Finland                     1                    69                101                  81            74         79         71      101   44.39
198          99  Paris Diderot University - Paris 7     France                      5                    28                101                  72           101         87        101      101   44.36
199         100  Georgia Institute of Technology        USA                        57                   101                 85                 101            97        101         43       32   44.26

100 rows × 12 columns

data_2013_world_rank = data_2013.copy()
data_2013_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2013_world_rank
     world_rank  institution                            country         quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  patents
100           1  Harvard University                     USA                                1                  1                   1             1          1          1        7
101           2  Stanford University                    USA                               11                  2                   4             6          2          2       11
102           3  University of Oxford                   United Kingdom                     7                 12                  10            11          7         13       15
103           4  Massachusetts Institute of Technology  USA                                2                 16                   2            16          3          3        1
104           5  University of Cambridge                United Kingdom                     3                 15                   5             9         11         10       39
...
195          96  Australian National University         Australia                        101                101                  43           101        101        101      101
196          97  University of Alberta                  Canada                           101                101                 101            68        101         92       81
197          98  University of Helsinki                 Finland                           69                101                  81            74         79         71      101
198          99  Paris Diderot University - Paris 7     France                            28                101                  72           101         87        101      101
199         100  Georgia Institute of Technology        USA                              101                 85                 101            97        101         43       32

100 rows × 10 columns

plt.figure(figsize=(10, 10))
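# numeric_only=True skips the text columns (institution, country); pandas 2.0+ raises an error without it
sns.heatmap(data_2013_world_rank.corr(numeric_only=True), annot=True)
<AxesSubplot:>

[Figure: correlation heatmap of the 2013 indicators]

In the 2013 list, the indicators most strongly correlated with world rank are Quality of Faculty, Influence, Citations, and Publications, among others, which is broadly consistent with the 2012 list.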

Analysis of the 2014 Data

As before, start by filtering the 2014 data.

data_2014 = df[df['year'] == 2014].drop('year', axis=1)
data_2014
      world_rank  institution                                country         national_rank  quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  broad_impact  patents   score
 200           1  Harvard University                         USA                         1                     1                  1                   1             1          1          1           1.0        2  100.00
 201           2  Stanford University                        USA                         2                    11                  2                   4             5          3          3           4.0        6   99.09
 202           3  Massachusetts Institute of Technology      USA                         3                     3                 11                   2            15          2          2           2.0        1   98.69
 203           4  University of Cambridge                    United Kingdom              1                     2                 10                   5            10          9         12          13.0       48   97.64
 204           5  University of Oxford                       United Kingdom              2                     7                 12                  10            11         12         11          12.0       16   97.51
 ...
1195         996  National Dong Hwa University               Taiwan                     24                   355                478                 210           901        934        800         989.0      737   44.24
1196         997  National Taipei University of Technology   Taiwan                     25                   355                478                 210           867        987        800         994.0      737   44.24
1197         998  Shaanxi Normal University                  China                      82                   355                478                 210           956        965        800         994.0      737   44.23
1198         999  National University of Defense Technology  China                      83                   355                478                 210           860        973        800         999.0      637   44.21
1199        1000  Yanbian University                         China                      84                   355                478                 210           890        790        800        1000.0      737   44.18

1000 rows × 13 columns

data_2014.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 200 to 1199
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   world_rank            1000 non-null   int64
 1   institution           1000 non-null   object
 2   country               1000 non-null   object
 3   national_rank         1000 non-null   int64
 4   quality_of_education  1000 non-null   int64
 5   alumni_employment     1000 non-null   int64
 6   quality_of_faculty    1000 non-null   int64
 7   publications          1000 non-null   int64
 8   influence             1000 non-null   int64
 9   citations             1000 non-null   int64
 10  broad_impact          1000 non-null   float64
 11  patents               1000 non-null   int64
 12  score                 1000 non-null   float64
dtypes: float64(2), int64(9), object(2)
memory usage: 109.4+ KB

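Compared with the previous two years, the 2014 list grows from 100 to 1,000 ranked universities, and the broad_impact column no longer has missing values. It remains to be seen whether adding this column changes which indicators matter most for world rank.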
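Both changes (more rows per year, broad_impact filled in) can be checked in one groupby. The sketch below runs on a small hand-made frame; `sample` and `per_year` are illustrative names, and in the notebook the same call would be made on df.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for df: a few rows per year.
sample = pd.DataFrame({
    "year": [2012, 2012, 2013, 2014, 2014, 2014, 2015, 2015],
    "broad_impact": [np.nan, np.nan, np.nan, 1.0, 4.0, 2.0, 1.0, 4.0],
})

# "size" counts every row in the group; "count" counts only non-missing values.
per_year = sample.groupby("year")["broad_impact"].agg(["size", "count"])
per_year["missing"] = per_year["size"] - per_year["count"]
print(per_year)
```

On the full dataset the size column would be expected to read 100, 100, 1000, 1000, matching the per-year info() outputs in this article.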

data_2014['country'].value_counts()
USA                     229
China                    84
Japan                    74
United Kingdom           64
Germany                  55
France                   50
Italy                    47
Spain                    41
South Korea              34
Canada                   32
Australia                27
Taiwan                   25
Brazil                   18
India                    15
Netherlands              13
Austria                  12
Sweden                   11
Belgium                  10
Turkey                   10
Finland                   9
Poland                    9
Switzerland               9
Ireland                   8
Iran                      8
Greece                    7
Portugal                  7
Israel                    7
New Zealand               6
Hong Kong                 6
Hungary                   6
Denmark                   5
South Africa              5
Norway                    5
Czech Republic            5
Chile                     4
Argentina                 4
Egypt                     4
Saudi Arabia              4
Thailand                  3
Russia                    3
Malaysia                  3
Slovenia                  2
Singapore                 2
Mexico                    2
Colombia                  2
Estonia                   1
Cyprus                    1
United Arab Emirates      1
Uganda                    1
Lebanon                   1
Romania                   1
Croatia                   1
Serbia                    1
Lithuania                 1
Bulgaria                  1
Uruguay                   1
Slovak Republic           1
Iceland                   1
Puerto Rico               1
Name: country, dtype: int64

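In 2014, the USA still accounts for a large share of the top 1,000 universities (22.9%). China (mainland) is second with 84 universities, followed by Japan, the United Kingdom, and Germany in third to fifth place. Hong Kong also has 6 universities in the list.

Next, let's see which universities from China (mainland) made the 2014 list and where they rank worldwide.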

data_2014_China = data_2014[data_2014['country'] == 'China']
data_2014_China
      world_rank  institution                                country  national_rank  quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  broad_impact  patents   score
 254          55  Peking University                          China                1                   355                 35                 210            65        155        250         155.0        7   55.30
 286          87  Tsinghua University                        China                2                   294                 63                 210            79        192        134         162.0       16   52.60
 388         189  Fudan University                           China                3                   355                126                 210           120        264        310         230.0      100   48.14
 394         195  Shanghai Jiao Tong University              China                4                   325                149                 210           102        250        250         234.0      138   48.02
 405         206  Zhejiang University                        China                5                   355                293                 210            86        318        493         290.0       94   47.76
 ...
1189         990  Harbin Medical University                  China               80                   355                478                 210           900        862        800         979.0      737   44.26
1190         991  Zhejiang Normal University                 China               81                   355                478                 210           905        932        800         979.0      737   44.26
1197         998  Shaanxi Normal University                  China               82                   355                478                 210           956        965        800         994.0      737   44.23
1198         999  National University of Defense Technology  China               83                   355                478                 210           860        973        800         999.0      637   44.21
1199        1000  Yanbian University                         China               84                   355                478                 210           890        790        800        1000.0      737   44.18

84 rows × 13 columns

# iterate over both columns together instead of hard-coding the row count and rebuilding the lists on every pass
for rank, name in zip(data_2014_China['world_rank'], data_2014_China['institution']):
    print('{:4d}: {}'.format(rank, name))
  55: Peking University
  87: Tsinghua University
 189: Fudan University
 195: Shanghai Jiao Tong University
 206: Zhejiang University
 217: Nanjing University
 242: Dalian University of Technology
 270: University of Science and Technology of China
 292: Sun Yat-sen University
 349: Nankai University
 354: Xiamen University
 370: Tongji University
 385: Beijing Normal University
 394: Tianjin University
 399: Xi'an Jiaotong University
 408: Huazhong University of Science and Technology
 412: Southeast University
 435: Jilin University
 438: Wuhan University
 457: Shandong University
 465: Central South University
 475: Harbin Institute of Technology
 498: South China University of Technology
 515: Sichuan University
 528: East China University of Science and Technology
 560: Beihang University
 571: Lanzhou University
 588: Hunan University
 599: University of Science and Technology Beijing
 610: Central China Normal University
 669: Northeast Normal University
 672: East China Normal University
 685: Beijing University of Chemical Technology
 700: China Agricultural University
 710: Nanjing University of Science and Technology
 713: Soochow University (Suzhou)
 715: Fuzhou University
 722: Shanghai University
 731: Donghua University
 751: Wuhan University of Technology
 761: Peking Union Medical College
 782: Second Military Medical University
 790: University of Electronic Science and Technology of China
 815: Chongqing University
 816: Beijing Institute of Technology
 818: Zhengzhou University
 824: Capital Medical University
 842: Nanjing Normal University
 846: Jinan University
 865: Nanjing University of Technology
 870: Ocean University of China
 877: Jiangnan University
 883: Huazhong Agricultural University
 886: Shanghai Normal University
 896: Nanjing Agricultural University
 907: Northeastern University (China)
 912: Northwest University (China)
 917: Nanjing Medical University
 921: Southwest University
 932: Fourth Military Medical University
 935: Nanjing University of Aeronautics and Astronautics
 941: Jiangsu University
 942: Third Military Medical University
 943: South China Normal University
 945: Xiangtan University
 946: Yangzhou University
 947: Northwestern Polytechnical University
 955: Southern Medical University
 960: Beijing Jiaotong University
 968: Hunan Normal University
 973: South China Agricultural University
 975: China Pharmaceutical University
 980: Xidian University
 981: Zhejiang University of Technology
 982: China Medical University (PRC)
 985: Beijing University of Technology
 986: Guangxi University
 987: Northwest A&F University
 988: Shanxi University
 990: Harbin Medical University
 991: Zhejiang Normal University
 998: Shaanxi Normal University
 999: National University of Defense Technology
1000: Yanbian University
plt.figure(figsize=(10, 10))
plt.hist(data_2014_China['world_rank'].tolist(), 10)
plt.xlabel('World Rank (2014)')
plt.ylabel('Count')
plt.show()

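[Figure: histogram of the 2014 world ranks of universities from China (mainland)]

The histogram shows that the 84 universities from China (mainland) cluster toward the bottom of the list: 41 of them, about half, rank between 800 and 1000, while only 4 are in the top 200.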
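The histogram can be backed up with exact counts per rank band using pd.cut. The ranks below are the 84 world ranks printed in the list above; the band edges (steps of 200) and the names `ranks_2014`, `bands`, and `band_counts` are choices made for this sketch. In the notebook the input would be data_2014_China['world_rank'].

```python
import pandas as pd

# World ranks of the 84 universities, copied from the printed list above.
ranks_2014 = [
    55, 87, 189, 195, 206, 217, 242, 270, 292, 349, 354, 370, 385, 394,
    399, 408, 412, 435, 438, 457, 465, 475, 498, 515, 528, 560, 571, 588,
    599, 610, 669, 672, 685, 700, 710, 713, 715, 722, 731, 751, 761, 782,
    790, 815, 816, 818, 824, 842, 846, 865, 870, 877, 883, 886, 896, 907,
    912, 917, 921, 932, 935, 941, 942, 943, 945, 946, 947, 955, 960, 968,
    973, 975, 980, 981, 982, 985, 986, 987, 988, 990, 991, 998, 999, 1000,
]

# Assign each rank to a band: (0, 200], (200, 400], ..., (800, 1000].
bands = pd.cut(pd.Series(ranks_2014), bins=[0, 200, 400, 600, 800, 1000])

# Count universities per band, ordered from the top band to the bottom band.
band_counts = bands.value_counts().sort_index()
print(band_counts)
```

The five bands hold 4, 11, 14, 14, and 41 universities respectively, so the last band alone accounts for roughly half of the 84.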

data_2014_world_rank = data_2014.copy()
data_2014_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2014_world_rank
      world_rank  institution                                country         quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  broad_impact  patents
 200           1  Harvard University                         USA                                1                  1                   1             1          1          1           1.0        2
 201           2  Stanford University                        USA                               11                  2                   4             5          3          3           4.0        6
 202           3  Massachusetts Institute of Technology      USA                                3                 11                   2            15          2          2           2.0        1
 203           4  University of Cambridge                    United Kingdom                     2                 10                   5            10          9         12          13.0       48
 204           5  University of Oxford                       United Kingdom                     7                 12                  10            11         12         11          12.0       16
 ...
1195         996  National Dong Hwa University               Taiwan                           355                478                 210           901        934        800         989.0      737
1196         997  National Taipei University of Technology   Taiwan                           355                478                 210           867        987        800         994.0      737
1197         998  Shaanxi Normal University                  China                            355                478                 210           956        965        800         994.0      737
1198         999  National University of Defense Technology  China                            355                478                 210           860        973        800         999.0      637
1199        1000  Yanbian University                         China                            355                478                 210           890        790        800        1000.0      737

1000 rows × 11 columns

plt.figure(figsize=(10, 10))
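# numeric_only=True skips the text columns (institution, country); pandas 2.0+ raises an error without it
sns.heatmap(data_2014_world_rank.corr(numeric_only=True), annot=True)
<AxesSubplot:>

[Figure: correlation heatmap of the 2014 indicators]

In the 2014 list, the indicators most strongly correlated with world rank are Broad Impact, Publications, and Influence, among others. Adding the broad_impact column changes which indicators stand out, and the correlations of the other indicators also differ from the previous two years. This may also be related to the list growing from 100 to 1,000 universities.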
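One way to separate the two possible explanations is to recompute the correlations on the top 100 of the 2014 list only, so that the comparison with 2012 and 2013 is like for like. The sketch below illustrates the idea on synthetic data, not on the real ranking: `toy_2014`, the noise levels, and the cap of 210 are assumptions, the cap being modeled on the tail rows above where many universities share the same quality_of_faculty value.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000
world_rank = np.arange(1, n + 1)

toy_2014 = pd.DataFrame({
    "world_rank": world_rank,
    # Noisy everywhere, but keeps separating universities down to rank 1000.
    "broad_impact": world_rank + rng.normal(0, 60, n),
    # Precise near the top, then capped: everyone further down shares one value.
    "quality_of_faculty": np.minimum(world_rank + rng.normal(0, 10, n), 210),
})


def corr_with_rank(frame):
    """Pearson correlation of every indicator with world_rank."""
    return frame.corr()["world_rank"].drop("world_rank")


comparison = pd.DataFrame({
    "top_100": corr_with_rank(toy_2014[toy_2014["world_rank"] <= 100]),
    "all_1000": corr_with_rank(toy_2014),
})
print(comparison.round(2))
```

In this toy setup the capped indicator leads within the top 100 but falls behind over all 1,000 rows, which shows that a change in list size alone can reorder the correlations. Whether that is what happens in the real data would need the same comparison run on data_2014_world_rank.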

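Analysis of the 2015 Data

The 2015 data has a similar structure to the 2014 data: 1,000 rows and no missing values. The analysis follows essentially the same steps as for 2014 and is not described again.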

data_2015 = df[df['year'] == 2015].drop('year', axis=1)
data_2015
      world_rank  institution                            country         national_rank  quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  broad_impact  patents   score
1200           1  Harvard University                     USA                         1                     1                  1                   1             1          1          1           1.0        3  100.00
1201           2  Stanford University                    USA                         2                     9                  2                   4             5          3          3           4.0       10   98.66
1202           3  Massachusetts Institute of Technology  USA                         3                     3                 11                   2            15          2          2           2.0        1   97.54
1203           4  University of Cambridge                United Kingdom              1                     2                 10                   5            11          6         12          13.0       48   96.81
1204           5  University of Oxford                   United Kingdom              2                     7                 13                  10             7         12          7           9.0       15   96.46
 ...
2195         996  University of the Algarve              Portugal                    7                   367                567                 218           926        845        812         969.0      816   44.03
2196         997  Alexandria University                  Egypt                       4                   236                566                 218           997        908        645         981.0      871   44.03
2197         998  Federal University of Ceará            Brazil                     18                   367                549                 218           830        823        812         975.0      824   44.03
2198         999  University of A Coruña                 Spain                      40                   367                567                 218           886        974        812         975.0      651   44.02
2199        1000  China Pharmaceutical University        China                      83                   367                567                 218           861        991        812         981.0      547   44.02

1000 rows × 13 columns

data_2015.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 1200 to 2199
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   world_rank            1000 non-null   int64
 1   institution           1000 non-null   object
 2   country               1000 non-null   object
 3   national_rank         1000 non-null   int64
 4   quality_of_education  1000 non-null   int64
 5   alumni_employment     1000 non-null   int64
 6   quality_of_faculty    1000 non-null   int64
 7   publications          1000 non-null   int64
 8   influence             1000 non-null   int64
 9   citations             1000 non-null   int64
 10  broad_impact          1000 non-null   float64
 11  patents               1000 non-null   int64
 12  score                 1000 non-null   float64
dtypes: float64(2), int64(9), object(2)
memory usage: 109.4+ KB
data_2015['country'].value_counts()
USA                     229
China                    83
Japan                    74
United Kingdom           65
Germany                  55
France                   49
Italy                    47
Spain                    40
South Korea              36
Canada                   33
Australia                27
Taiwan                   21
Brazil                   18
India                    16
Netherlands              13
Austria                  12
Sweden                   11
Belgium                  10
Turkey                   10
Poland                    9
Switzerland               9
Finland                   9
Iran                      8
Ireland                   8
Portugal                  7
Greece                    7
Israel                    7
Hong Kong                 6
New Zealand               6
Hungary                   6
South Africa              5
Czech Republic            5
Denmark                   5
Russia                    5
Norway                    5
Egypt                     4
Saudi Arabia              4
Chile                     4
Thailand                  3
Argentina                 3
Malaysia                  3
Romania                   2
Slovenia                  2
Mexico                    2
Colombia                  2
Singapore                 2
Cyprus                    1
Estonia                   1
United Arab Emirates      1
Uganda                    1
Lebanon                   1
Croatia                   1
Serbia                    1
Lithuania                 1
Bulgaria                  1
Uruguay                   1
Slovak Republic           1
Iceland                   1
Puerto Rico               1
Name: country, dtype: int64

The top five are still the USA, China (mainland), Japan, the United Kingdom, and Germany, the same as in 2014.

data_2015_China = data_2015[data_2015['country'] == 'China']
data_2015_China
      world_rank  institution                        country  national_rank  quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  broad_impact  patents   score
1255          56  Peking University                  China                1                   182                 38                 218            52        131        182         125.0       20   54.26
1277          78  Tsinghua University                China                2                   309                 73                 218            63        158         65         156.0       30   52.21
1379         180  Shanghai Jiao Tong University      China                3                   335                143                 218            81        267        212         209.0      198   47.96
1390         191  Zhejiang University                China                4                   367                309                 218            71        290        368         265.0      106   47.66
1394         195  Fudan University                   China                5                   367                203                 218           112        252        368         204.0      123   47.56
 ...
2183         984  Zhejiang University of Technology  China               79                   367                567                 218           858        991        812         958.0      672   44.04
2189         990  Henan Normal University            China               80                   367                567                 218           959        991        812         958.0      871   44.04
2190         991  Xidian University                  China               81                   367                542                 218           830        974        812         984.0      434   44.03
2192         993  Southwest Jiaotong University      China               82                   367                327                 218           937        962        812         998.0      861   44.03
2199        1000  China Pharmaceutical University    China               83                   367                567                 218           861        991        812         981.0      547   44.02

83 rows × 13 columns

# iterate over both columns together instead of hard-coding the row count and rebuilding the lists on every pass
for rank, name in zip(data_2015_China['world_rank'], data_2015_China['institution']):
    print('{:4d}: {}'.format(rank, name))
  56: Peking University
  78: Tsinghua University
 180: Shanghai Jiao Tong University
 191: Zhejiang University
 195: Fudan University
 239: University of Science and Technology of China
 244: Nanjing University
 277: Sun Yat-sen University
 305: Dalian University of Technology
 313: Xiamen University
 315: Nankai University
 331: Tianjin University
 389: Xi'an Jiaotong University
 391: Jilin University
 396: Beijing Normal University
 400: Huazhong University of Science and Technology
 410: Harbin Institute of Technology
 413: Shandong University
 415: Wuhan University
 420: Tongji University
 423: Peking Union Medical College
 438: Central South University
 439: South China University of Technology
 443: Southeast University
 477: East China University of Science and Technology
 479: Sichuan University
 496: Renmin University of China
 514: Lanzhou University
 530: University of Science and Technology Beijing
 553: Hunan University
 572: Central China Normal University
 586: East China Normal University
 615: Soochow University (Suzhou)
 627: China Agricultural University
 640: Northeast Normal University
 642: Beijing University of Chemical Technology
 659: Donghua University
 677: Wuhan University of Technology
 684: Shanghai University
 705: Beihang University
 706: Fuzhou University
 749: Beijing Institute of Technology
 774: Chongqing University
 776: Second Military Medical University
 783: University of Electronic Science and Technology of China
 788: Capital Medical University
 795: Zhengzhou University
 804: Nanjing University of Science and Technology
 810: Nanjing University of Technology
 828: Jiangnan University
 846: Ocean University of China
 858: Huazhong Agricultural University
 864: Nanjing Normal University
 866: Nanjing Agricultural University
 871: Northeastern University (China)
 877: Northwest University (China)
 883: Hefei University of Technology
 886: Nanjing Medical University
 888: Shanghai Normal University
 889: Southwest University
 895: Nanjing University of Aeronautics and Astronautics
 899: Fourth Military Medical University
 904: China University of Geosciences (Wuhan)
 924: South China Normal University
 943: Yangzhou University
 947: Jiangsu University
 948: Northwestern Polytechnical University
 949: Xiangtan University
 953: Harbin Engineering University
 955: Jinan University
 957: Third Military Medical University
 963: Southern Medical University
 969: South China Agricultural University
 972: Hunan Normal University
 975: Shenzhen University
 976: Tianjin Medical University
 977: Beijing University of Technology
 982: China Medical University (PRC)
 984: Zhejiang University of Technology
 990: Henan Normal University
 991: Xidian University
 993: Southwest Jiaotong University
1000: China Pharmaceutical University
plt.figure(figsize=(10, 10))
plt.hist(data_2015_China['world_rank'].tolist(), 10)
plt.xlabel('World Rank (2015)')
plt.ylabel('Count')
plt.show()

[Figure: histogram of the 2015 world ranks of universities from China (mainland)]

plt.figure(figsize=(10, 10))
sns.histplot(data_2014_China['world_rank'].tolist(), color='blue', label='2014')
sns.histplot(data_2015_China['world_rank'].tolist(), color='green', label='2015')
plt.legend()
plt.show()

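The 2015 list includes 83 universities from China (mainland), one fewer than in 2014. In the charts above the largest group still falls between rank 800 and 1000 (36 of the 83), but compared with 2014 the ranks of universities from China (mainland) show an overall upward trend.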
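The year-over-year movement can be measured per university by joining the two years on the institution name. The sketch below uses the five highest-ranked universities from the two printed lists, so the ranks are the article's own; the frame and column names are illustrative, and in the notebook the merge would be applied to data_2014_China and data_2015_China.

```python
import pandas as pd

names = [
    "Peking University", "Tsinghua University", "Fudan University",
    "Shanghai Jiao Tong University", "Zhejiang University",
]

# World ranks copied from the 2014 and 2015 lists in this article.
top_2014 = pd.DataFrame({"institution": names, "world_rank": [55, 87, 189, 195, 206]})
top_2015 = pd.DataFrame({"institution": names, "world_rank": [56, 78, 195, 180, 191]})

# Inner join on the name: universities present in only one year drop out.
merged = top_2014.merge(top_2015, on="institution", suffixes=("_2014", "_2015"))

# Negative change = moved up the list (a smaller rank number is better).
merged["change"] = merged["world_rank_2015"] - merged["world_rank_2014"]
improved = int((merged["change"] < 0).sum())
print(merged[["institution", "change"]])
print("moved up:", improved, "of", len(merged))
```

Three of these five universities moved up between 2014 and 2015 (Tsinghua by 9 places, Shanghai Jiao Tong and Zhejiang by 15 each), while Peking and Fudan slipped by 1 and 6 places. Running the same merge on all rows would put a number on the overall trend.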

data_2015_world_rank = data_2015.copy()
data_2015_world_rank.drop(['national_rank', 'score'], axis=1, inplace=True)
data_2015_world_rank
      world_rank  institution                            country         quality_of_education  alumni_employment  quality_of_faculty  publications  influence  citations  broad_impact  patents
1200           1  Harvard University                     USA                                1                  1                   1             1          1          1           1.0        3
1201           2  Stanford University                    USA                                9                  2                   4             5          3          3           4.0       10
1202           3  Massachusetts Institute of Technology  USA                                3                 11                   2            15          2          2           2.0        1
1203           4  University of Cambridge                United Kingdom                     2                 10                   5            11          6         12          13.0       48
1204           5  University of Oxford                   United Kingdom                     7                 13                  10             7         12          7           9.0       15
 ...
2195         996  University of the Algarve              Portugal                         367                567                 218           926        845        812         969.0      816
2196         997  Alexandria University                  Egypt                            236                566                 218           997        908        645         981.0      871
2197         998  Federal University of Ceará            Brazil                           367                549                 218           830        823        812         975.0      824
2198         999  University of A Coruña                 Spain                            367                567                 218           886        974        812         975.0      651
2199        1000  China Pharmaceutical University        China                            367                567                 218           861        991        812         981.0      547

1000 rows × 11 columns

plt.figure(figsize=(10, 10))
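# numeric_only=True skips the text columns (institution, country); pandas 2.0+ raises an error without it
sns.heatmap(data_2015_world_rank.corr(numeric_only=True), annot=True)
<AxesSubplot:>

[Figure: correlation heatmap of the 2015 indicators]

In the 2015 list, the indicators most strongly correlated with world rank are Broad Impact, Publications, and Influence, among others, the same as in 2014.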




http://www.chinasem.cn/article/588549
