Predicting Future CO2 Emissions with Machine Learning (Random Forest and XGBoost)


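Introduction:

CO2 emissions are among the most closely watched global environmental issues today. This article uses Python to analyze the CO2 emissions dataset published by OWID and tries to build machine learning models that predict future CO2 emission trends. We explore the emissions recorded in the dataset, look at how they evolve across countries and regions, and then fit two regression models to the data.

1. Dataset overview:

We use the CO2 emissions dataset provided by OWID (Our World in Data). It contains CO2 emissions for each country/region together with related economic and population data. The records run through 2020; the start year varies by entity (Afghanistan's rows begin in 1949, while the Africa aggregate shown in section 3 begins in 1884).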

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px

dataset = pd.read_csv('owid-co2-data.csv')
print(dataset.head())  # show the first 5 rows of the dataset
print(dataset.shape)   # show the number of rows and columns
  iso_code      country  year    co2  consumption_co2  co2_growth_prct  \
0      AFG  Afghanistan  1949  0.015              NaN              NaN
1      AFG  Afghanistan  1950  0.084              NaN            475.0
2      AFG  Afghanistan  1951  0.092              NaN              8.7
3      AFG  Afghanistan  1952  0.092              NaN              0.0
4      AFG  Afghanistan  1953  0.106              NaN             16.0

   co2_growth_abs  trade_co2  co2_per_capita  consumption_co2_per_capita  ...  \
0             NaN        NaN           0.002                         NaN  ...
1           0.070        NaN           0.011                         NaN  ...
2           0.007        NaN           0.012                         NaN  ...
3           0.000        NaN           0.012                         NaN  ...
4           0.015        NaN           0.013                         NaN  ...

   ghg_per_capita  methane  methane_per_capita  nitrous_oxide  \
0             NaN      NaN                 NaN            NaN
1             NaN      NaN                 NaN            NaN
2             NaN      NaN                 NaN            NaN
3             NaN      NaN                 NaN            NaN
4             NaN      NaN                 NaN            NaN

   nitrous_oxide_per_capita  population           gdp  \
0                       NaN   7624058.0           NaN
1                       NaN   7752117.0  9.421400e+09
2                       NaN   7840151.0  9.692280e+09
3                       NaN   7935996.0  1.001732e+10
4                       NaN   8039684.0  1.063052e+10

   primary_energy_consumption  energy_per_capita  energy_per_gdp
0                         NaN                NaN             NaN
1                         NaN                NaN             NaN
2                         NaN                NaN             NaN
3                         NaN                NaN             NaN
4                         NaN                NaN             NaN

[5 rows x 58 columns]
(25204, 58)
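
Entities in this file do not all start in the same year (Afghanistan's rows above begin in 1949, while the Africa aggregate printed later begins in 1884), so it can help to check coverage before filtering by year. The following is a minimal sketch on a small made-up frame that only borrows the column names `country`, `year` and `co2`; the helper name `year_coverage` and all values are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

def year_coverage(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the first year, last year and row count for each country."""
    return (
        frame.groupby("country")["year"]
        .agg(first_year="min", last_year="max", n_rows="count")
        .reset_index()
    )

# Toy data with OWID-style column names (values are made up).
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "year": [1949, 1950, 1951, 1884, 1885],
    "co2": [0.1, 0.2, 0.3, 1.0, 1.1],
})

coverage = year_coverage(toy)
print(coverage)
```

Applied to the real data, the same call would be `year_coverage(dataset)`.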

2. Data preprocessing:

df = dataset.drop(columns=[ 'consumption_co2','co2_growth_prct','co2_growth_abs' ])
df.head()
  iso_code      country  year    co2  trade_co2  co2_per_capita  consumption_co2_per_capita  share_global_co2  cumulative_co2  share_global_cumulative_co2  ...  \
0      AFG  Afghanistan  1949  0.015        NaN           0.002                         NaN               0.0           0.015                          0.0  ...
1      AFG  Afghanistan  1950  0.084        NaN           0.011                         NaN               0.0           0.099                          0.0  ...
2      AFG  Afghanistan  1951  0.092        NaN           0.012                         NaN               0.0           0.191                          0.0  ...
3      AFG  Afghanistan  1952  0.092        NaN           0.012                         NaN               0.0           0.282                          0.0  ...
4      AFG  Afghanistan  1953  0.106        NaN           0.013                         NaN               0.0           0.388                          0.0  ...

   ghg_per_capita  methane  methane_per_capita  nitrous_oxide  nitrous_oxide_per_capita  population           gdp  primary_energy_consumption  energy_per_capita  energy_per_gdp
0             NaN      NaN                 NaN            NaN                       NaN   7624058.0           NaN                         NaN                NaN             NaN
1             NaN      NaN                 NaN            NaN                       NaN   7752117.0  9.421400e+09                         NaN                NaN             NaN
2             NaN      NaN                 NaN            NaN                       NaN   7840151.0  9.692280e+09                         NaN                NaN             NaN
3             NaN      NaN                 NaN            NaN                       NaN   7935996.0  1.001732e+10                         NaN                NaN             NaN
4             NaN      NaN                 NaN            NaN                       NaN   8039684.0  1.063052e+10                         NaN                NaN             NaN

5 rows × 55 columns

df1 = df[['country', 'year','co2','coal_co2','cement_co2', 'flaring_co2','gas_co2','oil_co2', 'other_industry_co2','methane', 'nitrous_oxide', 'population' ]]
df1
           country  year     co2  coal_co2  cement_co2  flaring_co2  gas_co2  oil_co2  other_industry_co2  methane  nitrous_oxide  population
0      Afghanistan  1949   0.015     0.015         NaN          NaN      NaN      NaN                 NaN      NaN            NaN   7624058.0
1      Afghanistan  1950   0.084     0.021         NaN          NaN      NaN    0.063                 NaN      NaN            NaN   7752117.0
2      Afghanistan  1951   0.092     0.026         NaN          NaN      NaN    0.066                 NaN      NaN            NaN   7840151.0
3      Afghanistan  1952   0.092     0.032         NaN          NaN      NaN    0.060                 NaN      NaN            NaN   7935996.0
4      Afghanistan  1953   0.106     0.038         NaN          NaN      NaN    0.068                 NaN      NaN            NaN   8039684.0
...            ...   ...     ...       ...         ...          ...      ...      ...                 ...      ...            ...         ...
25199     Zimbabwe  2016  10.738     6.959       0.639          NaN      NaN    3.139                 NaN    11.92           6.55  14030338.0
25200     Zimbabwe  2017   9.582     5.665       0.678          NaN      NaN    3.239                 NaN      NaN            NaN  14236599.0
25201     Zimbabwe  2018  11.854     7.101       0.697          NaN      NaN    4.056                 NaN      NaN            NaN  14438812.0
25202     Zimbabwe  2019  10.949     6.020       0.697          NaN      NaN    4.232                 NaN      NaN            NaN  14645473.0
25203     Zimbabwe  2020  10.531     6.257       0.697          NaN      NaN    3.576                 NaN      NaN            NaN  14862927.0

25204 rows × 12 columns

final_df = df1[df1['year' ]>1995]
final_df
           country  year     co2  coal_co2  cement_co2  flaring_co2  gas_co2  oil_co2  other_industry_co2  methane  nitrous_oxide  population
47     Afghanistan  1996   1.165     0.007       0.047        0.022    0.308    0.780                 NaN     9.93           3.29  18853444.0
48     Afghanistan  1997   1.084     0.004       0.047        0.022    0.283    0.728                 NaN    10.60           3.59  19357126.0
49     Afghanistan  1998   1.029     0.004       0.047        0.022    0.265    0.691                 NaN    11.10           3.88  19737770.0
50     Afghanistan  1999   0.810     0.004       0.047        0.022    0.242    0.495                 NaN    11.87           4.15  20170847.0
51     Afghanistan  2000   0.758     0.004       0.010        0.022    0.224    0.498                 NaN    10.59           3.62  20779957.0
...            ...   ...     ...       ...         ...          ...      ...      ...                 ...      ...            ...         ...
25199     Zimbabwe  2016  10.738     6.959       0.639          NaN      NaN    3.139                 NaN    11.92           6.55  14030338.0
25200     Zimbabwe  2017   9.582     5.665       0.678          NaN      NaN    3.239                 NaN      NaN            NaN  14236599.0
25201     Zimbabwe  2018  11.854     7.101       0.697          NaN      NaN    4.056                 NaN      NaN            NaN  14438812.0
25202     Zimbabwe  2019  10.949     6.020       0.697          NaN      NaN    4.232                 NaN      NaN            NaN  14645473.0
25203     Zimbabwe  2020  10.531     6.257       0.697          NaN      NaN    3.576                 NaN      NaN            NaN  14862927.0

6073 rows × 12 columns

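# Keep a hand-picked set of entities with positive emissions.
# Note: 'Africa' is a continent-level aggregate, and 'Antartica' is spelled as in the
# original code; it appears to match no rows (375 rows = 15 entities x 25 years).
selected = ['United States', 'Africa', 'Antartica', 'South Korea', 'Bangladesh',
            'Canada', 'Germany', 'Brazil', 'Argentina', 'Japan', 'India',
            'United Kingdom', 'Saudi Arabia', 'China', 'Australia', 'Russia']
final_df = final_df[final_df['country'].isin(selected) & (final_df['co2'] > 0)]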
final_df
             country  year       co2  coal_co2  cement_co2  flaring_co2   gas_co2   oil_co2  other_industry_co2  methane  nitrous_oxide   population
184           Africa  1996   783.254   353.130      27.681       23.787   108.019   270.637                 NaN      NaN            NaN  735361106.0
185           Africa  1997   812.903   360.837      28.350       23.394    95.205   305.117                 NaN      NaN            NaN  753737584.0
186           Africa  1998   838.022   355.514      29.203       22.961   112.712   317.632                 NaN      NaN            NaN  772437161.0
187           Africa  1999   830.397   366.523      30.311       23.569   114.377   295.618                 NaN      NaN            NaN  791504165.0
188           Africa  2000   886.562   370.247      31.510       55.282   114.350   315.173                 NaN      NaN            NaN  810984230.0
...              ...   ...       ...       ...         ...          ...       ...       ...                 ...      ...            ...          ...
24063  United States  2016  5248.024  1379.744      39.439       51.908  1502.475  2246.524              27.933   629.38          251.7  323015992.0
24064  United States  2017  5207.751  1338.667      40.324       56.186  1480.059  2265.326              27.190      NaN            NaN  325084758.0
24065  United States  2018  5375.491  1283.532      38.971       71.008  1641.041  2316.811              24.128      NaN            NaN  327096263.0
24066  United States  2019  5255.816  1098.854      40.896       84.510  1694.894  2313.372              23.291      NaN            NaN  329064917.0
24067  United States  2020  4712.771   888.649      40.795       84.510  1654.988  2020.538              23.291      NaN            NaN  331002647.0

375 rows × 12 columns

final_df.isnull().sum()
country                 0
year                    0
co2                     0
coal_co2               25
cement_co2              0
flaring_co2            77
gas_co2                 0
oil_co2                 0
other_industry_co2    125
methane                81
nitrous_oxide          81
population              0
dtype: int64
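
Raw counts are easier to judge as shares of the 375 filtered rows: `other_industry_co2` is missing in 125 of them (a third), `methane` and `nitrous_oxide` in 81 each (about 22%), `flaring_co2` in 77 and `coal_co2` in 25. A minimal sketch of computing such shares directly, shown on a small made-up frame; the helper name `missing_share` and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def missing_share(frame: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first."""
    return frame.isnull().mean().sort_values(ascending=False)

# Toy frame reusing a few OWID column names (values are made up).
toy = pd.DataFrame({
    "co2": [1.0, 2.0, 3.0, 4.0],
    "coal_co2": [0.5, np.nan, 1.5, 2.0],
    "other_industry_co2": [np.nan, np.nan, 0.1, np.nan],
})

shares = missing_share(toy)
print(shares)
```

On the real data this would be `missing_share(final_df)`; columns with a large share are candidates for dropping or for a deliberate fill strategy rather than silent row removal.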

3. Data visualization:

Next we plot the dataset and look at what the charts show. We start with CO2 emissions over time, one line per country:

px.line(dataset, x = 'year', y = 'co2', color='country')


# Drop rows without a co2 value; the bubble size below is taken from this column
dataset = dataset.dropna(subset=['co2'])
px.scatter(dataset[dataset['year'] == 2019], x="co2_per_capita", y="energy_per_capita", size="co2", color="country", hover_name="country", log_x=True, size_max=60)


continent_data =  dataset[(dataset['country'].isin(['Europe', 'Africa', 'North America', 'South America', 'Oceania', 'Asia'])) & (dataset['co2'] > 0)]
continent_data
       iso_code        country  year       co2  consumption_co2  co2_growth_prct  co2_growth_abs  trade_co2  co2_per_capita  consumption_co2_per_capita  ...  \
72          NaN         Africa  1884     0.022              NaN              NaN             NaN        NaN           0.005                         NaN  ...
73          NaN         Africa  1885     0.037              NaN            66.67           0.015        NaN           0.008                         NaN  ...
74          NaN         Africa  1886     0.048              NaN            30.00           0.011        NaN           0.010                         NaN  ...
75          NaN         Africa  1887     0.048              NaN             0.00           0.000        NaN           0.010                         NaN  ...
76          NaN         Africa  1888     0.081              NaN            69.23           0.033        NaN           0.017                         NaN  ...
...         ...            ...   ...       ...              ...              ...             ...        ...             ...                         ...  ...
20888       NaN  South America  2016  1164.898         1240.096            -3.32         -40.064     75.198           2.799                       2.980  ...
20889       NaN  South America  2017  1156.734         1238.620            -0.70          -8.164     81.886           2.755                       2.950  ...
20890       NaN  South America  2018  1091.450         1173.851            -5.64         -65.284     82.401           2.577                       2.771  ...
20891       NaN  South America  2019  1065.510         1139.737            -2.38         -25.940     74.228           2.494                       2.668  ...
20892       NaN  South America  2020   994.160              NaN            -6.70         -71.349        NaN           2.308                         NaN  ...

       ghg_per_capita  methane  methane_per_capita  nitrous_oxide  nitrous_oxide_per_capita   population  gdp  primary_energy_consumption  energy_per_capita  energy_per_gdp
72                NaN      NaN                 NaN            NaN                       NaN  130848603.0  NaN                         NaN                NaN             NaN
73                NaN      NaN                 NaN            NaN                       NaN  131563803.0  NaN                         NaN                NaN             NaN
74                NaN      NaN                 NaN            NaN                       NaN  132284841.0  NaN                         NaN                NaN             NaN
75                NaN      NaN                 NaN            NaN                       NaN  133011765.0  NaN                         NaN                NaN             NaN
76                NaN      NaN                 NaN            NaN                       NaN  133744628.0  NaN                         NaN                NaN             NaN
...               ...      ...                 ...            ...                       ...          ...  ...                         ...                ...             ...
20888             NaN      NaN                 NaN            NaN                       NaN  416164871.0  NaN                         NaN                NaN             NaN
20889             NaN      NaN                 NaN            NaN                       NaN  419903920.0  NaN                         NaN                NaN             NaN
20890             NaN      NaN                 NaN            NaN                       NaN  423581098.0  NaN                         NaN                NaN             NaN
20891             NaN      NaN                 NaN            NaN                       NaN  427199425.0  NaN                         NaN                NaN             NaN
20892             NaN      NaN                 NaN            NaN                       NaN  430759771.0  NaN                         NaN                NaN             NaN

1111 rows × 58 columns
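
Continent rows in this long format (one row per continent and year) can be pivoted into one column per continent, which makes it straightforward to compute each continent's share of the yearly total. A minimal sketch on made-up numbers; only the column names `country`, `year` and `co2` come from the dataset, and the variable names `wide` and `shares` are illustrative.

```python
import pandas as pd

# Toy long-format data: one row per (continent, year); values are made up.
toy = pd.DataFrame({
    "country": ["Africa", "Asia", "Africa", "Asia"],
    "year": [2019, 2019, 2020, 2020],
    "co2": [100.0, 300.0, 90.0, 310.0],
})

# One row per year, one column per continent.
wide = toy.pivot(index="year", columns="country", values="co2")

# Divide every row by its total to get each continent's share for that year.
shares = wide.div(wide.sum(axis=1), axis=0)
print(shares)
```

With the real data, `toy` would be replaced by `continent_data[['country', 'year', 'co2']]`.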

px.pie(final_df, names='country', values='co2')


final_df_2020 = final_df[final_df['year'] == 2020]
final_df_2020
# Grouped bars: one group per country, one bar per emission source
final_df_2020[['country', 'coal_co2', 'cement_co2', 'flaring_co2', 'gas_co2', 'oil_co2', 'other_industry_co2']].plot(x='country', kind='bar', figsize=(9, 5), width=0.9)
plt.title('2020 CO2 emissions by source')
plt.xlabel('Countries')
plt.ylabel('CO2 measured in million tonnes')


# Note: dataset.info is written without parentheses here, so this prints the bound
# method (which embeds the frame's repr) rather than the per-column summary that
# dataset.info() would give. The output below corresponds to the code as written.
print(dataset.info)
print(dataset.head())

<bound method DataFrame.info of       iso_code      country  year     co2  consumption_co2  co2_growth_prct  \
0          AFG  Afghanistan  1949   0.015              NaN              NaN
1          AFG  Afghanistan  1950   0.084              NaN           475.00
2          AFG  Afghanistan  1951   0.092              NaN             8.70
3          AFG  Afghanistan  1952   0.092              NaN             0.00
4          AFG  Afghanistan  1953   0.106              NaN            16.00
...        ...          ...   ...     ...              ...              ...
25199      ZWE     Zimbabwe  2016  10.738           12.153           -12.17
25200      ZWE     Zimbabwe  2017   9.582           11.248           -10.77
25201      ZWE     Zimbabwe  2018  11.854           13.163            23.72
25202      ZWE     Zimbabwe  2019  10.949           12.422            -7.64
25203      ZWE     Zimbabwe  2020  10.531              NaN            -3.82

       co2_growth_abs  trade_co2  co2_per_capita  consumption_co2_per_capita  \
0                 NaN        NaN           0.002                         NaN
1               0.070        NaN           0.011                         NaN
2               0.007        NaN           0.012                         NaN
3               0.000        NaN           0.012                         NaN
4               0.015        NaN           0.013                         NaN
...               ...        ...             ...                         ...
25199          -1.488      1.415           0.765                       0.866
25200          -1.156      1.666           0.673                       0.790
25201           2.273      1.308           0.821                       0.912
25202          -0.905      1.473           0.748                       0.848
25203          -0.418        NaN           0.709                         NaN

       ...  ghg_per_capita  methane  methane_per_capita  nitrous_oxide  \
0      ...             NaN      NaN                 NaN            NaN
1      ...             NaN      NaN                 NaN            NaN
2      ...             NaN      NaN                 NaN            NaN
3      ...             NaN      NaN                 NaN            NaN
4      ...             NaN      NaN                 NaN            NaN
...    ...             ...      ...                 ...            ...
25199  ...           4.703    11.92                0.85           6.55
25200  ...             NaN      NaN                 NaN            NaN
25201  ...             NaN      NaN                 NaN            NaN
25202  ...             NaN      NaN                 NaN            NaN
25203  ...             NaN      NaN                 NaN            NaN

       nitrous_oxide_per_capita  population           gdp  \
0                           NaN   7624058.0           NaN
1                           NaN   7752117.0  9.421400e+09
2                           NaN   7840151.0  9.692280e+09
3                           NaN   7935996.0  1.001732e+10
4                           NaN   8039684.0  1.063052e+10
...                         ...         ...           ...
25199                     0.467  14030338.0  2.096179e+10
25200                       NaN  14236599.0  2.194784e+10
25201                       NaN  14438812.0  2.271535e+10
25202                       NaN  14645473.0           NaN
25203                       NaN  14862927.0           NaN

       primary_energy_consumption  energy_per_capita  energy_per_gdp
0                             NaN                NaN             NaN
1                             NaN                NaN             NaN
2                             NaN                NaN             NaN
3                             NaN                NaN             NaN
4                             NaN                NaN             NaN
...                           ...                ...             ...
25199                        47.5           3385.574           1.889
25200                         NaN                NaN             NaN
25201                         NaN                NaN             NaN
25202                         NaN                NaN             NaN
25203                         NaN                NaN             NaN

[23949 rows x 58 columns]>
  iso_code      country  year    co2  consumption_co2  co2_growth_prct  \
0      AFG  Afghanistan  1949  0.015              NaN              NaN
1      AFG  Afghanistan  1950  0.084              NaN            475.0
2      AFG  Afghanistan  1951  0.092              NaN              8.7
3      AFG  Afghanistan  1952  0.092              NaN              0.0
4      AFG  Afghanistan  1953  0.106              NaN             16.0

   co2_growth_abs  trade_co2  co2_per_capita  consumption_co2_per_capita  ...  \
0             NaN        NaN           0.002                         NaN  ...
1           0.070        NaN           0.011                         NaN  ...
2           0.007        NaN           0.012                         NaN  ...
3           0.000        NaN           0.012                         NaN  ...
4           0.015        NaN           0.013                         NaN  ...

   ghg_per_capita  methane  methane_per_capita  nitrous_oxide  \
0             NaN      NaN                 NaN            NaN
1             NaN      NaN                 NaN            NaN
2             NaN      NaN                 NaN            NaN
3             NaN      NaN                 NaN            NaN
4             NaN      NaN                 NaN            NaN

   nitrous_oxide_per_capita  population           gdp  \
0                       NaN   7624058.0           NaN
1                       NaN   7752117.0  9.421400e+09
2                       NaN   7840151.0  9.692280e+09
3                       NaN   7935996.0  1.001732e+10
4                       NaN   8039684.0  1.063052e+10

   primary_energy_consumption  energy_per_capita  energy_per_gdp
0                         NaN                NaN             NaN
1                         NaN                NaN             NaN
2                         NaN                NaN             NaN
3                         NaN                NaN             NaN
4                         NaN                NaN             NaN

[5 rows x 58 columns]

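4. Data preprocessing for modeling:

Before fitting any model we need to preprocess the data: choose the feature columns of interest and deal with missing values. Here every row with a missing value in any selected column is dropped.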

features = ['year', 'population', 'gdp', 'primary_energy_consumption', 'energy_per_capita', 'energy_per_gdp']
target = 'co2'
data = dataset[features + [target]].dropna()
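
The `dropna()` call keeps only rows in which all six features and the target are present, so the sparsest column decides how much data survives. A minimal sketch of checking this before committing to a feature set, using a made-up frame; the helper name `rows_lost_per_column` and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def rows_lost_per_column(frame: pd.DataFrame, columns: list) -> pd.Series:
    """For each column, count the rows that have a missing value in it."""
    return frame[columns].isnull().sum()

# Toy frame with a few OWID-style columns (values are made up).
toy = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003],
    "population": [1.0, 1.1, np.nan, 1.3],
    "gdp": [np.nan, 5.0, np.nan, 6.0],
    "co2": [0.5, 0.6, 0.7, 0.8],
})
cols = ["year", "population", "gdp", "co2"]

complete = toy[cols].dropna()  # rows with no missing value in any column
print("rows before:", len(toy), "rows after dropna:", len(complete))
print(rows_lost_per_column(toy, cols))
```

On the real data, `rows_lost_per_column(dataset, features + [target])` shows which feature costs the most rows.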

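5. Machine learning modeling:

We build CO2 emission prediction models with random forest regression and XGBoost, and evaluate each by its mean squared error (MSE) on a held-out test set.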

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Split the dataset into training and test sets (80% / 20%)
X = data[features]
y = data[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Random forest regression
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
y_pred_rf = rf_model.predict(X_test)
mse_rf = mean_squared_error(y_test, y_pred_rf)
print("Random forest regression MSE:", mse_rf)

Random forest regression MSE: 5564.344690878827

from xgboost import XGBRegressor

# XGBoost regression
xgb_model = XGBRegressor(n_estimators=100, random_state=42)
xgb_model.fit(X_train, y_train)
y_pred_xgb = xgb_model.predict(X_test)
mse_xgb = mean_squared_error(y_test, y_pred_xgb)
print("XGBoost regression MSE:", mse_xgb)

# Inspect the predictions on the test set
results_rf = pd.DataFrame({'Actual': y_test, 'Predicted_RF': y_pred_rf})
results_xgb = pd.DataFrame({'Actual': y_test, 'Predicted_XGB': y_pred_xgb})
print("Random forest predictions:\n", results_rf.head())
print("XGBoost predictions:\n", results_xgb.head())

XGBoost regression MSE: 6550.179319235124
Random forest predictions:
         Actual  Predicted_RF
4678   1183.215    1140.30654
22293     2.536       3.88285
1552    187.609     171.56567
22409     7.440       9.76497
11248   129.475      76.18446
XGBoost predictions:
         Actual  Predicted_XGB
4678   1183.215    1219.047852
22293     2.536       5.083570
1552    187.609     184.427612
22409     7.440       9.800858
11248   129.475      98.655746
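
Two caveats may help when reading these numbers. First, MSE is in squared units; taking the square root of the two values above gives roughly 74.6 and 80.9 in the units of `co2`. Second, a random split mixes years, so the test set contains years the model has already seen for the same country, which says little about forecasting. The sketch below shows one possible alternative, not the article's method: a chronological split plus RMSE and R² written in plain NumPy. The helper names, the cutoff year and the toy values are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

def chronological_split(frame, year_col, cutoff_year):
    """Rows up to and including cutoff_year go to train, later rows to test."""
    train = frame[frame[year_col] <= cutoff_year]
    test = frame[frame[year_col] > cutoff_year]
    return train, test

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Toy series (values are made up) split at a hypothetical cutoff year.
toy = pd.DataFrame({
    "year": [2015, 2016, 2017, 2018, 2019, 2020],
    "co2": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
})
train, test = chronological_split(toy, "year", 2018)

# Naive baseline: repeat the last training value for every test year.
baseline = np.full(len(test), train["co2"].iloc[-1])
print("RMSE:", rmse(test["co2"], baseline))
print("R2:", r2(test["co2"], baseline))
```

With the real data, `train` and `test` would replace the output of `train_test_split`, and the fitted models' predictions would replace `baseline`. A negative R² on the later years, as the naive baseline gets here, means the predictions are worse than simply predicting the test-set mean.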

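6. Conclusion:

Analyzing the CO2 emissions dataset and fitting machine learning models to it gives a clearer picture of global CO2 emissions and provides data to support future environmental protection and sustainable development work. Further research could explore how CO2 emissions relate to factors such as climate change and economic growth, and derive policy recommendations from those relationships.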
