These study notes cover K-nearest-neighbors regression with KNeighborsRegressor, standardization with StandardScaler, and computing the root-mean-squared error (RMSE) from mean_squared_error.
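Contents
- The np.abs() function
- Meaning of the DataFrame.sample() parameters
- Removing thousands separators and currency symbols together with Series.str
- Notes on StandardScaler
- The distance tools in scipy.spatial
- Distance between two points
- Distances between two sets of data
- Computing distances with sklearn
- Computing root-mean-squared error with sklearn
- Standardization with sklearn
- K-nearest-neighbors model
- Testing a multivariate KNN model
- K-nearest neighbors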
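The np.abs() function
np.abs(): computes the absolute value of each element of an array
import numpy as np
np.abs([1,-2,3,-4])
Out:
array([1, 2, 3, 4])
- np.sqrt(): computes the square root of each element
- np.square(): computes the square of each element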
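As a small illustration of the three functions working together, the sketch below applies them to a made-up array. The values are arbitrary and chosen only so the results are easy to check by hand.

```python
import numpy as np

errors = np.array([3.0, -4.0])      # made-up values for illustration

abs_errors = np.abs(errors)         # element-wise absolute value
sq_errors = np.square(errors)       # element-wise square
root = np.sqrt(sq_errors.sum())     # square root of the sum of squares

print(abs_errors)  # [3. 4.]
print(sq_errors)   # [ 9. 16.]
print(root)        # 5.0
```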
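Meaning of the DataFrame.sample() parameters
DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
The method is available on both Series and DataFrame.
- n is the number of items to sample (an integer); frac is the fraction of items to sample (a float). Only one of the two may be given.
- replace controls whether sampling is done with replacement. With replace=False (the default) each row can be drawn at most once; with replace=True the same row can be drawn several times. In both cases the original object is left unchanged and the sample is returned as a new object, so assign it to a variable.
- weights assigns a sampling weight to each element along the sampled axis, so its length must equal the length of that axis or an error is raised. Missing weights are treated as 0; weights that do not sum to 1 are normalized so that they do; inf and -inf are not allowed.
- axis is the direction to sample along: axis=0 samples rows, axis=1 samples columns.
- random_state is a seed that makes the sample reproducible.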
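The parameters above can be tried out on a small frame. The sketch below uses a made-up DataFrame (its column names and values are assumptions for illustration only) and shows one call per parameter.

```python
import pandas as pd

# Made-up frame used only for illustration
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

three_rows = df.sample(n=3, random_state=0)        # 3 rows, reproducible
half_rows = df.sample(frac=0.5, random_state=0)    # 50% of the rows
# Drawing more rows than df contains is only possible with replacement
with_repl = df.sample(n=15, replace=True, random_state=0)
one_col = df.sample(n=1, axis=1, random_state=0)   # sample a column instead of rows

# Weights must match the length of the sampled axis; they are rescaled to sum to 1.
# Rows with weight 0 can never be drawn.
weights = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
weighted = df.sample(n=3, weights=weights, random_state=0)

print(len(df))  # the original frame still has all 10 rows
```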
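Removing thousands separators and currency symbols together with Series.str
When a column holds strings with a currency symbol and thousands separators, such as "$10,000.00", it can be converted to float as follows, where s is the Series (a single DataFrame column). The dollar sign has to be escaped because a bare $ is the end-of-string anchor in a regular expression:
- s.str.replace(r"\$|,", "", regex=True).astype(float)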
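A minimal sketch of this conversion on a made-up Series (the price strings are invented for illustration):

```python
import pandas as pd

prices = pd.Series(["$10,000.00", "$85.50", "$1,250.00"])  # made-up values

# \$ matches a literal dollar sign and | means "or", so the pattern removes
# both "$" and ",". regex=True is passed explicitly because newer pandas
# versions treat the pattern as a literal string by default.
cleaned = prices.str.replace(r"\$|,", "", regex=True).astype(float)

print(cleaned.tolist())  # [10000.0, 85.5, 1250.0]
```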
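Notes on StandardScaler
- In StandardScaler().fit_transform(a), a must be 2-D and entirely numeric. A DataFrame that still contains non-numeric columns cannot be passed as is; select the specific numeric columns to scale instead, for example df[cols].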
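The note above can be illustrated with a small made-up frame that mixes a text column with numeric ones. The column names here are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up frame: "name" is non-numeric and must be left out of the scaling
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [10.0, 20.0, 30.0, 40.0],
})
num_cols = ["x", "y"]

scaled = df.copy()
# Scale only the numeric columns; each ends up with mean 0 and unit variance
scaled[num_cols] = StandardScaler().fit_transform(df[num_cols])

# A single column has to stay 2-D: df[["x"]] (a DataFrame), not df["x"] (a Series)
x_scaled = StandardScaler().fit_transform(df[["x"]])
print(x_scaled.shape)  # (4, 1)
```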
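The distance tools in scipy.spatial
from scipy.spatial import distance
Distance between two points
- Use distance.euclidean(a, b), where a and b are 1-D vectors
Distances between two sets of data
- Use distance.cdist(m[a], n[b]), where both arguments are 2-D; the result holds the distance from every row of the first to every row of the second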
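The two calls can be compared on a few made-up points, chosen so the distances are easy to verify by hand:

```python
import numpy as np
from scipy.spatial import distance

a = [0, 0]
b = [3, 4]
d = distance.euclidean(a, b)        # distance between two 1-D points
print(d)  # 5.0

# cdist takes two 2-D collections and returns every pairwise distance
XA = np.array([[0, 0], [1, 1]])
XB = np.array([[3, 4], [1, 1], [0, 1]])
D = distance.cdist(XA, XB)          # Euclidean metric by default
print(D.shape)  # (2, 3): one row per point in XA, one column per point in XB
```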
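Computing distances with sklearn
KNeighborsRegressor computes the distances internally and predicts from the nearest training rows:
from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor()
cols = ['A','B']
knn.fit(train_data[cols], train_data['C'])
ret = knn.predict(test_data[cols])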
Computing root-mean-squared error with sklearn
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(test_data['C'], ret)  # mean squared error
rmse = mse ** (1/2)  # its square root is the RMSE
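To make the relationship between MSE and RMSE concrete, here is a minimal sketch on made-up values where the result can be checked by hand; the arrays are assumptions for illustration and stand in for real targets and predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([1.0, 2.0, 3.0, 4.0])   # made-up targets
y_pred = np.array([1.0, 2.0, 3.0, 6.0])   # only the last prediction is off, by 2

mse = mean_squared_error(y_true, y_pred)   # (0 + 0 + 0 + 4) / 4
rmse = mse ** 0.5

# The same quantity computed by hand with NumPy
manual_rmse = np.sqrt(np.mean(np.square(y_true - y_pred)))

print(mse, rmse)  # 1.0 1.0
```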
Standardization with sklearn
from sklearn.preprocessing import StandardScaler
data[features] = StandardScaler().fit_transform(data[features])
Here features is the list of columns of data to standardize.
K-nearest-neighbors model
Testing a multivariate KNN model
import pandas as pd
from scipy.spatial import distance
from sklearn.preprocessing import StandardScaler

dc_data = pd.read_csv('listings.csv')
features = ['accommodates','bedrooms','bathrooms','beds','price','minimum_nights','maximum_nights','number_of_reviews']
dc_data = dc_data[features]
dc_data = dc_data.dropna()  # drop rows containing NaN
# Convert currency strings with thousands separators to float
dc_data['price'] = dc_data.price.str.replace(r'\$|,', '', regex=True).astype(float)
dc_data[features] = StandardScaler().fit_transform(dc_data[features])  # standardize
normalized_data = dc_data

# Build the training and test sets
norm_train_df = normalized_data.copy().iloc[:2750]
norm_test_df = normalized_data.copy().iloc[2750:]

# Function that predicts the price with K-nearest neighbors
def predict_price_multivariate(new_data_value, feature_columns):
    temp_df = norm_train_df.copy()  # work on a copy so the training set is not modified
    # Distance from the new row to every training row
    dists = distance.cdist(temp_df[feature_columns], [new_data_value[feature_columns]])
    temp_df['distance'] = dists.ravel()
    temp_df = temp_df.sort_values('distance')
    knn_5 = temp_df.price.iloc[:5]  # prices of the 5 nearest neighbors
    predicted_price = knn_5.mean()
    return predicted_price

cols = ['accommodates', 'bathrooms']
norm_test_df['predicted_price'] = norm_test_df[cols].apply(predict_price_multivariate, feature_columns=cols, axis=1)
norm_test_df['squared_error'] = (norm_test_df['predicted_price'] - norm_test_df['price'])**2
mse = norm_test_df['squared_error'].mean()
rmse = mse ** (1/2)
# Resulting RMSE (price was standardized with the other features, so this is in standardized units)
0.15274960512174854
K-nearest neighbors
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

features = ['accommodates','bedrooms','bathrooms','beds','price','minimum_nights','maximum_nights','number_of_reviews']
dc_data = pd.read_csv('listings.csv')
dc_data = dc_data[features]
dc_data['price'] = dc_data.price.str.replace(r'\$|,', '', regex=True).astype(float)
dc_data = dc_data.dropna()

# Standardize
dc_data[features] = StandardScaler().fit_transform(dc_data[features])
normalized_data = dc_data

norm_train_df = normalized_data.copy().iloc[:2750]
norm_test_df = normalized_data.copy().iloc[2750:]

# Every feature except the target, price
cols = ['accommodates','bedrooms','bathrooms','beds','minimum_nights','maximum_nights','number_of_reviews']
knn = KNeighborsRegressor(n_neighbors=4)  # the default n_neighbors is 5
knn.fit(norm_train_df[cols], norm_train_df['price'])
four_features_predictions = knn.predict(norm_test_df[cols])

four_features_mse = mean_squared_error(norm_test_df['price'], four_features_predictions)
four_features_rmse = four_features_mse ** (1/2)
four_features_rmse
0.823713617207827
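As a cross-check of the two approaches in these notes (a manual prediction built on distance.cdist and sklearn's KNeighborsRegressor), the sketch below runs both on synthetic random data and compares the predictions. The data is generated here and is not listings.csv, and the vectorized manual version is an illustrative assumption rather than the exact function used earlier.

```python
import numpy as np
from scipy.spatial import distance
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = rng.normal(size=50)
X_test = rng.normal(size=(10, 2))
k = 5

# Manual version: distances from every test row to every training row,
# then the mean target of the k closest training rows
D = distance.cdist(X_test, X_train)        # shape (10, 50)
nearest = np.argsort(D, axis=1)[:, :k]     # indices of the k nearest neighbors
manual_pred = y_train[nearest].mean(axis=1)

# sklearn version with the same k; uniform weights and Euclidean
# distance are its defaults
knn = KNeighborsRegressor(n_neighbors=k)
knn.fit(X_train, y_train)
sklearn_pred = knn.predict(X_test)

print(np.allclose(manual_pred, sklearn_pred))
```

If the two sets of predictions agree, the manual function and the sklearn model implement the same averaging over the k nearest neighbors, so differences in RMSE between runs come from the choice of features and k rather than from the method.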