Datacamp notes and code: Unsupervised Learning in Python, Chapter 3: Decorrelating your data and dimension reduction
More raw data, documents, and Jupyter notebooks:
Github: https://github.com/JinnyR/Datacamp_DataScienceTrack_Python
Datacamp track: Data Scientist with Python - Course 23 (3)
Exercise
Correlated data in nature
You are given an array grains giving the width and length of samples of grain. You suspect that width and length will be correlated. To confirm this, make a scatter plot of width vs length and measure their Pearson correlation.
Instruction
- Import:
  - matplotlib.pyplot as plt.
  - pearsonr from scipy.stats.
- Assign column 0 of grains to width and column 1 of grains to length.
- Make a scatter plot with width on the x-axis and length on the y-axis.
- Use the pearsonr() function to calculate the Pearson correlation of width and length.
import pandas as pd

# Load the grain measurements as a NumPy array (column 0: width, column 1: length)
grains = pd.read_csv('https://s3.amazonaws.com/assets.datacamp.com/production/course_2141/datasets/seeds-width-vs-length.csv', header=None).values

# Perform the necessary imports
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

# Assign the 0th column of grains: width
width = grains[:,0]

# Assign the 1st column of grains: length
length = grains[:,1]

# Scatter plot width vs length
plt.scatter(width, length)
plt.axis('equal')
plt.show()

# Calculate the Pearson correlation
correlation, pvalue = pearsonr(width, length)

# Display the correlation
print(correlation)
0.8604149377143467
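As a rough illustration of what pearsonr() measures, the coefficient can also be computed by hand as the covariance of the two variables divided by the product of their standard deviations. The sketch below is not part of the course material: the helper pearson_by_hand and the arrays demo_width and demo_length are hypothetical names, and the data is synthetic rather than the real grains dataset.

```python
import numpy as np
from scipy.stats import pearsonr


def pearson_by_hand(x, y):
    """Pearson r: sum of centered products over the product of centered norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center x
    yc = y - y.mean()  # center y
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


# Synthetic stand-in for the grain measurements (assumption: NOT the real data)
rng = np.random.default_rng(0)
demo_width = rng.normal(3.2, 0.4, size=200)
demo_length = 1.5 * demo_width + rng.normal(0.0, 0.3, size=200)

r_manual = pearson_by_hand(demo_width, demo_length)
r_scipy, _ = pearsonr(demo_width, demo_length)

# The two values should agree up to floating-point rounding
print(r_manual, r_scipy)
```

A value close to 1 indicates a strong positive linear relationship, which matches what the scatter plot in the exercise suggests for the grain measurements.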
Exercise
Decorrelating the grain measurements with PCA
You observed in the previous exercise that the width and length measurements of the grain are correlated. Now, you’ll use PCA to decorrelate these measurements, then plot the decorrelated points and measure their Pearson correlation.
Instruction
- Import PCA from sklearn.decomposition.
- Create an instance of PCA called model.
- Use the .fit_transform() method of model to apply the PCA transformation to grains. Assign the result to pca_features.
- The subsequent code to extract, plot, and compute the Pearson correlation of the first two columns of pca_features has been written for you, so hit ‘Submit Answer’ to see the result!
# Import PCA
from sklearn.decomposition import PCA

# Create PCA instance: model
model = PCA()

# Apply the fit_transform method of model to grains: pca_features
pca_features = model.fit_transform(grains)

# Assign 0th column of pca_features: xs
xs = pca_features[:,0]

# Assign 1st column of pca_features: ys
ys = pca_features[:,1]

# Scatter plot xs vs ys
plt.scatter(xs, ys)
plt.axis('equal')
plt.show()

# Calculate the Pearson correlation of xs and ys
correlation, pvalue = pearsonr(xs, ys)

# Display the correlation
print(correlation)
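The decorrelating effect of PCA can be checked without downloading the dataset. The following is a minimal, self-contained sketch on synthetic data. The arrays demo_grains and demo_features are hypothetical stand-ins, not the course's grains array, so the numbers it prints will differ from those of the exercise.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA

# Synthetic, strongly correlated two-column data (assumption: NOT the real grains)
rng = np.random.default_rng(42)
demo_width = rng.normal(3.2, 0.4, size=200)
demo_length = 1.5 * demo_width + rng.normal(0.0, 0.3, size=200)
demo_grains = np.column_stack([demo_width, demo_length])

# Correlation of the raw columns
corr_before, _ = pearsonr(demo_grains[:, 0], demo_grains[:, 1])

# PCA centers the data and rotates it onto its principal axes
demo_features = PCA().fit_transform(demo_grains)

# Correlation of the PCA features: zero up to floating-point error
corr_after, _ = pearsonr(demo_features[:, 0], demo_features[:, 1])

print(f"before PCA: {corr_before:.3f}")
print(f"after PCA:  {corr_after:.3e}")
```

The second value is not exactly zero only because of floating-point rounding. The PCA features are uncorrelated by construction, since the principal axes are the directions along which the sample covariance matrix is diagonal.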