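This article presents notes and code for Chapter 1, Clustering for dataset exploration, of the Datacamp course Unsupervised Learning in Python. It is intended as a reference for developers working through the same exercises.
More raw data files and Jupyter notebooks: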
Github: https://github.com/JinnyR/Datacamp_DataScienceTrack_Python
Datacamp track: Data Scientist with Python - Course 23 (1)
Exercise
Clustering 2D points
From the scatter plot of the previous exercise, you saw that the points seem to separate into 3 clusters. You’ll now create a KMeans model to find 3 clusters, and fit it to the data points from the previous exercise. After the model has been fit, you’ll obtain the cluster labels for some new points using the .predict() method.
You are given the array points from the previous exercise, and also an array new_points.
Instruction
- Import KMeans from sklearn.cluster.
- Using KMeans(), create a KMeans instance called model to find 3 clusters. To specify the number of clusters, use the n_clusters keyword argument.
- Use the .fit() method of model to fit the model to the array of points points.
- Use the .predict() method of model to predict the cluster labels of new_points, assigning the result to labels.
- Hit 'Submit Answer' to see the cluster labels of new_points.
import pandas as pd

df = pd.read_csv('https://s3.amazonaws.com/assets.datacamp.com/production/course_2072/datasets/3-point-clouds-in-2d.csv', header=None)
data = df.values
N = 300
points = data[:N,:]
new_points = data[N:,:]
# Import KMeans
from sklearn.cluster import KMeans

# Create a KMeans instance with 3 clusters: model
model = KMeans(n_clusters=3)

# Fit model to points
model.fit(points)

# Determine the cluster labels of new_points: labels
labels = model.predict(new_points)

# Print cluster labels of new_points
print(labels)
[0 2 1 0 2 0 2 2 2 1 0 2 2 1 1 2 1 1 2 2 1 2 0 2 0 1 2 1 1 0 0 2 2 2 1 0 2
 2 0 2 1 0 0 1 0 2 1 1 2 2 2 2 1 1 0 0 1 1 1 0 0 2 2 2 0 2 1 2 0 1 0 0 0 2
 0 1 1 0 2 1 0 1 0 2 1 2 1 0 2 2 2 0 2 2 0 1 1 1 1 0 2 0 1 1 0 0 2 0 1 1 0
 1 1 1 2 2 2 2 1 1 2 0 2 1 2 0 1 2 1 1 2 1 2 1 0 2 0 0 2 1 0 2 0 0 1 2 2 0
 1 0 1 2 0 1 1 0 1 2 2 1 2 1 1 2 2 0 2 2 1 0 1 0 0 2 0 2 2 0 0 1 0 0 0 1 2
 2 0 1 0 1 1 2 2 2 0 2 2 2 1 1 0 2 0 0 0 1 2 2 2 2 2 2 1 1 2 1 1 1 1 2 1 1
 2 2 0 1 0 0 1 0 1 0 1 2 2 1 2 2 2 1 0 0 1 2 2 1 2 1 1 2 1 1 0 1 0 0 0 2 1
 1 1 0 2 0 1 0 1 1 2 0 0 0 1 2 2 2 0 2 1 1 2 0 0 1 0 0 1 0 2 0 1 1 1 1 2 1
 1 2 2 0]
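To make the role of .predict() more concrete, the sketch below checks that the label it returns for each new point is the index of the nearest cluster center. This is a minimal illustration and not part of the course solution: the data is a synthetic stand-in (three made-up 2D point clouds generated with NumPy), and the names centers, dists and nearest are introduced here only for the example. Note that the label numbers themselves are arbitrary and can differ between runs unless random_state is fixed.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the course data (assumption): three 2D point clouds.
rng = np.random.default_rng(0)
centers = np.array([[-1.5, -1.0], [0.0, 1.0], [1.5, -1.0]])
data = np.vstack([c + 0.3 * rng.standard_normal((200, 2)) for c in centers])
rng.shuffle(data)  # shuffle the rows in place

# Same split as in the exercise: first 300 rows to fit, the rest to predict.
points, new_points = data[:300], data[300:]

model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(points)
labels = model.predict(new_points)

# Euclidean distance from every new point to every centroid, shape (n_new, 3).
diffs = new_points[:, None, :] - model.cluster_centers_[None, :, :]
dists = np.linalg.norm(diffs, axis=2)
nearest = dists.argmin(axis=1)

# Compare the predicted labels with the nearest-centroid indices.
print(np.array_equal(labels, nearest))
```

Fixing n_init and random_state here is only a choice to keep the sketch reproducible across scikit-learn versions; the exercise itself uses the defaults.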
Exercise
Inspect your clustering
Let’s now inspect the clustering you performed in the previous exercise!
A solution to the previous exercise has already run, so new_points is an array of points and labels is the array of their cluster labels.
Instruction
- Import matplotlib.pyplot as plt.
- Assign column 0 of new_points to xs, and column 1 of new_points to ys.
- Make a scatter plot of xs and ys, specifying the c=labels keyword argument to color the points by their cluster label. Also specify alpha=0.5.
- Compute the coordinates of the centroids using the .cluster_centers_ attribute of model.
- Assign column 0 of centroids to centroids_x, and column 1 of centroids to centroids_y.
- Make a scatter plot of centroids_x and centroids_y, using 'D' (a diamond) as a marker by specifying the marker parameter. Set the size of the markers to be 50 using s=50.
# Import pyplot
import matplotlib.pyplot as plt

# Assign the columns of new_points: xs and ys
xs = new_points[:,0]
ys = new_points[:,1]

# Make a scatter plot of xs and ys, using labels to define the colors
plt.scatter(xs, ys, c=labels, alpha=0.5)

# Assign the cluster centers: centroids
centroids = model.cluster_centers_

# Assign the columns of centroids: centroids_x, centroids_y
centroids_x = centroids[:,0]
centroids_y = centroids[:,1]

# Make a scatter plot of centroids_x and centroids_y
plt.scatter(centroids_x, centroids_y, marker='D', s=50)
plt.show()
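As a rough check on what .cluster_centers_ holds, the sketch below compares each centroid with the mean of the training points assigned to that cluster. It is an illustration under stated assumptions, not the course solution: the data is synthetic (three made-up 2D point clouds), and cluster_means and max_diff are names introduced only for this example. The difference should be close to zero once the fit has converged.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the course data (assumption): three 2D point clouds.
rng = np.random.default_rng(1)
centers = np.array([[-1.5, -1.0], [0.0, 1.0], [1.5, -1.0]])
points = np.vstack([c + 0.3 * rng.standard_normal((100, 2)) for c in centers])

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
centroids = model.cluster_centers_  # one row per cluster, one column per feature

# Mean of the training points that received each cluster label.
cluster_means = np.array(
    [points[model.labels_ == k].mean(axis=0) for k in range(3)]
)

# Largest absolute gap between a centroid coordinate and the matching mean.
max_diff = np.abs(centroids - cluster_means).max()
print(centroids.shape)
print(max_diff)
```

The columns of centroids are what the exercise slices into centroids_x and centroids_y before plotting.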