These notes walk through the code for Chapter 1, Clustering for dataset exploration, of the Datacamp course Unsupervised Learning in Python.
More raw data files and Jupyter notebooks:
Github: https://github.com/JinnyR/Datacamp_DataScienceTrack_Python
Datacamp track: Data Scientist with Python - Course 23 (1)
Exercise
Clustering 2D points
From the scatter plot of the previous exercise, you saw that the points seem to separate into 3 clusters. You’ll now create a KMeans model to find 3 clusters, and fit it to the data points from the previous exercise. After the model has been fit, you’ll obtain the cluster labels for some new points using the .predict() method.
You are given the array points from the previous exercise, and also an array new_points.
Instruction
- Import KMeans from sklearn.cluster.
- Using KMeans(), create a KMeans instance called model to find 3 clusters. To specify the number of clusters, use the n_clusters keyword argument.
- Use the .fit() method of model to fit the model to the array of points points.
- Use the .predict() method of model to predict the cluster labels of new_points, assigning the result to labels.
- Hit ‘Submit Answer’ to see the cluster labels of new_points.
import pandas as pd

df = pd.read_csv('https://s3.amazonaws.com/assets.datacamp.com/production/course_2072/datasets/3-point-clouds-in-2d.csv', header=None)
data = df.values
N = 300
points = data[:N,:]
new_points = data[N:,:]

# Import KMeans
from sklearn.cluster import KMeans
# Create a KMeans instance with 3 clusters: model
model = KMeans(n_clusters=3)
# Fit model to points
model.fit(points)
# Determine the cluster labels of new_points: labels
labels = model.predict(new_points)
# Print cluster labels of new_points
print(labels)
[0 2 1 0 2 0 2 2 2 1 0 2 2 1 1 2 1 1 2 2 1 2 0 2 0 1 2 1 1 0 0 2 2 2 1 0 2
 2 0 2 1 0 0 1 0 2 1 1 2 2 2 2 1 1 0 0 1 1 1 0 0 2 2 2 0 2 1 2 0 1 0 0 0 2
 0 1 1 0 2 1 0 1 0 2 1 2 1 0 2 2 2 0 2 2 0 1 1 1 1 0 2 0 1 1 0 0 2 0 1 1 0
 1 1 1 2 2 2 2 1 1 2 0 2 1 2 0 1 2 1 1 2 1 2 1 0 2 0 0 2 1 0 2 0 0 1 2 2 0
 1 0 1 2 0 1 1 0 1 2 2 1 2 1 1 2 2 0 2 2 1 0 1 0 0 2 0 2 2 0 0 1 0 0 0 1 2
 2 0 1 0 1 1 2 2 2 0 2 2 2 1 1 0 2 0 0 0 1 2 2 2 2 2 2 1 1 2 1 1 1 1 2 1 1
 2 2 0 1 0 0 1 0 1 0 1 2 2 1 2 2 2 1 0 0 1 2 2 1 2 1 1 2 1 1 0 1 0 0 0 2 1
 1 1 0 2 0 1 0 1 1 2 0 0 0 1 2 2 2 0 2 1 1 2 0 0 1 0 0 1 0 2 0 1 1 1 1 2 1
 1 2 2 0]
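To make the meaning of these labels concrete, here is a minimal sketch of what .predict() does for a fitted KMeans model: each new point gets the index of its nearest centroid. The sketch does not use the course CSV. It assumes synthetic data from make_blobs with made-up centers, and it fixes n_init and random_state only so the run is repeatable. The cluster numbering itself is arbitrary, so a fresh run on the course data may print the same grouping under different label numbers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the course data (an assumption, not the original CSV).
# The three centers below are hypothetical values chosen for illustration.
data, _ = make_blobs(
    n_samples=400,
    centers=[[-2, 0], [2, 0], [0, 3]],
    cluster_std=0.5,
    random_state=0,
)
points, new_points = data[:300], data[300:]

# n_init and random_state are fixed here only to make the run repeatable.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(points)
labels = model.predict(new_points)

# Euclidean distance from every new point to every centroid, shape (100, 3).
diffs = new_points[:, np.newaxis, :] - model.cluster_centers_[np.newaxis, :, :]
distances = np.linalg.norm(diffs, axis=2)

# The index of the closest centroid should match the predicted label.
manual_labels = distances.argmin(axis=1)
print(np.array_equal(labels, manual_labels))
```

The comparison at the end checks that the hand-computed nearest-centroid index agrees with the labels returned by the model.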
Exercise
Inspect your clustering
Let’s now inspect the clustering you performed in the previous exercise!
A solution to the previous exercise has already run, so new_points is an array of points and labels is the array of their cluster labels.
Instruction
- Import matplotlib.pyplot as plt.
- Assign column 0 of new_points to xs, and column 1 of new_points to ys.
- Make a scatter plot of xs and ys, specifying the c=labels keyword argument to color the points by their cluster label. Also specify alpha=0.5.
- Compute the coordinates of the centroids using the .cluster_centers_ attribute of model.
- Assign column 0 of centroids to centroids_x, and column 1 of centroids to centroids_y.
- Make a scatter plot of centroids_x and centroids_y, using 'D' (a diamond) as a marker by specifying the marker parameter. Set the size of the markers to be 50 using s=50.
# Import pyplot
import matplotlib.pyplot as plt
# Assign the columns of new_points: xs and ys
xs = new_points[:,0]
ys = new_points[:,1]
# Make a scatter plot of xs and ys, using labels to define the colors
plt.scatter(xs, ys, c=labels, alpha=0.5)
# Assign the cluster centers: centroids
centroids = model.cluster_centers_
# Assign the columns of centroids: centroids_x, centroids_y
centroids_x = centroids[:,0]
centroids_y = centroids[:,1]
# Make a scatter plot of centroids_x and centroids_y
plt.scatter(centroids_x, centroids_y, marker='D', s=50)
plt.show()
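For readers without access to the course data, the following is a self-contained sketch of the same inspection plot. It rests on several assumptions that are not part of the exercise: synthetic data from make_blobs with made-up centers, the non-interactive Agg backend so it runs without a display, fixed n_init and random_state, and a hypothetical output file name kmeans_clusters.png.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the course data (an assumption, not the original CSV).
data, _ = make_blobs(
    n_samples=400,
    centers=[[-2, 0], [2, 0], [0, 3]],  # hypothetical centers
    cluster_std=0.5,
    random_state=0,
)
points, new_points = data[:300], data[300:]

# Fit on the first 300 points, then label the remaining 100.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
labels = model.predict(new_points)
centroids = model.cluster_centers_

fig, ax = plt.subplots()
# New points, colored by their predicted cluster label.
ax.scatter(new_points[:, 0], new_points[:, 1], c=labels, alpha=0.5)
# Centroids drawn as diamonds of size 50 on top of the points.
ax.scatter(centroids[:, 0], centroids[:, 1], marker="D", s=50)
# Hypothetical file name; in a notebook, plt.show() would be used instead.
fig.savefig("kmeans_clusters.png")
```

If the clustering is reasonable, each diamond should sit near the middle of one colored group of points.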