Datacamp notes and code: Unsupervised Learning in Python, Chapter 2: Visualization with hierarchical clustering and t-SNE

Datacamp track: Data Scientist with Python - Course 23 (2)


Hierarchical clustering of the grain data

In the video, you learned that the SciPy linkage() function performs hierarchical clustering on an array of samples. Use the linkage() function to obtain a hierarchical clustering of the grain samples, and use dendrogram() to visualize the result. A sample of the grain measurements is provided in the array samples, while the variety of each grain sample is given by the list varieties.


  • Import:
    • linkage and dendrogram from scipy.cluster.hierarchy.
    • matplotlib.pyplot as plt.
  • Perform hierarchical clustering on samples using the linkage() function with the method='complete' keyword argument. Assign the result to mergings.
  • Plot a dendrogram using the dendrogram() function on mergings. Specify the keyword arguments labels=varieties, leaf_rotation=90, and leaf_font_size=6.
import pandas as pd
# Load the grain data (the file path is left blank in these notes)
df = pd.read_csv('', header=None)
# Keep a sample of 42 rows: every fifth row, starting at row 1
sample_indices = [5 * i + 1 for i in range(42)]
df = df.iloc[sample_indices]
samples = df[list(range(7))].values
varieties = list(df[7].map({1: 'Kama wheat', 2: 'Rosa wheat', 3: 'Canadian wheat'}))

# Perform the necessary imports
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Calculate the linkage: mergings
mergings = linkage(samples, method='complete')

# Plot the dendrogram, using varieties as labels
dendrogram(mergings, labels=varieties, leaf_rotation=90, leaf_font_size=6)
plt.show()
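
To see what linkage() returns, here is a minimal sketch on synthetic data. The names toy_samples, toy_labels, and toy_mergings are made up for illustration and are not the grain dataset. Passing no_plot=True lets the dendrogram layout be inspected without opening a figure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Synthetic stand-in for the grain measurements (assumption: not the real data).
# Two well-separated groups of 5 samples, 7 features each.
rng = np.random.default_rng(0)
toy_samples = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(5, 7)),
    rng.normal(loc=5.0, scale=0.5, size=(5, 7)),
])
toy_labels = ['A'] * 5 + ['B'] * 5

# Complete linkage: the distance between two clusters is the
# largest distance between any pair of their members.
toy_mergings = linkage(toy_samples, method='complete')

# n samples produce n - 1 merges. Each row holds:
# [index of cluster 1, index of cluster 2, merge distance, size of new cluster]
print(toy_mergings.shape)

# Compute the dendrogram layout only; 'ivl' lists the leaf labels left to right.
layout = dendrogram(toy_mergings, labels=toy_labels, no_plot=True)
print(layout['ivl'])
```

The last row of the linkage matrix describes the final merge, so its fourth entry equals the total number of samples. The heights in the third column are what the dendrogram draws on its vertical axis.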



Hierarchies of stocks

In chapter 1, you used k-means clustering to cluster companies according to their stock price movements. Now, you'll perform hierarchical clustering of the companies. You are given a NumPy array of price movements, movements, where the rows correspond to companies, and a list of the company names, companies. SciPy hierarchical clustering doesn't fit into a sklearn pipeline, so you'll need to use the normalize() function from sklearn.preprocessing instead of Normalizer.

linkage and dendrogram have already been imported from scipy.cluster.hierarchy, and PyPlot has been imported as plt.
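As a rough illustration of why normalize() matters here, the sketch below rescales a tiny hand-made array before clustering it. The array toy_movements and its values are invented for this example and are not the real stock data, and method='complete' is an assumption carried over from the grain exercise.

```python
import numpy as np
from sklearn.preprocessing import normalize
from scipy.cluster.hierarchy import linkage

# Hypothetical price movements: rows are companies, columns are days.
# Row 1 moves exactly like row 0, only ten times larger.
toy_movements = np.array([
    [1.0, -2.0, 2.0],
    [10.0, -20.0, 20.0],
    [-3.0, 0.0, 4.0],
])

# normalize() is a plain function (default: L2 norm per row), so it can be
# called directly on an array, outside of any sklearn pipeline.
toy_normalized = normalize(toy_movements)

# Every row now has unit length.
row_norms = np.linalg.norm(toy_normalized, axis=1)
print(row_norms)

# After rescaling, rows 0 and 1 coincide, so they merge first at distance 0.
toy_mergings = linkage(toy_normalized, method='complete')
print(toy_mergings[0])
```

Without the rescaling step, the two companies with the same pattern of movements would sit far apart simply because one stock's movements are larger in magnitude.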


  • Import normalize from sklearn.preprocessing.
  • Rescale the price movements for each stock by using the




