ConvergenceWarning: Number of distinct clusters (19) found smaller than n_clusters (20).
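When using KMeans for text clustering in PyCharm and searching for the best value of k, the console output turns red.
The red text starts at k = 20. Tracing back through the code, the likely cause is that the number of cluster centers is set too high: by the time k reaches 20 the error is already 0, so there is no point in asking for more centers than that. The message itself says that KMeans found only 19 distinct clusters when 20 were requested, which usually happens when the data contains fewer distinct points than n_clusters (for example, because some vectors are duplicates).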
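To make the cause concrete, here is a minimal sketch that reproduces the warning on toy data. The array `X` below is made up for illustration and is not the author's dataset; it has 6 rows but only 3 distinct points, and KMeans is asked for 4 clusters.

```python
import warnings

import numpy as np
from sklearn.cluster import KMeans
from sklearn.exceptions import ConvergenceWarning

# Toy data (an assumption for illustration): 6 rows, only 3 distinct points.
X = np.array([[0.0, 0.0], [0.0, 0.0],
              [5.0, 5.0], [5.0, 5.0],
              [9.0, 0.0], [9.0, 0.0]])

# Number of distinct rows, i.e. the largest k that can yield distinct clusters.
n_distinct = len(np.unique(X, axis=0))

# Ask for more clusters than there are distinct points and record any warnings.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# True if scikit-learn reported fewer distinct clusters than requested.
warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("distinct rows:", n_distinct)
print("ConvergenceWarning raised:", warned)
```

The fit still completes; the warning only signals that some of the requested clusters could not be distinct.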
Original code:
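import matplotlib.pyplot as plt

# df, transform() and train() are defined elsewhere in the project.
def k_determin():
    '''Test run for choosing the best parameter.'''
    dataset = df['论文摘要']
    print("%d documents" % len(dataset))
    X, vectorizer = transform(dataset, n_features=50)
    true_ks = []
    scores = []
    # The number of centers can be at most the size of the data; I have 34 records here.
    for i in range(3, 34, 1):
        score = train(X, vectorizer, true_k=i) / len(dataset)
        print(i, score)
        true_ks.append(i)
        scores.append(score)
    plt.figure(figsize=(8, 4))
    plt.plot(true_ks, scores, label="Error", color="blue", linewidth=1)
    # The x-axis is the number of clusters k, not n_features.
    plt.xlabel("k (number of clusters)")
    plt.ylabel("Error")
    plt.legend()
    plt.show()
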
k_determin()
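Modified code:
    # Stop before k = 20, where the warning first appeared.
    for i in range(3, 20, 1):
        score = train(X, vectorizer, true_k=i) / len(dataset)
        print(i, score)
        true_ks.append(i)
        scores.append(score)
After this change, the run no longer prints the red "ConvergenceWarning: Number of distinct clusters (19) found smaller than n_clusters (20)." message.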
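Instead of hard-coding the upper bound of 20, the range of k could be derived from the data. The following is a sketch under assumptions: `candidate_ks` is a hypothetical helper, not part of the original code, and it expects a dense 2-D array. If `transform` returns a SciPy sparse matrix, it would need to be converted first (for example with `X.toarray()`, which is reasonable for a few dozen documents).

```python
import numpy as np


def candidate_ks(X, k_min=3):
    """Return the k values worth trying: k_min up to the number of distinct rows.

    Hypothetical helper for illustration; X must be a dense 2-D array-like.
    """
    X = np.asarray(X)
    # Duplicate rows cannot form separate clusters, so count distinct rows only.
    n_distinct = len(np.unique(X, axis=0))
    return list(range(k_min, n_distinct + 1))


# Made-up example: 5 rows, 4 of them distinct.
demo = np.array([[1, 0], [1, 0], [0, 1], [2, 2], [3, 1]])
print(candidate_ks(demo))  # [3, 4]
```

In the loop above this would replace `range(3, 20, 1)` with something like `candidate_ks(X)`, so the bound follows the data rather than a number found by trial and error.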