决策树预测隐形眼镜的类型(python机器学习记录)

2024-06-08 13:08

本文主要是介绍决策树预测隐形眼镜的类型(python机器学习记录),希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

使用决策树预测隐形眼镜类型

tree.py(创建树的代码)
# -*- coding: UTF-8 -*-
from math import log
import operator#计算数据集的香农熵     熵越高数据越乱
def calcShannonEnt(dataSet):numEntries = len(dataSet)labelCounts = {}for featVec in dataSet:currentLabel = featVec[-1]if currentLabel not in labelCounts.keys(): labelCounts[currentLabel] = 0labelCounts[currentLabel] += 1shannonEnt = 0.0for key in labelCounts:prob = float(labelCounts[key]) / numEntriesshannonEnt -= prob * log(prob, 2)return shannonEnt#划分数据集
def splitDataSet(dataSet,axis,value):retDataSet=[]for featVec in dataSet:if featVec[axis]==value:reducedFeatVec=featVec[:axis]reducedFeatVec.extend(featVec[axis+1:])retDataSet.append(reducedFeatVec)# print retDataSetreturn retDataSet#选择最好的数据集划分方式
def chooseBestFeatureTosplit(dataSet):numFeatures=len(dataSet[0])-1baseEntropy=calcShannonEnt(dataSet)bestInfoGain=0.0;bestFeature=-1for i in range(numFeatures):featList=[example[i] for example in dataSet]uniqueVals=set(featList)newEntorpy=0.0for value in uniqueVals:subDataSet =splitDataSet(dataSet,i,value)prob=len(subDataSet)/float(len(dataSet))newEntorpy+=prob*calcShannonEnt(subDataSet)infoGain=baseEntropy-newEntorpyif(infoGain>bestInfoGain):bestInfoGain=infoGainbestFeature=ireturn  bestFeature
def majorityCnt(classList):classCount={}for vote in classList:if vote not in classCount.keys(): classCount[vote] = 0classCount[vote] += 1sortedClassCount = sorted(classCount.iteritems(), key=operator.itemgetter(1), reverse=True)return sortedClassCount[0][0]#创建树
def createTree(dataSet,labels):classList = [example[-1] for example in dataSet]if classList.count(classList[0]) == len(classList):return classList[0]#stop splitting when all of the classes are equalif len(dataSet[0]) == 1: #stop splitting when there are no more features in dataSetreturn majorityCnt(classList)bestFeat = chooseBestFeatureTosplit(dataSet)bestFeatLabel = labels[bestFeat]myTree = {bestFeatLabel:{}}del(labels[bestFeat])featValues = [example[bestFeat] for example in dataSet]uniqueVals = set(featValues)for value in uniqueVals:subLabels = labels[:]       #copy all of labels, so trees don't mess up existing labelsmyTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value),subLabels)return myTree

treePlotter.py(绘制树的代码)

# -*- coding: UTF-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('gbk')import matplotlib.pyplot as pltdecisionNode = dict(boxstyle="sawtooth", fc="0.8")
leafNode = dict(boxstyle="round4", fc="0.8")
arrow_args = dict(arrowstyle="<-")#获取叶节点的数目
def getNumLeafs(myTree):numLeafs = 0firstStr = myTree.keys()[0]secondDict = myTree[firstStr]for key in secondDict.keys():if type(secondDict[key]).__name__ == 'dict':  # test to see if the nodes are dictonaires, if not they are leaf nodesnumLeafs += getNumLeafs(secondDict[key])else:numLeafs += 1return numLeafs#获取叶节点的层数
def getTreeDepth(myTree):maxDepth = 0firstStr = myTree.keys()[0]secondDict = myTree[firstStr]for key in secondDict.keys():if type(secondDict[key]).__name__ == 'dict':  # test to see if the nodes are dictonaires, if not they are leaf nodesthisDepth = 1 + getTreeDepth(secondDict[key])else:thisDepth = 1if thisDepth > maxDepth: maxDepth = thisDepthreturn maxDepthdef plotNode(nodeTxt, centerPt, parentPt, nodeType):createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction',xytext=centerPt, textcoords='axes fraction',va="center", ha="center", bbox=nodeType, arrowprops=arrow_args)def plotMidText(cntrPt, parentPt, txtString):xMid = (parentPt[0] - cntrPt[0]) / 2.0 + cntrPt[0]yMid = (parentPt[1] - cntrPt[1]) / 2.0 + cntrPt[1]createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center", rotation=30)def plotTree(myTree, parentPt, nodeTxt):  # if the first key tells you what feat was split onnumLeafs = getNumLeafs(myTree)  # this determines the x width of this treedepth = getTreeDepth(myTree)firstStr = myTree.keys()[0]  # the text label for this node should be thiscntrPt = (plotTree.xOff + (1.0 + float(numLeafs)) / 2.0 / plotTree.totalW, plotTree.yOff)plotMidText(cntrPt, parentPt, nodeTxt)plotNode(firstStr, cntrPt, parentPt, decisionNode)secondDict = myTree[firstStr]plotTree.yOff = plotTree.yOff - 1.0 / plotTree.totalDfor key in secondDict.keys():if type(secondDict[key]).__name__ == 'dict':  # test to see if the nodes are dictonaires, if not they are leaf nodesplotTree(secondDict[key], cntrPt, str(key))  # recursionelse:  # it's a leaf node print the leaf nodeplotTree.xOff = plotTree.xOff + 1.0 / plotTree.totalWplotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode)plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key))plotTree.yOff = plotTree.yOff + 1.0 / plotTree.totalD#主函数
def createPlot(inTree):fig = plt.figure(1, facecolor='white')fig.clf()axprops = dict(xticks=[], yticks=[])createPlot.ax1 = plt.subplot(111, frameon=False, **axprops)  # no ticks# createPlot.ax1 = plt.subplot(111, frameon=False) #ticks for demo puropsesplotTree.totalW = float(getNumLeafs(inTree))plotTree.totalD = float(getTreeDepth(inTree))plotTree.xOff = -0.5 / plotTree.totalWplotTree.yOff = 1.0plotTree(inTree, (0.5, 1.0), '')plt.show()
  1. 收集数据:lenses.txt
    young	myope	no	reduced	no lenses
    young	myope	no	normal	soft
    young	myope	yes	reduced	no lenses
    young	myope	yes	normal	hard
    young	hyper	no	reduced	no lenses
    young	hyper	no	normal	soft
    young	hyper	yes	reduced	no lenses
    young	hyper	yes	normal	hard
    pre	myope	no	reduced	no lenses
    pre	myope	no	normal	soft
    pre	myope	yes	reduced	no lenses
    pre	myope	yes	normal	hard
    pre	hyper	no	reduced	no lenses
    pre	hyper	no	normal	soft
    pre	hyper	yes	reduced	no lenses
    pre	hyper	yes	normal	no lenses
    presbyopic	myope	no	reduced	no lenses
    presbyopic	myope	no	normal	no lenses
    presbyopic	myope	yes	reduced	no lenses
    presbyopic	myope	yes	normal	hard
    presbyopic	hyper	no	reduced	no lenses
    presbyopic	hyper	no	normal	soft
    presbyopic	hyper	yes	reduced	no lenses
    presbyopic	hyper	yes	normal	no lenses
    
  2. 准备数据:解析tab键分隔符的
  3. 分析数据:快速检测数据,确保正确的解析数据内容,使用createPlot()函数绘制最终的树形图。
  4. 训练算法:使用createTree()函数
  5. 测试算法:编写测试函数验证决策树可以正确分类给定的数据实例
  6. 使用算法:存储数的数据结构,以便下次使用时无需重新构造树
在导入tree.py和treePlotter.py的python交互模式下:
>>>import tree
>>>import treePlotter
>>>fr=open("lenses.txt")
>>>lenses=[inst.strip().split("\t") for inst in fr.readlines()]
>>>lensesLabels=["age","prescript","astigmatic","tearRte"]
>>>lensesTree=tree.createTree(lenses,lensesLabels)
>>>treePlotter.createPlot(lensesTree)

绘制的决策树:



参考书籍:机器学习实战





这篇关于决策树预测隐形眼镜的类型(python机器学习记录)的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1042270

相关文章

Python如何在Word中生成多种不同类型的图表

《Python如何在Word中生成多种不同类型的图表》Word文档中插入图表不仅能直观呈现数据,还能提升文档的可读性和专业性,本文将介绍如何使用Python在Word文档中创建和自定义各种图表,需要的... 目录在Word中创建柱形图在Word中创建条形图在Word中创建折线图在Word中创建饼图在Word

Python Excel实现自动添加编号

《PythonExcel实现自动添加编号》这篇文章主要为大家详细介绍了如何使用Python在Excel中实现自动添加编号效果,文中的示例代码讲解详细,感兴趣的小伙伴可以跟随小编一起学习一下... 目录1、背景介绍2、库的安装3、核心代码4、完整代码1、背景介绍简单的说,就是在Excel中有一列h=会有重复

国内环境搭建私有知识问答库踩坑记录(ollama+deepseek+ragflow)

《国内环境搭建私有知识问答库踩坑记录(ollama+deepseek+ragflow)》本文给大家利用deepseek模型搭建私有知识问答库的详细步骤和遇到的问题及解决办法,感兴趣的朋友一起看看吧... 目录1. 第1步大家在安装完ollama后,需要到系统环境变量中添加两个变量2. 第3步 “在cmd中

SpringBoot接收JSON类型的参数方式

《SpringBoot接收JSON类型的参数方式》:本文主要介绍SpringBoot接收JSON类型的参数方式,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录一、jsON二、代码准备三、Apifox操作总结一、JSON在学习前端技术时,我们有讲到过JSON,而在

Python FastAPI入门安装使用

《PythonFastAPI入门安装使用》FastAPI是一个现代、快速的PythonWeb框架,用于构建API,它基于Python3.6+的类型提示特性,使得代码更加简洁且易于绶护,这篇文章主要介... 目录第一节:FastAPI入门一、FastAPI框架介绍什么是ASGI服务(WSGI)二、FastAP

Python中Windows和macOS文件路径格式不一致的解决方法

《Python中Windows和macOS文件路径格式不一致的解决方法》在Python中,Windows和macOS的文件路径字符串格式不一致主要体现在路径分隔符上,这种差异可能导致跨平台代码在处理文... 目录方法 1:使用 os.path 模块方法 2:使用 pathlib 模块(推荐)方法 3:统一使

一文教你解决Python不支持中文路径的问题

《一文教你解决Python不支持中文路径的问题》Python是一种广泛使用的高级编程语言,然而在处理包含中文字符的文件路径时,Python有时会表现出一些不友好的行为,下面小编就来为大家介绍一下具体的... 目录问题背景解决方案1. 设置正确的文件编码2. 使用pathlib模块3. 转换路径为Unicod

Python结合Flask框架构建一个简易的远程控制系统

《Python结合Flask框架构建一个简易的远程控制系统》这篇文章主要为大家详细介绍了如何使用Python与Flask框架构建一个简易的远程控制系统,能够远程执行操作命令(如关机、重启、锁屏等),还... 目录1.概述2.功能使用系统命令执行实时屏幕监控3. BUG修复过程1. Authorization

Python使用DeepSeek进行联网搜索功能详解

《Python使用DeepSeek进行联网搜索功能详解》Python作为一种非常流行的编程语言,结合DeepSeek这一高性能的深度学习工具包,可以方便地处理各种深度学习任务,本文将介绍一下如何使用P... 目录一、环境准备与依赖安装二、DeepSeek简介三、联网搜索与数据集准备四、实践示例:图像分类1.

Python中__new__()方法适应及注意事项详解

《Python中__new__()方法适应及注意事项详解》:本文主要介绍Python中__new__()方法适应及注意事项的相关资料,new()方法是Python中的一个特殊构造方法,用于在创建对... 目录前言基本用法返回值单例模式自定义对象创建注意事项总结前言new() 方法在 python 中是一个