决策树预测隐形眼镜的类型(python机器学习记录)

2024-06-08 13:08

本文主要是介绍决策树预测隐形眼镜的类型(python机器学习记录),希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

使用决策树预测隐形眼镜类型

tree.py(创建树的代码)
# -*- coding: UTF-8 -*-
from math import log
import operator#计算数据集的香农熵     熵越高数据越乱
def calcShannonEnt(dataSet):numEntries = len(dataSet)labelCounts = {}for featVec in dataSet:currentLabel = featVec[-1]if currentLabel not in labelCounts.keys(): labelCounts[currentLabel] = 0labelCounts[currentLabel] += 1shannonEnt = 0.0for key in labelCounts:prob = float(labelCounts[key]) / numEntriesshannonEnt -= prob * log(prob, 2)return shannonEnt#划分数据集
def splitDataSet(dataSet,axis,value):retDataSet=[]for featVec in dataSet:if featVec[axis]==value:reducedFeatVec=featVec[:axis]reducedFeatVec.extend(featVec[axis+1:])retDataSet.append(reducedFeatVec)# print retDataSetreturn retDataSet#选择最好的数据集划分方式
def chooseBestFeatureTosplit(dataSet):numFeatures=len(dataSet[0])-1baseEntropy=calcShannonEnt(dataSet)bestInfoGain=0.0;bestFeature=-1for i in range(numFeatures):featList=[example[i] for example in dataSet]uniqueVals=set(featList)newEntorpy=0.0for value in uniqueVals:subDataSet =splitDataSet(dataSet,i,value)prob=len(subDataSet)/float(len(dataSet))newEntorpy+=prob*calcShannonEnt(subDataSet)infoGain=baseEntropy-newEntorpyif(infoGain>bestInfoGain):bestInfoGain=infoGainbestFeature=ireturn  bestFeature
def majorityCnt(classList):classCount={}for vote in classList:if vote not in classCount.keys(): classCount[vote] = 0classCount[vote] += 1sortedClassCount = sorted(classCount.iteritems(), key=operator.itemgetter(1), reverse=True)return sortedClassCount[0][0]#创建树
def createTree(dataSet,labels):classList = [example[-1] for example in dataSet]if classList.count(classList[0]) == len(classList):return classList[0]#stop splitting when all of the classes are equalif len(dataSet[0]) == 1: #stop splitting when there are no more features in dataSetreturn majorityCnt(classList)bestFeat = chooseBestFeatureTosplit(dataSet)bestFeatLabel = labels[bestFeat]myTree = {bestFeatLabel:{}}del(labels[bestFeat])featValues = [example[bestFeat] for example in dataSet]uniqueVals = set(featValues)for value in uniqueVals:subLabels = labels[:]       #copy all of labels, so trees don't mess up existing labelsmyTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value),subLabels)return myTree

treePlotter.py(绘制树的代码)

# -*- coding: UTF-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('gbk')import matplotlib.pyplot as pltdecisionNode = dict(boxstyle="sawtooth", fc="0.8")
leafNode = dict(boxstyle="round4", fc="0.8")
arrow_args = dict(arrowstyle="<-")#获取叶节点的数目
def getNumLeafs(myTree):numLeafs = 0firstStr = myTree.keys()[0]secondDict = myTree[firstStr]for key in secondDict.keys():if type(secondDict[key]).__name__ == 'dict':  # test to see if the nodes are dictonaires, if not they are leaf nodesnumLeafs += getNumLeafs(secondDict[key])else:numLeafs += 1return numLeafs#获取叶节点的层数
def getTreeDepth(myTree):maxDepth = 0firstStr = myTree.keys()[0]secondDict = myTree[firstStr]for key in secondDict.keys():if type(secondDict[key]).__name__ == 'dict':  # test to see if the nodes are dictonaires, if not they are leaf nodesthisDepth = 1 + getTreeDepth(secondDict[key])else:thisDepth = 1if thisDepth > maxDepth: maxDepth = thisDepthreturn maxDepthdef plotNode(nodeTxt, centerPt, parentPt, nodeType):createPlot.ax1.annotate(nodeTxt, xy=parentPt, xycoords='axes fraction',xytext=centerPt, textcoords='axes fraction',va="center", ha="center", bbox=nodeType, arrowprops=arrow_args)def plotMidText(cntrPt, parentPt, txtString):xMid = (parentPt[0] - cntrPt[0]) / 2.0 + cntrPt[0]yMid = (parentPt[1] - cntrPt[1]) / 2.0 + cntrPt[1]createPlot.ax1.text(xMid, yMid, txtString, va="center", ha="center", rotation=30)def plotTree(myTree, parentPt, nodeTxt):  # if the first key tells you what feat was split onnumLeafs = getNumLeafs(myTree)  # this determines the x width of this treedepth = getTreeDepth(myTree)firstStr = myTree.keys()[0]  # the text label for this node should be thiscntrPt = (plotTree.xOff + (1.0 + float(numLeafs)) / 2.0 / plotTree.totalW, plotTree.yOff)plotMidText(cntrPt, parentPt, nodeTxt)plotNode(firstStr, cntrPt, parentPt, decisionNode)secondDict = myTree[firstStr]plotTree.yOff = plotTree.yOff - 1.0 / plotTree.totalDfor key in secondDict.keys():if type(secondDict[key]).__name__ == 'dict':  # test to see if the nodes are dictonaires, if not they are leaf nodesplotTree(secondDict[key], cntrPt, str(key))  # recursionelse:  # it's a leaf node print the leaf nodeplotTree.xOff = plotTree.xOff + 1.0 / plotTree.totalWplotNode(secondDict[key], (plotTree.xOff, plotTree.yOff), cntrPt, leafNode)plotMidText((plotTree.xOff, plotTree.yOff), cntrPt, str(key))plotTree.yOff = plotTree.yOff + 1.0 / plotTree.totalD#主函数
def createPlot(inTree):fig = plt.figure(1, facecolor='white')fig.clf()axprops = dict(xticks=[], yticks=[])createPlot.ax1 = plt.subplot(111, frameon=False, **axprops)  # no ticks# createPlot.ax1 = plt.subplot(111, frameon=False) #ticks for demo puropsesplotTree.totalW = float(getNumLeafs(inTree))plotTree.totalD = float(getTreeDepth(inTree))plotTree.xOff = -0.5 / plotTree.totalWplotTree.yOff = 1.0plotTree(inTree, (0.5, 1.0), '')plt.show()
  1. 收集数据:lenses.txt
    young	myope	no	reduced	no lenses
    young	myope	no	normal	soft
    young	myope	yes	reduced	no lenses
    young	myope	yes	normal	hard
    young	hyper	no	reduced	no lenses
    young	hyper	no	normal	soft
    young	hyper	yes	reduced	no lenses
    young	hyper	yes	normal	hard
    pre	myope	no	reduced	no lenses
    pre	myope	no	normal	soft
    pre	myope	yes	reduced	no lenses
    pre	myope	yes	normal	hard
    pre	hyper	no	reduced	no lenses
    pre	hyper	no	normal	soft
    pre	hyper	yes	reduced	no lenses
    pre	hyper	yes	normal	no lenses
    presbyopic	myope	no	reduced	no lenses
    presbyopic	myope	no	normal	no lenses
    presbyopic	myope	yes	reduced	no lenses
    presbyopic	myope	yes	normal	hard
    presbyopic	hyper	no	reduced	no lenses
    presbyopic	hyper	no	normal	soft
    presbyopic	hyper	yes	reduced	no lenses
    presbyopic	hyper	yes	normal	no lenses
    
  2. 准备数据:解析tab键分隔符的
  3. 分析数据:快速检测数据,确保正确的解析数据内容,使用createPlot()函数绘制最终的树形图。
  4. 训练算法:使用createTree()函数
  5. 测试算法:编写测试函数验证决策树可以正确分类给定的数据实例
  6. 使用算法:存储数的数据结构,以便下次使用时无需重新构造树
在导入tree.py和treePlotter.py的python交互模式下:
>>>import tree
>>>import treePlotter
>>>fr=open("lenses.txt")
>>>lenses=[inst.strip().split("\t") for inst in fr.readlines()]
>>>lensesLabels=["age","prescript","astigmatic","tearRte"]
>>>lensesTree=tree.createTree(lenses,lensesLabels)
>>>treePlotter.createPlot(lensesTree)

绘制的决策树:



参考书籍:机器学习实战





这篇关于决策树预测隐形眼镜的类型(python机器学习记录)的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1042270

相关文章

如何使用 Python 读取 Excel 数据

《如何使用Python读取Excel数据》:本文主要介绍使用Python读取Excel数据的详细教程,通过pandas和openpyxl,你可以轻松读取Excel文件,并进行各种数据处理操... 目录使用 python 读取 Excel 数据的详细教程1. 安装必要的依赖2. 读取 Excel 文件3. 读

Python的time模块一些常用功能(各种与时间相关的函数)

《Python的time模块一些常用功能(各种与时间相关的函数)》Python的time模块提供了各种与时间相关的函数,包括获取当前时间、处理时间间隔、执行时间测量等,:本文主要介绍Python的... 目录1. 获取当前时间2. 时间格式化3. 延时执行4. 时间戳运算5. 计算代码执行时间6. 转换为指

利用Python调试串口的示例代码

《利用Python调试串口的示例代码》在嵌入式开发、物联网设备调试过程中,串口通信是最基础的调试手段本文将带你用Python+ttkbootstrap打造一款高颜值、多功能的串口调试助手,需要的可以了... 目录概述:为什么需要专业的串口调试工具项目架构设计1.1 技术栈选型1.2 关键类说明1.3 线程模

Python ZIP文件操作技巧详解

《PythonZIP文件操作技巧详解》在数据处理和系统开发中,ZIP文件操作是开发者必须掌握的核心技能,Python标准库提供的zipfile模块以简洁的API和跨平台特性,成为处理ZIP文件的首选... 目录一、ZIP文件操作基础三板斧1.1 创建压缩包1.2 解压操作1.3 文件遍历与信息获取二、进阶技

Python Transformers库(NLP处理库)案例代码讲解

《PythonTransformers库(NLP处理库)案例代码讲解》本文介绍transformers库的全面讲解,包含基础知识、高级用法、案例代码及学习路径,内容经过组织,适合不同阶段的学习者,对... 目录一、基础知识1. Transformers 库简介2. 安装与环境配置3. 快速上手示例二、核心模

Python正则表达式语法及re模块中的常用函数详解

《Python正则表达式语法及re模块中的常用函数详解》这篇文章主要给大家介绍了关于Python正则表达式语法及re模块中常用函数的相关资料,正则表达式是一种强大的字符串处理工具,可以用于匹配、切分、... 目录概念、作用和步骤语法re模块中的常用函数总结 概念、作用和步骤概念: 本身也是一个字符串,其中

Python使用getopt处理命令行参数示例解析(最佳实践)

《Python使用getopt处理命令行参数示例解析(最佳实践)》getopt模块是Python标准库中一个简单但强大的命令行参数处理工具,它特别适合那些需要快速实现基本命令行参数解析的场景,或者需要... 目录为什么需要处理命令行参数?getopt模块基础实际应用示例与其他参数处理方式的比较常见问http

python实现svg图片转换为png和gif

《python实现svg图片转换为png和gif》这篇文章主要为大家详细介绍了python如何实现将svg图片格式转换为png和gif,文中的示例代码讲解详细,感兴趣的小伙伴可以跟随小编一起学习一下... 目录python实现svg图片转换为png和gifpython实现图片格式之间的相互转换延展:基于Py

Python中的getopt模块用法小结

《Python中的getopt模块用法小结》getopt.getopt()函数是Python中用于解析命令行参数的标准库函数,该函数可以从命令行中提取选项和参数,并对它们进行处理,本文详细介绍了Pyt... 目录getopt模块介绍getopt.getopt函数的介绍getopt模块的常用用法getopt模

Python利用ElementTree实现快速解析XML文件

《Python利用ElementTree实现快速解析XML文件》ElementTree是Python标准库的一部分,而且是Python标准库中用于解析和操作XML数据的模块,下面小编就来和大家详细讲讲... 目录一、XML文件解析到底有多重要二、ElementTree快速入门1. 加载XML的两种方式2.