J48: Reading the Weka Source Code of the C4.5 Decision Tree Algorithm
TODO: analyze the classification efficiency of J48.
1. Preparation



2. Code flow walkthrough
Model training starts from J48.java.
J48.buildClassifier(ins), taking the C4.5 decision tree model as the example:
modSelection = new C45ModelSelection(m_minNumObj, instances);   // the C4.5-style split-model selector
m_root = new C45PruneableClassifierTree(modSelection, !m_unpruned, m_CF, m_subtreeRaising, !m_noCleanup);   // root node of the pruneable tree
m_root.buildClassifier(instances);   // grow (and prune) the tree
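Before expanding the internals, here is a minimal usage sketch showing how this entry point is typically reached from user code. The ARFF file name and option values are placeholder assumptions; J48, DataSource and the setters shown are Weka's standard public API.

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Demo {
  public static void main(String[] args) throws Exception {
    // Load an ARFF file; the path is only a placeholder.
    Instances data = DataSource.read("weather.nominal.arff");
    data.setClassIndex(data.numAttributes() - 1);   // last attribute is the class

    J48 tree = new J48();
    tree.setMinNumObj(2);              // maps to m_minNumObj: minimum instances per leaf
    tree.setConfidenceFactor(0.25f);   // maps to m_CF: pruning confidence
    tree.buildClassifier(data);        // enters the code traced above

    System.out.println(tree);          // prints the learned decision tree
  }
}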
Expanding C45PruneableClassifierTree.buildClassifier(ins):
data.deleteWithMissingClass();   // drop training instances whose class value is missing
buildTree(data, m_subtreeRaising || !m_cleanup);   // grow the tree (recursive, defined in the base class ClassifierTree)
collapse();   // collapse subtrees that do not help on the training data
if (m_pruneTheTree) {
  prune();   // C4.5 confidence-based pruning
}
if (m_cleanup) {
  cleanup(new Instances(data, 0));   // release the training data referenced by the tree
}
Expanding the base-class method ClassifierTree.buildTree() in turn, its core steps are:
m_localModel = m_toSelectModel.selectModel(data);   // pick the best split model (m_toSelectModel is the modSelection passed in from J48)
localInstances = m_localModel.split(data);   // split the data according to the selected model
m_sons[i] = getNewTree(localInstances[i]);   // recursively build one subtree per subset (see the sketch below)
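To make that recursion explicit, the following is a self-contained sketch of the control flow, using hypothetical ModelSelection / SplitModel stand-ins rather than Weka's actual ModelSelection / ClassifierSplitModel classes: select a split model, partition the data, and recurse into one subtree per subset.

import java.util.List;

class TreeSketch {
  // Hypothetical stand-ins for Weka's ModelSelection / ClassifierSplitModel.
  interface ModelSelection {
    SplitModel selectModel(List<double[]> data);          // pick the best split for this node
  }
  interface SplitModel {
    int numSubsets();                                     // 1 means "no useful split": make a leaf
    List<List<double[]>> split(List<double[]> data);      // partition the instances by branch
  }

  final ModelSelection selector;
  SplitModel localModel;
  TreeSketch[] sons;
  boolean isLeaf;

  TreeSketch(ModelSelection selector) { this.selector = selector; }

  void buildTree(List<double[]> data) {
    localModel = selector.selectModel(data);               // corresponds to selectModel(ins)
    if (localModel.numSubsets() > 1) {                     // a useful split was found
      List<List<double[]>> parts = localModel.split(data); // corresponds to split(ins)
      sons = new TreeSketch[localModel.numSubsets()];
      for (int i = 0; i < sons.length; i++) {
        sons[i] = new TreeSketch(selector);                // corresponds to getNewTree(...)
        sons[i].buildTree(parts.get(i));                   // recurse on each subset
      }
    } else {
      isLeaf = true;                                       // otherwise this node becomes a leaf
    }
  }
}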
Expanding C45ModelSelection.selectModel(ins):
if (Utils.sm(checkDistribution.total(), 2 * m_minNoObj)
    || Utils.eq(checkDistribution.total(),
                checkDistribution.perClass(checkDistribution.maxClass())))
  return noSplitModel;   // too few instances, or all instances in one class: do not split

multiValue = !(attribute.isNumeric()
    || attribute.numValues() < (0.3 * m_allData.numInstances()));

currentModel = new C45Split[data.numAttributes()];
sumOfWeights = data.sumOfWeights();

// For each attribute.
for (i = 0; i < data.numAttributes(); i++) {
  // Apart from class attribute.
  if (i != (data).classIndex()) {
    // Get models for current attribute.
    currentModel[i] = new C45Split(i, m_minNoObj, sumOfWeights);
    currentModel[i].buildClassifier(data);
    // ... omitted: accumulate the sum used for averageInfoGain
  } else
    currentModel[i] = null;
}

averageInfoGain = averageInfoGain / (double) validModels;

// Find "best" attribute to split on.
minResult = 0;
for (i = 0; i < data.numAttributes(); i++) {
  // Use 1E-3 here to get a closer approximation to the original implementation.
  if ((currentModel[i].infoGain() >= (averageInfoGain - 1E-3))
      && Utils.gr(currentModel[i].gainRatio(), minResult)) {
    bestModel = currentModel[i];
    minResult = currentModel[i].gainRatio();   // remember the best gain ratio so far
  }
}
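The selection criterion above is C4.5's gain-ratio test: among the attributes whose information gain is at least the (slightly relaxed) average gain, the one with the largest gain ratio wins. The following self-contained sketch is not Weka code; the class counts are an invented toy example in the spirit of the classic weather data. It shows how information gain, split information and gain ratio relate.

public class GainRatioSketch {

  // Entropy of a class-count vector, in bits.
  static double entropy(double[] counts) {
    double total = 0, e = 0;
    for (double c : counts) total += c;
    for (double c : counts)
      if (c > 0) e -= (c / total) * (Math.log(c / total) / Math.log(2));
    return e;
  }

  // branches[b][k] = number of training instances in branch b with class k.
  static double infoGain(double[] parent, double[][] branches) {
    double total = 0, remainder = 0;
    for (double c : parent) total += c;
    for (double[] b : branches) {
      double size = 0;
      for (double c : b) size += c;
      remainder += (size / total) * entropy(b);   // weighted entropy after the split
    }
    return entropy(parent) - remainder;
  }

  // Split information: entropy of the branch sizes themselves.
  static double splitInfo(double[][] branches) {
    double[] sizes = new double[branches.length];
    for (int b = 0; b < branches.length; b++)
      for (double c : branches[b]) sizes[b] += c;
    return entropy(sizes);
  }

  static double gainRatio(double[] parent, double[][] branches) {
    return infoGain(parent, branches) / splitInfo(branches);
  }

  public static void main(String[] args) {
    // Toy example: 9 "yes" / 5 "no" instances, split into three branches.
    double[] parent = {9, 5};
    double[][] outlook = {{2, 3}, {4, 0}, {3, 2}};   // e.g. sunny / overcast / rainy
    System.out.printf("infoGain  = %.4f%n", infoGain(parent, outlook));
    System.out.printf("splitInfo = %.4f%n", splitInfo(outlook));
    System.out.printf("gainRatio = %.4f%n", gainRatio(parent, outlook));
  }
}

With these toy counts the program prints roughly infoGain ≈ 0.247, splitInfo ≈ 1.577 and gainRatio ≈ 0.156, which is the quantity selectModel maximizes once the info-gain threshold is met.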