How to store a decision tree

Question

How to store a decision tree

use*_*146 6 python decision-tree python-2.7

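I have tried a few different approaches, some of which I found here, including making a Node class and nested dictionaries, but I can't seem to get them to work.

My code currently takes several lines of DNA (a, t, g, c) and stores them as a numpy array. It then finds the attribute that gives the largest gain and splits the data into 4 new numpy arrays (depending on whether a, t, g or c is present at that attribute).

I can't work out how to create a recursive function that builds the tree. I am quite new to Python and to programming itself, so please describe in detail what I should do.

Thanks for your help.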

Answer 1

prl*_*900 6

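If you want to implement a decision tree from scratch, I suggest using classes to build the tree. A tree is made of nodes, where a node recursively contains other nodes, and the leaves are terminal nodes. For the binary tree case, the classes could look like this: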

class Node(object):
    def __init__(self):
        self.split_variable = None
        self.left_child = None
        self.right_child = None

    def get_name(self):
        return 'Node'

class Leaf(object):
    def __init__(self):
        self.value = None

    def get_name(self):
        return 'Leaf'


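For the Node class: "split_variable" holds the name of the variable used in the split, i.e. one of [a, t, g, c], and "left_child" and "right_child" are new instances of Node or Leaf. The presence (True/False) of that variable is mapped to the left/right child. (For a regression tree you would add a fourth attribute, "split_value", to the Node class, and map values smaller/greater than that value to the left/right child.)

For the Leaf class: "value" holds the value assigned to the tree's target variable (i.e. the majority class for a discrete variable or the mean for a continuous variable).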
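Since the question asks for a recursive function that builds the tree, here is a minimal sketch of one. It is a variation on the binary classes above, not the same thing: it splits four ways (one child per base), which matches the a/t/g/c data in the question. The names `MultiNode`, `build_tree`, `gain`, `split` and `predict`, the use of plain strings instead of numpy arrays, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log


class MultiNode(object):
    """Hypothetical multi-way node: one child per base seen at `position`."""
    def __init__(self, position, children, default):
        self.position = position    # index into the sequence used for the split
        self.children = children    # dict mapping base -> MultiNode or Leaf
        self.default = default      # majority label, used for unseen bases


class Leaf(object):
    def __init__(self, value):
        self.value = value


def entropy(labels):
    total = float(len(labels))
    return -sum((n / total) * log(n / total, 2)
                for n in Counter(labels).values())


def split(rows, position):
    """Group (sequence, label) rows by the base found at `position`."""
    groups = {}
    for seq, label in rows:
        groups.setdefault(seq[position], []).append((seq, label))
    return groups


def gain(rows, position):
    """Information gain of splitting `rows` on `position`."""
    labels = [label for _, label in rows]
    remainder = sum(len(group) / float(len(rows)) *
                    entropy([label for _, label in group])
                    for group in split(rows, position).values())
    return entropy(labels) - remainder


def build_tree(rows, positions):
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or there is nothing left to split on.
    if len(set(labels)) == 1 or not positions:
        return Leaf(majority)
    best = max(positions, key=lambda p: gain(rows, p))
    if gain(rows, best) <= 0:
        return Leaf(majority)
    remaining = [p for p in positions if p != best]
    # Recursive step: each subset becomes the data for one child subtree.
    children = dict((base, build_tree(group, remaining))
                    for base, group in split(rows, best).items())
    return MultiNode(best, children, majority)


def predict(tree, seq):
    while isinstance(tree, MultiNode):
        child = tree.children.get(seq[tree.position])
        if child is None:           # base never seen during training
            return tree.default
        tree = child
    return tree.value


# Toy data (made up): the label only depends on the first base.
rows = [("atgc", "yes"), ("aagc", "yes"), ("ttgc", "no"),
        ("gtgc", "no"), ("catc", "no")]
tree = build_tree(rows, list(range(4)))
```

The recursion ends in two cases: all rows at a node share one label, or no split improves the gain. Everything else is "pick the best position, split, and call `build_tree` on each part".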

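To complete your implementation you will need functions that traverse the tree in order to evaluate and/or visualize it. These functions are called recursively to walk through the whole tree. Here you can use the get_name() method of the classes to tell nodes and leaves apart. How to implement this part really depends on how you store your data; I suggest using a pandas DataFrame, which is similar to a table. An example evaluation function (a sketch) could be: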

def evaluate_tree(your_data, node):
    # your_data is one row, indexable by variable name (e.g. a dict or a pandas Series)
    if your_data[node.split_variable]:
        if node.left_child.get_name() == 'Node':
            # return the result of the recursive call, otherwise it is lost
            return evaluate_tree(your_data, node.left_child)
        elif node.left_child.get_name() == 'Leaf':
            return node.left_child.value
    else:
        if node.right_child.get_name() == 'Node':
            return evaluate_tree(your_data, node.right_child)
        elif node.right_child.get_name() == 'Leaf':
            return node.right_child.value

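As for actually storing such a tree, one option (a sketch, not the only way) is to convert it to nested dictionaries and write those out as JSON. The helpers `tree_to_dict` and `dict_to_tree`, the `'type'` key and the small hand-built tree below are assumptions made for illustration; the Node and Leaf classes are repeated only so that the snippet runs on its own.

```python
import json


class Node(object):
    def __init__(self):
        self.split_variable = None
        self.left_child = None
        self.right_child = None

    def get_name(self):
        return 'Node'


class Leaf(object):
    def __init__(self):
        self.value = None

    def get_name(self):
        return 'Leaf'


def tree_to_dict(tree):
    """Recursively turn a Node/Leaf tree into plain nested dicts."""
    if tree.get_name() == 'Leaf':
        return {'type': 'Leaf', 'value': tree.value}
    return {'type': 'Node',
            'split_variable': tree.split_variable,
            'left_child': tree_to_dict(tree.left_child),
            'right_child': tree_to_dict(tree.right_child)}


def dict_to_tree(data):
    """Inverse of tree_to_dict: rebuild Node/Leaf objects."""
    if data['type'] == 'Leaf':
        leaf = Leaf()
        leaf.value = data['value']
        return leaf
    node = Node()
    node.split_variable = data['split_variable']
    node.left_child = dict_to_tree(data['left_child'])
    node.right_child = dict_to_tree(data['right_child'])
    return node


def make_leaf(value):
    leaf = Leaf()
    leaf.value = value
    return leaf


# Hand-built example tree: split on 'a', then on 't'.
inner = Node()
inner.split_variable = 't'
inner.left_child = make_leaf('no')
inner.right_child = make_leaf('yes')

root = Node()
root.split_variable = 'a'
root.left_child = make_leaf('yes')
root.right_child = inner

text = json.dumps(tree_to_dict(root))      # this string can go to a file or database
restored = dict_to_tree(json.loads(text))  # and be turned back into a tree later
```

JSON keeps the stored tree readable and independent of the class definitions. If you only ever reload it from Python, the standard `pickle` module can also dump the Node objects directly.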

Good luck!

Answer 2

Chr*_*ten 1

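If you want to use decision trees in Python, you can use the decision tree module in scikit-learn instead of writing your own decision tree classes and logic: http://scikit-learn.org/stable/modules/tree.html. With the scikit-learn decision tree module you can keep the decision tree object in memory, or write certain attributes of the tree to a file or database.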
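A minimal sketch of what that could look like, assuming scikit-learn and numpy are installed. The one-hot encoding helper `one_hot`, the toy sequences and the labels are made up for illustration; `DecisionTreeClassifier` and `pickle` are the real scikit-learn and standard-library pieces.

```python
import pickle

import numpy as np
from sklearn.tree import DecisionTreeClassifier

BASES = "atgc"


def one_hot(seq):
    """Hypothetical helper: one 0/1 column per (position, base) pair."""
    return [int(ch == base) for ch in seq for base in BASES]


# Toy data (made up): the label is 1 when the first base is 'a'.
sequences = ["atgc", "aagc", "ttgc", "tagc", "gtgc", "catc"]
labels = [1, 1, 0, 0, 0, 0]

X = np.array([one_hot(seq) for seq in sequences])
clf = DecisionTreeClassifier(random_state=0).fit(X, labels)

# Store the fitted tree as bytes (write them to a file or a database column)...
blob = pickle.dumps(clf)
# ...and load it again later.
restored = pickle.loads(blob)
```

Note that a pickled model should only be loaded from a source you trust, and is best reloaded with the same scikit-learn version that wrote it.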

scikit-learn, along with the other Python libraries in the Anaconda distribution, is pretty much the standard for data exploration and analysis in Python. You can get Anaconda from Continuum: http://continuum.io/downloads

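Edit 1

I saw this on Hacker News. It is about building a decision tree in Python using PostgreSQL as the database to pull values from. It might be interesting to check out: http://www.garysieling.com/blog/building-decision-tree-python-postgres-data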
