Create a Bayesian network and learn its parameters with Python 3.x

Spu*_*Spu 22 machine-learning probability bayesian-networks python-3.x scikit-learn

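I am searching for the most suitable tool for Python 3.x on Windows to create a Bayesian network, learn its parameters from data, and perform inference.

I want to define the network structure myself, as follows: [image: network structure]

It is taken from this paper.

All variables are discrete (and can take only 2 possible states) except "Size" and "GraspPose", which are continuous and should be modeled as mixtures of Gaussians.

The authors use the Expectation-Maximization algorithm to learn the parameters of the conditional probability tables and the Junction-Tree algorithm to compute exact inference.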
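
To make the Expectation-Maximization step concrete for a continuous variable such as "Size", here is a minimal sketch that fits a two-component, one-dimensional Gaussian mixture in plain Python. It is only an illustration under stated assumptions: the data values, the function names `normal_pdf` and `fit_gmm_1d`, and the initialisation scheme are all made up for this example and are not taken from the paper or from any toolbox.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, n_iter=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM (hypothetical helper)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Crude initialisation: means at the extremes, shared variance, equal weights.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, min_var)  # floor guards against collapse
    return weights, means, variances

# Made-up measurements with two clearly separated groups.
sizes = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8]
weights, means, variances = fit_gmm_1d(sizes)
print(weights, means, variances)
```

On this toy data the two estimated means settle near the centres of the two groups. In a full network the mixture would additionally be conditioned on the discrete parent variables, which this sketch leaves out.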

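As far as I understand, all of this is implemented in MATLAB with Murphy's Bayes Net Toolbox.

I tried to search for something similar in Python, and here are my results:

  1. Python Bayesian Network Toolbox, http://sourceforge.net/projects/pbnt.berlios/ (http://pbnt.berlios.de/). The website does not work and the project seems to be unsupported.
  2. BayesPy, https://github.com/bayespy/bayespy. I think this is what I actually need, but I cannot find any examples similar to my case that show how to construct the network structure.
  3. PyMC seems to be a powerful module, but I have problems importing it on 64-bit Windows with Python 3.3. I get this error when I install the development version:

    WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.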

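UPDATE:

  1. libpgm (http://pythonhosted.org/libpgm/). Exactly what I need; unfortunately it does not support Python 3.x.
  2. A very interesting, actively developed library: pgmpy. Unfortunately, continuous variables and learning from data are not supported. https://github.com/pgmpy/pgmpy/

Any advice and concrete examples would be highly appreciated.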

Jam*_*ood 9

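It looks like pomegranate was recently updated to include Bayesian networks. I haven't tried it myself, but the interface looks nice and sklearn-ish.

  • @Spu Did you try it? What was your experience? (3 upvotes)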

erd*_*ant 6

Try the bnlearn library. It contains many functions to learn parameters from data and to perform inference.

pip install bnlearn

Your use case would look like this:

# Import the library
import bnlearn

# Define the network structure
edges = [('task', 'size'),
         ('lat var', 'size'),
         ('task', 'fill level'),
         ('task', 'object shape'),
         ('task', 'side graspable'),
         ('size', 'GrasPose'),
         ('task', 'GrasPose'),
         ('fill level', 'GrasPose'),
         ('object shape', 'GrasPose'),
         ('side graspable', 'GrasPose'),
         ('GrasPose', 'latvar'),
]

# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)

# DAG is stored in adjacency matrix
print(DAG['adjmat'])

# target           task   size  lat var  ...  side graspable  GrasPose  latvar
# source                                 ...                                  
# task            False   True    False  ...            True      True   False
# size            False  False    False  ...           False      True   False
# lat var         False   True    False  ...           False     False   False
# fill level      False  False    False  ...           False      True   False
# object shape    False  False    False  ...           False      True   False
# side graspable  False  False    False  ...           False      True   False
# GrasPose        False  False    False  ...           False     False    True
# latvar          False  False    False  ...           False     False   False
# 
# [8 rows x 8 columns]

# No CPDs are in the DAG yet. Let's see what happens if we print it.
bnlearn.print_CPD(DAG)
# >[BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.

# Plot the DAG. Note that the layout can be oriented differently each time you re-make the plot.
bnlearn.plot(DAG)

[image: the predefined DAG]
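
To illustrate what the adjacency matrix above encodes, here is a small sketch in plain Python that builds the same source/target table from the edge list and checks that the graph really is acyclic, using Kahn's algorithm. The helper names `build_adjacency` and `is_acyclic` are made up for this example; this is not how bnlearn is implemented internally.

```python
from collections import deque

# Same edge list as in the use case above (node names copied verbatim).
edges = [('task', 'size'),
         ('lat var', 'size'),
         ('task', 'fill level'),
         ('task', 'object shape'),
         ('task', 'side graspable'),
         ('size', 'GrasPose'),
         ('task', 'GrasPose'),
         ('fill level', 'GrasPose'),
         ('object shape', 'GrasPose'),
         ('side graspable', 'GrasPose'),
         ('GrasPose', 'latvar')]

def build_adjacency(edges):
    """Return the nodes (in order of first appearance) and a source -> target -> bool table."""
    nodes = []
    for source, target in edges:
        for node in (source, target):
            if node not in nodes:
                nodes.append(node)
    adjacency = {s: {t: False for t in nodes} for s in nodes}
    for source, target in edges:
        adjacency[source][target] = True
    return nodes, adjacency

def is_acyclic(nodes, adjacency):
    """Kahn's algorithm: the graph is a DAG if every node can be removed in topological order."""
    in_degree = {t: sum(adjacency[s][t] for s in nodes) for t in nodes}
    queue = deque(n for n in nodes if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for target in nodes:
            if adjacency[node][target]:
                in_degree[target] -= 1
                if in_degree[target] == 0:
                    queue.append(target)
    return visited == len(nodes)

nodes, adjacency = build_adjacency(edges)
print(nodes)
print(is_acyclic(nodes, adjacency))
```

Note that 'lat var' and 'latvar' are treated as two different nodes because the strings differ, which is why the matrix has 8 rows and 8 columns.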

Now we need data to learn its parameters. Suppose the data is stored in your df. The variable names in the data file must be present in the DAG.

# Read data
import pandas as pd
df = pd.read_csv('path_to_your_data.csv')

# Learn the parameters and store the CPDs in the DAG. Use the methodtype you desire; options are 'maximumlikelihood' or 'bayes'.
DAG = bnlearn.parameter_learning.fit(DAG, df, methodtype='maximumlikelihood')
# CPDs are present in the DAG at this point.
bnlearn.print_CPD(DAG)

# Start making inferences now. As an example:
q1 = bnlearn.inference.fit(DAG, variables=['lat var'], evidence={'fill level':1, 'size':0, 'task':1})
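
For intuition about what maximum-likelihood parameter learning means for a discrete node: the estimate of each conditional probability is just a normalized count of how often a child state occurs for each parent configuration. The sketch below shows this in plain Python. The records, the function name `estimate_cpd` and the `pseudo_count` parameter are made up for illustration; a positive pseudo-count gives a simple smoothed estimate in the spirit of a Bayesian estimator, but this is not bnlearn's own code.

```python
from collections import Counter

# Made-up observations of two binary variables.
records = [
    {"Cloudy": 0, "Sprinkler": 0},
    {"Cloudy": 0, "Sprinkler": 1},
    {"Cloudy": 0, "Sprinkler": 1},
    {"Cloudy": 0, "Sprinkler": 1},
    {"Cloudy": 1, "Sprinkler": 0},
    {"Cloudy": 1, "Sprinkler": 0},
    {"Cloudy": 1, "Sprinkler": 0},
    {"Cloudy": 1, "Sprinkler": 1},
]

def estimate_cpd(records, child, parents, pseudo_count=0.0):
    """Estimate P(child | parents) by counting; pseudo_count > 0 adds smoothing."""
    states = sorted({r[child] for r in records})
    # Count each (parent configuration, child state) combination.
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in records)
    parent_configs = sorted({cfg for cfg, _ in joint})
    cpd = {}
    for cfg in parent_configs:
        total = sum(joint[(cfg, s)] for s in states) + pseudo_count * len(states)
        cpd[cfg] = {s: (joint[(cfg, s)] + pseudo_count) / total for s in states}
    return cpd

mle = estimate_cpd(records, child="Sprinkler", parents=["Cloudy"])
smoothed = estimate_cpd(records, child="Sprinkler", parents=["Cloudy"], pseudo_count=1.0)
print(mle)
print(smoothed)
```

With fully observed data like this, no iterative algorithm is needed. Expectation-Maximization only becomes necessary when some variables are hidden or values are missing.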

Below is a working example with a demo dataset (sprinkler). You can try it out.

# Import the library and the example dataset
import bnlearn
df = bnlearn.import_example('sprinkler')
print(df)
#      Cloudy  Sprinkler  Rain  Wet_Grass
# 0         0          0     0          0
# 1         1          0     1          1
# 2         0          1     0          1
# 3         1          1     1          1
# 4         1          1     1          1
# ..      ...        ...   ...        ...
# 995       1          0     1          1
# 996       1          0     1          1
# 997       1          0     1          1
# 998       0          0     0          0
# 999       0          1     1          1

# [1000 rows x 4 columns]


# Define the network structure
edges = [('Cloudy', 'Sprinkler'),
         ('Cloudy', 'Rain'),
         ('Sprinkler', 'Wet_Grass'),
         ('Rain', 'Wet_Grass')]

# Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)
# Print the CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.
# Plot the DAG
bnlearn.plot(DAG)

[image: plot of the sprinkler DAG]

# Parameter learning on the user-defined DAG and input data
DAG = bnlearn.parameter_learning.fit(DAG, df)

# Print the learned CPDs
bnlearn.print_CPD(DAG)

# [BNLEARN.print_CPD] Independencies:
# (Cloudy _|_ Wet_Grass | Rain, Sprinkler)
# (Sprinkler _|_ Rain | Cloudy)
# (Rain _|_ Sprinkler | Cloudy)
# (Wet_Grass _|_ Cloudy | Rain, Sprinkler)
# [BNLEARN.print_CPD] Nodes: ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
# [BNLEARN.print_CPD] Edges: [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'), ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]
# CPD of Cloudy:
# +-----------+-------+
# | Cloudy(0) | 0.494 |
# +-----------+-------+
# | Cloudy(1) | 0.506 |
# +-----------+-------+
# CPD of Sprinkler:
# +--------------+--------------------+--------------------+
# | Cloudy       | Cloudy(0)          | Cloudy(1)          |
# +--------------+--------------------+--------------------+
# | Sprinkler(0) | 0.4807692307692308 | 0.7075098814229249 |
# +--------------+--------------------+--------------------+
# | Sprinkler(1) | 0.5192307692307693 | 0.2924901185770751 |
# +--------------+--------------------+--------------------+
# CPD of Rain:
# +---------+--------------------+---------------------+
# | Cloudy  | Cloudy(0)          | Cloudy(1)           |
# +---------+--------------------+---------------------+
# | Rain(0) | 0.6518218623481782 | 0.33695652173913043 |
# +---------+--------------------+---------------------+
# | Rain(1) | 0.3481781376518219 | 0.6630434782608695  |
# +---------+--------------------+---------------------+
# CPD of Wet_Grass:
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Rain         | Rain(0)            | Rain(0)             | Rain(1)             | Rain(1)             |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Sprinkler    | Sprinkler(0)       | Sprinkler(1)        | Sprinkler(0)        | Sprinkler(1)        |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(0) | 0.7553816046966731 | 0.33755274261603374 | 0.25588235294117645 | 0.37910447761194027 |
# +--------------+--------------------+---------------------+---------------------+---------------------+
# | Wet_Grass(1) | 0.2446183953033268 | 0.6624472573839663  | 0.7441176470588236  | 0.6208955223880597  |
# +--------------+--------------------+---------------------+---------------------+---------------------+

# Make inference
q1 = bnlearn.inference.fit(DAG, variables=['Wet_Grass'], evidence={'Rain':1, 'Sprinkler':0, 'Cloudy':1})

# +--------------+------------------+
# | Wet_Grass    |   phi(Wet_Grass) |
# +==============+==================+
# | Wet_Grass(0) |           0.2559 |
# +--------------+------------------+
# | Wet_Grass(1) |           0.7441 |
# +--------------+------------------+

print(q1.values)
# array([0.25588235, 0.74411765])
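
To see where the inference result comes from, the same query can be reproduced by brute-force enumeration over the joint distribution. The sketch below is plain Python and uses the learned CPD values printed above, rounded to four decimals. Enumeration is swapped in here only because it is short and easy to follow; it is not the Junction-Tree algorithm mentioned in the question, and the names `joint` and `query` are made up for this example.

```python
from itertools import product

VARIABLES = ["Cloudy", "Sprinkler", "Rain", "Wet_Grass"]

# CPD values copied from the printed tables above, rounded to four decimals.
p_cloudy = {0: 0.494, 1: 0.506}
# p_sprinkler[cloudy][sprinkler]
p_sprinkler = {0: {0: 0.4808, 1: 0.5192}, 1: {0: 0.7075, 1: 0.2925}}
# p_rain[cloudy][rain]
p_rain = {0: {0: 0.6518, 1: 0.3482}, 1: {0: 0.3370, 1: 0.6630}}
# p_wet[(sprinkler, rain)][wet_grass]
p_wet = {
    (0, 0): {0: 0.7554, 1: 0.2446},
    (1, 0): {0: 0.3376, 1: 0.6624},
    (0, 1): {0: 0.2559, 1: 0.7441},
    (1, 1): {0: 0.3791, 1: 0.6209},
}

def joint(a):
    """Joint probability of one full assignment, using the chain rule of the DAG."""
    return (p_cloudy[a["Cloudy"]]
            * p_sprinkler[a["Cloudy"]][a["Sprinkler"]]
            * p_rain[a["Cloudy"]][a["Rain"]]
            * p_wet[(a["Sprinkler"], a["Rain"])][a["Wet_Grass"]])

def query(variable, evidence):
    """P(variable | evidence) by summing the joint over all consistent assignments."""
    scores = {0: 0.0, 1: 0.0}
    for values in product([0, 1], repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        if all(assignment[name] == value for name, value in evidence.items()):
            scores[assignment[variable]] += joint(assignment)
    total = sum(scores.values())
    return {state: score / total for state, score in scores.items()}

posterior = query("Wet_Grass", {"Rain": 1, "Sprinkler": 0, "Cloudy": 1})
print(posterior)
```

Because both parents of Wet_Grass are observed in this query, the posterior reduces to the corresponding CPD column, roughly 0.2559 and 0.7441, which agrees with the bnlearn output. The same `query` function also handles diagnostic queries such as `query("Rain", {"Wet_Grass": 1})`, but enumeration scales exponentially with the number of variables.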

More examples can be found in the documentation on the bnlearn pages, or in the blog.