用Neo4J模拟马尔可夫链

JnB*_*ymn 9 markov-chains n-gram neo4j cypher

马尔科夫链是由一组其可以以一定的概率转换到其他状态的状态的.

通过为每个状态创建一个节点,每个转换的关系,然后用适当的概率注释转换关系,可以很容易地在Neo4J中表示马尔可夫链.

但是,你能用Neo4J 模拟马尔可夫链吗?例如,可以强制Neo4J在某个状态下启动,然后根据概率转换到下一个状态和下一个状态吗?Neo4J可以打印出通过这个状态空间的路径吗?

也许通过一个简单的例子可以更容易理解.假设我想根据公司科技博客的文本制作一个2克的英语模型.我启动了一个执行以下操作的脚本:

  1. 它删除了博客的文本.
  2. 它迭代每对相邻的字母并在Neo4J中创建一个节点.
  3. 它在每个相邻字母的3元组上再次迭代,然后在由前两个字母表示的节点和由最后两个字母表示的节点之间创建Neo4J指向关系.它将此关系的计数器初始化为1.如果关系已存在,则计数器递增.
  4. 最后,它遍历每个节点,计算已发生的总传出转换的数量,然后在特定节点的每个关系上创建一个新的注释等于count/totalcount.这是转换概率.

现在Neo4J图已经完成了,如何从我的2克英语模型中创建一个"句子"?输出可能是这样的:

在没有IST LAT WHEY CRATICT FROURE BIRS GROCID REPONSTURES的REPTAGIN是CRE的REGOACTIONA.

Mic*_*man 6

Neo4j不提供开箱即用的功能,但由于您已经正确地填充了数据库,因此您需要的遍历只需几行代码.

我在这里重新创建了你的实验,只做了一些修改.首先,我通过一次文本填充数据库(步骤2和3),但这是次要的.更重要的是,我只存储每个关系的出现次数和节点上的总数(步骤4),因为我认为不需要预先计算概率.

您要求的代码看起来像这样:

/**
 * A component that creates a random sentence by a random walk on a Markov Chain stored in Neo4j, produced by
 * {@link NGramDatabasePopulator}.
 */
public class RandomSentenceCreator {

    private final Random random = new Random(System.currentTimeMillis());

    /**
     * Create a random sentence from the underlying n-gram model. Starts at a random node an follows random outgoing
     * relationships of type {@link Constants#REL} with a probability proportional to that transition occurrence in the
     * text that was processed to form the model. This happens until the desired length is achieved. In case a node with
     * no outgoing relationships it reached, the walk is re-started from a random node.
     *
     * @param database storing the n-gram model.
     * @param length   desired number of characters in the random sentence.
     * @return random sentence.
     */
    public String createRandomSentence(GraphDatabaseService database, int length) {
        Node startNode = randomNode(database);
        return walk(startNode, length, 0);
    }

    private String walk(Node startNode, int maxLength, int currentLength) {
        if (currentLength >= maxLength) {
            return (String) startNode.getProperty(NAME);
        }

        int totalRelationships = (int) startNode.getProperty(TOTAL, 0);
        if (totalRelationships == 0) {
            //terminal node, restart from random
            return walk(randomNode(startNode.getGraphDatabase()), maxLength, currentLength);
        }

        int choice = random.nextInt(totalRelationships) + 1;
        int total = 0;
        Iterator<Relationship> relationshipIterator = startNode.getRelationships(OUTGOING, REL).iterator();

        Relationship toFollow = null;
        while (total < choice && relationshipIterator.hasNext()) {
            toFollow = relationshipIterator.next();
            total += (int) toFollow.getProperty(PROBABILITY);
        }

        Node nextNode;
        if (toFollow == null) {
            //no relationship to follow => stay on the same node and try again
            nextNode = startNode;
        } else {
            nextNode = toFollow.getEndNode();
        }

        return ((String) nextNode.getProperty(NAME)).substring(0, 1) + walk(nextNode, maxLength, currentLength + 1);
    }

    private Node randomNode(GraphDatabaseService database) {
        return random(GlobalGraphOperations.at(database).getAllNodes());
    }
}
Run Code Online (Sandbox Code Playgroud)