Ind*_*ndi 7 visualization nlp r igraph opennlp
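I would like to display a parse (POS tagging) from openNLP as a tree-structure visualization. Below I provide the parse tree from openNLP, but I cannot plot it as a visual tree of the kind commonly produced by Python's parsing tools.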
install.packages(
  "http://datacube.wu.ac.at/src/contrib/openNLPmodels.en_1.5-1.tar.gz",
  repos=NULL,
  type="source"
)
library(NLP)
library(openNLP)
x <- 'Scroll bar does not work the best either.'
s <- as.String(x)
## Annotators
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
parse_annotator <- Parse_Annotator()
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))
p <- parse_annotator(s, a2)
ptext <- sapply(p$features, `[[`, "parse")
ptext
Tree_parse(ptext)
## > ptext
## [1] "(TOP (S (NP (NNP Scroll) (NN bar)) (VP (VBZ does) (RB not) (VP (VB work) (NP (DT the) (JJS best)) (ADVP (RB either))))(. .)))"
## > Tree_parse(ptext)
## (TOP
##   (S
##     (NP (NNP Scroll) (NN bar))
##     (VP (VBZ does) (RB not) (VP (VB work) (NP (DT the) (JJS best)) (ADVP (RB either))))
##     (. .)))
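The tree structure should look similar to the following (image not shown):
Is there a way to display this tree as an image?
I found this related tree-visualization question about plotting numeric expressions, which may be useful, but I could not generalize it to sentence-parse visualization.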
pic*_*ick 10
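Here is an igraph version. This function takes the result of Parse_annotator as its input, so ptext in your example. NLP::Tree_parse already creates a nice tree structure, so the idea here is to traverse it recursively and build an edge list to plug into igraph. The edge list is just a 2-column matrix of head-tail values.
For igraph to create edges between the proper nodes, the nodes need unique identifiers. I did this by appending a sequence of integers (using regmatches<-) to the words in the text before calling Tree_parse.
The internal function edgemaker traverses the tree, filling in edgelist as it goes. There are options to color the leaves separately from the rest of the nodes, but if you pass the option vertex.label.color, all labels get that one color.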
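The ID-appending step can be tried in isolation. The following is a minimal sketch in base R, assuming a toy fragment of the example parse; the variable name tagged is only for illustration and does not appear in the function below.

```r
# Toy bracketed parse (a fragment of the example sentence)
ptext <- "(NP (NNP Scroll) (NN bar))"

# Match every run of characters that is not a bracket or a space,
# i.e. every tag and every word
ms <- gregexpr("[^() ]+", ptext)
words <- regmatches(ptext, ms)[[1]]

# Replace each token with itself plus its position, so repeated
# tags or words no longer collide
tagged <- ptext
regmatches(tagged, ms) <- list(paste0(words, seq_along(words)))

print(words)
print(tagged)
```

Every tag and word ends up with its position in the string as a suffix, so two occurrences of the same tag (for example the two VP nodes in the full sentence) become distinct vertex names.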
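The fill-in-place pattern can be seen without NLP or igraph. This is only a sketch under stated assumptions: the helper node(), the nested-list tree, and the name fill_edges are hypothetical stand-ins for the Tree object and edgemaker, chosen so the example runs in base R.

```r
# Hypothetical stand-in for a parsed tree: each node is a list with
# `value` (its label) and `children` (sub-nodes, or plain strings for words)
node <- function(value, ...) list(value = value, children = list(...))
tree <- node("NP1", node("NNP2", "Scroll3"), node("NN4", "bar5"))

# A tree with n nodes has n - 1 edges, so the matrix can be preallocated
n_nodes <- 5
edgelist <- matrix("", nrow = n_nodes - 1, ncol = 2)

# The closure keeps a private row counter and writes into `edgelist`
# in the enclosing environment via `<<-`
fill_edges <- (function() {
  i <- 0
  walk <- function(nd) {
    if (is.list(nd)) {                      # only recurse into subtrees
      for (child in nd$children) {
        childval <- if (is.list(child)) child$value else child
        i <<- i + 1
        edgelist[i, ] <<- c(nd$value, childval)
      }
      invisible(lapply(nd$children, walk))
    }
  }
  walk
})()

fill_edges(tree)
print(edgelist)
```

Each row is one head-tail pair. A node's edges to all of its children are written before any child is descended into, so the rows are grouped by parent.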
## Make a graph from Tree_parse result
parse2graph <- function(ptext, leaf.color='chartreuse4', label.color='blue4',
                        title=NULL, cex.main=.9, ...) {
  stopifnot(require(NLP) && require(igraph))

  ## Replace words with unique versions
  ms <- gregexpr("[^() ]+", ptext)                                      # match everything except spaces and brackets
  words <- regmatches(ptext, ms)[[1]]                                   # just the tags and words
  regmatches(ptext, ms) <- list(paste0(words, seq.int(length(words))))  # add id to words

  ## Going to construct an edgelist and pass that to igraph
  ## allocate here since we know the size (number of nodes - 1) and -1 more to exclude 'TOP'
  edgelist <- matrix('', nrow=length(words)-2, ncol=2)

  ## Function to fill in edgelist in place
  edgemaker <- (function() {
    i <- 0                                       # row counter
    g <- function(node) {                        # the recursive function
      if (inherits(node, "Tree")) {              # only recurse subtrees
        if ((val <- node$value) != 'TOP1') {     # skip 'TOP' node (added '1' above)
          for (child in node$children) {
            childval <- if(inherits(child, "Tree")) child$value else child
            i <<- i+1
            edgelist[i,1:2] <<- c(val, childval)
          }
        }
        invisible(lapply(node$children, g))
      }
    }
  })()

  ## Create the edgelist from the parse tree
  edgemaker(Tree_parse(ptext))

  ## Make the graph, add options for coloring leaves separately
  g <- graph_from_edgelist(edgelist)
  vertex_attr(g, 'label.color') <- label.color  # non-leaf colors
  vertex_attr(g, 'label.color', V(g)[!degree(g, mode='out')]) <- leaf.color
  V(g)$label <- sub("\\d+$", '', V(g)$name)     # strip only the appended id from the end of each label

  ## layout_as_tree is the current name for layout.reingold.tilford
  plot(g, layout=layout_as_tree, ...)
  if (!missing(title)) title(title, cex.main=cex.main)
}
So, using your example, the string x and its annotated version ptext look like
x <- 'Scroll bar does not work the best either.'
ptext
# [1] "(TOP (S (NP (NNP Scroll) (NN bar)) (VP (VBZ does) (RB not) (VP (VB work) (NP (DT the) (JJS best)) (ADVP (RB either))))(. .)))"
The graph is created by calling
library(igraph)
library(NLP)
parse2graph(ptext,  # plus optional graphing parameters
            title = sprintf("'%s'", x), margin=-0.05,
            vertex.color=NA, vertex.frame.color=NA,
            vertex.label.font=2, vertex.label.cex=1.5, asp=0.5,
            edge.width=1.5, edge.color='black', edge.arrow.size=0)