Extracting actions performed on objects from sentences in R

Asked by Pri*_*asu (score 5) · tags: nlp, r, opennlp

I want to extract the actions performed on objects from a list of sentences in R. A brief illustration:

S = “The boy opened the box. He took the chocolates. He ate the chocolates. 
     He went to school”

I am looking for the following combinations:

Opened box
Took chocolates
Ate chocolates
Went school

I have been able to extract the verbs and the nouns separately, but I cannot find a way to combine them to get this insight.

library(NLP)
library(openNLP)
library(openNLPmodels.en)

s = as.String("The boy opened the box. He took the chocolates. He ate the chocolates. He went to school")

# POS-tag a string, returning both "word/TAG" pairs and the bare tag vector
tagPOS <- function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = ",")
  list(POStagged = POStagged, POStags = POStags)
}

nouns = c("/NN", "/NNS", "/NNP", "/NNPS")
verbs = c("/VB", "/VBD", "/VBG", "/VBN", "/VBP", "/VBZ")

s = tolower(s)
s = gsub("\n", "", s)
s = gsub('"', "", s)

tags = tagPOS(s)
tags = tags$POStagged
tags = unlist(strsplit(tags, split = ","))

# keep the unique "word/TAG" tokens whose tag is a noun or a verb, then strip the tag
nouns_present = unique(tags[grepl(paste(nouns, collapse = "|"), tags)])
verbs_present = unique(tags[grepl(paste(verbs, collapse = "|"), tags)])
nouns_present = gsub("^(.*?)/.*", "\\1", nouns_present)
verbs_present = gsub("^(.*?)/.*", "\\1", verbs_present)
nouns_present = paste("'", as.character(nouns_present), "'", collapse = ",", sep = "")
verbs_present = paste("'", as.character(verbs_present), "'", collapse = ",", sep = "")
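To combine the two lists, one simple heuristic (a minimal sketch, not a full parser) is to walk the tagged tokens in order and pair each verb with the first noun that follows it. `pair_actions` below is a hypothetical helper that operates on the "word/TAG" tokens the `tagPOS` function above produces:

```r
# Hypothetical helper: pair each verb with the next noun in the token stream.
# Input: a character vector of "word/TAG" tokens, as produced by tagPOS above.
pair_actions <- function(tagged) {
  is_verb <- grepl("/VB", tagged)  # matches VB, VBD, VBG, VBN, VBP, VBZ
  is_noun <- grepl("/NN", tagged)  # matches NN, NNS, NNP, NNPS
  pairs <- character(0)
  i <- 1
  while (i <= length(tagged)) {
    if (is_verb[i]) {
      # scan forward for the first noun after this verb
      j <- i + 1
      while (j <= length(tagged) && !is_noun[j]) j <- j + 1
      if (j <= length(tagged)) {
        pairs <- c(pairs, paste(sub("/.*", "", tagged[i]),
                                sub("/.*", "", tagged[j])))
        i <- j  # resume the scan after the paired noun
      }
    }
    i <- i + 1
  }
  pairs
}

tagged <- c("the/DT", "boy/NN", "opened/VBD", "the/DT", "box/NN",
            "he/PRP", "took/VBD", "the/DT", "chocolates/NNS")
pair_actions(tagged)
# "opened box"  "took chocolates"
```

This ignores clause boundaries and prepositions ("went school" rather than "went to school"), but it produces exactly the verb-object edge list needed for a graph.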

The idea is to build a network graph in which clicking a verb node shows all the objects attached to it, and vice versa. Any help would be great.
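For the graph itself, once you have verb-object pairs, a bipartite edge list is enough; a sketch using the igraph package (an assumption on my part — for the interactive click behaviour you would likely layer visNetwork or a similar htmlwidget on top):

```r
library(igraph)

# Hypothetical edge list: one row per (verb, object) pair extracted above
edges <- data.frame(verb   = c("opened", "took", "ate", "went"),
                    object = c("box", "chocolates", "chocolates", "school"))

# Verbs and objects both become nodes; each pair becomes an edge
g <- graph_from_data_frame(edges, directed = FALSE)

# The neighbours of a verb node are its objects, and vice versa
neighbors(g, "ate")$name         # "chocolates"
neighbors(g, "chocolates")$name  # "took" "ate"
plot(g)
```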

Answer by 小智 (score 0)

I assume you also want to grab the words immediately before and after the key action verbs. I was able to do this with the tidytext package. (Reference: https://uc-r.github.io/word_relationships)

library(tidytext)
library(tidyverse)

# First create another column that splits the text into n-grams
# (I set n = 2, so every two adjacent words are paired together);
# `comments` is your data frame and `Response` is its text column
mydf <- unnest_tokens(comments, "tokens", Response, token = "ngrams",
                      n = 2, to_lower = TRUE, drop = FALSE)

# Remove bigrams in which either word is a stopword, then count the pairs
mydf %>%
  separate(tokens, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  count(word1, word2, sort = TRUE) %>%
  view()
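The snippet above references the answerer's own `comments` data frame and `Response` column. A self-contained variant using the sentence from the question (names here are my own illustration, not part of the answer):

```r
library(tidytext)
library(dplyr)
library(tidyr)

# Stand-in for the answerer's data frame: one text column named Response
comments <- data.frame(
  Response = "The boy opened the box. He took the chocolates. He ate the chocolates. He went to school.",
  stringsAsFactors = FALSE)

# Tokenise into bigrams, drop pairs containing a stopword, count the rest
bigram_counts <- comments %>%
  unnest_tokens(tokens, Response, token = "ngrams", n = 2) %>%
  separate(tokens, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  count(word1, word2, sort = TRUE)

bigram_counts
```

Note that stopword filtering removes pronouns such as "he", so verb-object pairs like "boy opened" survive while "he took" does not; it is a rougher cut than the POS-tag route in the question.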