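Column A of my data contains sentences and column B contains some words. I want to check which part of speech the words in column B belong to in the sentences in column A.
I am trying to get the part-of-speech tags corresponding to each sentence in a text file. Please suggest code for this.
Currently I can get the part-of-speech tags for a single sentence using the following code: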
s <- unlist(lapply(posText, function(x) { str_split(x, "\n") }))
tagPOS <- function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}
tagged_str <- tagPOS(s)
You can tag multiple sentences by using lapply. Since you did not provide reproducible data, I created my own.
Code
# Reproducible data: quotes from Wuthering Heights by Emily Bronte
posText <- "I gave him my heart, and he took and pinched it to death; and flung it back to me.
People feel with their hearts, Ellen, and since he has destroyed mine, I have not power to feel for him."

library(stringr)
library(NLP)
library(openNLP)

# Split the text into sentences on the newline character
s <- unlist(lapply(posText, function(x) { str_split(x, "\n") }))

# Tag a single sentence; returns the "word/TAG" string and the vector of tags
tagPOS <- function(x, ...) {
  s <- as.String(x)
  word_token_annotator <- Maxent_Word_Token_Annotator()
  # Treat the whole input as one sentence, then annotate its words
  a2 <- Annotation(1L, "sentence", 1L, nchar(s))
  a2 <- annotate(s, word_token_annotator, a2)
  a3 <- annotate(s, Maxent_POS_Tag_Annotator(), a2)
  # Keep only the word annotations and pull out their POS feature
  a3w <- a3[a3$type == "word"]
  POStags <- unlist(lapply(a3w$features, `[[`, "POS"))
  POStagged <- paste(sprintf("%s/%s", s[a3w], POStags), collapse = " ")
  list(POStagged = POStagged, POStags = POStags)
}

# Tag every sentence, then bind the per-sentence results into a data frame
result <- lapply(s, tagPOS)
result <- as.data.frame(do.call(rbind, result))
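The output is a data frame with two columns. The first column, POStagged, holds each sentence with every word followed by its tag, separated by "/". The second column, POStags, holds the set of tags in the order in which they occur in the sentence.
Output: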
> print(result)
POStagged
1 I/PRP gave/VBD him/PRP my/PRP$ heart/NN ,/, and/CC he/PRP took/VBD and/CC pinched/VBD it/PRP to/TO death/NN ;/: and/CC flung/VBD it/PRP back/RB to/TO me/PRP ./.
2 People/NNS feel/VBP with/IN their/PRP$ hearts/NNS ,/, Ellen/NNP ,/, and/CC since/IN he/PRP has/VBZ destroyed/VBN mine/NN ,/, I/PRP have/VBP not/RB power/NN to/TO feel/VB for/IN him/PRP ./.
POStags
1 PRP, VBD, PRP, PRP$, NN, ,, CC, PRP, VBD, CC, VBD, PRP, TO, NN, :, CC, VBD, PRP, RB, TO, PRP, .
2 NNS, VBP, IN, PRP$, NNS, ,, NNP, ,, CC, IN, PRP, VBZ, VBN, NN, ,, PRP, VBP, RB, NN, TO, VB, IN, PRP, .
>
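To connect this back to the original question (a word in column B checked against the sentence in column A), the "word/TAG" strings can be searched for a given word. The following is only a minimal sketch in base R, not part of the answer's code: the helper name `lookup_tags`, the data frame `df`, and its column names `A_tagged` and `B` are hypothetical. It assumes the tagged sentences are available as a plain character vector (for example via `unlist(result$POStagged)`, which I would expect to work here but have not verified against your data).

```r
# Hypothetical helper: return the POS tag(s) that `word` received in a
# tagged sentence of the form "I/PRP gave/VBD ...". Matching ignores case.
lookup_tags <- function(tagged, word) {
  tokens <- strsplit(tagged, " ", fixed = TRUE)[[1]]
  # Split each token at its last "/" so the word part may itself contain "/"
  words <- sub("/[^/]*$", "", tokens)
  tags  <- sub("^.*/", "", tokens)
  unique(tags[tolower(words) == tolower(word)])
}

# Assumed layout: one tagged sentence (column A) and one word (column B) per row
df <- data.frame(
  A_tagged = c(
    "I/PRP gave/VBD him/PRP my/PRP$ heart/NN ,/, and/CC he/PRP took/VBD",
    "People/NNS feel/VBP with/IN their/PRP$ hearts/NNS to/TO feel/VB for/IN him/PRP ./."
  ),
  B = c("heart", "feel"),
  stringsAsFactors = FALSE
)

# One list element per row, because a word can occur with more than one tag
df$B_tags <- mapply(lookup_tags, df$A_tagged, df$B,
                    SIMPLIFY = FALSE, USE.NAMES = FALSE)
```

A word that does not occur in the sentence yields an empty character vector, so `length(lookup_tags(tagged, word)) > 0` can serve as the "does this word belong to the sentence" check.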