Posts by BRZ*_*BRZ

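Twitter data analysis - error in term-document matrix

I'm trying to do some analysis on Twitter data. I downloaded the tweets and created a corpus from the tweet text as follows: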

# Creating a Corpus
wim_corpus = Corpus(VectorSource(wimbledon_text))

When I try to create a TermDocumentMatrix as shown below, I get an error and warnings.

tdm = TermDocumentMatrix(wim_corpus, 
                       control = list(removePunctuation = TRUE, 
                                      stopwords =  TRUE, 
                                      removeNumbers = TRUE, tolower = TRUE)) 

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),    : 'i, j, v' different lengths


In addition: Warning messages:
1: In parallel::mclapply(x, termFreq, control) :
 all scheduled cores encountered errors in user code
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
3: In TermDocumentMatrix.VCorpus(corpus) …

BRZ*_*BRZ

lucky-day

8
votes

3
answers

20k
views

Substring: getting the words around a keyword

If I have a string:

moon <- "The cow jumped over the moon with a silver plate in its mouth"

Is there a way to extract the words near "moon"? The neighborhood could be the 2 or 3 words on either side of "moon".

So, if my string is:

"The cow jumped over the moon with a silver plate in its mouth"

I want my output to be just:

"jumped over the moon with a silver"

I know I can use str_locate if I want to extract by character position, but I don't know how to do it by "words". Can this be done in R?

Thanks and regards, Simak
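
The word-window idea in this question can be sketched as follows. This is a minimal illustration in Python rather than R, and the helper name `words_around` is hypothetical: it splits on whitespace, finds the first exact occurrence of the keyword, and keeps up to `n` words on each side.

```python
def words_around(text, keyword, n=3):
    """Return the keyword plus up to n words on each side (whitespace split)."""
    words = text.split()
    # Index of the first exact, case-sensitive match; raises ValueError if absent.
    i = words.index(keyword)
    # Clamp the left edge at 0; slicing clamps the right edge automatically.
    return " ".join(words[max(0, i - n): i + n + 1])


moon = "The cow jumped over the moon with a silver plate in its mouth"
print(words_around(moon, "moon"))  # jumped over the moon with a silver
```

Punctuation attached to a word (for example "moon,") would not match in this sketch; stripping punctuation first is one way to handle that.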

BRZ*_*BRZ

2013-08-01

5
votes

1
answer

807
views

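"Bogus escape" error when running a regular expression

Yesterday I got help with a regex match, and it works on its own. But when I put it into this code, I get a "bogus escape" error. The code and traceback are below. Can you point out what I'm doing wrong?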

#!/usr/bin/env python
import re

sf = open("a.txt","r")
out = open("b.txt","w")
regex = re.compile(r'Merging\s+\d+[^=]*=\s*\'\w+@\w+\x\w+\'\\"')


for line in sf:
    m = regex.findall(line)
    for i in m:
       print >> out,line,

The traceback is:

Traceback (most recent call last):
  File "match.py", line 6, in <module>
    regex = re.compile(r'Merging\s+\d+[^=]*=\s*\'\w+@\w+\x\w+\'\\"')
  File "/usr/lib/python2.7/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 242, in _compile
    raise error, v # invalid expression
sre_constants.error: bogus escape: '\\x'
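
The error message points at `\x`: in Python's `re` module, `\x` starts a hexadecimal escape and must be followed by two hex digits, so `\x\w` is rejected. The sketch below assumes the `\x` was meant to be an escaped literal dot (`\.`), as in an address like user@host.domain. That intent is a guess, the sample line is hypothetical, and the code targets Python 3, where the same pattern raises `re.error`.

```python
import re

# Pattern from the question: "\x" is not followed by two hex digits.
bad = r'Merging\s+\d+[^=]*=\s*\'\w+@\w+\x\w+\'\\"'
try:
    re.compile(bad)
    compiled_ok = True
except re.error:
    compiled_ok = False

# Assumption: "\x" should have been "\." (a literal dot).
good = re.compile(r'Merging\s+\d+[^=]*=\s*\'\w+@\w+\.\w+\'\\"')

# Hypothetical input line ending in: 'user@example.com'\"
sample = 'Merging 12 files = \'user@example.com\'\\"'

print(compiled_ok)                 # False
print(bool(good.search(sample)))   # True
```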

python regex

BRZ*_*BRZ

2018-06-28

5
votes

1
answer

10k
views

Adding a column to a data frame from the next row's value

I have a data frame with the following structure and values:

total_size <- 5000;
id line_number
1   1232
2   1456
3   1832
4   2002

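I need to dynamically add a new column to the data frame using the value from the next row, i.e. the new column's value should be (line_number of the next row) - 1. The value for the last row should be filled with total_size.

The final output I need to produce is: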

id line_number   end_line_number
1   1232         1455
2   1456         1831
3   1832         2001
4   2002         5000

Any idea how to generate this dynamically in R?
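
The shift-by-one rule described above can be sketched with plain Python lists. This only illustrates the logic and is not an R solution; the variable names mirror the question's columns.

```python
total_size = 5000
line_numbers = [1232, 1456, 1832, 2002]

# Each end is the next row's start minus 1; the last row ends at total_size.
end_line_numbers = [nxt - 1 for nxt in line_numbers[1:]] + [total_size]

for row_id, (start, end) in enumerate(zip(line_numbers, end_line_numbers), start=1):
    print(row_id, start, end)
```

In base R the same shift could likely be written as `c(line_number[-1] - 1, total_size)`, assuming `line_number` is the column vector.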

BRZ*_*BRZ

2013-07-12

4
votes

1
answer

4602
views

Using a Perl regexp in R

I have a string from which I'm trying to extract the term before a keyword.

str = "This is a <Keyword>(-)Controlled design"

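There can be either a space or a "-" between the keyword and "Controlled". I need to extract the term that comes before "Controlled". In Perl, I use the following regular expression: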

/(\w+)[- ]controlled/i)

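I tried the same in R after handling the backslashes and setting perl=TRUE, but it does not work. How can I use this expression to do the extraction in R? Is there an alternative expression or library I could use?

Thanks, simak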
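
One thing worth checking when porting the expression: in Perl the surrounding slashes and the trailing `i` are delimiters and a flag, not part of the pattern, so other regex engines generally need the bare pattern plus a separate case-insensitivity option. The sketch below shows that in Python rather than R; the helper name and the example sentences are hypothetical.

```python
import re

# Bare pattern: the Perl delimiters "/.../" and the "i" flag are not part of it.
pattern = re.compile(r"(\w+)[- ]controlled", re.IGNORECASE)


def term_before_controlled(text):
    """Return the word immediately before "controlled", or None if absent."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical example sentences
print(term_before_controlled("This is a placebo-controlled design"))  # placebo
print(term_before_controlled("This is a Placebo Controlled design"))  # Placebo
```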

regex r

BRZ*_*BRZ

2015-01-18

3
votes

1
answer

215
views

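Detecting a pattern in a string using str_detect

I'm trying to use str_detect to detect whether a string contains a particular pattern. My pattern is a run of dots such as "...." (the exact number of dots is unknown). I'm trying to use str_detect as follows.

However, in this particular case str_detect returns TRUE. I'm wondering what I'm doing wrong, and whether str_detect is the right function to use. Hopefully someone here can help?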

library(stringr)
dot_pat="\\.........................";
str="The primary.objective is of the study."
str_detect(str,dot_pat)

This returns TRUE. I was expecting FALSE, since the dots in str do not follow the pattern.

Thanks in advance, Simak
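
A likely explanation, sketched in Python because the regex behavior is the same: only the first dot in the pattern is escaped, and an unescaped `.` matches any character. The pattern therefore means "a literal dot followed by many arbitrary characters", which the sample sentence satisfies. The stricter pattern below assumes "a series of dots" means two or more consecutive literal dots. The dot count in `loose` and the table-of-contents string are illustrative.

```python
import re

text = "The primary.objective is of the study."

# One escaped dot followed by unescaped dots; "." alone matches any character.
loose = r"\." + "." * 24  # dot count is illustrative

# Assumption: a "series of dots" is two or more consecutive literal dots.
strict = r"\.{2,}"

print(bool(re.search(loose, text)))                      # True
print(bool(re.search(strict, text)))                     # False
print(bool(re.search(strict, "Contents ........ 12")))   # True
```

With stringr the quantified form would probably be written as `"\\.{2,}"`, since the backslash itself has to be escaped inside an R string.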

r stringr

BRZ*_*BRZ

2013-07-23

3
votes

1
answer

20k
views

Tag statistics

r ×5

regex ×2

python ×1

stringr ×1
