Posts by Rco*_*ing

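How to remove duplicated words within a certain pattern from a string in R

My goal is to remove duplicated words only inside the parentheses in a set of strings.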

a = c( 'I (have|has|have) certain (words|word|worded|word) certain',
'(You|You|Youre) (can|cans|can) do this (works|works|worked)',
'I (am|are|am) (sure|sure|surely) you know (what|when|what) (you|her|you) should (do|do)' )

What I want is something like this:

a
[1]'I (have|has) certain (words|word|worded) certain'
[2]'(You|Youre) (can|cans) do this (works|worked)'
[3]'I (am|are) pretty (sure|surely) you know (what|when) (you|her) should (do|)'

To get this result, I used the following code:

a = gsub('\\|', " | ",  a)
a = gsub('\\(', "(  ",  a)
a = gsub('\\)', "  )",  a)
a = vapply(strsplit(a, " "), function(x) paste(unique(x), collapse = " "), character(1L))

However, it produced undesired output.

a …
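The attempt above applies unique() to every token in the whole string, so words outside the parentheses (such as the second "certain") are also affected. One way to restrict the deduplication to each parenthesised group is a regex substitution with a callback. The question is about R; the sketch below uses Python only to illustrate the idea, and the helper name `dedupe_groups` is hypothetical. It assumes groups are not nested and that alternatives are separated by `|`.

```python
import re

def dedupe_groups(s):
    """Remove repeated alternatives inside each (x|y|z) group, keeping first occurrences."""
    def fix(match):
        words = match.group(1).split("|")
        unique = list(dict.fromkeys(words))  # dict keys preserve insertion order
        return "(" + "|".join(unique) + ")"
    # Match one non-nested parenthesised group at a time
    return re.sub(r"\(([^()]*)\)", fix, s)

a = [
    'I (have|has|have) certain (words|word|worded|word) certain',
    '(You|You|Youre) (can|cans|can) do this (works|works|worked)',
    'I (am|are|am) (sure|sure|surely) you know (what|when|what) (you|her|you) should (do|do)',
]
result = [dedupe_groups(s) for s in a]
for line in result:
    print(line)
```

Note that this sketch turns `(do|do)` into `(do)`, not the `(do|)` shown in the desired output, and leaves text outside the parentheses untouched.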


Rco*_*ing

lucky-day

3
votes

1
answer

793
views

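Counting occurrences of strings in a data frame

With R, I can easily create a data frame that contains the frequencies of certain string patterns in a list of strings.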

library(stringr)
library(tm)
library(dplyr)    
text = c('i am so hhappy happy now','you look ssad','sad day today','noway')
dat = sapply(c('happy', 'sad'), function(i) str_count(text, i))
dat = data.frame(dat)  
dat = dat %>% mutate(Sentiment = (happy)-(sad))

As a result, I get a data frame like this:

  happy sad Sentiment
1     2   0         2
2     0   1        -1
3     0   1        -1
4     0   0         0

In Python, I am unsure how to write the rest of the code, that is, the part that sapply() handles in R:

import pandas as pd
text = ['i am so hhappy happy now','you look ssad','sad day today','noway']
????
dat = pd.DataFrame(dat)
dat['Sentiment'] …
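One possible way to fill in the missing step, offered as a sketch and not as the accepted answer: build one column per pattern with `Series.str.count`, which counts regex matches in each string, much as `str_count` does in the R version. The variable names `patterns` and `s` are introduced here for illustration only.

```python
import pandas as pd

text = ['i am so hhappy happy now', 'you look ssad', 'sad day today', 'noway']
patterns = ['happy', 'sad']  # assumed pattern list, mirroring c('happy', 'sad') in the R code

s = pd.Series(text)
# One column per pattern; each cell is the number of matches in that string
dat = pd.DataFrame({p: s.str.count(p) for p in patterns})
dat['Sentiment'] = dat['happy'] - dat['sad']
print(dat)
```

The counts match the R output above, except that the pandas index starts at 0 where R row names start at 1. Because `str.count` treats the pattern as a regular expression, patterns containing special characters would need `re.escape`.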


python string apply dataframe pandas

Rco*_*ing

2017 08-30

2
votes

1
answer

2670
views
