vag*_*ond 2 r vector gsub mapply
I want to remove multiple patterns from multiple character vectors. Currently I am doing:
a.vector <- gsub("@\\w+", "", a.vector)
a.vector <- gsub("http\\w+", "", a.vector)
a.vector <- gsub("[[:punct:]]", "", a.vector)
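and so on.
This is painful. I looked at this question and answer: R: gsub, pattern = vector and replacement = vector, but it did not solve the problem.
Neither the mapply nor the mgsub approach worked. I created these vectors: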
remove <- c("@\\w+", "http\\w+", "[[:punct:]]")
substitute <- c("")
Neither mapply(gsub, remove, substitute, a.vector) nor mgsub(remove, substitute, a.vector) worked.
a.vector looks like this:
[4951] "@karakamen: Suicide amongst successful men is becoming rampant. Kudos for staing the conversation. #mental"
[4952] "@stiphan: you are phenomenal.. #mental #Writing. httptxjwufmfg"
I want:
[4951] "Suicide amongst successful men is becoming rampant Kudos for staing the conversation #mental"
[4952] "you are phenomenal #mental #Writing"
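I know this answer is late to the scene, but it stems from my dislike of having to list the removal patterns by hand inside the grep functions (see the other solutions here). My idea is to set the patterns up in advance, keep them as a character vector, and paste them together with the regex separator "|" only when needed: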
library(stringr)
remove <- c("@\\w+", "http\\w+", "[[:punct:]]")
a.vector <- str_remove_all(a.vector, paste(remove, collapse = "|"))
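Yes, this is effectively the same as some of the other answers here, but I think my solution lets you keep the original "character removal vector", remove.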
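For readers who would rather not load stringr, the same idea can be sketched in base R, since gsub() accepts the collapsed pattern directly. The `tweets` vector below is a made-up stand-in for a.vector, and the trimws() call is an extra step that is not part of the answer above.

```r
# Patterns kept as a reusable character vector, as in the answer above
remove <- c("@\\w+", "http\\w+", "[[:punct:]]")

# Hypothetical sample data standing in for a.vector
tweets <- c("@bob: hi there. http123", "no match")

# Join the patterns with "|" only at the point of use
combined <- paste(remove, collapse = "|")

# gsub() is vectorised over its input, so one call cleans every element;
# trimws() then drops the leading/trailing spaces left behind
cleaned <- trimws(gsub(combined, "", tweets))
print(cleaned)
```

Note that `[[:punct:]]` also matches `#`, so a hashtag such as `#mental` comes out as `mental` with this pattern set.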
Try using |. For example:
> s <- "@karakamen: Suicide amongst successful men is becoming rampant. Kudos for staing the conversation. #mental"
> gsub("@\\w+|http\\w+|[[:punct:]]", "", s)
[1] " Suicide amongst successful men is becoming rampant Kudos for staing the conversation mental"
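However, this can become a problem if you have a large number of patterns, or if the result of applying one pattern produces text that matches another pattern.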
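A small illustration of the second caveat, using made-up patterns and a made-up string rather than anything from the question: a single alternation scans the input once, whereas applying the patterns one after another lets an earlier removal create a new match for a later pattern.

```r
# Hypothetical patterns and input, chosen so that the patterns interact
patterns <- c("b", "ac")
s <- "abc"

# One pass with alternation: only "b" matches, so "ac" is left behind
one_pass <- gsub(paste(patterns, collapse = "|"), "", s)

# Sequential passes: removing "b" first creates "ac",
# which the second pattern then removes
sequential <- s
for (p in patterns) sequential <- gsub(p, "", sequential)

print(one_pass)    # "ac"
print(sequential)  # ""
```

Which of the two results is wanted depends on the task, so it is worth checking whether the patterns can interact before picking an approach.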
Consider creating the remove vector as you suggested, then applying it in a loop:
> s1 <- s
> remove <- c("@\\w+", "http\\w+", "[[:punct:]]")
> for (p in remove) s1 <- gsub(p, "", s1)
> s1
[1] " Suicide amongst successful men is becoming rampant Kudos for staing the conversation mental"
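Of course, this approach needs to be extended to apply it to a whole table or vector. But if you put it into a function that returns the final string, you should be able to pass it to one of the apply variants.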
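One possible reading of that suggestion, as a rough sketch: the helper name `remove_patterns` and the sample data are made up for illustration. Because gsub() is already vectorised over its input, a character vector can be passed in directly, and an apply variant is only needed to go across the columns of a table.

```r
# Hypothetical helper: apply each pattern in turn and return the cleaned text
remove_patterns <- function(x, patterns) {
  for (p in patterns) {
    x <- gsub(p, "", x)  # gsub() handles every element of x at once
  }
  x
}

remove <- c("@\\w+", "http\\w+", "[[:punct:]]")

# A whole character vector needs no apply call
tweets <- c("@bob: hi there. http123", "no match")
cleaned <- remove_patterns(tweets, remove)

# Made-up data frame: lapply() runs the helper on each column,
# and df[] <- keeps the data frame structure
df <- data.frame(a = tweets, b = c("x.y", "@z ok"), stringsAsFactors = FALSE)
df[] <- lapply(df, remove_patterns, patterns = remove)
print(df)
```

The leading and trailing spaces left by the removals are kept here; wrapping the result in trimws() would strip them if that is preferred.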