Suppose I have the following data frame:
# code countries
#1 A001 [[Germany, China, Japan], [Chile, Mexico], [Poland]]
#2 A002 [[], [Japan], [Singapore, Indonesia, Micronesia]]
#3 A003 [[Tuvalu, Chile], [], [North Macedonia, Sweden]]
How can I remove every [ after its first occurrence and every ] before its last occurrence?
In other words, the data frame should end up looking like this:
# code countries
#1 A001 [Germany, China, Japan, Chile, Mexico, Poland]
#2 A002 [Japan, Singapore, Indonesia, Micronesia]
#3 A003 [Tuvalu, Chile, North Macedonia, Sweden]
Data:
df <- data.frame(code = c('A001', 'A002', 'A003'),
                 countries = c('[[Germany, China, Japan], [Chile, Mexico], [Poland]]',
                               '[[], [Japan], [Singapore, Indonesia, Micronesia]]',
                               '[[Tuvalu, Chile], [], [North Macedonia, Sweden]]')
)
Here is an approach using regex in base R:
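# Inner gsub(): keep the first "[" and the last "]" (protected by (*SKIP)(*FAIL))
# and delete every other bracket.
# Outer gsub(): drop the stray commas left behind by empty inner lists.
df$countries <- gsub("(?<=\\[),\\s*|(?<=\\,)\\s+,", "",
                     gsub("(^\\[|\\]$)(*SKIP)(*FAIL)|([][])", "", df$countries, perl = TRUE),
                     perl = TRUE)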
df$countries
#[1] "[Germany, China, Japan, Chile, Mexico, Poland]"
#[2] "[Japan, Singapore, Indonesia, Micronesia]"
#[3] "[Tuvalu, Chile, North Macedonia, Sweden]"
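To see why two passes are needed, the sketch below runs the two gsub() calls separately on the same sample strings, held in a plain character vector (the names countries, step1 and step2 are only illustrative). The first pass leaves a stray ", " wherever an inner list was empty; the second pass removes it.

```r
# Illustrative names: countries, step1, step2 are not from the original answer.
countries <- c('[[Germany, China, Japan], [Chile, Mexico], [Poland]]',
               '[[], [Japan], [Singapore, Indonesia, Micronesia]]',
               '[[Tuvalu, Chile], [], [North Macedonia, Sweden]]')

# Pass 1: (^\[|\]$) matches the first "[" and the last "]"; (*SKIP)(*FAIL)
# makes those matches fail and skips past them, so only the remaining
# brackets are matched by [][] and deleted.
step1 <- gsub("(^\\[|\\]$)(*SKIP)(*FAIL)|[][]", "", countries, perl = TRUE)
step1[2]  # "[, Japan, Singapore, Indonesia, Micronesia]"
step1[3]  # "[Tuvalu, Chile, , North Macedonia, Sweden]"

# Pass 2: remove a comma (plus following spaces) right after the opening "[",
# and a " ," that directly follows another comma (the trace of an empty list).
step2 <- gsub("(?<=\\[),\\s*|(?<=,)\\s+,", "", step1, perl = TRUE)
step2
```

After the second pass, step2 holds the same three strings as the df$countries output shown above.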
Or, another option is to extract the country names and then paste them back together:
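library(stringr)
library(purrr)
# Extract each country name: a word optionally followed by more space-separated
# words, so "North Macedonia" stays in one piece. Then wrap the comma-joined names in [].
df$countries <- map_chr(str_extract_all(df$countries, "\\w+(?:\\s+\\w+)*"),
                        ~ sprintf("[%s]", toString(.x)))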
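The same extract-and-paste idea can also be sketched in base R with regmatches() and gregexpr(), without stringr or purrr. This is a variant, not the answer's own code; the names countries, pattern and cleaned are illustrative. It also shows why the pattern matters: a bare \\w+ splits a multi-word name such as "North Macedonia" into two entries.

```r
# Illustrative base-R variant; countries, pattern and cleaned are hypothetical names.
countries <- c('[[Germany, China, Japan], [Chile, Mexico], [Poland]]',
               '[[], [Japan], [Singapore, Indonesia, Micronesia]]',
               '[[Tuvalu, Chile], [], [North Macedonia, Sweden]]')

# A bare "\\w+" returns "North" and "Macedonia" as separate matches:
split_words <- regmatches(countries[3], gregexpr("\\w+", countries[3]))[[1]]

# One or more words separated by whitespace, so multi-word names stay whole.
# perl = TRUE is needed for the non-capturing group (?:...).
pattern <- "\\w+(?:\\s+\\w+)*"

# Extract the names per string, then join them with ", " inside brackets.
cleaned <- vapply(regmatches(countries, gregexpr(pattern, countries, perl = TRUE)),
                  function(x) sprintf("[%s]", toString(x)),
                  character(1))
cleaned
```

vapply() with character(1) guarantees one string per row, which is the same guarantee map_chr() gives in the purrr version.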