Remove specific characters after the first and before the last occurrence

Alv*_*nez 0 r dataframe

Suppose we have the following data frame:

#  code                                            countries
#1 A001 [[Germany, China, Japan], [Chile, Mexico], [Poland]]
#2 A002     [[], [Japan], [Singapore, Indonesia, Micronesia]]
#3 A003       [[Tuvalu, Chile], [], [North Macedonia, Sweden]]

How can I remove every "[" after its first occurrence and every "]" before its last occurrence?

That is, the data frame should end up looking like this:

   code countries
#1 A001 [Germany, China, Japan, Chile, Mexico, Poland]
#2 A002     [Japan, Singapore, Indonesia, Micronesia]
#3 A003       [Tuvalu, Chile, North Macedonia, Sweden]


Data:

df <- data.frame(code=c('A001', 'A002', 'A003'),
                 countries=c('[[Germany, China, Japan], [Chile, Mexico], [Poland]]',
                             '[[], [Japan], [Singapore, Indonesia, Micronesia]]',
                             '[[Tuvalu, Chile], [], [North Macedonia, Sweden]]')
                )

akr*_*run 5

Here is an approach using regex in base R:

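# Inner gsub: keep the leading "[" and the trailing "]", drop every other bracket.
# Outer gsub: remove the stray commas left behind by empty sublists "[]".
df$countries <- gsub("(?<=\\[),\\s*|(?<=\\,)\\s+,", "",
    gsub("(^\\[|\\]$)(*SKIP)(*FAIL)|([][])", "", df$countries, perl = TRUE), perl = TRUE)
df$countries
#[1] "[Germany, China, Japan, Chile, Mexico, Poland]"
#[2] "[Japan, Singapore, Indonesia, Micronesia]"
#[3] "[Tuvalu, Chile, North Macedonia, Sweden]"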
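To see what each gsub() pass does, the two calls can be run one at a time. This is a minimal sketch on the question's data; the names step1 and step2 are only illustrative. The first pass protects the outer brackets with (*SKIP)(*FAIL) and removes every other bracket, which leaves stray commas where the empty sublists were; the second pass cleans those up.

```r
df <- data.frame(code = c('A001', 'A002', 'A003'),
                 countries = c('[[Germany, China, Japan], [Chile, Mexico], [Poland]]',
                               '[[], [Japan], [Singapore, Indonesia, Micronesia]]',
                               '[[Tuvalu, Chile], [], [North Macedonia, Sweden]]'),
                 stringsAsFactors = FALSE)  # keep a character column on R < 4.0

# Pass 1: "(^\\[|\\]$)(*SKIP)(*FAIL)" matches the leading "[" and the trailing "]"
# and then deliberately fails, so both are skipped; "([][])" removes every other bracket.
step1 <- gsub("(^\\[|\\]$)(*SKIP)(*FAIL)|([][])", "", df$countries, perl = TRUE)
print(step1)
# Empty sublists leave stray commas behind, e.g.
#   "[, Japan, Singapore, Indonesia, Micronesia]"
#   "[Tuvalu, Chile, , North Macedonia, Sweden]"

# Pass 2: drop a comma (plus any spaces) right after the opening "[",
# and drop "<spaces>," that directly follows another comma.
step2 <- gsub("(?<=\\[),\\s*|(?<=\\,)\\s+,", "", step1, perl = TRUE)
print(step2)
```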

Alternatively, extract the country names and then paste them back together:

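library(stringr)
library(purrr)
# Extract every country name, then join the names into one bracketed string.
# "\\w+" alone would split "North Macedonia" in two, so allow single spaces inside a name.
df$countries <- map_chr(str_extract_all(df$countries, "\\w+(?: \\w+)*"),
     ~ sprintf("[%s]", toString(.x)))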
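If stringr and purrr are not available, the same extract-then-paste idea can be sketched with base R's gregexpr() and regmatches(). This is a substitute for the answer's code, not part of it, and the pattern assumes country names consist of word characters separated by single spaces.

```r
df <- data.frame(code = c('A001', 'A002', 'A003'),
                 countries = c('[[Germany, China, Japan], [Chile, Mexico], [Poland]]',
                               '[[], [Japan], [Singapore, Indonesia, Micronesia]]',
                               '[[Tuvalu, Chile], [], [North Macedonia, Sweden]]'),
                 stringsAsFactors = FALSE)  # keep a character column on R < 4.0

# One or more words separated by single spaces, so "North Macedonia" stays whole.
# Assumption: names contain only word characters and single spaces.
pattern <- "\\w+(?: \\w+)*"

# gregexpr() locates every match; regmatches() returns them as a list of character vectors.
names_per_row <- regmatches(df$countries, gregexpr(pattern, df$countries, perl = TRUE))

# toString() joins with ", "; sprintf() wraps the result in a single pair of brackets.
df$countries <- vapply(names_per_row, function(x) sprintf("[%s]", toString(x)), character(1))
print(df)
```

Because the string is rebuilt from the extracted names alone, this variant does not depend on how the brackets are nested or where the empty sublists appear.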