Replace multiple strings in multiple files with R

Mar*_*arc 5 r file-management gsub

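I have around 700,000 files in a folder, in which I need to find several strings and replace them with different other strings (all 4-character codes). It is not certain that any given string is present in a file. I am trying to use gsub, but I cannot work out how to do this with regular expressions. Could someone suggest a good way to handle this task?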

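Here is the code I have used so far. It works with a single y <- gsub(...) instruction, but it does not do what I need, apparently because only the last gsub instruction is taken into account when the y variable is defined...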

chm_files <- list.files(getwd(), pattern=("^[[:digit:]]*.chm$"), full.names=F)

for(chm_file in chm_files) {
  x <- readLines(chm_file)
  y <- gsub("AG02|AG07|AG05|AG18|AG19|AG08|AG09|AG17", "AGRL", x)
  y <- gsub("SB28|SB42|SB43|SB33|SB41|SB34|SB39|SB35", "SWHT", x)
  y <- gsub("WB28|WB42|WB43|WB32|WB09|WB33|WB41|WB26", "BARL", x)
  y <- gsub("WW02|WW25|WW08|WW31|WW05|WW28|WW19|WW42", "WWHT", x)
  cat(y, file=chm_file, sep="\n")
}

Mat*_*yly 4

I am sure there are already numerous pre-built functions for this task in various R packages, but I cooked this one up anyway for myself and others to use or modify. Besides performing the replacements requested above, it also prints a tracking log with the count of all changes made across files. The function is called multi_replace.
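The body of multi_replace is not reproduced in this text, so what follows is only a minimal sketch of what such a function might look like, not the original implementation. It assumes that the search strings are matched literally (fixed = TRUE) rather than as regular expressions, that f and r are paired by position, and that the "tracking log" is a matrix of match counts per file and search string. The argument names files, f and r are assumptions taken from the usage example.

```r
# Hypothetical sketch of multi_replace(); not the answerer's original code.
# files: character vector of file paths
# f:     strings to find (matched literally, not as regular expressions)
# r:     replacements, same length and order as f
multi_replace <- function(files, f, r) {
  stopifnot(length(f) == length(r))
  # One row per file, one column per search string, holding match counts.
  tracking <- matrix(0L, nrow = length(files), ncol = length(f),
                     dimnames = list(files, f))
  for (i in seq_along(files)) {
    lines <- readLines(files[i], warn = FALSE)
    changed <- FALSE
    for (j in seq_along(f)) {
      # Count the occurrences of f[j] across all lines of the file.
      hits <- gregexpr(f[j], lines, fixed = TRUE)
      n <- sum(vapply(hits, function(h) sum(h > 0), integer(1)))
      tracking[i, j] <- n
      if (n > 0) {
        # Feed the result of each replacement into the next one.
        lines <- gsub(f[j], r[j], lines, fixed = TRUE)
        changed <- TRUE
      }
    }
    # Rewrite the file only when something was actually replaced.
    if (changed) writeLines(lines, files[i])
  }
  tracking
}
```

Because the replacements are applied one after another, a replacement value that itself contains a later search string would be replaced again; with this sketch the order of f and r matters in that case.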

Here is some example code showing how it should be run:

# local directory with the files you want to work with
setwd("C:/Users/DW/Desktop/New folder")
# get a list of files based on a pattern of interest, e.g. .html, .txt, .php
# (pattern is a regular expression, so escape the dot and anchor the end)
filer <- list.files(pattern = "\\.php$")
# f - vector of original string values you want to change
f <- c("localhost", "dbtest", "root", "oldpassword")
# r - vector of values to replace the above values with
# make sure the order of f and r matches: f[i] is replaced by r[i]
r <- c("newhost", "newdb", "newroot", "newpassword")

# Run the function and watch all your changes take place ;)
tracking_sheet <- multi_replace(filer, f, r)
tracking_sheet
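For the four-character codes in the question, the loop fails because every gsub call starts again from x, so only the last replacement survives in y. A minimal sketch of a fix is to pass the result of each call into the next one. The helper name replace_codes and the named vector replacements are hypothetical, and the file pattern assumes the names consist of digits followed by .chm.

```r
# Map each alternation pattern to its 4-character replacement code.
replacements <- c(
  "AG02|AG07|AG05|AG18|AG19|AG08|AG09|AG17" = "AGRL",
  "SB28|SB42|SB43|SB33|SB41|SB34|SB39|SB35" = "SWHT",
  "WB28|WB42|WB43|WB32|WB09|WB33|WB41|WB26" = "BARL",
  "WW02|WW25|WW08|WW31|WW05|WW28|WW19|WW42" = "WWHT"
)

# Hypothetical helper: apply every pattern in turn to the same vector,
# so that each gsub() works on the output of the previous one.
replace_codes <- function(x, replacements) {
  for (pattern in names(replacements)) {
    x <- gsub(pattern, replacements[[pattern]], x)
  }
  x
}

# Assumption: file names are digits followed by ".chm"; the dot is escaped.
chm_files <- list.files(getwd(), pattern = "^[[:digit:]]+\\.chm$")

for (chm_file in chm_files) {
  x <- readLines(chm_file)
  y <- replace_codes(x, replacements)
  # Skip the write when nothing changed in this file.
  if (!identical(x, y)) writeLines(y, chm_file)
}
```

With around 700,000 files, skipping the write for unchanged files avoids a lot of needless disk activity, since many files may not contain any of the codes.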