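I want to use R to compare written texts and extract the parts that differ between them.
Consider two text passages, a and b. One is a modified version of the other: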
a <- "This part is the same. This part is old."
b <- "This string is updated. This part is the same."
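I want to compare the two strings and get back the part of each string that is unique to it, preferably reported separately for each input string.
Expected output: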
stringdiff <- list(a = " This part is old.", b = "This string is updated. ")
> stringdiff
$a
[1] " This part is old."
$b
[1] "This string is updated. "
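I tried the solution from "Extract characters that differ between two strings", but it only compares unique characters. The answer to "Simple Comparing of two texts in R" comes closer, but it still only compares unique words.
Is there a way to get the expected output without too much hassle?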
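One possible direction, as a minimal sketch only: if the texts are assumed to differ by whole sentences, each string can be split into sentences and compared with `setdiff()`. The helper `split_sentences` is a hypothetical name introduced here. The sketch also drops the whitespace between sentences, so its result differs from the expected output above by the leading and trailing spaces.

```r
a <- "This part is the same. This part is old."
b <- "This string is updated. This part is the same."

# Hypothetical helper: split on the whitespace that follows
# sentence-ending punctuation (assumes ".", "!" or "?" ends a sentence).
split_sentences <- function(x) {
  strsplit(x, "(?<=[.!?])\\s+", perl = TRUE)[[1]]
}

sa <- split_sentences(a)
sb <- split_sentences(b)

# Sentences that occur in one text but not in the other.
stringdiff <- list(a = setdiff(sa, sb), b = setdiff(sb, sa))
stringdiff
```

This compares sentences as exact strings, so a sentence with even a one-character change is reported as entirely different. A finer-grained diff would need a word-level or character-level alignment instead.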
For an example data frame:
df <- structure(list(code = c("a1", "a1", "b2", "v4", "f5", "f5", "h7",
"a1"), name = c("katie", "katie", "sally", "tom", "amy", "amy",
"ash", "james"), number = c(3.5, 3.5, 2, 6, 4, 4, 7, 3)), .Names = c("code",
"name", "number"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-8L), spec = structure(list(cols = structure(list(code = structure(list(), class = c("collector_character",
"collector")), name = structure(list(), class = c("collector_character",
"collector")), number = structure(list(), class = c("collector_double",
"collector"))), .Names = c("code", "name", "number")), default = structure(list(), …
My data (dt1) is as follows:
dt1 <- structure(list(date = structure(c(NA, 17179, 17180, 17181, 17182,
17183, 17178, 17179, 17180, 17181, 17182, 17183), class = "Date"),
f = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L), y1 = c(68L,
43L, 99L, 53L, 12L, 20L, 29L, 49L, 68L, 15L, 71L, 88L), y2 = c(15L,
15L, 66L, 53L, 63L, 37L, 91L, 17L, 87L, 87L, 43L, 77L)), row.names = c(NA,
-12L), class = "data.frame")
date f y1 y2
1 12-01-17 0 68 15
2 13-01-17 …