df1 <- data.frame(freetext = c("open until monday night", "one more time to insert your coin"), numid = c(291,312))
df2 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5))
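I would like to merge the two data frames using the freetext column as the key. However, the texts are not exactly identical: some words are removed from one version or present only in the other.
Is there any way to find the maximum number of identical words between rows and merge on that basis?
Here is an example of the expected output: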
df3 <- data.frame(freetext = c("open until night", "one time to insert your be"), aid = c(3,5), numid = c(291,312))
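One way to approach this, sketched here in base R only, is to count the distinct words each pair of rows shares and, for every row of df2, take the df1 row with the largest count. The helper name shared_words is hypothetical (it is not from any package), and the sketch assumes that splitting on whitespace and lower-casing is a good enough notion of a "word" for this data. Ties are resolved in favour of the first df1 row, and a df2 row with no shared words at all would still be matched to the first df1 row, so a minimum-overlap threshold may be worth adding for real data.

```r
# Example data from the question
df1 <- data.frame(freetext = c("open until monday night",
                               "one more time to insert your coin"),
                  numid = c(291, 312), stringsAsFactors = FALSE)
df2 <- data.frame(freetext = c("open until night",
                               "one time to insert your be"),
                  aid = c(3, 5), stringsAsFactors = FALSE)

# Hypothetical helper: number of distinct words two strings have in common
shared_words <- function(a, b) {
  wa <- unique(strsplit(tolower(a), "\\s+")[[1]])
  wb <- unique(strsplit(tolower(b), "\\s+")[[1]])
  length(intersect(wa, wb))
}

# Overlap matrix: rows index df2, columns index df1
overlap <- outer(df2$freetext, df1$freetext, Vectorize(shared_words))

# For each df2 row, the index of the df1 row sharing the most words
best <- apply(overlap, 1, which.max)

# Attach the matched numid to df2
df3 <- data.frame(freetext = df2$freetext,
                  aid = df2$aid,
                  numid = df1$numid[best],
                  stringsAsFactors = FALSE)
print(df3)
```

Because the overlap matrix has one cell per pair of rows, this brute-force comparison grows with nrow(df1) * nrow(df2); for large tables a dedicated fuzzy-join package may be a better fit.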
library(tidyverse)
library(fuzzyjoin)
df1 <- tibble(col1 = c("Apple Shipping", "Banana Shipping", "FedEX USA Ground",
"FedEx USA Commercial", "FedEx International"),
col2 = 1:5)
#> # A tibble: 5 x 2
#> col1 col2
#> <chr> <int>
#> 1 Apple Shipping 1
#> 2 Banana Shipping 2
#> 3 FedEX USA Ground 3
#> 4 FedEx USA Commercial 4
#> 5 FedEx International 5
df2 <- tibble(col3 = c("Banana", "FedEX USA"), col4 = c(700, 900))
#> # A tibble: 2 x 2
#> …
I have two data frames:
The first contains a large number of proteins, on which I have performed several calculations. Here is an example:
>Accession Description # Peptides A2 # PSM A2 # Peptides B2 # PSM B2 # Peptides C2 # PSM C2 # Peptides D2 # PSM D2 # Peptides E2 # PSM E2 # AAs MW [kDa] calc. pI
P01837 Ig kappa chain C region OS=Mus musculus PE=1 SV=1 - [IGKC_MOUSE] 10 319 8 128 8 116 7 114 106 11,8 5,41
P01868 Ig gamma-1 chain C region secreted form OS=Mus musculus GN=Ighg1 PE=1 SV=1 - [IGHG1_MOUSE] 13 251 …