Mak*_*oni 3 r gsub str-replace dataframe data.table
Imagine I have a data.frame or data.table with a string column where one row looks like this:
a1; b: b1, b2, b3; c: c1, c2, c3; d: d1, d2, d3, d4
and a look-up table with codes for mapping each of these strings. For example:
string code
a1 10
b1 20
b2 30
b3 40
c1 50
c2 60
...
I would like a mapping function that replaces each string with its code, giving:
10; b: 20, 30, 40; c: 50, 60, 70; d: 80, 90, 100
I have a column of these strings in a data.table/data.frame (more than 100k rows), so any fast solution would be much appreciated.
Note that the string length is not always the same. For example, one row can have strings a to d, another a to f.
EDIT:
We have a solution for the case above. However, imagine I have a string like this:
a; b: peter, joe smith, john smith; c: luke, james, john smith
How can I replace these, knowing that john smith can have two different codes depending on whether it belongs to category b or c?
Also, a string can contain words separated by a space.
EDIT 2: The look-up table for this case:
string code
a 10
peter 20
joe smith 30
john smith 40
luke 50
james 60
john smith 70
...
The expected result is:
10; b: 20, 30, 40; c: 50, 60, 70
EDIT 3: As suggested, I have opened a new question for the next issue: How to replace repeated strings and space in-between with look-up codes in R
We can use gsubfn, passing the look-up table as a named list so that each matched token is replaced by its code:
library(gsubfn)
gsubfn("([a-z]\\d+)", setNames(as.list(df1$code), df1$string), str1)
#[1] "10; b: 20, 30, 40; c: 50, 60, 70; d: 80, 90, 100, 110"
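If an extra package is not an option, the same token-by-token replacement can be sketched in base R with gregexpr() and regmatches<-. This is only a minimal sketch, not the method used above: map_codes is a hypothetical helper name, and the pattern assumes every key is a lowercase letter followed by digits, as in the example. Tokens without an entry in the look-up table are left unchanged.

```r
# Example data (same values as in the question)
str1 <- "a1; b: b1, b2, b3; c: c1, c2, c3; d: d1, d2, d3, d4"
df1 <- data.frame(
  string = c("a1", "b1", "b2", "b3", "c1", "c2", "c3", "d1", "d2", "d3", "d4"),
  code   = c(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L, 100L, 110L),
  stringsAsFactors = FALSE
)

# Named character vector: names are the strings, values are the codes
lookup <- setNames(as.character(df1$code), df1$string)

# Hypothetical helper: replace every token matching `pattern` by its code.
# Works on a whole character vector at once.
map_codes <- function(x, lookup, pattern = "[a-z][0-9]+") {
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(tok) {
    hit <- unname(lookup[tok])
    found <- !is.na(hit)
    tok[found] <- hit[found]   # tokens not in the table stay as they are
    tok
  })
  x
}

map_codes(str1, lookup)
# [1] "10; b: 20, 30, 40; c: 50, 60, 70; d: 80, 90, 100, 110"
```

Because the helper is vectorised over x, it can be applied to a whole column in one call, for example dt[, coded := map_codes(strings, lookup)] on a data.table with a (hypothetical) column named strings.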
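For the edited version (in the output below, a is left unmapped and both occurrences of john smith map to 40, so the categories are not yet distinguished):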
gsubfn("(\\w+ ?\\w+?)", setNames(as.list(df2$code), df2$string), str2)
#[1] "a; b: 20, 30, 40; c: 50, 60, 40"
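To give john smith a different code in b and in c, the look-up table has to say which category each row belongs to. The following base R sketch assumes a hypothetical extra column named category (empty for entries such as a that have no category); this column is not part of the original look-up table, and map_by_category is a hypothetical helper name. It splits each string on "; " into sections, splits each section on ", " into items, and looks up each item by the pair (category, string).

```r
str2 <- "a; b: peter, joe smith, john smith; c: luke, james, john smith"

# ASSUMPTION: the look-up table carries a 'category' column (not in the original)
lookup2 <- data.frame(
  category = c("", "b", "b", "b", "c", "c", "c"),
  string   = c("a", "peter", "joe smith", "john smith",
               "luke", "james", "john smith"),
  code     = c(10L, 20L, 30L, 40L, 50L, 60L, 70L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map items to codes, keyed by category and string
map_by_category <- function(x, lookup) {
  # Composite key "category<TAB>string" -> code
  codes <- setNames(as.character(lookup$code),
                    paste(lookup$category, lookup$string, sep = "\t"))
  map_one <- function(s) {
    sections <- strsplit(s, "; ", fixed = TRUE)[[1]]
    mapped <- vapply(sections, function(sec) {
      has_cat  <- grepl(": ", sec, fixed = TRUE)
      category <- if (has_cat) sub(": .*$", "", sec) else ""
      body     <- if (has_cat) sub("^[^:]*: ", "", sec) else sec
      items    <- strsplit(body, ", ", fixed = TRUE)[[1]]
      hit      <- unname(codes[paste(category, items, sep = "\t")])
      found    <- !is.na(hit)
      items[found] <- hit[found]   # unknown items are left unchanged
      body <- paste(items, collapse = ", ")
      if (has_cat) paste0(category, ": ", body) else body
    }, character(1), USE.NAMES = FALSE)
    paste(mapped, collapse = "; ")
  }
  vapply(x, map_one, character(1), USE.NAMES = FALSE)
}

map_by_category(str2, lookup2)
# [1] "10; b: 20, 30, 40; c: 50, 60, 70"
```

Splitting on the separators instead of using a single regular expression keeps names containing spaces intact, at the cost of looping over rows, which may be slower on very large columns.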
Data used above:
str1 <- "a1; b: b1, b2, b3; c: c1, c2, c3; d: d1, d2, d3, d4"
df1 <- structure(list(string = c("a1", "b1", "b2", "b3", "c1", "c2",
"c3", "d1", "d2", "d3", "d4"), code = c(10L, 20L, 30L, 40L, 50L,
60L, 70L, 80L, 90L, 100L, 110L)), class = "data.frame",
row.names = c(NA, -11L))
str2 <- "a; b: peter, joe smith, john smith; c: luke, james, john smith"
df2 <- structure(list(string = c("a", "peter", "joe smith", "john smith",
"luke", "james", "john smith"), code = c(10L, 20L, 30L, 40L,
50L, 60L, 70L)), class = "data.frame", row.names = c(NA, -7L))