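I have a table that contains one-hot encoded variables, and I want to collapse these variables into a single column. For example, any set of columns with "high", "med", or "low" in the name should become one numerically encoded column with high = 0, med = 1, and low = 2. How can I do this in R with dplyr? I suspect pivoting would help, but I don't know where to start. The resulting column name should be built from the names of the three columns, without the high, med, or low part. For example, I would convert the columns d-high_cm1, d-med_cm1, and d-low_cm1 into a numerically encoded d-cm1.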
Input:
sex age cost_cm d-high_cm1 d-med_cm1 d-low_cm1 c-high_cm1 c-med_cm1 c-low_cm1
f old 1 1 0 0 1 0 0
m young 0 1 0 0 1 0 0
m old 0 0 1 0 0 1 0
f young 0 1 0 0 0 0 1
m old 1 0 0 1 0 0 1
Expected output:
sex age cost_cm d-cm1 c-cm1
f old 1 0 0
m young 0 0 0
m old 0 1 1
f young 0 0 2
m old 1 2 2
We can do
library(stringr)
library(dplyr)
library(tidyr)
df1 %>%
  # scale each 0/1 column by its level code: high = 0, med = 1, low = 2
  mutate(across(contains("-"), ~ case_when(
    str_detect(cur_column(), 'low') ~ . * 2,
    str_detect(cur_column(), 'med') ~ . * 1,
    TRUE ~ . * 0))) %>%
  # move the level to the end of the name, e.g. d-high_cm1 -> d-cm1_high
  rename_with(~ str_replace(., "-(\\w+)_(\\w+)", "-\\2_\\1"), contains('-')) %>%
  # keep only the part before the underscore as the new column name
  pivot_longer(cols = contains('-'), names_to = c(".value"),
    names_pattern = "^([^_]+)_.*") %>%
  # collapse the three rows per original row back into one
  group_by(sex, age, cost_cm) %>%
  summarise(across(everything(), max), .groups = 'drop')
-output
# A tibble: 5 × 5
  sex   age   cost_cm `d-cm1` `c-cm1`
  <chr> <chr>   <int>   <dbl>   <dbl>
1 f     old         1       0       0
2 f     young       0       0       2
3 m     old         0       1       1
4 m     old         1       2       2
5 m     young       0       0       0
Or using base R
lst1 <- lapply(split.default(df1[-c(1:3)], sub("-[^_]+", "",
    names(df1)[-(1:3)])), function(x) do.call(pmax, x * (0:2)[col(x)]))
cbind(df1[1:3], lst1)
-output
  sex   age cost_cm c_cm1 d_cm1
1   f   old       1     0     0
2   m young       0     0     0
3   m   old       0     1     1
4   f young       0     2     0
5   m   old       1     2     2
Data:
df1 <- structure(list(sex = c("f", "m", "m", "f", "m"), age = c("old",
"young", "old", "young", "old"), cost_cm = c(1L, 0L, 0L, 0L,
1L), `d-high_cm1` = c(1L, 1L, 0L, 1L, 0L), `d-med_cm1` = c(0L,
0L, 1L, 0L, 0L), `d-low_cm1` = c(0L, 0L, 0L, 0L, 1L), `c-high_cm1` = c(1L,
1L, 0L, 0L, 0L), `c-med_cm1` = c(0L, 0L, 1L, 0L, 0L), `c-low_cm1` = c(0L,
0L, 0L, 1L, 1L)), class = "data.frame", row.names = c(NA, -5L
))
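Both approaches above have a quirk worth knowing about: the dplyr version groups by sex, age, and cost_cm, so rows that share those three values would be merged, and the base R version assumes the columns in each group appear in the order high, med, low. As a rough sketch only, the same idea can be written with a named lookup so that neither assumption is needed. The helper name `collapse_onehot` and its `codes` argument are made up for illustration and are not part of the answer above. It assumes column names follow the `<prefix>-<level>_<suffix>` pattern from the question and that each row has exactly one 1 per group (a row of all zeros would come out as 0, the same as high).

```r
# Same data as df1 in the answer, rebuilt so the sketch runs on its own.
df1 <- data.frame(
  sex = c("f", "m", "m", "f", "m"),
  age = c("old", "young", "old", "young", "old"),
  cost_cm = c(1L, 0L, 0L, 0L, 1L),
  `d-high_cm1` = c(1L, 1L, 0L, 1L, 0L),
  `d-med_cm1`  = c(0L, 0L, 1L, 0L, 0L),
  `d-low_cm1`  = c(0L, 0L, 0L, 0L, 1L),
  `c-high_cm1` = c(1L, 1L, 0L, 0L, 0L),
  `c-med_cm1`  = c(0L, 0L, 1L, 0L, 0L),
  `c-low_cm1`  = c(0L, 0L, 0L, 1L, 1L),
  check.names = FALSE
)

# Hypothetical helper (not from the answer): collapse one-hot columns named
# "<prefix>-<level>_<suffix>" into a single "<prefix>-<suffix>" column.
collapse_onehot <- function(df, codes = c(high = 0L, med = 1L, low = 2L)) {
  pattern <- "^([^-]+)-([^_]+)_(.*)$"
  hot <- grep(pattern, names(df), value = TRUE)   # one-hot column names
  level <- sub(pattern, "\\2", hot)               # "high", "med" or "low"
  target <- sub(pattern, "\\1-\\3", hot)          # e.g. "d-cm1"
  stopifnot(all(level %in% names(codes)))

  out <- df[setdiff(names(df), hot)]              # keep the other columns
  for (nm in unique(target)) {
    cols <- hot[target == nm]
    lvls <- level[target == nm]
    # Multiply each 0/1 column by its level code, then add the columns up.
    weighted <- Map(function(col, lv) df[[col]] * codes[[lv]], cols, lvls)
    out[[nm]] <- Reduce(`+`, weighted)
  }
  out
}

result <- collapse_onehot(df1)
print(result)
```

Because the codes are looked up by level name, the column order within a group does not matter, and the original row order is kept since nothing is grouped or summarised.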