Suppose I have a dataset like this:
State <- c("CA", "WI", "TX", "MS", "NY", "KT", "UT", "CO", "PA", "SC")
Pov_rt <- c(25, 30, 35, 40, 45, 50, 10, 15, 25, 40)
df <- data.frame(State, Pov_rt)
I created a new column:
df[["pov_level"]] <- cut(
  df$Pov_rt,
  breaks = c(-Inf, 10, 20, 30, 40, Inf),
  labels = c(
    "Very Low Poverty (<10%)",
    "Low Poverty (10-20%)",
    "Medium Poverty (20-30%)",
    "High Poverty (30-40%)",
    "Very High Poverty (>40%)"
  )
)
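A side note on the breaks: by default cut() builds right-closed intervals, so a rate of exactly 10 lands in the first level and exactly 40 in the fourth, which matches the table further down. The sketch below only illustrates that behaviour on three boundary values; the names x, right_closed and left_closed are mine, not part of the data above.

```r
# Values that sit exactly on a break
x <- c(10, 20, 40)
breaks <- c(-Inf, 10, 20, 30, 40, Inf)

# Default (right = TRUE): intervals are (lower, upper]
right_closed <- cut(x, breaks = breaks)

# right = FALSE: intervals are [lower, upper)
left_closed <- cut(x, breaks = breaks, right = FALSE)

# Compare which interval (by position) each value falls into
as.integer(right_closed)
as.integer(left_closed)
```

If the labels are meant literally (10 belongs to "10-20%"), right = FALSE may be closer to the intent, but that would change the counts shown in this post.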
I also want to build a table:
table(df$pov_level, df$State)
                           CA CO KT MS NY PA SC TX UT WI
  Very Low Poverty (<10%)   0  0  0  0  0  0  0  0  1  0
  Low Poverty (10-20%)      0  1  0  0  0  0  0  0  0  0
  Medium Poverty (20-30%)   1  0  0  0  0  1  0  0  0  1
  High Poverty (30-40%)     0  0  0  1  0  0  1  1  0  0
  Very High Poverty (>40%)  0  0  1  0  1  0  0  0  0  0
If I want to convert this table to a data frame, I do this:
x <- data.frame(unclass(table(df$pov_level, df$State)))
rather than x <- as.data.frame(table(df$pov_level, df$State)), because that gives me an ungrouped (long) data frame with one row per level/state combination.
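To make the difference concrete, here is a small sketch on a made-up three-row data set (toy, long_df and wide_df are names I chose for illustration only):

```r
# Toy data, only for illustration
toy <- data.frame(
  State = c("CA", "WI", "TX"),
  level = c("Medium", "Medium", "High")
)
tab <- table(toy$level, toy$State)

# as.data.frame() on a table: one row per (level, State) pair,
# with columns Var1, Var2 and Freq
long_df <- as.data.frame(tab)

# data.frame(unclass()) keeps the cross-tab layout:
# one row per level, one column per state
wide_df <- data.frame(unclass(tab))

names(long_df)
dim(wide_df)
rownames(wide_df)
```

In the wide version the levels live only in the row names, not in a regular column.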
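But the first approach has one problem: sometimes I need to turn the row names into a new column (for example with mutate).
I would like to know whether there is a better way to do this.
Thanks a lot!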
We can use as.data.frame.matrix:
df2 <- as.data.frame.matrix(table(df$pov_level, df$State))
-output
> df2
                         CA CO KT MS NY PA SC TX UT WI
Very Low Poverty (<10%)   0  0  0  0  0  0  0  0  1  0
Low Poverty (10-20%)      0  1  0  0  0  0  0  0  0  0
Medium Poverty (20-30%)   1  0  0  0  0  1  0  0  0  1
High Poverty (30-40%)     0  0  0  1  0  0  1  1  0  0
Very High Poverty (>40%)  0  0  1  0  1  0  0  0  0  0
> str(df2)
'data.frame':   5 obs. of  10 variables:
 $ CA: int  0 0 1 0 0
 $ CO: int  0 1 0 0 0
 $ KT: int  0 0 0 0 1
 $ MS: int  0 0 0 1 0
 $ NY: int  0 0 0 0 1
 $ PA: int  0 0 1 0 0
 $ SC: int  0 0 0 1 0
 $ TX: int  0 0 0 1 0
 $ UT: int  1 0 0 0 0
 $ WI: int  0 0 1 0 0
rownames_to_column (from tibble) can be used to create a column from the row names:
library(tibble)
library(dplyr)
df2 %>%
  rownames_to_column("pov_levels")
                pov_levels CA CO KT MS NY PA SC TX UT WI
1  Very Low Poverty (<10%)  0  0  0  0  0  0  0  0  1  0
2     Low Poverty (10-20%)  0  1  0  0  0  0  0  0  0  0
3  Medium Poverty (20-30%)  1  0  0  0  0  1  0  0  0  1
4    High Poverty (30-40%)  0  0  0  1  0  0  1  1  0  0
5 Very High Poverty (>40%)  0  0  1  0  1  0  0  0  0  0
Or, if we want to use tidyverse:
library(tidyr)
pivot_wider(df, names_from = State, values_from = Pov_rt,
  values_fn = length, values_fill = 0)
# A tibble: 5 × 11
  pov_level                   CA    WI    TX    MS    NY    KT    UT    CO    PA    SC
  <fct>                    <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 Medium Poverty (20-30%)      1     1     0     0     0     0     0     0     1     0
2 High Poverty (30-40%)        0     0     1     1     0     0     0     0     0     1
3 Very High Poverty (>40%)     0     0     0     0     1     1     0     0     0     0
4 Very Low Poverty (<10%)      0     0     0     0     0     0     1     0     0     0
5 Low Poverty (10-20%)         0     0     0     0     0     0     0     1     0     0
Or, more directly, with tabyl:
library(janitor)
df %>%
  tabyl(pov_level, State)
                pov_level CA CO KT MS NY PA SC TX UT WI
  Very Low Poverty (<10%)  0  0  0  0  0  0  0  0  1  0
     Low Poverty (10-20%)  0  1  0  0  0  0  0  0  0  0
  Medium Poverty (20-30%)  1  0  0  0  0  1  0  0  0  1
    High Poverty (30-40%)  0  0  0  1  0  0  1  1  0  0
 Very High Poverty (>40%)  0  0  1  0  1  0  0  0  0  0
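If pulling in extra packages is not an option, the row names can probably also be moved into a column with base R alone. This is only a minimal sketch on a made-up four-row data set; toy, wide and out are hypothetical names, and the approach is a swapped-in base R variant of the as.data.frame.matrix route, not something the packages above do internally.

```r
# Toy data, only for illustration
toy <- data.frame(
  State = c("CA", "WI", "TX", "MS"),
  pov_level = factor(c("Medium", "Medium", "High", "High"),
                     levels = c("Medium", "High"))
)

# Wide cross-tab: levels end up as row names, states as columns
wide <- as.data.frame.matrix(table(toy$pov_level, toy$State))

# Copy the row names into a regular first column ...
out <- cbind(pov_level = rownames(wide), wide)
# ... and reset the row names to the default 1, 2, ...
rownames(out) <- NULL

out
```

Note that pov_level comes back as a character column here (in R >= 4.0), so the factor ordering would have to be restored with factor(out$pov_level, levels = ...) if it matters later.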