swe*_*ity 3 string variables split r dataframe
I'm fairly new to R. My question isn't quite as simple as the title suggests. Here is a sample df:
id amenities
1 wireless internet, air conditioning, pool, kitchen
2 pool, kitchen, washer, dryer
3 wireless internet, kitchen, dryer
4
5 wireless internet
This is what I want df to look like:
id  wireless internet  air conditioning  pool  kitchen  washer  dryer
 1                  1                 1     1        1       0      0
 2                  0                 0     1        1       1      1
 3                  1                 0     0        1       0      1
 4                  0                 0     0        0       0      0
 5                  1                 0     0        0       0      0
Sample code to reproduce the data:
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 amenities = c("wireless internet, air conditioning, pool, kitchen",
                               "pool, kitchen, washer, dryer",
                               "wireless internet, kitchen, dryer",
                               "",
                               "wireless internet"),
                 stringsAsFactors = FALSE)
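FWIW, here is a base R approach (assuming df contains the data shown in the question):
# Split each amenities string into a character vector
dat <- with(df, strsplit(amenities, ', '))
# Long format: one row per (id, amenity) pair
df2 <- data.frame(id = factor(rep(df$id, times = lengths(dat)),
                              levels = df$id),
                  amenities = unlist(dat))
# Cross-tabulate id against amenities and prepend the id column
df3 <- as.data.frame(cbind(id = df$id,
                           table(df2$id, df2$amenities)))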
This results in
> df3
  id air conditioning dryer kitchen pool washer wireless internet
1  1                1     0       1    1      0                 1
2  2                0     1       1    1      1                 0
3  3                0     1       1    0      0                 1
4  4                0     0       0    0      0                 0
5  5                0     0       0    0      0                 1
Breaking down what is happening:
dat <- with(df, strsplit(amenities, ', ')) splits the amenities variable on ', ', which gives
> dat
[[1]]
[1] "wireless internet" "air conditioning" "pool"
[4] "kitchen"
[[2]]
[1] "pool" "kitchen" "washer" "dryer"
[[3]]
[1] "wireless internet" "kitchen" "dryer"
[[4]]
character(0)
[[5]]
[1] "wireless internet"
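A side note on that split: it relies on every separator being exactly a comma followed by one space, which holds for the data in the question. If real data had irregular spacing (an assumption on my part, the input below is made up), one possible sketch is to split on a regular expression and trim each piece with trimws():

```r
# Hypothetical messy input, not taken from the question
messy <- c("pool,kitchen ,  washer", " dryer")

# Split on a comma with optional whitespace around it,
# then strip any leading/trailing whitespace left on each piece
pieces <- lapply(strsplit(messy, "\\s*,\\s*"), trimws)

pieces[[1]]
pieces[[2]]
```

The rest of the approach would work unchanged on a list cleaned this way.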
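The second line converts dat into a single vector and adds an id column by repeating each original id value as many times as that id has amenities. Wrapping id in factor() with levels = df$id keeps id 4 as a level even though it has no amenities and therefore no rows here, so it still gets a row in the final table. This results in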
> df2
id amenities
1 1 wireless internet
2 1 air conditioning
3 1 pool
4 1 kitchen
5 2 pool
6 2 kitchen
7 2 washer
8 2 dryer
9 3 wireless internet
10 3 kitchen
11 3 dryer
12 5 wireless internet
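To see concretely what the levels argument does, here is a small sketch that cross-tabulates the same long-format data twice, once with a plain id vector and once with the factor. The names ids, tab_plain and tab_factor are only for illustration and are not part of the original answer.

```r
# Same data as in the question
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 amenities = c("wireless internet, air conditioning, pool, kitchen",
                               "pool, kitchen, washer, dryer",
                               "wireless internet, kitchen, dryer",
                               "",
                               "wireless internet"),
                 stringsAsFactors = FALSE)

dat <- strsplit(df$amenities, ", ")
ids <- rep(df$id, times = lengths(dat))  # id 4 is repeated zero times

# Plain vector: id 4 never occurs, so table() has no row for it
tab_plain <- table(ids, unlist(dat))

# Factor with explicit levels: id 4 is a level, so it gets an all-zero row
tab_factor <- table(factor(ids, levels = df$id), unlist(dat))

nrow(tab_plain)   # 4
nrow(tab_factor)  # 5
```

Without the factor, the cbind() with df$id in the last step would no longer line up, because the table would have four rows and df$id has five values.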
Finally, table() builds the contingency table of id against amenities, and cbind() adds an id column.
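One difference from the layout asked for in the question: table() sorts the amenity columns alphabetically. As a rough sketch, the steps can be wrapped in a function that also makes amenities a factor whose levels follow first appearance in the data, which happens to reproduce the requested column order for this example. The function name amenity_indicators, its sep parameter and the use of fixed = TRUE are my own choices, not part of the original answer.

```r
# Same data as in the question
df <- data.frame(id = c(1, 2, 3, 4, 5),
                 amenities = c("wireless internet, air conditioning, pool, kitchen",
                               "pool, kitchen, washer, dryer",
                               "wireless internet, kitchen, dryer",
                               "",
                               "wireless internet"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: one indicator column per amenity,
# with columns ordered by first appearance rather than alphabetically
amenity_indicators <- function(df, sep = ", ") {
  dat <- strsplit(df$amenities, sep, fixed = TRUE)
  all_amenities <- unique(unlist(dat))  # first-appearance order
  long <- data.frame(
    id = factor(rep(df$id, times = lengths(dat)), levels = df$id),
    amenities = factor(unlist(dat), levels = all_amenities)
  )
  tab <- table(long$id, long$amenities)
  # unclass() turns the table into a plain integer matrix before binding
  as.data.frame(cbind(id = df$id, unclass(tab)))
}

res <- amenity_indicators(df)
res
```

Note that this counts occurrences, so an amenity listed twice for the same id would show up as 2 rather than 1.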