wag*_*eeh 7 r dplyr tidyr tidyverse
So I have this dataset:
# A tibble: 268 x 1
`Which of these social media platforms do you have an account in right now?`
<chr>
1 Facebook, Instagram, Twitter, Snapchat, Reddit, Signal
2 Reddit
3 Facebook, Instagram, Twitter, Linkedin, Snapchat, Reddit, Quora
4 Facebook, Instagram, Twitter, Snapchat
5 Facebook, Instagram, TikTok, Snapchat
6 Facebook, Instagram, Twitter, Linkedin, Snapchat
7 Facebook, Instagram, TikTok, Linkedin, Snapchat, Reddit
8 Facebook, Instagram, Snapchat
9 Linkedin, Reddit
10 Facebook, Instagram, Twitter, TikTok
# ... with 258 more rows
I want to split it into multiple columns, one per platform, each holding "Yes" or "No", like this:
# A tibble: 268 x 8
Id Facebook Instagram Reddit Signal Snapchat TikTok Twitter
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 No No No No No No Yes
2 2 Yes Yes No No Yes No Yes
3 3 No Yes No Yes No Yes No
4 4 No Yes No No Yes No No
5 5 No Yes No Yes Yes Yes Yes
6 6 No Yes No No No No No
7 7 No No Yes Yes No Yes Yes
8 8 No No Yes No No No Yes
9 9 No No Yes No Yes Yes No
10 10 No Yes Yes Yes Yes No Yes
So I wrote this code to do that:
library(tidyverse)
library(tidytext)
Survey %>%
  mutate(Id = row_number(), HasAccount = "Yes") %>%
  unnest_tokens(Network, `Which of these social media platforms do you have an account in right now?`, to_lower = F) %>%
  spread(Network, HasAccount, fill = "No")
But I get this error:
Erreur : Must extract column with a single valid subscript.
x Subscript `var` has size 268 but must be size 1.
> dput(head(Survey[1:5]))
structure(list(Horodateur = structure(c(1619171956.596, 1619172695.039,
1619173104.83, 1619174548.534, 1619174557.538, 1619174735.457
), tzone = "UTC", class = c("POSIXct", "POSIXt")), `To_which_gender_you_identify_the_most?` = c("Male",
"Female", "Male", "Female", "Female", "Female"), What_is_your_age_group = c("[18-24[",
"[10,18[", "[18-24[", "[18-24[", "[18-24[", "[25,34["), How_much_time_do_you_spend_on_social_media = c("1-5 hours",
"1-5 hours", ">10 hours", "5-10 hours", "5-10 hours", "1-5 hours"
), `Which_of_these_social_media_platforms_do_you_have_an_account_in_right_now?` = c("Facebook, Instagram, Twitter, Snapchat, Reddit, Signal",
"Reddit", "Facebook, Instagram, Twitter, Linkedin, Snapchat, Reddit, Quora",
"Facebook, Instagram, Twitter, Snapchat", "Facebook, Instagram, TikTok, Snapchat",
"Facebook, Instagram, Twitter, Linkedin, Snapchat")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Edit: edited the question based on @CSJCampbell's answer.
Edit: added a snippet of the dataset I am using.
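The first argument to mutate must be a data.frame. You have not created a data.frame named df, so the function df (the F-distribution density from the stats package) is what gets passed to mutate.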
args(df)
# function (x, df1, df2, ncp, log = FALSE) 
# NULL
Edit: since your update, you have added dput output of the data. Running your code gives me this error:
Survey %>%
  mutate(Id = row_number(), HasAccount = "Yes") %>%
  unnest_tokens(Network, `Which of these social media platforms do you have an account in right now?`, to_lower = F)
# Error in check_input(x) : 
#   Input must be a character vector of any length or a list of character
#   vectors, each of which has a length of 1.
The columns in your dput output are named with underscores, not spaces:
colnames(Survey)[5]
# "Which_of_these_social_media_platforms_do_you_have_an_account_in_right_now?"
Rename the column:
Survey %>%
  transmute(Id = row_number(), HasAccount = "Yes", 
            Platforms = `Which_of_these_social_media_platforms_do_you_have_an_account_in_right_now?`) %>% 
  unnest_tokens(Network, Platforms) %>% 
  spread(Network, HasAccount, fill = "No")
# # A tibble: 6 x 10
#      Id facebook instagram linkedin quora reddit
#   <int> <chr>    <chr>     <chr>    <chr> <chr> 
# 1     1 Yes      Yes       No       No    Yes   
# 2     2 No       No        No       No    Yes   
# 3     3 Yes      Yes       Yes      Yes   Yes   
# 4     4 Yes      Yes       No       No    No    
# 5     5 Yes      Yes       No       No    No    
# 6     6 Yes      Yes       Yes      No    No    
# # … with 4 more variables: signal <chr>,
# #   snapchat <chr>, tiktok <chr>, twitter <chr>
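Note that unnest_tokens lowercases tokens by default, which is why the column names in that output are lowercase. If you want to keep the original capitalisation and avoid the tidytext dependency, one possible alternative (not part of the original answer) is to split on the commas with tidyr::separate_rows and reshape with tidyr::pivot_wider. This is only a minimal sketch: the small `survey` tibble and the short column name `Platforms` are made up for illustration and stand in for the real Survey data.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the Survey tibble (hypothetical data and column name)
survey <- tibble(
  Platforms = c("Facebook, Instagram, Twitter", "Reddit", "Facebook, TikTok")
)

wide <- survey %>%
  # One Id per respondent and a marker value to spread later
  mutate(Id = row_number(), HasAccount = "Yes") %>%
  # One row per (respondent, platform); split on a comma plus optional spaces
  separate_rows(Platforms, sep = ",\\s*") %>%
  # One column per platform; missing combinations are filled with "No"
  pivot_wider(names_from = Platforms, values_from = HasAccount,
              values_fill = "No")

print(wide)
```

Because the split is on the comma separator rather than on word boundaries, a platform name containing a space would also stay in one piece.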