Asked by DN1. Tags: r, dplyr, data.table
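I have a problem. I routinely collapse multiple rows of string data into a single row per group, but for some reason the code is not behaving the way I expect.
My data looks like this: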
Genes Source Type
1: LZIC Source1 Secondary
2: LZIC Source2 Lead
3: KIF1B Source1 Secondary
4: CASZ1 Source1 Secondary
5: CASZ1 Source4 Secondary
I want to collapse the rows by gene, and I am doing this with code taken from similar questions on this site, for example:
library(dplyr)    # group_by(), summarize(), %>%
library(stringr)  # str_c()

source <- df %>%
  group_by(Genes) %>%
  summarize(text = str_c(Source, collapse = ", "))
type <- df %>%
  group_by(Genes) %>%
  summarize(text = str_c(Type, collapse = ", "))
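However, the output is not what I expect: each variable I create contains a single row holding all of the sources (or types) as one string, and nothing else.
The output I want is: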
Genes Source Type
1: LZIC Source1, Source2 Secondary, Lead
2: KIF1B Source1 Secondary
3: CASZ1 Source1, Source4 Secondary, Secondary
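Is there something wrong with my code? It has worked for me in other cases. I have also tried modifying the code to collapse both columns at once, but that failed as well.
Input data: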
df <- structure(list(
  Genes = c("LZIC", "CDC14A", "KIF1B", "CASZ1", "CASZ1"),
  Source = c("BPICE_UKBfinemapCommon(1210)", "GxL_Fuentes_Educ_T2nov_TRANS",
             "BPICE_UKBfinemapCommon(1210)", "BPICE_UKBfinemapCommon(1210)",
             "BPICE_UKBfinemapCommon(1210)"),
  Type = c("Secondary", "Lead", "Secondary", "Secondary", "Secondary")),
  row.names = c(NA, -5L), class = c("data.table", "data.frame"))
Try this. With dplyr you can group by Genes and then use summarise_all() together with toString() to get the expected result. Here is the code, using the data you shared as df:
library(dplyr)
# Collapse every non-grouping column into one comma-separated string per gene
newdf <- df %>%
  group_by(Genes) %>%
  summarise_all(toString)
Output:
# A tibble: 3 x 3
Genes Source Type
<chr> <chr> <chr>
1 CASZ1 Source1, Source4 Secondary, Secondary
2 KIF1B Source1 Secondary
3 LZIC Source1, Source2 Secondary, Lead
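The question does not show which packages were loaded, so the following is only a guess at the cause: a single-row result from summarize() is what you would typically see if another package's summarise()/summarize() (plyr attached after dplyr is a common case) were masking dplyr's, because the grouping is then ignored. A minimal sketch that sidesteps this by namespace-qualifying the verbs, and that uses across() (available from dplyr 1.0.0, where summarise_all() was superseded). The toy data frame mirrors the table in the question and is illustrative only:

```r
library(dplyr)

# Toy data mirroring the table printed in the question (illustrative values)
df <- data.frame(
  Genes  = c("LZIC", "LZIC", "KIF1B", "CASZ1", "CASZ1"),
  Source = c("Source1", "Source2", "Source1", "Source1", "Source4"),
  Type   = c("Secondary", "Lead", "Secondary", "Secondary", "Secondary")
)

# dplyr:: prefixes make sure dplyr's verbs are used even if another
# attached package defines functions with the same names
newdf <- df %>%
  dplyr::group_by(Genes) %>%
  dplyr::summarise(dplyr::across(dplyr::everything(), toString))

print(newdf)
```

Inside a grouped summarise(), everything() selects only the non-grouping columns, so Source and Type are both collapsed in one step.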
Or using base R:
# Base R: collapse Source and Type per gene with aggregate()
newdf <- aggregate(cbind(Source, Type) ~ Genes, df, toString)
Output:
Genes Source Type
1 CASZ1 Source1, Source4 Secondary, Secondary
2 KIF1B Source1 Secondary
3 LZIC Source1, Source2 Secondary, Lead
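Since the shared data is a data.table, the same collapse can also be sketched in data.table syntax. This is an alternative, not part of the answer above, and the toy table again mirrors the one printed in the question:

```r
library(data.table)

# Toy data mirroring the table printed in the question (illustrative values)
dt <- data.table(
  Genes  = c("LZIC", "LZIC", "KIF1B", "CASZ1", "CASZ1"),
  Source = c("Source1", "Source2", "Source1", "Source1", "Source4"),
  Type   = c("Secondary", "Lead", "Secondary", "Secondary", "Secondary")
)

# .SD holds the non-grouping columns of each group;
# lapply() applies toString() to each of them
newdt <- dt[, lapply(.SD, toString), by = Genes]

print(newdt)
```

With by =, groups are returned in order of first appearance (LZIC, KIF1B, CASZ1), which matches the row order of the desired output in the question; keyby = would sort them by Genes instead.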