Posts by Haa*_*kas

Conditionally fill cells in specific columns with color based on the value in another column

I have the following data frame:

col1 <- rep(c("A","B","C","D"),10)
col2 <- rep(c(1,0),10)
col3 <- rep(c(0,1),10)
col4 <- rep(c(1,0),10)
col5 <- rep(c(0,1),10)

test_df <- data.frame(col1, col2, col3, col4, col5, stringsAsFactors = FALSE)

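I want to color specific cells in several columns based on the value in col1, and also add a vertical line between two columns of the table (indicating a limit), again based on the value in col1.

For example, if col1 == "A", I want to color the cells in col2 and col5 grey, in the same row where col1 == "A".

In pseudocode: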

if col1 == A: color columns(col2, col5), vert.line between col3 and col4
if col1 == B: color columns(col2, col3, col5), vert.line between col4 and col5
if col1 == C: color columns(col2, col4, col5), vert.line between col3 …
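One possible way to approach this without extra packages is to build the HTML by hand and attach an inline style to each cell. The sketch below is only an illustration under a few assumptions: the `rules` list, `cell_style()` and `row_html()` are hypothetical names, the "vertical line between col3 and col4" is drawn as a left border on the col4 cell, and only the rules for A and B are encoded because the rules for C and D are cut off in the pseudocode above.

```r
# Sample data from the question (first 8 rows are enough for a demo)
test_df <- data.frame(
  col1 = rep(c("A", "B", "C", "D"), 10),
  col2 = rep(c(1, 0), 20),
  col3 = rep(c(0, 1), 20),
  col4 = rep(c(1, 0), 20),
  col5 = rep(c(0, 1), 20),
  stringsAsFactors = FALSE
)
demo_df <- head(test_df, 8)

# Hypothetical rule table: which columns to fill, and which column gets
# a left border (i.e. the vertical line sits just before that column).
rules <- list(
  A = list(fill = c("col2", "col5"), line_before = "col4"),
  B = list(fill = c("col2", "col3", "col5"), line_before = "col5")
)

# Build the inline CSS for one cell; keys without a rule get no styling.
cell_style <- function(key, column) {
  rule <- rules[[key]]
  if (is.null(rule)) return("")
  css <- character(0)
  if (column %in% rule$fill) css <- c(css, "background-color: lightgrey;")
  if (identical(column, rule$line_before)) css <- c(css, "border-left: 2px solid black;")
  paste(css, collapse = " ")
}

# Render one data frame row as a <tr> element.
row_html <- function(i, df) {
  cells <- vapply(names(df), function(column) {
    sprintf('<td style="%s">%s</td>', cell_style(df$col1[i], column), df[[column]][i])
  }, character(1))
  paste0("<tr>", paste(cells, collapse = ""), "</tr>")
}

header <- paste0("<tr>", paste0("<th>", names(demo_df), "</th>", collapse = ""), "</tr>")
rows <- vapply(seq_len(nrow(demo_df)), row_html, character(1), df = demo_df)
html <- paste0('<table style="border-collapse: collapse;">', header,
               paste(rows, collapse = ""), "</table>")
```

The resulting string can be written out with `writeLines(html, "table.html")` and opened in a browser. Table packages can probably do the same thing more conveniently, but the per-cell logic would look similar.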

r html-table

Score: 5, answers: 1, views: 2041

Manually change y-axis tick labels with factor variables in R

I have the following data:

id <- rep(1:100)
A <- rep(c(0.12 ,0.25, 0.5, 1, 2), each = 20)
B <- rep(c(0.06, 0.03, 0.015, 0.12), each = 25)
C <- rep(c(0.015, 0.03, 0.06, 0.12, 0.25), each = 20)

df <- data.frame(id, A, B, C, stringsAsFactors = FALSE)

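I gather A, B and C into two columns (key and value). Note that columns A, B and C are really factors; I only avoided declaring them as factors so that I could create the violin plot.

library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr

df_edited <- df %>%
  gather(key, value, -id, factor_key = FALSE)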

I created the following plot from this data:

library(ggplot2)

factor_breaks <- c(0.015,0.03,0.06,0.12,0.25,0.5,1,2)
factor_levels <- c("0.015","0.03","0.06","0.12","0.25","0.5","1","2")

ggplot(df_edited, aes(key, value))+
  geom_violin()+
  scale_y_continuous(labels = factor_levels, breaks = factor_breaks)

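This produces the following plot: [image]

Is it possible to space the y-axis labels evenly (as in the plot below) while still making sure the violin plot is correct?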

ggplot(df_edited, …
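One way to get evenly spaced labels, sketched here under assumptions, is to plot each value's position within `factor_breaks` instead of the raw value and then relabel the axis. The names `df_long` and `pos` are made up for this example, and the long data frame is built with base R so the snippet does not depend on the truncated code above. Be aware that the violin density is then estimated on the position scale, so the shapes will differ from those on the raw scale.

```r
library(ggplot2)

# Data from the question
A <- rep(c(0.12, 0.25, 0.5, 1, 2), each = 20)
B <- rep(c(0.06, 0.03, 0.015, 0.12), each = 25)
C <- rep(c(0.015, 0.03, 0.06, 0.12, 0.25), each = 20)

# Long format built with base R (equivalent to the gather() step)
df_long <- data.frame(
  key   = rep(c("A", "B", "C"), each = 100),
  value = c(A, B, C)
)

factor_breaks <- c(0.015, 0.03, 0.06, 0.12, 0.25, 0.5, 1, 2)
factor_levels <- c("0.015", "0.03", "0.06", "0.12", "0.25", "0.5", "1", "2")

# Replace each value by its position (1..8) among the known levels,
# so neighbouring levels are always exactly one unit apart.
df_long$pos <- match(df_long$value, factor_breaks)

p <- ggplot(df_long, aes(key, pos)) +
  geom_violin() +
  scale_y_continuous(breaks = seq_along(factor_breaks),
                     labels = factor_levels,
                     name = "value")
```

Because the levels roughly double at each step, a log2-transformed y axis would be another option that keeps the raw values, at the cost of spacing that is only approximately even.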

r ggplot2

Score: 5, answers: 1, views: 1234

Automatically calculate summary statistics for a data frame and create a new table

I have the following data frame:

col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
          "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
          "low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)

test_data <- cbind(col1, col2, col3)
test_data <- as.data.frame(test_data)

I would like to end up with a table like this one (the values are random):

Species  Pop.density  %Resistance  CI_low  CI_high   Total samples
avi      low          2.0          1.2     2.2       30
avi      med          0            0       0.5       20
avi      high         3.5          2.9     4.2       10
chi      low          0.5          0.3     0.7       20
chi      med          2.0          1.9     2.1       150
chi      high         6.5          6.2     6.6       175

The %Resistance column is based on col3 above, where 1 = resistant and 0 = not resistant. I tried the following:

library(dplyr)
test_data<-test_data %>%
  count(col1,col2,col3) %>%
  group_by(col1, col2) %>%
  mutate(perc_res = prop.table(n)*100)

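This almost works: I get the percentage of 1s and 0s for each combination of values in col1 and col2. However, the total sample count is wrong, because I am counting over all three columns when the correct count should be over col1 and col2 only.

For the confidence intervals, I will use the following: …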
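A possible way around the counting problem is to group by col1 and col2 only and derive everything from col3 inside `summarise()`. The sketch below makes some assumptions: the data frame is built with `data.frame()` so that col3 stays numeric, the output column names are invented, and the exact binomial interval from `binom.test()` is used purely as a stand-in, since the confidence interval method in the question is cut off.

```r
library(dplyr)

col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
          "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
          "low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)

# data.frame() keeps col3 numeric; cbind() would turn it into text
test_data <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)

summary_tbl <- test_data %>%
  group_by(col1, col2) %>%
  summarise(
    total     = n(),                    # samples per col1/col2 combination
    resistant = sum(col3),              # number of 1s in the group
    perc_res  = 100 * resistant / total,
    # Assumed CI method: exact (Clopper-Pearson) binomial interval
    ci_low    = 100 * binom.test(resistant, total)$conf.int[1],
    ci_high   = 100 * binom.test(resistant, total)$conf.int[2],
    .groups   = "drop"
  )
```

Here `total` is the number of rows per col1/col2 pair, which is the count the question says it wants, and any other interval function could be swapped in for `binom.test()`.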

r dplyr

Score: 4, answers: 1, views: 97

Select all elements of a vector in a data frame except the first

I have some data that looks like this:

X1
A,B,C,D,E
A,B
A,B,C,D
A,B,C,D,E,F

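I want to produce one column containing the first element of each vector ("A") and another column containing all the remaining values ("B", "C", and so on):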

X1              Col1    Col2
A,B,C,D,E       A       B,C,D,E
A,B             A       B
A,B,C,D         A       B,C,D
A,B,C,D,E,F     A       B,C,D,E,F

I have tried the following:

library(dplyr)

testdata <- data.frame(X1 = c("A,B,C,D,E",
                              "A,B",
                              "A,B,C,D",
                              "A,B,C,D,E,F")) %>%
  mutate(Col1 = sapply(strsplit(X1, ","), "[", 1),
         Col2 = sapply(strsplit(X1, ","), "[", -1))

However, I can't seem to get rid of the pesky vector brackets around the values in Col2. Is there a way to do this?
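The brackets most likely appear because the split pieces have different lengths, so `sapply()` returns a list and Col2 becomes a list column. A minimal sketch of one fix, assuming the remaining elements should be joined back into a single comma-separated string, is to collapse them with `paste()`; the helper name `parts` is made up for this example.

```r
testdata <- data.frame(X1 = c("A,B,C,D,E",
                              "A,B",
                              "A,B,C,D",
                              "A,B,C,D,E,F"),
                       stringsAsFactors = FALSE)

# Split once and reuse the result for both new columns
parts <- strsplit(testdata$X1, ",", fixed = TRUE)

# First element of each split vector
testdata$Col1 <- vapply(parts, function(p) p[1], character(1))

# Everything except the first element, pasted back into one string
testdata$Col2 <- vapply(parts, function(p) paste(p[-1], collapse = ","),
                        character(1))
```

Using `vapply()` with `character(1)` guarantees a plain character column, so nothing is printed as `c("B", "C", ...)`.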

split r subset dataframe

Score: 3, answers: 1, views: 786

Complex regular expression matching various patterns

I have a data frame with a column containing the following information:

    c("GYRA.Flq_NC_002695.1.916822_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation", 
"GYRB.CARD_pvgb_AP009048_3760295_3762710_ARO_3003303_Escherichia_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRB_RequiresSNPConfirmation", 
"MARR.CARD_pvgb_U00096_1619119_1619554_ARO_3003378_Escherichia_Multi_drug_resistance_MDR_regulator_MARR_RequiresSNPConfirmation", 
"PARC.Flq_M58408_gene_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_PARC_RequiresSNPConfirmation", 
"SOXS.CARD_pvgb_U00096_4277468_4277933_ARO_3003381_Escherichia_Multi_drug_resistance_MDR_regulator_SOXS_RequiresSNPConfirmation", 
"TOLC.CARD_phgb_FJ768952_0_1488_ARO_3000237_tolC_Multi_drug_resistance_Multi_drug_efflux_pumps_TOLC", 
"parE.CARD_pvgb_NC_007779_3172159_3174052_ARO_3003316_Escherichia_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_parE_RequiresSNPConfirmation", 
"GYRA.Flq_CP001918.1_gene3562_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation", 
"PARC.Flq_NC_003197.1.1254697_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_PARC_RequiresSNPConfirmation", 
"GYRA.Flq_NC_003197.1.1253794_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation", 
"parE.CARD_pvgb_NC_003197_3343961_3345854_ARO_3003317_Salmonella_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_parE_RequiresSNPConfirmation", 
"ACRR.CARD_pvgb_NC_014121_1270697_1271351_ARO_3003374_Enterobacter_Multi_drug_resistance_MDR_regulator_ACRR_RequiresSNPConfirmation"
)

What I want to do is pull out the specific ID number in each of the entries above, marked below (set off by spaces), and create a new column holding this number for every row in the data frame.

"GYRA.Flq_ NC_002695.1.916822 _Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
"GYRB.CARD_pvgb_ AP009048_3760295_3762710 _ARO_3003303_Escherichia_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRB_RequiresSNPConfirmation",
"MARR.CARD_pvgb_ U00096_1619119_1619554 _ARO_3003378_Escherichia_Multi_drug_resistance_MDR_regulator_MARR_RequiresSNPConfirmation",
"PARC.Flq_ M58408 _gene_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_PARC_RequiresSNPConfirmation",
"SOXS.CARD_pvgb_ U00096_4277468_4277933 _ARO_3003381_Escherichia_Multi_drug_resistance_MDR_regulator_SOXS_RequiresSNPConfirmation",
"TOLC.CARD_phgb_ FJ768952_0_1488 _ARO_3000237_tolC_Multi_drug_resistance_Multi_drug_efflux_pumps_TOLC",
"parE.CARD_pvgb_ NC_007779_3172159_3174052 _ARO_3003316_Escherichia_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_parE_RequiresSNPConfirmation",
"GYRA.Flq_ CP001918.1 _gene3562_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
"PARC.Flq_ NC_003197.1.1254697 _Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_PARC_RequiresSNPConfirmation",
"GYRA.Flq_ NC_003197.1.1253794 _Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
"parE.CARD_pvgb_ NC_003197_3343961_3345854 _ARO_3003317_Salmonella_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_parE_RequiresSNPConfirmation",
"ACRR.CARD_pvgb_ NC_014121_1270697_1271351 _ARO_3003374_Enterobacter_Multi_drug_resistance_MDR_regulator_ACRR_RequiresSNPConfirmation"

I tried the following command:

library(dplyr)
df %>% mutate(ref_name2 = sub("[A-z]+.[A-z]+.[A-z]+.([A-z][A-z].[0-9]+.[0-9].[0-9]+)", "\\1", ref_name),
         ref_name2 = sub("\\_ARO.*", "", ref_name2),
         ref_name2 = sub("\\_Fluoro.*", "", ref_name2),
         ref_name2 = sub("\\_gene.*", "", ref_name2))

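But this only partially matches the strings above, and it also removes some of the letters I want to keep. Is there a simpler way than multiple sub/gsub calls?

What I ultimately want is: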

c(NC_002695.1.916822, AP009048_3760295_3762710, U00096_1619119_1619554, …
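A single `sub()` call may be enough if the ID is treated as "whatever sits between a known prefix and a known suffix". The sketch below rests on assumptions read off the twelve example strings only: the prefix is a gene name, a dot, and then either `Flq_` or `CARD_` followed by lowercase letters and an underscore; the ID ends right before `_ARO`, `_gene` or `_Fluoroquinolones`. Strings that do not follow this layout would need a different pattern.

```r
ref_name <- c(
  "GYRA.Flq_NC_002695.1.916822_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
  "GYRB.CARD_pvgb_AP009048_3760295_3762710_ARO_3003303_Escherichia_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRB_RequiresSNPConfirmation",
  "MARR.CARD_pvgb_U00096_1619119_1619554_ARO_3003378_Escherichia_Multi_drug_resistance_MDR_regulator_MARR_RequiresSNPConfirmation",
  "PARC.Flq_M58408_gene_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_PARC_RequiresSNPConfirmation",
  "SOXS.CARD_pvgb_U00096_4277468_4277933_ARO_3003381_Escherichia_Multi_drug_resistance_MDR_regulator_SOXS_RequiresSNPConfirmation",
  "TOLC.CARD_phgb_FJ768952_0_1488_ARO_3000237_tolC_Multi_drug_resistance_Multi_drug_efflux_pumps_TOLC",
  "parE.CARD_pvgb_NC_007779_3172159_3174052_ARO_3003316_Escherichia_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_parE_RequiresSNPConfirmation",
  "GYRA.Flq_CP001918.1_gene3562_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
  "PARC.Flq_NC_003197.1.1254697_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_PARC_RequiresSNPConfirmation",
  "GYRA.Flq_NC_003197.1.1253794_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
  "parE.CARD_pvgb_NC_003197_3343961_3345854_ARO_3003317_Salmonella_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_parE_RequiresSNPConfirmation",
  "ACRR.CARD_pvgb_NC_014121_1270697_1271351_ARO_3003374_Enterobacter_Multi_drug_resistance_MDR_regulator_ACRR_RequiresSNPConfirmation"
)

# ^[^.]+\.                 gene name up to and including the first dot
# (?:Flq|CARD_[a-z]+)_     database prefix (assumed to be one of these two forms)
# (.*?)                    the ID, captured lazily
# _(?:ARO|gene|Fluoroquinolones).*$   first known suffix and the rest
pattern <- "^[^.]+\\.(?:Flq|CARD_[a-z]+)_(.*?)_(?:ARO|gene|Fluoroquinolones).*$"

ref_id <- sub(pattern, "\\1", ref_name, perl = TRUE)
```

Inside a data frame the same call could go into `mutate(ref_name2 = sub(pattern, "\\1", ref_name, perl = TRUE))`. Note that `[A-z]` in the original attempt also matches a few punctuation characters, including the underscore, which may explain some of the unexpected matches.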

regex r

Score: 1, answers: 1, views: 34

How to simply extract many repeated rows from a data frame

How can I easily generate this sequence?

c(1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,
   2,3,2,4,2,5,2,6,2,7,2,8,2,9,2,10)

Is there a simple way to write this?
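The sequence reads as pairs: 1 paired with each of 2 to 10, then 2 paired with each of 3 to 10. Assuming that reading is what is intended, one short way to write it is to interleave the values with `rbind()` and flatten the result, since R flattens matrices column by column. The variable names are made up for this example.

```r
# rbind() puts the fixed value in row 1 and the running value in row 2;
# c() then reads the matrix column by column, which interleaves them.
seq_pairs <- c(rbind(1, 2:10), rbind(2, 3:10))

# The sequence from the question, for comparison
target <- c(1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,
            2,3,2,4,2,5,2,6,2,7,2,8,2,9,2,10)
```

The same pairs are also the first 17 columns of `combn(10, 2)`, which may be handier if the pattern has to continue with 3, 4, and so on.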

r

Score: 0, answers: 2, views: 130

Tag statistics

r ×6

dataframe ×1

dplyr ×1

ggplot2 ×1

html-table ×1

regex ×1

split ×1

subset ×1