I have the following data frame:
col1 <- rep(c("A","B","C","D"),10)
col2 <- rep(c(1,0),10)
col3 <- rep(c(0,1),10)
col4 <- rep(c(1,0),10)
col5 <- rep(c(0,1),10)
test_df <- data.frame(col1, col2, col3, col4, col5, stringsAsFactors = F)
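I want to color specific cells in several columns based on the value in col1, and also add a vertical line between two columns of the table (indicating a limit), again depending on the value in col1.
For example, if col1 == "A", I want to shade the cells in col2 and col5 grey in the rows where col1 == "A".
In pseudocode: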
if col1 == A: color columns(col2, col5), vert.line between col3 and col4
if col1 == B: color columns(col2, col3, col5), vert.line between col4 and col5
if col1 == C: color columns(col2, col4, col5), vert.line between col3 …
I have the following data:
id <- rep(1:100)
A <- rep(c(0.12 ,0.25, 0.5, 1, 2), each = 20)
B <- rep(c(0.06, 0.03, 0.015, 0.12), each = 25)
C <- rep(c(0.015, 0.03, 0.06, 0.12, 0.25), each = 20)
df <- data.frame(id,A,B,C,stringsAsFactors = F)
I reshape A, B and C into two columns (key and value). Note that columns A, B and C are really factors; I only avoid declaring them as factors so that I can draw the violin plot.
library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr
df_edited <- df %>%
  gather(key, value, -id, factor_key = F)
I created the following plot from this data:
library(ggplot2)
factor_breaks <- c(0.015,0.03,0.06,0.12,0.25,0.5,1,2)
factor_levels <- c("0.015","0.03","0.06","0.12","0.25","0.5","1","2")
ggplot(df_edited, aes(key, value))+
geom_violin()+
scale_y_continuous(labels = factor_levels, breaks = factor_breaks)
Is it possible to space the y-axis labels evenly (as in the plot below) while still keeping the violin plot correct?
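One way to get exactly even spacing, sketched here under the assumption that every value is one of the eight listed levels, is to plot each value's position in `factor_breaks` instead of the value itself and then relabel the axis. The names `df_long` and `level` are illustrative and not from the question, and the long format is built with base R rather than `gather()`. Note that the violin densities are then estimated on the level-index scale, which may or may not match what you consider "correct".

```r
# Rebuild the example data from the question.
A <- rep(c(0.12, 0.25, 0.5, 1, 2), each = 20)
B <- rep(c(0.06, 0.03, 0.015, 0.12), each = 25)
C <- rep(c(0.015, 0.03, 0.06, 0.12, 0.25), each = 20)

factor_breaks <- c(0.015, 0.03, 0.06, 0.12, 0.25, 0.5, 1, 2)
factor_levels <- c("0.015", "0.03", "0.06", "0.12", "0.25", "0.5", "1", "2")

# Long format: one row per (key, value) pair.
df_long <- data.frame(
  key   = rep(c("A", "B", "C"), each = 100),
  value = c(A, B, C),
  stringsAsFactors = FALSE
)

# Assumption: each value equals one of the breaks, so match() finds its index.
df_long$level <- match(df_long$value, factor_breaks)

# Plot the index (evenly spaced by construction) and label it with the values.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df_long, aes(key, level)) +
    geom_violin() +
    scale_y_continuous(breaks = seq_along(factor_breaks),
                       labels = factor_levels,
                       name   = "value")
}
```

Call `print(p)` to draw the plot. If the values were not restricted to the listed levels, `match()` would return `NA` for them and this approach would not apply as written.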
ggplot(df_edited, …
I have the following data frame:
col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
"chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
"low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)
test_data <- cbind(col1, col2, col3)
test_data <- as.data.frame(test_data)
I would like to end up with a table like this one (the values are random):
Species  Pop.density  %Resistance  CI_low  CI_high  Total samples
avi      low          2.0          1.2     2.2      30
avi      med          0            0       0.5      20
avi      high         3.5          2.9     4.2      10
chi      low          0.5          0.3     0.7      20
chi      med          2.0          1.9     2.1      150
chi      high         6.5          6.2     6.6      175
The %Resistance column is based on col3 above, where 1 = resistant and 0 = not resistant. I have tried the following:
library(dplyr)
test_data<-test_data %>%
count(col1,col2,col3) %>%
group_by(col1, col2) %>%
mutate(perc_res = prop.table(n)*100)
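This almost solves the problem, since I get the percentage of 1s and 0s for each combination of values in col1 and col2, but the total number of samples is wrong because I am counting over all three columns, whereas the correct count should be taken over col1 and col2 only.
For the confidence interval, I would use the following: …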
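A minimal base R sketch of the percentage and total-sample columns, grouping on col1 and col2 only. It assumes col3 is kept numeric (the `cbind()` call in the question turns every column into character), and the output names `col3.perc_res` and `col3.total` are just what this flattening step happens to produce. The confidence interval is left out because the method is not shown in the question.

```r
col1 <- c("avi","chi","chi","bov","fox","bov","fox","avi","bov",
          "chi","avi","chi","chi","bov","bov","fox","avi","bov","chi")
col2 <- c("low","med","high","high","low","low","med","med","med","high",
          "low","low","high","high","med","med","low","low","med")
col3 <- c(0,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0)

# Build the data frame directly so that col3 stays numeric.
test_data <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)

# For each (col1, col2) group: mean of a 0/1 column is the share of 1s,
# and length() is the number of samples in that group.
summary_df <- aggregate(
  col3 ~ col1 + col2,
  data = test_data,
  FUN  = function(x) c(perc_res = mean(x) * 100, total = length(x))
)

# aggregate() returns a matrix column here; flatten it into two plain columns.
summary_df <- do.call(data.frame, summary_df)
```

Because the grouping never involves col3, the totals are per species and density, which is the count the `count(col1, col2, col3)` attempt was missing.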
I have some data that looks like this:
X1
A,B,C,D,E
A,B
A,B,C,D
A,B,C,D,E,F
I want to create one column holding the first element of each vector ("A") and another column holding all the remaining values ("B", "C", and so on):
X1           Col1  Col2
A,B,C,D,E    A     B,C,D,E
A,B          A     B
A,B,C,D      A     B,C,D
A,B,C,D,E,F  A     B,C,D,E,F
I have already tried the following:
library(dplyr)
testdata <- data.frame(X1 = c("A,B,C,D,E",
"A,B",
"A,B,C,D",
"A,B,C,D,E,F")) %>%
mutate(Col1 = sapply(strsplit(X1, ","), "[", 1),
Col2 = sapply(strsplit(X1, ","), "[", -1))
However, I can't seem to get rid of the pesky vector brackets around the values in Col2. Is there a way to do this?
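One possible fix, sketched in base R: `sapply(..., "[", -1)` returns a list of character vectors, so Col2 becomes a list column. Pasting each remainder back into a single string gives an ordinary character column. The helper name `parts` is illustrative.

```r
X1 <- c("A,B,C,D,E", "A,B", "A,B,C,D", "A,B,C,D,E,F")

# Split once and reuse the pieces for both new columns.
parts <- strsplit(X1, ",", fixed = TRUE)

testdata <- data.frame(
  X1   = X1,
  # First element of each split vector.
  Col1 = vapply(parts, function(p) p[1], character(1)),
  # Everything except the first element, collapsed back into one string.
  Col2 = vapply(parts, function(p) paste(p[-1], collapse = ","), character(1)),
  stringsAsFactors = FALSE
)
```

`vapply()` with `character(1)` guarantees one string per row, so the result cannot silently turn into a list. An alternative with the same effect on this data is `sub("^[^,]*,", "", X1)`, which strips everything up to and including the first comma.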
I have a data frame with a column containing the following information:
c("GYRA.Flq_NC_002695.1.916822_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
"GYRB.CARD_pvgb_AP009048_3760295_3762710_ARO_3003303_Escherichia_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRB_RequiresSNPConfirmation",
"MARR.CARD_pvgb_U00096_1619119_1619554_ARO_3003378_Escherichia_Multi_drug_resistance_MDR_regulator_MARR_RequiresSNPConfirmation",
"PARC.Flq_M58408_gene_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_PARC_RequiresSNPConfirmation",
"SOXS.CARD_pvgb_U00096_4277468_4277933_ARO_3003381_Escherichia_Multi_drug_resistance_MDR_regulator_SOXS_RequiresSNPConfirmation",
"TOLC.CARD_phgb_FJ768952_0_1488_ARO_3000237_tolC_Multi_drug_resistance_Multi_drug_efflux_pumps_TOLC",
"parE.CARD_pvgb_NC_007779_3172159_3174052_ARO_3003316_Escherichia_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_parE_RequiresSNPConfirmation",
"GYRA.Flq_CP001918.1_gene3562_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
"PARC.Flq_NC_003197.1.1254697_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_PARC_RequiresSNPConfirmation",
"GYRA.Flq_NC_003197.1.1253794_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
"parE.CARD_pvgb_NC_003197_3343961_3345854_ARO_3003317_Salmonella_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_parE_RequiresSNPConfirmation",
"ACRR.CARD_pvgb_NC_014121_1270697_1271351_ARO_3003374_Enterobacter_Multi_drug_resistance_MDR_regulator_ACRR_RequiresSNPConfirmation"
)
What I want to do is extract the specific ID in each entry above, marked below (set off by spaces), and create a new column holding this ID for each row of the data frame.
"GYRA.Flq_ NC_002695.1.916822 _Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
"GYRB.CARD_pvgb_ AP009048_3760295_3762710 _ARO_3003303_Escherichia_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRB_RequiresSNPConfirmation",
"MARR.CARD_pvgb_ U00096_1619119_1619554 _ARO_3003378_Escherichia_Multi_drug_resistance_MDR_regulator_MARR_RequiresSNPConfirmation",
"PARC.Flq_ M58408 _gene_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_PARC_RequiresSNPConfirmation",
"SOXS.CARD_pvgb_ U00096_4277468_4277933 _ARO_3003381_Escherichia_Multi_drug_resistance_MDR_regulator_SOXS_RequiresSNPConfirmation",
"TOLC.CARD_phgb_ FJ768952_0_1488 _ARO_3000237_tolC_Multi_drug_resistance_Multi_drug_efflux_pumps_TOLC",
"parE.CARD_pvgb_ NC_007779_3172159_3174052 _ARO_3003316_Escherichia_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_parE_RequiresSNPConfirmation",
"GYRA.Flq_ CP001918.1 _gene3562_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
"PARC.Flq_ NC_003197.1.1254697 _Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_PARC_RequiresSNPConfirmation",
"GYRA.Flq_ NC_003197.1.1253794 _Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
"parE.CARD_pvgb_ NC_003197_3343961_3345854 _ARO_3003317_Salmonella_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_parE_RequiresSNPConfirmation",
"ACRR.CARD_pvgb_ NC_014121_1270697_1271351 _ARO_3003374_Enterobacter_Multi_drug_resistance_MDR_regulator_ACRR_RequiresSNPConfirmation"
I tried the following command:
library(dplyr)
df %>% mutate(ref_name2 = sub("[A-z]+.[A-z]+.[A-z]+.([A-z][A-z].[0-9]+.[0-9].[0-9]+)", "\\1", ref_name),
ref_name2 = sub("\\_ARO.*", "", ref_name2),
ref_name2 = sub("\\_Fluoro.*", "", ref_name2),
ref_name2 = sub("\\_gene.*", "", ref_name2))
But this only partially matches the strings above, and it also removes some of the letters I want to keep. Is there a simpler way than multiple sub/gsub calls?
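A single-call sketch using one capture group. It rests on assumptions inferred only from the sample strings: the ID always follows a `GENE.Flq_`, `GENE.CARD_pvgb_` or `GENE.CARD_phgb_` prefix, and it always ends just before `_ARO`, `_Fluoro` or `_gene`. The names `id_pattern` and `ref_id` are illustrative, and only five of the sample strings are repeated here to keep the example short.

```r
ref_name <- c(
  "GYRA.Flq_NC_002695.1.916822_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation",
  "GYRB.CARD_pvgb_AP009048_3760295_3762710_ARO_3003303_Escherichia_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRB_RequiresSNPConfirmation",
  "PARC.Flq_M58408_gene_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_PARC_RequiresSNPConfirmation",
  "TOLC.CARD_phgb_FJ768952_0_1488_ARO_3000237_tolC_Multi_drug_resistance_Multi_drug_efflux_pumps_TOLC",
  "GYRA.Flq_CP001918.1_gene3562_Fluoroquinolones_Fluoroquinolone_resistant_DNA_topoisomerases_GYRA_RequiresSNPConfirmation"
)
df <- data.frame(ref_name, stringsAsFactors = FALSE)

# ^[^.]+\.                   gene name up to the first dot
# (?:Flq|CARD_p[a-z]gb)_     database prefix (assumed from the samples)
# (.*?)                      the ID, matched lazily
# _(?:ARO|Fluoro|gene).*$    first terminator (assumed) and the rest of the string
id_pattern <- "^[^.]+\\.(?:Flq|CARD_p[a-z]gb)_(.*?)_(?:ARO|Fluoro|gene).*$"

# perl = TRUE is needed for the non-capturing groups and the lazy quantifier.
df$ref_id <- sub(id_pattern, "\\1", df$ref_name, perl = TRUE)
```

A string that does not fit the assumed prefix or terminators is returned unchanged by `sub()`, so it is worth checking the result for leftover long names.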
What I ultimately want is:
c(NC_002695.1.916822, AP009048_3760295_3762710, U00096_1619119_1619554, …
How can I easily generate this sequence?
c(1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,
2,3,2,4,2,5,2,6,2,7,2,8,2,9,2,10)
Is there a simple way to write this?
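Two possible ways to build it, sketched in base R. Both read the sequence as pairs (i, j) with i in 1:2 and j running from i + 1 to 10; that reading is an assumption based on the values shown. The names `target`, `pairs_seq` and `combn_seq` are illustrative.

```r
# The sequence from the question, for comparison.
target <- c(1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,
            2,3,2,4,2,5,2,6,2,7,2,8,2,9,2,10)

# Option 1: for each i, interleave i with (i + 1):10.
# rbind() recycles i across the columns; as.vector() reads column by column.
pairs_seq <- unlist(lapply(1:2, function(i) as.vector(rbind(i, (i + 1):10))))

# Option 2: combn(10, 2) lists all pairs in the same order, one per column.
# The first 17 columns (9 pairs starting with 1, 8 starting with 2) are 34 values.
combn_seq <- as.vector(combn(10, 2))[seq_len(34)]
```

Option 1 generalizes by changing `1:2` and `10`; option 2 is shorter but generates all 45 pairs before truncating.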