Here is my data:
> head(data)
id C1 C2 C3 B1 B2 B3 Name
12 3 12 8 1 3 12 Agar
14 4 11 9 5 12 14 LB
18 7 17 6 7 14 16 YEF
20 9 15 4 3 11 17 KAN
So I used the melt function from the reshape2 package to reshape my data. Now it looks like this:
library(reshape2)
dt <- melt(data, measure.vars=2:7)
> head(dt)
n v variable value rt
1 id Name p C1 1
2 12 Agar p 3 2
3 14 LB p 4 3
4 18 YEF p 7 6
5 …
For some functions I want to apply to my data, I need numeric values in the data frame. Right now they are factors.
Is there a simple way to "convert" the entire data frame to numeric?
Part of the 'dput' output:
"0.966968221", "0.971526427", "0.975908363", "0.976354638",
"0.983503732", "0.984850291", "0.985224666", "0.987182132",
"0.987468192", "0.988309086", "0.994685984", "0.996238630",
"0.997917853", "0.998762891", "0.999968143", "1.000000000"
), class = "factor")), .Names = c("10", "33.95", "58.66",
"84.42", "110.21", "134.16", "164.69", "199.1", "234.35", "257.19",
"361.84", "432.74", "506.34", "581.46", "651.71", "732.59", "817.56",
"896.24", "971.77", "1038.91"), row.names = c("at1g01050.1",
"at1g01080.1", "at1g01090.1", "at1g01320.2", "at1g01470.1", "at1g01800.1"
), class = "data.frame")
Class of the values in the data.frame:
> class(tbl_alles[103,5])
[1] "factor"
> class(tbl_alles[553,12])
[1] "factor"
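A minimal sketch of the usual conversion, assuming every column of tbl_alles holds numbers stored as factor levels (the toy data frame below is hypothetical and only mimics that). Calling as.numeric() directly on a factor returns the internal level codes, not the printed values, so each column goes through as.character() first. Assigning into tbl_alles[] keeps the result a data frame, whereas sapply() returns a matrix.

```r
# Hypothetical stand-in for tbl_alles: numeric values stored as factors
tbl_alles <- data.frame(
  a = factor(c("0.966968221", "1.000000000")),
  b = factor(c("10", "5"))
)

# Pitfall: as.numeric() on a factor gives the level codes, not the values
codes <- as.numeric(tbl_alles$b)  # 1 2, because the levels are "10", "5"

# Convert every column via its character representation, keeping a data frame
tbl_alles[] <- lapply(tbl_alles, function(x) as.numeric(as.character(x)))

str(tbl_alles)
```

For a single factor f, as.numeric(levels(f))[f] gives the same result and converts each level only once. If some columns are not factors (an ID column, for example), restrict the lapply() to the columns that need converting.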
So far I have tried:
First attempt:
tbl_alles <- sapply(tbl_alles, as.numeric) ## Changing the values in the data frame …
I'm pretty desperate, and I'm ready to lose even more reputation points, but I have to ask. (Yes, I have read several threads about it.)
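I created a data frame with only the 2 columns that I want to put into a matrix (I don't know how to select just those 2 columns from the whole data set):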
tbl_corel <- tbl_end[,c("diff", "abund_mean")]
In the next step, I created an empty matrix:
## Creating an empty matrix to check the correlation between diff and abund_mean
mat_corel <- matrix(0, ncol = 2)
colnames(mat_corel) <- c("diff", "abund_mean")
I tried to fill the matrix with the data using this function:
mat_corel <- matrix(tbl_corel, nrow = 676, ncol = 2)
Of course, I had to check manually how many rows my data frame has... It didn't work. I also tried this function:
mat_corel[ as.matrix(tbl_corel) ] <- 1
It didn't work. I would appreciate your help.
diff abund_mean
1 0 3444804.80
2 0 847887.02
3 0 93654.19
4 0 721692.76
5 0 382711.04
6 1 428656.66
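A minimal sketch, using the six rows shown above as sample data. as.matrix() converts a data frame of numeric columns into a numeric matrix with the same column names and number of rows, so there is no need to pre-allocate an empty matrix or count the rows by hand. If the goal is only the correlation between diff and abund_mean, cor() can be applied to the two columns directly.

```r
# Sample of tbl_corel built from the six rows printed in the question
tbl_corel <- data.frame(
  diff       = c(0, 0, 0, 0, 0, 1),
  abund_mean = c(3444804.80, 847887.02, 93654.19, 721692.76, 382711.04, 428656.66)
)

# Convert the two-column data frame to a numeric matrix in one step
mat_corel <- as.matrix(tbl_corel)
dim(mat_corel)  # rows x columns, taken from the data frame

# Pearson correlation between the two columns; no matrix is needed for this
r <- cor(tbl_corel$diff, tbl_corel$abund_mean)
r
```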
I really can't figure out when I can use which function. I always have the same problem... "it doesn't work for atomic vectors, data frames, matrices..." and so on.
Can someone explain to me how to subtract two columns of a matrix, a data.frame, or anything else...
Here is my data:
id cond S1.pre S2.pre S1.post S2.post V1.pre V2.pre V1.post V2.post
1 aer 21 31 25 35 7 1 19 4
2 aer 15 26 21 29 13 11 16 14
3 aer 18 27 23 31 8 2 3 3
4 aer 17 31 18 39 13 11 15 14
5 aer 15 26 16 29 26 15 32 20
I want to subtract the columns: S1.post - S1.pre.
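A minimal sketch using the rows shown above. The error message in the question is cut off, so this only assumes a common cause: one of the columns is not numeric (for example, it was read in as a factor or as character text). Converting both columns with as.character() and then as.numeric() before subtracting works whether they are numeric, factor, or character; to_num() is a hypothetical helper name introduced here.

```r
# Sample of the question's data (only the columns needed here)
data <- data.frame(
  id      = 1:5,
  cond    = "aer",
  S1.pre  = c(21, 15, 18, 17, 15),
  S1.post = c(25, 21, 23, 18, 16)
)

# Hypothetical helper: numeric, factor or character -> numeric values
to_num <- function(x) as.numeric(as.character(x))

# Element-wise subtraction of two columns, stored as a new column
data$S1.diff <- to_num(data$S1.post) - to_num(data$S1.pre)
data
```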
This is what I tried:
> diff <- data[,"S1.post"] - data[,"S1.pre"]
Error in data[, "S1.post"] - data[, "S1.pre"] …
This is what my data looks like:
structure(list(`Name1` = c("Mark",
NA, NA, NA, NA, NA), Name2 = c(NA, "Stefan",
"Clara", NA, NA, NA), `Name3` = c(NA, NA,
NA, "Max", "Pete", "Gabe"), `Name4` = c("Titan",
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_
), `Name5` = c(NA_character_, NA_character_,
NA_character_, NA_character_, "Tom", NA_character_),
Name6 = c(NA_character_, "Narq", NA_character_,
NA_character_, "Seba", NA_character_), Name7 = c(NA_character_,
NA_character_, "Greg", NA_character_, NA_character_,
NA_character_), Name8 = c(NA_character_,
NA_character_, NA_character_, "Terry", NA_character_,
NA_character_), Name9 = c(NA_character_,
NA_character_, NA_character_, NA_character_, "Coaty",
NA_character_), Name10 = c(NA_character_,
NA_character_, "Meg", NA_character_, …
I have data as follows:
structure(c(170007558.204312, 3151225505.1608, 3228057474.07417,
131519574.092116, 2149477968.81888, 1215136556.10718, 160433707.919651,
5956246992.50776, 2558167135.01689, 3245672969.97675, 169100005.594611,
354825870.40362, 1576805307.20395, 416870647.054276, 3399878725.25131,
370231854.581136, 1122345506.21081, 2305206508.74322, 2232159732.1229,
47308024.505238, 1241395335.9693, 2436980532.07484, 1128618969.34889,
3100422173.38636, 288672329.474137, 2987525983.71596, 3287998115.95645,
152127227.856302, 1994141536.64711, 1239229228.43808, 145289220.860244,
5376086563.26477, 2288378963.83637, 3084446977.22353, 63805766.33001,
336627137.967236, 1459357039.40439, 338887231.409886, 2712985868.45896,
351047105.326338, 1097447659.97404, 2042978821.82768, 2197665385.69067,
38049639.2725552, 1145898075.14945, 2394369287.02634, 941453724.349293,
2879533609.52787), .Dim = c(24L, 2L), .Dimnames = list(c("Mark",
"Chris", "Tom", "Tim", "Hank", "Taylor",
"Moniqe", "Rasp", "Greg", "Mephist", "Daniel",
"Moussa", "Ivan", "Treate", "Argen", "Tupol",
"Gotrek", "Marcel", "Gotae", "Ernsten", "Alfred",
"Katrin", "Paul", "Marten"), NULL)) …
I want to find the pairs that overlap between these two tables:
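A minimal sketch, assuming the second table (called data2 here; it is not shown in the question) has the same Name.x and Name.y columns, and that a pair counts as overlapping when the same two names occur together in both tables, regardless of their order. Each pair is turned into an order-independent key by sorting the two names with pmin() and pmax(); rows of data1 whose key also occurs in data2 are the overlapping pairs. The rows below are a short excerpt of data1 plus made-up rows for data2.

```r
# Short excerpt of data1 and a made-up data2 (hypothetical)
data1 <- data.frame(
  Name.x = c("MDH1", "MDH1", "IDH2"),
  Name.y = c("SCOALB", "SCOALA-1", "CSY4"),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  Name.x = c("CSY4", "FUM1"),
  Name.y = c("IDH2", "ODC1-1"),
  stringsAsFactors = FALSE
)

# Order-independent key: the alphabetically smaller name always comes first
pair_key <- function(x, y) paste(pmin(x, y), pmax(x, y), sep = "|")

keys1 <- pair_key(data1$Name.x, data1$Name.y)
keys2 <- pair_key(data2$Name.x, data2$Name.y)

# Rows of data1 whose pair also appears in data2
overlap <- data1[keys1 %in% keys2, ]
overlap
```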
> dput(data1)
structure(list(Name.x = c("MDH1", "MDH1", "IDH2", "IDH2", "IDH2",
"IDH2", "IDH2", "IDH2", "IDH2", "SCOALB", "SCOALB", "CSY4", "CSY4",
"CSY4", "CSY4", "CSY4", "FUM1", "FUM1", "IDH6", "IDH6", "IDH6",
"ODC1-1", "ODC1-1", "ODC1-1", "ODC1-1", "ODC1-1", "ODC2-1", "ODC2-1",
"ODC2-1", "ACO2", "IDH1", "IDH1", "IDH1", "IDH1", "ODC2-2"),
Name.y = c("SCOALB", "SCOALA-1", "CSY4", "IDH6", "ODC1-1",
"ODC2-1", "IDH1", "ODC2-2", "ODC1-2", "SCOALA-1", "SCOALA-2",
"IDH6", "SDH2-1", "IDH1", "IDH5", "ICDH", "ODC1-1", "ODC1-2",
"ACO2", "IDH1", "IDH5", "ODC2-1", "IDH1", "IDH5", "ODC2-2",
"ODC1-2", "IDH1", "ODC2-2", "ODC1-2", "IDH1", "IDH5", "SCOALA-2",
"ODC2-2", "ODC1-2", "ODC1-2")), .Names = c("Name.x", …
How can I efficiently remove duplicates from this character vector?
> dput(data[1:30])
c("AT2G27020 AT3G26340", "AT1G56450 AT3G26340", "AT1G13060 AT3G26340",
"AT3G22630 AT3G26340", "AT3G22110 AT3G26340", "AT2G05840 AT3G26340",
"AT1G47250 AT3G26340", "AT1G79210 AT3G26340", "AT2G27020 AT5G40580",
"AT3G27430 AT5G40580", "AT4G31300 AT5G40580", "AT3G14290 AT5G40580",
"AT3G22630 AT5G40580", "AT3G22110 AT5G40580", "AT5G35590 AT5G40580",
"AT2G05840 AT5G40580", "AT3G60820 AT5G40580", "AT1G79210 AT5G40580",
"AT2G27020 AT3G27430", "AT2G27020 AT4G31300", "AT1G53850 AT2G27020",
"AT2G27020 AT5G66140", "AT2G27020 AT3G51260", "AT1G21720 AT2G27020",
"AT1G56450 AT2G27020", "AT1G13060 AT2G27020", "AT2G27020 AT3G22630",
"AT2G27020 AT4G14800", "AT2G27020 AT3G22110", "AT2G27020 AT5G35590"
)
I tried simple functions such as unique and duplicated, but unfortunately they didn't work.
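My bad. By duplicates I mean the same AGI, so it doesn't matter that several of them are stored together in one "" string. I want each "ATXG..." to appear in the vector only once. At first I didn't realize that the vector contains pairs... sorry.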
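A minimal sketch: because each string holds a pair of AGIs separated by a single space, unique() on the strings only removes identical pairs. Splitting every string at the space with strsplit(), flattening the result with unlist(), and then applying unique() leaves each AGI exactly once, in order of first appearance. The three-element vector below is an excerpt of the data shown above.

```r
# Excerpt of the character vector from the question
agi_pairs <- c("AT2G27020 AT3G26340", "AT1G56450 AT3G26340", "AT2G27020 AT5G40580")

# Split each pair at the space, flatten, and keep each AGI once
agis <- unique(unlist(strsplit(agi_pairs, " ", fixed = TRUE)))
agis
```

If the separator can be more than one space, splitting on the regular expression "\\s+" (without fixed = TRUE) is more robust.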
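I want to subset a data frame based on the number of valid values: I want to keep only the rows with at least 3 valid values. I tried to find a thread about this (I'm pretty sure one exists), but I couldn't. My brain is working very poorly today.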
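A minimal sketch, assuming a "valid" value is a measurement that is neither 0 nor NA, and that the first column (Names) holds identifiers while all other columns are numeric, as in the dput of aa. rowSums() over a logical matrix counts the valid cells in each row, and the data frame is then subset on that count. The small data frame below is made up for illustration.

```r
# Made-up data in the same layout as aa: an ID column plus numeric samples
aa <- data.frame(
  Names   = c("A", "B", "C"),
  `116_1` = c(0, 1, 1),
  `116_2` = c(0, 2, NA),
  `116_3` = c(5, 3, 2),
  `116_4` = c(0, 0, 3),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

vals <- aa[, -1, drop = FALSE]                # numeric columns only
n_valid <- rowSums(vals != 0 & !is.na(vals))  # valid = non-zero and non-NA

aa_sub <- aa[n_valid >= 3, ]                  # keep rows with >= 3 valid values
aa_sub
```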
> dput(aa)
structure(list(Names = c("A11DS1", "DDSAI2", "ADDA1",
"TT0FGR8", "TRASD1", "DDAWT0", "YYRRP1", "GFSAX5", "123US2", "FXCH1",
"A3KN83", "A4D1P6", "A5YKK6", "ASDASC98", "ASDASDG6", "A6NFQ2", "GFDAHQ2",
"A6NHR9", "A6NIH7", "P62308"), `116_1` = c(0, 849990000,
1281200000, 1.198e+09, 68748000, 0, 0, 0, 61641000, 43582000,
19723000, 40042000, 428120000, 152520000, 168380000, 0, 228920000,
792460000, 0, 453570000), `116_2` = c(0, 926040000, 1500800000,
1242700000, 48212000, 0, 47242000, 46062000, 30757000, 53163000,
0, 52400000, 463870000, 99146000, 150810000, 31183000, 0, 1.079e+09,
43208000, 421410000), `116_3` = c(742270000, 734460000, 1377700000,
1390500000, 52647000, 59797000, 0, …
Here is a data example:
exp_data <- structure(list(Seq = c("AAAARVDS", "AAAARVDSSSAL",
"AAAARVDSRASDQ"), Change = structure(c(19L, 20L, 13L), .Label = c("",
"C[+58]", "C[+58], F[+1152]", "C[+58], F[+1152], L[+12], M[+12]",
"C[+58], L[+2909]", "L[+12]", "L[+370]", "L[+504]", "M[+12]",
"M[+1283]", "M[+1457]", "M[+1491]", "M[+16]", "M[+16], Y[+1013]",
"M[+16], Y[+1152]", "M[+16], Y[+762]", "M[+371]", "M[+386], Y[+12]",
"M[+486], W[+12]", "Y[+12]", "Y[+1240]", "Y[+1502]", "Y[+1988]",
"Y[+2918]"), class = "factor"), `Mass` = c(1869.943,
1048.459, 707.346), Size = structure(c(2L, 2L, 2L), .Label = c("Matt",
"Greg",
"Kieran"
), class = "factor"), `Number` = c(2L, 2L, 2L)), row.names = c(244L,
392L, …
So I want to calculate the sum of each column, but the results come out shifted. I don't know how to deal with it.
My data:
> head(df)
K 2 5 4 2
L 2 1 4 1
M 1 3 4 3
N 3 2 1 1
Sum 7 8 11 13
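As you can see, the results don't line up: the sum of the first column appears in the second column, and the first column holds the sum of the last column. How do I fix this?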
This is the code I use to calculate the sums:
df <- suppressWarnings(rbind(data, Sum=colSums(data[, -1])))
What my data looks like:
> dput(head(data,4))
structure(list(Name = structure(c(95L, 331L, 161L, 156L
), .Label = c(" 1-deoxy-D-xylulose 5-phosphate reductoisomerase ",
" 2-cysteine peroxiredoxin B ", " 2-oxoacid dehydrogenases acyltransferase family protein ",
" 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein ",
" 26S proteasome, regulatory subunit Rpn7;Proteasome …
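A minimal sketch of one way to keep the sums aligned, assuming the first column is the text column Name and all remaining columns are numeric. colSums(data[, -1]) has one value fewer than data has columns, so rbind() has no value to put under Name, which is one likely reason the totals land in the wrong columns. Building the extra row as a one-row data frame with Name = "Sum" and the same column names lets rbind() match the columns by name. The column names S1 to S4 are hypothetical; the values are the four rows printed above.

```r
# Sample data: the four rows printed in the question; S1..S4 are made-up names
data <- data.frame(
  Name = factor(c("K", "L", "M", "N")),
  S1 = c(2, 2, 1, 3),
  S2 = c(5, 1, 3, 2),
  S3 = c(4, 4, 4, 1),
  S4 = c(2, 1, 3, 1)
)

# Use character labels so that a new "Sum" label can be added
data$Name <- as.character(data$Name)

# One-row data frame with the column totals, named like the data columns
sums <- colSums(data[, -1])
sum_row <- data.frame(Name = "Sum", as.list(sums),
                      check.names = FALSE, stringsAsFactors = FALSE)

# rbind() on data frames matches columns by name
df <- rbind(data, sum_row)
df
```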