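I have been working on this problem since last night and cannot figure out how to do it.
What I want to do is match the strings in df1 against the strings in df2 and get the similar strings.
What I did looks like this: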
# a function to arrange the data so that each string gets the ID (row index) it came from
normalize <- function(x, delim) {
  # strip parentheses before splitting
  x <- gsub(")", "", x, fixed=TRUE)
  x <- gsub("(", "", x, fixed=TRUE)
  # repeat each row index once per delimited string in that row
  idx <- rep(seq_len(length(x)), times=nchar(gsub(sprintf("[^%s]",delim), "", as.character(x)))+1)
  names <- unlist(strsplit(as.character(x), delim))
  return(setNames(idx, names))
}
# build the lookup from the first column of the second df
lookup <- normalize(df2[,1], ",")
# a function to match them and give the IDs
process <- function(s) {
  lookup_try <- lookup[names(s)]
  found <- which(!is.na(lookup_try))
  pos <- …

Actually, I am trying to draw a plot, but it places all the columns (lines) on top of each other, so it is not representative. I tried to make simulated data to show you how I plot it and what I want.
I don't know how to make example data that looks like the figure below, but here is what I did:
set.seed(1)
M <- matrix(rnorm(20),20,5)
x <- as.matrix(sort(runif(20, 5.0, 7.5)))
df <- as.data.frame(cbind(x,M))
After creating the data frame, I melt it and use ggplot to plot all the columns against the first one:
require(ggplot2)
require(reshape)
dff <- melt(df , id.vars = 'V1')
b <- ggplot(dff, aes(V1,value)) + geom_line(aes(colour = variable))
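I want a specific distance between the lines (in this case we have 6), as shown below. One dimension is V1 and the other is the number of columns. I don't care about the function; I just want the picture.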
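One way to get this kind of picture is to add a constant vertical offset to each column before plotting, so the lines are stacked instead of overlaid. The sketch below is only an illustration under assumptions: `spacing` and the `offset_value` column are names made up here, and the gap of 6 is an arbitrary choice. It also draws 100 random values, because `matrix(rnorm(20), 20, 5)` recycles the same 20 numbers into every column, which makes all five lines identical.

```r
set.seed(1)
M  <- matrix(rnorm(100), 20, 5)                 # 20 x 5 independent values
x  <- as.matrix(sort(runif(20, 5.0, 7.5)))
df <- as.data.frame(cbind(x, M))                # columns V1 (x axis), V2..V6

spacing    <- 6                                 # assumed gap between neighbouring lines
value_cols <- setdiff(names(df), "V1")

# Long format built with base R: one row per (V1, variable) pair.
long <- data.frame(
  V1       = rep(df$V1, times = length(value_cols)),
  variable = factor(rep(value_cols, each = nrow(df)), levels = value_cols),
  value    = unlist(df[value_cols], use.names = FALSE)
)

# Shift the k-th line up by (k - 1) * spacing so the lines no longer overlap.
long$offset_value <- long$value + (as.integer(long$variable) - 1) * spacing

# Plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  b <- ggplot(long, aes(V1, offset_value, colour = variable)) + geom_line()
  print(b)
}
```

If a separate panel per column is acceptable, `facet_grid(variable ~ .)` on the unshifted `value` is another option; the offset approach keeps everything in a single panel.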

I am trying to fetch a FASTA file from the NCBI website, and I use the following function:
getncbiseq <- function(accession){
  dbs <- c()
  for (i in 1:numdbs){
    db <- dbs[i]
    choosebank(db)
    resquery <- try(query(".tmpquery", paste("AC=", accession)),silent = TRUE)
    if (!(inherits(resquery, "try-error"))){
      queryname <- "query2"
      thequery <- paste("AC=",accession,sep="")
      query(`queryname`,`thequery`)
      # see if a sequence was retrieved:
      seq <- getSequence(query2$req[[1]])
      closebank()
      return(seq)
    }
    closebank()
  }
  print(paste("ERROR: accession",accession,"was not found"))
}
When I try to retrieve the sequence:
mydata <- getncbiseq("NC_001477")
I get this error:
Error in getSequence(query2$req[[1]]) : object 'query2' not found
Also, is there a better way to shorten these loop functions?
If I use
query('queryname','the query')
#or
query("queryname","thequery")
I get another error:
Error in query("queryname", "thequery") : invalid request:"unknown list at (^): \"(^)thequery\""

I am trying to find identical strings across as many columns and combinations as possible. For example, I have data like this:
df<-structure(list(first = c("SNTM1", "STTTT2", "STOLA", "STOMQ",
"STR2", "SUPTY1", "TBNHSG", "TEYAH", "TMEIL1", "TMEIL2", "TMEIL3",
"TNIL", "TREUK", "TTRK", "TRRFK", "UBA52", "YIPF1", NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA), second = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, "SNTLK", "STTTFSG", "STOIU", "STOMQ", "STR25",
"SUPYHGS", "TBHYDG", "TEHDYG", "TMEIL1", "YIPF1", NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), second2 …

I would be glad if you could tell me how to translate code from Java to Python.
Should one do it manually? Is there a tool that converts it automatically?

Here is my data:
df<- structure(list(name = structure(c(2L, 12L, 1L, 16L, 14L, 10L,
9L, 5L, 15L, 4L, 8L, 13L, 7L, 6L, 3L, 11L), .Label = c("All",
"Bab", "boro", "bra", "charli", "delta", "few", "hora", "Howe",
"ist", "kind", "Kiss", "myr", "No", "TT", "where"), class = "factor"),
value = c(1.251, -1.018, -1.074, -1.137, 1.018, 1.293, 1.022,
-1.008, 1.022, 1.252, -1.005, 1.694, -1.068, 1.396, 1.646,
1.016)), .Names = c("name", "value"), class = "data.frame", row.names = c(NA,
-16L))
Here is what I did:
d <- dist(as.matrix(df$value),method = "euclidean")
#compute cluster membership
hcn …

I am trying to reduce the size of the heatmap.2 color key.
Right now it looks like this:
key=T, # add the key color
key.title =NA,
cexRow = 0.75,
cexCol=0.75,
I want to reduce its height so that it looks like this.
This is how I draw the heat map:
heatmap.2(mat_data,
key=T, # add the key color
key.xlab="label",
key.title =NA,
cexRow = 0.75,
cexCol=0.75,
#lhei = c(5,5),
#cellnote = mat_data, # will display the values
main = "title to be shown", # heat map title
#notecol=NA, # change font color of cell labels to black
density.info="none", # turns off density plot inside color legend
trace="none", # turns off trace lines inside …

I have data like this:
data<- structure(list(sample = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A",
"B"), class = "factor"), y = c(0.99999652, 0.99626012, 0.94070452,
0.37332406, 0.57810894, 0.37673758, 0.22784684, 0.35358141, 0.21253558,
0.17715703, 0.99999652, 0.86403956, 0.64054516, 0.18448824, 0.40362691,
0.10791682, 0.06985696, 0.07384465, 0.0433271, 0.02875159), time = c(100L,
150L, 170L, 180L, 190L, 220L, 260L, 270L, 300L, 375L, 100L, 150L,
170L, 180L, 190L, 220L, 260L, 270L, 300L, 375L), x = c(0.9999965,
0.9981008, 0.9940164, …

I asked a question that was not very clear, so I will try to explain it in a more understandable way.
My data looks like this:
# V1 V2 V3
#1 Q9UNZ5 Q9Y2W1
#2 Q9ULV4;Q6QEF8
#3 Q9UNZ5
#4 Q9H6F5
#5 Q9H2K0 Q9ULV4;Q6QEF8
#6 Q9GZZ1 Q9UKD2
#7 Q9H6F5 Q9GZZ1 Q9GZZ1
#8 Q9GZZ1 Q9NYF8
#9 Q9BWS9
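I want to remove the duplicated strings across all of these columns. For example, in V1 every string appears for the first time, so we don't remove anything; we just split and arrange them to get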
Q9ULV4
Q6QEF8
Q9H6F5
Q9GZZ1
Q9BWS9
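Then we check the strings of the second column against the first column, remove the ones that are duplicated, and arrange them again. For the third column, we check its strings against the first and second columns; any that already appear are removed, and the rest are arranged. So the output should look like this: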
Q9ULV4 Q9UNZ5 Q9Y2W1
Q6QEF8 Q9H2K0 Q9UKD2
Q9H6F5 Q9NYF8
Q9GZZ1
Q9BWS9
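The procedure described above can be sketched in base R as follows. This is only a sketch under assumptions: the input `df` is a hypothetical reconstruction (the printed table lost its column alignment, so which column each value belongs to is a guess chosen to reproduce the expected output), and `split_ids` and `dedup_across` are helper names made up here.

```r
# Hypothetical reconstruction of the input; the column placement is an assumption.
df <- data.frame(
  V1 = c("", "Q9ULV4;Q6QEF8", "", "Q9H6F5", "", "", "Q9H6F5", "Q9GZZ1", "Q9BWS9"),
  V2 = c("Q9UNZ5", "", "Q9UNZ5", "", "Q9H2K0", "Q9GZZ1", "Q9GZZ1", "Q9NYF8", ""),
  V3 = c("Q9Y2W1", "", "", "", "Q9ULV4;Q6QEF8", "Q9UKD2", "Q9GZZ1", "", ""),
  stringsAsFactors = FALSE
)

# Split every cell on ";" and drop empty strings and NAs.
split_ids <- function(x) {
  ids <- unlist(strsplit(as.character(x), ";", fixed = TRUE))
  ids[!is.na(ids) & nzchar(ids)]
}

# Walk the columns left to right, keeping only strings not seen before.
dedup_across <- function(df) {
  seen <- character(0)
  result <- vector("list", ncol(df))
  names(result) <- names(df)
  for (col in names(df)) {
    ids <- setdiff(split_ids(df[[col]]), seen)  # setdiff also removes repeats within the column
    result[[col]] <- ids
    seen <- c(seen, ids)
  }
  # Pad the shorter columns with "" so they fit into one data frame.
  n <- max(lengths(result))
  padded <- lapply(result, function(ids) c(ids, rep("", n - length(ids))))
  as.data.frame(padded, stringsAsFactors = FALSE)
}

out <- dedup_across(df)
print(out)
```

A plain loop is used so that the set of already seen strings is carried from one column to the next in an obvious way; the order of first appearance is preserved within each column.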
It is not similar to any question I have asked before, so if it is still unclear, please comment and I will try to explain.

Here is my data:
df<- structure(list(name = structure(c(2L, 12L, 1L, 16L, 14L, 10L,
9L, 5L, 15L, 4L, 8L, 13L, 7L, 6L, 3L, 11L), .Label = c("All",
"Bab", "boro", "bra", "charli", "delta", "few", "hora", "Howe",
"ist", "kind", "Kiss", "myr", "No", "TT", "where"), class = "factor"),
value = c(1.251, -1.018, -1.074, -1.137, 1.018, 1.293, 1.022,
-1.008, 1.022, 1.252, -1.005, 1.694, -1.068, 1.396, 1.646,
1.016)), .Names = c("name", "value"), class = "data.frame", row.names = c(NA,
-16L))
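I checked all the previous answers, but I am stuck. I really don't know whether this can be done; it may be very simple, so I apologize in advance if this is not a proper question. If you give me a hint, I will do it myself.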