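I need help merging the rows in my data (mydf) that share the same name (that is, the same value in the start column) and concatenating the contents of their "ALT" column, so that no rows remain that are duplicated on start. The merged ALT values should be separated by commas, giving the result shown below. Thanks for your help.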
> mydf
chr start end REF ALT TYPE refGene
chr10 chr10:176131 176131 C A snp nonsynonymous SNV
chr10 chr10:159149 159149 C G snp:17659149 nonsynonymous SNV
chr10 chr10:159149 159149 C T snp:17659149 nonsynonymous SNV
chr10 chr10:241469 241469 T C snp splicing
> result
chr start end REF ALT TYPE refGene
chr10 chr10:176131 176131 C A snp nonsynonymous SNV
chr10 chr10:159149 159149 C G,T snp:17659149 nonsynonymous SNV
chr10 chr10:241469 241469 T C snp splicing
The dput is here:
structure(list(chr = c("chr3", "chr3", "chr3", …

I have a data frame, mydf, in which n columns share the same column name, name. I want to rename them to name1, name2, name3, and so on up to the nth column. How can I do this in R?

I have a file in VCF format that I want to read into R. However, the file contains some extra lines that I want to skip. I want the result to begin at the line that starts with #CHROM.
Here is what I tried:
chromo1<-try(scan(myfile.vcf,what=character(),n=5000,sep="\n",skip=0,fill=TRUE,na.strings="",quote="\"")) ## find the start of the vcf file
skip.lines<-grep("^#CHROM",chromo1)
column.labels<-read.delim(myfile.vcf,header=F,nrows=1,skip=(skip.lines-1),sep="\t",fill=TRUE,stringsAsFactors=FALSE,na.strings="",quote="\"")
num.vars<-dim(column.labels)[2]
myfile.vcf
#not wanted line
#unnecessary line
#junk line
#CHROM POS ID REF ALT
11 33443 3 A T
12 33445 5 A G
Result
#CHROM POS ID REF ALT
11 33443 3 A T
12 33445 5 A G
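
I have a data frame called dd2. I need to paste together the values of Left.Gene.Symbols and Right.Gene.Symbols, which I can do with the code below, but I do not want NA pasted in when a value is missing. I want it to look like the combination column shown in result.
mycode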
#to remove NAs
dd2[dd2 == 'NA'] <- NA
#pasting values together
result <- cbind(dd2,combination = paste(dd2[,"Left.Gene.Symbols"],dd2[,"Right.Gene.Symbols"],sep="*"))
Data
dd2<- structure(c("AMLM12001KP", "AMLM12001KP", "AMLM12001KP", "AMLM12001KP",
"AMLM12001KP", "AK2", "HFM1", "HFM1", "HFM1", "HFM1", NA, "PPT",
NA, "GGT", NA), .Dim = c(5L, 3L), .Dimnames = list(NULL, c("customer_sample_id",
"Left.Gene.Symbols", "Right.Gene.Symbols")))
Result
customer_sample_id Left.Gene.Symbols Right.Gene.Symbols combination
[1,] "AMLM12001KP" "AK2" NA AK2*
[2,] "AMLM12001KP" "HFM1" "PPT" HFM1*PPT
[3,] "AMLM12001KP" "HFM1" NA HFM1*
[4,] "AMLM12001KP" "HFM1" "GGT" HFM1*GGT …

I have this vector, myvec (it is large). I need to split its elements on "/" and create another result vector, resvector. How can I do this in R?
myvec<-c("IID:WE:G12D/V/A","GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")
resvector
IID:WE:G12D, IID:WE:G12V,IID:WE:G12A,GH:SQ:p.R172W,GH:SQ:p.R172G,HH:WG:p.S122F,HH:WG:p.S122H

I have this vector, myvec. I want to remove everything after the second ':' and get the result below. More generally, how do I remove the rest of a string after the nth ':'?
myvec<- c("chr2:213403244:213403244:G:T:snp","chr7:55240586:55240586:T:G:snp" ,"chr7:55241607:55241607:C:G:snp")
result
chr2:213403244
chr7:55240586
chr7:55241607

I have many file names like this:
txt= "MA0051_IRF2.xml"
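I want to extract IRF2, the content between "_" and ".". How can I do this in R?

Suppose I need to run a system executable, myexecutable, from R. I want to print the message "If it is not installed, please install myexecutable to run this program". How can I do this in R?

I have a bash command that runs like this: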
esearch -db protein -query "AVA17449.1" | elink -target nuccore | efetch -format ft
But I want to do this in R (this does not work):
output <- system("esearch -db protein -query "AVA17449.1" | elink -target nuccore | efetch -format ft")
What is the correct way to call this command in R?
P.S. esearch can be installed with the following commands:
cd ~
/bin/bash
perl -MNet::FTP -e \
'$ftp = new Net::FTP("ftp.ncbi.nlm.nih.gov", Passive => 1);
$ftp->login; $ftp->binary;
$ftp->get("/entrez/entrezdirect/edirect.tar.gz");'
gunzip -c edirect.tar.gz | tar xf -
rm edirect.tar.gz
builtin exit
export PATH=$PATH:$HOME/edirect >& /dev/null || setenv PATH "${PATH}:$HOME/edirect"
./edirect/setup.sh
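
I am new to Python and am trying to write something like the code below (code A) so that it works the way code B does. I want to take the math operator from user input as the do_what variable. How can we write code A in Python so that it works like code B?
Code A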
num1 = input("Enter a number: ")
num2 = input("Enter another number: ")
do_what = input("Enter a calculation symbol for calculation you want to perform: ")
result = float(num1) do_what float(num2)
print("result is: " + str(result))
Code B
num1 = input("Enter a number: ")
num2 = input("Enter another number: ")
result = int(num1) + int(num2)
print("result is: " + str(result))
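Code A fails because a string such as "+" cannot stand in for an operator between two expressions. One possible way to get the behaviour of code B is to look the symbol up in a dictionary that maps it to a function from the standard-library operator module. The following is only a minimal sketch: OPERATIONS, calculate and main are names introduced here for illustration, and the four supported symbols are an assumption about what the user may type.

```python
import operator

# Hypothetical lookup table: maps the symbol the user types to a function.
# Only these four symbols are supported in this sketch.
OPERATIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def calculate(num1, do_what, num2):
    """Apply the operator named by do_what to two numeric strings."""
    if do_what not in OPERATIONS:
        raise ValueError("unsupported calculation symbol: " + do_what)
    # Convert the inputs to float, as code A does, then call the function.
    return OPERATIONS[do_what](float(num1), float(num2))


def main():
    # Same prompts as code A; only the calculation line differs.
    num1 = input("Enter a number: ")
    num2 = input("Enter another number: ")
    do_what = input("Enter a calculation symbol for calculation you want to perform: ")
    result = calculate(num1, do_what, num2)
    print("result is: " + str(result))
```

Calling main() reproduces the prompts from code A. A dictionary lookup is used here rather than eval() so that only the listed symbols can ever be executed; an unknown symbol raises ValueError instead of running arbitrary input.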