I am trying to grep through 40k files in the current directory, and I get this error.
for i in $(cat A01/genes.txt); do grep $i *.kaks; done > A01/A01.result.txt
-bash: /usr/bin/grep: Argument list too long
In general, how do you grep through thousands of files?
Thanks, Upendra
I am trying to extract rows from a Pandas dataframe using a list, but I can't get it to work. Here is an example
# df
alleles chrom pos strand assembly# center protLSID assayLSID
rs#
TP3 A/C 0 3 + NaN NaN NaN NaN
TP7 A/T 0 7 + NaN NaN NaN NaN
TP12 T/A 0 12 + NaN NaN NaN NaN
TP15 C/A 0 15 + NaN NaN NaN NaN
TP18 C/T 0 18 + NaN NaN NaN NaN
test = ['TP3','TP12','TP18']
df.select(test)
This is what I tried with the elements of the list, and I got this error: TypeError: 'Index' object is not callable. What am I doing wrong?
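Selecting rows by a list of index labels is usually done with label-based indexing rather than `select`. The sketch below is a minimal illustration, assuming the `rs#` values form the dataframe's index; the small dataframe is rebuilt by hand from the first few columns of the example and is not the asker's actual data.

```python
import pandas as pd

# Hand-built stand-in for the example dataframe (assumption: "rs#" is the index).
df = pd.DataFrame(
    {
        "alleles": ["A/C", "A/T", "T/A", "C/A", "C/T"],
        "chrom": [0, 0, 0, 0, 0],
        "pos": [3, 7, 12, 15, 18],
        "strand": ["+", "+", "+", "+", "+"],
    },
    index=pd.Index(["TP3", "TP7", "TP12", "TP15", "TP18"], name="rs#"),
)

test = ["TP3", "TP12", "TP18"]

# .loc with a list of labels returns the matching rows in the order of the list.
subset = df.loc[test]
print(subset)
```

If some labels in the list might be absent from the index, `df[df.index.isin(test)]` is a more forgiving alternative, since `.loc` raises a `KeyError` for missing labels.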
I have a dataframe with three columns, and I am trying to draw a line plot with the Seaborn library, but it throws an error saying 'DataFrame' object has no attribute 'get'. Here is my test dataframe
Age variable value
31 Overall 69.76751118
31 Potential 69.76751118
31 Growth 0
34 Overall 68.91176471
34 Potential 68.91176471
34 Growth 0
28 Overall 69.05803996
28 Potential 69.05803996
28 Growth 0.24643197
This is what I tried with the seaborn lineplot after reading the csv file
test = spark.read.csv("test.csv", inferSchema=True, header=True)
sns.lineplot(x = "Age", y = "value", hue = "variable", data = test)
The error I get is this
AttributeError: 'DataFrame' object has no attribute 'get'
However, when I convert the dataframe to a Pandas dataframe and use exactly the same seaborn code
test_df = test.toPandas()
sns.lineplot(x …
I am having trouble matching a specific column with the grep command. I have a test file like this (test.txt):
Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676
Bra001325 836 C 8 ,,,,,.,, 68886676
Bra001325 841 A 6 ,$,.,,. BJJJJE
Bra001325 866 C 2 ,. HJ
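I want to extract all the rows that have the number 866 in the second column. When I use the grep command, I get every line that contains that number anywhere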
grep "866" test.txt
Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676
Bra001325 836 C 8 ,,,,,.,, 68886676
Bra001325 866 C 2 ,. HJ
How can I match a specific column with the grep command?
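Matching on one column rather than the whole line comes down to splitting each row into whitespace-separated fields and comparing only the field of interest. The sketch below illustrates that idea in Python rather than grep itself; the function name `rows_with_column_value` is hypothetical, and the sample rows are copied from test.txt above.

```python
def rows_with_column_value(lines, column, value):
    """Return the lines whose whitespace-separated field at `column` (0-based) equals `value`."""
    matched = []
    for line in lines:
        fields = line.split()
        # Skip short lines instead of failing on a missing field.
        if len(fields) > column and fields[column] == value:
            matched.append(line)
    return matched


sample = [
    "Bra001325 835 T 13 c$c$c$c$c$cccccCcc !!!!!68886676",
    "Bra001325 836 C 8 ,,,,,.,, 68886676",
    "Bra001325 841 A 6 ,$,.,,. BJJJJE",
    "Bra001325 866 C 2 ,. HJ",
]

# Column index 1 is the second column; the comparison is exact,
# so "68886676" in a later field no longer counts as a match.
for row in rows_with_column_value(sample, 1, "866"):
    print(row)
```

The first two rows matched the plain `grep "866"` only because the quality string `68886676` happens to contain `866`; comparing a single field avoids that.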
I have a file (file_in.txt) that contains the following names:
aphid_splitseq.1.fasta.annot.xml
aphid_splitseq.2.fasta.annot.xml
aphid_splitseq.3.fasta.annot.xml
aphid_splitseq.4.fasta.annot.xml
aphid_splitseq.5.fasta.annot.xml
I also have another file (file_out.txt) with the following names:
aphid_splitseq_1
aphid_splitseq_2
aphid_splitseq_3
aphid_splitseq_4
aphid_splitseq_5
Now I want statements like this
java -cp *:ext/*: es.blast2go.prog.B2GAnnotPipe -in aphid_splitseq.1.fasta.annot.xml -out results/aphid_splitseq_1 -prop b2gPipe.properties -v -annot -dat
Basically, I want to loop through each line of file_in.txt and file_out.txt and replace the values of -in and -out with i and j, respectively.
I tried this in Bash, but it doesn't seem to work:
aphid_splitseq.1.fasta.annot.xml
aphid_splitseq.2.fasta.annot.xml
aphid_splitseq.3.fasta.annot.xml
aphid_splitseq.4.fasta.annot.xml
aphid_splitseq.5.fasta.annot.xml
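Walking two lists in lockstep, so that the first input name pairs with the first output name, can be sketched as follows. This is an illustration in Python under stated assumptions, not the asker's Bash attempt: the helper `build_commands` is hypothetical, the names are generated in memory instead of being read from file_in.txt and file_out.txt, and the commands are only printed, not executed.

```python
def build_commands(in_names, out_names):
    """Pair each input name with the output name at the same position."""
    commands = []
    for in_name, out_name in zip(in_names, out_names):
        commands.append([
            "java", "-cp", "*:ext/*:", "es.blast2go.prog.B2GAnnotPipe",
            "-in", in_name,
            "-out", "results/" + out_name,
            "-prop", "b2gPipe.properties", "-v", "-annot", "-dat",
        ])
    return commands


# Stand-ins for the contents of file_in.txt and file_out.txt.
in_names = [f"aphid_splitseq.{n}.fasta.annot.xml" for n in range(1, 6)]
out_names = [f"aphid_splitseq_{n}" for n in range(1, 6)]

for command in build_commands(in_names, out_names):
    print(" ".join(command))
```

Note that `zip` stops at the shorter list, so a length mismatch between the two files would silently drop the extra names.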
I am trying to mount the current working directory into a Docker container, but I can't get it to work. Here is my Dockerfile
FROM ubuntu:14.04.3
MAINTAINER Upendra Devisetty
RUN apt-get update && apt-get install -y g++ \
make \
git \
zlib1g-dev \
python \
wget \
curl \
python-matplotlib
ENV BINPATH /usr/bin
ENV HISAT2GIT https://upendra_35@bitbucket.org/upendra_35/evolinc.git
RUN git clone "$HISAT2GIT"
RUN chmod +x evolinc/evolinc-part-I.sh && cp evolinc/evolinc-part-I.sh $BINPATH
RUN wget -O- http://cole-trapnell-lab.github.io/cufflinks/assets/downloads/cufflinks-2.2.1.Linux_x86_64.tar.gz | tar xzvf -
RUN wget -O- https://github.com/TransDecoder/TransDecoder/archive/2.0.1.tar.gz | tar xzvf -
RUN wget -O- http://seq.cs.iastate.edu/CAP3/cap3.linux.x86_64.tar | tar vfx -
RUN curl ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.31+-x64-linux.tar.gz > ncbi-blast-2.2.31+-x64-linux.tar.gz
RUN tar …
I installed OpenStack on my CentOS VM, and when I try to list the launched instances, I get this error
$ openstack server list
Ignoring domain related config user_domain_name because identity API version is 2.0
Ignoring domain related config user_domain_name because identity API version is 2.0
Ignoring domain related config user_domain_name because identity API version is 2.0
Ignoring domain related config user_domain_name because identity API version is 2.0
Expecting to find domain in user - the server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in …
I want to count the number of positive and negative values in one column of a dataframe. How can I do this in R?
For example, here is the dataframe
logFC logCPM LR PValue FDR
Bra15066 -5.630822 5.184586 73.79927 8.647868e-18 4.060866e-13
Bra18809 -13.227825 7.158572 72.13478 2.009902e-17 4.719048e-13
Bra45310 5.848073 5.244367 65.61483 5.482472e-16 8.581530e-12
Bra44666 -4.270590 4.852193 63.75671 1.407731e-15 1.652605e-11
Bra34292 -12.917379 4.198073 61.84715 3.711794e-15 3.485968e-11
Bra38258 -5.239433 4.816886 57.98476 2.641567e-14 2.067378e-10
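Now I want to count how many values in the logFC column are positive and how many are negative.
Basically, for the df above I want to see a count of 5 for the negative values and 1 for the positive values. How can I do this in R?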
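I am trying to replace the three-letter code at the end of a sequence with nothing (basically delete it) using sed, but it doesn't work well with multiple regex patterns. Here are some example sequences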
GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG
GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA
GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA
When I try each regex on its own with sed, it works
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG" | sed 's/TAG$//'
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA" | sed 's/TAA$//'
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG" | sed 's/TAG$//'
However, when I try to include multiple regexes, it doesn't work
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG" |
sed 's/(TAG$|TAA$|TGA$)//'
Can someone point out what I am doing wrong?
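The intended pattern is an alternation of three codes anchored to the end of the line. A likely cause of the failure is that sed uses basic regular expressions by default, in which unescaped `(`, `|` and `)` are literal characters; `sed -E` switches to extended syntax. The sketch below shows the same alternation in Python's `re` module, where that grouping syntax works as written. The function name `strip_trailing_code` is hypothetical.

```python
import re

# One of the three codes, anchored to the end of the sequence.
TRAILING_CODE = re.compile(r"(?:TAG|TAA|TGA)$")


def strip_trailing_code(sequence):
    """Remove a trailing TAG, TAA, or TGA if present; otherwise return the sequence unchanged."""
    return TRAILING_CODE.sub("", sequence)


sequences = [
    "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG",
    "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA",
    "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA",
]

for seq in sequences:
    print(strip_trailing_code(seq))
```

Placing the `$` anchor outside the group keeps the pattern shorter than repeating it after every alternative, and occurrences of the codes inside the sequence are left alone.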
I have a df like the one below, and I want to generate a time series plot using geom_line. Here is a summary of my data:
summary(data.t.m)
sample side time day variable value
HA2015_E10AF.bam: 1 E:69 1 :12 F:72 nc.counts:138 Min. : 4.346
HA2015_E10BF.bam: 1 W:69 2 :12 S:66 1st Qu.: 6.949
HA2015_E10CF.bam: 1 3 :12 Median : 8.529
HA2015_E11AF.bam: 1 4 :12 Mean : 9.085
HA2015_E11AS.bam: 1 5 :12 3rd Qu.:10.501
HA2015_E11BF.bam: 1 6 :12 Max. :23.047
(Other) :132 (Other):66
Here is the code that generates the line plot:
plt <- ggplot(data.t.m, aes(time, value, group = side, colour = side))
plt <- plt + stat_summary(fun.y = "mean", geom="line", size = 2, position=position_dodge(0.95)) …