Hello, I have two data frames:
DF1
query Qstart Qend Col3 Col4
ABEL1 1 50 A B
ABEL2 2 51 P O
ABEL3 3 52 S E
ABEL4 4 53 Q L
ABEL5 5 54 A J
and DF2
seqnames start end
ABEL2 2 51
ABEL3 3 52
ABEL5 5 54
I want to keep only the rows of DF1 whose query, Qstart and Qend match the seqnames, start and end of a row in DF2.
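One way to read this is as a semi-join: keep a DF1 row only when its first three fields also occur as a row of DF2. Below is a minimal plain-Python sketch under that assumption. The names parse_table and semi_join are made up for illustration, the two tables are the ones above pasted as text, and all values are compared as strings.

```python
# Sketch: keep DF1 rows whose (query, Qstart, Qend) also appear
# as (seqnames, start, end) in DF2. Helper names are hypothetical.
DF1_TEXT = """query Qstart Qend Col3 Col4
ABEL1 1 50 A B
ABEL2 2 51 P O
ABEL3 3 52 S E
ABEL4 4 53 Q L
ABEL5 5 54 A J
"""

DF2_TEXT = """seqnames start end
ABEL2 2 51
ABEL3 3 52
ABEL5 5 54
"""


def parse_table(text):
    """Split a whitespace-separated table into (header, rows)."""
    lines = [line.split() for line in text.strip().splitlines()]
    return lines[0], lines[1:]


def semi_join(df1_rows, df2_rows):
    """Keep df1 rows whose first three fields match some df2 row."""
    wanted = {tuple(row[:3]) for row in df2_rows}
    return [row for row in df1_rows if tuple(row[:3]) in wanted]


header1, rows1 = parse_table(DF1_TEXT)
_, rows2 = parse_table(DF2_TEXT)
new_df = semi_join(rows1, rows2)

print(" ".join(header1))
for row in new_df:
    print(" ".join(row))
```

With pandas, the same filter can be written as an inner DataFrame.merge that uses left_on and right_on for the three key columns.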
Here only ABEL2, ABEL3 and ABEL5 are kept, and I get:
NEW_DF
query Qstart Qend Col3 Col4
ABEL2 2 51 P O
ABEL3 3 52 S E
ABEL5 5 54 A …
Hello, I have a file, for example:
file.txt
>LO_D
AHAHAHAHHAHAH
>LEIO_DS
DHHDHDHDHDH
>LODJ_jdjd
DJDJHDHDHD
>LO_D
AAAAAAA
>LO_D
HHAHAHAHAHAH
I want to append a running number to each >LO_D header.
I should then get:
>LO_D_1
AHAHAHAHHAHAH
>LEIO_DS
DHHDHDHDHDH
>LODJ_jdjd
DJDJHDHDHD
>LO_D_2
AAAAAAA
>LO_D_3
HHAHAHAHAHAH
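The renumbering above can be sketched with a simple counter. This assumes a header must equal >LO_D exactly (so >LODJ_jdjd is left alone) and that numbering starts at 1; number_headers is a name invented for this example.

```python
def number_headers(lines, target=">LO_D"):
    """Yield lines, appending _1, _2, ... to each line equal to `target`."""
    count = 0
    for line in lines:
        if line == target:
            count += 1
            yield f"{line}_{count}"
        else:
            yield line


# The sample file from the question, pasted as text.
sample = """>LO_D
AHAHAHAHHAHAH
>LEIO_DS
DHHDHDHDHDH
>LODJ_jdjd
DJDJHDHDHD
>LO_D
AAAAAAA
>LO_D
HHAHAHAHAHAH"""

numbered = list(number_headers(sample.splitlines()))
print("\n".join(numbered))
```

For a real file, the lines would need their trailing newline stripped before the comparison, for instance with line.rstrip("\n").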
I have a directory containing several files, for example:
OK_file1.txt
OK_file2.txt
OK_file3.txt
OK_file4.txt
In these files I have some content, for example:
OK_file1.txt
error: 89 DUE TO TIME LIMIT ***
OK_file2.txt
Job_done
OK_file3.txt
Job_done
OK_file4.txt
error: 34 DUE TO TIME LIMIT ***
I want to parse each of these files and list only the files that contain the string error in a new file named Jobs_error.txt
For example, this file should be:
OK_file1.txt
OK_file4.txt
Does anyone have an idea?
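A possible sketch with pathlib, assuming the files match the pattern OK_file*.txt and that a plain substring test for error is enough. The function names are hypothetical, and the demo builds the four sample files in a temporary directory so it can run anywhere.

```python
import tempfile
from pathlib import Path


def files_with_error(directory, pattern="OK_file*.txt", needle="error"):
    """Return sorted names of files matching `pattern` that contain `needle`."""
    hits = []
    for path in sorted(Path(directory).glob(pattern)):
        if needle in path.read_text(errors="replace"):
            hits.append(path.name)
    return hits


def write_report(directory, report_name="Jobs_error.txt"):
    """Write one matching file name per line into `report_name`."""
    names = files_with_error(directory)
    Path(directory, report_name).write_text("".join(f"{n}\n" for n in names))
    return names


# Demo: recreate the sample files from the question in a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    contents = {
        "OK_file1.txt": "error: 89 DUE TO TIME LIMIT ***\n",
        "OK_file2.txt": "Job_done\n",
        "OK_file3.txt": "Job_done\n",
        "OK_file4.txt": "error: 34 DUE TO TIME LIMIT ***\n",
    }
    for name, text in contents.items():
        Path(tmp, name).write_text(text)
    names = write_report(tmp)
    report = Path(tmp, "Jobs_error.txt").read_text()
    print(report, end="")
```

Because the report name does not match the glob pattern, rerunning the sketch in the same directory does not pick up Jobs_error.txt itself.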
Hello, I have a huge file; here is its head:
>Sequence1:p
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
>Sequence2:ok
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
>Sequence3/lo
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
>Sequence:LJ
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
>Sequence3/lo
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
ATTGGAGAGA
>Sequence:YU
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
ATTAGAG
>Sequence:LJ
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
ATTGGAGAGA
....
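As you can see, the file is made up of several sequences:
each one always starts with its name, in the form >name,
followed by letters.
Here I want to remove the sequences that are duplicated by name.
In the example:
>Sequence:LJ and >Sequence3/lo each appear twice.
I then want to keep a single copy of each and get a new file without duplicated sequences: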
>Sequence1:p
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
>Sequence2:ok
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
>Sequence3/lo
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
>Sequence:LJ
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
ATTGGAGAGA
>Sequence:YU
AAAAAAACCCCCTTTGGGGAGAGAGAGGAACACAGATAATGATAAGTAGATATGATTATAGTAG
CAGAYAGTATGAGTAGTAAGTGAATTAGTAGTAGTAGATGATGA
ATTAGAG
Does anyone have an idea for some bash code or something else?
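Not bash, but here is a minimal Python sketch of name-based deduplication that streams line by line, so it should cope with a huge file. It assumes the first copy of each name is the one to keep (the example does not make clear which copy should win) and that the whole header line is the name. dedupe_fasta is a made-up name, and the sequences in the demo are shortened placeholders, not the real data.

```python
def dedupe_fasta(lines):
    """Yield the lines of every record whose header was not seen before."""
    seen = set()
    keep = False  # lines before the first header are dropped
    for line in lines:
        if line.startswith(">"):
            name = line.strip()
            keep = name not in seen
            seen.add(name)
        if keep:
            yield line


# Demo with the headers from the question and placeholder sequences.
records = [
    ">Sequence1:p", "AAAA", "CAGA",
    ">Sequence2:ok", "AAAA",
    ">Sequence3/lo", "AAAA",
    ">Sequence:LJ", "AAAA",
    ">Sequence3/lo", "AAAA", "ATTG",
    ">Sequence:YU", "AAAA",
    ">Sequence:LJ", "AAAA", "ATTG",
]
unique = list(dedupe_fasta(records))
print("\n".join(unique))
```

On a real file the generator can be fed directly from an open file handle and written out with writelines, for example from a hypothetical input.fasta to dedup.fasta. Only the set of names is held in memory, not the sequences.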