Replacing a string in a specific line using Python

Question

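I'm writing a Python script to replace strings in every text file in a directory that has a specific extension (.seq). Only strings on the second line of each file should be replaced, and the output goes to a new subdirectory (call it clean) with the same file names as the originals, but with a *.clean suffix. Each output file contains exactly the same text as the original, except for the replaced strings. I need to replace all of these strings: 'K', 'Y', 'W', 'M', 'R' and 'S' with 'N'.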

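This is what I came up with after googling. It's very messy (second week of programming), and it stops after copying the files into the clean directory without replacing anything. I'd really appreciate any help.

Thanks!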

import os, shutil

os.mkdir('clean')

for file in os.listdir(os.getcwd()):
    if file.find('.seq') != -1:
        shutil.copy(file, 'clean')

os.chdir('clean')

for subdir, dirs, files in os.walk(os.getcwd()):
    for file in files:
        f = open(file, 'r')
        for line in f.read():
            if line.__contains__('>'): #indicator for the first line. the first line always starts with '>'. It's a FASTA file, if you've worked with dna/protein before.
                pass
            else:
                line.replace('M', 'N')
                line.replace('K', 'N')
                line.replace('Y', 'N')
                line.replace('W', 'N')
                line.replace('R', 'N')
                line.replace('S', 'N')


Answer 1

Joã*_*ela 7

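Some notes:

str.replace and re.sub do not work in place (Python strings are immutable), so you should assign the return value back to your variable.
glob.glob is better suited to finding files in a directory that match a given pattern.
Maybe you should check whether the directory already exists before creating it (I just assumed this; it may not be the behavior you want).
The with statement takes care of closing the file safely. If you don't want to use it, you have to use try/finally.
In your example you forgot to add the *.clean suffix ;)
You never actually wrote the files anywhere. You can do it the way I did in my example, or use the fileinput module (which I didn't know about until today).

Here is my example: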

import re
import os
import glob

source_dir=os.getcwd()
target_dir="clean"
source_files = [fname for fname in glob.glob(os.path.join(source_dir,"*.seq"))]

# check if target directory exists... if not, create it.
if not os.path.exists(target_dir):
    os.makedirs(target_dir)

for source_file in source_files:
   target_file = os.path.join(target_dir,os.path.basename(source_file)+".clean")
   with open(source_file,'r') as sfile:
      with open(target_file,'w') as tfile:
         lines = sfile.readlines()
         # do the replacement in the second line.
         # (remember that lists are zero-indexed)
         lines[1]=re.sub("K|Y|W|M|R|S",'N',lines[1])
         tfile.writelines(lines)

print("DONE")

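The fileinput module mentioned in the notes can make the same edit in place. What follows is only a minimal sketch, assuming Python 3 and that the files passed in are copies you are free to overwrite. The function name clean_in_place is made up for illustration and is not part of the original answer.

```python
import fileinput
import re


def clean_in_place(paths):
    """Replace K, Y, W, M, R and S with N on the second line of each file."""
    paths = list(paths)
    if not paths:
        # With an empty list, fileinput would read from standard input instead.
        return
    # inplace=True redirects print() into the file currently being read.
    with fileinput.input(files=paths, inplace=True) as stream:
        for line in stream:
            # filelineno() restarts at 1 for every file.
            if stream.filelineno() == 2:
                line = re.sub("[KYWMRS]", "N", line)
            print(line, end="")
```

Because this overwrites its input, it fits the copy-first approach from the question: copy the .seq files into clean, then run the function on the copies.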

Hope this helps.

Answer 2

MAK*_*MAK 5

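You should replace line.replace('M', 'N') with line = line.replace('M', 'N'). replace returns a copy of the original string with the relevant substrings replaced; it does not modify the original.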
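
To make the difference concrete, here is a small sketch, assuming Python 3. The helper name mask_with_replace and the sample sequence are invented for illustration.

```python
def mask_with_replace(line):
    """Return line with K, Y, W, M, R and S replaced by N."""
    for code in "KYWMRS":
        # Rebind the name: replace() never changes the original string.
        line = line.replace(code, "N")
    return line


original = "ACGTMKACGT"
original.replace("M", "N")          # the result is thrown away
print(original)                     # still ACGTMKACGT
print(mask_with_replace(original))  # ACGTNNACGT
```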

An even better way (IMO) is to use re.

import re

line="ABCDEFGHIJKLMNOPQRSTUVWXYZ"
line=re.sub("K|Y|W|M|R|S",'N',line)
print(line)

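Since every pattern here is a single character, the alternation can also be written as the character class "[KYWMRS]", or the regular expression can be dropped in favor of str.translate. The sketch below shows the str.translate variant, which is a swapped-in alternative rather than what the answer proposes. It assumes Python 3, and the names TO_N and mask_with_translate are hypothetical.

```python
# Map each ambiguity code to N; characters not listed are left unchanged.
TO_N = str.maketrans("KYWMRS", "NNNNNN")


def mask_with_translate(line):
    """Return line with K, Y, W, M, R and S replaced by N."""
    return line.translate(TO_N)


print(mask_with_translate("ABCDEFGHIJKLMNOPQRSTUVWXYZ"))
# ABCDEFGHIJNLNNOPQNNTUVNXNZ
```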

Answer 3

bal*_*pha 5

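Here are some general hints:

Don't use find to check the file extension (for example, it would also match "file1.seqdata.xls"). At least use file.endswith('.seq'), or better yet, os.path.splitext(file)[1].
In fact, don't do that at all. This is what you want: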
```
import glob
seq_files = glob.glob("*.seq")
```
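The false match from find is easy to reproduce. This is a small sketch, assuming Python 3; the file names and the helper has_seq_extension are made up for illustration.

```python
import os


def has_seq_extension(name):
    """True only when the final extension of name is exactly '.seq'."""
    return os.path.splitext(name)[1] == ".seq"


names = ["sample1.seq", "file1.seqdata.xls", "notes.txt"]

# find() matches '.seq' anywhere in the name, so the .xls file slips through.
print([n for n in names if n.find(".seq") != -1])
# ['sample1.seq', 'file1.seqdata.xls']

print([n for n in names if has_seq_extension(n)])
# ['sample1.seq']
```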

Don't copy the files; it is easier to use just one loop:

for filename in seq_files:
    in_file = open(filename)
    out_file = open(os.path.join("clean", filename), "w")
    # now read lines from in_file and write lines to out_file

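One way to fill in that loop is sketched below, assuming Python 3. The function name clean_seq_files and its parameters are hypothetical, and the <name>.seq.clean output naming is an assumption borrowed from the question's *.clean requirement. Using with closes both files even if an error occurs.

```python
import glob
import os
import re


def clean_seq_files(src_dir=".", dst_dir="clean"):
    """Write a <name>.clean copy of every .seq file, masking its second line."""
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for path in sorted(glob.glob(os.path.join(src_dir, "*.seq"))):
        target = os.path.join(dst_dir, os.path.basename(path) + ".clean")
        with open(path) as in_file, open(target, "w") as out_file:
            for index, line in enumerate(in_file):
                if index == 1:  # second line; enumerate counts from 0
                    line = re.sub("[KYWMRS]", "N", line)
                out_file.write(line)
        written.append(target)
    return written
```

Reading line by line avoids loading whole files into memory, and a file with fewer than two lines is simply copied unchanged.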

Don't use line.__contains__('>'). What you mean is
```
if '>' in line:
```
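(this calls __contains__ internally). But what you actually want to know is whether the line starts with a ">", not whether there is one somewhere in the line, at the start or anywhere else. So the better way is this: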
```
if line.startswith(">"):
```
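I'm not familiar with your file type; if the ">" check is really only there to identify the first line, there are better ways to do that.
You don't need an if branch that does nothing but pass. It is cleaner to write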
```
if not something:
    do_things()
other_stuff()
```
instead of
```
if something:
    pass
else:
    do_things()
other_stuff()
```
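The last two hints combine naturally. The sketch below assumes Python 3 and, unlike the question, masks every non-header line rather than only the second one, which is an assumption that only matters if a sequence spans several lines. The helper name mask_sequence_lines and the sample data are invented; the third sample line is contrived to show where a plain '>' in line test would wrongly skip a sequence line.

```python
import re


def mask_sequence_lines(lines):
    """Replace K, Y, W, M, R and S with N on every line that is not a header."""
    cleaned = []
    for line in lines:
        # Header lines start with '>'; a '>' elsewhere does not make a header.
        if not line.startswith(">"):
            line = re.sub("[KYWMRS]", "N", line)
        cleaned.append(line)
    return cleaned


print(mask_sequence_lines([">seq1 KY\n", "ACKT\n", "GG>SA\n"]))
# ['>seq1 KY\n', 'ACNT\n', 'GG>NA\n']
```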

Have fun learning Python!
