Python: a concise/elegant way to reformat a set of text files?

use*_*609 2 python ascii process

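I wrote a Python script to process a set of ASCII files in a given directory. I would like to know whether there is a more concise and/or more "pythonesque" way to do it, without losing readability?

Python code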

import os
import fileinput
import glob
import string

indir='./'
outdir='./processed/'

for filename in glob.glob(indir+'*.asc'): # get a list of input ASCII files to be processed
    fin=open(indir+filename,'r')   # input file
    fout=open(outdir+filename,'w') # out: processed file

    lines = iter(fileinput.input([indir+filename])) # iterator over all lines in the input file
    fout.write(next(lines)) # just copy the first line (the header) to output

    for line in lines:
        val=iter(string.split(line,' '))
        fout.write('{0:6.2f}'.format(float(val.next()))), # first value in the line has it's own format
        for x in val: # iterate over the rest of the numbers in the line
            fout.write('{0:10.6f}'.format(float(val.next()))),  # the rest of the values in the line has a different format 
        fout.write('\n')

    fin.close()
    fout.close()

An example:

Input:

;;; This line is the header line
-5.0 1.090074154029272 1.0034662411357929 0.87336062116561186 0.78649408279093869 0.65599958665017222 0.4379879132749317 0.26310799350679176 0.087808018565486673
-4.9900000000000002 1.0890770415316042 1.0025480136545413 0.87256100700428996 0.78577373527626004 0.65539842673645277 0.43758616966566649 0.26286647978335914 0.087727357602906453
-4.9800000000000004 1.0880820021223023 1.0016316956763136 0.87176305623792771 0.78505488659611744 0.65479851808106115 0.43718526271594083 0.26262546925502467 0.087646864773454014
-4.9700000000000006 1.0870890372077564 1.0007172884938402 0.87096676998908273 0.78433753775986659 0.65419986152386733 0.4367851929843618 0.26238496225635727 0.087566540188423345
-4.9600000000000009 1.086098148170821 0.99980479337809591 0.87017214936140763 0.78362168975984026 0.65360245789061966 0.4363859610200459 0.26214495911617541 0.087486383957276398

Processed:

;;; This line is the header line
-5.00  1.003466  0.786494  0.437988  0.087808
-4.99  1.002548  0.785774  0.437586  0.087727
-4.98  1.001632  0.785055  0.437185  0.087647
-4.97  1.000717  0.784338  0.436785  0.087567
-4.96  0.999805  0.783622  0.436386  0.087486

And*_*lke 5

Apart from a few minor changes due to how Python has evolved over time, this looks fine.

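You've mixed two different styles of next(); the old way is it.next(), the new way is next(it). You should use the string method split() instead of going through the string module (that module exists mostly for backward compatibility with Python 1.x). There's no need to go through the almost useless "fileinput" module, because open file handles are also iterators (that module dates from before Python file handles were iterators).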
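To make those three points concrete, here is a minimal sketch. It uses io.StringIO as a stand-in for an open file (an assumption made only so the snippet is self-contained; a real file object opened with open() should iterate the same way), and the sample data is made up for illustration.

```python
import io

# io.StringIO stands in for an open text file here (illustrative assumption).
sample = io.StringIO(";;; header\n-5.0 1.25 2.5\n-4.99 1.5 3.0\n")

header = next(sample)  # built-in next(it) replaces the old it.next()

rows = []
for line in sample:  # the file object is its own line iterator; no fileinput needed
    # str.split() method instead of string.split(); float() ignores the trailing newline
    rows.append([float(field) for field in line.split(' ')])

print(repr(header))
print(rows)
```

Because next() consumes the header first, the for loop picks up from the second line onward.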

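Edit: As @codeape pointed out, glob() returns the full path. Your code would not have worked if indir were anything other than "./". I've changed the code below to use the correct listdir/os.path.join solution. I'm also more familiar with "%" string interpolation than with string formatting via format().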
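A small sketch of the path issue, assuming a throwaway temporary directory and a hypothetical file name data.asc so that it runs anywhere: glob() hands back names that already include the directory you searched, while os.listdir() hands back bare names.

```python
import glob
import os
import tempfile

# Throwaway directory with one hypothetical input file (names are illustrative).
with tempfile.TemporaryDirectory() as indir:
    open(os.path.join(indir, 'data.asc'), 'w').close()

    # glob() results already contain the directory part, so doing
    # indir + filename afterwards would repeat the directory.
    matches = glob.glob(os.path.join(indir, '*.asc'))
    basenames = [os.path.basename(path) for path in matches]

    # os.listdir() returns bare names, which can be joined onto any directory.
    listed = [name for name in os.listdir(indir) if name.endswith('.asc')]

print(basenames, listed)
```

Either route gives you a bare file name that can be passed to os.path.join() for both the input and the output directory.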

Here is how I would write this in more idiomatic, modern Python:

import os

indir = './'
outdir = './processed/'

def reformat(fin, fout):
    fout.write(next(fin)) # just copy the first line (the header) to output
    for line in fin:
        fields = line.split(' ')

        # Make a format string specific to the number of fields
        fmt = '%6.2f' + ('%10.6f' * (len(fields)-1)) + '\n'

        fout.write(fmt % tuple(map(float, fields)))

# os.listdir() returns bare names for everything in indir, so keep only the input ASCII files
basenames = [name for name in os.listdir(indir) if name.endswith('.asc')]
for basename in basenames:
    input_filename = os.path.join(indir, basename)
    output_filename = os.path.join(outdir, basename)
    with open(input_filename, 'r') as fin, open(output_filename, 'w') as fout:
        reformat(fin, fout)
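One thing worth checking: the sample output in the question keeps only the first value and every second value after it, because the inner loop there advances the iterator twice per pass, whereas reformat() above writes every field. The question does not say whether that skipping is intentional. If it is, a sketch along these lines reproduces it, using format() rather than "%" (the helper name reformat_line is made up for illustration):

```python
def reformat_line(line):
    """Format one data line, keeping fields 0, 2, 4, ... as in the question's sample output."""
    values = [float(field) for field in line.split()]
    kept = values[::2]  # assumption: dropping every second column is intended
    first = '{0:6.2f}'.format(kept[0])                      # first value has its own format
    rest = ''.join('{0:10.6f}'.format(v) for v in kept[1:])  # remaining values share one format
    return first + rest + '\n'

print(reformat_line('-5.0 1.09 1.0034662411357929 0.87 0.78649408279093869'), end='')
```

If all columns should be kept instead, replace values[::2] with values.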

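The Zen of Python says "There should be one-- and preferably only one --obvious way to do it." It's interesting that, over the past 10+ years, the functions you used were each "obviously" the right solution at some point, but no longer are. :)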