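I'm new to Python. I know this has been asked before, and I apologize, but what's different in my case is that the spacing between the strings is not uniform. I have a file called coord that contains the following space-separated strings: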
 1 C   6.00  0.000000000  1.342650315  0.000000000
 2 C   6.00  0.000000000 -1.342650315  0.000000000
 3 C   6.00  2.325538562  2.685300630  0.000000000
 4 C   6.00  2.325538562 -2.685300630  0.000000000
 5 C   6.00  4.651077125  1.342650315  0.000000000
 6 C   6.00  4.651077125 -1.342650315  0.000000000
 7 C   6.00 -2.325538562  2.685300630  0.000000000
 8 C   6.00 -2.325538562 -2.685300630  0.000000000
 9 C   6.00 -4.651077125  1.342650315  0.000000000
 10 C   6.00 -4.651077125 -1.342650315  0.000000000
 11 H   1.00  2.325538562  4.733763602  0.000000000
 12 H   1.00  2.325538562 -4.733763602  0.000000000
 13 H   1.00 -2.325538562  4.733763602  0.000000000
 14 H   1.00 -2.325538562 -4.733763602  0.000000000
 15 H   1.00  6.425098097  2.366881801  0.000000000
 16 H   1.00  6.425098097 -2.366881801  0.000000000
 17 H   1.00 -6.425098097  2.366881801  0.000000000
 18 H   1.00 -6.425098097 -2.366881801  0.000000000
Note the space before each string in the first column. So, to convert the file to CSV, I tried the following:
import csv

with open('coord') as infile, open('coordv', 'w') as outfile:
    outfile.write(infile.read().replace(" ", ", "))

# Unneeded columns are deleted from the csv
input = open('coordv', 'rb')
output = open('coordcsvout', 'wb')
writer = csv.writer(output)
for row in csv.reader(input):
    if row:
        writer.writerow(row)
input.close()
output.close()

with open("coordcsvout", "rb") as source:
    rdr = csv.reader(source)
    with open("coordbarray", "wb") as result:
        wtr = csv.writer(result)
        for r in rdr:
            wtr.writerow((r[5], r[6], r[7]))
When I run the script, the first part writes the following to coordv, which is of course completely wrong:
, 1, C, , , 6.00, , 0.000000000, , 1.342650315, , 0.000000000
, 2, C, , , 6.00, , 0.000000000, -1.342650315, , 0.000000000
, 3, C, , , 6.00, , 2.325538562, , 2.685300630, , 0.000000000
, 4, C, , , 6.00, , 2.325538562, -2.685300630, , 0.000000000
, 5, C, , , 6.00, , 4.651077125, , 1.342650315, , 0.000000000
, 6, C, , , 6.00, , 4.651077125, -1.342650315, , 0.000000000
, 7, C, , , 6.00, -2.325538562, , 2.685300630, , 0.000000000
, 8, C, , , 6.00, -2.325538562, -2.685300630, , 0.000000000
, 9, C, , , 6.00, -4.651077125, , 1.342650315, , 0.000000000
, 10, C, , , 6.00, -4.651077125, -1.342650315, , 0.000000000
, 11, H, , , 1.00, , 2.325538562, , 4.733763602, , 0.000000000
, 12, H, , , 1.00, , 2.325538562, -4.733763602, , 0.000000000
, 13, H, , , 1.00, -2.325538562, , 4.733763602, , 0.000000000
, 14, H, , , 1.00, -2.325538562, -4.733763602, , 0.000000000
, 15, H, , , 1.00, , 6.425098097, , 2.366881801, , 0.000000000
, 16, H, , , 1.00, , 6.425098097, -2.366881801, , 0.000000000
, 17, H, , , 1.00, -6.425098097, , 2.366881801, , 0.000000000
, 18, H, , , 1.00, -6.425098097, -2.366881801, , 0.000000000
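The empty fields come from replacing every single space: a run of n spaces becomes n separators with n - 1 empty fields between them, and the leading space adds an empty first field. Calling str.split() with no argument instead splits on runs of whitespace and drops leading and trailing whitespace. A minimal illustration on the first line of coord; its exact spacing is inferred from the coordv output above and is an assumption:

```python
# First line of the coord file. The exact spacing is inferred from the
# coordv output and may differ slightly from the real file.
line = " 1 C   6.00  0.000000000  1.342650315  0.000000000\n"

# What the question's code does: every single space becomes ", ".
replaced = line.rstrip("\n").replace(" ", ", ")
print(replaced)

# Splitting on one space shows the same thing: one empty string per extra space.
naive = line.rstrip("\n").split(" ")
print(naive)

# split() with no argument treats any run of whitespace as one separator
# and ignores leading/trailing whitespace, so only the real fields remain.
fields = line.split()
print(",".join(fields))
```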
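I tried different variations of .replace without any success, and so far I haven't found any source explaining how to do this. What is the best way to get comma-separated values from this coord file? I'm interested in using the csv module in Python to select columns 4:6, and finally importing them with numpy like this: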
from numpy import genfromtxt
cocmatrix = genfromtxt('input', delimiter=',')
I'd be very grateful if someone could help me with this.
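For the final numpy step, the intermediate CSV may not be needed at all: numpy.genfromtxt treats any run of whitespace as the separator when delimiter is left at its default of None, and usecols selects columns by zero-based index. A sketch, assuming the coordinates are the three columns at indices 3, 4 and 5; io.StringIO stands in for the real coord file, which you would pass by name instead:

```python
import io

import numpy as np

# Two lines of the coord file; io.StringIO stands in for the file 'coord'.
coord = io.StringIO(
    " 1 C   6.00  0.000000000  1.342650315  0.000000000\n"
    " 2 C   6.00  0.000000000 -1.342650315  0.000000000\n"
)

# delimiter=None (the default) splits on runs of whitespace, so the uneven
# spacing does not matter; usecols keeps only the x, y, z columns.
cocmatrix = np.genfromtxt(coord, usecols=(3, 4, 5))
print(cocmatrix.shape)
```

Because the non-numeric C/H column is never selected, the default float dtype works without further options.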
the*_*olf 12
You can use csv:
import csv

with open(ur_infile) as fin, open(ur_outfile, 'w') as fout:
    o = csv.writer(fout)
    for line in fin:
        # split() with no argument splits on any run of whitespace
        o.writerow(line.split())
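Because line.split() already yields clean fields, the same loop can keep only the coordinate columns by slicing each row before writing it, and skip blank lines along the way. A sketch, assuming the coordinates are the 4th to 6th fields; io.StringIO objects stand in for ur_infile and ur_outfile:

```python
import csv
import io

# In-memory stand-ins for the real input and output files.
fin = io.StringIO(
    " 1 C   6.00  0.000000000  1.342650315  0.000000000\n"
    "\n"
    " 2 C   6.00  0.000000000 -1.342650315  0.000000000\n"
)
fout = io.StringIO()

# With a real output file, open it with newline='' as the csv docs recommend.
writer = csv.writer(fout, lineterminator="\n")
for line in fin:
    fields = line.split()
    if fields:  # a blank line splits to [], so it is skipped
        writer.writerow(fields[3:6])  # keep the x, y, z columns only

print(fout.getvalue(), end="")
```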
Replace your first bit with this. It's not very pretty, but it will give you CSV format.
with open('coord') as infile, open('coordv', 'w') as outfile:
    for line in infile:
        outfile.write(" ".join(line.split()).replace(' ', ','))
        outfile.write(",")  # trailing comma shouldn't matter
If you want each record on its own line in the outfile, you can add outfile.write("\n") at the end of the for loop, but I don't think the code you're following uses it that way.
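With the newline added, each input line becomes one CSV row. Written as a small helper, here called to_csv_line (a hypothetical name, not part of the answer), the per-line transformation reduces to ",".join(line.split()), which also avoids the trailing comma:

```python
def to_csv_line(line):
    """Turn one whitespace-separated line into one comma-separated line.

    Hypothetical helper for illustration: split() collapses runs of spaces
    and drops leading/trailing whitespace, so no empty fields appear.
    """
    return ",".join(line.split()) + "\n"


sample = [
    " 1 C   6.00  0.000000000  1.342650315  0.000000000\n",
    " 10 C   6.00 -4.651077125 -1.342650315  0.000000000\n",
]
print("".join(to_csv_line(s) for s in sample), end="")
```

In the loop above, outfile.write(to_csv_line(line)) would then replace both write calls.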
You can use Python pandas. I wrote your data to data.csv:
>>> import pandas as pd
>>> df = pd.read_csv('data.csv', sep=r'\s+', header=None)
>>> df
   0  1  2         3         4  5
0  1  C  6  0.000000  1.342650  0
1  2  C  6  0.000000 -1.342650  0
2  3  C  6  2.325539  2.685301  0
3  4  C  6  2.325539 -2.685301  0
4  5  C  6  4.651077  1.342650  0
5  6  C  6  4.651077 -1.342650  0
...
The nice thing about this is that you can access the underlying numpy array via df.values:
>>> type(df.values)
<type 'numpy.ndarray'>
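To get only the coordinate columns as a float array for the later numpy work, you can select them by position before converting. A sketch, assuming the coordinates are the columns at positions 3 to 5; DataFrame.to_numpy() is the newer equivalent of df.values (available since pandas 0.24), and io.StringIO stands in for data.csv:

```python
import io

import pandas as pd

# io.StringIO stands in for data.csv, with the same whitespace-separated layout.
data = io.StringIO(
    " 1 C   6.00  0.000000000  1.342650315  0.000000000\n"
    " 2 C   6.00  0.000000000 -1.342650315  0.000000000\n"
)
df = pd.read_csv(data, sep=r"\s+", header=None)

# iloc[:, 3:6] selects the 4th to 6th columns by position; since all three
# are numeric, to_numpy() returns a float64 array.
coords = df.iloc[:, 3:6].to_numpy()
print(coords.dtype, coords.shape)
```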
To save the DataFrame with a comma delimiter:
>>> df.to_csv('data_out.csv', header=False)
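If the goal is the comma-separated file for the genfromtxt call in the question, it may help to also pass index=False (otherwise to_csv writes the row labels as an extra first column) and to write only the coordinate columns. A sketch, assuming the coordinates are the columns at positions 3 to 5; the file name data_out.csv is from the answer, and a temporary directory is used only so the example runs anywhere:

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

# io.StringIO stands in for data.csv.
data = io.StringIO(
    " 1 C   6.00  0.000000000  1.342650315  0.000000000\n"
    " 2 C   6.00  0.000000000 -1.342650315  0.000000000\n"
)
df = pd.read_csv(data, sep=r"\s+", header=None)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data_out.csv")
    # index=False and header=False leave only the numeric values in the file.
    df.iloc[:, 3:6].to_csv(path, index=False, header=False)
    # The question's genfromtxt call can then read the file back directly.
    cocmatrix = np.genfromtxt(path, delimiter=",")

print(cocmatrix.shape)
```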
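Pandas is an excellent library for managing large amounts of data, and as a bonus it works well with numpy. There's also a good chance that this will be much faster than using the csv module.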