BaR*_*Rud 8 split numpy file python-3.x
I am trying to split a file of the following format:
@some
@garbage
@lines
@target G0.S0
@type xy
-0.108847E+02 0.489034E-04
-0.108711E+02 0.491023E-04
-0.108574E+02 0.493062E-04
-0.108438E+02 0.495075E-04
-0.108302E+02 0.497094E-04
....unknown number of lines....
&
@target G0.S1
@type xy
-0.108847E+02 0.315559E-04
-0.108711E+02 0.316844E-04
-0.108574E+02 0.318134E-04
....unknown number of lines....
&
@target G1.S0
@type xy
-0.108847E+02 0.350450E-04
-0.108711E+02 0.351669E-04
-0.108574E+02 0.352908E-04
&
@target G1.S1
@type xy
-0.108847E+02 0.216396E-04
-0.108711E+02 0.217122E-04
-0.108574E+02 0.217843E-04
-0.108438E+02 0.218622E-04
Each @target Gx.Sy combination is unique, and each set of data is always terminated by &.
I have managed to split the file into blocks:
#!/usr/bin/env python3
import os
import sys
import itertools as it
import numpy as np
import matplotlib.pyplot as plt

try:
    filename = sys.argv[1]
    print(filename)
except IndexError:
    print("ERROR: Required filename not provided")

with open(filename, "r") as f:
    for line in f:
        if line.startswith("@target"):
            print(line.split()[-1].split("."))

x=[];y=[]
with open(filename, "r") as f:
    for key,group in it.groupby(f,lambda line: line.startswith('@target')):
        print(key)
        if not key:
            group = list(group)
            group.pop(0)
            # group.pop(-1)
            print(group)
            for i in range(len(group)):
                x.append(group[i].split()[0])
                y.append(group[i].split()[1])
            nx=np.array(x)
            ny=np.array(y)
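I have two problems:
1) The preamble lines before the real data are also grouped, so the script does not work if there is any preamble. It is not possible to predict how many such lines there will be; I want the grouping to start at @target.
2) I would like to name the arrays as G0[S0,S0] and G1[S1,S2], but I cannot do that.
Kindly help.
Update: I am trying to store these data in nested np arrays such as G0[S0,S1,...], G1[S0,S1,..], so that I can use them in matplotlib.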
The functions below do the job:
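import numpy as np
from collections import defaultdict

def read_without_preamble(filename):
    """Return the lines of the file, starting at the first '@target' line."""
    with open(filename, 'r') as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        if line.startswith('@target'):
            return lines[i:]
    return []  # no '@target' line found

def split_into_chunks(lines):
    """Build chunks[G][S], an array holding the x, y rows of block 'G<G>.S<S>'."""
    chunks = defaultdict(dict)
    for line in lines:
        if line.startswith('@target'):
            # '@target G0.S1' -> G = 0, S = 1
            GS_str = line.strip().split()[-1].split('.')
            G, S = map(lambda x: int(x[1:]), GS_str)
            chunks[G][S] = []
        elif line.startswith('@type xy'):
            pass
        elif line.startswith('&'):
            # End of block: convert the collected rows to a NumPy array.
            chunks[G][S] = np.asarray(chunks[G][S])
        else:
            xy_str = line.strip().split()
            # list() is needed in Python 3, where map() returns an iterator.
            chunks[G][S].append(list(map(float, xy_str)))
    return chunks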
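In the question's sample, the last block (G1.S1) is not followed by &, in which case split_into_chunks would never convert that block to an array. A minimal sketch of a more tolerant variant follows; the name split_into_chunks_tolerant and the shortened sample are illustrative assumptions, not part of the original answer. It also skips the preamble by itself, since lines outside a block are ignored.

```python
import numpy as np
from collections import defaultdict


def split_into_chunks_tolerant(lines):
    """Like split_into_chunks, but also closes a block that has no '&'."""
    rows = defaultdict(dict)   # rows[G][S] -> list of [x, y] pairs
    current = None             # list being filled, or None outside a block
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('@target'):
            # '@target G1.S0' -> G = 1, S = 0
            g_str, s_str = stripped.split()[-1].split('.')
            current = rows[int(g_str[1:])].setdefault(int(s_str[1:]), [])
        elif stripped.startswith('&'):
            current = None
        elif current is not None and stripped and not stripped.startswith('@'):
            current.append([float(v) for v in stripped.split()])
    # Convert every block at the end, so an unterminated last block is included.
    return {g: {s: np.asarray(r) for s, r in blocks.items()}
            for g, blocks in rows.items()}


# Shortened sample (hypothetical): preamble plus a last block without '&'.
sample = """@some
@garbage
@target G0.S0
@type xy
-0.108847E+02 0.489034E-04
-0.108711E+02 0.491023E-04
&
@target G1.S1
@type xy
-0.108847E+02 0.216396E-04
""".splitlines()

chunks = split_into_chunks_tolerant(sample)
print(chunks[0][0].shape)
print(chunks[1][1].shape)
```

The conversion to arrays is deferred to the end so that it does not depend on seeing a terminator for each block.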
To split the file into chunks, you just need to run the following code:
import sys

try:
    filename = sys.argv[1]
    print(filename)
except IndexError:
    # Stop here: without a filename the calls below cannot run.
    sys.exit("ERROR: Required filename not provided")

data = read_without_preamble(filename)
chunks = split_into_chunks(data)
chunks is a dictionary in which the keys are the values of G (0 or 1):
In [415]: type(chunks)
Out[415]: dict
In [416]: for k in chunks.keys(): print(k)
0
1
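The values of the dictionary chunks are themselves dictionaries, in which the keys are the values of S (0, 1 or 2 in this example) and the values are NumPy arrays containing the numerical data of the block Gi.Sn. You can access such a block of data as chunks[i][n], where the indices i and n are the values of G and S, respectively.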
In [417]: type(chunks[0])
Out[417]: dict
In [418]: for k in chunks[0].keys(): print(k)
0
1
2
In [419]: type(chunks[1][2])
Out[419]: numpy.ndarray
In [420]: chunks[1][2]
Out[420]:
array([[ -1.08851000e+01, 2.53058000e-05],
[ -1.08715000e+01, 2.55353000e-05],
[ -1.08579000e+01, 2.57745000e-05],
[ -1.08443000e+01, 2.60225000e-05],
[ -1.08306000e+01, 2.62617000e-05],
[ -1.08170000e+01, 2.65097000e-05],
[ -1.08034000e+01, 2.67666000e-05]])
chunks[i][n].shape[1] is 2 for any i and n, but chunks[i][n].shape[0] can take any value, i.e. the number of rows of numerical data may vary from chunk to chunk.
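To make the layout concrete, here is a small sketch that walks over a nested dictionary of this kind and splits each block into its x and y columns. The chunks dictionary is built by hand for illustration (an assumption; the values are copied from the sample file), so the snippet runs without any input file.

```python
import numpy as np

# Hand-built stand-in for the parsed result; values copied from the sample file.
chunks = {
    0: {0: np.array([[-10.8851, 0.127435e-03],
                     [-10.8715, 0.127829e-03],
                     [-10.8579, 0.128191e-03]])},
    1: {2: np.array([[-10.8851, 0.253058e-04],
                     [-10.8715, 0.255353e-04]])},
}

for g in sorted(chunks):
    for s in sorted(chunks[g]):
        block = chunks[g][s]
        n_rows, n_cols = block.shape     # n_cols is 2: one x and one y column
        x, y = block[:, 0], block[:, 1]  # 1-D column views of the block
        print(f"G{g}.S{s}: {n_rows} rows, {n_cols} columns")
```

The x and y columns obtained this way can be passed directly to matplotlib, for example as plt.plot(x, y, label=f"G{g}.S{s}").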
This is the file I used in the sample run. It consists of six chunks, namely G0.S0, G0.S1, G0.S2, G1.S0, G1.S1 and G1.S2.
@some
@garbage
@lines
@target G0.S0
@type xy
-0.108851E+02 0.127435E-03
-0.108715E+02 0.127829E-03
-0.108579E+02 0.128191E-03
-0.108443E+02 0.128502E-03
-0.108306E+02 0.128726E-03
-0.108170E+02 0.128838E-03
-0.108034E+02 0.128751E-03
&
@target G0.S1
@type xy
-0.108851E+02 0.472694E-04
-0.108715E+02 0.474233E-04
-0.108579E+02 0.475837E-04
-0.108443E+02 0.477448E-04
-0.108306E+02 0.479052E-04
-0.108170E+02 0.480669E-04
-0.108034E+02 0.482279E-04
&
@target G0.S2
@type xy
-0.108851E+02 0.253654E-04
-0.108715E+02 0.255956E-04
-0.108579E+02 0.258346E-04
-0.108443E+02 0.260825E-04
-0.108306E+02 0.263303E-04
-0.108170E+02 0.265781E-04
-0.108034E+02 0.268349E-04
&
@target G1.S0
@type xy
-0.108851E+02 0.108786E-03
-0.108715E+02 0.109216E-03
-0.108579E+02 0.109651E-03
-0.108443E+02 0.110116E-03
-0.108306E+02 0.110552E-03
-0.108170E+02 0.111011E-03
-0.108034E+02 0.111489E-03
&
@target G1.S1
@type xy
-0.108851E+02 0.278045E-04
-0.108715E+02 0.278711E-04
-0.108579E+02 0.279384E-04
-0.108443E+02 0.280050E-04
-0.108306E+02 0.280723E-04
-0.108170E+02 0.281395E-04
-0.108034E+02 0.282074E-04
&
@target G1.S2
@type xy
-0.108851E+02 0.253058E-04
-0.108715E+02 0.255353E-04
-0.108579E+02 0.257745E-04
-0.108443E+02 0.260225E-04
-0.108306E+02 0.262617E-04
-0.108170E+02 0.265097E-04
-0.108034E+02 0.267666E-04
&
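The itertools.groupby approach from the question can also be made to ignore the preamble, by dropping the leading lines with itertools.dropwhile and grouping on the & separator rather than on @target. The following is only a sketch under those assumptions: iter_blocks is a hypothetical helper name, the sample is a shortened version of the file above, and blocks are keyed by their "Gx.Sy" string.

```python
import itertools as it
import numpy as np


def iter_blocks(lines):
    """Yield (name, array) for each '@target' block, ignoring the preamble."""
    # Drop everything before the first '@target' line.
    body = it.dropwhile(lambda ln: not ln.startswith('@target'), lines)
    # Consecutive lines between two '&' separators form one group.
    for is_sep, group in it.groupby(body, lambda ln: ln.startswith('&')):
        if is_sep:
            continue
        group = [ln.strip() for ln in group if ln.strip()]
        if not group:
            continue
        name = group[0].split()[-1]          # e.g. 'G0.S1'
        rows = [[float(v) for v in ln.split()]
                for ln in group if not ln.startswith('@')]
        yield name, np.array(rows)


# Shortened version of the sample file, for illustration only.
sample = """@some
@garbage
@lines
@target G0.S0
@type xy
-0.108851E+02 0.127435E-03
-0.108715E+02 0.127829E-03
&
@target G0.S1
@type xy
-0.108851E+02 0.472694E-04
&
""".splitlines()

blocks = dict(iter_blocks(sample))
print(sorted(blocks))
```

Keying by the full "Gx.Sy" string keeps the sketch short; the nested chunks[G][S] layout above is more convenient when looping over all S of one G.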