I have a txt file containing data in the format below. The same three keys repeat over and over.
name=1
grade=A
class=B
name=2
grade=D
class=A
I want to output the data as a table, for example:
name | grade | class
1 | A | B
2 | D | A
I'm struggling to set the headers and to iterate over just the data. This is what I've tried so far:
import pandas as pd
from tabulate import tabulate

def myfile(filename):
    with open(filename) as f:
        for line in f:
            # Split each "key=value" line on the first '=' only.
            yield line.strip().split('=', 1)

def pprint_df(dframe):
    print(tabulate(dframe, headers="keys", tablefmt="psql", showindex=False))

df = pd.DataFrame(myfile('file1'))
pprint_df(df)
The output is not really what I want.
You can use pandas to read the file and reshape the data, like this:
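import pandas as pd
# Read each line as a one-column row (the data contains no tabs).
df = pd.read_table(r'file.txt', header=None)
# Split "key=value" into two columns: 0 (key) and 1 (value).
new = df[0].str.split("=", n=1, expand=True)
# Number the occurrences of each key so the n-th name, grade and class share an index.
new['index'] = new.groupby(new[0])[0].cumcount()
# Turn the keys into columns, one row per record.
new = new.pivot(index='index', columns=0, values=1)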
Output of new:
0     class grade name
index
0         B     A    1
1         A     D    2
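Note that the pivot sorts the columns alphabetically (class, grade, name), while the question asks for name, grade, class. A minimal sketch of one way to restore that order is shown here. It assumes the sample data from the question; `io.StringIO` stands in for the real file, and the variable name `table` is only illustrative.

```python
import io
import pandas as pd

# Sample data copied from the question; io.StringIO stands in for file.txt.
sample = io.StringIO("name=1\ngrade=A\nclass=B\nname=2\ngrade=D\nclass=A\n")

df = pd.read_table(sample, header=None)
new = df[0].str.split("=", n=1, expand=True)
new["index"] = new.groupby(new[0])[0].cumcount()
new = new.pivot(index="index", columns=0, values=1)

# Pick the columns in the order the question wants, then drop the
# leftover "index" row labels and the "0" label on the column axis.
table = (
    new[["name", "grade", "class"]]
    .reset_index(drop=True)
    .rename_axis(columns=None)
)

print(table.to_string(index=False))
```

The values stay strings here (for example "1" rather than 1), because they come from splitting text.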
小智 5
I know you already have enough answers, but here is another approach using a dictionary:
import pandas as pd
from collections import defaultdict

d = defaultdict(list)
with open("text_file.txt") as f:
    for line in f:
        # Split on the first '=' only, so a value may itself contain '='.
        key, val = line.rstrip('\n').split('=', 1)
        d[key].append(val)

df = pd.DataFrame(d)
print(df)
This gives you the following output:
  name grade class
0    1     A     B
1    2     D     A
Just another way of looking at it.
小智 3
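This solution assumes the text is formatted as you described, but you can modify it to use a different word to mark the start of a new row. Here we assume a new row starts with the name field. I modified your myfile() function below; hopefully it gives you some ideas :)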
import pandas as pd

def myfile(filename):
    d_list = []
    with open(filename) as f:
        d_line = {}
        for line in f:
            split_line = line.rstrip("\n").split('=', 1)  # Strip the newline and split field from value.
            if split_line[0] == 'name':
                if d_line:
                    d_list.append(d_line)  # Append the previous record, if there is one.
                d_line = {split_line[0]: split_line[1]}  # Start a new dictionary for the next record.
            else:
                d_line[split_line[0]] = split_line[1]  # Add the other two fields to the dictionary.
        if d_line:
            d_list.append(d_line)  # Append the last record.
    return pd.DataFrame(d_list)  # Turn the list of dictionaries into a DataFrame.
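To illustrate the point about using a different word to mark the start of a row, here is a rough standard-library sketch that takes the marker as a parameter and prints the pipe-separated layout from the question. The names `parse_records`, `format_table` and `first_key` are made up for this example, and the input is the sample data from the question rather than a real file.

```python
def parse_records(lines, first_key="name"):
    """Group key=value lines into dicts; a new record starts at first_key."""
    records = []
    current = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        key, _, value = line.partition("=")  # Split on the first '=' only.
        if key == first_key and current:
            records.append(current)  # The previous record is complete.
            current = {}
        current[key] = value
    if current:
        records.append(current)  # Keep the last record.
    return records


def format_table(records, columns):
    """Render the records as 'a | b | c' rows, header first."""
    rows = [" | ".join(columns)]
    for record in records:
        # A missing field is shown as an empty cell.
        rows.append(" | ".join(record.get(col, "") for col in columns))
    return "\n".join(rows)


sample = ["name=1", "grade=A", "class=B", "name=2", "grade=D", "class=A"]
print(format_table(parse_records(sample), ["name", "grade", "class"]))
```

With a real file, an open file object can be passed instead of the list, since the function only iterates over lines.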