Reading a file of repeated "key=value" pairs into a DataFrame

Question

I have a txt file containing data in the format below. The same three-line pattern repeats over and over.

name=1
grade=A
class=B
name=2
grade=D
class=A

I want to output the data in tabular form, for example:

name | grade | class
1    | A     | B
2    | D     | A

I am struggling to set the headers and iterate over only the data. This is what I have tried so far:

import pandas as pd
from tabulate import tabulate

def myfile(filename):
    with open(filename) as f:
        for line in f:
            yield line.strip().split('=', 1)

def pprint_df(dframe):
    print(tabulate(dframe, headers="keys", tablefmt="psql", showindex=False))

df = pd.DataFrame(myfile('file1'))
pprint_df(df)

The resulting output is not really what I want.

Answer 1

lui*_*igi 7

You can use pandas to read the file and process the data, like this:

import pandas as pd
df = pd.read_table(r'file.txt', header=None)
new = df[0].str.split("=", n=1, expand=True)
new['index'] = new.groupby(new[0])[0].cumcount()
new = new.pivot(index='index', columns=0, values=1)

Output of new:

0     class grade name
index                 
0         B     A    1
1         A     D    2
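
To make the role of the cumcount step easier to see, here is a minimal self-contained sketch of the same idea. It works on an in-memory string instead of file.txt, and the names text, pairs, record, and table are illustrative choices, not part of the answer above. cumcount numbers each occurrence of a key, so the n-th name, grade, and class all receive the same record number, which pivot then uses as the row index. Since pivot sorts the columns alphabetically, the last step restores the order in which the keys first appear.

```python
import pandas as pd

# Sample data from the question, held in a string instead of a file.
text = "name=1\ngrade=A\nclass=B\nname=2\ngrade=D\nclass=A\n"
lines = pd.Series(text.splitlines())

# Split each line on the first '=' into a key column and a value column.
pairs = lines.str.split("=", n=1, expand=True)
pairs.columns = ["key", "value"]

# Number each occurrence of a key: the first name/grade/class get 0, the second get 1.
pairs["record"] = pairs.groupby("key").cumcount()

# One row per record, one column per key.
table = pairs.pivot(index="record", columns="key", values="value")

# pivot sorts the columns alphabetically; restore first-appearance order.
table = table[pairs["key"].drop_duplicates().tolist()]
print(table)
```

This relies on every key occurring once per record; if a record omits a key, later values of that key shift up by one row.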

Answer 2

小智 5

I know you already have enough answers, but here is another approach using a dictionary:

import pandas as pd
from collections import defaultdict
d = defaultdict(list)

with open("text_file.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, val = line.split('=', 1)  # split on the first '=' only
        d[key].append(val)

df = pd.DataFrame(d)
print(df)

This gives you the following output:

  name grade class
0    1     A     B
1    2     D     A

Just a different perspective.
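
One caveat with the dictionary approach: pd.DataFrame raises a ValueError when the lists have different lengths, which happens if the file ends partway through a record. The sketch below is one possible workaround, not part of the answer above; the truncated sample data is made up for illustration. Wrapping each list in a pd.Series lets pandas pad the shorter columns with NaN.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical input in which the last record has no "class" line.
lines = ["name=1", "grade=A", "class=B", "name=2", "grade=D"]

d = defaultdict(list)
for line in lines:
    key, val = line.split("=", 1)
    d[key].append(val)

# pd.DataFrame(d) would raise ValueError here because the lists differ in length.
# Building each column as a Series pads the short ones with NaN at the end.
df = pd.DataFrame({key: pd.Series(vals) for key, vals in d.items()})
print(df)
```

The padding always lands at the end of a column, so this is only correct when the missing values belong to the final record. A key missing from a record in the middle of the file would still misalign the rows.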

Answer 3

小智 3

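This solution assumes the text is formatted as you described, but you can modify it to use a different word to mark the start of a new row. Here we assume that a new row starts with the name field. I modified your myfile() function below; hopefully it gives you some ideas :)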

import pandas as pd

def myfile(filename):
    d_list = []
    with open(filename) as f:
        d_line = {}
        for line in f:
            split_line = line.rstrip("\n").split('=', 1)  # Strip the newline and split into field and value.
            if split_line[0] == 'name':
                if d_line:
                    d_list.append(d_line)  # Store the previous row, if there is one.
                d_line = {split_line[0]: split_line[1]}  # Start a new dictionary for the next row.
            else:
                d_line[split_line[0]] = split_line[1]  # Add the remaining fields to the current row.
        if d_line:
            d_list.append(d_line)  # Store the last row, unless the file was empty.
    return pd.DataFrame(d_list)  # Turn the list of dictionaries into a DataFrame.
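
As a sketch of the modification mentioned above, the word that starts a new row can be passed in as a parameter. The names read_records and start_key are hypothetical, and the function accepts any iterable of lines so that it can be tried without a file on disk.

```python
import pandas as pd

def read_records(lines, start_key="name"):
    """Group key=value lines into rows; a new row begins at each start_key."""
    rows = []
    current = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        key, value = line.split("=", 1)
        if key == start_key and current:
            rows.append(current)  # the previous row is complete
            current = {}
        current[key] = value
    if current:
        rows.append(current)  # store the final row
    return pd.DataFrame(rows)

# Made-up sample in which rows start with "id" and the first row has no "class".
sample = ["id=7\n", "grade=A\n", "id=8\n", "grade=D\n", "class=A\n"]
df = read_records(sample, start_key="id")
print(df)
```

Because each row is collected in its own dictionary, a field missing from one row becomes NaN in that row only. To read from disk, an open file object can be passed as lines, since iterating over it yields one line at a time.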
