我需要在 python 中读取一个 .dat 文件,它总共有 12 列和数百万行行。我需要将第 2,3 和第 4 列与第 1 列分开进行计算。因此,在加载该 .dat 文件之前,是否需要删除所有其他不需要的列?如果没有,我如何有选择地声明该列并让 python 进行数学计算?
.dat 文件的一个例子是 data.dat
我是 python 的新手,所以如果能提供一些打开、阅读和计算的说明,我们将不胜感激。
我已经根据您的建议添加了我用作初学者的代码:
from sys import argv
import pandas as pd
script, filename = argv
txt = open(filename)
print "Here's your file %r:" % filename
print txt.read()
def your_func(row):
return row['x-momentum'] / row['mass']
columns_to_keep = ['mass', 'x-momentum']
dataframe = pd.read_csv('~/Pictures', delimiter="," , usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)
Run Code Online (Sandbox Code Playgroud)
还有我遇到的错误:
Traceback (most recent call last):
File "flash.py", line 18, in <module>
dataframe = pd.read_csv('~/Pictures', delimiter="," , usecols=columns_to_keep)
File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 529, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 295, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 612, in __init__
self._make_engine(self.engine)
File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 747, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/trina/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1119, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 518, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5030)
ValueError: No columns to parse from file
Run Code Online (Sandbox Code Playgroud)
查看您的flash.dat文件后,很明显您需要在处理之前进行一些清理。以下代码将其转换为 CSV 文件:
import csv
# read flash.dat to a list of lists
datContent = [i.strip().split() for i in open("./flash.dat").readlines()]
# write it as a new CSV file
with open("./flash.csv", "wb") as f:
writer = csv.writer(f)
writer.writerows(datContent)
Run Code Online (Sandbox Code Playgroud)
现在,使用 Pandas 计算新列。
import pandas as pd
def your_func(row):
return row['x-momentum'] / row['mass']
columns_to_keep = ['#time', 'x-momentum', 'mass']
dataframe = pd.read_csv("./flash.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)
print dataframe
Run Code Online (Sandbox Code Playgroud)