What is the best / most Pythonic way to handle real-time incoming data with pandas?
Every few seconds I receive a data point in the following format:
{'time' :'2013-01-01 00:00:00', 'stock' : 'BLAH',
'high' : 4.0, 'low' : 3.0, 'open' : 2.0, 'close' : 1.0}
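I would like to append it to an existing DataFrame and then run some analysis on it.
The problem is that simply appending rows with DataFrame.append can cause performance problems, because every call copies the existing data.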
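One common way around the copying cost is to collect incoming points in a plain Python list and build the DataFrame only when analysis is needed. The following is a minimal sketch of that idea, not a definitive solution: the names `rows`, `on_tick`, and `snapshot` are hypothetical, and the second data point is made up for illustration. Note also that `DataFrame.append` was removed in pandas 2.0, so code targeting current pandas needs an alternative in any case.

```python
import pandas as pd

# Assumed column set, taken from the keys of the sample data point.
COLUMNS = ["time", "stock", "high", "low", "open", "close"]

rows = []  # plain list: appending to it does not copy existing data


def on_tick(tick):
    """Store one incoming data point (a dict) without touching pandas."""
    rows.append(tick)


def snapshot():
    """Build a DataFrame from everything received so far."""
    df = pd.DataFrame(rows, columns=COLUMNS)
    df["time"] = pd.to_datetime(df["time"])
    return df.set_index("time")


on_tick({"time": "2013-01-01 00:00:00", "stock": "BLAH",
         "high": 4.0, "low": 3.0, "open": 2.0, "close": 1.0})
# Illustrative second point, not from the original question.
on_tick({"time": "2013-01-01 00:00:01", "stock": "BLAH",
         "high": 4.5, "low": 3.5, "open": 2.5, "close": 1.5})

df = snapshot()
```

The trade-off is that each call to `snapshot()` rebuilds the frame, so it suits cases where analysis runs less often than data arrives.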
Some people suggest preallocating a large DataFrame and updating it as data comes in:
In [1]: index = pd.date_range(start='2013-01-01 00:00:00', freq='S', periods=5)
In [2]: columns = ['high', 'low', 'open', 'close']
In [3]: df = pd.DataFrame(index=index, columns=columns)
In [4]: df
Out[4]:
                    high  low open close
2013-01-01 00:00:00  NaN  NaN  NaN   NaN
2013-01-01 00:00:01  NaN  NaN  NaN   NaN
2013-01-01 00:00:02  NaN  NaN  NaN   NaN
2013-01-01 00:00:03  NaN  NaN  NaN   NaN
2013-01-01 …

If I have an empty DataFrame:
columns = ['Date', 'Name', 'Action','ID']
df = pd.DataFrame(columns=columns)
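Is there a way to append a new row to this newly created DataFrame? Currently I have to create a dictionary, populate it, and then append the dictionary to the DataFrame at the end. Is there a more direct way?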
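One direct option is pandas' "setting with enlargement": assigning to a `.loc` label that does not exist yet adds a row. The sketch below assumes the DataFrame keeps its default integer index, so that `len(df)` is always the next unused label. The row values are made up for illustration.

```python
import pandas as pd

columns = ['Date', 'Name', 'Action', 'ID']
df = pd.DataFrame(columns=columns)

# Assigning to a new label appends a row; values follow the column order.
# The values below are illustrative only.
df.loc[len(df)] = ['2013-01-01', 'Alice', 'login', 1]
df.loc[len(df)] = ['2013-01-02', 'Bob', 'logout', 2]
```

Each such assignment may still reallocate the underlying data, so for many rows it is usually cheaper to gather them in a list first and construct the DataFrame once.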
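
Hello, I am trying to use pandas to insert 3 empty rows after each row of my current data and then export the data. For example, a sample of the current data could be: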
name  profession
Bill  cashier
Sam   stock
Adam  security
Ideally, this is what I would like to achieve:
name  profession
Bill  cashier
NaN   NaN
NaN   NaN
NaN   NaN
Sam   stock
NaN   NaN
NaN   NaN
NaN   NaN
Adam  security
NaN   NaN
NaN   NaN
NaN   NaN
I have tried itertools, but I am not sure how to use it to get exactly three empty rows after each row. Any help, guidance, or samples would be much appreciated!
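One possible approach that avoids itertools is to spread the existing rows onto every fourth index position and then reindex over the full range, so that the gaps are filled with NaN. This is a sketch under the assumption that the data has a default integer index. The variable names `n_blank`, `spaced`, and `result` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Bill", "Sam", "Adam"],
    "profession": ["cashier", "stock", "security"],
})

n_blank = 3          # number of empty rows wanted after each data row
step = n_blank + 1   # distance between consecutive data rows in the result

# Move the original rows to positions 0, 4, 8, ...
spaced = df.copy()
spaced.index = range(0, len(df) * step, step)

# Reindexing over every position creates the missing rows, filled with NaN.
result = spaced.reindex(range(len(df) * step))
```

When exporting with `result.to_csv(...)`, the NaN rows are written as empty fields by default, which gives blank lines between the records.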

I can't come up with a way to reduce this program's memory usage any further. This is my most efficient implementation so far:
import glob
import json
import pandas as pd

columns = ['eventName', 'sessionId', "eventTime", "items", "currentPage", "browserType"]
df = pd.DataFrame(columns=columns)
l = []
for i, file in enumerate(glob.glob("*.log")):
    print("Going through log file #%s named %s..." % (i+1, file))
    with open(file) as myfile:
        l += [json.loads(line) for line in myfile]
        tempdata = pd.DataFrame(l)
        # Drop every column that is not in the wanted list
        for column in tempdata.columns:
            if column not in columns:
                try:
                    tempdata.drop(column, axis=1, inplace=True)
                except ValueError:
                    print("oh no! We've got a problem with %s column! It don't exist!" % (column))
        l = []
    df = df.append(tempdata, …
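One way the memory footprint might be reduced is to discard unwanted keys while parsing each line, so that the unneeded columns are never materialized, and to concatenate the per-file frames once at the end instead of appending inside the loop. The sketch below illustrates that idea and is not the original author's code: the function name `load_logs` is hypothetical, and it assumes each log line is a JSON object.

```python
import glob
import json

import pandas as pd

COLUMNS = ["eventName", "sessionId", "eventTime", "items",
           "currentPage", "browserType"]


def load_logs(pattern, columns=COLUMNS):
    """Read JSON-lines log files, keeping only the wanted columns."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as fh:
            # Keep only the wanted keys; missing keys become None.
            rows = [
                {key: record.get(key) for key in columns}
                for record in (json.loads(line) for line in fh if line.strip())
            ]
        frames.append(pd.DataFrame(rows, columns=columns))
    if not frames:
        return pd.DataFrame(columns=columns)
    # A single concat avoids re-copying the accumulated data for every file.
    return pd.concat(frames, ignore_index=True)
```

A call such as `df = load_logs("*.log")` would then replace the loop. Whether this is enough depends on the data; very large logs may still need chunked processing or more compact dtypes.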