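I have identified a single pandas command
timeseries.loc[z, x] = y
as responsible for most of the time spent in each iteration. Now I am looking for better ways to speed it up. The loop covers fewer than 50k elements (the production target is ~250k or more), but it already takes a sad 20 seconds.
Here is my code (ignore the top half, it is just the timing helper):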
def populateTimeseriesTable(df, observable, timeseries):
    """
    Go through all rows of df and
    put the observable into the timeseries
    at correct row (symbol), column (tsMean).
    """
    print "len(df.index)=", len(df.index)  # show number of rows
    global bf, t
    bf = time.time()                       # set 'before' to now
    t = dict([(i,0) for i in range(5)])    # fill category timing with zeros
    def T(i):
        """
        timing helper: Add passed time to category 'i'. Then set 'before' to now. …
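
To isolate the pattern described above (one scalar `.loc` assignment per row), here is a minimal sketch that places it next to two alternatives that may be faster: the scalar accessor `.at`, and a single NumPy assignment after translating labels to positions with `Index.get_indexer`. Everything in it is an assumption made for illustration: the column names `symbol` and `tsMean` are borrowed from the docstring, the `price` column and the sample data are made up, and the function names are hypothetical. It also assumes every label already exists in `timeseries`, that the index and columns are unique, and that each (symbol, tsMean) pair occurs only once. It is written for Python 3, unlike the Python 2 snippet above.

```python
import numpy as np
import pandas as pd


def fill_with_loc(df, observable, timeseries):
    """Baseline: one .loc scalar assignment per row (the slow pattern)."""
    out = timeseries.copy()
    for sym, col, val in zip(df["symbol"], df["tsMean"], df[observable]):
        out.loc[sym, col] = val
    return out


def fill_with_at(df, observable, timeseries):
    """Same loop, but with .at, which is meant for single-cell access."""
    out = timeseries.copy()
    for sym, col, val in zip(df["symbol"], df["tsMean"], df[observable]):
        out.at[sym, col] = val
    return out


def fill_vectorised(df, observable, timeseries):
    """No Python loop: map labels to positions, then assign once in NumPy."""
    rows = timeseries.index.get_indexer(df["symbol"])    # -1 if a label is missing
    cols = timeseries.columns.get_indexer(df["tsMean"])  # -1 if a label is missing
    values = timeseries.to_numpy(dtype=float, copy=True)
    values[rows, cols] = df[observable].to_numpy()
    return pd.DataFrame(values, index=timeseries.index, columns=timeseries.columns)


# Made-up sample data: every (symbol, tsMean) pair appears exactly once.
symbols = ["AAA", "BBB", "CCC"]
times = [10, 20, 30, 40]
df = pd.DataFrame([(s, t) for s in symbols for t in times],
                  columns=["symbol", "tsMean"])
df["price"] = np.arange(len(df), dtype=float)
empty = pd.DataFrame(np.nan, index=symbols, columns=times)

by_loc = fill_with_loc(df, "price", empty)
by_at = fill_with_at(df, "price", empty)
by_numpy = fill_vectorised(df, "price", empty)
```

All three functions should produce the same table. How much faster the alternatives are depends on the pandas version and the data, so timing them on the real 50k-row input is the only reliable check.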