Posts by Alt*_*ets

pandas df.loc[z, x] = y: how can I speed it up?

I have identified the pandas command

timeseries.loc[z, x] = y


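as the one responsible for most of the time spent in the iteration. Now I am looking for a better way to speed it up. The loop covers not even 50k elements (the production target is ~250k or more), yet it already takes a sad 20 seconds.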
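A minimal, self-contained benchmark sketch of the write pattern above and two common alternatives. The frame shape, labels and values are made up (the post does not show the real `timeseries`), and `fill_loc`, `fill_at` and `fill_numpy` are hypothetical helper names. `.loc` scalar assignment goes through pandas' general label-indexing machinery on every call; `DataFrame.at` is the scalar-only accessor and usually has less per-call overhead; the NumPy variant resolves all labels to integer positions once with `Index.get_indexer` and then performs a single vectorized write. Timings depend on the pandas version and the data, so treat the printed numbers as indicative only.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical stand-in data: the question does not show the real frames.
rng = np.random.default_rng(0)
n_rows, n_cols, n_writes = 500, 20, 5000
row_labels = [f"S{i}" for i in range(n_rows)]   # e.g. symbols
col_labels = list(range(n_cols))                # e.g. time buckets
target = pd.DataFrame(np.nan, index=row_labels, columns=col_labels)

# Draw distinct cells so every method writes each cell at most once.
flat = rng.choice(n_rows * n_cols, size=n_writes, replace=False)
zs = [row_labels[i] for i in flat // n_cols]
xs = [col_labels[j] for j in flat % n_cols]
ys = rng.random(n_writes)


def fill_loc(frame):
    out = frame.copy()
    for z, x, y in zip(zs, xs, ys):
        out.loc[z, x] = y          # the pattern from the question
    return out


def fill_at(frame):
    out = frame.copy()
    for z, x, y in zip(zs, xs, ys):
        out.at[z, x] = y           # scalar-only accessor
    return out


def fill_numpy(frame):
    # Resolve labels to integer positions once, then do one vectorized write.
    values = frame.to_numpy(copy=True)
    r = frame.index.get_indexer(zs)
    c = frame.columns.get_indexer(xs)
    values[r, c] = ys
    return pd.DataFrame(values, index=frame.index, columns=frame.columns)


for fn in (fill_loc, fill_at, fill_numpy):
    start = time.perf_counter()
    result = fn(target)
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f} s")
```

The cells are drawn without repetition on purpose: with NumPy fancy-index assignment, the final value of a cell targeted more than once is not guaranteed, so on real data deduplicate (keeping the last write) before using that variant.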

Here is my code (ignore the top half; it is just timing helpers):

import time

def populateTimeseriesTable(df, observable, timeseries):
    """
    Go through all rows of df and 
    put the observable into the timeseries 
    at correct row (symbol), column (tsMean).
    """

    print("len(df.index)= %d" % len(df.index))  # show number of rows

    global bf, t
    bf = time.time()                       # set 'before' to now
    t = dict([(i,0) for i in range(5)])    # fill category timing with zeros

    def T(i):
        """
        timing helper: Add passed time to category 'i'. Then set 'before' to now. …

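If the loop in `populateTimeseriesTable` does nothing more than the per-row `timeseries.loc[z, x] = y` write that its docstring describes, the whole loop can likely be replaced by one reshape plus one label-aligned update. This is a sketch under loudly stated assumptions: `df` has columns named `symbol` (target row label), `tsMean` (target column label) and the column named by `observable`, and every label pair already exists in `timeseries`. The function name `populate_timeseries_vectorized` is hypothetical.

```python
import numpy as np
import pandas as pd


def populate_timeseries_vectorized(df, observable, timeseries):
    """Hypothetical bulk version of populateTimeseriesTable.

    Assumes df has columns 'symbol' (target row), 'tsMean' (target column)
    and `observable` (the value), and that all targets exist in timeseries.
    """
    # One row per (symbol, tsMean); keep the last value, as a sequential loop would.
    latest = df.drop_duplicates(subset=["symbol", "tsMean"], keep="last")
    # Reshape to the same row/column layout as timeseries.
    wide = latest.pivot(index="symbol", columns="tsMean", values=observable)
    # Align on labels and overwrite timeseries in place where wide is not NaN.
    timeseries.update(wide)
    return timeseries


# Tiny made-up example.
df = pd.DataFrame({
    "symbol": ["A", "B", "A"],
    "tsMean": [1, 2, 1],
    "price": [10.0, 20.0, 11.0],
})
ts = pd.DataFrame(np.nan, index=["A", "B", "C"], columns=[1, 2])
populate_timeseries_vectorized(df, "price", ts)
print(ts)
```

Two behavioral differences from the loop: `DataFrame.update` skips NaN values in the source instead of writing them, and it never adds rows or columns, whereas `.loc` assignment enlarges the frame when a label is missing.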

python optimization time-series pandas

Alt*_*ets

2019-10-16

Score: 11 | Answers: 2 | Views: 5942