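I have a fairly complex prediction script that runs WLS on each column, across more than 20 columns and millions of rows. Right now I loop over the dates with iterrows and, depending on the date and the values in each row, pull out a differently sized slice of data for the calculation. It takes hours to run in production. I have simplified the code to the following: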
import pandas as pd
import numpy as np
from datetime import timedelta

df = pd.DataFrame(np.random.randn(1000, 2), columns=list('AB'))
df['dte'] = pd.date_range('9/1/2014', periods=1000, freq='D')

def calculateC(A, dte):
    # The sign of A selects the cutoff length used for the trend prediction.
    if A > 0:
        depth = 10
    else:
        depth = 20
    lastyear = dte - timedelta(days=365)
    # Use data dated before the same day last year as the basis of the prediction.
    df2 = df[df.dte < lastyear].head(depth)
    # The real model uses WLS; mean() stands in for it here.
    return df2.B.mean()

for index, row in df.iterrows():
    if index > 365:
        df.loc[index, 'C'] = calculateC(row.A, row.dte)
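I have read that iterrows is the main culprit, because it is not an efficient way to use pandas, and that I should use vectorized methods instead. However, I can't find a way to vectorize this given the conditions (the date, the varying window length, and the value range). Is there a way?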
I would try pandas.DataFrame.apply(func, axis=1):
def calculateC2(row):
    if row.name > 365:  # row.name is the index of the row
        # The sign of A selects the cutoff length used for the trend prediction.
        if row.A > 0:
            depth = 10
        else:
            depth = 20
        lastyear = row.dte - timedelta(days=365)
        # Use data dated before the same day last year as the basis of the prediction.
        df2 = df[df.dte < lastyear].B.head(depth)
        # The real model uses WLS; mean() stands in for it here.
        return np.mean(df2)
    return np.nan  # rows without a full year of history get no prediction

df['C'] = df.apply(calculateC2, axis=1)
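For the simplified example specifically, the row-wise call can probably be avoided altogether. The sketch below assumes that df is sorted by dte in ascending order and that the per-window computation really is a plain mean; neither assumption necessarily holds for the real WLS model, so treat it as an illustration of the idea rather than a drop-in replacement. The column name C_vec and the helper arrays (n_before, k, csum) are made up for this example. It counts, for every row, how many rows fall before the cutoff date with np.searchsorted, caps that count at the row's depth, and reads the window mean off a prefix sum of B.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question (seeded so the run is repeatable).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 2)), columns=list("AB"))
df["dte"] = pd.date_range("2014-09-01", periods=1000, freq="D")

# Assumption: df is sorted by dte ascending, as in the question's example.
dates = df["dte"].to_numpy()
lastyear = dates - np.timedelta64(365, "D")

# For each row, the number of rows whose dte is strictly before its cutoff.
n_before = np.searchsorted(dates, lastyear, side="left")

# Per-row depth chosen from the sign of A, capped at the rows available,
# which mirrors what head(depth) returns on the filtered frame.
depth = np.where(df["A"].to_numpy() > 0, 10, 20)
k = np.minimum(depth, n_before)

# Prefix sums of B: csum[j] is the sum of the first j values of B.
csum = np.concatenate(([0.0], np.cumsum(df["B"].to_numpy())))

# Only rows with index > 365 get a value, as in the original loop.
valid = (df.index.to_numpy() > 365) & (k > 0)
c = np.full(len(df), np.nan)
c[valid] = csum[k[valid]] / k[valid]
df["C_vec"] = c
```

This only works because a mean can be recovered from cumulative sums. A regression over windows of varying length generally cannot be collapsed this way, so for the WLS case the searchsorted step may still help to locate the window bounds cheaply, while the fit itself would have to run per window.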