Asked by ver*_*era (score 6) · Tags: python, types, resampling, pandas, reindex
I'm using pandas 0.17.0 and have a df similar to this:
df.head()
Out[339]:
A B C
DATE_TIME
2016-10-08 13:57:00 in 5.61 1
2016-10-08 14:02:00 in 8.05 1
2016-10-08 14:07:00 in 7.92 0
2016-10-08 14:12:00 in 7.98 0
2016-10-08 14:17:00 out 8.18 0
df.tail()
Out[340]:
A B C
DATE_TIME
2016-11-08 13:42:00 in 8.00 0
2016-11-08 13:47:00 in 7.99 0
2016-11-08 13:52:00 out 7.97 0
2016-11-08 13:57:00 in 8.14 1
2016-11-08 14:02:00 in 8.16 1
with the following dtypes:
print (df.dtypes)
A object
B float64
C int64
dtype: object
When I reindex my df to one-minute intervals, all int64 columns change to float64.
index = pd.date_range(df.index[0], df.index[-1], freq="min")
df2 = df.reindex(index)
print (df2.dtypes)
A object
B float64
C float64
dtype: object
Also, if I try to resample:
df3 = df.resample('Min')
the int64 column becomes float64 and, for some reason, I lose my object column:
print (df3.dtypes)
B float64
C float64
dtype: object
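In a later step (after concatenating this df with another df) I want to interpolate the columns differently depending on this distinction, so I need them to keep their original dtypes. My real df has many more columns of each type, so I'm looking for a solution that doesn't rely on addressing columns individually by label.
Is there a way to keep the dtypes through the reindexing? Or is there a way to assign the dtypes afterwards (they are the only columns that contain nothing but integers apart from NaN)? Can anyone help?
This is not possible, because as soon as a column gets at least one NaN value, int is converted to float (NumPy integer arrays cannot represent NaN).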
index = pd.date_range(df.index[0], df.index[-1], freq="min")
df2 = df.reindex(index)
print (df2)
A B C
2016-10-08 13:57:00 in 5.61 1.0
2016-10-08 13:58:00 NaN NaN NaN
2016-10-08 13:59:00 NaN NaN NaN
2016-10-08 14:00:00 NaN NaN NaN
2016-10-08 14:01:00 NaN NaN NaN
2016-10-08 14:02:00 in 8.05 1.0
2016-10-08 14:03:00 NaN NaN NaN
2016-10-08 14:04:00 NaN NaN NaN
2016-10-08 14:05:00 NaN NaN NaN
2016-10-08 14:06:00 NaN NaN NaN
2016-10-08 14:07:00 in 7.92 0.0
2016-10-08 14:08:00 NaN NaN NaN
2016-10-08 14:09:00 NaN NaN NaN
2016-10-08 14:10:00 NaN NaN NaN
2016-10-08 14:11:00 NaN NaN NaN
2016-10-08 14:12:00 in 7.98 0.0
2016-10-08 14:13:00 NaN NaN NaN
2016-10-08 14:14:00 NaN NaN NaN
2016-10-08 14:15:00 NaN NaN NaN
2016-10-08 14:16:00 NaN NaN NaN
2016-10-08 14:17:00 out 8.18 0.0
print (df2.dtypes)
A object
B float64
C float64
dtype: object
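A minimal sketch of why this happens, on a small hypothetical frame (the values are made up, shaped like the question's df): the new minute rows are NaN, so int64 column C is upcast to float64. For contrast, pandas 0.24 and later (so not 0.17) offer the nullable "Int64" extension dtype, which stays integer and marks the gaps as <NA>; whether that fits your later interpolation step is an assumption.

```python
import pandas as pd

# Hypothetical 5-minute data, shaped like the question's df (A is object dtype)
idx = pd.date_range("2016-10-08 13:57", periods=3, freq="5min")
df = pd.DataFrame({"A": ["in", "in", "out"],
                   "B": [5.61, 8.05, 7.92],
                   "C": [1, 1, 0]}, index=idx).astype({"A": object})

minutes = pd.date_range(df.index[0], df.index[-1], freq="min")

# Plain reindex: the inserted rows are NaN, so the int64 column C becomes float64
plain = df.reindex(minutes)
print(plain.dtypes)

# pandas >= 0.24 only: the nullable "Int64" dtype holds <NA> and stays integer
nullable = df.astype({"C": "Int64"}).reindex(minutes)
print(nullable.dtypes)
```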
However, if you use the fill_value parameter of reindex, the dtypes don't change:
index = pd.date_range(df.index[0], df.index[-1], freq="min")
df2 = df.reindex(index, fill_value=0)
print (df2)
A B C
2016-10-08 13:57:00 in 5.61 1
2016-10-08 13:58:00 0 0.00 0
2016-10-08 13:59:00 0 0.00 0
2016-10-08 14:00:00 0 0.00 0
2016-10-08 14:01:00 0 0.00 0
2016-10-08 14:02:00 in 8.05 1
2016-10-08 14:03:00 0 0.00 0
2016-10-08 14:04:00 0 0.00 0
2016-10-08 14:05:00 0 0.00 0
2016-10-08 14:06:00 0 0.00 0
2016-10-08 14:07:00 in 7.92 0
2016-10-08 14:08:00 0 0.00 0
2016-10-08 14:09:00 0 0.00 0
2016-10-08 14:10:00 0 0.00 0
2016-10-08 14:11:00 0 0.00 0
2016-10-08 14:12:00 in 7.98 0
2016-10-08 14:13:00 0 0.00 0
2016-10-08 14:14:00 0 0.00 0
2016-10-08 14:15:00 0 0.00 0
2016-10-08 14:16:00 0 0.00 0
2016-10-08 14:17:00 out 8.18 0
print (df2.dtypes)
A object
B float64
C int64
dtype: object
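A small sketch of the fill_value behaviour on the same kind of hypothetical frame. No NaN is ever created, so C keeps int64; note, though, that the single scalar is used for every column, so the object column A also gets 0 in the new rows, which you may not want before interpolating.

```python
import pandas as pd

# Hypothetical data shaped like the question's df (A is object dtype)
idx = pd.date_range("2016-10-08 13:57", periods=3, freq="5min")
df = pd.DataFrame({"A": ["in", "in", "out"],
                   "B": [5.61, 8.05, 7.92],
                   "C": [1, 1, 0]}, index=idx).astype({"A": object})

minutes = pd.date_range(df.index[0], df.index[-1], freq="min")

# One scalar fills every new cell in every column, so C stays int64
filled = df.reindex(minutes, fill_value=0)
print(filled.dtypes)
print(filled.head(3))
```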
Even better is to use method='ffill' in reindex:
index = pd.date_range(df.index[0], df.index[-1], freq="min")
df2 = df.reindex(index, method='ffill')
print (df2)
A B C
2016-10-08 13:57:00 in 5.61 1
2016-10-08 13:58:00 in 5.61 1
2016-10-08 13:59:00 in 5.61 1
2016-10-08 14:00:00 in 5.61 1
2016-10-08 14:01:00 in 5.61 1
2016-10-08 14:02:00 in 8.05 1
2016-10-08 14:03:00 in 8.05 1
2016-10-08 14:04:00 in 8.05 1
2016-10-08 14:05:00 in 8.05 1
2016-10-08 14:06:00 in 8.05 1
2016-10-08 14:07:00 in 7.92 0
2016-10-08 14:08:00 in 7.92 0
2016-10-08 14:09:00 in 7.92 0
2016-10-08 14:10:00 in 7.92 0
2016-10-08 14:11:00 in 7.92 0
2016-10-08 14:12:00 in 7.98 0
2016-10-08 14:13:00 in 7.98 0
2016-10-08 14:14:00 in 7.98 0
2016-10-08 14:15:00 in 7.98 0
2016-10-08 14:16:00 in 7.98 0
2016-10-08 14:17:00 out 8.18 0
print (df2.dtypes)
A object
B float64
C int64
dtype: object
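A short sketch of the forward-fill variant on hypothetical data. Every new row copies the last existing row, so no NaN appears and both the object and the int64 dtypes survive; the index has to be sorted for method='ffill' to work. Keep in mind that this is a step function, so B is repeated rather than interpolated.

```python
import pandas as pd

# Hypothetical data shaped like the question's df (A is object dtype)
idx = pd.date_range("2016-10-08 13:57", periods=3, freq="5min")
df = pd.DataFrame({"A": ["in", "in", "out"],
                   "B": [5.61, 8.05, 7.92],
                   "C": [1, 1, 0]}, index=idx).astype({"A": object})

minutes = pd.date_range(df.index[0], df.index[-1], freq="min")

# method="ffill" copies the last known row forward (requires a sorted index)
ffilled = df.reindex(minutes, method="ffill")
print(ffilled.dtypes)
```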
If you use resample, you can get column A back by using unstack and stack, but unfortunately the float problem remains:
df3 = (df.set_index('A', append=True)
         .unstack()
         # pandas 0.17 syntax; fill_method was deprecated in 0.18
         .resample('Min', fill_method='ffill')
         .stack()
         .reset_index(level=1))
print (df3)
A B C
DATE_TIME
2016-10-08 13:57:00 in 5.61 1.0
2016-10-08 13:58:00 in 5.61 1.0
2016-10-08 13:59:00 in 5.61 1.0
2016-10-08 14:00:00 in 5.61 1.0
2016-10-08 14:01:00 in 5.61 1.0
2016-10-08 14:02:00 in 8.05 1.0
2016-10-08 14:03:00 in 8.05 1.0
2016-10-08 14:04:00 in 8.05 1.0
2016-10-08 14:05:00 in 8.05 1.0
2016-10-08 14:06:00 in 8.05 1.0
2016-10-08 14:07:00 in 7.92 0.0
2016-10-08 14:08:00 in 7.92 0.0
2016-10-08 14:09:00 in 7.92 0.0
2016-10-08 14:10:00 in 7.92 0.0
2016-10-08 14:11:00 in 7.92 0.0
2016-10-08 14:12:00 in 7.98 0.0
2016-10-08 14:13:00 in 7.98 0.0
2016-10-08 14:14:00 in 7.98 0.0
2016-10-08 14:15:00 in 7.98 0.0
2016-10-08 14:16:00 in 7.98 0.0
2016-10-08 14:17:00 out 8.18 0.0
print (df3.dtypes)
A object
B float64
C float64
dtype: object
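A sketch for newer pandas (0.18 and later, an assumption about your upgrade path since the question uses 0.17), where resample became a deferred, groupby-like call. Calling .ffill() on the resampler upsamples by forward filling without aggregating, so the object column is kept and C should stay int64, making the unstack/stack round trip unnecessary. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data shaped like the question's df (A is object dtype)
idx = pd.date_range("2016-10-08 13:57", periods=3, freq="5min")
df = pd.DataFrame({"A": ["in", "in", "out"],
                   "B": [5.61, 8.05, 7.92],
                   "C": [1, 1, 0]}, index=idx).astype({"A": object})

# Upsample to 1-minute bins and forward-fill; no aggregation, so A survives
up = df.resample("min").ffill()
print(up.dtypes)
```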
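I tried modifying the previous answer to convert back to int:
import pandas as pd

int_cols = df.select_dtypes(['int64']).columns
print (int_cols)
Index(['C'], dtype='object')
index = pd.date_range(df.index[0], df.index[-1], freq="s")
df2 = df.reindex(index)
for col in df2:
    if col in int_cols:
        # int columns: forward-fill the NaN rows, then cast back to int
        # (on Windows astype(int) gives int32, as in the output below; use 'int64' to keep the width)
        df2[col] = df2[col].ffill().astype(int)
    elif df2[col].dtype == float:
        # float columns: linear interpolation
        df2[col] = df2[col].interpolate()
    else:
        # object columns such as A: forward-fill
        df2[col] = df2[col].ffill()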
#print (df2)
print (df2.dtypes)
A object
B float64
C int32
dtype: object
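The same idea can be generalised so that no column is addressed by label: save the dtypes before reindexing, fill the columns grouped by dtype, then cast everything back with astype and the saved mapping. A minimal sketch; the helper name reindex_keep_dtypes is hypothetical, and the choice of linear interpolation for floats and forward fill for everything else is an assumption mirroring the loop above.

```python
import pandas as pd


def reindex_keep_dtypes(df, freq="min"):
    """Hypothetical helper: reindex to a regular grid, fill gaps per dtype,
    then restore the original dtypes."""
    original = df.dtypes.to_dict()      # column -> dtype before reindexing
    index = pd.date_range(df.index[0], df.index[-1], freq=freq)
    out = df.reindex(index)             # int columns become float64 here

    float_cols = [c for c, t in original.items() if pd.api.types.is_float_dtype(t)]
    other_cols = [c for c in out.columns if c not in float_cols]

    # floats: linear interpolation; ints and objects: forward fill
    out[float_cols] = out[float_cols].interpolate()
    out[other_cols] = out[other_cols].ffill()

    # no NaN is left (the first row comes from df), so casting back is safe
    return out.astype(original)


# Usage on hypothetical data shaped like the question's df
idx = pd.date_range("2016-10-08 13:57", periods=3, freq="5min")
df = pd.DataFrame({"A": ["in", "in", "out"],
                   "B": [5.61, 8.05, 7.92],
                   "C": [1, 1, 0]}, index=idx).astype({"A": object})

res = reindex_keep_dtypes(df)
print(res.dtypes)
```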