Pandas: selecting by date from a MultiIndex

Question

Suppose I have a Series with a MultiIndex:

date        foo
2006-01-01  1         12931926.310
            3         11084049.460
            5         10812205.359
            7          9031510.239
            9          5324054.903
2007-01-01  1         11086082.624
            3         12028419.560
            5         11957253.031
            7         10643307.061
            9          6034854.915


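If it were not a MultiIndex, I could select the rows for the year 2007 with df.loc['2007']. How do I do that here? My natural guess was df.loc['2007', :], but that gives me an empty Series([], name: FINLWT21, dtype: float64).

Ultimate goal

Ultimately, I also want to replace the rows for every date other than 2007 with the rows from 2007.

That is, my expected output is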

date        foo
2006-01-01  1         11086082.624
            3         12028419.560
            5         11957253.031
            7         10643307.061
            9          6034854.915
2007-01-01  1         11086082.624
            3         12028419.560
            5         11957253.031
            7         10643307.061
            9          6034854.915


I tried to implement @unutbu's solution, but

mySeries.loc[dateIndex.year != 2007] = mySeries.loc[dateIndex.year == 2007]


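naturally sets the values to NaN, because assignment aligns on the index and the left-hand labels do not exist on the RHS. Usually, such problems are solved with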

mySeries.loc[dateIndex.year != 2007] = mySeries.loc[dateIndex.year == 2007].values


but since I have 10 values on the left-hand side (and more in my real dataset) and only 5 on the right-hand side, I get

ValueError: cannot set using a list-like indexer with a different length than the value


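The only alternative I can think of right now is to iterate over the first index level and apply the previous command to each subgroup, but that does not seem like the most efficient solution.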

Answer 1

unu*_*tbu 5

Given the Series

In [207]: series
Out[212]: 
date        foo
2006-01-01  1      12931926.310
            3      11084049.460
            5      10812205.359
            7       9031510.239
            9       5324054.903
2007-01-01  1      11086082.624
            3      12028419.560
            5      11957253.031
            7      10643307.061
            9       6034854.915
Name: val, dtype: float64


you can get the values of the date index level with

import numpy as np
import pandas as pd

dateindex = series.index.get_level_values('date')
# Ensure the dateindex is a DatetimeIndex (as opposed to a plain Index)
dateindex = pd.DatetimeIndex(dateindex)


Now you can select the rows where the year equals 2007 with a boolean condition:

# select rows where year equals 2007
series2007 = series.loc[dateindex.year == 2007]

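As a self-contained sketch of the two steps above, the following rebuilds the sample Series from the question and selects the 2007 rows. The construction via pd.MultiIndex.from_product and the names dates, foo and values are assumptions made for illustration; only get_level_values, pd.DatetimeIndex and the boolean mask come from the answer.

```python
import pandas as pd

# Rebuild the sample data (this construction is an assumption for illustration).
dates = pd.to_datetime(["2006-01-01", "2007-01-01"])
foo = [1, 3, 5, 7, 9]
values = [12931926.310, 11084049.460, 10812205.359, 9031510.239, 5324054.903,
          11086082.624, 12028419.560, 11957253.031, 10643307.061, 6034854.915]
index = pd.MultiIndex.from_product([dates, foo], names=["date", "foo"])
series = pd.Series(values, index=index, name="val")

# Extract the 'date' level and make sure it is a DatetimeIndex.
dateindex = pd.DatetimeIndex(series.index.get_level_values("date"))

# The comparison yields a boolean array with one entry per row of the Series.
series2007 = series.loc[dateindex.year == 2007]
print(series2007)
```

The mask has the same length as the Series, so it works regardless of how many index levels there are.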

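If the foo values cycle through the same values in the same order for every date, then you can replace all values in the Series with the values from 2007 using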

# Integer division: np.tile needs an integer repeat count
N = len(series) // len(series2007)
series[:] = np.tile(series2007.values, N)


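One advantage of using np.tile and .values is that it generates the desired array of values relatively quickly. A (possible) disadvantage is that it ignores the index, so it relies on the assumption that the foo values cycle through the same values in the same order for every date.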
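
A minimal sketch of the np.tile approach on the sample data, with an explicit check of the ordering assumption before the index is ignored. The sample construction and the foo_labels check are illustrative additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (construction is an assumption).
dates = pd.to_datetime(["2006-01-01", "2007-01-01"])
foo = [1, 3, 5, 7, 9]
values = [12931926.310, 11084049.460, 10812205.359, 9031510.239, 5324054.903,
          11086082.624, 12028419.560, 11957253.031, 10643307.061, 6034854.915]
index = pd.MultiIndex.from_product([dates, foo], names=["date", "foo"])
series = pd.Series(values, index=index, name="val")

dateindex = pd.DatetimeIndex(series.index.get_level_values("date"))
series2007 = series.loc[dateindex.year == 2007]

# Integer division, because np.tile needs an integer repeat count.
N = len(series) // len(series2007)

# Verify that every date repeats the 2007 foo labels in the same order.
foo_labels = series.index.get_level_values("foo").to_numpy()
expected = np.tile(series2007.index.get_level_values("foo").to_numpy(), N)
assert np.array_equal(foo_labels, expected), "foo labels do not repeat in order"

# Work on a copy so the original Series stays untouched.
result = series.copy()
result[:] = np.tile(series2007.to_numpy(), N)
print(result)
```

If the check fails, the index-aware approach is the safer choice.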

A more robust (but slower) approach is to use a join:

df = series.reset_index('date')
df2007 = df.loc[dateindex.year==2007]
df = df.join(df2007, rsuffix='_2007')
df = df[['date', 'val_2007']]
df = df.set_index(['date'], append=True)
df = df.swaplevel(0,1).sort_index()


which yields

In [304]: df
Out[304]: 
                    val_2007
date       foo              
2006-01-01 1    11086082.624
           3    12028419.560
           5    11957253.031
           7    10643307.061
           9     6034854.915
2007-01-01 1    11086082.624
           3    12028419.560
           5    11957253.031
           7    10643307.061
           9     6034854.915
2008-01-01 1    11086082.624
           3    12028419.560
           5    11957253.031
           7    10643307.061
           9     6034854.915

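As a hedged alternative to the join (not part of the original answer), the 2007 values can also be looked up by their foo label with reindex. This sketch assumes each foo value occurs exactly once in 2007; the names lookup, foo_labels and result are illustrative. The sample rows are deliberately shuffled to show that row order does not matter here.

```python
import pandas as pd

# Sample data rebuilt from the question (construction is an assumption).
dates = pd.to_datetime(["2006-01-01", "2007-01-01"])
foo = [1, 3, 5, 7, 9]
values = [12931926.310, 11084049.460, 10812205.359, 9031510.239, 5324054.903,
          11086082.624, 12028419.560, 11957253.031, 10643307.061, 6034854.915]
index = pd.MultiIndex.from_product([dates, foo], names=["date", "foo"])
series = pd.Series(values, index=index, name="val")

# Shuffle the rows so the result cannot depend on their order.
series = series.sample(frac=1, random_state=0)

dateindex = pd.DatetimeIndex(series.index.get_level_values("date"))

# 2007 values indexed by foo only (assumes foo is unique within 2007).
lookup = series.loc[dateindex.year == 2007].droplevel("date")

# For every row, fetch the 2007 value that carries the same foo label.
foo_labels = series.index.get_level_values("foo")
result = pd.Series(lookup.reindex(foo_labels).to_numpy(),
                   index=series.index, name=series.name).sort_index()
print(result)
```

A foo label that is missing from 2007 would come back as NaN rather than raise an error, so it may be worth checking result.isna().any() on real data.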
