Jos*_*der 3 python multi-index pandas
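I have a DataFrame that seems like a simple use case for a MultiIndex: the index consists of ISO week numbers and dates, and I want to filter by a specific week. Following the docs, it looks like I should be able to index by passing the week number as a string. However, this gives me a KeyError.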
MCVE:
import numpy as np
import pandas as pd
data = {'foo': {('2016_32', '2016-08-07'): 0.14285714285714285,
('2016_32', '2016-08-08'): 0.14285714285714285,
('2016_32', '2016-08-09'): 0.14285714285714285,
('2016_32', '2016-08-10'): 0.14285714285714285,
('2016_32', '2016-08-11'): 0.14285714285714285,
('2016_32', '2016-08-12'): 0.14285714285714285,
('2016_32', '2016-08-13'): 0.14285714285714285,
('2016_36', '2016-09-04'): 0.14285714285714285,
('2016_36', '2016-09-05'): 0.14285714285714285,
('2016_36', '2016-09-06'): 0.14285714285714285,
('2016_36', '2016-09-07'): 0.14285714285714285,
('2016_36', '2016-09-08'): 0.14285714285714285,
('2016_36', '2016-09-09'): 0.14285714285714285},
'bar': {('2016_32', '2016-08-07'): np.nan,
('2016_32', '2016-08-08'): np.nan,
('2016_32', '2016-08-09'): np.nan,
('2016_32', '2016-08-10'): np.nan,
('2016_32', '2016-08-11'): np.nan,
('2016_32', '2016-08-12'): np.nan,
('2016_32', '2016-08-13'): np.nan,
('2016_36', '2016-09-04'): 0.0,
('2016_36', '2016-09-05'): 0.0,
('2016_36', '2016-09-06'): 0.0,
('2016_36', '2016-09-07'): 0.0,
('2016_36', '2016-09-08'): 0.0,
('2016_36', '2016-09-09'): 0.0}}
df = pd.DataFrame(data)
df['2016_32']
KeyError: '2016_32'
In general, to select from a MultiIndex, use DataFrame.xs:
# level defaults to the first level, so it can be omitted
print (df.xs('2016_32'))
# select by the second level
#print (df.xs('2016-09-07', level=1))
                 foo  bar
2016-08-07  0.142857  NaN
2016-08-08  0.142857  NaN
2016-08-09  0.142857  NaN
2016-08-10  0.142857  NaN
2016-08-11  0.142857  NaN
2016-08-12  0.142857  NaN
2016-08-13  0.142857  NaN
Or loc:
# no extra argument is needed to select by the first level
print (df.loc['2016_32'])
# to select by the second level, use axis=0 and : to keep all values of the first level
print (df.loc(axis=0)[:, '2016-09-07'])
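To make the difference concrete, here is a minimal sketch on a smaller frame of the same shape. The name `small`, the level names `week` and `date`, and the values are made up for illustration. Plain `[]` indexing on a DataFrame looks the key up among the columns, which is why `df['2016_32']` in the question raises a KeyError, while `xs` and `loc` look it up in the row index.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame: (week, date) tuples as row index.
index = pd.MultiIndex.from_tuples(
    [("2016_32", "2016-08-07"), ("2016_32", "2016-08-08"),
     ("2016_36", "2016-09-04"), ("2016_36", "2016-09-05")],
    names=["week", "date"],  # level names are an assumption, the original has none
)
small = pd.DataFrame(
    {"foo": [0.1, 0.2, 0.3, 0.4], "bar": [np.nan, np.nan, 0.0, 0.0]},
    index=index,
)

# [] on a DataFrame searches the columns ('foo', 'bar'),
# so a week label is not found there.
try:
    small["2016_32"]
    raised = False
except KeyError:
    raised = True

# xs and loc search the first level of the row index
# and drop that level from the result.
by_xs = small.xs("2016_32")
by_loc = small.loc["2016_32"]

# Selecting on the second level: xs drops the selected level ...
by_date_xs = small.xs("2016-09-04", level="date")
# ... while loc(axis=0) with a slice keeps both levels.
by_date_loc = small.loc(axis=0)[:, "2016-09-04"]
```

In this sketch `by_xs` and `by_loc` hold the same two rows for week `2016_32`; the two second-level selections return the same data but differ in whether the `date` level survives in the result's index.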
The difference between selecting from a MultiIndex in the columns and in the rows:
np.random.seed(235)
a = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
a1 = pd.MultiIndex.from_product([['A', 'B', 'C'], ['E','F']])
df = pd.DataFrame(np.random.randint(10, size=(6, 8)), index=a1, columns=a)
print (df)
    bar     baz     foo     qux
    one two one two one two one two
A E   8   1   5   8   3   5   3   3
  F   3   1   3   6   6   1   0   2
B E   0   3   1   7   0   0   8   2
  F   6   7   7   4   2   7   7   5
C E   7   3   1   7   3   9   7   3
  F   8   2   0   8   5   2   2   0
# select by the first column level, 'bar'
print (df['bar'])
     one  two
A E    8    1
  F    3    1
B E    0    3
  F    6    7
C E    7    3
  F    8    2
# select column 'bar', then 'one' (chained indexing)
print (df['bar']['one'])
A  E    8
   F    3
B  E    0
   F    6
C  E    7
   F    8
Name: one, dtype: int32
# select a single column by a tuple of both levels
print (df[('bar', 'one')])
A  E    8
   F    3
B  E    0
   F    6
C  E    7
   F    8
Name: (bar, one), dtype: int32
For selecting by rows (MultiIndex in the index), use loc:
print (df.loc['A'])
  bar     baz     foo     qux
  one two one two one two one two
E   8   1   5   8   3   5   3   3
F   3   1   3   6   6   1   0   2
print (df.loc['A'].loc['F'])
bar  one    3
     two    1
baz  one    3
     two    6
foo  one    6
     two    1
qux  one    0
     two    2
Name: F, dtype: int32
print (df.loc[('A', 'F')])
bar  one    3
     two    1
baz  one    3
     two    6
foo  one    6
     two    1
qux  one    0
     two    2
Name: (A, F), dtype: int32
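The column and row cases above can be put side by side in one short sketch. It assumes a smaller frame named `demo` (a hypothetical name) filled with the deterministic values 0..15 instead of random integers, so the results are easy to check by hand. The last line, which passes a row tuple and a column tuple to a single `loc` call, is an addition for illustration and is not taken from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level MultiIndex on both axes.
columns = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]])
rows = pd.MultiIndex.from_product([["A", "B"], ["E", "F"]])
demo = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=columns)

# Columns: [] with a tuple of both levels picks one column as a Series.
col = demo[("bar", "one")]

# Rows: loc with a tuple of both levels picks one row as a Series.
row = demo.loc[("A", "F")]

# Chained selection reaches the same data in two steps.
col_chained = demo["bar"]["one"]
row_chained = demo.loc["A"].loc["F"]

# Both axes at once: a row tuple and a column tuple in a single loc call.
cell = demo.loc[("A", "F"), ("bar", "one")]
```

Here `col` holds 0, 4, 8, 12 and `row` holds 4, 5, 6, 7, and `cell` is the value 4 where they intersect. The tuple form does the lookup in one step, whereas the chained form builds an intermediate object first.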